# REGIONAL OUT-OF-ORDER WRITES IN TOTAL STORE ORDER

Sawan Singh Alexandra Jimborean Alberto Ros





Department of Computer Engineering and Technology, University of Murcia

October 6, 2020

Sawan Singh, Alexandra Jimborean, Alberto Ros

29th PACT

October 6, 2020 1/22

### **OVERVIEW**








- This talk is about a new store buffer (SB) design
  - → Hide store-miss latency
- We relax TSO by performing selected stores Out-of-Order (OoO)
  - → We study its effect on performance and energy consumption
- The result:
  - → ROOW: relaxed, yet as strong as TSO
  - → A 16-entry SB with a 5.64% performance improvement over TSO with a 56-entry SB
  - → Almost zero hardware overhead (N+1 bits, where N is the number of SB entries)
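The N+1 bits can be read as one mode bit per SB entry plus a single region flag. Both structures appear in the walkthrough of ROOW's operation. As a worked example for the two SB sizes evaluated in the talk:

$$
\text{ROOW overhead} = \underbrace{N}_{\text{one mode bit per SB entry}} + \underbrace{1}_{\text{region flag}}
\quad\Rightarrow\quad N = 16:\ 17\ \text{bits}, \qquad N = 56:\ 57\ \text{bits}
$$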




## OUTLINE



# 2 ROOW










- store A
- store B
- store C
- store D
- store E
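The sequence of stores A to E above serves as the baseline example. Below is a minimal Python sketch of how a FIFO SB drains under TSO. Only the oldest store may write to memory, so a miss on store A stalls B to E behind it. The addresses, the `missing` set of L1-missing addresses, and the `tso_drain` helper are hypothetical and chosen only for illustration.

```python
from collections import deque


def tso_drain(stores, missing):
    """One drain step of a FIFO store buffer under TSO.

    stores:  deque of (name, addr) in program order, oldest first
    missing: set of addresses that currently miss in the L1 (hypothetical input)
    Returns the names of the stores that performed in this step.
    """
    performed = []
    # Under TSO only the head of the SB may write to memory, so a missing
    # head blocks every younger store behind it (head-of-line blocking).
    while stores and stores[0][1] not in missing:
        name, _ = stores.popleft()
        performed.append(name)
    return performed


sb = deque(zip("ABCDE", [0x100, 0x140, 0x180, 0x1C0, 0x200]))
print(tso_drain(sb, missing={0x100}))   # [] : store A misses, so B..E wait
print(tso_drain(sb, missing=set()))     # ['A', 'B', 'C', 'D', 'E']
```

Once A's miss resolves, the whole buffer drains at once. Until then, none of the younger stores can make progress, even though they may hit in the cache.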


## **IDENTIFYING STORES**



No concurrent accesses by other threads/cores

↓

The access becomes INVISIBLE until the next synchronization operation

↓

It can be performed Out-of-Order


# SC-FOR-DRF

- Sequential consistency (SC) is guaranteed only for data-race-free (DRF) programs
- SC-for-DRF requires racy accesses to be confined within synchronization operations
- ⇒ If the program has no data races, the compiler inserts all the fences needed to preserve SC
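Below is a minimal sketch of a DRF program in the SC-for-DRF sense. It is written in Python only to show the program structure; Python threads do not model a hardware store buffer. Each thread writes a disjoint slice of `data` with no concurrent accesses, which forms a DRF region. It touches the shared `total` only inside a lock, which stands in for the synchronization operations. All names and sizes are hypothetical.

```python
import threading

N_THREADS = 4
CHUNK = 1000
data = [0] * (N_THREADS * CHUNK)   # each thread owns a disjoint slice
total = 0
lock = threading.Lock()            # stands in for the synchronization operations


def worker(tid: int) -> None:
    global total
    lo, hi = tid * CHUNK, (tid + 1) * CHUNK
    # DRF region: no other thread touches data[lo:hi], so these stores are
    # not observed by other threads before the next synchronization operation.
    for i in range(lo, hi):
        data[i] = i
    local = sum(data[lo:hi])
    # Sync region: the only access to shared state (total) is confined to the
    # critical section, so the program has no data races.
    with lock:
        total += local


threads = [threading.Thread(target=worker, args=(t,)) for t in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total == sum(range(N_THREADS * CHUNK)))   # True
```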
























- → Stores in a DRF region can perform OoO, as they are INVISIBLE until the end of the region
- → All stores belonging to a sync region follow TSO
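The region a store belongs to can be sketched as a single flag. `setDRF 1` sets it, `setDRF 0` resets it, and each store copies it into its mode bit at commit. In this Python sketch, the `st` mnemonic and the store names are illustrative and not taken from the talk. The assumption that execution starts in a sync region is likewise illustrative.

```python
def classify_stores(program):
    """Assign each store the region flag in effect when it commits.

    program: instructions in commit order, e.g. "setDRF 1" or "st A".
    Returns (store, mode) pairs: mode 1 = DRF region (may perform OoO),
    mode 0 = sync region (follows TSO).
    """
    region_flag = 0   # assumption: execution starts in a sync region
    modes = []
    for insn in program:
        op, arg = insn.split()
        if op == "setDRF":
            region_flag = int(arg)              # set (1) or reset (0) the region flag
        elif op == "st":
            modes.append((arg, region_flag))    # copy the flag into the store's mode bit
    return modes


program = ["st L", "setDRF 1", "st A", "st B", "setDRF 0", "st U"]
print(classify_stores(program))
# [('L', 0), ('A', 1), ('B', 1), ('U', 0)]
```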


### POTENTIAL
















### Setting a sync region (`setDRF 0` at RoB commit)

- The region flag is reset
- Each store copies the region flag to its mode bit (sync region)
- On a miss, all stores wait until the miss is resolved



### Setting a DRF region (`setDRF 1` at RoB commit)

- The region flag is set
- Each store copies the region flag to its mode bit (DRF region)
- On a miss, store A waits for the miss to be resolved
- The next store is allowed to perform, and similarly the one after it
- All stores except A have already performed their memory operations
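The two walkthroughs above can be condensed into a toy Python model of the SB. It uses one region flag plus one mode bit per entry (the N+1 bits). Stores perform when allowed and leave the SB only from the head.

This is a sketch under stated assumptions, not the authors' implementation:

- Stores try to perform as soon as possible.
- An address in `missing` represents an L1 miss.
- A DRF store is assumed never to perform ahead of an older, still-pending sync store.
- An older pending store to the same address also blocks a DRF store, since alias stores must stay in order.

All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SBEntry:
    name: str
    addr: int
    mode: int               # copy of the region flag: 1 = DRF, 0 = sync
    performed: bool = False


class RoowStoreBuffer:
    """Toy model of a ROOW-style SB: 1 region flag + 1 mode bit per entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.region_flag = 0                 # assumed initial state: sync region
        self.entries: List[SBEntry] = []     # program order, oldest first

    def set_drf(self, value: int) -> None:
        """Models setDRF 0/1 reaching RoB commit."""
        self.region_flag = value

    def insert(self, name: str, addr: int) -> None:
        if len(self.entries) >= self.capacity:
            raise RuntimeError("SB full: store commit would stall")
        # Each store copies the region flag to its mode bit on entry.
        self.entries.append(SBEntry(name, addr, self.region_flag))

    def can_perform(self, i: int) -> bool:
        entry = self.entries[i]
        pending_older = [e for e in self.entries[:i] if not e.performed]
        if entry.mode == 0:
            return not pending_older                  # sync store: TSO order
        if any(e.mode == 0 for e in pending_older):
            return False                              # assumed: never pass a sync store
        return all(e.addr != entry.addr for e in pending_older)  # alias stores in order

    def drain(self, missing: Set[int]) -> List[str]:
        """One drain step; stores whose address is in `missing` miss in the L1."""
        performed = []
        for i, entry in enumerate(self.entries):
            if not entry.performed and entry.addr not in missing and self.can_perform(i):
                entry.performed = True
                performed.append(entry.name)
        # Stores exit the SB only from the head, i.e. in program order.
        while self.entries and self.entries[0].performed:
            self.entries.pop(0)
        return performed


def run(drf: int) -> RoowStoreBuffer:
    sb = RoowStoreBuffer(capacity=16)
    sb.set_drf(drf)
    for name, addr in zip("ABCDE", [0x100, 0x140, 0x180, 0x1C0, 0x200]):
        sb.insert(name, addr)
    print(f"setDRF {drf}:", sb.drain(missing={0x100}))
    return sb


run(0)   # prints "setDRF 0: []"
run(1)   # prints "setDRF 1: ['B', 'C', 'D', 'E']"
```

Running it reproduces both scenarios. With `setDRF 0`, nothing performs while A misses. With `setDRF 1`, stores B to E perform and only A waits. All five entries stay in the SB until A performs, because entries leave in program order.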







Store-to-load forwarding

↓

Loads take the most recent value

↓

Stores enter and exit the SB only in program order
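Below is a minimal Python sketch of the forwarding search. It assumes the SB is a list of hypothetical `(addr, value)` pairs kept in program order. It also assumes a load matches only a store to exactly the same address; partial overlaps are not modeled. Because entries are in program order, scanning from the youngest end returns the most recent value.

```python
def forward(sb_entries, load_addr):
    """Return the value a load obtains from the SB, or None if it must read the cache.

    sb_entries: list of (addr, value) in program order, oldest first.
    """
    # Youngest-first scan: the first match is the most recent older store.
    for addr, value in reversed(sb_entries):
        if addr == load_addr:
            return value
    return None


sb = [(0x100, 1), (0x140, 7), (0x100, 2)]   # two stores to 0x100
print(forward(sb, 0x100))   # 2    (the most recent value)
print(forward(sb, 0x180))   # None (read from the cache)
```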










#### **GUARANTEEING SEQUENTIAL SEMANTICS**




Alias stores (stores to the same address) must always perform in order

↓

Memory is updated in program order











Follow TSO
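Below is a minimal Python sketch of why alias stores must stay in order. It applies the same three hypothetical stores to an empty memory in different orders. Swapping stores to different addresses leaves the final memory unchanged, but swapping the two stores to the same address does not.

```python
def apply_stores(order):
    """Apply (addr, value) stores to an empty memory in the given order."""
    memory = {}
    for addr, value in order:
        memory[addr] = value
    return memory


program = [(0x100, 1), (0x140, 5), (0x100, 2)]   # two alias stores to 0x100

in_order = apply_stores(program)
# Reordering stores to different addresses is harmless...
swap_non_alias = apply_stores([program[1], program[0], program[2]])
# ...but letting the younger alias store update memory first is not.
swap_alias = apply_stores([program[2], program[1], program[0]])

print(in_order == swap_non_alias)   # True
print(in_order == swap_alias)       # False: 0x100 ends up holding 1, not 2
```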










Exclusive access to memory during a DRF region, thanks to SC-for-DRF

↓

No coherence problems

↓

Stores can stay in the SB until it gets full

↓

More store-to-load forwarding

↓

AT ZERO COST!


## SIMULATION ENVIRONMENT






- Sniper + in-house processor model + GEMS
  - 8 out-of-order Skylake-like cores
  - Load queue: 72 entries
  - Store queue + store buffer: 56 or 16 entries
  - Reorder buffer (RoB): 224 entries
- Parallel benchmarks
  - Splash-3
  - Parsec-3.0






#### **Processor Stalls**



Some applications encounter very few SB stalls under TSO (the baseline), so ROOW is not very effective for them




#### Loads forwarded from stores

[Figure: loads forwarded from the SB (%), TSO vs. ROOW, per benchmark and on average]




<sup>1</sup> Alves et al. Filter caching for free: The untapped potential of the store-buffer, ISCA 2019




#### Sensitivity analysis







- New SB design for non-speculative store-store reordering
   → using SC-for-DRF
- Analysis of
  - → which stores can be reordered
  - → how to reorder stores
- Efficient implementation with just N+1 bits<sup>2</sup>
- Results:
  - → 56-entry SB: performance +8.13%, energy consumption -6.01%
  - → 16-entry SB: performance +5.64%, energy consumption -45.16%
- Can we get better performance?
  - → Yes, by using the xDRF compiler<sup>3</sup> (+1%)

<sup>2</sup> N: number of entries in the SB

<sup>3</sup> Jimborean et al. Automatic Detection of Extended Data-Race-Free Regions, CGO 2017


## REGIONAL OUT-OF-ORDER WRITES IN TOTAL STORE ORDER

Sawan Singh Alexandra Jimborean Alberto Ros

sawan.singh@um.es, CAPS, DITEC, University of Murcia

Thank you for your attention!



#### ECHO, ERC Consolidator Grant (No 819134)

This presentation and recording belong to the authors. No distribution is allowed without the authors' permission.
