> Seq/cst are relatively fast, but at around 20-30 clock cycles Did you mean to ...

gpderetta · on June 15, 2023

> Did you mean to say seq/cst store?

Indeed!

> set the value to X unconditionally and return me the previous value

That would be atomic::exchange, which maps to XCHG on x86 and, like all atomic RMW operations on x86, is sequentially consistent.
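A minimal sketch of that "set unconditionally and return the previous value" operation in C++, assuming a plain `std::atomic<int>`. The helper name `set_and_get_previous` and the demo function are made up for illustration; only `std::atomic::exchange` is the real API.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: unconditionally write `value` into `slot` and
// return whatever was stored there before, as one atomic RMW.
inline int set_and_get_previous(std::atomic<int>& slot, int value) {
    // exchange() defaults to std::memory_order_seq_cst.
    return slot.exchange(value);
}

// Small self-check showing the intended behavior.
inline void exchange_demo() {
    std::atomic<int> slot{1};
    int previous = set_and_get_previous(slot, 42);
    assert(previous == 1);      // old value comes back
    assert(slot.load() == 42);  // new value is now stored
}
```

On x86 compilers typically emit a single XCHG with a memory operand for this, which is implicitly locked.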

Incidentally, seq-cst stores are also typically lowered to XCHG on x86, as opposed to the more obvious MOV+MFENCE.
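To make the store case concrete, here is a rough sketch contrasting a seq-cst store with a release store. The function names are hypothetical, and the instruction selection noted in the comments is what mainstream compilers typically do on x86, not something the C++ standard guarantees.

```cpp
#include <atomic>
#include <cassert>

// Sequentially consistent store (the default ordering).
// Typically lowered to XCHG on x86.
inline void store_seq_cst(std::atomic<int>& x, int v) {
    x.store(v, std::memory_order_seq_cst);
}

// Release store. On x86 this is typically just a plain MOV,
// because ordinary x86 stores already have release semantics.
inline void store_release(std::atomic<int>& x, int v) {
    x.store(v, std::memory_order_release);
}

// Small self-check: both variants store the value; they differ only
// in the ordering guarantees (and hence the cost).
inline void store_demo() {
    std::atomic<int> x{0};
    store_seq_cst(x, 3);
    assert(x.load() == 3);
    store_release(x, 5);
    assert(x.load() == 5);
}
```

Inspecting the generated assembly for your own compiler and flags is the reliable way to confirm which instruction each variant becomes.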

There is still room for optimization: if you can implement your algorithms with just loads and stores, plus as few strategically placed RMWs as possible, it can be a win.
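As one illustration of the "just loads and stores" style, here is a sketch of a single-producer/single-consumer ring buffer that uses only acquire loads and release stores, with no RMW at all. This is a textbook pattern, not something from the thread. The class name `SpscRing` is made up, it assumes C++17 (for `std::optional`), exactly one producer thread and one consumer thread, and a power-of-two capacity.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical SPSC queue: only the producer writes tail_, only the
// consumer writes head_, so plain atomic loads/stores are enough.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer side only.
    bool try_push(const T& v) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == N) return false;  // full
        buf_[tail % N] = v;
        // Release: publishes the element write to the consumer.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Consumer side only.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // empty
        T v = buf_[head % N];
        // Release: tells the producer the slot may be reused.
        head_.store(head + 1, std::memory_order_release);
        return v;
    }

private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // next index to pop
    std::atomic<std::size_t> tail_{0};  // next index to push
};
```

With more than one producer or consumer this design is no longer correct, and that is typically where a well-placed RMW (such as a fetch_add or compare-exchange on the index) becomes necessary.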

Of course if there is any contention, cache coherence traffic is going to dominate over any atomic cost.

klabb3 · on June 15, 2023

Thanks a ton!

This roughly matches my own micro-benchmarks in Go. I see basically ~4ns for any write op (store, swap, add, etc.) and ~1ns for loads. I assume it's all seq-cst.