
> Seq/cst are relatively fast, but at around 20-30 clock cycles

Did you mean to say seq/cst store?

Also, which operation is “set the value to X unconditionally and return the previous value”? Is that possible with a store, or is it something different? (Go calls this an atomic swap.)

Either way, it sounds like the room for performance gains from finer-grained memory orderings on x86 is even narrower than I thought.



> Did you mean to say seq/cst store?

Indeed!

> set the value to X unconditionally and return me the previous value

That would be atomic::exchange, which maps to XCHG on x86. Like all atomic RMW operations on x86, it is sequentially consistent.
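In C++ terms, a minimal sketch (the names `swap_value` and `SpinLock` are hypothetical, just for illustration): `exchange()` writes the new value unconditionally and hands back the old one, and the classic test-and-set spinlock is built directly on that.

```cpp
#include <atomic>
#include <cassert>

// Store `desired` unconditionally and return whatever value was there before.
// exchange() defaults to std::memory_order_seq_cst.
int swap_value(std::atomic<int>& slot, int desired) {
    return slot.exchange(desired);
}

// Hypothetical test-and-set spinlock built on exchange (no backoff, illustration only).
class SpinLock {
public:
    void lock() {
        // Keep swapping in `true`; we own the lock once the previous value was `false`.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // busy-wait
        }
    }

    bool try_lock() {
        // A single attempt: succeeds only if the lock was free before the swap.
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        // Debug check: unlocking a lock that is not held is a bug.
        assert(locked_.load(std::memory_order_relaxed));
        // A plain release store is enough to hand the lock back.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

On x86 the acquire ordering on the lock's exchange doesn't change the instruction (XCHG is a full barrier regardless); it documents intent and matters on weaker architectures such as ARM.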

Incidentally, seq-cst stores are also typically lowered to XCHG on x86, as opposed to the more obvious MOV followed by MFENCE.
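An easy way to check this is to compile something like the following for x86-64 with optimizations (e.g. on Compiler Explorer) and read the assembly. The function names are hypothetical, and the comments describe what mainstream compilers typically emit; exact output depends on the compiler and version.

```cpp
#include <atomic>

// The codegen notes below assume a lock-free int, which holds on x86-64.
static_assert(std::atomic<int>::is_always_lock_free, "expected lock-free std::atomic<int>");

std::atomic<int> g_value{0};

// Relaxed and release stores: typically a plain MOV on x86-64,
// because ordinary x86 stores already have release semantics.
void store_relaxed(int x) { g_value.store(x, std::memory_order_relaxed); }
void store_release(int x) { g_value.store(x, std::memory_order_release); }

// Seq-cst store: typically XCHG (some compilers/versions emit MOV + MFENCE instead).
void store_seq_cst(int x) { g_value.store(x, std::memory_order_seq_cst); }

// Exchange: XCHG with a memory operand, which is implicitly LOCKed.
int exchange_seq_cst(int x) { return g_value.exchange(x); }

// Loads, even seq-cst ones: typically a plain MOV on x86-64.
int load_seq_cst() { return g_value.load(std::memory_order_seq_cst); }
```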

There is still room for optimization: if you can implement your algorithms with just loads and stores, plus as few strategically placed RMW operations as possible, it can be a win.
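A classic case where plain loads and stores are enough is a single-producer/single-consumer ring buffer. The sketch below (class name hypothetical, assuming C++17) uses only acquire loads and release stores, no RMW at all, which on x86-64 typically compile to ordinary MOVs.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical SPSC ring buffer: only atomic loads and stores, no RMW.
// Holds at most N - 1 items (one slot stays empty to tell "full" from "empty").
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N >= 2, "need at least one usable slot");

public:
    // Producer thread only.
    bool push(const T& value) {
        // Only the producer writes head_, so a relaxed load of it is fine.
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        // Acquire: the consumer must be done reading a slot before we reuse it.
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[head] = value;
        // Release: publishes buf_[head] before the consumer can see the new head.
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only.
    std::optional<T> pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        // Acquire: pairs with the producer's release store to head_.
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T value = buf_[tail];
        // Release: hands the slot back to the producer.
        tail_.store((tail + 1) % N, std::memory_order_release);
        return value;
    }

private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // next slot to write (producer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer-owned)
};
```

It is only correct with exactly one producer thread and one consumer thread. A production version would also keep `head_` and `tail_` on separate cache lines to avoid false sharing.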

Of course, if there is any contention, cache-coherence traffic is going to dominate over the cost of the atomic instructions themselves.
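To make the contention point concrete, here's a sketch contrasting one shared counter that every thread increments with per-thread counters padded onto separate cache lines. The 64-byte line size and the function names are assumptions; no timings are claimed, but with several threads the shared version is typically much slower because its cache line keeps bouncing between cores.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (typical on x86-64).
constexpr std::size_t kCacheLine = 64;

// Contended: every thread does a locked RMW on the same cache line.
long count_shared(int threads, long per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (long i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}

// Each counter gets its own cache line (needs C++17 for over-aligned allocation).
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

// Uncontended: each thread is the only writer of its own counter,
// so a plain load + store suffices instead of an RMW.
long count_sharded(int threads, long per_thread) {
    assert(threads > 0);
    std::vector<PaddedCounter> slots(static_cast<std::size_t>(threads));
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&slots, t, per_thread] {
            std::atomic<long>& mine = slots[static_cast<std::size_t>(t)].value;
            for (long i = 0; i < per_thread; ++i)
                mine.store(mine.load(std::memory_order_relaxed) + 1,
                           std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();  // join() makes each thread's writes visible here
    long total = 0;
    for (const auto& s : slots) total += s.value.load(std::memory_order_relaxed);
    return total;
}
```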


Thanks a ton!

This roughly matches my own micro-benchmarks in Go: I see about 4 ns for any write operation (store, swap, add, etc.) and about 1 ns for loads. I assume they're all seq-cst.
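A rough single-threaded C++ analogue of that kind of micro-benchmark, in case anyone wants to compare (the helper names and iteration counts are arbitrary assumptions; no results are claimed). It only exercises an uncontended, cache-hot location, and back-to-back operations in a loop can overlap in the pipeline, so treat the numbers as rough.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdio>

std::atomic<long> g_cell{0};

// Average wall-clock nanoseconds per call of `op`, over `iters` calls.
template <typename Op>
double ns_per_op(Op op, long iters) {
    assert(iters > 0);
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i) op(i);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(end - start).count() / iters;
}

// Hypothetical entry point: prints one line per operation kind.
void run_benchmarks(long iters) {
    std::printf("load (seq_cst):  %.2f ns\n",
                ns_per_op([](long) { (void)g_cell.load(); }, iters));
    std::printf("store (seq_cst): %.2f ns\n",
                ns_per_op([](long i) { g_cell.store(i); }, iters));
    std::printf("store (release): %.2f ns\n",
                ns_per_op([](long i) { g_cell.store(i, std::memory_order_release); }, iters));
    std::printf("exchange:        %.2f ns\n",
                ns_per_op([](long i) { g_cell.exchange(i); }, iters));
    std::printf("fetch_add:       %.2f ns\n",
                ns_per_op([](long) { g_cell.fetch_add(1); }, iters));
}
```

Call e.g. `run_benchmarks(10000000)` from `main()` in an optimized build.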



