The blog talks about Java volatiles (though the same applies to C++ atomics) in order to make one point explicit: volatile has no cache coherency implications on a typical MESI (or MESI-variant) machine. The fences required to maintain the language-level memory model guarantees act at a level above the L1 cache, i.e. between the core and the cache. Once the data reaches L1 (the coherence point), the fences have done their job.
[I'm ignoring remote fences which are a specialized and not yet mainstream feature]
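To make that concrete, here is a minimal Java sketch of the usual "publish through a volatile flag" pattern. The names (`VolatilePublish`, `payload`, `ready`, `runOnce`) are made up for illustration and do not come from the blog. The volatile write does not flush or invalidate any cache. It only constrains the order in which the writer's stores become visible, and the coherence protocol takes care of the rest once those stores reach L1.

```java
// Hypothetical example: message passing through a volatile flag.
class VolatilePublish {
    static int payload;             // plain field, no special semantics
    static volatile boolean ready;  // volatile flag used to publish payload

    static int runOnce() {
        payload = 0;
        ready = false;
        final int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            // Volatile read: spins until the writer's store to ready is visible.
            while (!ready) { }
            // Happens-before (volatile write -> volatile read) guarantees 42 here.
            seen[0] = payload;
        });
        reader.start();

        payload = 42;   // plain store, ordered before the volatile store below
        ready = true;   // volatile store: earlier stores may not be reordered after it

        try {
            reader.join();  // join() makes the reader's write to seen[0] visible
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 42
    }
}
```

Nothing in this code asks the hardware to do anything to the caches. Whatever fence the JIT emits for `ready = true` only has to make sure the store to `payload` leaves the core no later than the store to `ready`.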