Are cache coherence protocols only active when explicitly using certain types in your code?

When I read descriptions of cache coherence protocols, they talk about how separate CPU cores keep track of which memory addresses have been modified, for example through bus snooping. The end result, according to these descriptions, is that a CPU core 'knows' when another core has written to a memory address currently held in its own cache, and the corresponding cache line is marked with different states (e.g. dirty, modified, etc.).

This doesn't match my understanding of normal programming in most languages. In most languages I know, when you write to a variable in one thread and read it from another thread, you get none of the guarantees mentioned in descriptions of cache coherence protocols (i.e. that the different cores are aware of each other's cache modifications). That is, unless you use synchronization primitives and atomic variables, which is an extra step (not the default) that you as a programmer decide to take, and which adds extra instructions to the assembled code.

Is it therefore right to say that the cache coherence protocols in our CPUs aren't in effect for normal variable accesses, but only when using atomics and synchronization primitives such as mutexes? In other words, are these mechanisms inactive by default?

Answer

Accesses to normal variables can be optimized by the compiler so the value is kept in a (thread-private) CPU register instead of memory; that's why languages don't give cross-thread visibility guarantees for them.
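For example, a busy-wait on a plain flag can legally be compiled so the value never leaves a register, while an atomic flag cannot. A minimal C++ sketch, where the flag names and the timing are made up purely for illustration:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical flags, used only to illustrate the point.
bool plain_ready = false;                // plain variable: no cross-thread guarantee
std::atomic<bool> atomic_ready{false};   // atomic: every load really touches memory

void spin_on_plain() {
    // The compiler is allowed to load plain_ready once, keep it in a register,
    // and turn this into an infinite loop; the language gives no guarantee that
    // a store from another thread ever becomes visible here.
    while (!plain_ready) { /* may spin forever after optimization */ }
}

void spin_on_atomic() {
    // Each .load() is a real load instruction, so a store done by another core
    // becomes visible through the (always-on) cache coherence machinery.
    while (!atomic_ready.load(std::memory_order_relaxed)) { /* spin */ }
}

int main() {
    std::thread waiter(spin_on_atomic);
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    atomic_ready.store(true, std::memory_order_relaxed);  // waiter reliably exits
    waiter.join();
    // spin_on_plain() is deliberately not called: with optimization it may never return.
}
```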

C++ std::atomic with memory_order_relaxed uses ordinary asm load and store instructions for .load and .store, so there are no special instructions unless you use RMW operations like .fetch_add, or, on many ISAs, stronger orderings like acquire/release. On x86, only .store(seq_cst) needs special instructions.
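As a rough illustration, here is a small C++ file you could feed to a compiler and inspect the output; the instruction sequences mentioned in the comments are what typical x86-64 codegen produces, not a guarantee of the standard:

```cpp
#include <atomic>

std::atomic<int> x{0};

int load_relaxed() {
    // Typically compiles to a plain load (a single mov on x86-64).
    return x.load(std::memory_order_relaxed);
}

void store_relaxed(int v) {
    // Typically a plain store as well: no barrier, no lock prefix.
    x.store(v, std::memory_order_relaxed);
}

void store_seq_cst(int v) {
    // The one case on x86 that needs something extra: compilers usually
    // emit xchg (or mov + mfence) to get sequential consistency.
    x.store(v, std::memory_order_seq_cst);
}

int fetch_add_relaxed() {
    // RMW operations need special instructions even when relaxed
    // (e.g. lock xadd on x86).
    return x.fetch_add(1, std::memory_order_relaxed);
}

int main() {}  // compile-only sketch
```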

All cache lines are coherent all the time; that's why acquire/release synchronization on a single atomic variable or mutex can provide visibility for a whole array of plain objects. A core doesn't have to flush its whole private caches; it only has to order its own accesses to its already-coherent L1d cache.
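Here is a minimal sketch of that pattern: one release store publishes a whole plain array, and the matching acquire load on the other side makes all of it visible. The array size and names are arbitrary:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int data[1024];                      // ordinary, non-atomic objects
std::atomic<bool> ready{false};      // the one atomic used for synchronization

void producer() {
    for (int i = 0; i < 1024; ++i)
        data[i] = i;                 // plain stores into coherent cache lines
    ready.store(true, std::memory_order_release);  // publish: orders the stores above
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    // The acquire load synchronizes with the release store, so every plain
    // element written before it is guaranteed visible here. No cache flush is
    // involved; the hardware keeps the lines coherent, and the acquire/release
    // pair only constrains the ordering of each core's own accesses.
    for (int i = 0; i < 1024; ++i)
        assert(data[i] == i);
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```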

(Read more: https://preshing.com/20120710/memory-barriers-are-like-source-control-operations/)
