Computer Architecture
The Anatomy of Modern Processors


Cache Coherence

When multiple processors with on-chip caches share a common bus and a common memory, the caches must be kept in a coherent state. Consider what can happen without any coherence hardware:
  • PEA reads location x. Copy of x transferred to PEA's cache.
  • PEB also reads location x. Copy of x transferred to PEB's cache too.
  • PEA adds 1 to x. x is in PEA's cache, so there's a cache hit.
  • If PEB reads x again (perhaps after synchronising with PEA), it will also see a cache hit. However, it will read a stale value of x, because PEA's update exists only in PEA's cache.
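
The sequence above can be reproduced with a small software model. This is only a sketch under stated assumptions: the `PE` class, the dictionary-based cache and memory, and the initial value 10 are all hypothetical, and the caches are assumed to be write-back with no coherence hardware at all.

```python
# Hypothetical model: each PE has a private cache and there is no snooping.
class PE:
    def __init__(self, name, memory):
        self.name = name
        self.memory = memory   # shared main memory (address -> value)
        self.cache = {}        # private cache (address -> value)

    def read(self, addr):
        if addr not in self.cache:            # miss: fetch from main memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]               # hit: served from the local copy

    def write(self, addr, value):
        self.read(addr)                       # make sure the line is cached
        self.cache[addr] = value              # write-back: memory not updated yet


memory = {"x": 10}                            # assumed initial value
pea, peb = PE("PEA", memory), PE("PEB", memory)

pea.read("x")                                 # copy of x in PEA's cache
peb.read("x")                                 # copy of x in PEB's cache too
pea.write("x", pea.read("x") + 1)             # PEA adds 1 to x (cache hit)
stale = peb.read("x")                         # cache hit in PEB, stale value

print(pea.read("x"), stale, memory["x"])      # PEA sees 11, PEB and memory still hold 10
```

PEB's second read never reaches the bus, so nothing tells it that its copy is out of date.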

Cache coherence hardware

This problem is avoided by adding snooping hardware to the system interface. This hardware monitors the bus for transactions which affect locations cached in this processor.

The cache also needs to generate invalidate transactions when it writes to shared locations.
  • When PEA updates x, the cache generates an invalidate transaction.
  • When PEB's snooping hardware sees the invalidate x transaction, it finds a copy of x in its cache and marks it invalid.
  • Now a read of x by PEB will cause a cache miss and initiate a data bus transaction to read x from main memory.
Note:
  1. The invalidate transaction is an address-only transaction: it simply communicates the address of the invalidated cache line to all the other processors. Sending no data saves data-bus bandwidth, which is worthwhile because there may be no other cache holding x by this time: even if other PEs had read x and held it in their caches, they may since have replaced that line.
  2. We are assuming that the cache is operating in write-back mode. So the updated value of x is held in PEA's cache until that cache line is needed for something else, triggering a write-back of x to main memory.
  • When PEA's snooping hardware sees the memory read for x, it detects the modified copy in its own cache, and emits a retry response, causing PEB to suspend the read transaction.
  • PEA now writes (flushes) the modified cache line to main memory.
  • PEB continues its suspended transaction and reads the correct value from main memory.
Note:
  1. Some systems will permit cache-to-cache transfers (with or without a simultaneous write to main memory). Although this would seem to be an obvious improvement, allowing PEB to continue sooner, the saving turns out to be minimal, and so it is not universally implemented.
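
The invalidate, retry and flush steps described above can be sketched as a small simulation. Everything named here (`Bus`, `SnoopingPE`, the log strings, the initial value 10) is a hypothetical illustration rather than a description of any real bus interface. As a simplification, the model broadcasts an invalidate on the first write to any clean line, whereas real hardware only needs to do so for shared lines, and the retry is modelled simply as a log entry followed by the read from memory.

```python
class Bus:
    """Hypothetical shared bus: broadcasts transactions to all other PEs."""
    def __init__(self, memory):
        self.memory = memory
        self.pes = []
        self.log = []

    def invalidate(self, source, addr):
        # Address-only transaction: no data is transferred.
        self.log.append(f"{source.name}: invalidate {addr}")
        for pe in self.pes:
            if pe is not source:
                pe.snoop_invalidate(addr)

    def read(self, source, addr):
        self.log.append(f"{source.name}: read {addr}")
        for pe in self.pes:
            if pe is not source and pe.snoop_read(addr):
                # The holder of a modified copy flushed it; the reader retries.
                self.log.append(f"{source.name}: retry read {addr}")
        return self.memory[addr]


class SnoopingPE:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.cache = {}        # address -> value
        self.dirty = set()     # addresses modified but not yet written back
        bus.pes.append(self)

    def read(self, addr):
        if addr not in self.cache:                 # miss: go to the bus
            self.cache[addr] = self.bus.read(self, addr)
        return self.cache[addr]

    def write(self, addr, value):
        self.read(addr)                            # make sure the line is cached
        if addr not in self.dirty:                 # first write to a clean line
            self.bus.invalidate(self, addr)
        self.cache[addr] = value                   # write-back: held in cache
        self.dirty.add(addr)

    def snoop_invalidate(self, addr):
        self.cache.pop(addr, None)                 # mark our copy invalid
        self.dirty.discard(addr)

    def snoop_read(self, addr):
        if addr in self.dirty:                     # we hold the modified copy
            self.bus.memory[addr] = self.cache[addr]
            self.bus.log.append(f"{self.name}: flush {addr}")
            self.dirty.discard(addr)
            return True                            # signals a retry response
        return False


memory = {"x": 10}                                 # assumed initial value
bus = Bus(memory)
pea, peb = SnoopingPE("PEA", bus), SnoopingPE("PEB", bus)

pea.read("x")
peb.read("x")
pea.write("x", pea.read("x") + 1)                  # invalidates PEB's copy
value = peb.read("x")                              # miss, PEA flushes, PEB retries
print(value)
print("\n".join(bus.log))
```

With snooping in place, PEB's second read misses, PEA's modified line is flushed to memory, and PEB then reads the updated value.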

MESI protocols

Processors providing cache coherence commonly implement a MESI protocol, in which the letters of the acronym represent the four states that a cache line may be in:
Invalid
This cache line is not valid.
Exclusive
This cache has the only copy of the data. The memory is valid.
Shared
More than one cache is holding a copy of this line. The memory copy is valid.
Modified
The line has been modified. The memory copy is invalid.

Retry

Processors trying to access a location which is modified in another cache are forced to retry the bus transaction after the cache holding the modified copy has written it back to memory.

MESI Protocol

Here is the state diagram of the MESI protocol as implemented on the PowerPC chips:

RH = Read Hit
RMS = Read Miss, Shared
RME = Read Miss, Exclusive
WH = Write Hit
WM = Write Miss
SHR = Snoop Hit, Read Operation
SHW = Snoop Hit, Write Operation
Note the number of coherence-generated bus transactions. Blue transitions are program-generated and unavoidable. Transitions in magenta are initiated by the snooping hardware on this processor in response to transactions issued by other processors. Operations in circles are the bus transactions associated with transitions: those on blue transitions are generated by the program on this processor, while those on magenta transitions are generated in response to hits on the local cache detected by the snooping hardware. The invalidate transaction (shared -> modified on a write hit) and the pushouts on the magenta arcs are additional bus transactions required to maintain cache coherence.
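
The transitions can also be written out as a lookup table keyed by the event abbreviations listed above. This is a sketch of the usual textbook MESI transitions, not a transcription of the PowerPC diagram; the `State` enum, the `TRANSITIONS` table, the `run` helper, and the bus transaction names other than "invalidate" and "push out" are assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


M, E, S, I = State.MODIFIED, State.EXCLUSIVE, State.SHARED, State.INVALID

# (current state, event) -> (next state, bus transaction or None)
# Hits on an invalid line cannot occur, so (I, "RH") and (I, "WH") are absent.
TRANSITIONS = {
    (I, "RMS"): (S, "read"),
    (I, "RME"): (E, "read"),
    (I, "WM"):  (M, "read with intent to modify"),   # assumed name
    (S, "RH"):  (S, None),
    (S, "WH"):  (M, "invalidate"),    # coherence-generated transaction
    (S, "SHR"): (S, None),
    (S, "SHW"): (I, None),
    (E, "RH"):  (E, None),
    (E, "WH"):  (M, None),            # only copy: no bus traffic needed
    (E, "SHR"): (S, None),
    (E, "SHW"): (I, None),
    (M, "RH"):  (M, None),
    (M, "WH"):  (M, None),
    (M, "SHR"): (S, "push out"),      # write modified line back to memory
    (M, "SHW"): (I, "push out"),
}


def run(events, state=I):
    """Apply a sequence of events to one cache line; return the final
    state and the bus transactions generated along the way."""
    transactions = []
    for event in events:
        state, bus_op = TRANSITIONS[(state, event)]
        if bus_op is not None:
            transactions.append(bus_op)
    return state, transactions


final_state, bus_ops = run(["RMS", "WH", "SHR"])
print(final_state.name, bus_ops)
```

In this trace a line is read while another cache holds it, written locally, and then read by another processor: the only unavoidable transaction is the initial read, while the invalidate and the push out exist purely to maintain coherence.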

© John Morris, 1998