The Anatomy of Modern Processors
When multiple processors with on-chip caches are placed on
a common bus sharing a common memory,
it is necessary to ensure that the caches are kept in
a coherent state.
Consider the following sequence of events:
- PEA reads location x.
Copy of x transferred to PEA's cache.
- PEB also reads location x.
Copy of x transferred to PEB's cache too.
- PEA adds 1 to x.
x is in PEA's cache,
so there's a cache hit.
- If PEB reads x again
(perhaps after synchronising with PEA),
it will also see a cache hit.
However it will read a stale
value of x.
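The sequence above can be reproduced with a small simulation. This is a minimal sketch, assuming a toy model in which each PE has a private cache with no coherence hardware at all; the Cache class, the memory dictionary and the initial value 10 are illustrative and not taken from any real system.

```python
# Toy model: two private caches in front of one shared memory, with NO
# coherence hardware. All names and values here are illustrative.

memory = {"x": 10}          # shared main memory


class Cache:
    """A private cache that never hears about other caches' writes."""

    def __init__(self, name):
        self.name = name
        self.lines = {}      # address -> cached value

    def read(self, addr):
        if addr not in self.lines:          # miss: fetch from main memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]             # hit: served locally

    def write(self, addr, value):
        self.read(addr)                     # make sure the line is present
        self.lines[addr] = value            # update the local copy only


pea, peb = Cache("PEA"), Cache("PEB")

pea.read("x")                        # PEA caches x
peb.read("x")                        # PEB caches x too
pea.write("x", pea.read("x") + 1)    # PEA adds 1: a hit, only PEA's copy changes
stale = peb.read("x")                # PEB hits on its own, unchanged copy

print("PEA sees", pea.read("x"), "- PEB sees", stale)
```

PEB's second read is a hit, so it never goes to the bus and has no way to learn that PEA's copy has changed.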
Cache coherence hardware
This problem is avoided by adding
snooping hardware to each processor.
This hardware monitors the bus for transactions which
affect locations cached in this processor.
The cache also needs to generate invalidate transactions
when it writes to shared locations.
- When PEA updates x,
the cache generates an invalidate x transaction.
- When PEB's snooping hardware
sees the invalidate x transaction, it finds a
copy of x in its cache and marks it invalid.
- Now a read x by PEB will cause a
cache miss and initiate a databus transaction to
read x from main memory.
- The invalidate transaction is an address-only
transaction: it simply communicates the address of
a cache line which has been invalidated to all the other
caches.
This is to save data-bus bandwidth:
there's a possibility that no other cache holds x now.
Even though it had been read by other PEs and held in
their caches, they may since have replaced it.
- We are assuming that the cache is operating in
write-back mode.
So the updated value of x is held in PEA's
cache until that cache line is needed for something
else, triggering a write-back of x to main memory.
- When PEA's snooping hardware sees the memory read for x,
it detects the modified copy in its own cache,
and emits a retry response, causing PEB to
suspend the read transaction.
- PEA now writes (flushes) the modified cache line
to main memory.
- PEB continues its suspended transaction and
reads the correct value from main memory.
- Some systems will permit cache-to-cache transfers
(with or without simultaneous write to main memory).
Although this would seem to be an obvious improvement,
allowing PEB to continue faster,
the saving turns out to be minimal and so is not
always implemented.
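The invalidate, retry and flush steps listed above can be sketched as a toy snooping bus. This is only an illustration under stated assumptions: the Bus and Cache classes are hypothetical, a cache here broadcasts an invalidate on every write to a clean line (it keeps no record of whether the line is actually shared), and the retry is modelled as a flush followed by the re-issued read within a single call.

```python
# Toy snooping bus. Class and method names are illustrative only.

class Bus:
    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.log = []                      # bus transactions, in order

    def invalidate(self, source, addr):
        """Address-only transaction: no data is carried."""
        self.log.append(f"{source.name}: invalidate {addr}")
        for cache in self.caches:
            if cache is not source:
                cache.snoop_invalidate(addr)

    def read(self, source, addr):
        """Memory read; a snooper holding a modified copy forces a retry."""
        self.log.append(f"{source.name}: read {addr}")
        for cache in self.caches:
            if cache is not source and cache.snoop_read(addr):
                self.log.append(f"{cache.name}: retry, flush {addr}")
                self.log.append(f"{source.name}: read {addr} (retried)")
        return self.memory[addr]


class Cache:
    def __init__(self, name, bus):
        self.name, self.bus = name, bus
        self.lines = {}                    # addr -> [value, dirty flag]
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:         # miss: go to the bus
            self.lines[addr] = [self.bus.read(self, addr), False]
        return self.lines[addr][0]

    def write(self, addr, value):
        self.read(addr)
        if not self.lines[addr][1]:        # first write to a clean line
            self.bus.invalidate(self, addr)
        self.lines[addr] = [value, True]   # write-back: memory not updated

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)         # drop our copy, if we hold one

    def snoop_read(self, addr):
        """Return True if we held a modified copy and had to flush it."""
        line = self.lines.get(addr)
        if line and line[1]:
            self.bus.memory[addr] = line[0]   # flush to main memory
            line[1] = False
            return True
        return False


memory = {"x": 10}
bus = Bus(memory)
pea, peb = Cache("PEA", bus), Cache("PEB", bus)

pea.read("x")
peb.read("x")
pea.write("x", pea.read("x") + 1)    # invalidates PEB's copy
value = peb.read("x")                # miss; PEA flushes, then PEB reads memory

print("PEB reads", value)
for entry in bus.log:
    print(entry)
```

With snooping in place, PEB's second read misses, PEA flushes its modified line, and PEB reads the updated value from main memory.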
Processors providing cache coherence commonly implement a
MESI protocol - where the letters of the acronym represent
the four states that a cache line may be in:
- Invalid:
This cache line is not valid.
- Exclusive:
This cache has the only copy of the data. The memory
copy is valid.
- Shared:
More than one cache is holding a copy of this line.
The memory copy is valid.
- Modified:
The line has been modified. The memory copy is invalid.
Processors trying to access a location which is modified in
another cache are forced to retry the bus transaction
after the cache holding the modified copy has written it
back to memory.
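The four states and the retry rule can be written down compactly. This is a minimal sketch; the names Mesi, MEMORY_VALID, must_retry and write_needs_invalidate are hypothetical helpers chosen for illustration, not part of any processor's interface.

```python
from enum import Enum


class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


# Is the main-memory copy up to date while a line is in this state?
# INVALID is omitted: this cache holds no copy, so it cannot tell.
MEMORY_VALID = {
    Mesi.MODIFIED: False,
    Mesi.EXCLUSIVE: True,
    Mesi.SHARED: True,
}


def must_retry(state_in_other_cache):
    """A bus access must be retried if another cache holds the line modified."""
    return state_in_other_cache is Mesi.MODIFIED


def write_needs_invalidate(state):
    """A write hit needs an invalidate broadcast only if the line is shared."""
    return state is Mesi.SHARED


print(must_retry(Mesi.MODIFIED), must_retry(Mesi.SHARED))
```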
Here is the state diagram of the MESI protocol as implemented
on the PowerPC chips:
[Figure: MESI state diagram]
The transitions are labelled as follows:
RH = Read Hit
RMS = Read Miss, Shared
RME = Read Miss, Exclusive
WH = Write Hit
WM = Write Miss
SHR = Snoop Hit, Read Operation
SHW = Snoop Hit, Write Operation
Note the number of coherence-generated bus transactions;
the remaining bus transactions are program-generated and unavoidable.
Transitions in magenta are initiated by
the snooping hardware on this processor in response to
transactions issued by other processors.
Operations in circles are bus transactions associated with
the transitions:
those on blue transitions are generated by the program on
this processor, and those
on magenta transitions are generated in response to
hits on the local cache detected by the snooping hardware.
The invalidate transaction (shared -> modified on a write hit)
and the pushouts on the magenta arcs
are additional bus transactions required to maintain cache coherence.
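The transitions can also be tabulated and stepped through in code. The table below is a sketch of the textbook MESI protocol using the event labels from the key above; it is an assumption that it matches the PowerPC diagram arc for arc, and the transaction names ("read", "read-with-intent-to-modify", "invalidate", "push") are labels chosen here for illustration.

```python
# (state, event) -> (next state, bus transaction or None)
# States: M, E, S, I. Events use the labels from the key:
# RH, RMS, RME, WH, WM are caused by the local program;
# SHR, SHW are caused by snooping other processors' transactions.
TRANSITIONS = {
    ("I", "RMS"): ("S", "read"),
    ("I", "RME"): ("E", "read"),
    ("I", "WM"):  ("M", "read-with-intent-to-modify"),
    ("S", "RH"):  ("S", None),
    ("S", "WH"):  ("M", "invalidate"),   # coherence overhead
    ("S", "SHR"): ("S", None),
    ("S", "SHW"): ("I", None),
    ("E", "RH"):  ("E", None),
    ("E", "WH"):  ("M", None),           # sole copy: no bus traffic needed
    ("E", "SHR"): ("S", None),
    ("E", "SHW"): ("I", None),
    ("M", "RH"):  ("M", None),
    ("M", "WH"):  ("M", None),
    ("M", "SHR"): ("S", "push"),         # coherence overhead
    ("M", "SHW"): ("I", "push"),         # coherence overhead
}


def run(state, events):
    """Apply events in order; return the final state and bus transactions."""
    bus = []
    for event in events:
        state, transaction = TRANSITIONS[(state, event)]
        if transaction:
            bus.append(transaction)
    return state, bus


def coherence_overhead():
    """Transitions whose bus transaction exists only to keep caches coherent."""
    return sorted(
        key for key, (_, t) in TRANSITIONS.items() if t in ("invalidate", "push")
    )


final_state, traffic = run("I", ["RMS", "RH", "WH", "SHR"])
print(final_state, traffic)
print(coherence_overhead())
```

In this table only three transitions carry coherence-only bus transactions: the invalidate on a write hit to a shared line and the two pushouts from the modified state.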
© John Morris, 1998