Computer Architecture
The Anatomy of Modern Processors


Performance

Let's assess the performance of our simple processor. Assume that the whole system is driven by a clock at f MHz. This means that each clock cycle takes

t = 1/f microseconds

Thus a processor with a clock running at 100MHz operates with 10ns clock cycles. Generally, a processor will execute one step every cycle, so for a memory load instruction our simple processor needs:

Step   Operation                                Time (cycles)   Notes
1      PC to bus                                1
2      Memory response                          tac
3      Decode and register access               1
4      ALU operation and latch result to MAR    1
5      Memory response                          tac
6      Increment PC                             -               Overlap with step 3
Total                                           3 + 2*tac

If the memory response time is, say, 100ns, then our simple processor needs 3*10 + 2*100 = 230ns to execute a load instruction. For the add instruction, we make a similar table:

Step   Operation                                                 Time (cycles)   Notes
1      PC to bus                                                 1
2      Memory response                                           tac
3      Decode and register access                                1
4      ALU operation and latch result to destination register    1
5      Increment PC                                              -               Overlap with step 3
Total                                                            3 + tac

So an add instruction requires 3*10 + 100 = 130ns to execute. A store operation will also need more than 200ns, so instructions will require, on average, about 150ns.
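
The totals from the two tables can be checked with a few lines of Python. This is only a minimal sketch, assuming the 100MHz clock and 100ns memory response time used in the text; the function and variable names are illustrative and do not come from the original notes.

```python
# Illustrative sketch: all names here are assumptions made for this example.

def cycle_time_ns(f_mhz):
    """Clock period in ns for a clock of f_mhz MHz (t = 1/f microseconds)."""
    return 1000.0 / f_mhz

def instruction_time_ns(cpu_cycles, memory_accesses, f_mhz, t_ac_ns):
    """Single-cycle steps at the clock period plus memory responses at t_ac."""
    return cpu_cycles * cycle_time_ns(f_mhz) + memory_accesses * t_ac_ns

F_MHZ = 100    # clock frequency used in the text
T_AC_NS = 100  # memory response time used in the text

# Load: 3 single-cycle steps plus 2 memory responses (instruction fetch, data read).
load_ns = instruction_time_ns(3, 2, F_MHZ, T_AC_NS)
# Add: 3 single-cycle steps plus 1 memory response (instruction fetch only).
add_ns = instruction_time_ns(3, 1, F_MHZ, T_AC_NS)

print(f"load: {load_ns:.0f} ns, add: {add_ns:.0f} ns")
```

Changing T_AC_NS shows how strongly both totals depend on the memory response time rather than on the clock.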

Performance Measures

One commonly used performance measure is MIPS or millions of instructions per second. Our simple processor will achieve:
1/(150 x 10^-9) = ~6.7 x 10^6 instructions per second
= ~6.7 MIPS

As you will know from reading the popular literature, 100MHz is a very common figure for processors in 1998 (leading edge commercial processors have clocks which are more than 5 times faster!) and a MIPS rating of 6.7 is very ordinary. In fact, to be competitive, a 100MHz processor should be achieving of the order of 100MIPS - or one instruction for each machine cycle. One of the main aims of this course is to examine how this is achieved.
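
The MIPS figures discussed above can be reproduced with a short calculation. The following is a rough sketch rather than a benchmark; the function name mips and the variable names are assumptions introduced for this example, while the 150ns and 10ns figures come from the text.

```python
# Illustrative sketch: names are assumptions, the timings come from the text.

def mips(avg_instruction_time_ns):
    """Millions of instructions per second for a given average instruction time.

    There are 1e9 ns in a second, so instructions/second = 1e9 / time_ns,
    and dividing by 1e6 to get MIPS leaves 1000 / time_ns.
    """
    return 1000.0 / avg_instruction_time_ns

simple = mips(150)  # average instruction time estimated for the simple processor
target = mips(10)   # one instruction per 10ns cycle at 100MHz

print(f"simple processor: {simple:.2f} MIPS")
print(f"one instruction per cycle: {target:.0f} MIPS")
print(f"shortfall: about {target / simple:.0f}x")
```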

Bottlenecks

From the simplistic analysis presented above, it will be obvious that access to main memory is a major limiting factor in the performance of a processor. Management of the memory hierarchy to achieve maximum performance is one of the major challenges for a computer architect. Unfortunately, the hardware maxim

smaller is faster

conflicts with programmers' and users' desires for more and more capabilities and more elaborate user interfaces in their programs - resulting in programs that require megabytes of main memory to run! This has led the memory manufacturers to concentrate on density (increasing the number of bits stored in a single package) rather than speed. They have been remarkably successful in this: the growth in capacity of the standard DRAM chips which form the bulk of any computer's semiconductor memory has matched the increase in speed of processors. However, the increase in DRAM access speeds has been much more modest - even if we consider recent developments in synchronous RAM and FRAM. Another reason for the manufacturers' concentration on density is that a small improvement in DRAM access time has a negligible effect on the effective access time, which needs to include overheads for bus protocols. (The 100ns figure used above assumes 60ns of DRAM access time and a - very optimistic - allowance of 40ns for bus overhead.)
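
To see why the bus overhead dilutes any gain in raw DRAM speed, consider the sketch below. It assumes the simple additive model implied by the text (effective access time = DRAM access time + bus overhead) with the 60ns and 40ns figures given above. The 50ns DRAM part is a purely hypothetical value chosen for illustration, and the names are not from the original notes.

```python
# Illustrative sketch: additive model as implied by the text; the 50ns DRAM
# figure is hypothetical and chosen only to show the effect.

def effective_access_ns(dram_ns, bus_overhead_ns):
    """Effective memory access time seen by the processor."""
    return dram_ns + bus_overhead_ns

BUS_OVERHEAD_NS = 40  # bus protocol allowance from the text

baseline = effective_access_ns(60, BUS_OVERHEAD_NS)  # figures from the text
faster = effective_access_ns(50, BUS_OVERHEAD_NS)    # hypothetical faster DRAM

dram_gain = (60 - 50) / 60
effective_gain = (baseline - faster) / baseline

print(f"effective access: {baseline} ns -> {faster} ns")
print(f"DRAM improvement: {dram_gain:.1%}, effective improvement: {effective_gain:.1%}")
```

Under these assumptions the fixed bus overhead absorbs part of the improvement, so the gain seen by the processor is smaller than the gain in the DRAM itself.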

Cache memories are the most significant device used to reduce memory overheads and they will be examined in some detail later. However, a host of other techniques, such as pipelining, pre-fetching and branch prediction, are also used to alleviate the impact of memory fetch times on performance.

© John Morris, 1998