Computer Science
Extended Reading List
This is an extended list of readings for CS703. Papers on this list are closely related to topics discussed in class, and are the source for many of the lectures. Most are not required reading, unless you missed the lecture. See the required reading list for the papers whose material will be covered directly on the test and the final exam. Additional papers will be added periodically.
Warning! Some files are large!
Hennessy & Patterson, Sections 1.1-1.3 (11.2MB) and Section 1.4 (8.8MB).
Hill, Jouppi & Sohi, Readings in Computer Architecture, Chapter 2: Methods.
G.M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, AFIPS, pp. 483-485, Apr. 1967.
J.S. Emer & D.W. Clark, A characterization of processor performance in the VAX-11/780, ISCA-11, pp. 301-310, June 1984.
B.J. Smith, Architecture and applications of the HEP multiprocessor computer system, Proc. International Society for Optical Engineering, pp. 241-248, 1982.
Hill, Jouppi & Sohi, Readings in Computer Architecture, Chapter 6: Memory Systems.
M.V. Wilkes, Slave memories and dynamic storage allocation, IEEE Trans. on Electronic Computers, 14(2), pp. 270-271, 1965.
J.S. Liptay, Structural aspects of the System/360 Model 85, part II: the cache, IBM Systems Journal, 7(1), pp. 15-21, 1968.
N.P. Jouppi, Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, ISCA-17, pp. 364-373, May 1990.
D. Kroft, Lockup-free instruction fetch/prefetch cache organization, ISCA-8, pp. 81-87, May 1981.
Hill, Jouppi & Sohi, Readings in Computer Architecture, Chapter 9: Multiprocessors.
W.A. Wulf & S.P. Harbison, Reflections in a pool of processors/an experience report on C.mmp/Hydra, AFIPS (Proceedings of the National Computer Conference), pp. 939-951, June 1978.
J.R. Goodman, Using cache memory to reduce processor-memory traffic, ISCA-10, pp. 124-131, June 1983.
P. Sweazey and A.J. Smith, A class of compatible cache consistency protocols and their support by the IEEE Futurebus, Proc. Thirteenth International Symposium on Computer Architecture (ISCA-13), Tokyo, Japan, pp. 414-423, June 1986.
L.M. Censier & P. Feautrier, A new solution to coherence problems in multicache systems, IEEE Transactions on Computers 27(12), pp. 1112-1118, Dec. 1978.
D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, & M.S. Lam, The Stanford Dash multiprocessor, IEEE Computer, 25(3), pp. 63-79, 1992.
L. Lamport, How to make a multiprocessor computer that correctly executes multiprocess programs, IEEE Transactions on Computers 28(9), pp. 690-691, 1979.
Mark Hill, Processors should support simple memory-consistency models, IEEE Computer, 31(8), pp. 28-34, August 1998.
M. Herlihy and J.E.B. Moss, Transactional Memory: Architectural Support for Lock-Free Data Structures, Proc. International Symposium on Computer Architecture (ISCA-93), ACM Press, 1993, pp. 289-300.
J.M. Stone et al., Multiple Reservations and the Oklahoma Update, IEEE Parallel & Distributed Technology 1(6), Nov. 1993, pp. 58-71.
R. Rajwar & J.R. Goodman, Transactional execution: toward reliable, high-performance multithreading, IEEE Micro, 23(6), pp. 117-125, November/December 2003.
R. Rajwar & J.R. Goodman, Speculative Lock Elision: enabling highly concurrent multithreaded execution, 34th Annual International Symposium on Microarchitecture (MICRO-34), December 2001, pp. 294-305.
R. Rajwar & J.R. Goodman, Transactional lock-free execution of lock-based programs, Proceedings of the Tenth Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS-10), pp. 5-17, 2002.
Ravi Rajwar, Speculation-based techniques for lock-free execution of lock-based programs, PhD Dissertation, University of Wisconsin-Madison, 2002.
K.E. Moore, J. Bobba, M.J. Moravan, M.D. Hill, & D.A. Wood, LogTM: Log-based Transactional Memory, International Symposium on High Performance Computer Architecture (HPCA), February 2006.
M.J. Flynn, Very high-speed computing systems, IEEE Proceedings, 54(12), pp. 1901-1909, 1966.
R.M. Russell, The CRAY-1 computer system, CACM, 21(1), pp. 63-72, 1978.
C.L. Seitz, The Cosmic Cube, Communications of the ACM 28(1), pp. 22-33, Jan. 1985.
W.A. Wulf, Compilers and computer architecture, IEEE Computer, 14(8), pp. 41-47, 1981.
R.P. Colwell, C.Y. Hitchcock III, E.D. Jensen, H.M. Brinkley Sprunt, & C.P. Kollar, Instruction sets and beyond: computers, complexity, and controversy (4.4MB), IEEE Computer, 18(9), pp. 8-19, 1985.
S.A. Mahlke, R.E. Hank, J.E. McCormick, D.I. August, & W.W. Hwu, A comparison of full and partial predicated execution support for ILP processors, ISCA-22, pp. 138-150, June 1995.
D.W. Anderson, F.J. Sparacio & R.M. Tomasulo, The IBM System/360 Model 91: machine philosophy and instruction-handling, IBM Journal of Research and Development, pp. 8-24, Jan. 1967.
J.E. Smith & A.R. Pleszkun, Implementing precise interrupts in pipelined processors, ISCA-12, pp. 36-44, June 1985.
J.E. Smith, A study of branch prediction strategies, ISCA-8, pp. 135-148, May 1981.
G.F. Grohoski, Machine organization of the IBM RISC System/6000 processor, IBM Journal of Research and Development 34, pp. 37-58, Jan. 1990.
G.S. Sohi & S. Vajapeyam, Instruction issue logic for high-performance interruptable pipelined processors, ISCA-14, pp. 27-31, June 1987.
B.R. Rau & J.A. Fisher, Instruction-level parallel processing: history, overview, and perspective, appeared in Journal of Supercomputing, 7, 1993. Hewlett-Packard Tech Report HPL-92-132 is almost identical (neither references the other) except for a longer bibliography (225 entries!).
D.M. Tullsen, S.J. Eggers, J.S. Emer, H.M. Levy, J.L. Lo, & R.L. Stamm, Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, ISCA-23, pp. 191-202, May 1996.
T. Kilburn, D.B.G. Edwards, M.J. Lanigan, and F.H. Sumner, One-level storage system, IRE Transactions on Electronic Computers, 11(2), pp. 223-235, 1962. A reprint of this paper was included in the book, Computer Structures: Principles and Examples by D.P. Siewiorek, C.G. Bell & A. Newell, published by McGraw-Hill in 1982. The book is now out of print, but available in html form online on Gordon Bell's website.
K. Li & P. Hudak, Memory coherence in shared virtual memory systems, ACM Trans. on Computer Systems, 7(4), pp. 321-359, 1989.
M. Smotherman, Understanding EPIC architectures and implementations, 40th Annual ACM Southeast Conference, Raleigh, April 2002, pp. 71-78.
D.E. Culler & J.P. Singh, Parallel Computer Architecture, Chapter 5, pp. 269-367 (pdf: 11MB), Morgan Kaufmann, 1999.
Subsections of Chapter 5 are available:
Section 5, 5.1 (1.8MB): Introduction to Shared Memory Multiprocessing & Cache Coherence
Section 5.2 (1.0MB): Memory Consistency
Section 5.3 (1.8MB): Design Space for Snooping Protocols
Section 5.4 (3.2MB): Protocol Design Trade-offs
Section 5.5 (2.9MB): Synchronization
Section 5.6, 5.7 (1.4MB): Implications for Software; Concluding Remarks
T. Y. Feng, “A survey of interconnection networks,” IEEE Computer, vol. 14(12), pp. 12-27, Dec. 1981.
Saad, Y. and Schultz, M. H., “Topological Properties of Hypercubes,” IEEE Transactions on Computers, Volume 37(1988), 867-872.
Jerry Banks, “Introduction to simulation”, in Proceedings of the 2000 Winter Simulation Conference, pages 9-16, 2000.
Arne Thesen & Laurel Travis, “Introduction to simulation”, in Proceedings of the 1990 Winter Simulation Conference, pages 14-21, 1990.
Richard Fujimoto, “Parallel and Distributed Simulation Systems”, in Proceedings of the 2001 Winter Simulation Conference, pages 147-157, 2001.
James E. Smith, "Characterizing computer performance with a single number," Communications of the ACM, Vol. 31, #10 (October 1988), pp 1202-1206.
A. LaMarca and R.E. Ladner, "The Influence of Caches on the Performance of Sorting," Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, January, 1997, pp. 370-379.
R. M. Tomasulo, "An efficient algorithm for exploiting multiple arithmetic units," IBM Journal of Research and Development, Vol. 11, January 1967, pp. 25-33.
W.-H. Wang, J.-L Baer, & H.M. Levy, "Organization and performance of a two-level virtual-real cache hierarchy," ISCA-16, pp. 140-148, June 1989.
S.V. Adve & K. Gharachorloo, Shared Memory Consistency Models: A Tutorial, IEEE Computer, 29(12), pp. 66-76, Dec 1996.
M. Hill & A.J. Smith, Evaluating Associativity in CPU Caches, IEEE Transactions on Computers, 38(12), pp. 1612-1630, Dec. 1989.
A. Gottlieb, R. Grishman, C.P. Kruskal, K.P. McAuliffe, L. Rudolph & M. Snir, The NYU Ultracomputer--designing an MIMD shared memory parallel computer, IEEE Transactions on Computers 32(2), pp. 175-189, Feb. 1983.