Preface

In The Unknowable I use LISP to compare my work on incompleteness with that of Gödel and Turing, and in The Limits of Mathematics I use LISP to discuss my work on incompleteness in more detail. In this book we'll use LISP to explore my theory of randomness, called algorithmic information theory (AIT).

And when I say ``explore'' I mean it! This book is full of exercises for the reader, ranging from the mathematical equivalent of trivial ``finger warm-ups'' for pianists, to substantial programming projects, to questions I can formulate precisely but don't know how to answer, to questions that I don't even know how to formulate precisely!

I really want you to follow my example and hike off into the wilderness and explore AIT on your own! You can stay on the trails that I've blazed and explore the well-known part of AIT, or you can go off on your own and become a fellow researcher, a colleague of mine! One way or another, the goal of this book is to make you into a participant, not a passive observer of AIT.

In other words, it's too easy to just listen to a recording of AIT, that's not the way to learn music. I'd like you to learn to play it on an instrument yourself, or, better still, to even become a composer!

The common theme of my three Springer-Verlag books is to study H(x), the size in bits of the smallest program for calculating x, and that you cannot really understand an algorithm unless you can see it running on a computer. And in order to program the algorithms in my theory of program size I had to invent my own dialect of LISP.

These three books differ in their emphasis and complement each other, but each hopefully stands alone and can be read independently. The Unknowable discusses the historical context of my work on program-size complexity, The Limits of Mathematics gives a detailed discussion of the metamathematical implications of these ideas, and here I present the technical core of my theory.

What is the technical core of algorithmic information theory? Well, it consists of three basic ideas: First, that programs must be self-delimiting so that information is additive, so that the complexity of a pair is bounded by the sum of the complexity of its parts:

H(x, y) ≤ H(x) + H(y) + c.

The second basic idea is that one must look not just at the size H(x) of the smallest program that calculates x, but also at the probability P(x) that a program picked at random calculates x. In other words, to understand the basic properties of the program-size complexity H(x), you need to study the algorithmic probability P(x), which is the probability that a program generated by coin tossing calculates x. This probability is well defined precisely because I stipulate that programs must be self-delimiting. And it turns out that algorithmic information theory is really a constructive version of measure (probability) theory, and that the size of the smallest program for calculating x is essentially the −log2 of its algorithmic probability, the probability of calculating x by chance:

H(x) = −log2 P(x) + O(1).

And there is yet another, a third key idea at the technical heart of my theory, which is that the relative complexity H(y | x) of y given x must be defined as the size of the smallest program to calculate y if one is given a minimum-size program for x. One is not given x directly; that is the wrong definition. With this more subtle definition of relative complexity we get the following very basic decomposition theorem:

H(x, y) = H(x) + H(y | x) + O(1),

which states that combining a program for x with a program for getting y from x gives you the best possible program for the pair, to within a fixed number of bits.

These three core ideas are clearly presented for the first time in my 1975 ACM Journal paper ``A theory of program size formally identical to information theory'' (Vol. 22, pp. 329-340), [These three key ideas were not in the original version of algorithmic information theory independently promulgated a decade before by A.N. Kolmogorov and by me. Kolmogorov never realized that the version of algorithmic information theory that we originally proposed was mistaken, nor did he sense as I did that the principal application of this theory was to be to metamathematics, not to probability theory.] which is shown running on the computer in Part II of this book.

And in Chapter 7 of my 1987 Cambridge University Press monograph Algorithmic Information Theory I show that in consequence four very natural definitions of the incompressibility or algorithmic randomness of infinite binary sequences (two statistical definitions, due, respectively, to P. Martin-Löf and to R.M. Solovay, and my own definitions based on program size) turn out to be equivalent. This work is also presented here, in Part III, where you can see the equivalence proofs running on examples on the computer.

The fundamental new contribution of this book is that I have now finished transforming AIT into a theory about the size of real computer programs, programs that you can actually run.

In order to do this, as I said, I had to invent a dialect of LISP that was suitable. The first version of the interpreter for my LISP was written in Mathematica. Then I rewrote it in C. And for this book the interpreter has been rewritten as a Java applet and will run directly in your web browser.

This book and its two predecessors, The Unknowable and The Limits of Mathematics, would never have seen the light if it were not for the encouragement and support of my management chain at the IBM T.J. Watson Research Center: Paul Horn, Ambuj Goyal, Jim Russell, and Mark Wegman. Thank you very much!

It was certainly very exciting for me to finish programming all the key algorithms in my theory and to see them running on examples. I was amazed that it only took me a month to write all the programs in this book! Yes, the best way to understand an algorithm is to program it yourself. Nevertheless I hope that the very explicit, constructive proofs in this book will also be of value to others.

And this book celebrates the fact that, to my surprise, it wasn't much harder to program the technical core of my theory than it was to program the easier stuff in my two previous Springer-Verlag books.

In order to set the stage I start off with the transcript of my 2 March 2000 Carnegie Mellon University School of Computer Science Distinguished Lecture ``A century of controversy over the foundations of mathematics.'' [A different talk transcript with the same title may also be of interest. It was published in C. Calude and G. Paun, Finite versus Infinite, Springer-Verlag London, 2000, pp. 75-100. This is a talk that I gave 30 April 1999 at UMass-Lowell.] My thanks to Manuel Blum for inviting me to give this talk!

Next there is a chapter explaining my version of LISP. Then I explain how my self-delimiting universal Turing machine works, and there are several chapters on program size followed by several chapters on randomness. I end with a chapter on problems for future work, and make a few final points in a letter to the reader.

A big ``thank you'' to Isaac Nativ for arranging for me to visit the University of Melbourne in Australia in 1999, and to Verónica Becher for inviting me to give this book as a course at the University of Buenos Aires in Argentina in 2001; these invitations were an important stimulus.

I'm also very grateful to Cristian Calude for publishing these books in his DMTCS series, and to Cris and to George Markowsky for hosting my web sites at their respective institutions: the Centre for Discrete Mathematics and Theoretical Computer Science at the University of Auckland in New Zealand, and the Computer Science Department at the University of Maine in Orono, Maine.

Gregory Chaitin, August 2000

P.S. You can contact me at

chaitin@us.ibm.com
chaitin@watson.ibm.com
and you can get the software for this book at
http://www.cs.umaine.edu/~chaitin/ait
http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait