Proof that Martin-Löf randomness is equivalent to Chaitin randomness

[History: C.P. Schnorr communicated this result to me in 1974 when he refereed my 1975 J.ACM paper ``A theory of program size formally identical to information theory,'' in which I defined Chaitin randomness. It was then independently discovered by R.M. Solovay in his unpublished book-length manuscript on AIT dated May 1975. Here I follow the version presented in my 1987 Cambridge University Press monograph Algorithmic Information Theory.]

Introduction

Okay, in the previous chapter I showed you my definitions of randomness based on program-size complexity---four of them actually! In this and the next chapter I'll show you two statistical definitions of a random real that look very different from my program-size irreducibility definition, but which actually turn out to be equivalent. These are P. Martin-Löf's and R.M. Solovay's definitions of a random real, and I'll start with Martin-Löf's definition, which was proposed before Solovay's.

A little bit of measure theory! Measure theory in LISP---What is a computable sequence of coverings An? What's the measure μ{An}?

Following P. Martin-Löf, [``The definition of random sequences,'' Information and Control 9 (1966), pp. 602-619.} let's develop a little bit of constructive measure theory.

The first step is to explain what a sequence of coverings An is. Well, a covering is an infinite set of subintervals of the unit interval, which we'll use to cover sets of reals in the unit interval. [The unit interval is the set of reals between zero and one.] And a computable or constructive formulation of this is as a LISP function (A n), a LISP expression which never halts and which generates and outputs using display an infinite set of bit strings α, β, γ, ...

Each of these bit strings is the prefix for a set of reals, all reals with binary expansions starting with α, β, γ, ... respectively:

.α ...
.β ...
.γ ...

And we'll prohibit these subintervals from overlapping, so that the bit strings generated by (A n) must be a prefix-free set. That condition on the bit strings α, β, γ, ... is equivalent to the geometrical constraint that the subintervals in the covering cannot overlap. The reason for this prefix-free set requirement, is that then it's easy to compute the measure μ{An} of the covering An.

That's defined to be the total length of all the subintervals in the covering, which is, since there's no overlap, simply equal to the sum of two raised to negative powers which are the sizes or lengths of all the generated prefixes. Thus

μ{An} = ∑xAn 2−|x| = 1/2|α| + 1/2|β| + 1/2|γ| + ...

So, for example, if

An = { 111, 000, 101010 , ... }

then

μ{An} = 1/23 + 1/23 + 1/26 + ...

Now what do we do with these parameterized coverings An?

Well, the first thing that Martin-Löf did was to define a set of measure zero, or null set, in a constructive way.

A set of measure zero is a set of reals, a subset of the unit interval with the property that we can cover it with coverings that are arbitrarily small. More precisely, Martin-Löf defines a set X to be of measure zero (constructively) if there is a computable sequence of coverings An all of which cover X, that is,

n [ XAn ],

and with the property that

n [ μ{An} ≤ 2n ].

Finally, Martin-Löf defines a real x to be Martin-Löf random if it is not in any constructive null set. That is, x fails to be Martin-Löf random if and only if there is a computable sequence of coverings An which always contain x and with μ{An} ≤ 2n. Note that since we represent subintervals of the unit interval via binary prefixes, x is in the covering An iff x is an extension of some prefix in An. In other words, we think of An both as a set of reals and as a set of bit strings.

This is a very natural statistical definition of a random real x. It just says that a real is random if it cannot be constructively pushed into an arbitrarily small piece of the measure space. And if a (suitably constructive) property is true with probability one, then a random real must have that property, for sure, not just with probability one. As we shall see in the next chapter, using Solovay randomness, constructive measure zero sets correspond to properties of x that are infinitely unlikely, like not having the same limiting relative frequency of 0's and 1's.

In fact, in this chapter I'll show you just how natural Martin-Löf's definition is: it's equivalent to mine! I'll break the proof into two parts.

Proof that if x isn't Martin-Löf random, then it's not Chaitin random

The program that proves this is martin-lof.l.

First let me explain the proof, then I'll tell you the example that our software works its way through.

Let's suppose that the real number x is not Martin-Löf random. In other words, there is a computable sequence of coverings An which always contain x and with μ{An} ≤ 2n.

From these An, we'll build a computer C, by generating a list of (program, size-of-output) requirements for C.

Let's start by noting that the following series converges and is in fact less than 1:

n = 2,3,4,... 1/2n2n = 1/22 + 1/26 + 1/212 + ...

Note that no negative powers of two are repeated, and so this sum is less than ∑k = 1,2,3,... 2k = 1. This will be our Kraft inequality ΩC ≤ 1, because our set of requirements consists precisely of all (output, size-of-program) pairs defined as follows:

{ (s, |s|−n) : sAn2, n ≥ 2 }.

So everything works, and x is in all of these An2, and therefore for each n ≥ 2 there are prefixes s of x for which HC(s) ≤ |s|−n, and x is not Chaitin random.

That completes the proof. Now what's the example that martin-lof.l works its way through?

The program martin-lof.l goes through the special case in which An is precisely the singleton set consisting of the n-bit string of n 1's. So the measure of An is precisely 2n. And the list of requirements produced by martin-lof.l is

{ (s, |s|−n) : sAn2, n ≥ 2 }   =  
{ (1111, 2), (111111111, 6), (1111111111111111, 12), ... (1n2, n2n), ... }

as it should be.

In fact, take a good look at what we've done. We started with the fact that the infinite binary sequence 11111... isn't Martin-Löf random, and from that we've shown that it isn't Chaitin random either, because

H(1n2) ≤ n2n + c

for all n ≥ 2. So this is a good example of the general idea.

Okay, now let's work in the opposite direction!

Proof that if r isn't Chaitin random, then it's not Martin-Löf random

The program that proves this is martin-lof2.l.

First let me explain the proof, then I'll tell you the example that our software works its way through.

Let's suppose that the infinite binary sequence r isn't Chaitin random. In other words, that for every k, for some n there is an n-bit prefix rn of r such that H(rn) ≤ nk.

Now let's define a computable sequence of coverings Ak as follows: Ak covers all reals r having any n-bit prefix rn with H(rn) ≤ nk.

In other words, Ak is the set of bit strings of any size that can be compressed by at least k bits using U. By hypothesis, r is in each of the Ak. What's the measure of Ak? Well, let's find out.

We saw in the first chapter of Part III that for an n-bit string β

Prob{ H(β) ≤ n + H(n) − k }   ≤  1/2kc.

In other words, there's a c such that (the probability that the complexity of an n-bit string is less than or equal to n + H(n) − k) is less than or equal to 2k+c.

Substituting H(n)+k for k, and substituting rn, the first n bits of r, for β, gives us

Prob{ H(rn) ≤ nk }   ≤  1/2H(n)+kc.

Let's sum this over all positive integers n. That gives us

n Prob{ H(rn) ≤ nk }   ≤  ∑n 1/2H(n)+kc   ≤  2ck   Ω   ≤  2ck.

So the measure of our Ak is bounded by 2ck instead of 2k. But that's easy to fix. The sequence of coverings that we use to show that r isn't Martin-Löf random is actually Ak' = Ak+c, whose measure is bounded by 2k, as it should be.

Hence a real r with prefixes whose complexity dips arbitrarily far below their length will be in all of the Ak' and hence will not be Martin-Löf random.

That completes the proof. Now what's the example that martin-lof2.l works its way through?

Well, we were supposed to generate an Ak' = Ak+c that covers all reals r having an n-bit prefix rn with H(rn) ≤ nkc for any n. Instead we just generate Ak, which covers all reals r having an n-bit prefix rn with HC(rn) ≤ nk for any n. The case C = U is what we need to prove the theorem, but to test our software we've chosen instead a computer C(p) that doubles each bit in its program up to and including the first 1 bit. So, C(0i1) = 02i11, and n-bit programs turn into 2n-bit results.

The program martin-lof2.l generates a finite part of the infinite set A8, that is, of the set of strings s with HC(s) ≤ |s| − 8, the set of strings that can be compressed by at least 8 bits using C:

A8   =   { 02k11 : k ≥ 7 }   =  
{ 0000000000000011, 000000000000000011, 00000000000000000011, ... }.

The gory details

Here's martin-lof.l. Martin-lof2 is a bit large because it has to be prepared to fix up intervals that overlap. So it's not included here---it's best examined in your Web browser.

[[[[[
   Show that a real r is Martin-Lof random
   iff it is Chaitin random.
]]]]]

[First part: not M-L random ===> not Ch random] 

[We create the following set of requirements] 
[(output, size-of-program)]
[ (s, |s|-n) : s in A_{n^2}, n >= 2 ] 

[Stage k>=0, look at all A_{n^2} n = 2 to 2+k for time k.]
[Then have to combine stage 0, stage 1,...]
[and eliminate duplicates]

[infinite computation that displays strings] 
[in cover A_m with measure mu <= 1/2^m]
define (A m) 
   cdr cons
   [test case, A_m = string of m 1's]
   display base10-to-2 - ^ 2 m 1 
   nil

define (is-in? x l) [is x in the list l?]
   if atom l    false
   if = x car l true
   (is-in? x cdr l)

define (convert-to-requirements cover n) [display requirements]
   if  atom cover requirements [finished?]
   let s          car cover [get first string]
   let cover      cdr cover [get rest of strings]
   let requirement 
       cons s cons - length s n nil [form (s, |s|-n)]
   if (is-in? requirement requirements) [duplicate?]
   [yes] (convert-to-requirements cover n)
   [no]  let requirements cons display requirement requirements
         (convert-to-requirements cover n)

define (stage k)
   if = k 4 stop! [[[stop infinite computation!!!]]]
   let (loop i) [i = 0 to k]
      if  > i k (stage + k 1) [go to next stage]
      let n + 2 i [n = 2 + i]
      let expr cons cons "' cons A nil
                    cons * n n nil
      let cover caddr try k expr nil  [caddr = displays]
      let requirements (convert-to-requirements cover n)
      (loop + i 1) [bump i]
   [initialize i]
   (loop 0)

[to remove duplicates]
define requirements ()

[run it]
(stage 0)

Software for this chapter

The source code and runs of the software for this chapter can be found on the web at these URL's:
   http://www.cs.umaine.edu/~chaitin/ait/martin-lof.l
   http://www.cs.umaine.edu/~chaitin/ait/martin-lof.r

   http://www.cs.umaine.edu/~chaitin/ait/martin-lof2.l
   http://www.cs.umaine.edu/~chaitin/ait/martin-lof2.r

   http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/martin-lof.l
   http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/martin-lof.r

   http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/martin-lof2.l
   http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/martin-lof2.r

Exercise

  1. Programming improvements. Avoiding repeated requirements is a nuisance! Would it be better to remove duplicates directly in the Kraft algorithm, not in each of the algorithms that generate requirements? Or should one perhaps leave duplication removal in the requirement generating algorithms, but employ the following strategy: subtract what is discovered at stage n from what is discovered at stage n+1, and then make the new discoveries into requirements? That is, is it better to do something like this:
       (make-requirements-from 
          (set-subtraction (stage + n 1) 
                           (stage n)
          )
       ) 
    
    If not, why not? If so, do it!