So now we only need to show that

H(x) ≤ −log2 P(x) + c.

In this chapter we'll actually prove that

H(x) ≤ −log2 P_C(x) + c,

where c depends only on the computer C, not on the S-expression x. Then the special case C = U will complete our proof that

H(x) = −log2 P(x) + O(1).
Why is this result important? Because it converts theorems about the algorithmic probability P into theorems about the complexity H:

−log2 [P(x) × P(y)] = H(x) + H(y) + O(1),
−log2 [P(x)/P(y)] = H(x) − H(y) + O(1).
So statements in probability theory dealing with products of probabilities correspond to results in AIT dealing with sums of complexities. And divisions turn into subtractions... This gives you a feeling for what AIT is like, in a purely formal way, if you're familiar with the formulas in probability theory. Of course the meaning of these formulas is completely different! Normal probability theory says nothing about computer programs.
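To make this purely formal correspondence concrete, here is a small Python check (my own illustration, not from the text): taking −log2 turns products of probabilities into sums and quotients into differences.

```python
import math

def neg_log2(p):
    # -log2 of a probability: the "complexity" in the formal analogy
    return -math.log2(p)

p, q = 1 / 8, 1 / 32   # two example probabilities (powers of two, so exact)

# products of probabilities <-> sums of complexities
assert neg_log2(p * q) == neg_log2(p) + neg_log2(q) == 8.0

# divisions of probabilities <-> subtractions of complexities
assert neg_log2(p / q) == neg_log2(p) - neg_log2(q) == -2.0
```

Of course, as the text stresses, this is only the formal shape of the analogy; the meaning of the AIT formulas is completely different.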
This will imply that the number of minimum-size programs p for the S-expression x is always ≤ 2^c. Why is this? Well, simply because if there were exactly 2^c minimum-size programs for x and no other programs for x, then we would have

P(x) = 2^c/2^H(x) = 1/2^(H(x)−c),

and then it would be the case that

−log2 P(x) + c = H(x).

So more than 2^c minimum-size programs for x would certainly push −log2 P(x) + c over the edge and make it less than H(x), which we'll see is impossible.
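The counting argument can be sketched numerically. In this Python illustration (my own; the values h = 20 and c = 3 are arbitrary), N programs of length h = H(x) contribute at least N/2^h to P(x), and once N exceeds 2^c the bound −log2 P(x) + c drops below h, contradicting H(x) = h.

```python
import math

def bound_on_H(N, h, c):
    """If x has N programs, each of length h, then P(x) >= N/2**h just
    from those programs, so the theorem gives
    H(x) <= -log2 P(x) + c <= h - log2 N + c."""
    P_lower = N / 2 ** h          # contribution of the N programs to P(x)
    return -math.log2(P_lower) + c

c, h = 3, 20                       # hypothetical values for illustration
assert bound_on_H(2 ** c, h, c) == h       # 2**c programs: exactly tight
assert bound_on_H(2 ** c + 1, h, c) < h    # one more: bound falls below H(x)
```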
So we'll get the nice corollary from our main theorem that there cannot be too many minimum-size programs for a given result x. In fact, even more than that is true: the minimum-size program for x is essentially unique, because if we're given any minimum-size program x* for x then we need only be told O(1) bits more in order to get any other minimum-size program p' for x. How so?

Well, given x*, we know x and H(x), and if in addition we're told precisely how many minimum-size programs p for x there are, then by systematically running all H(x)-bit programs p on U for more and more time until we find the requisite number of p with U(p) = x, we can determine all the minimum-size programs p for x. Then to select any particular p', we just need to know which one to pick. To do this we only need to know x* and two numbers ≤ 2^c, namely the number of minimum-size programs and the position of p' among them, that is, two (c+1)-bit numbers. [Actually, two c-bit numbers will do. Since neither of these numbers can be zero, we can just let their maximum possible value of 2^c wrap around to zero.]
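Here is a hedged Python sketch of that dovetailing search (my own illustration, not Chaitin's code). Instead of a real universal machine it uses a toy stand-in: the ignore-a-bit, test-a-bit computer that serves as the test case later in this chapter, under which x = 100 has H(x) = 4 and exactly four minimum-size programs.

```python
from itertools import product

def run(program, time):
    """Toy stand-in for U: reads pairs (ignored bit, flag bit) and
    returns 10**i when the i-th flag is the first 1, but only if the
    program is consumed exactly.  Returns None if it needs more than
    `time` steps or doesn't halt cleanly."""
    steps, pos, i = 0, 0, 0
    while True:
        steps += 1
        if steps > time or pos + 1 >= len(program):
            return None
        i += 1
        if program[pos + 1] == 1:            # flag bit 1: halt
            return 10 ** i if pos + 2 == len(program) else None
        pos += 2                             # flag bit 0: keep going

def minimum_size_programs(x, h, count):
    """Dovetail: run all h-bit programs for time 1, 2, 3, ... until
    `count` programs producing x have appeared."""
    found, time = set(), 0
    while len(found) < count:
        time += 1
        for bits in product([0, 1], repeat=h):
            if run(bits, time) == x:
                found.add(bits)
    return found

# Under this toy machine H(100) = 4, and the two ignored bits are
# free, so there are exactly 2**2 = 4 minimum-size programs.
progs = minimum_size_programs(100, 4, 4)
```

Knowing the exact count (4 here) is what guarantees the search terminates; with a wrong count the dovetailing would run forever.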
Indeed, we can restate this much more forcefully if we borrow the notion of mutual information content from the next chapter, where it will be defined. We've just given the heart of the argument that any two minimum-size programs p and p' for x have mutual information content H(p:p') very close to the maximum possible, H(x). That is, essentially all their information is in common.
Exercise for the reader: show that a similar argument works if we fix a k-bit neighborhood around the minimum-size program for x, i.e., consider all programs p for x with |p| ≤ H(x) + k, k fixed.
The approximate idea is as follows. We are given a computer C, and from it we'll construct a computer C' with the property that if

P_C(x) ≥ 1/2^k

(and we can realize this in a finite amount of time) then there is precisely one (k+1)-bit program for C' to produce x. In other words, our approximate goal is that for each j there is either one or no j-bit program for C' to produce x, depending on whether or not

P_C(x) ≥ 1/2^(j−1).
But we won't produce all of these programs, which would be infinitely many for each output. We'll actually only produce finitely many for each output.
More precisely, the process is this. We systematically run more and more programs on C for more and more time. At stage k = 0, 1, 2, ... we run each k-bit program on C for time k. And if we find ≥ 2^m and < 2^(m+1) k-bit programs that produce x, [These programs may have trailing bits that were never read by C.] this means that

P_C(x) ≥ 2^m/2^k = 1/2^(k−m),

and we emit the following (output, size-of-program) requirement for C': (x, k−m+1). This is done only if this is a new requirement that we have not emitted previously. So the requirements for x all have different sizes, and a requirement of size s = k−m+1 is only emitted after C has been seen to give x probability at least 1/2^(s−1), twice the requirement's cost; summing the geometric series over the emitted sizes, the total probability that C' assigns to x is

P_C'(x) ≤ P_C(x),

and the Kraft inequality condition for constructing C' is satisfied simply because it was satisfied by C; C' inherits Kraft from C.
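This inheritance can be checked numerically. In the worst case C' receives one requirement of every size the process could possibly emit for x, and a size-s requirement is only emitted once C has visibly given x probability ≥ 1/2^(s−1), so all emitted sizes lie above a threshold and the geometric series stays below P_C(x). A small Python sketch (my own, not from the text):

```python
import math

def smallest_possible_size(P):
    """A size-s requirement for x is only emitted once C has been seen
    to give x probability >= 1/2**(s-1), so s - 1 >= -log2 P_C(x)."""
    return math.ceil(-math.log2(P)) + 1

def max_probability_C_prime(P, depth=60):
    """Worst case for C': one requirement of every permissible size."""
    s0 = smallest_possible_size(P)
    return sum(2.0 ** -s for s in range(s0, s0 + depth))

# The probability C' assigns to x never exceeds what C assigned,
# so C' inherits the Kraft inequality from C.
for P in (1.0, 0.5, 0.3, 1 / 3, 2 ** -10):
    assert max_probability_C_prime(P) <= P
```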
Thus C' "concentrates" the algorithmic probability that C assigns to each S-expression x into a set of programs for x, all of which are of different size.
Then we can simulate C' using U, with the result that

H(x) ≤ H_C'(x) + c' ≤ −log2 P_C(x) + c,

which completes our proof.
What is the computer C that I use to illustrate how our algorithm works? Well, I've chosen a C that ignores all the odd bits in its program p, and counts the number i of bits in even positions up to and including the first 1 bit. The value that C returns, C(p), will be ten raised to this power, that is, 10^i.

Since the odd bits of p are ignored, there are 2^i programs of length 2i that produce 10^i, so for each i ≥ 1 we have

P_C(10^i) = 2^i/2^(2i) = 1/2^i.

Thus we'll produce the requirement (10^i, i+1) for constructing C', which requests that there should be a program for C' to produce 10^i that is i+1 bits long. That's how C' concentrates C's algorithmic probability into smaller programs, and that's how we show that

H_C'(10^i) ≤ i + 1 = −log2 P_C(10^i) + 1.
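The claim that P_C(10^i) = 1/2^i can be checked exhaustively. Here's a Python sketch (my own translation of the toy C, not Chaitin's LISP) that sums 1/2^n over every n-bit program that C consumes exactly and that outputs 10^i:

```python
from itertools import product

def C(bits):
    """The toy computer: reads pairs (ignored bit, flag bit) and
    returns 10**i when the i-th flag is the first 1.  Here we demand
    that the program be consumed exactly, since P_C sums over exact
    self-delimiting programs."""
    pos, i = 0, 0
    while pos + 1 < len(bits):
        i += 1
        if bits[pos + 1] == 1:
            return 10 ** i if pos + 2 == len(bits) else None
        pos += 2
    return None

def P_C(x, max_len=12):
    """Algorithmic probability of x under C, truncated at max_len
    (enough for i <= 6, since the programs for 10**i have length 2i)."""
    return sum(2.0 ** -n
               for n in range(1, max_len + 1)
               for p in product([0, 1], repeat=n)
               if C(p) == x)

for i in (1, 2, 3):
    assert P_C(10 ** i) == 2.0 ** -i   # the 2**i programs of length 2i
```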
Below we define an unending expression all-together for enumerating the entire infinite set of requirements for C', and then we test it with the C described above.
[The case that actually interests us in this chapter is C = U, which shows that

H(x) ≤ −log2 P(x) + c,

but this would be too slow to run, and it would be harder to interpret the output.]
After we define all-together, we try it out. We let it run for a while to show that it works with the C that we've chosen. Within the time limit of 60 units that I give it, our algorithm manages to discover the first three requirements for C', namely (10^1, 2), (10^2, 3), (10^3, 4). So

H_C'(10^i) ≤ i + 1 for i = 1, 2, 3,

and for this C and C' we have

H_C'(10^i) ≤ −log2 P_C(10^i) + 1.
That's it, we're finished!
So our LISP program works, at least in this special case. Can you try it out on a C for which our algorithm produces more than one requirement for each output?
[[[[[ Occam's razor---Concentration process.
      From computer C construct C' such that
      if P_C(x) >= 1/2^k, then C' has a
      k+1 bit program for x.  Hence
      H(x) <= -log_2 P_C(x) + c
      where c depends only on C, not on x. ]]]]]

define all-together

[this is used to avoid duplicate requirements for C']
let previous-requirements nil

[test case special-purpose computer C:]
[ignores odd bits, multiplies by ten until hits a 1]
[this C has many programs that do the same job!]
[[to put U here instead, let C be 'eval read-exp]]
let C '
  let (loop n)
    let ignore-it [be] read-bit [skip bit]
    if = 1 read-bit
      [then return] n
      [else] (loop * 10 n)
  (loop 10)

[stage n = 0, 1, 2, ... of overall concentration process]
[look at all n-bit programs for C, run them for time n]
[merge (output,multiplicity) pairs, emit requirements for C']
let (stage n)
  let previous-requirements
    (make-requirements debug (how-many? nil debug n))
  (stage + n 1)

[produce (output,multiplicity) pairs]
[by running all n-bit programs on C for time n]
let (how-many? p n)
  if = n length p
    [run program p for time n]
    let v try n [U => 'eval read-exp] C p
    if = success car v
      [program ran to completion]
      [indicate that it produces]
      [its output with multiplicity one]
      cons cons cadr v cons 1 nil nil
      [otherwise program failed]
      nil [empty list of (output,multiplicity) pairs]
    [otherwise use recursion to combine multiplicities]
    (merge (how-many? cons 0 p n)
           (how-many? cons 1 p n))

[add one (output,multiplicity) pair to a list of such pairs]
let (merge1 pair list)
  if atom list cons pair nil
  let first-in-list car list
  let rest-of-list cdr list
  let output car pair
  let multiplicity cadr pair
  let output2 car first-in-list
  let multiplicity2 cadr first-in-list
  if = output output2
    [= -> combine multiplicities]
    cons cons output cons + multiplicity multiplicity2 nil
         rest-of-list
    [!= -> don't combine multiplicities]
    cons first-in-list (merge1 pair rest-of-list)

[combine two lists of (output,multiplicity) pairs]
let (merge list list2)
  if atom list list2
  (merge1 car list (merge cdr list list2))

[exponent in highest power of 2 <= n, n != 0]
let (log2 n)
  let (loop power exponent)
    let new-power + power power [double it]
    let new-exponent + 1 exponent [add 1 to it]
    if > new-power n
      [then return] exponent
      [else] (loop new-power new-exponent)
  (loop [initial power of two] 1 [initial exponent of 2] 0)

let (make-requirements list-of-pairs)
  if atom list-of-pairs previous-requirements
  let first-pair car list-of-pairs
  let list-of-pairs cdr list-of-pairs
  let output car first-pair
  let multiplicity cadr first-pair
  let kraft-requirement
    cons output cons - + n 1 (log2 multiplicity) nil
  let previous-requirements (make-requirements list-of-pairs)
  [keep only first appearance of requirement]
  if (is-in? kraft-requirement previous-requirements)
    previous-requirements
    cons debug display kraft-requirement
         previous-requirements

let (is-in? x list) [is x in list?]
  if atom list false
  if = x car list true
  (is-in? x cdr list)

[HERE GOES!]
(stage 0)

try 60 all-together nil
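For readers without a Chaitin-LISP interpreter, here is a rough Python translation of the same staged process (my own sketch, not the author's code). The per-program time limit is omitted because this toy C always halts after a few steps, and multiplicities are tallied with a dictionary instead of merge and merge1:

```python
from itertools import product
from math import floor, log2

def C(bits):
    """The toy computer from the text: reads pairs (ignored bit, flag
    bit) and returns 10**i when the i-th flag is the first 1.
    Trailing unread bits are allowed, as in the staged process."""
    pos, i = 0, 0
    while pos + 1 < len(bits):
        i += 1
        if bits[pos + 1] == 1:
            return 10 ** i
        pos += 2
    return None          # ran out of bits: the program fails

def concentration(max_stage):
    """Stages n = 0..max_stage: run every n-bit program on C, count
    how many produce each output, and when 2**m <= count < 2**(m+1)
    emit the requirement (output, n - m + 1), skipping duplicates."""
    requirements = []
    for n in range(max_stage + 1):
        counts = {}
        for p in product([0, 1], repeat=n):
            x = C(p)                     # trailing bits may be unread
            if x is not None:
                counts[x] = counts.get(x, 0) + 1
        for x, mult in counts.items():
            req = (x, n - floor(log2(mult)) + 1)
            if req not in requirements:  # only new requirements
                requirements.append(req)
    return requirements

print(concentration(6))   # [(10, 2), (100, 3), (1000, 4)]
```

Run to stage 6, it discovers exactly the three requirements reported above: (10, 2), (100, 3), (1000, 4).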
http://www.cs.umaine.edu/~chaitin/ait/occam.l
http://www.cs.umaine.edu/~chaitin/ait/occam.r
http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/occam.l
http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/occam.r
is bounded by a c' that depends only on c, not on n.
Determine an explicit numerical value for c.