Following [1], consider the self-delimiting program-size complexity H(x) of an n-bit string x. As is shown in [1], the maximum possible complexity of an n-bit string is within a constant number of bits of the sum of n and the program-size complexity of the natural number n (or equivalently, the program-size complexity of the binary numeral for n):
max|x|=n H(x) = n + H(n) + O(1).Obviously, at least one n-bit string achieves this maximum possible complexity. The purpose of this note is to show that a substantial fraction of the n-bit strings achieve exactly this maximum possible complexity, in fact, at least 2n−c of them do.
Let hn be the maximum complexity that it is possible for an n-bit string x to have:
hn ≡ max|x|=n H(x).As we pointed out before,
hn = n + H(n) + O(1).Let kn be the n-bit base-two numeral for the number of n-bit strings x whose complexity H(x) is exactly the maximum possible complexity hn:
kn ≡ # { |x| = n : H(x) = hn}.(Note that if all 2n n-bit strings should happen to have the maximum possible complexity, then the n-bit string kn ``overflows'' and consists entirely of 0's.)
Concatenate together the following four bit strings: First (1) a minimum-size self-delimiting program for the natural number n. (This bit string is H(n) bits long.) Followed by (2) a minimum-size self-delimiting program for the absolute value of the difference between n + H(n) and hn. (As this difference is O(1), this bit string is O(1) bits long.) Followed by (3) a single 0 bit if n + H(n) is less than or equal to hn, and a single 1 bit if n + H(n) is greater than hn. (In either case, this will be a single bit.) Followed by (4) a minimum-size self-delimiting program for computing the n-bit base-two numeral kn if we are given a minimum-size self-delimiting program for n. (This bit string is H(kn|n) bits long.)
The size of the result of concatenating together these four bit strings is exactly
H(n) + H( | n + H(n) − hn | ) + 1 + H(kn|n),which is equal to
H(n) + H(kn|n) + O(1)bits.
Suppose that we are given this string of H(n) + H(kn|n) + O(1) bits. From it, we can in turn determine n and H(n), then hn, and finally kn. The set of all true propositions of the form ``H(x) ≤ i'' is recursively enumerable. We enumerate this set until we find exactly (2n − kn) n-bit strings x with complexity H(x) less than hn. Then the remaining kn n-bit strings must have complexity exactly equal to hn. Print out the first of these kn n-bit strings and halt.
What we have just described in words is in fact a self-delimiting program for a special-purpose computer C to compute a string of complexity hn. This program for C is H(n) + H(kn|n) + O(1) bits long, and to simulate this program on our standard universal computer U we need to add at most O(1) bits. Thus there is an (H(n) + H(kn|n) + O(1))-bit program for U to compute a string of complexity hn. Hence it must be the case that
hn ≤ H(n) + H(kn|n) + O(1).Since
hn = n + H(n) + O(1),it follows that
n ≤ H(kn|n) + O(1).Recalling that the numeral kn is exactly n bits long, we see that in fact
H(kn|n) = n + O(1).Since for any n-bit string x we have
H(x) = H(n) + H(x|n) + O(1)(see [1]), it follows that
H(kn) = n + H(n) + O(1) = hn + O(1).In other words, the n-bit base-two numeral for the number of n-bit strings that have exactly the maximum possible complexity, itself has nearly this maximum possible complexity!
From this it is easy to see that kn must be large. For if there are j consecutive 0's at the left end of a bit string, this makes its complexity drop by j − O(log j), because, roughly speaking, we can replace these j 0's by a minimum-size self-delimiting program for j, which is only H(j) = O(log j) bits long.
Here is the detailed argument. Consider an n-bit string 0j1... with j consecutive 0's at the left. Concatenate together a minimum-size self-delimiting program for n, followed by a minimum-size self-delimiting program for j, followed by the n − j bits after the initial run of j 0's. The concatenation of these three bit strings is exactly
H(n) + H(j) + n − jbits long, and this is a self-delimiting string from which we can in turn recover n, then j, and finally the original n-bit string 0j1.... In other words, there is a special-purpose computer C such that
HC(0j1...) ≤ H(n) + H(j) + n − j.What the special-purpose computer C can do, can also be done by our standard universal computer U by adding O(1) bits to the program. Thus
H(0j1...) ≤ H(n) + H(j) + n − j + O(1)But
≤ n + H(n) − j + O(log j) .
H(kn) = n + H(n) + O(1) .Thus there can be at most O(1) consecutive 0's at the left end of the n-bit numeral kn. Hence
kn > 2n−c,which was to be proved.
An earlier, unpublished version of this result was obtained more than twenty years ago in connection with the now obsolete concept of blank-endmarker program-size complexity. We thank Professor Cristian Calude of the University of Auckland for conversations that stimulated us to reformulate this result using the current, self-delimiting version of program-size complexity.
The result proved in this note is a bit bizarre, but we feel that the pretty proof, showing that a number is large because it is random, deserves to be saved from oblivion.