Probability and Program-Size for Functions
Gregory Chaitin, IBM T. J. Watson Research Center
Fundamenta Informaticae 71 (2006), pp. 367-370
Abstract:
We show that unlike the general case of the relationship between algorithmic probability
and program-size for enumerating sets, in the case of the graphs of total functions
these two quantities are closely related.
"Classical" algorithmic information theory [1] deals with the program-size complexity and algorithmic
probabilities for computing individual objects.
This now well-developed theory [2, 3, 4] is in fact
contained within a more general theory [5], [2, Part IV] of the complexity H and probability P
for enumerating
sets of objects, as the special case of singleton sets (sets with a single
element). In the classical well-developed case, computations halt. In the more general
case, computations are unending. In both cases programs are self-delimiting
and one can therefore consider
the probability P(X) that a program enumerates a specific set X,
singleton or otherwise.
Henceforth we shall work in the general case and with unending computations and arbitrary
c.e. sets of output.
Functions can be included in this framework by defining the complexity of a function
H(f) to be
the size in bits of the smallest program for enumerating its graph, i.e., the set of all ordered
pairs 〈x, f(x)〉.
Note that
because programs are self-delimiting, the
program-size complexity for enumerating graphs of functions has
e.g. these elegant formal properties:
H({ψ(x)}) ≤ H(ψ) + H({x}) + c,
H({ψ(φ(x))}) ≤
H(ψ) + H(φ) + H({x}) + c,
H(λx[ψ(φ(x))]) ≤
H(ψ) + H(φ) + c.
Here H of a function denotes H of its graph.
For example, the last inequality in this list actually says that
H(ψ • φ) =
H({〈x, y〉: y = ψ(φ(x))})
≤ H(ψ) + H(φ) + c
=
H({〈x, ψ(x)〉}) +
H({〈x, φ(x)〉}) + c.
In the case of singleton sets there is a beautiful relation [1]
between the probability of computing it and the size of the smallest program for doing that:
H({x}) =
−log2 P({x}) + O(1).
It is shown in [5] that this fails in the general case of arbitrary not singleton
sets.
The purpose of this note is to point out that however things are not too bad in the
case of computations for enumerating graphs of total functions.
So in the general case of an arbitrary c.e. set X,
by definition, but it is not the case that
P(X) ≥ 2−N ⇒
H(X) ≤ N + c.
In fact, the most we currently know [6] is that
P(X) ≥ 2−N ⇒
H(X) ≤ 3N + O(log N).
However, in the case of X which are the graphs of total functions we can do better.
Consider a total function f: N → S.
Here N is the set of natural numbers and
S is the set of LISP S-expressions, as in the formalism used in [2].
Theorem.
If P(f) ≥ 2−N,
then H(f) ≤
N + H({N}) + c = N + O(log N).
Proof.
Given N, at stage K consider the tree of partial functions with
probability ≥ 2−N that is defined as follows:
-
At stage K consider all programs K bits in size and run them all for K seconds.
It is okay if not all K bits are actually read by our standard universal Turing machine.
And we stop running a program if it attempts to read more than K bits of its program.
-
We also need to "filter" the output of our standard universal Turing machine so that once
〈n, x〉 is output any further output of the form 〈n, x'〉
with x' ≠ x is ignored.
This filtering will have absolutely no effect on programs that enumerate the graphs of functions.
-
We are interested in functions fM with finite domain {0, 1, 2, 3, ..., M}
such that ≥ 2K−N of these K-bit programs have (filtered) output including the graph of
fM.
-
In addition, we require such functions fM to be maximal, i.e., not to be contained
in a "longer" function fL with the same property.
fM is not maximal if there is a function fL with finite domain
{0, 1, 2, ..., L} and
L > M such that fL(j) = fM(j) for all j ≤ M
and it is also the case that ≥ 2K−N of the K-bit programs considered at stage K
have (filtered) output including the graph of fL.
-
These functions
fM
may be regarded as forming a branching tree of possibilities, first branching depending on f(0),
then branching depending on f(1), etc.
-
We number (or "index") the branches of this tree in such a manner that once a number is assigned, it never
changes. Thus after a new branch appears, its index remains assigned to its natural or first extension,
and new index numbers must then be given to any side branches that later branch out from it.
(If several new side branches appear simultaneously, the lowest-valued branch, say, will be
considered to be the natural extension and the others will be index numbered from lowest value to highest
value. More generally, one can use lexicographic order to assign new index numbers in a systematic fashion.)
-
When things are done in this manner, a branch of this function tree may grow, or may stay the same,
but it will never shrink or disappear.
And indexes of total functions will remain valid once assigned.
There can be at most 2N branches in this tree of possible partial functions. Some of
these branches
will extend to infinity in the limit of infinite time, therefore yielding total functions, and some of the branches will
remain finite.
Given N and an index i ≤ 2N,
we can enumerate the function that is the ith branch of this tree of possibilities,
the ith branch that separates out as this tree grows.
So given N and i, we can actually compute the corresponding function.
Any total function f with probability
> 2−N will have some i for enumerating its graph in this manner.
Hence
H(f)
≤ (number of bits to calculate N) + (number of bits to calculate i given N)
+ c
≤ H({N}) + N + c'
= O(log N) + N.
Please note however that if the function's probability P(f)
is exactly 2−N we cannot be sure
of eventually discovering that P(f) ≥ 2N.
But then P(f) is > 2−(N+1)
and the theorem is true with N' = N+1,
which yields the same upper bound on H(f) since H({N}) = H({N+1}) + O(1).
Q.E.D.
References
[1] Chaitin, G.J., "A theory of program size formally identical to information theory,"
Journal of the ACM 22 (1975), pp. 329-340.
[2] Chaitin, G.J., Exploring Randomness,
Springer-Verlag, 2001.
[3] Calude, C.S., Information and Randomness, Springer-Verlag, 2002.
[4] Downey, R. and D. Hirschfeldt, Algorithmic Randomness and Complexity, in preparation.
[5] Chaitin, G.J., "Algorithmic entropy of sets,"
Computers & Mathematics with Applications 2 (1976), pp. 233-245.
[6] Solovay, R.M., "On random r.e. sets," in A.I. Arruda, N.C.A. da Costa, and R. Chuaqui,
Non-Classical Logics, Model Theory, and Computability, North-Holland,
Amsterdam, 1977, pp. 283-307.