I'm delighted to be here at IQC, an institution devoted to two of my favorite topics, information and computation.

The field I work in is algorithmic information theory, AIT, and in a funny way AIT is precisely the dual of what the IQC is all about, which is quantum information and quantum computing.

Let me compare and contrast the two fields: First of all, I care about bits of software, not qubits, I care about the size of programs, not about compute times; I look at the information content of individual objects, not at ensembles, and my computers are classical, not quantum. You care about what can be known in physical systems and I care about what can be known in pure mathematics. You care about practical applications, and I care about philosophy.

And strangely enough, we both sort of end up in the same place, because God plays dice both in quantum mechanics and in pure math: You need quantum randomness for cryptography, and I find irreducible complexity in the bits of the halting probability Ω.

So my subject will be algorithmic information, not qubits, and I'd like to show you three different applications of the notion of algorithmic information: in physics, in math and in biology. I'll show you three software models, three toy models of what goes on in physics, in math and in biology, in which you get insight by considering the amount of software, the size of programs, the algorithmic information content.

But first of all, I should say that these are definitely toy models, highly simplified models, of what goes on in physics, in math and in biology. In fact, my motto for today is taken from Picasso, who said, ``Art is a lie that helps us see the truth!'' Well, that also applies to theories:

You see, in order to create a mathematical theory in which you can prove neat theorems, you have to concentrate on the essential features of a situation. The models that I will show you are highly simplified toy models. You have to eliminate all the distractions, brush away all the inessential features, so that you can see the ideal case, the ideal situation. I am only interested in the Platonic essence of a situation, so that I can weave it into a beautiful mathematical theory, so that I can lay bare its inner soul. I have no interest in complicated, realistic models of anything, because they do not lead to beautiful theories.

You will have to judge for yourself if my models are oversimplified, and whether or not they help you to understand what is going on in more realistic situations.

So let's start with physics, with what you might call AIT's version of the Leibniz/Weyl model of the scientific method. What is a law of nature, what is a scientific law? The AIT model of this is given in this diagram:

On the left-hand side we have a scientific theory, which is a computer program, a finite binary sequence, a bit string, for calculating exactly your experimental data, perhaps the whole time-evolution of the universe, which in this discrete model is also a bit string.

In other words, in this model a program is a theory for its output. I am not interested in prediction and I am not interested in statistical theories, only in deterministic theories that explain the data perfectly. And I don't care how much work it takes to execute the program, to run the theory, to actually calculate the world from it, as long as the amount of time required to do this is finite. And I assume the world is discrete, not continuous. Remember Picasso. Remember I'm a pure mathematician!

The best theory is the smallest, the most concise program that calculates precisely your data. And a theory is useless unless it is a compression, unless it has a much smaller number of bits than the data it accounts for, the data it explains. Why? Because there is always a theory with the same number of bits, even if the data is completely random.

In fact, this gives you a way to distinguish between situations where there is a law, and lawless situations, ones where there is no theory, no way to understand what is happening.

Amazingly enough, aspects of this approach go back to Leibniz in 1686 and to Weyl in 1932. Let me tell you about this. In fact, it's all here:

Leibniz, 1686: *Discours de métaphysique* V, VI

**
Hermann Weyl, 1932: The Open World
1949: Philosophy of Mathematics & Natural Science
**

So you see, the concepts of law of nature and of complexity are inseparable.
Remember, 1686 was the year before Newton's *Principia*.
What Leibniz discusses in Sections V and VI of his *Discours* is why is
science possible, how can you distinguish a world in which science applies
from one where it doesn't, how can you tell if the world is lawful or not.
According to AIT,
that's only if there's a theory with a much smaller number of bits than the data.

Okay, that's enough physics, let's go on to mathematics. What if you want to prove that you have the best theory, the most concise program for something? What if you try to use mathematical reasoning for that?

What metamathematics studies are so-called *formal axiomatic theories,*
in which there has to be a proof-checking algorithm, and therefore there is
an algorithm for checking all possible proofs and enumerating all the theorems in
your formal axiomatic theory. And that's the model of an axiomatic theory studied
in AIT. AIT concentrates on the size in bits of the algorithm for running through
all the proofs and producing all theorems, a very slow, never-ending computation,
but one that can be done mindlessly. So how many bits of information does it take
to do this? That is all AIT cares about. To AIT a formal axiomatic theory is just a black
box; the inner structure, the details of the theory, are unimportant.

Axioms, Rules of Inference → *COMPUTER* → Theorem_{1},
Theorem_{2}, Theorem_{3}...

**
How many bits of software are you putting into the computer to get the theorems out?
**

So that's how we measure the information content, the complexity of a mathematical theory. The complexity of a formal axiomatic theory is the size in bits of the program that generates all the theorems. An N-bit theory is one in which the program to generate all the theorems is N bits in size.

And my key result is that it takes an N-bit axiomatic theory to enable you to prove that an N-bit
program is *elegant*, that is, that it is the most concise explanation for its output.
More precisely, a program is elegant if no smaller program written in the same language
produces the same output.

How can you prove this metatheorem?

Well, that's very easy. Consider a formal axiomatic mathematical theory T and this paradoxical program P:

In other words, P runs through all proofs in T, checking them all, until it gets to the first proof that a program Q that is larger than P is elegant, then P runs Q and produces Q's output. But if P finds Q and does that, it produces the same output as a provably elegant program Q that is larger than P, which is impossible, because an elegant program is the most concise program that yields the output it does.

Hence P never finds Q, which means that in T you can't prove that Q is elegant if Q is larger in size than P.

So now the key question is this: How many bits are there in P? Well, just a fixed number of
bits more than there are in T. In other words, there is a constant c such that for any
formal axiomatic theory T, if you can prove in T that a program Q is elegant only if
it is actually elegant, then you can prove in T that Q is elegant only if Q's size in bits, |Q|,
is less than or equal to c plus the complexity of T, which is by definition the number of bits of software that it
takes to enumerate all the theorems of your theory T. **Q.E.D.**

**
⇒
``Q is elegant'' ∈ T only if |Q| ≤ |T| + c.
**

That's the first major result in AIT: that it takes an N-bit theory to prove that an N-bit
program is elegant. The second major result is that the bits of the base-two numerical
value of the halting probability Ω are **irreducible**, because
it takes an N-bit theory
to enable you to determine N bits of Ω.

This I won't prove, but
let me remind you how Ω is defined. And of course Ω = Ω_{L} also depends on
your choice of programming language L, which I discussed in my talk on Monday
at the Perimeter Institute:

0 < Ω_{L} = **Σ**_{p halts} 2^{−|p|} < 1 Ω_{L} = .1101100...

**
N-bit theory ⇒ at most N+c bits of Ω
**

By the way, the fact that in general a formal axiomatic theory can't enable you to prove that individual programs Q are elegant, except in finitely many cases, has as an immediate corollary that Turing's halting problem is unsolvable. Because if we had a general method, an algorithm, for deciding if a program will ever halt, then it would be trivial to decide if Q is elegant: You'd just run all the programs that halt that are smaller than Q to see if any of them produce the same output.

And we have also proved in two different ways that the world of pure math is infinitely complex, that no theory of finite complexity can enable you to determine all the elegant programs or all the bits of the halting probability Ω.

Those are my software models of physics and math.
Now for a software model of evolution!
These are some new ideas of mine.
I've just started working on this.
I call it *metabiology*.

Our starting point is the fact that, as Jack Schwartz used to tell me, DNA is just digital software. So we model organisms only as DNA, that is, we consider software organisms. Remember Picasso! We'll mutate these software organisms, have a fitness function, and see what happens!

The basic ideas are summarized here:

Random Walks in Software Space

Evolution of Mutating Software

Organism = Program

★ Single Organism

Fitness Function = Busy Beaver Problem

**
Mutation Distance[A, B] =
−log _{2} probability of mutating from A to B
**

It will take a while to explain all this, and to tell you how far I've been able to get with this model. Basically, I have only two and a half theorems...

Let me start by reminding you that
Darwin was inspired by the analogy between *artificial selection* by animal and plant
breeders and *natural selection* by Nature.
Metabiology exploits the analogy between natural software (DNA) and artificial software
(computer programs):

Next, I'd like to tell you how I came up with this idea. There are two key components.
One I got by reading David Berlinski's polemical book *The Devil's Delusion* which
discusses some of the arguments against Darwinian evolution, and the other is the
Busy Beaver problem, which gives our organisms something challenging to do.
And I should tell you about Neil Shubin's book *Your Inner Fish.*
Let me explain each of these in turn.

Berlinski has an incisive discussion of perplexities with Darwinian evolution. One is the absence of intermediate forms, the other is major transitions in evolution such as that from single-celled to multicellular organisms. But neither of these is a problem if we consider software organisms.

Darwin himself worried about the eye; he thought that partial eyes were useless. In fact, as a biologist explained to me, eye-like organs have evolved independently many different times.

Anyway, we know very well that small changes in the software can produce drastic changes in the output. A one-bit change can destroy a program! So the absence of intermediate forms is not a problem.

How about the transition from unicellular to multicellular? No problem, that's just the idea of a subroutine. You take the main program and make it into a subroutine that you call many times. Or you fork it and run it simultaneously in many parallel threads.

And Berlinski discusses the neutral theory of evolution and talks about evolution as a random walk.

So that was my starting point.

But to make my software organisms evolve, I need to give them something challenging to do. The Busy Beaver problem to the rescue! That's the problem of finding small, concise names for extremely large positive integers. A program names a positive integer by calculating it and then halting.

And the BB problem can utilize an unlimited amount of mathematical creativity, because it's equivalent to Turing's halting problem, since another way to define BB of N is as follows:

These two definitions of BB(N) are essentially equivalent.

[On naming large numbers, see Archimedes' *The Sand Reckoner,*
described in Gamow, *One, Two, Three... Infinity.* I thank Ilias Kotsireas for
reminding me of this early work on the BB problem.]

The key point is that BB(N) grows faster than any computable function of N, because otherwise there would be an algorithm for solving the halting problem, which earlier in this lecture we showed is impossible using an information-theoretic argument.

At the level of abstraction I am working with in this model, there is no essential difference between mathematical creativity and biological creativity.

Now let's turn to Neil Shubin's book *Your Inner Fish,* that he summarized in
his article in
the January 2009 special issue of *Scientific American* devoted to Darwin's
bicentennial.

I don't know much about biology, but I do have a lot of experience with software. Besides my theoretical work, during my career at IBM I worked on large software projects, compilers, operating systems, that kind of stuff. You can't start a large software project over from scratch, you just patch it and add new function as best you can.

And as Shubin spells it out, that's exactly what Nature does too. Think of yourself as an extremely large piece of software that has been patched and modified for more than a billion years to adapt us to new ecological niches. We were not designed from scratch to be human beings, to be bipedal primates. Some of the design is like that of fish, some is like that of an amphibian.

If you could start over, you could design human beings much better. But you can't start over. You have to make do with what you have as best you can. Evolution makes the minimum changes needed to adapt to a changing environment.

As François Jacob used to emphasize, Nature is a cobbler, a handyman,
a *bricoleur*. We are all random walks in program space!

So those are the main ideas, and now let me present my model. I have a single software organism and I try making random mutations. These could be high-level language mutations (copy a subroutine with a change), or low-level mutations.

Initially, I have chosen point mutations: insert, delete or change one or more bits. As the number of bits that are mutated increases, the probability of the mutation drops off exponentially. Also I favor the beginning of the program. The closer to the beginning a bit is, the more likely it is to change. Again, that drops off exponentially.

So I try a random mutation and I see what is the fitness of the resulting organism. I am only interested in programs that calculate a single positive integer then halt, and the bigger the integer the fitter the program. So if the mutated organism is more fit, it becomes my current organism. Otherwise I keep my current organism and continue trying mutations.

By the way, to actually do this you would need to have an oracle for the halting problem, since you have to skip mutations that give you a program that never halts.

There is a non-zero probability that a single mutation will take us from an organism to any other, which ensures we will not get stuck at a local maximum.

For the details, you can see my article in the February 2009 *EATCS Bulletin,* the
*Bulletin of the European Association for Theoretical Computer Science.*
I don't think that the details matter too much. There are a lot
of parameters that you can vary in this model and probably still get things to work. For example, you can change the programming
language or you can change the mutation model.

As a matter of fact, the programming language I've used is one of the
universal Turing machines in AIT_{1960} that I discussed in my
Monday lecture at Perimeter.
I picked it not because it is the right choice, but because it is a programming language
I know well.

Okay, we have a random walk in software space with increasing fitness. How well will this do? I'm not sure but I'll tell you what I can prove.

First of all, the mutation model is set up in such a way that in time of the
order of 2^{N}
a single mutation will try adding a self-contained N-bit prefix that calculates BB(N)
and ignores the rest of the program;
the time is the number of mutations that have been considered.
The size of the organism in bits will grow at most
as 2^{N}, the fitness will grow as BB(N). So the fitness will grow faster than any
computable function, which shows that biological creativity is taking place;
for if
an organism is improved mechanically via an algorithm and without any creativity,
then its fitness will only increase as a computable function.

With high probability

fitness[time order of 2

which grows faster than any computable function of N.

We can prove that evolution will occur but our proof is not very interesting since the time involved is large enough for evolution to randomly try all the possibilities. And, most important of all, in this proof evolution is not cumulative. In effect, we are starting from scratch each time.

Now, I think that the behavior of this model **will actually be** cumulative
but I can't prove it.

However, I can show that there is what might be called *an ideal evolutionary pathway,*
which is a sequence of organisms having fitness that grows as BB(N)
and size in bits that grows as N, and the mutation distance between
successive organisms is bounded.

That is encouraging. It shows that there are intermediate forms that are fitter and fitter. The only problem is that this pathway is unstable and not a likely random walk; it's an ideal evolutionary pathway.

This is my second theorem, and the organisms are reversed initial segments of the bits of the halting probability Ω (plus a fixed prefix). If you are given an initial portion of Ω, K bits in fact, then you can find which ≤ K bit programs halt and see which one produces the largest positive integer, which is by definition BB(K). Furthermore the mutation distance between reversed initial segments of Ω is not large because you are only adding one bit at a time.

There is a sequence of organisms O

O

fitness[O

mutation-distance[O

(If we could shield the prefix from mutations, and we picked as successor to each organism the fittest organism within a certain fixed mutation distance neighborhood, then this ideal evolutionary pathway would be followed.)

These are my two theorems. The half-theorem is the fact that a sequence of software organisms with increasing fitness and bounded mutation distance does not depend on the choice of universal Turing machine, because adding a fixed prefix to each organism keeps the mutation distance bounded.

That a sequence of organisms O

mutation-distance[O

does not depend on the choice of universal Turing machine.

Okay, so at this point there are only these 2 ½ theorems. Not very impressive. As I said, at this time metabiology is a field with a lovely name but not much content.

However, I am hopeful. I feel that some kind of evolving software model should work. There are a lot of parameters to vary, a lot of knobs to tweak. The question is, how biological will the behavior of these models be? In particular I would like to know if hierarchical structure will emerge.

The human genome is a very large piece of software: about a gigabyte of DNA.

Large computer programs must be structured, they cannot be spaghetti code; otherwise they cannot be debugged and maintained. Software organisms, I suspect, can also benefit from such discipline. Then a useful mutation is likely to be small and localized, rather than involve many coordinated changes scattered throughout the organism, which is much less likely.

Software engineering practice has a lot of experience with large software projects, which may also be relevant to randomly evolving software organisms, and perhaps indirectly to biology.

Clearly, there is a lot of work to be done.

As Dobzhansky said, nothing in biology makes sense except in the light of evolution, and I think that a randomly evolving software approach can give us some insight.

Thank you very much!

[27 Dec 2009]