[[[Former Title: Mathematics and Biology]]]
Mathematics, Biology and Metabiology
G. J. Chaitin, IBM Research
In Memoriam Jacob T. "Jack" Schwartz (1930-2009)
[Lecture given at the Facultad de Ciencias Exactas of the University of Buenos Aires, 14 May 2009,
at the IBM
T. J. Watson Research Center,
Yorktown Heights, 5 June 2009,
and at the Frege Centre for Structural Sciences of the University of Jena, 20 June 2009.
The author thanks Veronica Becher, Pablo Jacovkis, Gustavo Stolovitzky, Olaf Dreyer,
David B. Searls,
and other members of the Buenos Aires, IBM and Jena audiences
for their helpful comments.]
It would be nice to have a mathematical understanding of basic biological concepts
and to be able to prove that life must evolve in very general circumstances.
At present we are far from being able to do this.
But I'll discuss some partial steps in this direction
plus what I regard as a possible future line of attack.
Introduction: Goals of Our Theory
The Nature of Biological Information
- Algorithmic information theory has something to do with biology.
This theory is based on looking at the size in bits of the smallest
program to calculate something.
This is like DNA, which is a kind of program for constructing and running an organism.
Program-size complexity is a better model for biological information than Shannon entropy
even though it ignores run-time
and 9 months is already a long pregnancy let alone 90 years.
It is more or less the case that more complicated organisms have more DNA.
Each base in DNA = two bits of software.
Human = 3 × 109 bases = 6 × 109 bits of software.
- Biology is not predictive: these bits of information are frozen accidents.
- Physics is based on simple, elegant equations:
Theory of Everything on a T-shirt!
- Biology is the domain of the complicated:
No simple equation for your spouse or an ecosystem.
- Pure math is more like physics than like biology, right?
- Pure math contains infinite irreducible complexity:
the bits of the halting probability Ω.
- (Ω is like the DNA for pure math, because
it gives in
compressed form the answer to all individual cases of
DNA as Digital Software (Jack!), Software Organisms
When I visited Jack at the Cold Spring Harbor Laboratory where he was working
on molecular biology,
he emphasized to me that I should just think of DNA
as digital software.
Let's take this idea and run with it: How about modeling organisms as
How about studying random walks in software space?
Two important places where thinking of organisms as digital software is
- Missing Intermediate Forms:
Small changes in software produce large changes in output!
(Amusing example: Humans and Chimps have almost the same DNA.)
- Major Transitions in Evolution such as from
Single-Celled Organisms to Multi-Cellular Organisms:
Just make the main program into a subroutine!
- Population is a single software organism that computes a positive integer.
In order for them to evolve, we need to give our organisms something challenging to do.
- Fitness of the organism is the size of the integer it computes: the bigger the better.
This is a version of the Busy Beaver Problem and is equivalent to Turing's Halting Problem.
These are problems requiring an unlimited amount of mathematical creativity.
There is no mechanical procedure for solving them.
- We make a random mutation on our current organism and throw the result away
unless it computes a bigger positive integer.
- If it computes a bigger integer, the mutated organism replaces our current organism.
We are doing hill-climbing in a biological way, not by exhaustive search.
N → N + N → N × N → NN → NNN.
f(N) → f(N+1) → f2(N+1) → fN(N+1).
(With a specific function f and integer N.)
To avoid getting stuck on a local maximum, there must be a nonzero
probability to go from any program to any program.
But most likely only a single bit is changed, deleted or inserted.
This random walk must be ergodic, that is, cover the entire space of possible programs.
We will tend to evolve concise algorithmic descriptions of extremely large numbers.
- The fitness = size of the integer will grow faster than any computable function of the time
(the time = the total number of mutations that have been considered),
and will also grow faster than any computable function of the size of the program.
- This shows that creativity is taking place.
- If there were no creativity, if changes were made mechanically, the fitness would be a computable function of the time and size.
- More precisely, in time growing exponentially in N the fitness
the Busy Beaver function of N bits, BB(N bits),
the biggest positive integer that can be named by a ≤ N-bit program,
and which grows super-exponentially.
Problem with this proof that evolution works:
It shows that this process is creative, but it doesn't show that it's cumulative.
The proof shows that evolution will progress, if necessary, by starting over
again and again.
I want to show that this very same model will work cumulatively,
not by starting over again and again.
And that large positive integers will be named using a recursive hierarchy
of fast-growing function definitions.
I can't do this yet, but I think I can see how it might be done. [See the analogy below]
This could perhaps enable us to improve our estimate of the rate of evolution.
What Next? The Following Analogy U is to V as X is to Y:
- Darwin's Theory of Evolution of Living Organisms:
- Artificial Selection by Animal and Plant Breeders versus
- Natural Selection, Survival of the Fittest.
- Proposed Theory of Evolution of Software Organisms:
- Evolution of Large Man-Made Software Projects, Maintainability versus
- Evolvability of Large Randomly-Mutating Software Organisms.
- Desirable Traits for Maintainability and Evolvability of Software:
Orthogonal Design, Levels of Abstraction, Encapsulation.
- For both maintainability of man-made software and evolvability of mutating software;
it's important that a change needs to be made in only one limited location.
- Otherwise, man-made software is unmaintainable, and
evolving software will require simultaneous nonlocal coordinated
which is extremely unlikely.
- Evolution of Fortran Language:
Language features never discarded — upward compatibility.
Language contains its entire history — too expensive to start over.
Ontogeny recapitulates phylogeny.
- Development of a compiler for a new programming language:
I worked on such a project for IBM. Badly written code dies because
incomprehensible and it is no longer possible to fix bugs or to add new function.
Summary: What Does this Analogy Buy Us?
- Ontogeny recapitulates phylogeny —
Your Inner Fish!
[Ernst Haeckel, Jena]
The development of the embryo recapitulates the evolutionary history of the organism.
And as Fortran illustrates, large software projects contain their history.
- Hierarchical structure of living organisms = software levels of abstraction.
Makes software easier to write,
because programmer needs to
keep less information
in his/her head at any given moment.
Makes software evolve better when subjected to random mutations,
because useful mutations can be
small and localized,
instead of having to touch many places in the code.
- I think this discussion makes a convincing case that
a theory of the evolution
of randomly mutating software is possible,
and that random walks in software space are worth studying.
- How biologically relevant such a theory may be remains to be seen.
- But that it will be an interesting new field of mathematics is now plausible.
However, carrying out this research will require a great deal of work.
Maybe you would like to work on these problems?
- I propose calling this new field metabiology:
I hope that it will eventually develop into
a field parallel to biology, dealing with the random evolution of artificial software (computer programs)
rather than natural software (DNA), and simple enough that it is possible to prove rigorous theorems
or formulate heuristic arguments at the same high level of precision that is common in theoretical physics.
- Whether or not this happens, as the concepts of computation, information and complexity show,
mathematics is moving in a biological direction.
This trend will continue.
[Our random walk model was inspired by the stimulating critique
of Darwinian evolution in
The Devil's Delusion,
Crown Forum, New York, 2008. See especially pp. 192-195.
In a nutshell, model = Jack (digital software) + Berlinski (random walk) + Busy Beaver Problem.
Our model is an attempt to answer Berlinski's criticisms.]
[19 July 2009]