[[[Former Title: Mathematical Philosophy — Philosophy Done Mathematically]]]
Metaphysics, Metamathematics and Metabiology
A mathematical analysis of the scientific method, the axiomatic method,
and Darwin's theory of evolution
G. J. Chaitin, IBM Research
[gjchaitin_at_gmail.com]
In Memoriam Jacob T. "Jack" Schwartz (19302009)
[Outline of a course given August 2009 in the Programa de História das Ciências e das
Técnicas e Epistemologia (HCTE) at the Universidade Federal do Rio de Janeiro (UFRJ),
to be published as a book in Spanish by Midas (Valparaíso).
This was a threeweek course, with two twohour lectures each week.
There are links to handouts and also additional reading given for each topic.]
Sans les mathématiques on ne pénètre point au fond de la
philosophie.
Sans la philosophie on ne pénètre point au fond des
mathématiques.
Sans les deux on ne pénètre au fond de rien.
— Leibniz
[Without mathematics we cannot penetrate deeply into philosophy.
Without philosophy we cannot penetrate deeply into mathematics.
Without both we cannot penetrate deeply into anything.]

First Week: Metaphysics
Lecture 1 — Epistemology: What is a Scientific Theory?

Leibniz's Analysis of the Scientific Method
Discours de métaphysique, 1686, sections V and VI:

Continuous model: experimental data consists of real numbers.

Theory is an equation passing through the data points.

There is always such an equation!

Is a good theory only if the equation is simple, not if is very complicated.

Hermann Weyl's 1932 formulation of this:
If arbitrarily complicated laws are
permitted, then there is always a law, and the concept of law becomes vacuous!

So the concept of a scientific law subsumes the concept of complexity.
But as Weyl points out it is not clear how to measure the complexity of an equation!

The corresponding Algorithmic Information Theory (AIT) version:
[Solomonoff, Kolmogorov, Chaitin, 1960s]

Discrete model: theory and experimental data are now both finite strings of bits..

Theory is a program for calculating the observations and must be a compression.
In other words, the theory must have fewer bits than the data it explains.

Smallest ("elegant") program is best theory.
[Second week we will show that we cannot be sure we have the best theory.]

Measure complexity of theory/program in bits of software.

DNA/Software and Organism/Hardware:

DNA is software for calculating the organism.

Count bases instead of bits!

This gives us a very approximate way to measure the complexity of organisms,
in terms of the amount of information in their DNA.

[Important to get biology into this course right away since the last week is on biology.]

Miracles and Induction/Prediction

Simpler world is more perfect; need for miracles indicates imperfections.

Simplest theory is best.

What to expect after a billion repetitions of 0?
Or after a trillion repetitions of 01?
Suddenly changing the pattern is more complicated.

Does Complexity Depend on Your Choice of Language?

Goodman: Green/Blue, Grue paradox

The Search for the Perfect Language

Optimal, Concise, Succinct Universal Turing Machines: U(π_{C} p) = C(p)

Practical applications: Machine learning, Bayesian/nonparametric statistics
Lecture 2 — Ontology: The Labyrinth of the Continuum

Is the Universe Discrete? Quantum Gravity, Thermodynamics of Black Holes, and the Holographic Principle (Hawking, Bekenstein, Susskind, t'Hooft) suggest that a physical system can
contain at most a finite number of bits of information, which in fact grows
as the surface area of the boundary of the system, not as the volume of the physical system.
[So if this is correct, then threedimensional physical systems are in a sense actually
twodimensional, whence the term "holographic" principle.]

AIT's discrete model of scientific theories works better if the universe is discrete.

In addition to physical arguments against continuity and in favor of discreteness,
there are also mathematical arguments, basically because real numbers contain
an infinite amount of information.
These mathematical arguments may be regarded as modern versions of Zeno's paradoxes
(the paradox that Achilles cannot catch the tortoise, and of the immobile arrow in flight).

[Hermann Weyl, who was the first to discover the importance of Leibniz's ideas on complexity in the Discours de métaphysique,
in the same 1932 book where he presents this, The Open World, gives a
modern Zeno argument. If you really believe that time is infinitely divisible, then
you can perform an infinite number of computations by performing the first step of the computation in 1/2
a second, the second step in 1/4 of a second, the third step in
1/8 of a second, and so forth and so on, so that an infinite number of steps are completed in exactly 1 second! And nobody thinks this is possible. I began studying Leibniz after reading Weyl's The Open World.]

Three [Four!] Proofs
of the Existence of Uncomputable Real Numbers:
 Borel's 1927 knowitall, oracle real:
You need to number all possible Yes/No questions and then you use the Nth digit of a real number
to answer the Nth question.
[Give Borel's simple original version in which each digit answers one question, and also
the more complicated rigorous version in which the Nth digit tells us about the Nth string of symbols in the English alphabet (as
given in my Conversations with a Mathematician).
In the rigorous version of Borel's number,
the Nth digit tells us if the Nth text is a syntactically valid English text, then if
it is a Yes/No question, then if it has an answer ("Is the answer to this question No?" has
no Yes/No answer), then whether the answer is Yes or No. This avoids the problem
with Borel's original formulation of having to be able to enumerate all possible Yes/No questions
(some of which don't even have an answer). It is much easier to enumerate all the
possible finite texts that can be written using a finite set of characters.]
 Turing's 1936 proof:
(Cantor's diagonal method applied to the list of all computable reals)
By construction, the
Nth digit of Turing's uncomputable real is different from the Nth digit of the Nth computable real.
[Borel's 1927 oracle real is much more uncomputable than the uncomputable real
exhibited by Turing in 1936 using Cantor's diagonal method! But Borel does not
give a formal definition of what a computable real number is. That was first done by
Turing in 1936. Borel and Turing unfortunately seemed unaware of each other's work.]
 Probabilistic Proof:
(Proof in the spirit of nondenumerability of the continuum argument
in Courant and Robbins, What is Mathematics?)
Cover the Nth computable real with an interval of size ε/2^{N},
giving a covering of all computable reals with total size ≤ ε × (1/2 +
1/4 + 1/8 + ...) = ε,
which can be made as small as desired by taking ε sufficiently small.
Reals are uncomputable with probability one!
Most real numbers are unreal!
[Indeed, Borel, Les nombres inaccesibles, 1952, points out that with probability one,
a real cannot even be named uniquely (let alone computed).]
 Fourth Proof:
[We will encounter one more uncomputable real number next week, namely the halting probability Ω. It is maximally uncomputable, maximally unknowable.]
Additional Reading:

Nadler, The Best of All Possible Worlds: A Story of Philosophers, God and Evil,
pp. 132133
 Smolin, Three Roads to Quantum Gravity

Chaitin, Thinking about Gödel and Turing: Essays on Complexity, 19702007
[Contains the first three handouts and much more besides.]
Second Week: Metamathematics
Lecture 3 — Limitations of Formal Axiomatic Theories
 Hilbert on formal axiomatic theories and his belief in a TOE (Theory of Everything) for pure mathematics:
Mechanical procedure for checking proofs.
Absolute certainty! Truth is objective, not subjective.
 Important Note:
Running through all possible proofs and checking which ones are correct gives you a mechanical procedure for obtaining all
the theorems in your theory!
So to determine the truth of any assertion A, just calculate until you find a proof of A or
a proof that not A!
This goes back to the medieval search for the perfect language giving access to absolute
truth (and magical powers!) recounted in Umberto Eco's book The Search for the Perfect Language, including Ramon Llull's mechanical combinations of
concepts and Leibniz's characteristica universalis in which an error of reasoning is
an error in calculation.

Let's measure the complexity of a formal axiomatic theory by the size in bits of the
smallest program for generating all the theorems. An Nbit theory is one that requires
an Nbit program to generate all its theorems.
 You can't prove that a program is elegant!
More precisely,
you need an Nbit theory in order to be able to prove that an Nbit program is elegant.
Therefore a formal axiomatic theory can only prove that finitely many programs are
elegant.
[An elegant program is the best theory for its output. More precisely, a program is elegant if no smaller program
written in the same language produces the same output. For every possible output there is at least one
elegant program, therefore there are infinitely many elegant programs.]

Proof: Consider the paradoxical program P that produces the same output as the
first provably elegant program Q with the property that Q is larger in size, has more bits
than P itself does. Q cannot exist, for otherwise we get a contradiction.
 First corollary: The unsolvability of the Halting Problem.
[Question of whether a selfcontained, inputless program will calculate forever
or will eventually stop.]
 Second corollary: The Busy Beaver function grows faster than any computable function.
[Also give direct proof of this. We will use the BB function next week.]
[The BB function of N is the largest positive integer you can name using a program having
N bits or less.]
 Philosophical and practical consequences: No TOE (Theory of Everything) for pure
mathematics, No absolute truth, No certainty, A "quasiempirical" view of math [Lakatos; heuristic proofs; math is not equal to physics, but perhaps it is closer to physics than most people think], Experimental mathematics.

[Despite the fact that there can be no TOE for pure mathematics as Hilbert hoped,
mathematicians remain enamored with formal proof. See the
special issue on formal
proof of the AMS Notices, December 2008.]

[Another ironical fact: Although there is no "perfect language" for mathematical
reasoning (even though ZermeloFraenkel set theory is very popular), there are
universal languages for computing, for calculating. Instead of formal systems enabling us to prove all truths, we have found formal languages for expressing all possible algorithms. This is where Hilbert's formalist
dream has been most successful, not where he expected that it would be.
It became computer technology instead of providing mathematics with absolute
certainty. Indeed, I would argue that the best, most perfect programming languages
are the concise, optimal universal Turing machines used in AIT, because the
fact that U(π_{C} p) = C(p) means that these languages are
maximally expressive.
Furthermore, viewed from the perspective of the Middle Ages, programming languages do
give us the Godlike power to breathe life into (some) inanimate matter:
hardware ≈ body, software ≈ soul!]
Lecture 4 — The Halting Probability Ω

Go from Borel's 1927 knowitall real, to a real whose Nth bit solves the halting problem
for the Nth computer program (which is an extremely redundant oracle for the halting
problem), to the halting probability Ω (which is an extremely concise oracle for
the halting problem).
Key idea: Given N programs, can tell which ones halt if know how many of them halt, and this
is about log_{2} N bits of information, not N bits of information.
As before, our universal Turing machine must have suitably concise, succinct programs:
U(π_{C} p) = C(p).
Furthermore, now programs must be selfdelimiting.
In other words, programs are halves, quarters, eighths, etc. of the unit interval, so the total
measure assigned to all programs is ≤ 1 and we can talk about the probability that
programs do something.

If you could know the first N bits of the numerical value of Ω, that
would enable you to solve the halting problem for all programs up to N bits in size.

Ω is not just uncomputable, it is maximally uncomputable, completely incompressible.

Ω shows that there is
infinite irreducible complexity in pure mathematics.

You need an Nbit theory in order to be able to determine N bits of Ω.

The bits of Ω are necessary truths, but they appear to be contingent, accidental truths,
they appear to be true for no reason, no reason simpler than themselves.
Thus they refute Leibniz's principle of sufficient reason, which states that anything that's true
is true for a reason.

Ω can be viewed pessimistically, as a limit to knowledge, but it can also be
viewed optimistically, as a proof that mathematics is not mechanical and requires
creativity. Indeed, in a sense
Ω is concentrated, distilled, compressed mathematical creativity. And we can measure the power of a formal axiomatic theory
by how many bits of Ω it can determine.
[A similar measure of the power of a formal
axiomatic theory: the size in bits of the smallest (selfcontained, inputless) program for which it can neither be proved that it halts nor that it fails to halt.
A formal axiomatic theory can easily
verify that every program that halts does so. And it can also prove in infinitely many cases
that programs will never halt. But there will always be infinitely many programs that fail
to halt for which our formal axiomatic theory cannot enable us to prove this.]

Pure Mathematics versus Physics & Biology:
Math has infinite complexity and in this precise sense is therefore closer to biology than to physics, where there is still hope of a TOE, a simple set of equations for everything.
[Even though in the previous lecture we suggested that perhaps math should be
done a little bit more like we do theoretical physics.]
[This leads us right into biology and Week 3, the attempt to give a mathematical proof that
Darwin's theory of evolution works, that the complexity of organisms will increase.]
Additional Reading:

Tymoczko, New Directions in the Philosophy of Mathematics
[A defense of quasiempiricism.]

Borwein and Devlin, The Computer as Crucible: An Introduction to Experimental Mathematics

Wolfram, A New Kind of Science
[A massive piece of experimental mathematics: the systematic exploration of
the computational universe.]

Chaitin, Meta Math! The Quest for Ω
[A general reference for the first two weeks of this course, with more on the history of Ω.]
Also in Portuguese:

Eco, The Search for the Perfect Language
[To be read together with Meta Math!:
The prehistory of the quest for absolute knowledge.]
Also in Portuguese:
Third Week: Metabiology
Lecture 5 — Problems with Darwinism that may be solved by a Digital Software Organism Toy Model

Mathematics is very helpful in many areas of physics, but not too helpful in biology.
Biology is terra incognita as far as mathematics is concerned.
Can this be fixed?

Introduction: Summarize material on biology in Lectures 1 and 4.
[DNA as digital software, measuring biological information, proof that math is closer to biology than to physics]

Our methodology: Metabiology is a very theoretical theory!
[So was our analysis of the scientific method. But our analysis of the axiomatic method
was fairly realistic.]

Our goal: To prove mathematically that Darwinian evolution works, that organisms will increase in complexity, by studying an
enormously simplified toy model.
[Picasso — theories are lies that help us to see the truth!]

[We are most definitely not trying to perform detailed simulations of actual biological systems
so that we can predict their behavior as in systems biology.]

Main idea: To exploit the analogy between natural (DNA) and artificial (computer program) software.
This seems
helpful for dealing with the following Darwinian perplexities:

Missing Intermediate Forms — No problem, small change in program can produce large change in output.

The Major Transitions in Evolution — No problem, going from singlecell organisms to multicellular
organisms is just the idea of making the main program into a subroutine.
Furthermore, note the similar
hierarchical structure of living organisms and of large pieces of humanmade software.

Increase in Complexity — Could this be a kind of software bloat? Is it related to the fact that ontogeny (more or less) recapitulates phylogeny?
In large software projects it is easier to add new stuff than to change old stuff.
Large natural (DNA) and artificial (computer program) pieces of software contain their own history!

Emergence of Concise Encodings — Why is there DNA? This leads to greater evolvability if we are doing point
(bitlevel) mutations.
Lecture 6 — Evolution of Randomly Mutating Software, Random Walks in
Software Space

Population of one organism, which is a program that calculates a positive integer

Try mutations at random, use oracle for halting problem to avoid ones that never halt,
pick organism that produces the largest positive integer (BB problem!)

I can prove that the functional complexity will increase indefinitely.
This is easy to do, too easy, in fact.
On the positive side, the useful complexity increases; this is not just software bloat.
We are not just getting the complexity to increase by adding junk to the program.
But I do not know
how to prove that ontogeny recapitulates phylogeny, as it were, even though I conjecture this is
usually the case. I think we need to do that in order for this approach to have any biological relevance or significance. In my proof that the functional complexity will increase,
evolution just starts over, it is not cumulative, which does not seem at all biological.

Behavior of model depends on choice of programming language for digital organisms,
whether mutations are at a highlevel or machine language bit level, and on the
probability distribution of the possible mutations.

Some sort of model like this should work. There is much to be done!
I think this randomly mutating software approach does make Darwinian evolution
more understandable.

I envision metabiology as a field parallel to biology, dealing with the random
evolution of artificial software (computer programs) rather than
natural software (DNA), and simple enough that it is possible to prove rigorous
theorems or formulate heuristic arguments at the same high level of precision that is
common in theoretical physics.
Additional Reading:

Berlinski, The Devil's Delusion: Atheism and its Scientific Pretensions
[An incisive discussion of gaps in Darwin's theory.]

Shubin, Your Inner Fish: A Journey into the 3.5BillionYear History of the Human Body
[How ontogeny recapitulates phylogeny.]
[Additional Things to Think About]
Can compare evolvability of two programming languages A, B with the same source
of mutations using largest positive integer fitness function:
Language A has greater evolvability than language B if with high probability
random mutations make positive integer that is calculated grow at faster rate
when using language A instead of language B.
A key point in evolutionary theory is the existence of paths leading from low
complexity to high complexity organisms passing through a series of small evolutionary
steps each of which is slightly advantageous. In other words, there must be
a chain of plausible intermediate forms.
Example of an interesting path in software space:
Organisms O_{N} consist of a prefix followed by the first N bits of Ω (N = 1, 2, 3, ...) [reversed].
Prefix sees how long it takes the computable lower bounds on Ω to get
the first N bits right. This is the smallest K such that (the sum of 1/2 raised to the size in bits of
each program ≤ K bits in size that halts in ≤ K seconds) reproduces the N bits of Ω that we are given. These organisms O_{N} are a very small mutation distance from
each other [md(O_{N}, O_{N+1}) is small], and the integer each one calculates grows faster than any computable
function of N. [Here we are using the programming language and mutation model in Handout #6.]
[[[Here is a better version of this. As before, organism O_{N} consists of a prefix
followed by the first N bits of Ω (N = 1, 2, 3, ...) [reversed].
This time the prefix π uses these bits of Ω in the standard way to determine each
selfdelimiting program p ≤ N that halts, and then the greatest positive integer K
with programsize complexity H(K) ≤ N,
which grows faster than any computable function of N.
If we can protect this prefix π from mutations, there is fixedsize mutationdistance neighborhood
{X: md(X, O_{N}) ≤ C}
of each of these organisms in which the fittest organism will be another organism in this sequence
O_{N+C'} that
calculates a bigger positive integer. Assuming we have an oracle for the halting problem,
this gives us a deterministic evolutionary algorithm instead of a random walk model as in Lecture 6 above.
In effect, we are calculating Ω bit by bit.
This alternative model exhibits historicity,
i.e., organisms contain their evolutionary history. But it is unsatisfactory from a biological
point of view because there are no random mutations and because our software organisms do not exhibit the deeplynested
hierarchical structure that is such a conspicuous feature of biological organisms.
Whether organisms can shield part of their DNA from mutations is also a relevant issue,
although allowing this as an option is certainly possible when we are dealing with
artificial software organisms. If DNA can be shielded when we have random mutations,
then we cannot expect to prove that evolution occurs with probability one, only that it
occurs with positive probability. [The random walk model in Lecture 6 evolves with probability one.]
Perhaps this example shows that evolution is too easy if we have an oracle for the halting problem?
How about a version of my randomwalk model for evolving programs which name large positive integers
that works without an oracle?
Maybe then we will get the desired deeplynested hierarchical structure?
Is there any connection with primitive recursive functions and subrecursive hierarchies?
With work by Rózsa Péter?
Must think about this.]]]
Call an infinite sequence of software organisms O_{K} (K = 1, 2, 3, ...)
an evolutionary history if the
fitness function (positive integer that is output) grows and the mutation distance
between successive organisms md(O_{K}, O_{K+1}) is bounded.
Note that the optimal universal Turing machines used in AIT have the property
that evolutionary histories are preserved when a fixed prefix π_{C} is
added at the beginning of each organism O_{K} in the evolutionary history,
because outputs are the same and mutation distances are still bounded.
More precisely, a universal Turing machine U has the most evolutionary histories,
because an evolutionary
history for any Turing machine C will also be an evolutionary history for U.
And any two universal Turing machines U and U' will have exactly the same set of evolutionary histories.
[Again, here we are using the programming language and mutation model in Handout #6.]
[20 August 2009]