Evolutionary increases in information

By Wesley R. Elsberry

A recent development in anti-evolutionary argumentation is the contention that there are no possible ways in which "information" can increase via evolutionary processes. The usual phrasing is something like, "Show me an example of an increase in information in the genetics of any organism."

Usually, those asking the question do not provide the parameters for making a response. They do not say what definition of "information" they might find acceptable, and there are several possible meanings.

Fortunately, though, we can provide examples of "information increase" under several meanings of "information". Some are provided below. The onus is now on those claiming that no such increases are possible to say specifically what meaning of "information" they are using, and especially why that meaning should be considered useful in analyzing the phenomena of interest.

Information as a Shannon-Weaver measure.

Information theory as a scientific discipline got its start with Claude E. Shannon's seminal 1948 paper in the Bell System Technical Journal. Shannon's insight was to separate "information" from "meaning" and to provide a quantifiable measure of information based upon the frequencies of symbols in a "message". This made the concept useful in communications, coding theory, and data analysis across many scientific disciplines. In other words, Shannon's conception of information is both widespread in its usage and rigorous in its formulation.

Shannon's measure. Shannon's equation gives the "entropy" of a message or ensemble of messages. Shannon adopted the term "entropy" on the advice of John von Neumann, who noted that since no one really knew what entropy was, Shannon could deploy the term without fear of criticism. The Shannon measure does share its logarithmic form with the thermodynamic equation for entropy, so there is at least one point of correspondence to the term as used in physics and chemistry.

H = -k sum_i (p_i * log_2(p_i))

H is the "negentropy", our relevant information measure. This measure yields the expected amount of information per symbol of a message from the analyzed ensemble.

k is a positive parameter that may change depending upon the context of the measurement. k will be constant within a series of measurements of the same system.

The logarithm is base 2, as information's basic unit is the bit (binary digit), and thus H is measured in units of "bits per symbol".

p_i is the probability of the ith symbol appearing within a message.

Interesting features of the Shannon information measure: H reaches its minimum of 0 for a message containing only one distinct symbol, no matter how many times that symbol is repeated. For a fixed number of distinct symbols, H reaches its maximum when every symbol occurs with equal frequency. And while H itself is a per-symbol rate, the total information in a message (H multiplied by the message length) increases with increasing length, given that more than one symbol is present.
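These properties are easy to verify directly. Below is a minimal Python sketch (the function name shannon_entropy and the sample strings are ours, invented for illustration) that computes H from the equation above and checks the minimum, maximum, and total-information behavior just described.

    import math
    from collections import Counter

    def shannon_entropy(message, k=1.0):
        # H = -k * sum_i( p_i * log_2(p_i) ), in bits per symbol when k = 1
        counts = Counter(message)
        n = len(message)
        return -k * sum((c / n) * math.log2(c / n) for c in counts.values())

    print(shannon_entropy("AAAAAAAA"))      # 0.0 -- one symbol: minimum H
    print(shannon_entropy("ACGTACGT"))      # 2.0 -- four equally frequent symbols: maximum H for that alphabet
    msg = "ACGTTGCA"
    print(shannon_entropy(msg) * len(msg))  # 16.0 -- total information grows with message length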

Tetraploidy in orchids as an increase in Shannon information. Orchids sometimes produce offspring whose genomes are doubled relative to the parent plant's. The resulting condition, with four chromosome sets rather than two, is called tetraploidy. Tetraploid offspring are reproductively isolated from their diploid parent species. The value of H, our measure of bits per symbol, does not change between the diploid parent species and the tetraploid daughter species, but the number of symbols has doubled, and with it the total information. The information content of the tetraploid daughter species is thus strictly larger than that of the diploid parent species.
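Continuing the sketch above (reusing the hypothetical shannon_entropy function), we can mimic the tetraploidy case by doubling a stand-in "genome" string: the per-symbol measure H is unchanged, while the total information exactly doubles.

    diploid = "ACGGTCAATTGCCATG"   # hypothetical stand-in for a genome
    tetraploid = diploid * 2       # genome doubling, as in tetraploidy

    h_dip = shannon_entropy(diploid)
    h_tet = shannon_entropy(tetraploid)
    print(h_dip == h_tet)           # True: bits per symbol unchanged
    print(h_dip * len(diploid))     # total bits in the diploid genome
    print(h_tet * len(tetraploid))  # exactly double: information has increased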

Algorithmic Information measures.

Another formal approach to measuring information is due to Kolmogorov and Chaitin. Known as "algorithmic information", the basic concept is that the information content of a message is the length of the minimal Turing machine program which can produce that message. By the invariance theorem, program lengths in commonly implemented computer languages differ from their Turing machine equivalents by at most an additive constant.

One difficulty in applying the Kolmogorov-Chaitin algorithmic complexity measure is that it is uncomputable: one cannot know with certainty that any proposed Turing machine program or equivalent is actually the minimal-length program. Consider two streams of digits, both of which pass basic tests for randomness. The first stream comes from a genuinely random source, such as a scintillation counter measuring radioactive decay; the second comes from a pseudo-random number generator (PRNG). The first has high algorithmic complexity, but the second's algorithmic complexity is only as large as the length of the Turing machine program that instantiates the PRNG. If we are unaware that a PRNG is involved, we will overestimate the algorithmic information in the second case. This problem is general when attempting to use algorithmic complexity measures based upon Kolmogorov-Chaitin.
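The PRNG half of this point can be made concrete with a short sketch (the constants below are the familiar glibc-style linear congruential parameters; the function name is ours). A program of a few lines can emit a bit stream of any length that passes casual randomness checks, so the stream's algorithmic information is bounded by the short program plus its seed, not by the stream's length.

    def lcg_bits(seed, n):
        # Tiny linear congruential generator: a short program that can
        # emit an arbitrarily long, statistically random-looking bit stream.
        state = seed
        for _ in range(n):
            state = (1103515245 * state + 12345) % (2 ** 31)
            yield (state >> 15) & 1

    stream = "".join(str(b) for b in lcg_bits(seed=42, n=10_000))
    print(stream[:64])                      # looks random...
    print(stream.count("1") / len(stream))  # ...roughly half ones
    # ...yet its algorithmic complexity is at most the length of this
    # little program plus the seed, however large n becomes.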

Alternative formulations of algorithmic information measurement exist, though, which take a computable quantity as their basis. Jeffrey Shallit at the University of Waterloo has proposed such a measure, based upon the output size of a compression algorithm: the less compressible a message is, the higher its information content.

Tetraploidy in orchids and computable algorithmic information measures. When one applies a computable algorithmic information measure to the example of a tetraploid orchid, the conclusion is that the tetraploid daughter species has more information in its genome than the diploid parent. This can be seen by simple experimentation with the compression algorithm of your choice. Start with a base file, compress it, and note the resulting size. Then make a derived file composed of two copies of the original and run the compression algorithm again. The resulting file size is strictly larger than that of the compressed original. Information content has increased by the computable algorithmic information measure.
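This experiment takes only a few lines in Python using the standard zlib compressor (any general-purpose compressor should show the same pattern; the file sizes and random seed here are arbitrary choices). The doubled data compresses to far less than twice the original size, because the compressor exploits the repetition, but to strictly more, which is all the argument needs.

    import random
    import zlib

    random.seed(1)
    base = bytes(random.getrandbits(8) for _ in range(10_000))  # the "base file"
    doubled = base * 2                                          # two concatenated copies

    c_base = len(zlib.compress(base, 9))
    c_doubled = len(zlib.compress(doubled, 9))
    print(c_base, c_doubled)   # doubled output is only slightly larger...
    print(c_doubled > c_base)  # ...but strictly larger: information has increased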

A layman's interpretation of "information".

Common usage of "information" almost always conflates it with "meaning". Information in this sense has to do with the consequences of a change in that information. A traffic light at an intersection provides information to drivers on how they should behave. Adding a "No turns on red" sign modifies the meaning, and thus the information under common usage. The information can be seen to have increased in this case: a driver approaching such an intersection now knows not only that the usual rules apply there, but also that turns are restricted during the "red" phase of the traffic light.

Information measure for common usage. There is no rigorous information measure for the common usage of "information". Offering an example of an increase in information under common usage therefore requires an argument that the example conforms to the expectations of that usage.

Tetraploidy in orchids as an increase in "layman's information". Tetraploid orchids carry double the number of chromosomes found in the parent population. They also tend to be larger and to have more robust structures. Because the physical appearance of the tetraploid orchid differs from that of its parent, we know that the information which specified its phenotype has also changed. We need only determine whether that change corresponds to an increase or a decrease in information. Because the change is accomplished by copying information, the information can be seen to have increased rather than decreased. Consider an analogy to a workout program. Joe is told to follow the instructions on a page listing various exercises, checking off each one as he does it. One day, Joe finds five items on his page and does them. The next day, Joe finds that the page has ten items, the original five each repeated twice, and he does them in the order given. The information content of Joe's exercise program has doubled, even though the instructions include repeats. In the same way, the tetraploid orchid has two loci and four alleles affecting each trait where its parents had one locus and two alleles. (Each trait could thus be "polygenic".) The tetraploid orchid's internal processes now perform more work in translating and instantiating the instructions of the genome, and that makes a difference in the way the tetraploid orchid looks, though perhaps not in exactly the same way that Joe's longer workout makes Joe look different.

