Needless to say, these simple remarks cannot suffice to solve the problem of biological order. One would like not only to establish that the second law ($dS_i \geq 0$) is compatible with a decrease in overall entropy ($dS < 0$), but also to indicate the mechanisms responsible for the emergence and maintenance of coherent states.²
Without a doubt, the atoms and molecules which comprise living cells individually obey the laws of chemistry and physics, including the laws of thermodynamics. The enigma is the origin of so unlikely an organization of these atoms and molecules. The electronic computer provides a striking analogy to the living cell. Each component in a computer obeys the laws of electronics and mechanics. The key to the computer's marvel lies, however, in the highly unlikely organization of the parts which harness the laws of electronics and mechanics. In the computer, this organization was specially arranged by the designers and builders and continues to operate (with occasional frustrating lapses) through the periodic maintenance of service engineers.
All these features bring the scientist a wealth of new problems. In the first place, one has systems that have evolved spontaneously to extremely organized and complex forms. Coherent behavior is really the characteristic feature of biological systems.³
In this chapter we will consider only the problem of the origin of living systems. Specifically, we will discuss the arduous task of using simple biomonomers to construct complex polymers such as DNA and protein by means of thermal, electrical, chemical, or solar energy. We will first specify the nature and magnitude of the "work" to be done in building DNA and enzymes.
[NOTE: Work in physics normally refers to force times displacement. In this chapter it refers in a more general way to the change in Gibbs free energy of the system that accompanies the polymerization of monomers into polymers.]
In Chapter 9 we will describe the various theoretical models which attempt to explain how the undirected flow of energy through simple chemicals can accomplish the work necessary to produce complex polymers. Then we will review the experimental studies that have been conducted to test these models. Finally we will summarize the current understanding of this subject.
[NOTE: A sufficient explanation for the origin of life would also require a model for the formation of other critical cellular components, including membranes, and their assembly].
[NOTE: H.P. Yockey, personal communication, 9/29/82. Meaning is extraneous to the sequence, arbitrary, and depends on some symbol convention. For example, the word "gift," which in English means a present and in German poison, in French is meaningless.]
Only certain sequences of letters correspond to sentences, and only certain sequences of sentences correspond to paragraphs, etc. In the same way only certain sequences of amino acids in polypeptides and bases along polynucleotide chains correspond to useful biological functions. Thus, informational macromolecules may be described as being aperiodic and in a specified sequence.⁵ Orgel notes:
Living organisms are distinguished by their specified complexity. Crystals such as granite fail to qualify as living because they lack complexity; mixtures of random polymers fail to qualify because they lack specificity.⁶
Three sets of letter arrangements show nicely the difference between order and complexity in relation to information:
1. An ordered (periodic) and therefore specified arrangement:
THE END THE END THE END THE END
Example: Nylon, or a crystal.
[NOTE: Here we use "THE END" even though there is no reason to suspect that nylon or a crystal would carry even this much information. Our point, of course, is that even if they did, the bit of information would be drowned in a sea of redundancy.]
2. A complex (aperiodic) unspecified arrangement:
AGDCBFE GBCAFED ACEDFBG
Example: Random polymers (polypeptides).
3. A complex (aperiodic) specified arrangement:
THIS SEQUENCE OF LETTERS CONTAINS A MESSAGE!
Example: DNA, protein.
Yockey⁷ and Wickens⁵ develop the same distinction, that "order" is a statistical concept referring to regularity such as might characterize a series of digits in a number, or the ions of an inorganic crystal. On the other hand, "organization" refers to physical systems and the specific set of spatio-temporal and functional relationships among their parts. Yockey and Wickens note that informational macromolecules have a low degree of order but a high degree of specified complexity. In short, the redundant order of crystals cannot give rise to specified complexity of the kind or magnitude found in biological organization; attempts to relate the two have little future.
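The contrast among the three letter arrangements can be made concrete with a small sketch. The helper `smallest_period` is a hypothetical name introduced here, not something from the text; it finds the shortest shift under which a string repeats itself. A short period marks the ordered case, while both the random and the message-bearing strings come out aperiodic. That is the point of the example: a purely statistical test can separate order from complexity, but it cannot tell the unspecified sequence from the specified one.

```python
def smallest_period(s: str) -> int:
    """Return the smallest shift p >= 1 with s[i] == s[i + p] for every valid i."""
    n = len(s)
    for p in range(1, n):
        if all(s[i] == s[i + p] for i in range(n - p)):
            return p
    return n  # no shorter repeat exists: the string is aperiodic in this sense


# The three arrangements quoted in the text.
arrangements = {
    "ordered": "THE END THE END THE END THE END",
    "random": "AGDCBFE GBCAFED ACEDFBG",
    "message": "THIS SEQUENCE OF LETTERS CONTAINS A MESSAGE!",
}

periods = {name: smallest_period(text) for name, text in arrangements.items()}

for name, text in arrangements.items():
    print(f"{name:8s} length={len(text):2d} smallest period={periods[name]}")
```

Only the first string has a period shorter than its own length; the other two are indistinguishable by this measure, even though only one of them carries a message.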
$$S = k \ln \Omega$$
where $S$ is the entropy of the system, $k$ is Boltzmann's constant, and $\Omega$ corresponds to the number of ways the energy and mass in a system may be arranged.
We will use $S_{th}$ and $S_c$ to refer to the thermal and configurational entropies, respectively. Thermal entropy, $S_{th}$, is associated with the distribution of energy in the system. Configurational entropy, $S_c$, is concerned only with the arrangement of mass in the system, and, for our purposes, we shall be especially interested in the sequencing of amino acids in polypeptides (or proteins) or of nucleotides in polynucleotides (e.g., DNA). The symbols $\Omega_{th}$ and $\Omega_c$ refer to the number of ways energy and mass, respectively, may be arranged in a system.
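Thus we may be more precise by writing
$$S = k \ln (\Omega_{th}\,\Omega_c) = S_{th} + S_c \tag{8-2a}$$
$$S_{th} = k \ln \Omega_{th} \tag{8-2b}$$
$$S_c = k \ln \Omega_c \tag{8-2c}$$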
Determining Information: From a Random Polymer to an Informed Polymer
If we want to convert a random polymer into an informational molecule, we can determine the increase in information (as defined by Brillouin) by finding the difference between the negatives of the entropy states for the initial random polymer and the informational molecule:
$$I = -(S_{cm} - S_{cr}) \tag{8-3a}$$
$$I = S_{cr} - S_{cm} = k \ln \frac{\Omega_{cr}}{\Omega_{cm}} \tag{8-3b}$$
In this equation, $I$ is a measure of the information content of an aperiodic (complex) polymer with a specified sequence, $S_{cm}$ represents the configurational "coding" entropy of this polymer informed with a given message, and $S_{cr}$ represents the configurational entropy of the same polymer for an unspecified or random sequence.
[NOTE: Yockey and Wickens define information slightly differently than Brillouin, whose definition we use in our analysis. The difference is unimportant insofar as our analysis here is concerned.]
Note that the information in a sequence-specified polymer is maximized when the mass in the molecule could be arranged in many different ways, only one of which communicates the intended message. (There is a large $S_{cr}$ from eq. 8-2c since $\Omega_{cr}$ is large, yet $S_{cm} = 0$ from eq. 8-2c since $\Omega_{cm} = 1$.) The information carried in a crystal is small because $S_c$ is small (eq. 8-2c) for a crystal. There simply is very little potential for information in a crystal because its matter can be distributed in so few ways. The random polymer provides an even starker contrast. It bears no information because $S_{cr}$, although large, is equal to $S_{cm}$ (see eq. 8-3b).
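The three cases just described can be compared with a minimal sketch of Brillouin's measure, I = S_cr - S_cm = k (ln Ω_cr - ln Ω_cm). The ln Ω values below are hypothetical placeholders chosen only to illustrate the ordering of the cases; they are not figures from the text, and the function name `information` is likewise an assumption.

```python
K_BOLTZMANN = 1.38e-16  # erg/deg, the value used in this chapter


def information(ln_omega_random: float, ln_omega_message: float) -> float:
    """Brillouin information I = S_cr - S_cm = k * (ln Omega_cr - ln Omega_cm)."""
    return K_BOLTZMANN * (ln_omega_random - ln_omega_message)


# (ln Omega_cr, ln Omega_cm) for each case; all numbers are illustrative only.
cases = {
    "specified polymer": (230.0, 0.0),  # many arrangements, only one is the message
    "random polymer": (230.0, 230.0),   # any arrangement will do, so S_cm = S_cr
    "crystal": (1.0, 0.0),              # matter can be arranged in very few ways
}

info = {name: information(*ln_pair) for name, ln_pair in cases.items()}

for name, value in info.items():
    print(f"{name:18s} I = {value:.3e} erg/deg")
```

The specified polymer carries the most information, the crystal very little, and the random polymer none, in line with the discussion above.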
$$\Delta G = \Delta H - T\,\Delta S \tag{8-4b}$$
where a decrease in Gibbs free energy for a given chemical reaction near equilibrium guarantees an increase in the entropy of the universe as demanded by the second law of thermodynamics.
Now consider the components of the Gibbs free energy (eq. 8-4b) where the change in enthalpy ($\Delta H$) is principally the result of changes in the total bonding energy ($\Delta E$), with the ($P\,\Delta V$) term assumed to be negligible. We will refer to this enthalpy component ($\Delta H$) as the chemical work. A further distinction will be helpful. The change in the entropy ($\Delta S$) that accompanies the polymerization reaction may be divided into two distinct components which correspond to the changes in the thermal energy distribution ($\Delta S_{th}$) and the mass distribution ($\Delta S_c$), eq. 8-2. So we can rewrite eq. 8-4b as
$$\Delta G = \Delta H - T\,\Delta S_{th} - T\,\Delta S_c \tag{8-5}$$
It will be shown that polymerization of macromolecules results in a decrease in the thermal and configurational entropies ($\Delta S_{th} < 0$, $\Delta S_c < 0$). These terms effectively increase $\Delta G$, and thus represent additional components of work to be done beyond the chemical work.
Consider the case of the formation of protein or DNA from biomonomers in a chemical soup. For computational purposes it may be thought of as requiring two steps: (1) polymerization to form a chain molecule with an aperiodic but near-random sequence, and (2) rearrangement to an aperiodic, specified information-bearing sequence.
[NOTE: Some intersymbol influence arising from differential atomic bonding properties makes the distribution of matter not quite random. (H.P. Yockey, 1981. J. Theoret. Biol. 91, 13).]
The entropy change ($\Delta S$) associated with the first step is essentially all thermal entropy change ($\Delta S_{th}$), as discussed above. The entropy change of the second step is essentially all configurational entropy-reducing change ($\Delta S_c$). In fact, as previously noted, the change in configurational entropy ($\Delta S_c = \Delta S_c$ "coding") as one goes from a random arrangement ($S_{cr}$) to a specified sequence ($S_{cm}$) in a macromolecule is numerically equal to the negative of the information content of the molecule as defined by Brillouin (see eq. 8-3a).
If some of these symbols are redundant (or identical), then the number of unique or distinguishable sequences that can be made is reduced to
$$\Omega_c = \frac{N!}{n_1!\, n_2! \cdots n_i!} \tag{8-7}$$
where $n_1 + n_2 + \cdots + n_i = N$ and $i$ defines the number of distinct symbols. For a protein, $i = 20$, since a subset of twenty distinctive types of amino acids is found in living things, while in DNA $i = 4$ for the subset of four distinctive nucleotides. A typical protein would have 100 to 300 amino acids in a specific sequence, or $N$ = 100 to 300. For DNA of the bacterium E. coli, $N$ = 4,000,000. In Appendix 1, alternative approaches to calculating $\Omega_c$ are considered and eq. 8-7 is shown to be a lower bound to the actual value.
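The counting rule of eq. 8-7 is easy to check on small cases. The sketch below is illustrative only; the function name `distinguishable_sequences` is an assumption introduced here. It evaluates N!/(n1! n2! ... ni!) exactly with integer arithmetic.

```python
from math import factorial, prod


def distinguishable_sequences(counts):
    """Number of distinguishable orderings of N symbols, where counts[j]
    copies of symbol j are identical: N! / (n1! * n2! * ... * ni!)."""
    n_total = sum(counts)
    return factorial(n_total) // prod(factorial(n) for n in counts)


# Four distinct symbols: every one of the 4! orderings is distinguishable.
all_distinct = distinguishable_sequences([1, 1, 1, 1])

# Four symbols with one repeated pair: swapping the twins changes nothing.
one_pair = distinguishable_sequences([2, 1, 1])

# Four identical symbols: a single arrangement, as in a perfectly ordered chain.
all_same = distinguishable_sequences([4])

print(all_distinct, one_pair, all_same)
```

Redundancy among the symbols always lowers the count, which is why a chain built from a small alphabet has fewer distinguishable sequences than one of the same length built from a large alphabet.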
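For a random polypeptide of 100 amino acids, the configurational entropy, $S_{cr}$, may be calculated using eq. 8-2c and eq. 8-7 as follows:
$$S_{cr} = k \ln \Omega_{cr} = k \ln\,[1.28 \times 10^{115}] \tag{8-8}$$
The calculation of equation 8-8 assumes that an equal number of each type of amino acid, namely 5, are contained in the polypeptide. Since $k$, or Boltzmann's constant, equals $1.38 \times 10^{-16}$ erg/deg, and $\ln\,[1.28 \times 10^{115}] = 265$,
$$S_{cr} = 1.38 \times 10^{-16} \times 265 = 3.66 \times 10^{-14}\ \text{erg/deg-polypeptide}$$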
If only one specific sequence of amino acids could give the proper function, then the configurational entropy for the protein or specified, aperiodic polypeptide would be given by
$$S_{cm} = k \ln \Omega_{cm} = k \ln 1 = 0 \tag{8-9}$$
Determining $\Delta S_c$ in Going from a Random Polymer to an Informed Polymer
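The change in configurational entropy, $\Delta S_c$, as one goes from a random polypeptide of 100 amino acids with an equal number of each amino acid type to a polypeptide with a specific message or sequence is:
$$\Delta S_c = S_{cm} - S_{cr} = 0 - 3.66 \times 10^{-14} = -3.66 \times 10^{-14}\ \text{erg/deg-polypeptide} \tag{8-10}$$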
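The configurational entropy work ($-T\,\Delta S_c$) at ambient temperatures is given by
$$-T\,\Delta S_c = -(298\ \text{K})(-3.66 \times 10^{-14}\ \text{erg/deg-polypeptide}) \approx 1.1 \times 10^{-11}\ \text{erg/polypeptide} \approx 15.9\ \text{cal/gm}$$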
where the protein mass of 10,000 amu was estimated by assuming an average
amino acid weight of 100 amu after the removal of the water molecule. Determination
of the configurational entropy work for a protein containing 300 amino acids
equally divided among the twenty types gives a similar result of 16.8 cal/gm.
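The arithmetic of this section can be reproduced with a short sketch. It assumes S_cm = 0 (a single functional sequence), takes Avogadro's number as 6.022 × 10^23 per mole and 1 cal = 4.184 × 10^7 erg (neither constant is stated in the text), and uses `lgamma(n + 1)` for ln n!. The function names are illustrative. Evaluating eq. 8-7 directly for five copies of each of twenty amino acids gives ln Ω of roughly 268 rather than the 265 quoted above; the difference has little effect on the work per gram.

```python
from math import lgamma

K_BOLTZMANN = 1.38e-16  # erg/deg, as in the text
AVOGADRO = 6.022e23     # molecules per mole (assumed value)
ERG_PER_CAL = 4.184e7   # erg per calorie (assumed value)
T_AMBIENT = 298.0       # K


def ln_omega(counts):
    """ln of N!/(n1! ... ni!), using lgamma(n + 1) = ln(n!)."""
    n_total = sum(counts)
    return lgamma(n_total + 1) - sum(lgamma(n + 1) for n in counts)


def coding_work_cal_per_gram(counts, monomer_mass_amu, temperature=T_AMBIENT):
    """Configurational entropy work -T*dS_c in cal/gm, taking S_cm = 0."""
    work_erg_per_molecule = K_BOLTZMANN * temperature * ln_omega(counts)
    grams_per_mole = monomer_mass_amu * sum(counts)
    return work_erg_per_molecule * AVOGADRO / grams_per_mole / ERG_PER_CAL


ln_omega_100 = ln_omega([5] * 20)                        # 100 residues, 5 of each type
work_100 = coding_work_cal_per_gram([5] * 20, 100.0)     # 10,000 amu protein
work_300 = coding_work_cal_per_gram([15] * 20, 100.0)    # 30,000 amu protein

print(f"ln(Omega_cr), 100 residues: {ln_omega_100:.1f}")
print(f"coding work, 100 residues: {work_100:.1f} cal/gm")
print(f"coding work, 300 residues: {work_300:.1f} cal/gm")
```

Both chain lengths give a work per gram in the neighborhood of 16 cal/gm, since the entropy per residue approaches k ln 20 as the chain grows.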
In like manner the configurational entropy work for a DNA molecule such as for E. coli bacterium may be calculated assuming $4 \times 10^6$ nucleotides in the chain with $1 \times 10^6$ each of the four distinctive nucleotides, each distinguished by the type of base attached, and each nucleotide assumed to have an average mass of 339 amu. At 298 K:
$$-T\,\Delta S_c = -T\,(S_{cm} - S_{cr}) = kT \ln \frac{(4 \times 10^6)!}{[(10^6)!]^4} \approx 2.26 \times 10^{-7}\ \text{erg/polynucleotide}$$
It is interesting to note that, while the work to code the DNA molecule with 4 million nucleotides is much greater than the work required to code a protein of 100 amino acids ($2.26 \times 10^{-7}$ erg/DNA vs. $1.10 \times 10^{-11}$ erg/protein), the work per gram to code such molecules is actually less in DNA. There are two reasons for this perhaps unexpected result: first, the nucleotide is more massive than the amino acid (339 amu vs. 100 amu); and second, the alphabet is more limited, with only four useful nucleotide "letters" as compared to twenty useful amino acid letters. Nevertheless, it is the total work that is important, which means that synthesizing DNA is much more difficult than synthesizing protein.
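The comparison between DNA and protein can be checked numerically. This is a sketch under the same assumptions as the text (equal numbers of each symbol, S_cm = 0, T = 298 K), with Avogadro's number and the erg-per-calorie factor supplied here as assumed constants; `coding_work` is an illustrative name.

```python
from math import lgamma

K_BOLTZMANN = 1.38e-16  # erg/deg, as in the text
AVOGADRO = 6.022e23     # molecules per mole (assumed value)
ERG_PER_CAL = 4.184e7   # erg per calorie (assumed value)
TEMPERATURE = 298.0     # K


def coding_work(count_per_symbol, n_symbols, monomer_mass_amu):
    """Return (erg per molecule, cal per gram) of configurational entropy work
    for a chain holding count_per_symbol copies of each of n_symbols symbols."""
    n_total = count_per_symbol * n_symbols
    ln_omega = lgamma(n_total + 1) - n_symbols * lgamma(count_per_symbol + 1)
    erg_per_molecule = K_BOLTZMANN * TEMPERATURE * ln_omega
    grams_per_mole = n_total * monomer_mass_amu
    cal_per_gram = erg_per_molecule * AVOGADRO / grams_per_mole / ERG_PER_CAL
    return erg_per_molecule, cal_per_gram


protein_erg, protein_cal_per_gram = coding_work(5, 20, 100.0)
dna_erg, dna_cal_per_gram = coding_work(10**6, 4, 339.0)

print(f"protein: {protein_erg:.2e} erg/molecule, {protein_cal_per_gram:.1f} cal/gm")
print(f"DNA:     {dna_erg:.2e} erg/molecule, {dna_cal_per_gram:.1f} cal/gm")
```

The per-molecule work is several orders of magnitude larger for DNA, while the per-gram work is smaller, for the two reasons given above: heavier monomers and a four-letter alphabet.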
It should be emphasized that these estimates of the magnitude of the configurational entropy work required are conservatively small. As a practical matter, our calculations have ignored the configurational entropy work involved in the selection of monomers. Thus, we have assumed that only the proper subset of 20 biologically significant amino acids was available in a prebiotic oceanic soup to form a biofunctional protein. The same is true of DNA. We have assumed that in the soup only the proper subset of 4 nucleotides was present and that these nucleotides do not interact with amino acids or other soup ingredients. As we discussed in Chapter 4, many varieties of amino acids and nucleotides would have been present in a real ocean, varieties which have been ignored in our calculations of configurational entropy work. In addition, the soup would have contained many other kinds of molecules which could have reacted with amino acids and nucleotides. The problem of using only the appropriate optical isomer has also been ignored. A random chemical soup would have contained a 50-50 mixture of D- and L-amino acids, from which a true protein could incorporate only the L-enantiomer. Similarly, DNA uses exclusively the optically active sugar D-deoxyribose. Finally, we have ignored the problem of forming unnatural links, assuming for the calculations that only α-links occurred between amino acids in making polypeptides, and that only correct linking at the 3', 5'-position of sugar occurred in forming polynucleotides. A quantification of these problems of specificity has recently been made by Yockey.²¹
The dual problem of selecting the proper composition of matter and then coding or rearranging it into the proper sequence is analogous to writing a story using letters drawn from a pot containing many duplicates of each of the 22 Hebrew consonants and 24 Greek and 26 English letters all mixed together. To write in English the message,
HOW DID I GET HERE?
we must first draw from the pot 2 Hs, 2 Is, 3 Es, 2 Ds, and one each of the letters W, O, G, T, and R. Drawing or selecting this specific set of letters would be a most unlikely event itself. The work of selecting just these 14 letters would certainly be far greater than arranging them in the correct sequence. Our calculations only considered the easier step of coding while ignoring the greater problem of selecting the correct set of letters to be coded. We thereby greatly underestimate the actual configurational entropy work to be done.
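The relative difficulty of the two steps in the letter analogy can be estimated with a rough sketch. It rests on assumptions that are not in the text: each draw is taken to be independent and equally likely to yield any of the 72 symbols (22 + 24 + 26), which is reasonable only if the pot holds so many duplicates that drawing does not deplete it. The variable names are illustrative.

```python
from math import factorial, prod

# Letter inventory stated in the text: 14 letters in all.
letter_counts = {"H": 2, "I": 2, "E": 3, "D": 2, "W": 1, "O": 1, "G": 1, "T": 1, "R": 1}
n_letters = sum(letter_counts.values())

# Assumed pot: 22 Hebrew + 24 Greek + 26 English symbols, all equally likely.
n_symbols_in_pot = 22 + 24 + 26

# Coding step: distinguishable orderings of the 14 letters once they are in hand.
arrangements = factorial(n_letters) // prod(factorial(n) for n in letter_counts.values())

# Selecting step: chance that 14 independent draws yield exactly this set of
# letters, in any order, expressed as "one chance in selection_odds".
selection_odds = n_symbols_in_pot ** n_letters / arrangements

print(f"orderings of the chosen letters: {arrangements:.3e}")
print(f"odds against drawing the right set: 1 in {selection_odds:.3e}")
```

Under these assumptions the odds against drawing the right set of letters exceed the number of ways of ordering them by many orders of magnitude, which illustrates why leaving out the selecting step understates the work.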
In Chapter 6 we developed a scale showing degrees of investigator interference in prebiotic simulation experiments. In discussing this scale it was noted that very often in reported experiments the experimenter has actually played a crucial but illegitimate role in the success of the experiment. It becomes clear at this point that one illegitimate role of the investigator is that of providing a portion of the configurational entropy work, i.e., the "selecting" work portion of the total $-T\,\Delta S_c$ work.
It is sometimes argued that the type of amino acid that is present in a protein is critical only at certain positions (active sites) along the chain, but not at every position. If this is so, it means the same message (i.e., function) can be produced with more than one sequence of amino acids.
This would reduce the coding work by making the number of permissible arrangements $\Omega_{cm}$ in eqs. 8-9 and 8-10 for $S_{cm}$ greater than 1. The effect of overlooking this in our calculations, however, would be negligible compared to the effect of overlooking the "selecting" work and only considering the "coding" work, as previously discussed. So we are led to the conclusion that our estimate for $\Delta S_c$ is very conservatively low.
Calculating the Total Work: Polymerization of Biomacromolecules
It is now possible to estimate the total work required to combine biomonomers into the appropriate polymers essential to living systems. This calculation using eq. 8-5 might be thought of as occurring in two steps. First, amino acids polymerize into a polypeptide, with the chemical and thermal entropy work being accomplished ($\Delta H - T\,\Delta S_{th}$). Next, the random polymer is rearranged into a specific sequence which constitutes doing configurational entropy work ($-T\,\Delta S_c$). For example, the total work as expressed by the change in Gibbs free energy to make a specified sequence is
$$\Delta G = \Delta H - T\,\Delta S_{th} - T\,\Delta S_c$$
where $\Delta H - T\,\Delta S_{th}$ may be assumed to be 300 kcal/mole to form a random polypeptide of 101 amino acids (100 links). The work to code this random polypeptide into a useful sequence so that it may function as a protein involves the additional component of $-T\,\Delta S_c$ "coding" work, which has been estimated previously to be 15.9 cal/gm, or approximately 159 kcal/mole for our protein of 100 links with an estimated mass of 10,000 amu (that is, 10,000 gm/mole). Thus, the total work (neglecting the "sorting and selecting" work) is approximately
$$\Delta G = (300 + 159)\ \text{kcal/mole} = 459\ \text{kcal/mole}$$
with the coding work representing 159/459 or 35% of the total work.
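The bookkeeping behind these figures is simple enough to verify directly. The sketch below uses only numbers quoted in the text; the variable names are illustrative.

```python
# Figures quoted in the text for a polypeptide of 101 amino acids (100 links).
POLYMERIZATION_KCAL_PER_MOLE = 300.0  # dH - T*dS_th for the random polypeptide
CODING_CAL_PER_GRAM = 15.9            # -T*dS_c "coding" work
GRAMS_PER_MOLE = 10_000.0             # 10,000 amu per molecule

# Convert the coding work from cal/gm to kcal/mole.
coding_kcal_per_mole = CODING_CAL_PER_GRAM * GRAMS_PER_MOLE / 1000.0

# Total Gibbs free energy change, neglecting "sorting and selecting" work.
total_kcal_per_mole = POLYMERIZATION_KCAL_PER_MOLE + coding_kcal_per_mole
coding_fraction = coding_kcal_per_mole / total_kcal_per_mole

print(f"coding work: {coding_kcal_per_mole:.0f} kcal/mole")
print(f"total work:  {total_kcal_per_mole:.0f} kcal/mole")
print(f"coding share of total: {coding_fraction:.0%}")
```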
In a similar way, the polymerization of $4 \times 10^6$ nucleotides into a random polynucleotide would require approximately $27 \times 10^6$ kcal/mole. The coding of this random polynucleotide into the specified, aperiodic sequence of a DNA molecule would require an additional $3.2 \times 10^6$ kcal/mole of work. Thus, the fraction of the total work that is required to code the polymerized DNA is seen to be 8.5%, again neglecting the "sorting and selecting" work.
The Impossibility of Protein Formation under Equilibrium Conditions
It was noted in Chapter 7 that because macromolecule formation (such as amino acids polymerizing to form protein) goes uphill energetically, work must be done on the system via energy flow through the system. We can readily see the difficulty in getting polymerization reactions to occur under equilibrium conditions, i.e., in the absence of such an energy flow.
Under equilibrium conditions the concentration of protein one would obtain from a solution of 1 M concentration in each amino acid is given by:
where $K$ is the equilibrium constant and is calculated by
$$K = \exp\left(-\frac{\Delta G}{RT}\right)$$
An equivalent form is
$$\Delta G = -RT \ln K$$
We noted earlier that $\Delta G$ = 459 kcal/mole for our protein of 101 amino acids. The gas constant $R$ = 1.9872 cal/deg-mole and $T$ is assumed to be 298 K. Substituting these values into eqs. 8-15 and 8-16 gives
$$K = \exp\left(-\frac{459{,}000}{1.9872 \times 298}\right) \approx 10^{-337}$$
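The size of the equilibrium constant can be checked with a few lines of arithmetic. This sketch assumes the standard relation K = exp(-ΔG/RT) and works with logarithms throughout, because the value is far too small to be represented as an ordinary floating-point number. The variable names are illustrative.

```python
import math

DELTA_G_CAL_PER_MOLE = 459_000.0  # 459 kcal/mole, from the text
GAS_CONSTANT = 1.9872             # cal/deg-mole, from the text
TEMPERATURE = 298.0               # K, from the text

# ln K = -dG / (R T); evaluating exp() of this directly would underflow to zero.
ln_k = -DELTA_G_CAL_PER_MOLE / (GAS_CONSTANT * TEMPERATURE)

# Express the result as a power of ten for easier reading.
log10_k = ln_k / math.log(10.0)

print(f"ln K    = {ln_k:.1f}")
print(f"log10 K = {log10_k:.1f}")
```

With every amino acid at 1 M, an equilibrium constant this small corresponds to a protein concentration that is zero for all practical purposes.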
This trivial yield emphasizes the futility of protein formation under
equilibrium conditions. In the next chapter we will consider various theoretical
models attempting to show how energy flow through the system can be useful
in doing the work quantified in this chapter for the polymerization of DNA
and protein. Finally, we will examine experimental efforts to accomplish