Books : reviews

Antoine Danchin.
The Delphic Boat: what genomes tell us.
Harvard University Press. 2002

rating : 3.5 : worth reading
review : 17 June 2010

This is a very densely written and important book. It has taken me several months to read, on and off, and I am sure I've missed some of the points. But I have taken away much of interest. The theme is that biology is not merely a consequence of physics, but has a fundamentally important extra property: that of symbolic relationships. The title comes from the idea of relationships: a boat remains a boat if one (or indeed all) of its planks are replaced, even with planks made from different material; it is not a function of the objects that make it up, but of the relationships between those objects.

p174. What makes the Delphic boat float is the nature of the relationship between its planks, not their physicochemical nature. Whether they are made of oak or pine or aluminum or steel is irrelevant to their function.

p280. Our priority should be studying the relationships that make up life, rather than remaining at the level of the objects themselves; and we should do this by looking into the nature of whatever it is that gives these relationships their permanence.

The "symbolic" part is that in biology, those relationships can be arbitrary (for example, the genetic code). They are consistent with the laws of physics, of course, but in some sense independent of them. The encoding, the symbols used, could have been different, and so are not deducible from, or reducible to, physics. Indeed, they have more in common with ideas of computation, or information processing, and Danchin pursues the relationship between cellular mechanisms and Turing Machines to a sophisticated level.

The book is divided into five hefty chapters. The first two give the biological background of the various genome sequencing achievements, probably in more detail than you want unless you are a genome-sequencer. I would probably have given up sometime before chapter three (starting on p109), had I not been reading this because I'd read a shorter, fascinating paper by Danchin, and wanted to delve deeper. In summary, the first two chapters say that the genome is very complicated, very detailed, very big, that it has structure, but that structure is very messy, and that nearly everything you think you know about it (from kindergarten "Ladybird" biology to undergraduate studies) is untrue, by being highly and selectively oversimplified. Then we get on to the interesting stuff. Not that it gets any easier going, mind you. But stick with it; it's well worth the journey.

Danchin reiterates that biology cannot be understood and explained in terms of outdated mechanical physical concepts.

p109. life ... is not a mechanical process, and that even if we do not deny its deterministic character, what we can know about it does not enable us to predict its future. Life is simply the one material process that has discovered that the only way to deal with an unpredictable future is to be able to produce the unexpected itself.

Physics is about identical objects, but by the time biology is reached, distinguishability, identity, relationships become key.

pp246-9. Generally speaking, it is fairly easy to build up a picture of the physical world, and to explain it in terms of a combination of simple principles …, because physics is concerned with reproducible objects that cannot normally be distinguished from each other as individual entities. ...
    Chemistry is more complex than physics, and begins when atoms combine. …
    In chemistry, two individual examples of the same object are usually indistinguishable when they are observed under similar conditions. However, there is one particular characteristic, quite rare in physics but almost universal in chemistry, which clearly illustrates the importance of the relationships between the parts that make up the object in question. There are structures that are identical in every respect but their symmetry, and the link between chemistry and biology was formed after a distinct bias was observed in the symmetries of chemical products produced by living things. ...
    ... It is impossible to distinguish between two atoms of the same object, in the same state, but it is possible to distinguish between two individuals of the same species. A species is a population of individuals, a class of objects each with its own identity. This is true even for microbes like bacteria: a look at the way they swim will show that two individual bacteria, which look the same and are genetically identical, can very often be distinguished by their behavior. It is also true of cells, the "atoms" or units of life.

Biology is characterised not just by the (individual) objects involved, but by the relationships between those objects.

p131. ... in biology. No object exists in isolation---or if such objects do exist, it is less important to know them, because their isolation means that they have little to contribute to the phenomenon being studied. It is precisely relationships between objects that are at the heart of life. So we know in advance that, among the things we need to discover, there are relationships that have a particular form, whose implementation enables vital functions to be expressed, such as the regulation of gene expression. Of course we do not know exactly what these relationships are a priori, but we know that they do exist. We do not know what form they take, but we know that they demand a certain proximity between objects, whether in terms of space or time or other forms of mediation.

Once we realise that relationships are key, we can use this to make progress. We can exploit the structure of the relationships to investigate new possibilities, via (abstract) neighbourhoods of related objects.

pp130-1. Effective as it is, this hypothetico-deductive method has the drawback of being able to refine only knowledge that we already have, without giving us a way of forming hypotheses that are both new and pertinent. How can we find original ideas, but with an originality that is not alien to what we are studying? ... How can we advance inductively, how can we explore upstream, and not downstream as with deduction?
    ... We will consider only one approach, because it is particularly effective in the case of genomes: that of induction by exploring the neighborhood of the objects we want to consider. The idea behind this approach is that each object exists in relationship with other objects. ...

Here, Danchin takes the fundamental object to be the gene. Neighbourhoods are then sets of "similar" genes. Importantly, similarity can be defined in a diverse range of abstract spaces. This is where biological intuition and knowledge can pay off: by focussing investigation on those spaces that are biologically "meaningful".

pp132-4. Inductive exploration consists in finding all the neighbors of each given gene, as a starting point.
"Neighbor" is to be understood here in the broadest possible sense. It is not only a geometrical or structural notion. Each neighborhood will have its own particular light to throw on the gene of interest, and will provide clues for researching its function. ... One natural kind of neighborhood is proximity on the chromosome. ...
    The evolution of species proceeds by variation on ancestral themes. Consequently, many genes are descended from common ancestors, and just as children look like their parents, so genes, or more often their products, have points of resemblance. This is a rewarding kind of neighborhood to consider. ...
    There are many other ways of finding neighborhood. In particular, a gene may have been studied by researchers in laboratories all over the world. For one reason or another, the gene may have properties that have made these researchers associate them with other genes, so it is worth looking for a gene's neighbors in the sense that it is mentioned in their company in the scientific literature. ...
    A gene's similarity with others can also come from similar physicochemical characteristics of their products ... Similarities can be local rather than global, .... Similarity might also be a matter of the absence, rather than the presence, of certain motifs ... Giving free rein to the imagination can help us discover other kinds of neighborhood ... Neighborhood can be structural, if the products of different genes share the same cell compartment ... But there are also kinds of functional neighborhood. As the molecules involved in metabolism undergo interconversions, there are enzymes that are neighbors because they use the same substrate, produce the same product, or follow one another in a metabolic pathway.
    Finally there are more complex kinds of neighborhood, and studying these can bring particularly rewarding results. To take up the example ... of bias in the use of the genetic code, we find, for instance, that two genes can be neighbors because they use the code in the same way. It is interesting to study all the genes surrounding a given gene, in the cloud of points that describes the use of the genetic code in all the genes in that organism. When this is done, we begin to discover some very unexpected properties of genome texts.
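
To make the codon-usage kind of neighbourhood concrete for myself, here is a minimal sketch in Python. It is my own illustration, not Danchin's method: the three toy "genes" are made up, and the choice of Euclidean distance between codon-frequency profiles is an assumption. Real analyses of codon bias use whole genomes and more careful statistics.

```python
from collections import Counter
from math import sqrt

def codon_usage(seq):
    """Relative codon frequencies of a coding sequence, read in frame 0."""
    usable = len(seq) - len(seq) % 3  # ignore any trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def distance(u, v):
    """Euclidean distance between two codon-usage profiles (an assumed metric)."""
    keys = set(u) | set(v)
    return sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))

def nearest_neighbours(name, genes, k=1):
    """Names of the k genes whose codon usage is closest to that of `name`."""
    profiles = {g: codon_usage(s) for g, s in genes.items()}
    others = [g for g in genes if g != name]
    return sorted(others, key=lambda g: distance(profiles[name], profiles[g]))[:k]

# Hypothetical toy sequences, invented for this sketch.
# geneA and geneC encode the same peptide (Ala-Ala-Glu-Glu-Lys-Ala) with
# different synonymous codons; geneB encodes a different peptide but uses
# exactly the same codons as geneA.
genes = {
    "geneA": "GCTGCTGAAGAAAAAGCT",
    "geneB": "GCTGAAGCTAAAGAAGCT",
    "geneC": "GCGGCGGAGGAGAAGGCG",
}

print(nearest_neighbours("geneA", genes))
```

In this abstract space geneA's neighbour is geneB, even though it is geneC that specifies the same protein: a small reminder that each neighbourhood throws its own light on a gene.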

This emphasis on relationships requires experimental setups where they can be investigated: setups where only the objects can be investigated are inadequate. In particular, spatial and structural relationships need to be considered. This has consequences:

p137. the genome text and its meaning are closely connected with an architecture, which is real even if it is minuscule. One consequence of the domination of biology by biochemistry, which favors the study of objects in isolation, has been to encourage an image of the cell as a miniature test tube. In this view, the concentration of molecules is seen as uniform, and the standard thermodynamic approach is normally used to measure the course of biochemical reactions, as if that were what happened in the cell. But this is very misleading.

The existence of spatial cellular architecture has consequences on the structure of the genome:

p151. there is a map of the cell in the chromosome. Genes are not randomly distributed in the genome text; their position relates to their mode of expression, depending on the nature of the environment, and to the location of their products in the different cell compartments.

The existence of temporal cellular architecture also has important consequences:

pp156-7. Up to now I have spoken only of the spatial organization of the cell, and of its very probable strong connection with the spatial organization of the genome. But of course we must add the time dimension to this. ... It takes a certain amount of time to transcribe or to translate a gene. ... Clearly, adding a section to be transcribed introduces a timing element, which can have an important effect on the cell's dynamics, simply because of its length, without the corresponding nucleotides' necessarily having any particular meaning. ... comparison of related genomes should reveal regions where the length is preserved, although the sequence is not.

If this book were only about the importance of relationships and functions, it would be interesting. But there is an additional crucial component. Not only should we think in terms of functions, but there is a level of indirection, a symbolic nature, to the way the material objects represent the functionality.

p110. life's exploration of reality has been based on symbolic transposition. Unlike physical or chemical objects, biological objects are more than just a site where actions occur; they represent functions. Very often they no longer correspond to them directly. ... The nucleic acids and the proteins, which are the very foundation of the objects, relationships, and processes that make up life, are made from completely different chemicals from each other, and the DNA of a gene that codes for the synthesis of an enzyme has absolutely no biochemical connection with that enzyme's function or shape.

pp123-4. I would like to reemphasize the arbitrary character of the association between a function and the control of its expression. This is a first level of an aspect we normally call "symbolic," when we are talking about human communication. This arbitrary, symbolic character allows the cell to manipulate associations situated at a high hierarchical level, between apparently unrelated functions. Life has made systematic use of this remarkable phenomenon. This is what makes it possible to introduce relationships between physical parameters, as well as chemical ones, into gene expression. ... This symbolic aspect is typical of the most important biological functions.
     … This model evokes a way of representing the world that is profoundly different from the way we usually account for the physical world. It adds abstract symbolic relationships to the objects of chemistry and physics. The difficulty of understanding this symbolic aspect explains why biology in general and what we call "molecular" biology in particular are the subject of so much misinterpretation and misunderstanding.

It is this symbolic abstraction that makes life not deducible from physics (although it is consistent with physics). It is more than mere physics. Danchin has disparaging things to say about the current enthusiasm for self-organised complexity:

p174. What governs life is … absolutely not outside physics---it respects all its laws; but a law such as the genetic code cannot be simply an automatic consequence of the laws of physics. This is what I am summarizing when I say that it cannot be reduced to physics.
    … many modern thinkers … want life to be in itself an unavoidable consequence of things. This creates a very strong tendency to attempt to represent life not just as a possible and predictable result, but as an inevitable, logically derived consequence of the laws of physics. This reduction of life to the physicochemical world has culminated in studies which postulate more or less elaborate connections between various dynamics of simple physical systems, and which are summed up by an expression that is as fashionable as it is vague and inappropriate, self-organization. By sheer tricks of language and abuse of metaphor, the authors of these studies seek to "explain" life in terms of the complex behavior of oscillating chemical reactions, or the spontaneous appearance of organized structures on different levels. This painfully reductionist attitude completely fails to recognize what is the basis of life, symbolic abstraction. The objects that make biological functions happen often have no mechanical relationship with them; they are only their mediator, their symbol.

He really doesn't like complexity theory, and the physical kind of dynamical systems resulting in catastrophes, bifurcations and oscillations. It's not enough:

p240. We can only be astonished that, confronted by the marvelous variety and sheer gratuity of insect forms, scientists have not more often been inspired to explore the mode and timing of their production by starting from reality itself, rather than by hiding it under a veneer of simplistic, reductionist ideas.

Clearly reality lies between "just complex physics" and "arbitrary symbolic representation". The chosen symbols cannot be totally arbitrary: they need to (and do) obey laws of physics. But self-organised complexity shows us that those laws are potentially richer (and more structured) than realised. Physics permits, constrains, determines certain classes of symbols, but does not constrain the actual ones chosen. Even this constrained space is vast, and the realised actuality is just a small, arbitrary subset of this.

So if physics doesn't determine the symbols used, what chooses them? It's that novel law that appears at the level of biology: evolution.

pp175-6. The complementarity that exists between the material world produced by physics and the symbolic world produced by natural selection can be explained by the logic underlying the self-reference or recursiveness produced by the genetic code. The laws of physics and natural selection operate as complementary constraints: the laws of physics describe the unchanging part of phenomena, those properties that living organisms cannot in principle dominate or control. The theory of evolution through natural selection seeks to explain the way in which living organisms do, however, progressively improve their control over those laws.

This evolved symbolic mediation allows a relationship between one kind of regime and a completely different one:

p284. The role of the coding process is to make the transfer from a chemical world in which, broadly speaking, the objects (in this case segments of DNA) can be regarded as exploring only one dimension of space, to a world in which other objects, proteins, explore it in three dimensions, or even four if we include time, because proteins can change their shape.

How does this "accidental" relationship arise?

pp309-10. the idea of a cause-effect relationship between the structure of biological objects and their function is so well established that it motivates the work of hundreds of thousands of scientists around the world. But I have tried to show that the causal relationship between the architecture of biological objects and their function is often arbitrary and accidental. ...
    … adaptation occurs a posteriori, and not a priori, because there is no final cause. The living being that survives a critical situation did not know in advance what would save it, but, having found the solution, and because it has survived, it passes that solution on to its descendants, thus preserving and multiplying it. Such a solution is always some way of establishing a link or relationship between processes, events, or objects. The link is part of a structure, but it was the function that revealed it, so it is the function that ensures that the structure is retained. Function does not create structure, but discovers it when it is needed.

So the laws of physics are important in biology, but not always in the same way they are in the domain of physics itself. For example, that old bugaboo, the second law of thermodynamics and the increase of entropy, looks different through the lens of biology, where a statistically "representative" ensemble is neither realisable (because of the size of the state space) nor explored (because of the selective effect of evolution).

p191. an increase in entropy, in accordance with the Second Law of Thermodynamics, simply means that objects will spontaneously explore all the environment accessible to them ... In this context, irreversibility … is simply the expression of the fact that the total "space" of states and positions available to the objects considered can only increase, in the absence of any ad hoc constraint ...
    However, we must insist that not everything is possible, because time is also a crucial consideration. It is meaningless to consider states that are theoretically possible but are inaccessible for lack of time. Once the number of objects considered is over a certain minimum, the number of possible states is so vast that they cannot all be explored. This is a fundamental flaw in the statistical model, right from the outset, and it is important to bear this in mind when considering what happens in real cases, but unfortunately this is almost never done. Entropy is therefore nothing but a measure of the extent to which everything that can be occupied is actually occupied, and an increase in entropy only accounts for the exploration of all this new space (perhaps we should say its creation, to mark the fact that it represents a transition from a virtual state to a real state, since the nature of the initial space was different from the nature it acquires when explored, because of the possibility of new interactions).

Having made the point that biology adds a layer of symbolic functionality and control on top of physics, he draws the analogy with computation, or information processing, that also has these two layers. (He is emphasising here that the functionality needs to be considered in addition to the physical implementation; ironically in computation the functionality is primary, and the fact of a physical implementation is often neglected.)

pp212-3. Turing's … approach distinguishes between symbolic processes, which control the interactions between objects, and the physicochemical nature of the underlying processes. Provided that the machine can actually exist as a material reality, its physical nature is not important, so long as it can establish the necessary relationships between the strings of symbols. This duality of the symbolic and the physical nature of things is a characteristic feature of living organisms: they are compatible with physics, but they cannot be deduced a priori from its laws. ... Physics represents the inevitable and universal constraints on things, whereas life will always try to take control.

Danchin makes the point that the physicochemical nature of the underlying processes can be separated from the symbolic processes both in biology and in computation. Laughlin points out that when (emergent) properties are insensitive to the substrate (as in this case) you can't draw conclusions about the substrate from them. So we shouldn't expect to be able to draw conclusions about the biochemical substrate from observing the biological processes. Which is good -- it explicitly admits the possibility of life based on other substrates.

So this control layer is (somewhat) independent of the underlying laws of physics. We design this control into our computers; life evolves this control:

p219. It is precisely because the cell functions using just local, basic operations (of the type connect/disconnect, or presence/absence) that life is possible without there being any external causality. ... It is the result of the succession of a very large number of simple events, which became organized essentially because this worked. The only systems (organisms) that have survived are those which were able to bring together relationships that were locally extremely simple and probable, and to combine them in the structured way we know today. Selection by existence (which is merely a principle of stability) is an infinitely powerful way of discovering precisely what is stable enough over time to be able to survive in a given environment. One property of the stability principle is systematic evolution toward ever-increasing control over the unavoidable physics of the world. And the object of biology is to discover the principles of this evolution toward increasing stability.

One mistake people often make when pursuing the computational analogy is to assume that the program is all there is -- an approach that developmental biologist Jack Cohen for one strongly decries. Danchin does not make this mistake, but explicitly brings in the role of the environment, providing data, and providing the context where the symbols gain their meaning:

p270. there is no one-to-one correspondence between a gene and its expression. In particular, a gene may or may not be expressed, depending on the cell's environment. This is obvious in multicellular organisms such as mammals---a skin cell does not express the same proteins as a brain cell, and when it divides it produces more skin cells, not neurons, despite the fact that both of them must have the same DNA content, and therefore the same genetic program. This same program can thus produce different outcomes, demonstrating that the external environment is an intrinsic part of the way the program is expressed, because it contains the data that determine the outcome. A cell can be defined as a machine that puts the genetic program into operation according to the data provided by its environment.
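
The "same program, different data" point can be caricatured in a few lines of Python. This is a deliberately crude sketch of my own: the gene names, the "tissue" key and the idea of a gene as a simple on/off condition are all hypothetical simplifications, and real gene regulation is far richer than a boolean switch.

```python
def express(genome, environment):
    """Products of every gene whose (hypothetical) condition holds in this environment."""
    return sorted(
        product for product, condition in genome.items() if condition(environment)
    )

# One "genetic program", shared by every cell. Each entry maps a product
# to a made-up rule saying when the corresponding gene is switched on.
genome = {
    "keratin": lambda env: env["tissue"] == "skin",
    "neurotransmitter_receptor": lambda env: env["tissue"] == "brain",
    "ribosomal_protein": lambda env: True,  # expressed everywhere
}

# The same program run on different environmental data.
skin = express(genome, {"tissue": "skin"})
brain = express(genome, {"tissue": "brain"})

print(skin)
print(brain)
```

The genome is identical in both calls; only the data supplied by the environment differs, and with it the set of products.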

p176. The laws specific to biology are able to exist because of a particular aspect of their role: they do not affect the nature of physical and chemical objects, but govern the relationships that exist between certain objects. These objects have a meaning, which is connected to their function in the physicochemical processes of life. This gives them an original order of abstraction, quite distinct from what physics tells us: .... This space-time plan, this program that links together the material objects of physics in order to compose a living organism, is an abstraction. However, it cannot be regarded as arbitrary or as existing in itself, without the material support of the physicochemical objects of life. The links in question are not just any links; they have original properties which we must try to understand. They are the result of a continuing selection, in the normal course of an evolutionary process that can be measured by the survival and existence of the organisms in question.

Danchin takes the computational analogy further than most. For example, he considers the Kolmogorov (algorithmic) complexity of the genome, and what it might tell us:

p224. compress the sequence … to understand how the sequence has been generated in the course of evolution. A genome is not a random piece of DNA, but the result of evolution through duplication, recombination, mutation, and so on, and all these processes could be described in terms of algorithms.
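
A rough way to get a feel for this is to hand sequences to a general-purpose compressor. The sketch below is mine, not the book's: Kolmogorov complexity itself is uncomputable, so the compressed size from Python's zlib is only a crude upper bound on it, and both toy sequences are invented (one built by repeated duplication of a short motif, one drawn at random with a fixed seed).

```python
import random
import zlib

def compressed_ratio(seq):
    """Compressed size divided by original size: a crude proxy for algorithmic complexity."""
    data = seq.encode("ascii")
    return len(zlib.compress(data, 9)) / len(data)

# A sequence "generated" by a very short algorithm: duplicate a motif 250 times.
duplicated = "ACGTTGCA" * 250

# A sequence of the same length with no generating rule beyond chance.
rng = random.Random(0)  # fixed seed so the example is repeatable
shuffled = "".join(rng.choice("ACGT") for _ in range(len(duplicated)))

print(compressed_ratio(duplicated), compressed_ratio(shuffled))
```

The duplicated sequence compresses far better than the random one, because the compressor has in effect rediscovered the short "program" (repeat this motif) that produced it. A real genome sits somewhere in between, which is what makes the exercise informative.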

Here again, time is of crucial importance: an algorithm is no good if it takes too long to execute! This relates to a complexity in terms of algorithmic, or logical, depth:

pp229-30. Given a particular sequence, we will want to look for algorithms that will generate it, but we must always keep in mind the parallel importance of evaluating the program's run time. ...
    ... we should never speak of what is potential in the same way we do of what is real. It may be meaningless to speak of potential, because it may be impossible to realize that potential explicitly in the time available.

Danchin goes all the way to Gödel and the Halting Problem:

p232. Although it is not possible to go into detail here, the connection between the halting problem and the finite character of genome texts, if they are considered as algorithms, suggests that their formal properties are worth studying in detail, as a source of mathematical conjectures. By the very fact that they exist, they prove that it is possible for an algorithm to have a critical structure, a critical depth, which is related to their capacity to reproduce themselves in a given environment, while at the same time producing the machine that runs them.

It is a real pity he doesn't go into further detail -- it was getting interesting!

His dislike of the modern application of physical complexity science to biology resurfaces, and he instead describes developmental systems in terms of construction via algorithmic description:

p239. Because many reproducible structures exist in physics (branching structures, cells, circles, spheres, and so on), many thinkers looked to certain physical or mathematical principles to explain the genesis of forms in biology. According to these ideas, life has simply rediscovered the general principles that govern physics …. This horribly reductionist, Platonist attitude prevailed for a long time. It is still sometimes popular among those who know nothing of biology, because they fail to understand two vital things: first, that the functions which construct, or which ensure control, have an essentially symbolic role; second, that the important form that is preserved in organisms is not the final shape, but the form of the algorithm that constructs it. …
Life certainly uses the principles of physics … but just as a basic vocabulary, a set of elementary processes, organized into a program, not as the main construction principle of life.

Algorithms provide iteration and (spatial and temporal) combinatorics, which lead to a biological-style complexity of developmental processes:

p241. The processes are all extremely simple in themselves, but the way they are strung together is complex, because it is compartmentalized in space and time. Although the diversity of the control elements is limited, their combinatorial possibilities are extremely rich.

p243. What preexists is not the organism itself, but the preformation of a development algorithm. … what heredity passes on is not the form, but its construction program. The successive expression of control genes, activated or suppressed one after another, enables morphogenesis to take place (while respecting and making use of the constraints of physics, of course, such as the rules of overall symmetry).

Messing about with this developmental program can have macroscopic, structured effects, such as growing legs where antennae should be. Even:

p244. The organization is so hierarchical that modifying a single gene, Lim-1, produces animals without a head.

Even though all organisms have a control level, it can be more sophisticated in some than in others:

pp244-5. there is a significant difference between mammals and insects. In mammals, instead of a single linear arrangement corresponding to the layout of the insect, there are four linear arrangements, arranged exactly as in the fly, and also corresponding to the animal's development from the tail to the head. ... This discovery accounts for mammals' greater complexity compared to insects: the construction algorithm is produced by the combination of four homologous procedures working simultaneously. It also explains how the segmented character so visible in insects (mostly at the larval stage, of course) is much weaker in mammals. We can also definitely see signs of evolution by duplication of the genetic program, which suddenly makes new properties appear---the effects of duplication are not only quantitative, they also create new relationships de facto.

In programming, it is important to be able to remove old objects as well as create new ones. The same is true of the developmental approach: scaffolding is erected, then removed:

p242. During this development, certain cells are programmed to disappear, leaving room for other cells which are differently differentiated, and which could not otherwise have developed. It is thus important to note that development includes a significant element of absence, as distinct from presence, so that a "negative" form plays a role in development that is just as important as that of a positive presence.

This focus on the processes and relationships between objects within the cell leads to a definition of life here in terms of four features:

p253. The processes that make life are metabolism, compartmentalization, memory, and manipulation. Metabolism and compartmentalization are organized by small molecules (comprising a few tens of atoms, with a carbon skeleton), whereas memory and manipulation are controlled by nucleic acids and proteins, so the scale of their basic components is that of macromolecules ... Two spatial scales are thus interlinked in all living processes, which operate on a mesoscopic scale, intermediate between our macroscopic world and the microscopic world of atoms. This is the scale that is revealed in the geometrical program superimposed on the genetic program in the genome.

This does not fully carry over into the computational analogy:

p253. Reconciling all these processes has seemed so difficult that ... at the conceptual level, when comparisons have been made between life and Turing machines, the general principles for the construction of a self-replicating machine have nearly always overlooked the need for compartmentalization and metabolism.

This definition of life might seem to lead to a clear answer to the problem of viruses, but it is, of course, never that simple:

p254. This means that organisms such as viruses, which do not metabolize, cannot be considered to be straightforward living organisms. They must be studied for what they are: pure parasites, a memory that perpetuates itself at the expense of a genuine life, that of the cell they have infected. Of course they are not similar to the usual non-living matter found on the Earth; they seem to be artifacts created by life

It might seem that 300-odd pages is a long time to say "life has evolved symbolic relationships between its objects, and has an algorithmic development program". But there is much more to it than that. The thesis is backed up by detailed biological explanations, juicy physics and computational explorations, and interesting excursions into the philosophy of science. For example, he has some important things to say about the practice of (biological) science. In particular, on the important role of models, theory, and abstraction in science, when we need to move beyond "stamp collecting", beyond observed phenomena, he says:

p124. It is difficult to connect the text of genomes with biological functions. Knowing the text of a gene, predicting the sequence of the protein it specifies, visualizing its architecture, does not directly give us its function. The best we can do is to modify the gene or inactivate it and to study the genetically modified organism. But then we are faced with the difficult situation of studying phenomena … What is the best way forward? How should we interpret what we observe, and avoid taking our wishes for reality? Unlike in a number of domains of physics, where phenomenology is already well established and the theoretical, a priori approach is highly developed, we are not in a position to make a model of what we want to observe according to the criteria I have outlined. First we must observe and account for a phenomenon: growth under certain conditions, use of a particular molecule, sensitivity or resistance to a particular variation of a physical parameter. Simple phenomenology, because of its approach in which observation is only very loosely connected to a well-defined and delimited theoretical corpus, is on the borderline between science and an unstable form of thought, often close to a kind of primitive magic. This is not often recognized, but it explains why a large part of scientific work, even work that is institutionally recognized, is in fact of very little value in advancing scientific knowledge. It also explains the existence of many activities in the field of biology that are close to ignorance or even fraud.

This is a passionate book. It is a translation from the French, and in the acknowledgements he thanks his translator for making the transpositions required by the move from a Latin culture to an Anglo-American one. On the whole, this succeeds, but I feel there is still a French style peeking through in places, particularly in the philosophical stance. This is a good thing; it would have been sad to lose this flavour in the translation.

Danchin ends on a slightly depressed note, with references to 9/11, and the smallpox virus, but I think there is optimism in the observation:

p325. what we create cannot be reduced to what we are

Recommended -- but expect to take some time over it.