On The Analysis Of Biological Mutations
Browsing MEDAL Blogging, I got a nice link to a post at Olivia Judson's New York Times blog, see it here: A Mutual Affair.
That link was the pointer to another Olivia Judson's post: A Random Analysis. It deals with the biological nature of DNA mutations. Are they really random? Do they follow some kind of probabilistic distribution (or density) function? Are they weighted? Do small mutations happen much more often than the larger ones?
Of course, those questions are mine, not hers. :)
She states that mutations are small modification on the biological blueprint and, depending on the way they happen - small or large -, may affect seriously or not the living being's evolutionary path.
The manner through which mutations occur - be these deleting, inserting, and so on - can also affect the genome's configuration of a specimen. She says that when deletions are more frequent than insertions, the genome's configuration is prone to be compact and small, what may influence the specimen phisiology - those creatures holding small genomes own a heavy metabolic and growing rates.
Mutations are not only random, but also work different from species to species and the mutation flavours (deletion, insertion, etc.) also occur at different rates when taking into account other species. For example, in humans, deletions are more common and prominently working on DNA bases' repetitions, such as ATATAT or AGCAGCAGC. Why do mutations often occur on those repetitions? She explains:
"The reason mutations to repeated sequences are so common is that, in such repeats, it’s easy for the DNA copying machinery of the cell to slip and lose its place, and then put in too many repeats, or too few. (Even for a person, copying something like AAAAAAAA is harder than copying ACTGTCAG. Ahhh!) And although, obviously, these mutations can only happen in part of the genome where there is a repeated sequence, they happen at such a high rate that each of us probably carries as many new slippage mutations as “point mutations” — mutations that swap one base for another, say A for C."
Genomes, as Judson said, hold mutational hotspots and coldspots. These hotspots seem to be long repetitions of DNA bases, since "copying a long repeated segment without slipping is more difficult than copying a short one." Some creatures seem to have evolved their own kinds of mutational hotspots to aim for evolutionary profits - for instance, a pathogen developing the "stealth" ability before the immune system "eyes".
She advises that the manner mutations happen may trap a species in a specific evolutionary path, denying it the exploration of other evolutionary routes:
"But the mutational peppering has a consequence. As I mentioned at the start of this article, an important source of evolutionary novelty is when one member of a pair of duplicated genes evolves to take on a new function. In Neurospora [bread mold] this can’t happen: duplicated gene pairs get destroyed. Its use of mutations to defend its genome from invasion may have inadvertently blocked off some evolutionary paths."
Very nice article! :)
My Very Own Biased Comments
I think that there are so many lessons the evolutionary computation field may learn from Olivia Judson words. Of course, her small and layman article is just an initial step in that direction.
01. On Genetic Coldspots And Hotspots
For example, there is nothing in genetic algorithms that models the so-called mutational hotspots. On the contrary, take a common genetic algorithm book and the author - likely - advises you configuring the mutation rate between the range [0.0001, 0.001]. Crossover rates are set up at some point between [0.6, 0.9]. It seems that macromutations are avoided when it comes to genetic algorithms. When dealing with problems holding strong dependencies among theirs parameters, such a set up may even be harmful for the whole optimization process, since high crossover rates - along small mutation rates - tend to allow gene pool diversity losses, what can stuck the whole population in a specific local optima and/or optimization track. Sure, the GA community has made works on modelling genetic phenomena, such as gene linkage and viral infection.
Of course, detecting and handling the mutational coldspots is very valuable too, since we could avoid flipping bits (or a set of them) that must undergo small modifications or none at all.
I think that the optimization process of a given real world problem could benefit in some sense from that flavour of genetic phenomenon, since mutational hotspots and coldspots seem, at a first glance, to be useful for escaping local optima - remember: some problems require the modification of ALL parameters at same time to escape a local optimum. BUT... they could be harmful too. To avoid that drawback, mechanisms of self-adaptation could help to overtake that type of situation and could even implicitly identify mutational hotspots and coldspots, handling each one according to their respective nature.
Mutational hotspots and coldspots in non-decomposable problems should be addressed considering the relationship between those bits located at those spots and the others - sure, and between those spots' bits themselves. A well designed mutation operator would be very important here. (Again, a self-adaptation mechanism would be helpful.)
Upon the evolution without mutation, see her another nice article: Stop The Mutants!
Another important point has to do with the detection of those mutational hotspots and coldspots. Although there are some works from the Estimation of Distribution Algorithms (EDA) community handling problems that loosely resemble those genetic phenomena (see the Extended Compact Genetic Algorithm or the Hierarchical Bayesian Optimization Algorithm), they were designed under the statistical philosophy of the EDAs - an approach that throws away the genetic stuffs in genetic algorithms replacing them by statistical sampling and probabilistic distribution functions. Let alone that their bioinspired aspect is completely flawed, since in nature there is not an entity that captures data, analyses them, and etc. I guess those methods were designed much more as an optimization tool rather than intended to be a medium to increase our understanding of biological evolution. As optimization tools, those methods have a nice performance, mainly when it comes to discrete nearly decomposable problems. Summing up this paragraph: Despite a loosely resemblance between some genetic phenomena and the inner working of some EDAs, the later were not bioinspired.
For the EDA enthusiasts, I left a simple question: Is there a probability distribution that behaves in the same way mutational hotspots and coldspots do? :)
My hunch: I guess there is!
02. On Gene Deletion And Duplication
One of the few works I am aware of using abstractions of those two genetic phenomena is Professor Hans-Paul Schwefel's nozzle experiment, see it here: Optimization of a Two-Phase Nozzle with an Evolution Strategy. John Koza has used a kind of deletion and duplication too.
He got impressive results applying those genetic phenomena as an experimental optimization abstraction.
Those nozzle designs were obtained through a (1+1)-ES - without computer!
03. On Long DNA Base Repetitions
I have little to say here, since in "01" I have said a lot about a similar phenomenon. But, the way the genes interact in DNA seem to be much more important than their arrangement itself.
In the evolutionary computation realm, that problem has been addressed by linkage learning techniques in genetic algorithms and through correlated mutations in evolution strategies.