Friday, April 11, 2008

On The Analysis Of Biological Mutations

Browsing MEDAL Blogging, I got a nice link to a post at Olivia Judson's New York Times blog, see it here: A Mutual Affair.

That link was the pointer to another Olivia Judson's post: A Random Analysis. It deals with the biological nature of DNA mutations. Are they really random? Do they follow some kind of probabilistic distribution (or density) function? Are they weighted? Do small mutations happen much more often than the larger ones?

Of course, those questions are mine, not hers. :)

She states that mutations are small modification on the biological blueprint and, depending on the way they happen - small or large -, may affect seriously or not the living being's evolutionary path.

The manner through which mutations occur - be these deleting, inserting, and so on - can also affect the genome's configuration of a specimen. She says that when deletions are more frequent than insertions, the genome's configuration is prone to be compact and small, what may influence the specimen phisiology - those creatures holding small genomes own a heavy metabolic and growing rates.

Mutations are not only random, but also work different from species to species and the mutation flavours (deletion, insertion, etc.) also occur at different rates when taking into account other species. For example, in humans, deletions are more common and prominently working on DNA bases' repetitions, such as ATATAT or AGCAGCAGC. Why do mutations often occur on those repetitions? She explains:

"The reason mutations to repeated sequences are so common is that, in such repeats, it’s easy for the DNA copying machinery of the cell to slip and lose its place, and then put in too many repeats, or too few. (Even for a person, copying something like AAAAAAAA is harder than copying ACTGTCAG. Ahhh!) And although, obviously, these mutations can only happen in part of the genome where there is a repeated sequence, they happen at such a high rate that each of us probably carries as many new slippage mutations as “point mutations” — mutations that swap one base for another, say A for C."

Genomes, as Judson said, hold mutational hotspots and coldspots. These hotspots seem to be long repetitions of DNA bases, since "copying a long repeated segment without slipping is more difficult than copying a short one." Some creatures seem to have evolved their own kinds of mutational hotspots to aim for evolutionary profits - for instance, a pathogen developing the "stealth" ability before the immune system "eyes".

She advises that the manner mutations happen may trap a species in a specific evolutionary path, denying it the exploration of other evolutionary routes:

"But the mutational peppering has a consequence. As I mentioned at the start of this article, an important source of evolutionary novelty is when one member of a pair of duplicated genes evolves to take on a new function. In Neurospora
[bread mold] this can’t happen: duplicated gene pairs get destroyed. Its use of mutations to defend its genome from invasion may have inadvertently blocked off some evolutionary paths."

Very nice article! :)

My Very Own Biased Comments

I think that there are so many lessons the evolutionary computation field may learn from Olivia Judson words. Of course, her small and layman article is just an initial step in that direction.

01. On Genetic Coldspots And Hotspots

For example, there is nothing in genetic algorithms that models the so-called mutational hotspots. On the contrary, take a common genetic algorithm book and the author - likely - advises you configuring the mutation rate between the range [0.0001, 0.001]. Crossover rates are set up at some point between [0.6, 0.9]. It seems that macromutations are avoided when it comes to genetic algorithms. When dealing with problems holding strong dependencies among theirs parameters, such a set up may even be harmful for the whole optimization process, since high crossover rates - along small mutation rates - tend to allow gene pool diversity losses, what can stuck the whole population in a specific local optima and/or optimization track. Sure, the GA community has made works on modelling genetic phenomena, such as gene linkage and viral infection.

Of course, detecting and handling the mutational coldspots is very valuable too, since we could avoid flipping bits (or a set of them) that must undergo small modifications or none at all.

I think that the optimization process of a given real world problem could benefit in some sense from that flavour of genetic phenomenon, since mutational hotspots and coldspots seem, at a first glance, to be useful for escaping local optima - remember: some problems require the modification of ALL parameters at same time to escape a local optimum. BUT... they could be harmful too. To avoid that drawback, mechanisms of self-adaptation could help to overtake that type of situation and could even implicitly identify mutational hotspots and coldspots, handling each one according to their respective nature.

Mutational hotspots and coldspots in non-decomposable problems should be addressed considering the relationship between those bits located at those spots and the others - sure, and between those spots' bits themselves. A well designed mutation operator would be very important here. (Again, a self-adaptation mechanism would be helpful.)

Upon the evolution without mutation, see her another nice article: Stop The Mutants!

Another important point has to do with the detection of those mutational hotspots and coldspots. Although there are some works from the Estimation of Distribution Algorithms (EDA) community handling problems that loosely resemble those genetic phenomena (see the Extended Compact Genetic Algorithm or the Hierarchical Bayesian Optimization Algorithm), they were designed under the statistical philosophy of the EDAs - an approach that throws away the genetic stuffs in genetic algorithms replacing them by statistical sampling and probabilistic distribution functions. Let alone that their bioinspired aspect is completely flawed, since in nature there is not an entity that captures data, analyses them, and etc. I guess those methods were designed much more as an optimization tool rather than intended to be a medium to increase our understanding of biological evolution. As optimization tools, those methods have a nice performance, mainly when it comes to discrete nearly decomposable problems. Summing up this paragraph: Despite a loosely resemblance between some genetic phenomena and the inner working of some EDAs, the later were not bioinspired.

For the EDA enthusiasts, I left a simple question: Is there a probability distribution that behaves in the same way mutational hotspots and coldspots do? :)

My hunch: I guess there is!

02. On Gene Deletion And Duplication

One of the few works I am aware of using abstractions of those two genetic phenomena is Professor Hans-Paul Schwefel's nozzle experiment, see it here: Optimization of a Two-Phase Nozzle with an Evolution Strategy. John Koza has used a kind of deletion and duplication too.

He got impressive results applying those genetic phenomena as an experimental optimization abstraction.

Those nozzle designs were obtained through a (1+1)-ES - without computer!

03. On Long DNA Base Repetitions

I have little to say here, since in "01" I have said a lot about a similar phenomenon. But, the way the genes interact in DNA seem to be much more important than their arrangement itself.

In the evolutionary computation realm, that problem has been addressed by linkage learning techniques in genetic algorithms and through correlated mutations in evolution strategies.

Labels: , , , , , , ,

Thursday, April 10, 2008

EvoWeb Interviews John Koza

Check his interview here.

I have not been aware that John Koza was John Holland's student along the 1960s.

Interesting interview, despite my own disagreements on some points.

Koza summarizes his path towards genetic programming (GP), the choice for LISP as the main computer language of early GP implementations, the initial papers, the hostility from the GA community at that time (early 1990s), his practitioner style, his achievements, the 1000 nodes computer cluster at Stanford, and so on.

The interview is from 1998.


P.S: There is someone that extremely disagrees upon Koza's statements.


Labels: , , , , , , ,

Friday, April 04, 2008

Evolving Movie Contracts Through Simulated Evolution

MEDAL Blogging has a nice link to an application of genetic algorithms that I have not seen so far: movie contracts between distributors and theaters. See the whole story here.

The author of that investigation, Eunkyu Lee, implemented a hybrid method using genetic algorithm and game theory to help in that kind of business among the parts.

One of the current structures of those contracts is the so-called slide scale:

"These contracts usually come in the form of a sliding scale: in the first week, when the movie is most in demand, the studios get a larger share of the revenue - sometimes as much as 80 percent. As time goes by and demand decreases, the shares even out or reverse, giving theaters a more favorable cut of the revenue. These weekly revenue shares are negotiated on a movie-by-movie basis and are typically augmented by exception clauses for unexpectedly strong box office performances."

The method applied is helping to bring to light new ways to those structures:

"New research by Eunkyu Lee, associate professor of marketing in the Whitman School of Management at Syracuse University, questions the necessity of the complicated and costly contracting practice in the industry. Using a combination of game-theory and an optimization technique called genetic algorithm, Lee and his research partners analyzed different contract structures in simulated movie markets and compared their effectiveness and performance to the current sliding scale structure.

'The major finding of this analysis is that the contract between studios and theaters does not have to be as complicated as the current industry practice,' says Lee. 'Much simpler contracts such as a 50-50 split or a two-part tariff are just as effective as or better than the sliding scale, and they are also not as costly to implement.'"

Remember: "We'll always have Evolutionary Computation." :)

Labels: , , , , , , , ,

Wednesday, April 02, 2008

Evolving Genomics Through Simulated Evolution

Very good pointer from Epistasis blog, see it here.

It outlines the urgent need of computational intelligence approaches in genomics in general and what kind of (tremendously) high dimensional problems that field is facing nowadays. Jason Moore - the article's author - claims computational intelligence can ease those dificulties, softnening, in some sense, the problems' hardnesses.

The section four is of main interest for evolutionary computation fans, since it deals a lot with genetic algorithms, non-linearities in gene interaction, and so on.

Labels: , , , , , ,

Tuesday, April 01, 2008

Evolutionary Computation Researcher In Antarctica Has Found The Best Time Complexity Evolutionary Algorithm

My news webservice has brought to me the following evolutionary computation findings:

"An international mission based in Antarctica in a special lab has found the best time complexity evolutionary algorithm ever! Professor Pangloss Van Thunder-ten-Tronckh is the director of the project, which began along the late 1970s and had the former U.S.S.R. as the main manager beside U.S.A - Professor Trollov Buraninev is the director of the U.S.S.R. part. 'It is a finding that will open the doors of a new kind of science among all other sciences', said Professor Pangloss. He continues: 'This discovery is to evolutionary computation as the calculus was to physics and mathematics.'

Professor Pangloss states that his team was the first to solve an over a billion variables problems and they did it using just an AMIGA 500. He comments: 'Although some guys from Urbana-Champaign have claimed that they are the first ones, it is not true! As we are funded by CIA and the former KGB, our results never made the big mainstream publication magazines!'.

He claims he was the inventor of the Estimation of Distribution Algorithms (EDA): 'What outside Antarctica evolutionary computation researchers call EDA, was invented by me when I was an undergrad student during the 1950s! It was a homework that my advisor gave me.'

He also said that what Professor Hans-Georg Beyer has made upon the theory of evolution strategies is, indeed, a less elegant version of what Professor Buraninev's team had already done in 1981. Professor Buraninev states: 'I know very well the work of those German guys! Of course, it is just a lower quality flavour of my own research that I started in April 1st, 1977. What Professor Beyer has made is valid only under certain conditions, that is, Gaussian mutation, intermediate recombination, and comma/truncation selection - let alone that my self-adaptation mechanism is much more general than Schwefel's and/or Rechenberg's, since it can adapt to any kind of fitness/objective function! We made a complete theory of evolution strategies under all imaginable conditions, taking into account correlations, covariances, discrete recombination, noise, plus selection, comma selection, Gaussian mutation, Cauchy mutation, Gamma mutation, Geometric mutation, Bernoulli mutation, Poison mutation, Laplace mutation, uniform mutation, Zipf-Mandelbrot mutation, Student mutation, and so on... both univariate and multivariate distributions! And I did it in my special built Agat computer! And... remember: In Soviet Russia genetic algorithms rotate... YOU!!!'

Professor Pangloss comments about a strange guy that once entered his iglu: 'Yes... he was very strange! He stayed here for almost two weeks and he worn every single day only striped shirts! He promised me that he would teach us how to always win when betting in lottery. I thought him a nice guy and showed him my new invention - Programs that generates, through simulated evolution, others programs! I gave him a source code in LISP! To my very own desperation, years later I was told that he had invented an invention machine and was using my code on it!'

Professor Buraninev once met a Californian guy from La Jolla: 'Yes! That guy once told me that he would introduce me to a friend of his, her name was Blondie24. At a first glance, I thought her name a little strange, since girls do not use to put numbers on their names. But he stated that that was a new Western fad and that the girl was a true master in checkers. Oh, heavens! I would finally realize my Soviet Nerd dream: Date a girl which also is a checkers master! Then, he fixed a day for a meeting. I believed... how fool I was! While the girl, the so-called Blondie24, had not arrived, he showed me some of his father's work on automata. I decided to show him my new evolutionary algorithm flavour which is almost the same as evolution strategies, except on the selection mechanism and the self-adaptation procedure. The guy went insane! He immediately ran away from the Iglu Cafe! Some years later, I got aware that he was engaged in a new business endeavour and using my mind child!! The girl, Blondie24, has never appeared!'

Well, despite all those confusions along the years, Professor Pangloss is really happy to announce the ultimate finding of his lab team. He says: 'Finally, after 30 years of research, we are able to declare that we have found an evolutionary algorithm whose complexity is only O(n-100-100)! This is the complexity of our algorithm in ALL classes of problems! Both decomposable - the easy ones! - and non-decomposable!!' So, where is such a poweful evolutionary algorithm? Professor Pangloss explains: 'Since we have been fooled by those guys from mainstream academia, we will not show how our procedure works! It is even impossible to rescue its inner working, because this was written on a sheet of ice that Professor Buraninev applied on his vodka research!'

But... what about computer codes and simulations? Again, Professor Pangloss has an explanation: 'Our codes, simulations, and computers were all eaten out by a blue whale! This animal is very common by this side of the world and even ate my favourite grad student!'

It seems that evolutionary computation will need to wait a little more to be completely formalized and put on formal grounds!"

Labels: , , , , , ,

Charles Darwin Has A Posse Check Google Page Rank