Friday, June 12, 2009

Forty Five Years Of Evolution Strategies


Today, June 12th 2009, is the 45th anniversary of an evolutionary algorithm: Evolution strategies.

So, to celebrate this, somehow, important date to evolutionary computation, I have made a brief interview with Professor Hans-Paul Schwefel concerning what the "unofficial" Arbeitsgruppe Evolutionstechnik had to overcome during the creation of this new optimization procedure. The questions focused on the period before, during, and after the innovation process that brought evolution strategies into life and how he sees the contemporary approaches being used. Here you are the interview.

By the way, I would like to invite you to read the new and much improved post Evolutionary Computation Classics - Volume I, that tells the long version of the story below.

I hope you enjoy it!

01. Could you tell us a little about growing up in post WWII Germany and your high school later years at Canisius College? What kind of student were you? Did your daily life orbit around school-home and home-school?

School went smooth without any problems and without highlights until October 4th, 1957 (launch of the first artificial earth satellite 'Sputnik 1'). From then on, I wanted to become astronaut and became one of the best pupils, especially in maths. I had also lessons in playing the violin, was member of the school orchestra and the school choir, but more important for me were the actions as a boy scout, where I was urged to play a leader's role, very soon - until I needed more time for studying. B.t.w. I needed 90 minutes each per day for the way to and from highschool (class 6 to 13).

02. Why did you choose aerospace technology engineering as your undergraduation degree? Did the brain drain Germany faced (e.g., Operation Paperclip) have any influence on your decision?

Why? - see above! I was not aware of the brain drain (at least I don't remember) and due to my family's financial situation studying away from home was unthinkable.

03. When you were admitted at TUB, during your initial semesters, you had any (at least the vaguest) idea of working with computers and simulation in general?

No, since there were only mechanical calculators operated by means of a rotatable handle, at that time. The first available computer (in the basement of the institute of mechanics) became 'mine' during night times for my diploma thesis' simulation experiments.

04. Before joining in TUB's Hermann Föttinger-Institute for Hydrodynamics (HFI), had you already heard of your evolution strategies fellows (Ingo Rechenberg and Peter Bienert)?

No. I became engaged by invitation from the Head of the institute, because I was his best student (two times best marks in written exams).

05. How did you end up working at TUB's Hermann Föttinger-Institute for Hydrodynamics (HFI)? What was the academic feeling/environment there?

Besides of my duties, i.e. preparing and controlling other students' experiments, I spent much time with Ingo Rechenberg and later also Peter Bienert in performing 'our' work - until all of us were relegated, being said: "Cybernetics as such will no longer be done at this institute".

06. Was the idea of experimental optimization a well known practical method during 1960s engineering Zeitgeist or was it seen as an exotic way of doing research? What did the well established professors at HFI think about that?

The traditional way - as we saw it - was a sequence of creating hypotheses from first principles and known facts plus experiments to prove or falsify the hypotheses. Iteratively improving facilities or processes systematically was rare, though there was some literature about the 'design and analysis of experiments'. Our first successes were highly accepted and even mentioned in the press, but the established experts in the field of hydrodynamics were skeptical or even hostile. Thus we were forced to go back to traditional work or to leave.

07. Can we say the 1960s optimization Zeitgeist was all about linear and non-linear programming? Was gradient-based methods the archetype optimization methods of that time?

Exactly! Optimization methods were a topic of numerical maths. Some opponent said:  "We've got the optimal optimization method already..." (he meant steepest descent/ascent) "...and need no more".

08. What were your initial feelings after seeing the results of the experimentum crucis? At least in essence, does that experiment share some features with the modern evolution strategies? To whom the Arbeitsgruppe Evolutionstechnik reported the feedbacks got from that experiment? Could we consider the experimentum crucis the first evolvable hardware experiment ever made?

The first ES version operated with just one offspring per 'generation' because we did not have a population of objects to operate with. I termed it later (1+1)-ES in order to distinguish it from 'multimembered' versions. And we used discrete mutations in the vicinity of the parent's position. In my diploma thesis'(1964/65) work I demonstrated that such more or less local strategy can get stuck prematurely. Therefore I proposed to use probability density distributions like the gaussian one for continuous variables. But we were proud of having shown that the simple ES worked under noisy and (perhaps) multimodal conditions, whereas one-variable-at-a-time and gradient methods failed. And the simple ES was much more efficient (quick) than opponents had predicted. Their (sometimes even now maintained) misconception was to think of uniformly distributed mutations in the 
whole search space.

But: There was no proof of convergence nor any theory of the efficiency. Rechenberg's PhD. thesis from 1971 contained the first efficiency results, i.e. the local velocity of the (1+1)-ES in n dimensions of two fitness landscapes.

The first evolvable hardware was created with Peter Bienert's diploma thesis as FORO 1, the first research robot (FOrschungsROboter) able to handle several step motors calibrating e.g. potentiometers at some mechano-electric facility (actually a hybrid computer acting on a pneumatic control device). That was also around 1970.

After the relegation from the institute of hydrodynamics in 1966, I earned my money at industry until 1970 (where I did the nozzle experiments), whereas Ingo Rechenberg and Peter Bienert settled (without salaries) at another institute of the Technical University of Berlin There we met again after I got a grant from the German research foundation for work on comparing ES with other numerical optimization strategies on a digital computer. That money reached from 1970 to 1975 (I finished 'my book' in 1974, which was accepted as dissertation, the defense of which tool part only in 1975). Besides of both theses of ours there were no further publications between 1965 and 1971 since too much time went into writing grant proposals and struggle for survival (except an article of mine in my school's yearbook 1966 and the frequent technical reports for the grant giving authority, i.e. unpublished work).

09. Once I saw a Professor Rechenberg's PPT presentation (only the PPT) in which there was a Galton Box (or bean machine). Was that device used as the mutation operator for the experimentum crucis? Galton boxes may have different distribution/density probability functions (depending on the manner the user set it up), what was the distribution/density function of the box used by the "unofficial" Arbeitsgruppe Evolutionstechnik?

The Galton box was used for demonstration purposes only. Actually, we used some kind of children's playing chips with a plus sign on one side and a minus sign on the opposite side.

In the simplest case with two chips the result could be:
  ++ with probability 1/4 for changing a variable in positive direction, or;
  -- also with probability 1/4 for changing a variable in negative direction, or;
  +- (or -+) with probability 1/2 for no change.

With more chips one could of course produce broader binomial probability distributions.

10. What did the judge board think about your senior thesis (undegraduation final project)?

I got the highest mark for that work (I had written the task's requirement myself, which was accepted by the Head of the fluid dynamics institute). Ingo and me got our diplomas at the same time together with a prize from the German Association of Engineers (VDI) for the best 3 students of the year 1965 (Ingo being 6 years, or 12 semesters, older than me).

11. After graduating, you did not join in (right away) a MSc. course or something similar, but went to work repairing Lockheed Super Constellation aircrafts coming back from South America. Didn't you think it was a (somehow) "ungrateful" work compared to all the experimental optimization and computer simulation work you had performed at TUB some few years (months?) earlier?

To put things in right order, my times studying at the TU Berlin required not only at least 10 semesters of topical lectures and exercises with exams, but in addition to that 12 months of practical work (internships), 6 of which had to be completed before entering the first semester, the other 6 in between the following 11 terms, plus studies with exams in 4 non-technical domains like philosophy, laws and economics, arts and foreign languages. It was during the 6 months between the semesters that I joined Lufthansa's dockyards at Hamburg and had to do with Lockheed's SuperConstellations. I also joined a team of specialists concerned with oxygen/hydrogen rocket motors (later on used for Ariane missiles' position control motors) at Boelkow's company near Munich as well as a helicopter jet engines factory (Turboméca) in Southern France.

In Germany there was not such a sharp split between undergraduate and graduate studies. One had to pass a series of exams after about half of the time (the 'Vordiplom') but one could not yet get a job with only that part of the 'basic' studies, normally. For clarity, here is a schedule of mine: 1959 highschool ended, 6 months internship, beginning of studies (after I had in vain tried to get a pilot school place at Lufthansa - the school was closed that year, the only reason why I tried to bridge the gap by joining the university) 1959-1965 studies in aero- and space technology with emphasis on propulsion and further 6 months internships and 'humanistic' studies in between. 1965/1966 coworker at the institute of hydrodynamics (full paid work for studies in turbulent wall shear stress measurement techniques - together with Ingo Rechenberg ((ES things were done aside and not paid for)))

1967-1970 work at industry (AEG research group Berlin, concerned with work on some kind of liquid metal driven energy converter without rotating parts; the flashing nozzle being one essential part of it) 1970-1975 grants from the German research foundation (DFG) for self-defined work, two professors had to serve (better non-serve) as supervisors - a biologist and a control and measurement scientist (Ingo Rechenberg became professor himself by a transition rule in the reorganisation of the university, saying that all people with a high grade Ph.D. exam should be some kind of lower-grade professors; that was around 1973/74 when his dissertation became published as a book).

1976-1985 first 1/2 year on a grant from a research project at Hanover, for which I developed a model of non-genetic variance among cloned guinea pigs; then work in Juelich between Cologne and Aachen in the nuclear research centre (KFA) as systems analyst in a working group doing simulation models of the whole energy system (demand, conversion, consumption with all kinds of energy carriers) in Germany, the European Union and beyond.

1985... professor at a new chair of systems analysis (applied computer science) at the (now also Technical) University of Dortmund - gained for my experience in systems analysis, not evolutionary computation.

12. The two phase flashing nozzle experiment is an interesting example of evolutionary optimization, do you think it would be a practical idea to apply the same approach (or a computer assisted one) to larger nozzles, such as those employed in aerospace industry/research?

One-component one-phase nozzles are theoretically well established and can be designed easily by means of known physical laws. Two-component nozzles (e.g. gas and solid particles in the flow) are a bit more difficult to design due to the shear stresses between fast fluid and slower particles, but there is a lot of empirical knowledge, now.

The problem with 'my' one-component two-phase case suffers additionally from non equilibrium thermodynamics of the phase transition from liquid to a mix of steam (fast) and liquid droplets (slow, also tending to cluster on walls and thus losing their energy content). I think that even now nobody is able to simulate these processes with boiling delay and supersonic shocks in the divergent part of the nozzle in order to optimally design such a flashing nozzle by means of CFD (computational fluid dynamics). But hot water rocket users might make use of my experiences. I tried to correspond with them - they not even answered a line.

13. Basically, what was the core of your work at AEG and KFA Jülich? Were there many opportunities of applying evolutionary computation in their problems?

After my success with the nozzle I was urged to manage a larger project, so that I tried to escape from such non-scientific work as soon as possible. After work I met Ingo Rechenberg and Peter Bienert at their site (control and measurement institute) and hoped for success of my grant application.

14. What was the main subject of your PhD./Dr.-Ing, thesis?

Comparison of Evolution Strategy (ES) with traditional numerical optimization methods, including improvements of ES (e.g. self adaptation of internal parameters, which lead to the later standard (μ, λ) versions), and trying to enhance theoretical analyses.

15. We may consider the years of 1964-1970, at large, as an early developmental and test of concept period for evolution strategies. How would you qualify (for evolution strategies) the subsequent decade, that is, the 1970s? Could we say it was during this decade the self-adaptation mechanism took place?

Exactly. But in 1976 I turned away because such work did not pay, and I became some kind of futurologist at Juelich.

16. After being admitted at Dortmund University as a full professor, what was the main aim your group set up for further research in evolutionary computation and how do you evaluate the results achieved?

I smuggled evolutionry algorithms' ideas into the subchapter 'optimization' in my courses on systems analysis, not forgetting to mention my experiences. It took not long until one and then more and more of my students became interested in just that part of the general topic. For them I tried to get money from research grants, and after 15 years the team had grown up to more than 30 coworkers.

17. Do you consider the genetic algorithm researchers switching from the traditional binary string representation and well known evolutionary operators approach to the estimation of distribution algorithms a (somehow) step toward an ES-like approach, since there is a (somehow) similarity between those algorithms when it comes to probability density/distribution function parameter(s) adaptation?

As always, I argue that any idea improving robustness and/or efficiency (at best: both) of an optimization algorithms is welcome. But, my personal interest (and at least at the beginning also Ingo's) has been in understanding and making use of real nature's tricks, too.

That is why arithmetic tricks not resembling natural processes are of a bit lesser interest to me. Artificial immune systems, ant colony optimization, differential evolution and many other approaches will have their specific domains and a practitioner would be stupid not to have all of them - including traditional methods -in his toolbox.

My experience is that even in one successful run to an optimum it may be necessary to  switch between methods and to set some parameters anew by hand, as well.

18. What is your view upon derandomization approaches? Wouldn't be better to keep some "noise" along the evolution optimization process rather than biasing it towards a "direction"?

All of that may be helpful or stupid depending on the specific situation. Noise hampers in 'easy' cases, but helps sometimes in more difficult ones. A good strategy adapts its internal parameters during the search, e.g. the main direction. I call these parameters 'internal model' of the corresponding individuals and found that it is important to maintain diversity of those internal models' within the population. I am dreaming of an evolutionary algorithm that comprises many more internal parameters self-adjusting during the search.

19. Along the history of evolution strategies we can see an addition process of new features (multi-individual population, self-adaptation, variances and covariances, and so on) and these features have became a standard in evolution strategies. When seeing others' approaches, such as genetic algorithms (and the almost forgotten evolutionary programming), we don't verify the same phenomenon, since the most applied genetic algorithm (elitist SGA) are practically the same one established by Kenneth De Jong during the early 1970s, even though there were and there have been well-intentioned works to add new features to genetic algorithm, but they have not became a GA standard. In your opinion, why did that happen in, for example, genetic algorithms' field?

In some cases simple versions are good enough, in some cases people are not even aware of traditional approaches. Sometimes people follow the advice of prophets without reading the bible, and prophets often simplify to spread their message.

The problem of problem solving is multifaceted: First, there is not enough theoretical foundation of the cause/effect relations in optimum seeking methods under various conditions. Second, black box situations (those for which evolutionary algorithms MAY be used) are not classified, perhaps not classifiable. Therefore unpredictability prevails, surprises are common, and convergence to the (a) global optimum within a manageable time cannot be guaranteed. In practice, being better than the competitors is sufficient - for a while.

Thus, even slight improvements towards a (perhaps moving) optimum are appreciated, only theoreticians remain dissatisfied.

20. After forty five years of evolution strategies, what are your impressions for the next forty five?

I cannot forecast. Curiosity prevails - and satisfaction.

Labels: , , , , , , ,

Charles Darwin Has A Posse Check Google Page Rank