Monday, June 04, 2007

Good Math, Bad Math and Behe.

I probably shouldn't review a rant, especially when I haven't read the book that the rant is about, but Bunc requested it so ...

Good Math, Bad Math is the work of Dr. Chu-Carroll, a computer scientist working at Google, which is across the San Francisco Bay from here; we have a few church members working there.

The review begins by objecting to Behe's definition of evolution. Of course, evolution is a synonym for change, and the theory of evolution is quite fluid, so there are few definitions that I would find objectionable. This seems like a smoke screen.

Another complaint is that Behe uses continuous-variable reasoning when the correct approach would use discrete variables. Of course, Behe, being a biochemist, is well aware of the discrete nature of proteins. My tests on genetic algorithms working with both discrete and continuous representations of the same problem indicate similar convergence rates. Discrete mathematics, however, is vastly more difficult for analysis purposes, but this doesn't help Dr. Chu-Carroll at all. One major candidate for discrete design optimization would be the software that Dr. Chu-Carroll develops. What would it look like if we ran a genetic algorithm on the binary executables? (With an older Windows OS, I can visualize the lovely blue screens already!) Thus, I think Behe is correct to use a classical optimization viewpoint, and the Darwinists shouldn't complain. Otherwise, the analysis becomes intractable and the winner of a debate is the one with the most rhetorical firepower. Another smoke screen.
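
To make "discrete and continuous representations of the same problem" concrete, here is a minimal sketch (not the author's test code) assuming a single design variable on an interval. The binary encoding and the helper names `decode_bits`, `mutate_bits` and `mutate_real` are hypothetical illustrations of the two common genetic-algorithm representations.

```python
import random

def decode_bits(bits, lo, hi):
    """Discrete genome -> design value: read the bits as an unsigned integer, scale to [lo, hi]."""
    as_int = 0
    for b in bits:
        as_int = (as_int << 1) | b
    max_int = (1 << len(bits)) - 1
    return lo + (hi - lo) * as_int / max_int

def mutate_bits(bits, rate, rng):
    """Discrete mutation: flip each gene independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def mutate_real(x, sigma, lo, hi, rng):
    """Continuous mutation: add Gaussian noise, then clamp back into [lo, hi]."""
    return min(hi, max(lo, x + rng.gauss(0.0, sigma)))

if __name__ == "__main__":
    rng = random.Random(1)
    genome = [rng.randint(0, 1) for _ in range(10)]
    # Both representations describe the same hypothetical design variable on [0, 5].
    print("discrete child:  ", decode_bits(mutate_bits(genome, 0.1, rng), 0.0, 5.0))
    print("continuous child:", mutate_real(decode_bits(genome, 0.0, 5.0), 0.2, 0.0, 5.0, rng))
```

One design point worth noting: flipping a high-order bit moves the decoded value a long way while flipping a low-order bit barely moves it, so the two representations can explore the same problem quite differently even when they end up converging at similar rates.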

The next area of bickering is over how many maxima are in the fitness space. I have complained that Dawkins assumed exactly one, which isn't quite fair. Dr. Chu-Carroll seems to think that there is more than one, but that they are few in number. He also argues that increasing the number of design dimensions somehow increases the likelihood of macro evolution. Dr. Chu-Carroll probably doesn't want to contemplate n-dimensional fractal design spaces. The nature of proteins is that they form essentially 4-dimensional components (3 spatial + 1 electric charge). These must fit together like puzzle pieces, but must also not fit together randomly with the wrong piece. Behe's assumption is that this is probably much more like the n-dimensional fractal problem with astronomical numbers of maxima than the extremely limited scenario preferred by evolutionists. Common sense compels me to agree with Behe.
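
As a rough illustration of what "how many maxima" means in practice, the following sketch counts strict local maxima on a sampled one-dimensional landscape. Both landscapes are made-up examples (a single smooth peak, and the same peak with a high-frequency ripple added); they are not models of any protein.

```python
import math

def count_local_maxima(values):
    """Count interior samples strictly higher than both neighbours."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    )

def sample(f, lo, hi, n):
    """Evaluate f at n evenly spaced points on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [f(lo + i * step) for i in range(n)]

def smooth_landscape(x):
    # Hypothetical single peak at x = 1.
    return -(x - 1.0) ** 2

def rugged_landscape(x):
    # Same overall trend plus a hypothetical high-frequency ripple.
    return -(x - 1.0) ** 2 + 0.5 * math.sin(20.0 * x)

print(count_local_maxima(sample(smooth_landscape, -2.0, 4.0, 601)))  # 1
print(count_local_maxima(sample(rugged_landscape, -2.0, 4.0, 601)))
```

The ripple leaves the overall trend unchanged but multiplies the number of peaks a local search can stop on.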

The final complaint is that the fitness space is dynamic, unlike the static design spaces that I am always analyzing. Dawkins also made this point, but again, this is very familiar territory to the engineer, and a place engineers really don't want to go.

The issue of competing organisms is a bit like the issue of a car engine and a car air conditioner competing for space in an engine compartment. There are also many other components competing for space. If we look at the air conditioner species alone, assuming a changing compartment shape that it must fit into, then the problem becomes hopeless. The automotive engineer will iterate between the various shapes to find a good compromise. This is usually done by optimizing one component at a time while holding the others fixed.
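
Optimizing one component at a time while holding the others fixed is essentially coordinate (block) ascent. A minimal sketch, assuming a hypothetical two-component fitness with a coupling term; each step maximizes exactly over one variable while the other stays fixed.

```python
def fitness(x, y):
    """Hypothetical coupled fitness; the x*y term links the two components."""
    return -x * x - y * y + x * y + 3.0 * x

def best_x_given_y(y):
    # Solve d(fitness)/dx = -2x + y + 3 = 0 for x, holding y fixed.
    return (y + 3.0) / 2.0

def best_y_given_x(x):
    # Solve d(fitness)/dy = -2y + x = 0 for y, holding x fixed.
    return x / 2.0

def coordinate_ascent(x, y, sweeps):
    """Alternate exact one-variable optimizations, component by component."""
    for _ in range(sweeps):
        x = best_x_given_y(y)
        y = best_y_given_x(x)
    return x, y

x, y = coordinate_ascent(0.0, 0.0, sweeps=30)
print(x, y, fitness(x, y))  # approaches the joint optimum (2, 1), fitness 3
```

With this weak coupling each sweep cuts the remaining error by a factor of four. More strongly coupled components make the alternation converge more slowly, which is the usual trade-off against optimizing everything at once.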

The engineer has a very good reason not to attempt to optimize the entire engine compartment at once, and it can be seen from a simple transform of the design problem. The number of design variables becomes the sum of all the component design variables, with a bunch of constraint equations thrown in. The fitness space is also a sum of the performance of the various components, and it too becomes vastly more complex. What was difficult but tractable on a component-by-component basis suddenly becomes astronomically more complex.
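
The jump in difficulty can be illustrated with simple arithmetic, assuming (hypothetically) an exhaustive grid search with `k` candidate values per design variable. Searching each component separately costs the sum of `k**n` over the components; merging them into one problem costs `k` raised to the total number of variables, before any constraint equations are added.

```python
def separate_cost(k, sizes):
    """Evaluations to grid-search each component on its own (one pass)."""
    return sum(k ** n for n in sizes)

def combined_cost(k, sizes):
    """Evaluations to grid-search all components jointly."""
    return k ** sum(sizes)

sizes = [4, 5, 6]  # hypothetical variable counts for three components
k = 10             # hypothetical candidate values per variable
print(separate_cost(k, sizes))  # 1110000
print(combined_cost(k, sizes))  # 1000000000000000
```

The separate figure is per pass, so iterating between components multiplies it by the number of passes, which remains small next to the joint figure.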

Likewise, the issue of organisms in a dynamic competition environment is easily transformed to a relatively static fitness space for analysis purposes simply by adding up all the design variables of the individual species. The environmentalists have helpfully informed us that the actual fitness objective that is optimized is "ecological balance", rather than survival as Darwin thought. In other words, the entire biosphere of the planet is more or less linked into one glorious symbiotic relationship. Now how many variables are there, Dr. Chu-Carroll? What is the order of convergence? Isn't the time scale for evaluating fitness some multiple of the life span of the longest-lived species?


Bunc said...

Hi Looney,

One good thing about science is that when our models are not working we can always go back and look at the evidence.

Here is an interesting piece of evidence for you to ponder:

Maybe the Intelligent designer intervened in this experiment?

Looney said...

Bunc, this is certainly a more challenging article in the link. Tests like this raise many questions. In my own profession, I am always asking questions to try to find out what really happened and whether or not the observer is just being misled. For example, we have a "deleted gene" at the beginning of the experiment and apparently something new in the same place later on. There is a lot that isn't being told in this article, and we would need to sift through the original papers in all their technical detail before accepting Prof. Miller's second-hand interpretation.

The other point is that there is already a large amount of additional control machinery in the bacteria for feeding, and only one part was deleted (per this article). For example, in the fourth paragraph, Prof. Miller says "They made it by tinkering with another gene, in which a simple mutation changed an existing enzyme just enough to make it also capable of cleaving the bond that holds the two parts of lactose together." In other words, it is likely that they knew what the outcome of this experiment would be beforehand because the bacteria had nearly identical systems. Thus, Behe could be refuted on a technicality, while Behe's main thesis remained untouched.

That is just some speculation, since it is not my field. The point is that it is extremely easy to mislead as we get into complex experiments.

Todd said...

You seem to be missing the main point of the review. Behe is making simplifying assumptions about how evolution works and the fundamental nature of the fitness landscapes. As you stated, this is not necessarily a bad thing. If you are asking specific questions where these assumptions do not greatly impact the results, that is fine. If you are trying to make solvable a problem that humans currently find intractable, that is fine. But that is not what Behe is doing, if the review is accurate. What Behe is doing is making all of these simplifying assumptions, and then using these simplifying assumptions to argue that evolution cannot work. He has constructed a strawman version of evolution and then attacked that. To continue with your engineering examples, think again about a car engine. If you want to analyze the force on the piston at a given moment, it is not an incorrect assumption to say that the piston does not move. Allowing the piston to move would make the problem much more difficult to solve and would add nothing to the solution you are looking for. If that were what Behe was doing, no one would object. However, if you then take that simplifying assumption and use it to argue that because the piston does not move the engine cannot operate, then the assumption is no longer valid and leads to a demonstrably wrong conclusion. That is what Behe is doing. He is making simplifying assumptions that could be useful in specific situations but then using those assumptions to argue the system cannot operate under any circumstances.

To give more specific examples, let's talk about the static fitness landscape. You say, correctly, that having a stationary fitness landscape makes it much easier for humans to solve evolutionary problems. That is good if you are looking at how humans solve problems. But Behe does not do that. He uses this assumption to argue that evolution cannot work in nature. Whatever simplifying assumptions humans may use to solve our problems make not the slightest difference to nature. In nature, the fitness landscape is not static. Therefore, basing your argument on the assumption that it is static, although it makes the problem easier to analyze, also means the problem no longer bears any relation to the real world. Therefore, except in very narrow and specific situations, any answer you get based on that assumption is not likely to be valid.

The same goes for his definition of evolution. His definition of evolution is formulated in such a way that it leaves out a number of important mechanisms that contribute to evolutionary change and can help get a population out of a local maximum. He ignores those mechanisms completely. If you want to look at one specific evolutionary mechanism and see how it works, leaving out the others is fine. But Behe once again does not do that. He uses his simplified version of evolution as the basis to argue that evolution cannot get organisms out of local maxima. But it is the very mechanisms that he ignores that allow organisms to get around this problem. Another analogy would be ignoring friction. Ignoring friction can make physics calculations easier. But using calculations that ignore friction to argue that a car cannot have traction is silly. That is basically what Behe is doing: ignoring mechanisms that make the calculations harder but are necessary for the system you are trying to analyze.

Finally, the same goes for discrete vs. continuous domains. The effectively continuous domain is one of the bases on which Behe's argument is built, since it prevents organisms from making discrete jumps, but that is not how things are in the real world. Large discrete jumps are both possible and in fact quite common. So basing an argument on them not existing will lead to the wrong conclusions unless you are very careful about the questions you are asking.

So in the end, Behe is making a lot of assumptions that in certain contexts may not be a problem. But he is then using those assumptions to argue that the system cannot work even in situations where those assumptions have large impacts on the results. That is not valid. He is constructing a strawman, a false version of evolution that is simpler, easier to understand, easier to attack, but does not reflect how it actually operates in the real world. And that is the problem.

Looney said...

Bunc, I checked some more on the net. It seems that there was more to Hall's paper and experiments that Dr. Miller didn't report. We would need to go back to the original paper, since skepticism toward all sides is warranted.

Looney said...

Todd, I did not read Behe's book, so I don't want to dwell too much on what he said. On the other hand, I think my review of what Chu-Carroll said is accurate and doesn't need any further support.

Fitness design spaces have astronomical numbers of maxima. If you are close to one, then evolution is useful (i.e. micro evolution). If you are far from one, then you need an engineer (i.e. macro evolution).

The Hall experiment(s) reported by Miller actually proved just what I said, but Miller left out critical bits of info on the follow-on experiments.

Anonymous said...

If a fitness landscape is impossible to observe, model or reconstruct, is it still science? Or is it Naturalist Philosophy?

Looney said...

"If a fitness landscape is impossible to observe, model or reconstruct, is it still science? Or is it Naturalist Philosophy?"

If you pick up a book on Operations Research, you will find that it is quite a well developed concept and extremely valuable in engineering.

Bunc said...

Hi Looney,

I read your contribution on Mark Chu-Carroll's post. As you know, I make no pretence of taking either of you on in terms of actual detailed maths - but I don't think this is necessary anyway, as there are more fundamental issues at play.

As I have said to you before, the mathematical modelling of evolution via design space and fitness seems to me a reasonable approach to improving our understanding of the interplay of randomness and selection (I simplify outrageously).

However I must say that I find Chu-Carroll's argument that the fitness space must be dynamic very compelling.

The point was made in the comments on his site that even looking at things like the weather (or, better, climate change) one can see that what may be maximally fit at one time may not be so at another.

As was pointed out (also by me here non-mathematically) the interactions within and across species and with the broader changing environment and the variable availability of niches mean that evolution and natural selection take place as an incredibly complex parallel process.

Whether or not any particular mathematical model entirely satisfactorily describes this is beside the point. We observe it happening. We see evidence in the fossil record. We can produce it over short timescales in the lab.

An analogy here could be observing the rotation of heavenly bodies. At one time this could be observed, and the first mathematical models approximated the motion. We did not seek to refute that there was rotation of the bodies simply because the maths didn't exactly work to the nth degree.

We start with the observations and seek to refine our models until they describe the observations as accurately as possible. At the same time we look for the mechanisms behind the process. In astronomy this resulted in Newtonian mechanics and later Einstein's more detailed mathematical modelling of space time.

In biology we have vast sets of data entirely consistent with a theory based on reproduction with variation and selection (of various types).
What do we do if our models don't quite fit the observed evidence? Refine the models! We don't simply try to ignore the evidence!

As always, best wishes though! I have also posted this at Chu-Carroll's site.

Looney said...

Bunc, as I said, the fitness space is only dynamic because of the frame of reference. If you use the "balanced ecosystem" paradigm, then you have the various sub-species pursuing their niches within a static framework.

Within the dynamic, individual species reference, it is impossible to analyze the outcome so more results become plausible. From the static global perspective, all of the mathematical tools immediately become available. In a way, both viewpoints are valid.

Regarding the fossil record, the evolutionists are currently stuck on how to resolve "punctuated equilibrium". Although they have publicly asserted that the fossil record proves evolution, the "punctuated equilibrium" discussion proves this to be false.

Bunc said...

I may misunderstand your paradigm but who says that a "balanced ecosystem" is a reasonable description of the circumstances in which evolution occurs?

Patently I would have thought this would not be the case.

In ecology we learn about the "succession" (if I remember the term correctly) of communities of species. Ecosystems are generally not static, although they may appear so on human timescales. But evolution occurs on timescales in which ecosystems are patently not "balanced" - the climate changes, the geology changes, etc.

What was advantageous at one point may become a handicap at another. Equally, what was a moderately disadvantageous version of a gene in one circumstance may become advantageous in another.

So at best any model built around a "balanced ecosystem" paradigm is predestined to conclude that, by and large, everything will stay the same (although, if I recall correctly, things like genetic drift still arise).

I have made the point about the changing interplay of forces within which evolution takes place a number of times, and it doesn't seem to me that you are addressing that point.

I have some understanding of genetic and evolutionary algorithms - enough to know that they barely scratch the surface of the complexity of variables which come into play when nature decides who lives and who dies in each species as they compete for space, food, shelter and mates and try to avoid getting eaten, parasitised, etc.

I am not aware that any current computing models even remotely approach handling that kind of complexity of interaction.

As for punctuated equilibrium, its main proponent, Gould, was very much a Darwinist: the theory is still based on natural selection as the mechanism and still accepts, and is built on, evidence of the evolutionary development of species.

Gould lays more stress on stop / start bursts of evolution and speciation than the more traditional Dawkins approach of gradual evolutionary change. Punctuated equilibrium in no way supports ID.

Bunc said...

Apologies, I am getting dyslexic in my old age.

The third para in my last comment should of course have read "... about the "succession"..."

...and the rest of my spelling is pretty atrocious too!

Looney said...

"Gould lays more stress on stop / start bursts of evolution and speciation than the more traditional Dawkins approach of gradual evolutionary change. Punctuated equilibrium in no way supports ID."

This is really beside the point. Any resolution of punctuated equilibrium must postulate that macro evolutionary change is impossible to see in the fossil record. Certainly this fact alone doesn't preclude macro evolution, as Dawkins has argued. I don't dispute that point. Punctuated equilibrium and fossil proof of macro evolution remain mutually exclusive assertions.

Regarding the changing ecosystems, I can agree that this occurs. On the other hand, the elements of the changing ecosystems are usually there over long periods of time. The primary argument for dynamic fitness, however, is the predator-prey paradigm within a relatively unchanging environment, and secondarily the synergy between different organisms. These are both made analyzable by the usual methods.

bunc said...

Looney - I will come back at you later re punctuated equilibrium and other points. I have also been following things on Dr CC's blog and, as you will have noticed, I stumbled on you in a post on another blog.

Our discussion has become quite spread out over various posts. I have started drawing together your various points and what I see as the main responses to them, and I am going to post them in table form.

I was also going to try to pick up issues you have raised on the other blogs and what appear to be the key responses.

You can then tell me if I am representing your position appropriately on each point and we could pick up the thread at that point?

However as I also need to earn a living none of this will happen in the next 24hrs!

By the way I am going to take some pictures of the Ayrshire hills soon and post them up so you can visualise running in soggy wet heather with the midgies chasing you!

Bunc said...

Looney, leaving aside our debate for a second (I still want you to address the points I made though :-) )

I have been interested in the issues re GA/evolutionary algorithms and what I understand to be the more traditional (I don't mean that pejoratively), more formal algorithms that I think OR would be using.

My understanding is that the former use a mixture of stochastic and selective methods together with some predefined "fitness function", whereas your approach would be based on seeking an optimal mathematical solution route to the optimization problem.

Would that be a fair summary? I am not a mathematician, but my feeble brain usually manages to follow the main thread on things like this. (I even managed to visualise your n-dimensional fractal for about thirty seconds but had to stop in case my brain exploded.)

My question - assuming that was a reasonable summary - are there any hybrid optimisation models which attempt to build in the strengths of each approach?

My understanding (from reasonably wide but not highly technical reading) is that GA models tend to perform best where:

1) the problem is perhaps not easily tractable to a formal solution and
2) the issue is not finding necessarily the theoretically best optimisation but one that is workable and as fit as possible by the use of a more general method.

Example - I am a very poor programmer but wanted to learn about AI/GA. I obtained a public library of AI/GA software and was able to code a GA programme to learn a new pattern recognition task. Had I tried to analyse and implement this by a formal mathematical approach, I think I would have been unable to do it.

Anyway - back to my main question - are there any methods which have attempted to build in both approaches? Would that be possible?

Looney said...

Bunc, it seems you are getting a grasp of things. There are a multitude of models, and OR is a vast science that includes stochastic techniques, of which evolutionary types are a subset. We mix and match bits and pieces of different techniques as well as alternate between techniques to deal with intractable problems. I don't reject the usefulness of genetic algorithms nor their usefulness in the limited adaptation cases that are encountered in biology all the time. Knowing the limits of various techniques is critical to the science.

Bunc said...

I understand your point about mixing and matching models. When faced with a particular "problem space" (ok probably not the right term but hopefully you get my drift) your task will be to understand the nature of the problem and then design an optimal solution using methods best fitted ( or which can be best adapted) to that problem.

I want to come back to the issue of GA and EA here but leave the God/ID/Creation bit aside and just look at it as a modelling problem.

Current modelling of GA is limited by the fact that it is a software activity and this by its nature requires a programmer to predefine the problem space within which the GA will operate and predefine a fitness function which will be used to select from the candidate population of solutions.
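
The point that the programmer predefines both the problem space and the fitness function can be seen in even the smallest genetic algorithm. A minimal sketch, assuming a bit-string genome and the textbook "OneMax" toy fitness (count of 1 bits) as a hypothetical predefined fitness function; tournament selection, one-point crossover, bit-flip mutation and elitism are common choices, not the only ones.

```python
import random

def onemax(bits):
    """Predefined fitness function: number of 1 genes (a standard toy problem)."""
    return sum(bits)

def tournament(pop, fitness, rng, size=3):
    """Pick the fittest of `size` randomly chosen individuals."""
    return max(rng.sample(pop, size), key=fitness)

def crossover(a, b, rng):
    """One-point crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in bits]

def run_ga(n_genes=30, pop_size=40, generations=60, rate=0.02, seed=0):
    rng = random.Random(seed)
    # The problem space (fixed-length bit strings) is chosen in advance here.
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=onemax)  # elitism: carry the current best forward
        children = [best]
        while len(children) < pop_size:
            p1 = tournament(pop, onemax, rng)
            p2 = tournament(pop, onemax, rng)
            children.append(mutate(crossover(p1, p2, rng), rate, rng))
        pop = children
    return max(pop, key=onemax)

best = run_ga()
print(onemax(best), "of", len(best))
```

Everything the loop "discovers" is measured by `onemax`, which had to be written before the run started.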

But this doesn't model the real world very well - it requires the software designer to pre-design the problem space to a greater or lesser extent. It models the essential core effects of reproduction variability selection quite well but it doesn't I agree model at the macro level very well.

The problem is that to model at the macro level one would need to remove some of the a priori constraints and pre-determined fitness functions - and allow the model to itself evolve a best fit to how it approached the problem space and it would need to be able to do this with novel problem spaces.

I am not sure if I am getting this across clearly. That's why I am asking about hybrid models. I am going to get wildly speculative here, so bear with me and don't laugh too much. I am just following a train of thought.

Let's say we had such a hybrid, and its genetic "code" coded for various types of methods, each of which would be brought to bear on the problem space as determined by the presence or absence of that part of the code in an individual candidate "solution". The sequence of the code would presumably represent the sequence of methods (or parts of methods) that were brought to bear.

We start with a randomly generated population of candidate solutions, each of which generates a sequence of operations as represented from its own code. (These operations being themselves various formal methods which the code represents).

Now we let this population "loose" on the problem and select the best candidates to breed the next round (recombination, reordering, some mutation, etc., all to avoid getting stuck).
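
The hybrid described above resembles what is sometimes called a hyper-heuristic: the genome is an ordered list of methods to apply rather than a direct solution. A rough sketch, assuming a hypothetical library of three toy "methods" (each transforming a current value) and closeness to a target number as the predefined fitness; `METHODS`, `run_program` and `breed` are illustrative names only.

```python
import random

# Hypothetical "methods" a genome can call, each transforming the current value.
METHODS = [
    lambda v: v + 1,  # small step up
    lambda v: v * 2,  # doubling step
    lambda v: v - 3,  # step back
]

def run_program(genome, start=0):
    """Apply the methods named by the genome, in order."""
    value = start
    for gene in genome:
        value = METHODS[gene](value)
    return value

def fitness(genome, target):
    """Predefined fitness: closeness of the program's output to the target."""
    return -abs(run_program(genome) - target)

def breed(a, b, rng, rate=0.1):
    """Recombination, then point mutation and an occasional reordering swap."""
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    child = [rng.randrange(len(METHODS)) if rng.random() < rate else g for g in child]
    if rng.random() < rate:
        i, j = rng.randrange(len(child)), rng.randrange(len(child))
        child[i], child[j] = child[j], child[i]
    return child

def evolve(target, length=8, pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(len(METHODS)) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, target), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection: keep the better half
        pop = parents + [
            breed(rng.choice(parents), rng.choice(parents), rng)
            for _ in range(pop_size - len(parents))
        ]
    return max(pop, key=lambda g: fitness(g, target))

best = evolve(target=20)
print(best, run_program(best))
```

Even here, `METHODS` and `fitness` are fixed before the run, which is the same limitation about predefining the problem space and the fitness test that comes up next in the comment.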

Would we have potentially created a more "generalised" problem solver by such an approach? (I am not suggesting a universal problem solver, as I know there are theoretical limitations on that.)

The problem even with this generalised problem solver (if it were even possible) is that we still seem to need to predefine the problem space (because presumably we must still order/structure the data for it), and we also seem to still need to predefine the fitness test, or how else can we select among the candidates in each generation?

Following this thought experiment leads me to the conclusion that there will always be an inherent problem with software modelling of natural selection. That is - we must always pre-model the space and the criteria for fitness.

It doesn't mean that modelling of stochastic/selected processes has no merit. It will still show us problems where this approach has use, and also the limits of such software approaches.

But in terms of modelling the real world of biological entities and deciding if we can model how "micro" evolution stacks up to "macro" evolution we will only have good models when they 1) can function within a less precisely predefined "problem space" and 2) do not have tightly predefined fitness criteria.

Intuitively it seems to me that this may be inherently impossible using the types of computing approaches we have at present.

Any thoughts on this? (assuming you can make sense of what I have said above?)

Looney said...

Yes, it is good to evaluate the effectiveness of methods independent of any application. The effectiveness of genetics relative to other methods is highest for certain types of fitness spaces. In particular, n-dimensional fractals with noise are where they do best - relative to others. All methods become less efficient in this circumstance, it is just that genetics doesn't lose efficiency as fast as the other methods. That is why genetics look good for the traveling salesman problem, but hopelessly inefficient on traditional continuous variable optimization problems.
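
For reference, the traveling salesman problem is usually handed to an evolutionary method as a permutation of cities, with mutation operators that keep the tour valid. A minimal sketch, assuming hypothetical city coordinates and a swap mutation; the search loop shown is a mutation-only (1+1) evolutionary loop, a deliberately stripped-down relative of a full genetic algorithm.

```python
import math
import random

def tour_length(tour, cities):
    """Total length of a closed tour visiting cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def swap_mutation(tour, rng):
    """Exchange two positions; the result is still a valid permutation."""
    child = tour[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def evolve_tour(cities, steps, seed=0):
    """(1+1) loop: keep a mutated tour whenever it is no longer than the current one."""
    rng = random.Random(seed)
    tour = list(range(len(cities)))
    rng.shuffle(tour)
    for _ in range(steps):
        child = swap_mutation(tour, rng)
        if tour_length(child, cities) <= tour_length(tour, cities):
            tour = child
    return tour

cities = [(0, 0), (0, 1), (1, 1), (1, 0)]  # hypothetical unit square
best = evolve_tour(cities, steps=200)
print(best, tour_length(best, cities))  # the perimeter tour has length 4.0
```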

When I am confronted with noisy design spaces, I usually switch to a stochastic analysis method that permits me to fit a smooth surface. Then I can use my calculus based methods to get long term (macroscopic) trends. The process of looking for the trends means accumulating a history which is what genetic algorithms can't do.
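
The "fit a smooth surface, then use calculus" step can be sketched in one dimension. This assumes a quadratic model fitted by ordinary least squares (the simplest response-surface model, solved here through its 3x3 normal equations); the noisy data is synthetic, generated only for illustration.

```python
import random

def fit_quadratic(xs, ys):
    """Least-squares fit y ~ a*x**2 + b*x + c via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                 # sums of x**k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x**k
    # Augmented matrix for unknowns (c, b, a).
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        coef[r] = (m[r][3] - sum(m[r][k] * coef[k] for k in range(r + 1, 3))) / m[r][r]
    c, b, a = coef
    return a, b, c

def stationary_point(a, b):
    """Calculus step: set the derivative 2*a*x + b of the fitted curve to zero."""
    return -b / (2.0 * a)

rng = random.Random(42)
xs = [-2.0 + 0.1 * i for i in range(41)]
# Synthetic noisy samples around a true peak at x = 0.5.
ys = [-(x - 0.5) ** 2 + 3.0 + rng.gauss(0.0, 0.1) for x in xs]
a, b, c = fit_quadratic(xs, ys)
print(stationary_point(a, b))
```

How well the estimated peak stands in for the real one depends entirely on how well the chosen model form suits the underlying surface.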

Thus, genetics may well be the most efficient method for micro evolution of life. The nature of protein encoding almost guarantees the fractal type fitness space while the randomness of nature throws noise into the system. I don't think a better algorithm than genetics could be designed for micro evolution of biological systems.

What makes genetics most efficient in the micro case, however, is what causes it to be most implausible in the macro case. An n-dimensional fractal design space with noise is the worst case scenario for macroscopic evolution. Thus, Dawkins and other researchers assume the simplest macroscopic fitness spaces possible: A flat plane or a constant slope surface with a single maximum.
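
The difference between a single-maximum surface and a rugged one shows up clearly with a greedy hill climber, a deliberately simple local search (not a genetic algorithm) that moves to a better neighbour until none exists. Both landscapes below are hypothetical integer-indexed examples.

```python
def hill_climb(landscape, start):
    """Step to the better neighbour until neither neighbour improves; return the index."""
    i = start
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]
        best = max(neighbours, key=lambda j: landscape[j])
        if landscape[best] <= landscape[i]:
            return i
        i = best

single_peak = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
rugged = [0, 2, 1, 3, 2, 5, 2, 4, 1, 3, 0]

print(hill_climb(single_peak, 0), hill_climb(single_peak, 10))  # 5 5
print(hill_climb(rugged, 0), hill_climb(rugged, 10))            # 1 9
```

On the single peak every start reaches index 5; on the rugged landscape the same climber stops at the nearest local maximum.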

A method that has no way of identifying long term trends simply can't go very far and becomes astronomically less efficient as the number of variables (genes) multiplies. I agree with Dr. CC that a proper understanding of what genetics can and can't do is important for medicine.

Bunc said...

I think the problem here is that we are talking about two different things.

I am talking about evolutionary models as implemented by software designs and I am suggesting that these may have inherent limitations.

You are making the leap (again!) to seeing this as evidence against biological macro-evolution.

I need to point out again that "the map is not the territory". The limitation of a model in being able to model a real-world phenomenon is not evidence that the real-world phenomenon does not exist!

Re my argument about software limitations, the problem I see is, I think, related to the "No Free Lunch" argument and relates to whether additional information can be created within a system.

It seems to me that this argument has some force when we apply it to software models, which by their nature are very circumscribed in terms of their information content and are in effect highly closed systems. It is quite fair, I think, to argue in relation to these that the information content is to some degree "designed in".

I.e., in relation to my previous argument, we have no "general problem solver" and we must design the solution to fit the problem. Even GAs fit this description (although they are perhaps less constrained than other approaches).

However, the real world is not a closed system in this way. Information (in the form of energy flows) is generally available, and the system (when considered at a scale below the universe) is for life an open one. This is why I see great difficulty in successfully modelling full evolution within software.

To do so we would essentially have to create what looked like an open system without any strongly pre-defined fitness criteria.

Let me pose you a roughed out example of what this might look like and you will see the contrast with current GA's.

Imagine a class of "benign" computer viruses which were designed with a genetic code subject to variation. Our "benign" viruses could replicate and spread.

They are not subject to any pre-determined fitness criteria (of course the benign restraint does fall into that category, but we don't want the experiment to crash our computers!!).

They are selected only by the operators of computers; operators are free to select using any criteria they wish (pleasing, useful, etc.). The users in this case act as the selecting agency, much like the environment in the real world.
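
This thought experiment resembles what is usually called interactive evolutionary computation, where a person rather than a coded formula does the selecting. A rough sketch, assuming the selector is any callable supplied from outside; `operator_likes_vowels` is a hypothetical stand-in for a human operator, and the evolution loop itself contains no fitness function.

```python
import random

def evolve_with_selector(population, select, mutate, generations, rng):
    """Each generation an external `select` callable picks survivors; they breed by mutation."""
    for _ in range(generations):
        survivors = select(population)
        if not survivors:
            raise ValueError("selector must keep at least one individual")
        population = [mutate(rng.choice(survivors), rng) for _ in range(len(population))]
    return population

def mutate_string(s, rng):
    """Replace one random character with a random lowercase letter."""
    i = rng.randrange(len(s))
    return s[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + s[i + 1:]

def operator_likes_vowels(pop):
    """Hypothetical stand-in for an operator's unstated taste: keep the most vowel-rich."""
    def vowels(s):
        return sum(ch in "aeiou" for ch in s)
    top = max(vowels(s) for s in pop)
    return [s for s in pop if vowels(s) == top]

rng = random.Random(3)
result = evolve_with_selector(["xkcdq"] * 20, operator_likes_vowels, mutate_string, 40, rng)
print(result[:3])
```

In a real version of the experiment `select` would be people at keyboards; in any automated test it inevitably becomes code again, which is much the same caveat raised about closed software systems.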

Those that survive and reproduce will then be those who are fittest and best adapted to this environment - but we would not have pre-designed this into the system because we could not exactly predict what traits the environment would be selecting for.

Even this approach would not exactly model an open system like the real world but it would be a closer analogy I think.

As for micro and macro evolution - you will be aware that this is a very woolly construct and that essentially all the evidence available suggests that larger-scale evolutionary changes are simply stacked up from smaller "micro" changes. Your n-dimensional fractal fitness landscape, of course, only proves problematic if you assume the fitness landscape is static (and we'll not rehash that one).

There are of course debates within biology about the rates of evolutionary change over time (e.g., the punctuated equilibrium debate), but none of this calls into question the fundamental mechanism.

I can do no better than to quote Mr Punctuated Equilibrium himself, Gould, who wrote:

"No biologist has been led to doubt the fact that evolution occurred; we are debating how it happened. We are all trying to explain the same thing: the tree of evolutionary descent linking all organisms by ties of genealogy. Creationists pervert and caricature this debate by conveniently neglecting the common conviction that underlies it, and by falsely suggesting that evolutionists now doubt the very phenomenon we are struggling to understand."

Which takes me back to a point I made earlier. Whatever the areas of debate within current models and our attempts to refine them this in no way refutes the essential core of evolutionary theory.

Suggesting that such debates leave room for IDers to come in and trample the evidence with theories of "supernatural designer" intervention moves us out of scientific discourse and into the realm of metaphysics.

Imagine that, following Newton, we had discovered that Newtonian equations were not exactly describing some planetary motions.

Your approach would be to say " Ah see Newton is wrong so the designer must have done it."

Fortunately we had scientists who pursued such issues scientifically. The end result? We know that at human scales Newtonian physics is a good description of mechanical and planetary systems, but that a more fine-grained theory (Einstein's spacetime) gives us a fuller and more accurate picture. Newtonian mechanics is not "wrong", merely an incomplete but useful approximation.

What you describe as "woolly theories of evolution" are of this type. Natural selection encompasses a number of selective mechanisms which have become apparent with time. We have discovered the mechanisms of heredity, and these have been built into the "modern synthesis". We are even able to model selective mechanisms in software and produce approximations of what we see in real life. All our evidence is consistent with the core mechanism of reproduction with variation and selection.

Why then should we abandon this theoretical construct simply to introduce some "God of the gaps"?

The shame is that in the US there are skilled people like yourself who if they applied their skills appropriately might actually contribute to the next wave of development of evolutionary theory and modelling.

Instead you hide in fear from well established facts and one of the most powerful explanatory theories in modern science.

Bunc said...

Sorry - me again.

There is a very useful summary of a wide range of evidence for macro evolutionary change in the real world (rather than those pesky software models).

If you haven't read it, I would suggest this is worth reading when you have time: