Saturday, June 28, 2008

Come on, Hank, don't you know we're past that science stuff?

Hank Williams writes one of the more interesting blogs I've seen, Why does everything suck? A lot of his posts are reasonably technical, so the average person stumbling across it might look at the first post or two on a given day, decide that Symbian or the Semantic Web are not topics of interest, and move along.

And that would be a shame, because in amongst the interesting technical posts are some very accessible ones, and they generally express my thoughts better than I can. I've linked to WDES a couple of times before, each time to highlight a post I thought was particularly spot on.

Hank, if I might call him that (hey, he can call me Andro), is especially persuasive when taking on the increasingly loopy ideas of Chris Anderson, editor-in-chief of Wired Magazine. Anderson is the fellow who's currently pushing the idea that we're moving to an economy in which everything of value will be free (I referred to Hank's previous post on this here). He's already gotten himself on Charlie Rose to discuss this, even though his book on the subject won't be out until next year, and there's bound to be a lot of buzz around this idea (Wired tends to create its own buzz, so prescient are they thought to be).

Anderson's newest "idea" is that the common axiom that "correlation is not causation" is no longer operative in the Google world. Hank:
And then Chris puts forward what I will call the Anderson Theory. It is based on the idea that with massive stores of data, most notably Google, we do not need such scientific methods any more. With such huge amounts of data we can establish much more detailed correlations, therefore making formal logic and scientific method irrelevant....So there you have it. Correlation is enough. Causation is irrelevant.
Anderson goes on to make the remarkable statement that, since models are imperfect, they can easily be replaced by the sheer weight of data that can now be linked and associated. His primary example is genome decoder Craig Venter's work in sequencing the air:
In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.

If the words "discover a new species" call to mind Darwin and drawings of finches, you may be stuck in the old way of doing science. Venter can tell you almost nothing about the species he found. He doesn't know what they look like, how they live, or much of anything else about their morphology. He doesn't even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.

This sequence may correlate with other sequences that resemble those of species we do know more about. In that case, Venter can make some guesses about the animals — that they convert sunlight into energy in a particular way, or that they descended from a common ancestor. But besides that, he has no better model of this species than Google has of your MySpace page. It's just data. By analyzing it with Google-quality computing resources, though, Venter has advanced biology more than anyone else of his generation.

There is so much wrong with this that I don't even know where to begin. That Venter doesn't know anything about these new species other than that they must exist means he hasn't "discovered" anything. Much of the populace believes in fairies or angels based on the same kind of logic. Furthermore, finding vast numbers of correlations without any backing facts is not advancing biology.

I don't want to be too extreme here; I'm not going to argue that there is no value in these kinds of activities. Where it becomes dangerous, and Hank expresses this idea better than I, is the point at which we decide that we have a new scientific paradigm. Why finance traditional research when we can just crank up the supercomputer and "discover" whole new wondrous worlds?

Certainly this form of extreme data mining can suggest new lines of inquiry. That Venter is detecting new things in the air may well feed rounds of future research, and there's value in that; but Anderson is contending that the science is done, that the vast amount of data sifting closes the process. He's wrong.
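To see why correlation alone can't close the process, consider what happens when you search a large pile of data for relationships: with enough variables in play, some pair will correlate strongly by pure chance. This is a minimal sketch of that multiple-comparisons effect (the variable counts and series length are illustrative, not from anything Anderson or Venter published):

```python
# Sketch: with enough unrelated variables, some pair correlates
# strongly purely by chance -- correlation without any causation.
import random

random.seed(0)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# 200 completely independent random series of 10 points each:
# no series causes, or is even related to, any other.
series = [[random.gauss(0, 1) for _ in range(10)] for _ in range(200)]

# Scan all ~20,000 pairs for the strongest correlation.
best = max(
    abs(pearson(series[i], series[j]))
    for i in range(len(series))
    for j in range(i + 1, len(series))
)
print(f"strongest correlation among unrelated series: {best:.2f}")
```

Run this and the strongest correlation found is typically well above 0.85, despite every series being independent noise. A model of the underlying mechanism is exactly what lets you tell such chance patterns apart from real ones.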

I think Anderson is guilty of extending the very popular concept of emergence way too far. Emergence is the hot idea that complexity can arise from simplicity through wholly natural processes. It is, in brief, the extension of evolution to other situations, ignoring the incredible timeframe over which evolution has occurred. The most common idea in this space is that, using the Internet, computers will become "intelligent" through their sheer number and interactivity. It has always seemed to me that, without pressure, evolutionary or otherwise, this intelligence is unlikely to emerge - computers do not need to become intelligent in order to survive, if they can even be said to have a survival instinct.

Anderson's premise comes from the same shelf. Truth will emerge from data with no causal component, no simplifying model to get in the way of new reality. (And his not-very-thinly veiled Google love comes out quite inappropriately for a journalist.) It may be true that models are only imperfect versions of reality, and that new techniques and technology may expand our reality, but I feel confident that both of those things can be fit into the framework of science, that science will become stronger as a result.

1 comment:

Hank Williams said...

Of course you can call me Hank, Andro!

And thanks for the link and keeping the conversation going.
