Watson's Jeopardy win, and a reality check on the future of AI

So Watson beat the two best Jeopardy champions at their own game. What now?

Call me cynical, but as someone who has spent 14 years doing machine learning research, the Jeopardy result is really not that surprising -- you would win too if you had the equivalent knowledge base of all of Wikipedia at your fingertips for instant recall, and if you had a huge buzzer advantage from being able to process individual pieces of information in parallel, much faster than the human brain can!

But there is a much deeper problem with some of the media pontification about the future of AI and machines taking over the world: try asking Watson how "he" feels about winning.

Watson's learning model is currently (only?) really, really good at one thing: figuring out which question was being asked, given the answer to a general-knowledge clue. I'm sure there are lots of reusable pieces in the Watson system (some natural language processing (NLP) code, etc.). But what the mainstream media doesn't seem to understand is that it would be an enormous stretch to say the system could simply and easily be applied to other domains.

The promise of machine learning is that algorithms should in theory be reusable in many situations. The Weka machine learning toolkit, for example, provides a generic ML framework that is used for all sorts of things. But extracting the right features from your data, and deciding how to represent them, is a huge problem on its own, and can be tackled completely separately from the learning issues. (This is all further muddied once you throw in NLP.)

Today, most of the feature selection for any given learning task is done by hand, by engineers. An AGI (Artificial General Intelligence) would have to do that itself, and we don't yet have much of a clue how to get an AGI to pick its own reasonable, useful feature sets in a generic or smart way. Yet it's easy to show that, for most complex datasets, the feature selection strategy is as important as -- often more important than -- the exact machine learning algorithm you apply.
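To make that concrete, here is a toy sketch in plain Python (an invented inside-a-circle task, nothing from Watson): the same trivially simple learning algorithm -- a single threshold on one feature -- goes from mediocre to perfect the moment an engineer hands it the right feature.

```python
import random

random.seed(0)

# Toy task: classify points as inside (1) or outside (0) the unit circle.
data = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(1000)]
labels = [1 if x * x + y * y < 1 else 0 for (x, y) in data]

def best_threshold_accuracy(feature_values, labels):
    """Accuracy of the best classifier that predicts 1 below some threshold
    on a single feature (the simplest possible learning algorithm; i=0 and
    i=n cover the trivial all-0 and all-1 predictors)."""
    pairs = sorted(zip(feature_values, labels))
    n = len(pairs)
    best = 0.0
    for i in range(n + 1):
        correct = sum(1 for j, (_, y) in enumerate(pairs) if (j < i) == (y == 1))
        best = max(best, correct / n)
    return best

# Raw feature: the x coordinate alone -- a poor representation.
acc_raw = best_threshold_accuracy([x for (x, _) in data], labels)
# Engineered feature: squared radius -- the "right" representation.
acc_radius = best_threshold_accuracy([x * x + y * y for (x, y) in data], labels)

print(f"raw x feature:  {acc_raw:.2f}")
print(f"radius feature: {acc_radius:.2f}")
```

The algorithm never changed; only the representation did, and that was the difference between roughly majority-class accuracy and a perfect separation.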

What very few people appreciate is that machine learning has so far amounted to little more than learning arbitrary function approximators. You learn a mapping from a domain to a range, from an input to an output. Training is the process of refining that function approximation to minimize error on as-yet-unseen data (the test set, i.e. data that was not used to fit the current approximation). Because all machine learning algorithms (as currently framed) are basically trying to learn a function, they are all in some deep sense equivalent. (In practice not all algorithms even work with the same data types, which is why this is only true in the deepest sense, but quite a bit of work has shown that, at the end of the day, most of today's machine learning algorithms are doing the same thing with different strengths and weaknesses.)
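As an illustration of "ML as function approximation" (a toy sketch with an invented target function, not any particular algorithm from the literature), here is about the simplest function approximator there is -- 1-nearest-neighbour -- learning a mapping from input to output and then being scored only on data it never saw:

```python
import math
import random

random.seed(1)

def target(x):
    # The "unknown" function; the learner only ever sees samples of it.
    return math.sin(x)

# Train/test split: test points are never seen during "training".
train = [(x, target(x)) for x in (random.uniform(0, 2 * math.pi) for _ in range(200))]
test = [(x, target(x)) for x in (random.uniform(0, 2 * math.pi) for _ in range(50))]

def predict(x, train):
    """1-nearest-neighbour: answer with the output of the closest stored input."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

# Error on as-yet-unseen data -- the quantity training tries to minimize.
mse = sum((predict(x, train) - y) ** 2 for x, y in test) / len(test)
print(f"test MSE: {mse:.4f}")
```

Nothing here "understands" sine; it just memorizes input-output pairs and interpolates, yet its test error is tiny -- which is precisely the sense in which a function approximator can feel "smart" without being intelligent.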

Incidentally, the fact that the whole field of machine learning is about learning arbitrary function approximators is pretty much the whole reason a lot of people in CS learning theory don't really talk about AI anymore, only ML. There is nothing particularly intelligent about machine learning as it stands. I've heard it said that CSAIL (the Computer Science and Artificial Intelligence Laboratory), here at MIT where I work, is only still called CSAIL in deference to Marvin Minsky and the glory days of AI, and that a lot of people dislike the name and want to change it when Marvin finally retires. (That probably won't happen, but the statement alone is illustrative...) We need a complete revolution in learning theory before we can truly claim to be creating AI, even if the behavior of ML algorithms feels "smart" to us: it only feels smart because the algorithms correctly predict outputs given inputs. But you could write down a function on paper that does that.

I'm not claiming we can't do it -- "It won't happen overnight, but it will happen" -- I'm just stating that ML and AI are quite different, and we're very good at ML and not at all good at AI.

Efforts to simulate the brain are moving along, and Ray Kurzweil predicts that in just a decade or two we should be able to build a computer as powerful as the brain. While that may be true in terms of total computational throughput of the hardware, there is no way to know if we will be able to create the right software to run on this hardware by that time. The software is everything.

One of the problems is that we don't know exactly how neurons work. People (even many neuroscientists) will tell you, "of course we know how a neuron works: it's a switching unit that receives and accumulates signals until a certain potential is reached, then sends a signal on to the other neurons it is connected to." I suspect that in several years' time we will realize just how naive that assumption is. Already, fascinating discoveries are showing that things are just not that simple, e.g. (hot off the press yesterday): http://www.eurekalert.org/pub_releases/2011-02/nu-rtt021711.php

From that article:
> "It's not always stimulus in, immediate action potential out."
> "It's very unusual to think that a neuron could fire continually without stimuli"
> "The researchers think that others have seen this persistent firing behavior in neurons but dismissed it as something wrong with the signal recording."
> "...the biggest surprise of all. The researchers found that one axon can talk to another."

This is exactly the sort of thing that makes me think it's going to take a lot longer than Ray predicts to simulate the brain: we don't even know what a neuron is doing. A cell is an immense, extraordinarily complex machine at the molecular scale, and simplifying it to a transistor or a thresholded gate will not necessarily produce the correct emergent behavior when you connect a lot of them together. I'm glad researchers like these are doing more fundamental work on what a neuron actually is and how it really functions. I suspect that years down the line we'll discover much more complicated information processing capabilities in individual cells -- e.g. the ability of a nerve cell to store information in custom RNA strands, based on incoming electrical impulses, in order to encode memories internally [you read it here first], or something funky like that.
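For what it's worth, the naive "thresholded accumulator" model of a neuron fits in a few lines of Python -- which is rather the point. This leaky integrate-and-fire sketch (a textbook simplification with invented parameters, not a claim about real neurons) goes silent the instant its input stops, unlike the persistently firing neurons in the article above:

```python
def simulate(inputs, threshold=1.0, leak=0.9):
    """The naive neuron model: leak a bit of membrane potential each
    timestep, add the incoming current, and emit a spike (then reset)
    whenever the potential crosses threshold."""
    potential = 0.0
    spikes = []
    for t, current in enumerate(inputs):
        potential = potential * leak + current
        if potential >= threshold:
            spikes.append(t)
            potential = 0.0  # reset after firing
    return spikes

# Constant sub-threshold drive: fires periodically while input lasts...
print(simulate([0.3] * 20))              # -> [3, 7, 11, 15, 19]
# ...and goes silent the moment input stops -- no persistent firing,
# and certainly no axon-to-axon signalling.
print(simulate([0.3] * 5 + [0.0] * 15))  # -> [3]
```

If persistent firing and axon-to-axon communication are real and functionally important, no network built from units like this one, however large, will reproduce them.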

Of course even a simplified model is still valuable: "Essentially, all models are wrong, but some are useful" (--George E. P. Box). However we have to get the brain model right if we want to recreate intelligence the biologically-inspired way. Simply stated, we can't predict what it will take to build intelligence, or how long it will take, until we understand what it actually is we're trying to build. Just saying "it's an emergent property" is not a sufficient explanation. And emergent properties might only emerge if some very specific part of the behavior of our simplified models works correctly -- but we have no way of knowing which salient features must be modeled correctly and which can be simplified.

But a much bigger problem will hold up the arrival of AGI: not only do we not know how single neurons really work, we have NO CLUE what intelligence really is -- and even less of a clue what consciousness really is. The problem with Ray's predictions is this: we can forecast the progress of a specific quantifiable parameter of a known technology, perhaps even as the underlying technology embodying that parameter changes form (Moore's Law has held for at least 50 years, across the switch from vacuum tubes to transistors to silicon wafers), but we can't forecast the time of creation of a new technology that is, for all intents and purposes, "magic" right now, because we still don't know how it would work. In fact we can predict the arrival of a specific magic technology about as well as we can predict the time of discovery of a specific mathematical proof or scientific principle. Nature sometimes chooses simply not to reveal herself to us. Can we even approximately predict when we will prove or disprove P=NP or the Goldbach Conjecture? How much harder is it to define intelligence (or, even more so, consciousness) than to prove or disprove a mathematical statement?

Finally, and most importantly, somebody needs to get Watson to compete in Jeopardy against Deep Thought to guess the correct question to the answer 42...


  1. Well said, Luke. Just a couple of observations to add.

    First, I think Watson may be a very useful new paradigm for search, much better than PageRank for a whole class of search problems. Machine learning may finally get its day in a leading search platform, and if that happens, I'll be very happy to see it, as it will empower a whole lot of people.

    Second, we have some very interesting and highly testable ideas about what consciousness is: neural synchronization. Check out Gyorgy Buzsaki's Rhythms of the Brain, 2006, for more on this. http://www.amazon.com/Rhythms-Brain-Gyorgy-Buzsaki/dp/0195301064

    We're living in amazing times, but these domains are still sorely underfunded, relative to their great benefit to all of us.

  2. Thanks John, a good observation about the value of Watson to search.

    I bought Buzsaki's Rhythms of the Brain when you recommended it to me at WFS in Boston. It's a remarkable book, although I haven't finished reading it. And clearly the brain is doing some interesting things with gamma wave synchrony. However, I don't get the jump in logic from neural synchronization to consciousness. It seems impossible to infer that neural synchronization gives rise to consciousness when we can't actually define what consciousness is; defining consciousness as whatever emerges from neural synchronization is circular logic. More likely, gamma synchrony is something like a carrier wave for the brain, used for bookkeeping when brain regions are idle, like a RAM refresh cycle. Local involvement in global synchronization is reduced when local activity levels go up.

  3. Thanks Luke!

    RE: Watson and search. Thanks for that. I hope Google will create a new Google Answers platform using this approach. They could roll it out a domain at a time, with press around each upgrade, and use the humans who respond to the answers offered to really increase the confidence intervals. If they don't, I think someone else (Bing? IBM-Yahoo?) might take away their lunch.

    RE: Neural synch and consciousness. I'd suggest we don't need to define consciousness to discover its correlation with measurable phenomena, or even to infer causality. We may have to (one day) build a system with the right basic features and watch consciousness emerge; such an empirical demonstration could be the first definitive proof. For me and others (see JA Scott Kelso, below), neural synch, which is variable and fleeting, is a very plausible candidate for higher awareness, which I consider very weak and recently evolved. We humans aren't conscious in slow-wave sleep, we have a reduced form of it in REM, and even during our "waking" hours, our consciousness is constantly interrupted by daydreams, absentmindedness, etc.

    Fortunately, lots of interesting empirical tests are possible today and in the next few years. If we can use reversible interventions, both local and global, to alter and disrupt neural synch (anesthesia, TMS, cooling, etc.), and if some of those interventions couple directly with the alteration or loss of certain attentional and awareness capacities, that would put neural synch on my critical-feature list for any future artificial neural networks and other biologically inspired machines.

    The big problem, as you know, is raising awareness of and funding for these promising research domains. They are largely under the radar at present.

    For folks who want to know more, besides Buzsaki I'd recommend JA Scott Kelso http://en.wikipedia.org/wiki/J._A._Scott_Kelso, who investigates neural coordination dynamics and who has also said neural synch may mediate feature binding (awareness). His work is seriously underfunded.

    More recently, Laura Colgin has done exciting work on gamma synch in the rat hippocampus: Laura Lee Colgin et al., "Frequency of gamma oscillations routes flow of information in the hippocampus," Nature, 2009; 462(7271): 353.

    Warm Regards,

    John M. Smart
    President, Acceleration Studies Foundation
    216 Mountain View Ave, Mountain View, CA 94041 | 650 396-8220
    accelerating.org | Can You Help? http://accelerating.org/donations.html

  4. Hi Luke! Thanks for re-tweeting my latest Econosystemics.com essay (On Watson and poverty). Nice "Hutchison bump".

    I must take issue with your suggestion above that we "have no clue" about what intelligence or consciousness are. Surely that's just an emphatic turn of phrase, and not meant literally? We have to have some "clues" or we wouldn't have the words!

    Intelligence, in particular, can be technically defined, as something like "Intelligence is the ability of certain systems (primarily if not exclusively biological) to receive information about their dynamic environment, evaluate that information using individual (and often collective) biases and algorithms (both instinctual and learned), detect threats and opportunities as measured against instinctual and/or adopted goals, hypothesize and evaluate possible scenarios based upon probabilistic analysis of goal satisfaction or threat avoidance, execute actions based on that analysis, and reflectively adjust learned algorithms based upon perceived environmental responses."

    Consciousness is open to more debate, but certainly we have clues, like:

    "I am conscious when I am awake, semi-conscious when dreaming, and not conscious at other times."

    "I can, at a particular time, be apparently conscious to others, and yet not remember anything later (due to injury or drugs)."

    "I am conscious of being conscious, and this fact seems to be essential to my sense of identity."

    The last clue is particularly intriguing, as it suggests a recursive process -- or, as D. Hofstadter says, "I am a strange loop".

    I think Dennett also gives us some good clues that what we call consciousness has a lot to do with multiple, overlapping, and constantly re-edited narratives.

    Anyway, it is a fascinating puzzle, all the more because we have tantalizing clues.

  5. Great post; I agree with almost everything that has been said. I always like to say that "the only thing that grows exponentially is Ray's predictions". It's not true, but it's 0.1 true, and it's kinda funny. Well, maybe not.

    But anyway, I say *almost* because I think you don't make a good point about Watson. Just having access to large amounts of data doesn't mean anything: you must be able to process all of it -- that's the hard problem! And I think the IBM team did a good job on that. Sure, many aspects of Watson are not novel and have been done before in NLP+ML, but nobody had been able to tie everything together into a coherent working whole like they did. It's a huge engineering effort and I was surprised to see how well it worked. It gives me hope. Granted, the buzzer issue was annoying.

    As a side note, I am also very skeptical of General AI and all that. I call ML "appearance modelling": you model the appearance of data. Humans do something strictly more powerful, because they are able to model the causes of data -- they can infer generative models. The work of Josh Tenenbaum gives me hope in this area. He's one of my top heroes; he'll figure it out :)

  6. Finding the matching question to an answer doesn't necessarily imply there's any real *understanding* going on. But understanding is hard to even define. It's still a useful trick, though it doesn't necessarily say anything about consciousness. It's also not clear that the techniques in Watson will *easily* extend to other domains -- though every little bit we learn about knowledge representation and inference helps. [PS Josh is a smart guy, I agree.]