The last time a computer famously beat humans, the domain was chess. The most interesting thing about that victory was the contrast between humans and machines: the way a human plays chess and the way a chess-playing computer plays chess have nothing in common. The computer uses a brute-force technique that examines billions of possible moves to pick the best one. A human can’t do anything close to that, and instead looks at far fewer possibilities through a lens of strategy and tactics. Technology has advanced to the point where a laptop computer can now use brute force to beat most human players.
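To make "brute force" concrete, here is a minimal sketch of exhaustive game search. It does not play chess. It uses a toy game chosen for illustration, in which two players take turns removing one to three stones from a pile and whoever takes the last stone wins. The function name and the game itself are assumptions made for this example. The idea is the same one a chess program applies on a vastly larger scale: try every move, then every reply, and keep the move that leaves the opponent with no winning answer.

```python
def best_move(stones, max_take=3):
    """Return a winning number of stones to take, or None if every move loses.

    Toy game (an assumption for this sketch): players alternate taking
    1..max_take stones, and whoever takes the last stone wins.
    """
    def can_win(n):
        # The player to move wins if some move leaves the opponent in a
        # position from which the opponent cannot win. With n == 0 there
        # are no moves, so the player to move has already lost.
        return any(not can_win(n - k) for k in range(1, min(max_take, n) + 1))

    # Try every legal move and check every possible continuation.
    for k in range(1, min(max_take, stones) + 1):
        if not can_win(stones - k):
            return k
    return None


if __name__ == "__main__":
    for pile in (4, 5, 10):
        print(pile, "stones ->", best_move(pile))
```

Nothing in this search resembles strategy. It simply enumerates possibilities, which is why it only becomes practical for a game like chess once the hardware is fast enough.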
IBM’s Watson computer is more complicated than a chess program, but its victory at the game "Jeopardy!" is similar to the chess situation: The techniques used by Watson to answer questions have nothing in common with the way humans answer questions. Yet Watson’s technology is quite interesting, and it is likely to open up practical applications in a way that chess programs never have.
Watson also uses a brute-force approach to play "Jeopardy!" IBM started by feeding Watson ordinary text: all of the English version of Wikipedia (about 14 billion characters), another encyclopedia, a complete dictionary, a complete thesaurus, the Bible, a movie database, geographic information, plus a great deal of other material. Approximately a trillion characters of normal text in all went into Watson, roughly the equivalent of a million books. At 10 books per foot of shelf space, that is about 19 miles of books. There was no "markup" done to the text, meaning no separate step in which humans structured the data. From that terabyte of input, the system drew all of its "knowledge" when interpreting and answering questions.
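The shelf-space figure can be checked with a few lines of arithmetic. The one-million-characters-per-book size is an assumption, implied by equating a trillion characters with a million books. The variable names are made up for this example.

```python
CHARS_TOTAL = 1_000_000_000_000   # about a trillion characters (one terabyte of text)
CHARS_PER_BOOK = 1_000_000        # assumption: implied by "a million books"
BOOKS_PER_FOOT = 10               # shelf density used in the text
FEET_PER_MILE = 5280

books = CHARS_TOTAL // CHARS_PER_BOOK
shelf_feet = books / BOOKS_PER_FOOT
shelf_miles = shelf_feet / FEET_PER_MILE

print(f"{books:,} books, {shelf_feet:,.0f} feet, {shelf_miles:.1f} miles")
```

A million books at 10 per foot is 100,000 feet of shelf, which works out to just under 19 miles.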
Now there are two big questions. First, how does the machine deconstruct the questions to query this terabyte of information, and second, how does the machine store all of the information so that it is in a usable, computer-accessible form that can respond to the queries? The first area apparently was handled with custom code to do the natural language processing. The second area looks a bit like a search engine, but with a twist. IBM has openly talked about two pieces of software that help with this area.
The most important piece of software is something called UIMA, which stands for Unstructured Information Management Architecture. Unstructured information is what human beings normally use to communicate. Its opposite is structured information, like the rows and columns you find in a computer database. The information fed to Watson, such as encyclopedia entries, was all unstructured. UIMA helps organize that information and structure it in different ways for a computer to use.
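The difference between unstructured and structured information is easier to see in a small example. The sketch below is not UIMA and does not use its API. It is a hypothetical, regular-expression-based annotator, with made-up names, that turns two plain sentences into database-style rows.

```python
import re

# Unstructured input: ordinary sentences, as a person would write them.
TEXT = "Ada Lovelace was born in 1815. Alan Turing was born in 1912."

# Hypothetical pattern for this sketch: a capitalized name followed by
# the phrase "was born in" and a four-digit year.
PATTERN = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)*) was born in (\d{4})")


def annotate(text):
    """Return one structured row (a dict) per birth statement found."""
    return [
        {"person": match.group(1), "birth_year": int(match.group(2))}
        for match in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    for row in annotate(TEXT):
        print(row)
```

A real system has to cope with the countless ways people phrase the same fact, which is why a framework for combining many such analysis steps is useful.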
The second piece of software is something called Hadoop. This software can help large groups of machines work together to solve problems.
Watson is made up of nearly 3,000 computing cores and 15 trillion bytes of memory. Hadoop got all of that hardware to work together and act like a single machine.
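Hadoop is best known for the map-and-reduce style of dividing work: split a big job into chunks, process each chunk independently, then merge the partial results. The sketch below imitates that pattern on a single computer using Python's standard library. It is not Hadoop's API, and the function names are assumptions made for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_count(chunk):
    """Map step: count the words in one chunk of text, independently of the others."""
    return Counter(chunk.lower().split())


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total += partial
    return total


def word_count(chunks, workers=4):
    # Each chunk goes to a worker. On a real cluster the workers would be
    # separate machines rather than threads in one process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_count, chunks))
    return reduce_counts(partials)


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat", "The end"]))
```

Because no chunk depends on any other, adding more workers lets the same job cover more text in the same amount of time.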
The most interesting thing about Watson is how its technology might be applied in the future. Because IBM fed Watson normal text data, IBM suspects that the technology will let it create useful expert systems in specific domains. IBM could take a knowledge base and create a system that is much better at answering call center questions, take medical information and create a system that answers medical questions, or take thousands of law books and create a system that answers legal questions. As seen on "Jeopardy!," Watson can search through the equivalent of a million books, find possible answers and rank them in just a few seconds.
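The "find possible answers and rank them" step can be illustrated with a deliberately crude sketch. This is not how Watson scores its candidates. It simply counts how many words of the question appear in each stored passage, and the passages and function name are made up for this example.

```python
# Hypothetical mini knowledge base: candidate answer -> supporting text.
PASSAGES = {
    "Paris": "capital city of france on the seine",
    "Berlin": "capital city of germany on the spree",
    "Amazon": "largest river in south america",
}


def rank_candidates(question, passages):
    """Return candidate answers ordered by word overlap with the question."""
    question_words = set(question.lower().split())
    scored = []
    for answer, text in passages.items():
        overlap = len(question_words & set(text.lower().split()))
        scored.append((overlap, answer))
    # Highest overlap first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Candidates that share no words with the question are dropped.
    return [answer for overlap, answer in scored if overlap > 0]


if __name__ == "__main__":
    print(rank_candidates("capital of france", PASSAGES))
```

A system built for a call center or a law library would need far better evidence than shared words, but the overall shape is the same: generate candidates, score them, and answer with the one at the top.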
The hardware system used to create Watson looks ridiculously massive right now. However, in just a few years, Watson is likely to fit into a desktop machine that costs $1,000. That is what past technology trends would indicate, anyway. That is why your laptop can now beat you at chess, even though chess computers once filled entire rooms.
And now that question answering at this level has been proved possible, the algorithms and techniques inside Watson will be tweaked and simplified. There will come a day in the not-too-distant future when question-answering machines will be much easier to create and deploy. In the interim, large machines will be housed in data centers and made available over the Internet. That is exactly how Internet search engines operate today, and search engines are about to get a whole lot smarter.