I had the pleasure of visiting with some fine folks at Cycorp in Austin, Texas recently. Cycorp is interesting for many reasons, but chiefly because they have expended more effort developing a deeper model of common world knowledge than any other group on the planet. They are different from current semantic web startups. Unlike Metaweb‘s Freebase, for example, Cycorp is defining the common sense logic of the world, not just populating databases (which is an unjust simplification of what Freebase is doing, but is proportionally fair when comparing their ontological schemata to Cyc’s knowledge). Not only does Cyc have the largest and most practical ontology on earth, they have almost incomprehensible numbers of formulas[1] describing the world.
Cyc’s formulas define much of what we take for granted, like “birds fly” or “unsupported objects fall”. Cyc handles the exceptions to these defaults, too. For example, things don’t fall without gravity and penguins don’t fly (except under water! And, for that matter, can something fly through outer space?).
Cyc knows enough about the world that it can be overwhelming. If you ask it a seemingly simple question, it can deduce more plausible interpretations than you would believe! Cyc’s knowledge is so vast that you have to tell it that a yellow submarine is not a coward but is a particular color. How else would it know? How do you know?
Actually, you may not have to tell Cyc about the Beatles’ Yellow Submarine for long. Cyc is learning to read. Among the fine folks I met were a few pretty sharp linguistically-oriented artificial intelligence guys. They have a lot of what we built at Haley[2] and are using it with parsing technology that can extract plausible semantics without understanding all the grammatical nuances (or lack of grammaticality) in text.
Because Cyc knows a lot, and because it can make deductions using its formulas and default reasoning, Cyc can determine the plausible interpretations of phrases, such as whether a submarine can be cowardly and whether cowardice is relevant to an interpretation. Perhaps one could conjure up a sentence in which a retreating submarine was described as yellow, but more likely, the color would be related to a sighting of a submarine or a song, either of which interpretation Cyc could provide, no problem. Trust me, if Cyc doesn’t know that yellow things are relatively easy to see under various lighting conditions, it can handle it.
When I asked if he minded my posting this, founder and mentor from my Inference days, Doug Lenat, responded:
Cyc does actually know that only IntelligentAgents can be cowards, so a “yellow notebook” couldn’t be a coward, but then again submarines are so complex and autonomous that they are intelligent agents, in the ModernWarfare microtheory, so the ambiguity does indeed come back in after all!
Keep your eye on Cyc. It has been a long time coming, but the acceleration of the semantic web is not only driving the development of natural language processing technology, it will shortly start driving the need for much more reasoning and knowledge. Existing semantic web technologies only address ontologies. RIF will begin to address the logic that Cyc has already superceded. But no one is close to the knowledge that Cyc has ready to go.
[1] Cyc formulas and reasoning covers the practical aspects of first order logic and adds some interesting extras, especially micro-theories, which are a generalization of contexts or other approaches to hypothetical reasoning, such as using an assumptive truth maintenance system. The term ATMS reflects the work of Johan de Kleer, but the viewpoints introduced before then by Chuck Williams, one of my partners in crime in ART, are more general. Cyc takes that up and sideways a notch or two. This is fascinating technology that I will try to cover in the future.[2] that is, they built it on their own, independently, but it was stunningly similar, essentially because it’s essential!