A panel on whether ontology is needed to achieve a collective vision for the semantic web was held on Tuesday at the International Semantic Web Conference (ISWC 2009) near Washington, DC. For most of the panelists the question was rhetorical, but a few interesting points were made, including that machine learning of ontology sits at one extreme of a spectrum whose other extreme is human authoring of ontology (however authoritative or coordinated). Nobody on the panel or in the audience felt that purely human-authored ontology was viable for the long-term vision of a comprehensively semantic and intelligent web. The panelists clearly believed that machine learning will substantially enrich and automate ontology construction; although no timeframe was discussed, the subjective opinion that substantial ontology will be acquired automatically within the next decade or so came through clearly. There was much discussion about the knowledge being in the data, and the exchange had a bit of the statistics-versus-logic debate to it. Generally, the attitude was “get over it”, and even Pat Hayes, who gave a well-received talk on Blogic and whom one would expect to take the strict logic side of the argument, pointed out seminal work on combining machine learning and logic in natural language understanding of text.
David Karger of MIT’s AI lab challenged the panel from the audience by asserting that the data people post on the web is much more important than any ontology that might define what that data means. This set off a bit of a firestorm. There was consensus that the data itself is critically important, if not central. For the most part, though, panelists were aghast at the notion that a spreadsheet of data would be useless to computers unless the meaning of its headings, for example, were related to concepts defined by reference to an ontology those computers understood.
The panel and audience deferred respectfully. Sir Tim Berners-Lee took the floor.
The issue of semantics briefly faded from the discussion; utility seemed to be the crux of the matter. Sir Tim illustrated how even the smallest bit of semantics (i.e., meaning by reference) added to a spreadsheet allowed others to add value to published data quickly and incrementally, almost continuously. He did this by way of example, describing mash-ups of bicycle accident, traffic, and map data built by different people over the course of a day or so after someone first published the bicycle accident data. Most interesting to me, however, was his concluding point: simply identifying what something is by reference to an existing semantic web concept makes data much more immediately consumable, useful, and valuable. For example, simply identifying that a column is a longitude using RDF adds a lot of value. Yes, I realize mash-ups are old and linked data is well known, but his point was delivered in such a straightforward and compelling manner that the argument simply carried. He put a lot of experts’ eyes back on the ball.
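To make that concrete, here is a minimal sketch (my own, not from the talk) of what “identifying that a column is a longitude” can look like in practice, using the rdflib Python library and the W3C WGS84 geo vocabulary. The dataset namespace, row identifier, and coordinates are hypothetical.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

GEO = Namespace("http://www.w3.org/2003/01/geo/wgs84_pos#")  # widely used W3C geo vocabulary
EX = Namespace("http://example.org/bike-accidents/")          # hypothetical dataset namespace

g = Graph()
g.bind("geo", GEO)

# One row of a hypothetical bicycle-accident spreadsheet: (id, latitude, longitude)
rows = [("a1", 38.9072, -77.0369)]

for row_id, lat, lon in rows:
    accident = EX[row_id]
    # The only semantics added: these two numbers are a WGS84 latitude and longitude.
    g.add((accident, GEO.lat, Literal(lat, datatype=XSD.decimal)))
    g.add((accident, GEO.long, Literal(lon, datatype=XSD.decimal)))

print(g.serialize(format="turtle"))
```

Nothing about the accident itself has been modeled, yet any consumer that understands the geo vocabulary can now plot the data on a map.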
Sir Tim reiterated this point an hour or so later at a meet-up of meet-ups. His consistent, critical point was that lightweight use of ontology realizes a great deal of value. Simply using RDF to anchor the semantics of linked data, without substantial ontology development, makes data much more useful to humans. With regard to enterprises, he noted that eleven (yes, he said 11) different concepts are enough to address most of the semantic needs of a typical enterprise (i.e., its relational database models). Of course, enterprises will get more value from reusing existing ontology, extended according to their needs, but his point was that the small step of simply linking data with open web semantics, using only the most widely adopted RDF identifiers, is a huge leap forward for the semantic web and for the benefits its technologies can bring to individuals and enterprises. Full-fledged ontology remains important for deeper functionality and the long-term vision, but simply reusing concepts from existing ontology goes a long way.
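As a companion sketch (again my own, and only illustrative), here is what that lightweight reuse can look like for a single row of a hypothetical relational “customers” table, anchored in the widely adopted FOAF and Dublin Core vocabularies rather than a newly authored enterprise ontology. The enterprise namespace and column values are invented.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, DCTERMS, RDF, XSD

EX = Namespace("http://example.org/crm/")  # hypothetical enterprise namespace

g = Graph()
g.bind("foaf", FOAF)
g.bind("dcterms", DCTERMS)

# A single row from a hypothetical "customers" table: (id, name, email, created)
row = ("c42", "Ada Lovelace", "ada@example.org", "2009-10-27")

customer = EX[row[0]]
g.add((customer, RDF.type, FOAF.Person))         # reuse foaf:Person rather than minting a new Customer class
g.add((customer, FOAF.name, Literal(row[1])))
g.add((customer, FOAF.mbox, URIRef("mailto:" + row[2])))
g.add((customer, DCTERMS.created, Literal(row[3], datatype=XSD.date)))

print(g.serialize(format="turtle"))
```

The handful of reused identifiers is doing all the semantic work here, which is essentially the point about a small number of concepts covering most enterprise needs.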