Knewton is an interesting company providing a recommendation service for adaptive learning applications. In a recent post, Jonathon Goldman describes an algorithmic approach to generating questions. The approach focuses on improving the manual authoring of test questions (known in the educational realm as “assessment items“). It references work at Microsoft Research on the problem of synthesizing questions for a algebra learning game.
We agree that more automated generation of questions can enrich learning significantly, as has been demonstrated in the Inquire prototype. For information on a better, more broadly applicable approach, see the slides beginning around page 16 in Peter Clark’s invited talk.
What we think is most promising, however, is understanding the reasoning and cognitive skill required to answer questions (i.e., Deep QA). The most automated way to support this is with machine understanding of the content sufficient to answer the questions by proving answers (i.e., multiple choices) right or wrong, as we discuss in this post and this presentation.
At the SemTech conference last week, a few companies asked me how to respond to IBM’s Watson given my involvement with rapid knowledge acquisition for deep question answering at Vulcan. My answer varies with whether there is any subject matter focus, but essentially involves extending their approach with deeper knowledge and more emphasis on logical in additional to textual entailment.
Today, in a discussion on the LinkedIn NLP group, there was some interest in finding more technical details about Watson. A year ago, IBM published the most technical details to date about Watson in the IBM Journal of Research and Development. Most of those journal articles are available for free on the web. For convenience, here are my bookmarks to them.
In the spring of 2012, Vulcan engaged Automata for a knowledge acquisition (KA) experiment. This article provides background on the context of that experiment and what the results portend for artificial intelligence applications, especially in the areas of education. Vulcan presented some of the award-winning work referenced here at an AI conference, including a demonstration of the electronic textbook discussed below. There is a video of that presentation here. The introductory remarks are interesting but not pertinent to this article.
Background on Vulcan’s Project Halo
From 2002 to 2004, Vulcan developed a Halo Pilot that could correctly answer between 30% and 50% of the questions on advanced placement (AP) tests in chemistry. The approaches relied on sophisticated approaches to formal knowledge representation and expert knowledge engineering. Of three teams, Cycorp fared the worst and SRI fared the best in this competition. SRI’s system performed at the level of scoring a 3 on the AP, which corresponds to earning course credit at many universities. The consensus view at that time was that achieving a score of 4 on the AP was feasible with limited additional effort. However, the cost per page for this level of performance was roughly $10,000, which needed to be reduced significantly before Vulcan’s objective of a Digital Aristotle could be considered viable.
Continue reading “Background for our Semantic Technology 2013 presentation”
Benjamin Grosof and I will be presenting the following review of recent work at Vulcan towards Digital Aristotle as part of Project Halo at SemTechBiz in San Francisco the first week of June.
Acquiring deep knowledge from text
We show how users can rapidly specify large bodies of deep logical knowledge starting from practically unconstrained natural language text.
English sentences are semi-automatically interpreted into predicate calculus formulas, and logic programs in SILK, an expressive knowledge representation (KR) and reasoning system which tolerates practically inevitable logical inconsistencies arising in large knowledge bases acquired from and maintained by distributed users possessing varying linguistic and semantic skill sets who collaboratively disambiguate grammar, logical quantification and scope, co-references, and word senses.
The resulting logic is generated as Rulelog, a draft standard under W3C Rule Interchange Format’s Framework for Logical Dialects, and relies on SILK’s support for FOL-like formulas, polynomial-time inference, and exceptions to answer questions such as those found in advanced placement exams.
We present a case study in understanding cell biology based on a first-year college level textbook.