Translating English into Logic using the Linguist

Now that the patent filings are done, we can discuss and show more about the Linguist…

The following link is a video that shows a sentence from Project Sherlock being translated from English into first-order logic using the patent-pending Linguist™ software.

The hydrophobic ends of the lipids of a cell’s plasma membrane are oriented away from the cell’s cytoplasm.

This video was recorded in October, 2012.  More recent versions of the Linguist can render the logic in more ways, such as shown below:

A grammatically disambiguated and logically formalized English sentence using Automata Linguist
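
As a rough illustration only (this is not the Linguist's actual output), one plausible first-order reading of the lipid sentence above could be written as follows. The predicate names (Cell, PlasmaMembrane, hasPart, orientedAwayFrom, and so on) are hypothetical and chosen just for this sketch.

```latex
\forall c\, \forall m\, \forall l\, \forall e\, \forall y\;
\bigl(
  \mathrm{Cell}(c) \land \mathrm{PlasmaMembrane}(m) \land \mathrm{hasPart}(c, m)
  \land \mathrm{Lipid}(l) \land \mathrm{hasPart}(m, l)
  \land \mathrm{HydrophobicEnd}(e) \land \mathrm{hasPart}(l, e)
  \land \mathrm{Cytoplasm}(y) \land \mathrm{hasPart}(c, y)
\bigr)
\rightarrow \mathrm{orientedAwayFrom}(e, y)
```

Even this simple reading has to commit to quantifier scope and to which cell "the cell's cytoplasm" refers to, which are the kinds of ambiguities that have to be resolved before a sentence can be formalized.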

Background for our Semantic Technology 2013 presentation

In the spring of 2012, Vulcan engaged Automata for a knowledge acquisition (KA) experiment.  This article provides background on the context of that experiment and what the results portend for artificial intelligence applications, especially in the areas of education.  Vulcan presented some of the award-winning work referenced here at an AI conference, including a demonstration of the electronic textbook discussed below.  There is a video of that presentation here.  The introductory remarks are interesting but not pertinent to this article.

Background on Vulcan’s Project Halo

From 2002 to 2004, Vulcan developed a Halo Pilot that could correctly answer between 30% and 50% of the questions on advanced placement (AP) tests in chemistry.  The competing systems relied on sophisticated formal knowledge representation and expert knowledge engineering.  Of three teams, Cycorp fared the worst and SRI fared the best in this competition.  SRI’s system performed at the level of scoring a 3 on the AP, which corresponds to earning course credit at many universities.  The consensus view at that time was that achieving a score of 4 on the AP was feasible with limited additional effort.  However, the cost per page for this level of performance was roughly $10,000, which needed to be reduced significantly before Vulcan’s objective of a Digital Aristotle could be considered viable.

SBVR in OWL

In preparation for generating RIF and SBVR from the Linguist, we have produced an OWL ontology for the pertinent aspects of the SBVR specification.  We hope that this is helpful to others and would sincerely appreciate any corrections or comments on how to improve it.

Paul

Semantic Technology & Business Conference (SemTechBiz)

Benjamin Grosof and I will be presenting the following review of recent work at Vulcan towards Digital Aristotle as part of Project Halo at SemTechBiz in San Francisco the first week of June.

Acquiring deep knowledge from text

We show how users can rapidly specify large bodies of deep logical knowledge starting from practically unconstrained natural language text.

English sentences are semi-automatically interpreted into predicate calculus formulas and logic programs in SILK, an expressive knowledge representation (KR) and reasoning system.  SILK tolerates the practically inevitable logical inconsistencies that arise in large knowledge bases acquired from and maintained by distributed users with varying linguistic and semantic skills.  These users collaboratively disambiguate grammar, logical quantification and scope, co-references, and word senses.

The resulting logic is generated as Rulelog, a draft standard under W3C Rule Interchange Format’s Framework for Logical Dialects, and relies on SILK’s support for FOL-like formulas, polynomial-time inference, and exceptions to answer questions such as those found in advanced placement exams.

We present a case study in understanding cell biology based on a first-year college level textbook.

Deep QA

Our efforts at acquiring deep knowledge from a college biology text have enabled us to answer a number of questions that are beyond what has been previously demonstrated.

For example, we’re answering questions like:

  1. Are the passageways provided by channel proteins hydrophilic or hydrophobic?
  2. Will a blood cell in a hypertonic environment burst?
  3. If a Paramecium swims from a hypotonic environment to an isotonic environment, will its contractile vacuole become more active?

A couple of these are at higher levels on the Bloom scale of cognitive skills than Watson can reach (which is significantly higher than search engines).

As some other posts have shown in images, we can translate completely natural sentences into formal logic.  We actually do the reasoning using Vulcan’s SILK, which has great capabilities, including defeasibility.  We can also output to RIF or SBVR, but the temporal aspects and various things such as modality and the need for defeasibility favor SILK or Cyc for the best reasoning and QA performance.

One thing in particular is worth noting:  this approach does better with causality and temporal logic than is typically considered by most controlled natural language systems, whether they are translating to a business rules engine or a logic formalism, such as first order or description logic.  The approach promises better application development and knowledge management capabilities for more of the business process management and complex event processing markets.

Logic from the English of Science, Government, and Business

Our software is translating even long and complicated sentences, from regulations to textbooks, into formal logic (i.e., not necessarily first-order logic, but more general predicate calculus).   As you can see below, we can translate this understanding into various logical formalisms including defeasible first-order logic, which we are applying in Vulcan’s Project Halo.  This includes classical first-order logic and related standards such as RIF or SBVR, as well as building or extending an ontology or description logic (e.g., OWL-DL).

We’re excited about these capabilities in various applications, such as in advancing science and education at Vulcan and formally understanding, analyzing and automating policy and regulations in enterprises.

English translated into predicate calculus

a sentence understood by Automata

an unambiguously, formally understood sentence


English translated into SILK and Prolog

Project Sherlock

Working as part of Vulcan’s Project Halo[1], Automata is applying a natural language understanding system that translates carefully formulated sentences into formal logic so as to answer questions that typically require deeper knowledge and inference than demonstrated by Watson.

The objective over the next three quarters is to acquire enough knowledge from the 9th edition of Campbell’s Biology textbook to demonstrate three things.

  • First, that the resulting system answers, for example, biology advanced placement (AP) exam questions more competently than existing systems (e.g., Aura[2] or Inquire[3]).
  • Second, that knowledge from certain parts of the textbook is effectively translated from English into formal knowledge with sufficient breadth and depth of coverage and semantics.
  • Third, that the knowledge acquisition process proves efficacious and accessible to less than highly skilled knowledge engineers so as to accelerate knowledge acquisition beyond 2012.

Included in the second of these is a substantial ontology of background knowledge expected of students in order to comprehend the selected parts of the textbook using a combination of OWL, logic, and English sentences from sources other than the textbook.

Automata is hiring logicians, linguists, and biologists to work as consultants, contractors, or employees for:

  • Interactive tree-banking and word-sense disambiguation of several thousand sentences.[4]
  • Extending its lexical ontology and a broad-coverage grammar of English with additional vocabulary and deeper semantics, especially concerning cellular biology and related scientific knowledge including chemistry, physics, and math.
  • Maturing its upper and middle ontology of domain independent knowledge using OWL in combination with various other technologies, including description logic, first-order logic, higher-order logic, modal logic, and defeasible logic.[5]
  • Enhancing its platform for text-driven knowledge engineering towards a collaborative wiki-like architecture for self-aware content in scientific education and biomedical applications.

Terms of engagement are flexible, ranging from small units of work to full-time employment.  We are based in Pittsburgh, Pennsylvania and Vulcan is headquartered in Seattle, Washington, but the team is distributed across the country and overseas.

Please contact Paul Haley by e-mail to his first name at this domain.


[1] Vulcan: http://www.vulcan.com/TemplateCompany.aspx?contentId=54; Project Halo: http://www.projecthalo.com/
Video introduction/overview: http://videolectures.net/aaai2011_gunning_halobook/

[2] Aura: http://www.ai.sri.com/project/aura

[3] Inquire: http://www.franz.com/success/customer_apps/artificial_intelligence/aura.lhtml

[4] tree-banking and WSD: http://www.omg.org/spec/SBVR and http://en.wikipedia.org/wiki/Word-sense_disambiguation

[5] e.g., SILK (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.174.1796 ) and SBVR (http://www.omg.org/spec/SBVR)

Event-centric BPM and goal-driven processing

The slides for my Business Rules Forum presentation on event semantics and focusing on events in order to simplify process definition and to facilitate more robust governance and compliance are at Event-centric BPM.

After the talk I spoke with Jan Verbeek and Gartjan Grijzen of Be Informed and reviewed their software, which is excellent.  They have been quite successful with various government agencies in applying  the event-centric methodology to produce goal-driven processing.  Their approach is elegant and effective.  It clearly demonstrates the merits of an event-centric approach and the power that emerges from understanding event-dependencies.  Also, it is very semantic, ontological, and logic-programming oriented in its approach (e.g., they use OWL and a backward-chaining inference engine).

They do not have the top-down knowledge management approach that I advocate, nor do they provide the logical verification of governing policies and compliance (i.e., using theorem provers) that I mention in the talk (see Guido Governatori’s 2010 publications and Travis Breaux’s research at CMU, for example).  Still, theirs is the best commercially deployed work in separating business process description from procedural implementation that comes to mind. (Note that Ed Barkmeyer of NIST reports some use of SBVR descriptions of manufacturing processes with theorem provers.  Some in automotive and aerospace industries have been interested in this approach for quality purposes, too.)

Be Informed is now expanding into the United States with the assistance of Mills Davis and others.  Their software is definitely worth consideration and, in my opinion, is more elegant and effective than the generic BPMN approach.

Simple problems with the semantic web

The standard for defining ontologies these days is OWL and Protege.  Unfortunately, OWL lacks any notion of exceptions in inheritance or any other notion of defeasibility.

So, although you may want to say that birds fly, your ontology will be broken (or become much more complicated) when you realize there are birds that can’t fly, such as penguins or ostriches, or even sick or injured birds.

Practically speaking, you need something like courteous logic or the defeasibility in SILK to handle this (or any 1980s expert system shell or even earlier frame system).  OWL is very hard on mortal man (e.g., mainstream IT) in this regard.
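
To make the birds example concrete, here is a minimal Python sketch of default inheritance with exceptions.  It is not SILK or courteous logic syntax, and it simplifies by letting the most specific class win rather than using explicit rule priorities.  The taxonomy, the rule list, and the names `Rule`, `ancestors`, and `holds` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A default: members of `kind` have property `prop` with `value`."""
    kind: str
    prop: str
    value: bool


# Hypothetical taxonomy, child -> parent (an assumption for this sketch).
PARENTS = {
    "penguin": "bird",
    "ostrich": "bird",
    "robin": "bird",
    "bird": "animal",
}

# Defaults and their exceptions; a more specific rule overrides a general one.
RULES = [
    Rule("bird", "flies", True),
    Rule("penguin", "flies", False),
    Rule("ostrich", "flies", False),
]


def ancestors(kind):
    """Yield `kind` and then its superclasses, most specific first."""
    while kind is not None:
        yield kind
        kind = PARENTS.get(kind)


def holds(kind, prop):
    """Return the value given by the most specific applicable rule.

    Returns None when no rule says anything about `prop` for `kind`.
    """
    for k in ancestors(kind):
        for rule in RULES:
            if rule.kind == k and rule.prop == prop:
                return rule.value
    return None


if __name__ == "__main__":
    for kind in ("robin", "penguin", "animal"):
        print(kind, "flies:", holds(kind, "flies"))
```

The point of the sketch is that the general rule "birds fly" stays in place and the exceptions are added alongside it.  In a non-defeasible formalism the general rule itself would have to be rewritten to enumerate every exception.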

How can I tell OWL that a pronoun is a noun but that pronouns are a closed class of words, unlike nouns, verbs, adjectives, and adverbs (in general)?  Well, I’ll have to tell it about open-class nouns versus closed-class nouns.  What a pain!

This is why we use Protege primarily as a drafting tool and, for example, SILK, to do reasoning.   Non-defeasible description logic and first-order reasoners are difficult to get along with, in practice (and make sustainable knowledge repositories too difficult – which inhibits adoption, obviously).

What is has always been going to be

I’ve been working for a while now on an ontology for representing events (which includes process, of course).  One of the requirements of a system that is to monitor, govern, implement, or reason about processes is that it consider “situations”, which are things that happen or occur, including events and states.  (See, for example, the perdurants of the DOLCE ontology, BFO’s occurrents, or OpenCyc’s situations.)  This requires the representation of time-variant information at various points or during various intervals of time (more than just the Allen relations or OWL Time).   If you’re interested in such things, I’d recommend Parsons’ “Events in the Semantics of English” or Pustejovsky’s “Syntax of Event Structure”, both of which look at the subject from a linguistic rather than inferential perspective.  When you pursue this to the point that you implement the axioms that an artificial intelligence needs to provide assistance in defining or governing a business process (or answering questions about molecular biological processes) you land up in some pretty abstract stuff, including the Stanford Encyclopedia of Philosophy.  I found the title of this post entertaining within the page on temporal logic.
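
For readers unfamiliar with the Allen relations mentioned above, the following Python sketch classifies the relation between two intervals and then shows the extra step of attaching a state to an interval so that it can be queried at a point in time.  It is only an illustration, not the ontology described in this post.  The names `Interval`, `Fluent`, `allen_relation`, and `holds_at`, the half-open interval convention, and the sample order data are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float
    end: float  # assumed: start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Name the one Allen relation (of thirteen) that holds from a to b."""
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Only the two proper-overlap cases remain.
    return "overlaps" if a.start < b.start else "overlapped-by"


@dataclass(frozen=True)
class Fluent:
    """A time-variant fact: `subject` is in `state` throughout `during`."""
    subject: str
    state: str
    during: Interval


def holds_at(fluents, subject, state, t):
    """True if some fluent puts `subject` in `state` at time t.

    Intervals are treated as half-open, [start, end), by assumption.
    """
    return any(
        f.subject == subject
        and f.state == state
        and f.during.start <= t < f.during.end
        for f in fluents
    )


# Hypothetical process data: an order that is pending, then shipped.
FLUENTS = [
    Fluent("order-1", "pending", Interval(0, 5)),
    Fluent("order-1", "shipped", Interval(5, 9)),
]

if __name__ == "__main__":
    print(allen_relation(FLUENTS[0].during, FLUENTS[1].during))
    print(holds_at(FLUENTS, "order-1", "pending", 4))
```

The interval relations alone say how two periods are arranged.  Saying what is true during them, and what changes at their boundaries, is where the event ontology has to go further.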