Commercial Intelligence

Linguist

It’s hard to reckon nice English

The title is a tribute to Raj Reddy’s classic talk about how it’s hard to wreck a nice beach (that is, to “recognize speech”).

I came across interesting work on higher-order and semantic dependency parsing today:

So I gave the software a try on the sentence I discussed in this post. The results, discussed below, were somewhat disappointing but not unexpected. I then tried a well-known parser, with similar results (also shown below).

There is no surprise here. Both parsers are marvels of machine learning and natural language processing technology. It’s just that understanding is far beyond the ken of even the best NLP. This may be obvious to some, but many are surprised, given the recent hype about Google, Watson, artificial intelligence, and “deep learning”.


Properly disambiguating a sentence using the Linguist™

Consider the following disambiguation result from a user of Automata’s Linguist™.


Deep Parsing vs. Deep Learning

For those of us who enjoy the intersection of machine learning and natural language, including “deep learning” (which is all the rage), here is an interesting paper on generalizing vector space models of words to broader semantics of English by Jayant Krishnamurthy, a PhD student of Tom Mitchell at Carnegie Mellon University:

Essentially, the paper demonstrates how the features of high-precision lexicalized grammars allow machines to learn the compositional semantics of English, including compositional semantics beyond the capabilities of recurrent neural networks (RNNs). In short, the paper suggests that deep parsing is better than deep learning for understanding the meaning of natural language.
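For readers new to compositional vector semantics, here is a toy sketch of the basic idea, not the paper’s model. It assumes, purely for illustration, that a noun is a vector and that an adjective is a matrix applied to the noun’s vector, so the meaning of a phrase is computed from its parts according to the grammar. The two-dimensional “meaning space” and all of its values are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def apply(matrix: Matrix, vector: Vector) -> Vector:
    """Compose a functor word (a matrix) with its argument (a vector)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Hypothetical 2-dimensional meaning space: [animal-ness, size].
dog: Vector = [1.0, 0.5]
# Hypothetical adjective "big": preserves animal-ness and doubles size.
big: Matrix = [[1.0, 0.0],
               [0.0, 2.0]]

print(apply(big, dog))  # [1.0, 1.0]
```

The point of the design is that “big dog” never needs its own learned vector; the grammar says which word is the functor and which is the argument.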

For more information and a different perspective, I recommend the following paper, too:

Note that the authors use Combinatory Categorial Grammar (CCG) while our work uses Head-driven Phrase Structure Grammar (HPSG), but this is a minor distinction. To see why, compare the logical forms in the Groningen Meaning Bank with the logic produced by the Linguist: the former uses CCG to produce lambda calculus, while the latter uses HPSG to produce predicate calculus (ignoring vagaries of underspecified representation, which are useful for hypothetical reasoning and textual entailment).
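To illustrate why lambda calculus and predicate calculus are so close, here is a minimal sketch in which determiners are curried lambda terms. Applying them to a noun and a verb phrase beta-reduces to an ordinary first-order formula. The lexical entries are hypothetical, and the sketch ignores CCG category types and type raising; it is neither the Groningen Meaning Bank’s nor the Linguist’s machinery.

```python
from typing import Callable

# A one-place predicate maps a variable name to a formula string.
Pred = Callable[[str], str]
# A generalized quantifier takes a restriction and then a scope (curried lambdas).
Quant = Callable[[Pred], Callable[[Pred], str]]

# Hypothetical lexical entries, for illustration only.
dog: Pred = lambda x: f"dog({x})"
barks: Pred = lambda x: f"bark({x})"

# every = λP.λQ.∀x(P(x) → Q(x))
every: Quant = lambda p: lambda q: f"all x.({p('x')} -> {q('x')})"
# some = λP.λQ.∃x(P(x) ∧ Q(x))
some: Quant = lambda p: lambda q: f"exists x.({p('x')} & {q('x')})"

# Applying the determiner to the noun, then the result to the verb phrase,
# reduces to a plain predicate calculus formula.
print(every(dog)(barks))  # all x.(dog(x) -> bark(x))
print(some(dog)(barks))   # exists x.(dog(x) & bark(x))
```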

Affiliate Transactions covered by The Federal Reserve Act (Regulation W)

Benjamin Grosof, co-founder of Coherent Knowledge Systems, is also involved in developing a standard ontology for the financial services industry, the Financial Industry Business Ontology (FIBO). As part of this work, he is developing a demonstration of defeasible logic concerning Regulation W under the Federal Reserve Act. Regulation W specifies which transactions involving banks and their affiliates are prohibited under Section 23A of the Act. Along the way, various documents are being captured within the Linguist™ platform. This is a brief note on how those documents can be imported into the platform for curation into formal semantics and logic (as Benjamin and Coherent are doing).
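For readers unfamiliar with defeasible logic, here is a toy sketch of its core idea: a general rule holds unless an applicable, higher-priority exception overrides it. The rule names and predicates (`covered_transaction`, `exempt`) are hypothetical and do not encode the actual text of Regulation W, and this is plain Python rather than SILK or Rulelog syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Facts = Dict[str, bool]

@dataclass
class Rule:
    name: str
    condition: Callable[[Facts], bool]
    conclusion: bool          # True = "prohibited", False = "permitted"
    overrides: List[str]      # names of rules this rule defeats when it applies

def decide(rules: List[Rule], facts: Facts) -> Optional[bool]:
    """Return the conclusion of the applicable rules that no applicable rule defeats."""
    applicable = [r for r in rules if r.condition(facts)]
    defeated = {name for r in applicable for name in r.overrides}
    winners = {r.conclusion for r in applicable if r.name not in defeated}
    # No undefeated rule, or undefeated rules that conflict, leaves the question open.
    return winners.pop() if len(winners) == 1 else None

# Hypothetical rules, NOT the text of Regulation W.
rules = [
    Rule("general", lambda f: f.get("covered_transaction", False), True, []),
    Rule("exemption", lambda f: f.get("exempt", False), False, ["general"]),
]

print(decide(rules, {"covered_transaction": True}))                  # True
print(decide(rules, {"covered_transaction": True, "exempt": True}))  # False
```

The appeal for regulations is that exceptions can be added as new rules without rewriting the general rule they qualify.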

Automatic Knowledge Graphs for Assessment Items and Learning Objects

As I mentioned in this post, we’re having fun layering questions and answers with explanations on top of electronic textbook content.

The basic idea is to use semantics to couple a graph of questions, answers, and explanations to the text. The trick is to do that well enough, and automatically enough, that we can deliver effective adaptive learning support. This is analogous to the knowledge graph that users of Knewton’s API create for their content. The difference is that we derive the graph from the content itself, including the “assessment items” (that’s what educators call questions, among other things). Essentially, we parse all of the content, including each question and each of its answers and explanations. As we’ve described elsewhere, the result of this parsing is a precise lexical, syntactic, semantic, and logical understanding of each sentence. But we don’t have to go nearly that far to exceed the state of the art here.
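As a rough picture of what “getting the graph from the content” means, here is a minimal sketch that links each assessment item to the textbook sentences sharing a concept with it. It deliberately substitutes crude keyword overlap (with a small hypothetical stopword list) for the semantic analysis described above; the sentences, items, and function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Crude stand-in for semantic analysis: a tiny, hypothetical stopword list.
STOPWORDS = {"a", "an", "and", "do", "does", "in", "is", "of", "the", "to", "what", "where", "which"}

def concepts(text: str) -> Set[str]:
    """Approximate the concepts in a text as its lowercase content words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def link_items(sentences: List[str], items: Dict[str, str]) -> Dict[str, List[int]]:
    """Link each assessment item to the indices of the sentences that share a concept with it."""
    index: Dict[str, Set[int]] = defaultdict(set)  # concept -> sentence indices
    for i, sentence in enumerate(sentences):
        for concept in concepts(sentence):
            index[concept].add(i)
    return {
        item_id: sorted(set().union(*(index[c] for c in concepts(text))))
        for item_id, text in items.items()
    }

sentences = [
    "Mitosis produces two daughter cells.",
    "Enzymes lower the activation energy of a reaction.",
    "Photosynthesis occurs in chloroplasts.",
]
items = {
    "q1": "What do enzymes lower?",
    "q2": "Where does photosynthesis occur?",
}
print(link_items(sentences, items))  # {'q1': [1], 'q2': [2]}
```

Note that “occur” fails to match “occurs” here; lemmatization and real semantics are exactly what keyword overlap lacks.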

Artificially Intelligent Educational Technology

Over the last two years, machines have demonstrated their ability to read, listen to, and understand English well enough to beat the best at Jeopardy!, answer questions via the iPhone, and earn college credit on Advanced Placement exams. Today, Google, Microsoft, and others are rushing to respond to IBM and Apple with ever more competent artificially intelligent systems that answer questions and support decisions.

What do such developments suggest for the future of education?

Knowledge acquisition using lexical and semantic ontology

In developing a compliance application based on the institutional review board (IRB) policies of the Johns Hopkins Department of Medicine, we have to clarify the following sentence:

  • Projects involving drugs or medical devices other than the use of an approved drug or medical device in the course of medical practice and projects whose data will be submitted to or held for inspection by the FDA will not be exempt from JHM IRB review UNLESS that use falls within the Emergency Use provisions of 21 CFR 56.102 (d).

As you can see, there are a number of compound words and acronyms, as well as references to the Code of Federal Regulations (CFR), that need to be defined or recognized to understand this sentence.
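The recognition half of that task can be sketched with regular expressions: one pattern for CFR citations and one for acronym candidates, checked against a glossary. This is a minimal sketch under stated assumptions; the patterns and the `GLOSSARY` entries are hypothetical, and defining terms still requires curation.

```python
import re
from typing import Dict, List, Tuple

# Title, "CFR", part.section, and an optional paragraph such as "(d)".
CFR_PATTERN = re.compile(r"\b(\d+)\s+CFR\s+(\d+)\.(\d+)\s*(?:\(([a-z])\))?")
# Runs of two or more capitals: acronym candidates such as "FDA" or "IRB".
ACRONYM_PATTERN = re.compile(r"\b[A-Z]{2,}\b")

# Hypothetical glossary; a real application would curate these definitions.
GLOSSARY: Dict[str, str] = {
    "CFR": "Code of Federal Regulations",
    "FDA": "Food and Drug Administration",
    "IRB": "institutional review board",
}

SENTENCE = (
    "Projects involving drugs or medical devices other than the use of an approved drug "
    "or medical device in the course of medical practice and projects whose data will be "
    "submitted to or held for inspection by the FDA will not be exempt from JHM IRB review "
    "UNLESS that use falls within the Emergency Use provisions of 21 CFR 56.102 (d)."
)

def cfr_references(text: str) -> List[Tuple[str, str, str, str]]:
    """Return (title, part, section, paragraph) for each CFR citation."""
    return CFR_PATTERN.findall(text)

def acronyms(text: str) -> Tuple[List[str], List[str]]:
    """Split acronym candidates into those the glossary defines and those it does not."""
    candidates = list(dict.fromkeys(ACRONYM_PATTERN.findall(text)))  # dedupe, keep order
    known = [a for a in candidates if a in GLOSSARY]
    unknown = [a for a in candidates if a not in GLOSSARY]
    return known, unknown

print(cfr_references(SENTENCE))  # [('21', '56', '102', 'd')]
print(acronyms(SENTENCE))        # (['FDA', 'IRB', 'CFR'], ['JHM', 'UNLESS'])
```

The unknown list shows why recognition alone is not enough: “JHM” needs a definition, while “UNLESS” is just emphasis.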

Neat vs. Scruffy and Watson

Recently, John Sowa has commented, on LinkedIn or in correspondence with some of us at Coherent Knowledge Systems, on Roger Schank’s old distinction between the Neats and the Scruffies. The Neats want nice formal logics as the basis of artificial intelligence. This includes anyone who prefers classical logic (e.g., Common Logic, RIF-BLD, or SBVR) or standard ontologies (e.g., OWL-DL) for representing and reasoning with knowledge. The Scruffies may use well-defined technology, but are not constrained by it. They’ll do whatever they think works now, whether or not it is a good long-term solution and despite its shortcomings, as long as it meets immediate objectives.

Watson is scruffy. It doesn’t try to understand or formally represent knowledge. Instead, it combines many effective technologies into an evidentiary framework that allows it to “guess” well.
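To make “evidentiary framework” a little more concrete, here is a toy sketch of the general pattern: score each candidate answer on several kinds of evidence, combine the scores into a confidence with a logistic function, and answer only when confident enough. The features, weights, bias, and threshold are all hypothetical; this is not IBM’s implementation.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical evidence features and weights; a real system would learn these.
WEIGHTS: Dict[str, float] = {"passage_match": 2.0, "type_match": 1.5, "popularity": 0.5}
BIAS = -2.0

def confidence(features: Dict[str, float]) -> float:
    """Combine evidence scores with a logistic function into a confidence in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * score for name, score in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def best_guess(candidates: Dict[str, Dict[str, float]],
               threshold: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return the most confident candidate, or None if it does not clear the threshold."""
    scored = {answer: confidence(evidence) for answer, evidence in candidates.items()}
    answer = max(scored, key=scored.get)
    return (answer, scored[answer]) if scored[answer] >= threshold else None

candidates = {
    "answer_a": {"passage_match": 0.4, "type_match": 0.0, "popularity": 0.9},
    "answer_b": {"passage_match": 0.9, "type_match": 1.0, "popularity": 0.6},
}
print(best_guess(candidates))  # answer_b, with a confidence of roughly 0.83
```

Nothing in this pattern represents what the answer means; it only weighs evidence, which is what makes the approach scruffy.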

Today, in response to continued discussion in the Natural Language Processing group on LinkedIn under the topic “This is Watson”, I’m posting the following presentation on Project Sherlock and the Linguist vs. Google and IBM.

Essentially, the neat approach is more viable today than ever. So chalk one up for the Neats, including Dr. Sowa and, per his comment in that discussion, Menno Mofait.

During a presentation at CMU after winning the game show, IBM admitted that, to get the last leg of improvement needed to win Jeopardy!, they needed to do some “neat” ontological knowledge acquisition, too!

Pedagogical applications of proofs of answers to questions

In Vulcan’s Project Halo, we developed a means of extracting the structure of logical proofs that answer Advanced Placement (AP) questions in biology. For example, the following shows a proof that separation of chromatids occurs during prophase.

textual explanation of entailment using the Linguist and SILK

This explanation was generated using capabilities of SILK built on those described in “A SILK Graphical UI for Defeasible Reasoning, with a Biology Causal Process Example.” That paper gives more details on how the proof structures of questions answered in Project Sherlock can be used to enhance the suggested questions of Inquire (described in this post, which includes further references). SILK justifications are produced using a number of higher-order axioms expressed in HiLog, Flora’s higher-order logic syntax. These meta-rules determine which logical axioms can or do result in a literal. (A literal is a positive or negative atomic formula, such as a fact, which can be true, false, or unknown. Something is unknown if it is proven neither true nor false. For more details, you can read about the well-founded semantics, which is supported by XSB. Flora is implemented in XSB.)
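For readers curious how a literal ends up true, false, or unknown, here is a minimal sketch of the textbook alternating-fixpoint construction of the well-founded model for a small ground (variable-free) normal program. It is an illustration only: XSB computes the well-founded semantics by tabled (SLG) resolution, not like this, and the example program and helper names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Rule = Tuple[str, FrozenSet[str], FrozenSet[str]]  # head, positive body, negated body

def rule(head: str, pos: Iterable[str] = (), neg: Iterable[str] = ()) -> Rule:
    return (head, frozenset(pos), frozenset(neg))

def least_model(rules: List[Rule], assumed: Set[str]) -> Set[str]:
    """Least model of the reduct: 'not b' holds iff b is not in `assumed`."""
    model: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, pos, neg in rules:
            if head not in model and pos <= model and not (neg & assumed):
                model.add(head)
                changed = True
    return model

def well_founded(rules: List[Rule]) -> Dict[str, str]:
    """Three-valued well-founded model via the alternating fixpoint."""
    atoms = {h for h, _, _ in rules} | {a for _, p, n in rules for a in p | n}
    true: Set[str] = set()
    while True:
        possible = least_model(rules, true)      # overestimate: atoms not yet false
        new_true = least_model(rules, possible)  # underestimate: atoms surely true
        if new_true == true:
            break
        true = new_true
    return {a: "true" if a in true else "unknown" if a in possible else "false"
            for a in sorted(atoms)}

program = [
    rule("p", neg=["q"]),             # p :- not q.
    rule("q", neg=["p"]),             # q :- not p.
    rule("r", neg=["s"]),             # r :- not s.  (s has no rules, so it is false)
    rule("t", pos=["r"], neg=["p"]),  # t :- r, not p.
]
print(well_founded(program))
# {'p': 'unknown', 'q': 'unknown', 'r': 'true', 's': 'false', 't': 'unknown'}
```

The mutual negation between `p` and `q` can be proven neither true nor false, so both stay unknown, and `t` inherits that status because it depends on `not p`.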

Now, how does all this relate to pedagogy in future derivatives of electronic learning software or textbooks, such as Inquire?

Well, here’s a use case:

Acquiring Rich Logical Knowledge from Text (Semantic Technology 2013)

As noted in prior posts about Project Sherlock, we have acquired knowledge from a biology textbook to build the business case for applications like Inquire. We reported our results at SemTech recently. The slides are available here.

Logic acquired from English sentences in Campbell's college biology textbook published by Pearson