Combinatorial ambiguity? No problem!

While translating some legal documentation (sales and use tax laws and regulations) into compliance logic, we came across the following sentence (and many more that are even worse):

  • Any transfer of title or possession, exchange, or barter, conditional or otherwise, in any manner or by any means whatsoever, of tangible personal property for a consideration.

Natural language processing systems choke on sentences like this because of their combinatorial ambiguity and because NLP typically lacks knowledge about what can be conjoined with, complement, or modify what.

This sentence has many thousands of possible parses.  They differ in the scope of each of the ‘or’s, in what is modified by conditional, otherwise, or whatsoever, and in what is complemented by in, by, of, and for.
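To get a feel for how quickly the numbers grow, here is a rough sketch that counts only binary bracketings.  It is an illustration, not the parser or ranking model used here: it assumes each attachable phrase behaves like one more leaf in a binary tree, so that n + 1 leaves admit Catalan(n) bracketings.  A real grammar rules some of these out and adds other kinds of ambiguity.  The function names are made up for this example.

```python
from math import comb


def catalan(n: int) -> int:
    """Closed form: number of distinct binary bracketings of n + 1 leaves."""
    return comb(2 * n, n) // (n + 1)


def count_bracketings(leaves: int) -> int:
    """Count binary trees over `leaves` items by dynamic programming.

    A tree over k leaves splits into a left subtree with i leaves and a
    right subtree with k - i leaves, for every i from 1 to k - 1.
    """
    if leaves < 1:
        raise ValueError("need at least one leaf")
    counts = [0, 1]  # counts[k] = number of bracketings of k leaves
    for k in range(2, leaves + 1):
        counts.append(sum(counts[i] * counts[k - i] for i in range(1, k)))
    return counts[leaves]


# Ten attachable phrases already give thousands of bracketings.
ten_phrases = count_bracketings(10)
```

Under these assumptions, ten phrases yield 4862 bracketings, which agrees with catalan(9), so "many thousands" is no exaggeration for a sentence with this many conjunctions, modifiers, and prepositional phrases.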

The following shows 2 parses remaining after we veto a number of mistakes and confirm some phrases from the 400 highest ranking parses (a few right or left clicks of the mouse):

Continue reading “Combinatorial ambiguity? No problem!”

Simple, Fast, Effective, Active Learning

Recently, we “read” ten thousand recipes or so from a cooking web site.  The purpose of doing so was to produce a formal representation of those recipes for use in temporal reasoning by a robot.

Our task was to produce ontology by reading the recipes subject to conflicting goals.  On the one hand, the ontology was to be accurate so that the robot could reason, plan, and answer questions robustly.  On the other hand, the ontology was to be produced automatically (with minimal human effort).[1]

In order to minimize human effort while still obtaining deep parses from which we produce ontology, we used more techniques from statistical natural language processing than we typically do in knowledge acquisition for deep QA, compliance, or policy automation.  (Consider that NLP typically achieves less than 90% syntactic accuracy while such work demands near 100% semantic accuracy.)[2]

In the effort, we refined some prior work on representing words as vectors and on semi-supervised learning.  In particular, we adapted semi-supervised, active learning similar to Stratos & Collins 2015 using enhancements to the canonical correlation analysis (CCA) of Dhillon et al. 2015 to obtain accurate part-of-speech tagging, as conveyed in the following graphic from Stratos & Collins:

Continue reading “Simple, Fast, Effective, Active Learning”
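The active learning idea in the excerpt above can be illustrated with a generic uncertainty-sampling step.  This is a minimal sketch under stated assumptions, not the selection criterion of Stratos & Collins or of our system: it assumes some tagger already returns a probability distribution over tags for each word type, and it simply sends the most uncertain word types to a person for labeling.  The function names and the sample probabilities are hypothetical.

```python
import math
from typing import Dict, List


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in bits) of a predicted tag distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def select_for_labeling(predictions: Dict[str, Dict[str, float]],
                        budget: int) -> List[str]:
    """Return the `budget` word types with the most uncertain predictions."""
    ranked = sorted(predictions,
                    key=lambda word: entropy(predictions[word]),
                    reverse=True)
    return ranked[:budget]


# Hypothetical tagger output for three word types from recipes.
predictions = {
    "whisk": {"NOUN": 0.5, "VERB": 0.5},
    "the": {"DET": 0.99, "NOUN": 0.01},
    "season": {"NOUN": 0.3, "VERB": 0.7},
}

# Ask a person about the two word types the tagger is least sure of.
queue = select_for_labeling(predictions, budget=2)
```

Here "whisk" and "season" are queued for a human label while "the" is left to the tagger, which is how a small labeling budget gets spent where it helps most.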

It’s hard to reckon nice English

The title is in tribute to Raj Reddy’s classic talk about how it’s hard to wreck a nice beach.

I came across interesting work on higher order and semantic dependency parsing today:

So I gave the software a try for the sentence I discussed in this post.  The results discussed below were somewhat disappointing but not unexpected.  So I tried a well-known parser with similar results (also shown below).

There is no surprise here.  Both parsers are marvels of machine learning and natural language processing technology.  It’s just that understanding is far beyond the ken of even the best NLP.  This may be obvious to some, but many are surprised given all the hype about Google and Watson and artificial intelligence or “deep learning” recently.

Continue reading “It’s hard to reckon nice English”

Smart Machines And What They Can Still Learn From People – Gary Marcus

This is a must-watch video from the Allen Institute for AI for anyone seriously interested in artificial intelligence.  It’s 70 minutes long, but worth it.  Some of the highlights from my perspective are:

  • 27:27 where the key reason that deep learning approaches fail at understanding language is discussed
  • 31:30 where the inability of inductive approaches to address logical quantification or variables is discussed
  • 39:30 through 42+ where the inability of deep learning to perform as well as Watson and the inability of Watson to understand or reason are discussed

The astute viewer and blog reader will recognize this slide as discussed by Oren Etzioni here.

Deep Learning, Big Data and Common Sense

Thanks to John Sowa’s comment on LinkedIn for this link which, although slightly dated, contains the following:

In August, I had the chance to speak with Peter Norvig, Director of Google Research, and asked him if he thought that techniques like deep learning could ever solve complicated tasks that are more characteristic of human intelligence, like understanding stories, which is something Norvig used to work on in the nineteen-eighties. Back then, Norvig had written a brilliant review of the previous work on getting machines to understand stories, and fully endorsed an approach that built on classical “symbol-manipulation” techniques. Norvig’s group is now working within Hinton, and Norvig is clearly very interested in seeing what Hinton could come up with. But even Norvig didn’t see how you could build a machine that could understand stories using deep learning alone.

Other quotes along the same lines come from Oren Etzioni in Looking to the Future of Data Science:

  1. But in his keynote speech on Monday, Oren Etzioni, a prominent computer scientist and chief executive of the recently created Allen Institute for Artificial Intelligence, delivered a call to arms to the assembled data mavens. Don’t be overly influenced, Mr. Etzioni warned, by the “big data tidal wave,” with its emphasis on mining large data sets for correlations, inferences and predictions. The big data approach, he said during his talk and in an interview later, is brimming with short-term commercial opportunity, but he said scientists should set their sights further. “It might be fine if you want to target ads and generate product recommendations,” he said, “but it’s not common sense knowledge.”
  2. The “big” in big data tends to get all the attention, Mr. Etzioni said, but thorny problems often reside in a seemingly simple sentence or two. He showed the sentence: “The large ball crashed right through the table because it was made of Styrofoam.” He asked, What was made of Styrofoam? The large ball? Or the table? The table, humans will invariably answer. But the question is a conundrum for a software program, Mr. Etzioni explained
  3. Instead, at the Allen Institute, financed by Microsoft co-founder Paul Allen, Mr. Etzioni is leading a growing team of 30 researchers that is working on systems that move from data to knowledge to theories, and then can reason. The test, he said, is: “Does it combine things it knows to draw conclusions?” This is the step from correlation, probabilities and prediction to a computer system that can understand

This is a significant statement from one of the best people in fact extraction on the planet!

As you know from elsewhere on this blog, I’ve been involved with the precursor to the AIAI (Vulcan’s Project Halo) and am a fan of Watson.  But Watson is the best example of what Big Data, Deep Learning, fact extraction, and textual entailment aren’t even close to:

  • During a Final Jeopardy! segment that included the “U.S. Cities” category, the clue was: “Its largest airport was named for a World War II hero; its second-largest, for a World War II battle.”
  • Watson responded “What is Toronto???,” while contestants Jennings and Rutter correctly answered Chicago — for the city’s O’Hare and Midway airports.

Sure, you can rationalize these things and hope that someday the machine will not need reliable knowledge (or that it will induce enough information with enough certainty).  IBM does a lot of this (e.g., see the source of the quotes above).  That day may come, but it will happen a lot sooner with curated knowledge.

Deep Parsing vs. Deep Learning

For those of us who enjoy the intersection of machine learning and natural language, including “deep learning”, which is all the rage, here is an interesting paper on generalizing vector space models of words to broader semantics of English by Jayant Krishnamurthy, a PhD student of Tom Mitchell at Carnegie Mellon University:

Essentially, the paper demonstrates how the features of high-precision lexicalized grammars allow machines to learn the compositional semantics of English.  More specifically, the paper demonstrates learning of compositional semantics beyond the capabilities of recurrent neural networks (RNN).  In summary, the paper suggests that deep parsing is better than deep learning for understanding the meaning of natural language.

For more information and a different perspective, I recommend the following paper, too:

Note that the authors use Combinatory Categorial Grammar (CCG) while our work uses head-driven phrase structure grammar (HPSG), but this is a minor distinction.  For example, compare the logical forms in the Groningen Meaning Bank with the logic produced by the Linguist.  The former uses CCG to produce lambda calculus while the latter uses HPSG to produce predicate calculus (ignoring vagaries of under-specified representation which are useful for hypothetical reasoning and textual entailment).

Higher Education on a Flatter Earth

We’re collaborating on some educational work and came across this sentence in a textbook on finance and accounting:

  • All of these are potentially good economic decisions.

We use statistical NLP but assist with the ambiguities.  In doing this, we relate questions and answers and explanations to the text.

We also extract the terminology and produce a rich lexicalized ontology of the subject matter for pedagogical uses, assessment, and adaptive learning.

Here’s one that just struck me as interesting.  This is a case where the choice looks like it won’t matter much either way, but …

Continue reading “Higher Education on a Flatter Earth”

Knowledge acquisition using lexical and semantic ontology

In developing a compliance application based on the institutional review board policies of Johns Hopkins’ Dept. of Medicine, we have to clarify the following sentence:

  • Projects involving drugs or medical devices other than the use of an approved drug or medical device in the course of medical practice and projects whose data will be submitted to or held for inspection by the FDA will not be exempt from JHM IRB review UNLESS that use falls within the Emergency Use provisions of 21 CFR 56.102 (d).

As you can see, there are a number of compound words and acronyms, as well as references to the Code of Federal Regulations that need to be defined or recognized to understand this sentence.

Continue reading “Knowledge acquisition using lexical and semantic ontology”

Pedagogical applications of proofs of answers to questions

In Vulcan’s Project Halo, we developed means of extracting the structure of logical proofs that answer advanced placement (AP) questions in biology.  For example, the following shows a proof that separation of chromatids occurs during prophase.

textual explanation of entailment using the Linguist and SILK

This explanation was generated using capabilities of SILK built on those described in A SILK Graphical UI for Defeasible Reasoning, with a Biology Causal Process Example.  That paper gives more details on how the proof structures of questions answered in Project Sherlock are available for enhancing the suggested questions of Inquire (which is described in this post, which includes further references).  SILK justifications are produced using a number of higher-order axioms expressed using Flora’s higher-order logic syntax, HiLog.  These meta rules determine which logical axioms can or do result in a literal.  (A literal is a positive or negative atomic formula, such as a fact, which can be true, false, or unknown.  Something is unknown if it is not proven as true or false.  For more details, you can read about the well-founded semantics, which is supported by XSB. Flora is implemented in XSB.)
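For readers who want to see how a literal ends up true, false, or unknown, here is a toy sketch of the well-founded semantics for a small propositional program, computed with the alternating-fixpoint construction.  It is only an illustration: it is not how XSB, Flora, or SILK evaluate rules, and the rule encoding, function names, and sample program are assumptions made up for this example.

```python
from typing import Dict, List, Set, Tuple

# A rule is (head, positive body atoms, negated body atoms).
Rule = Tuple[str, List[str], List[str]]


def gamma(rules: List[Rule], assumed: Set[str]) -> Set[str]:
    """Least model of the reduct of `rules` with respect to `assumed`.

    Rules containing `not a` with a in `assumed` are dropped; the
    remaining negated atoms are ignored; then we forward-chain.
    """
    reduct = [(head, pos) for head, pos, neg in rules
              if not any(atom in assumed for atom in neg)]
    derived: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, pos in reduct:
            if head not in derived and all(p in derived for p in pos):
                derived.add(head)
                changed = True
    return derived


def well_founded(rules: List[Rule]) -> Dict[str, str]:
    """Label every atom as 'true', 'false', or 'unknown'."""
    atoms = {h for h, _, _ in rules} | {a for _, pos, neg in rules
                                        for a in pos + neg}
    true: Set[str] = set()
    while True:
        possible = gamma(rules, true)      # overestimate of what may hold
        new_true = gamma(rules, possible)  # underestimate of what must hold
        if new_true == true:
            break
        true = new_true
    return {a: "true" if a in true else
               "unknown" if a in possible else "false"
            for a in atoms}


# p :- not q.   q :- not p.   r.   s :- r, not t.
program: List[Rule] = [
    ("p", [], ["q"]),
    ("q", [], ["p"]),
    ("r", [], []),
    ("s", ["r"], ["t"]),
]
model = well_founded(program)
```

In this program r and s come out true, t comes out false because nothing can derive it, and p and q stay unknown because each depends only on the negation of the other.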

Now how does all this relate to pedagogy in future derivatives of electronic learning software or textbooks, such as Inquire?

Well, here’s a use case:

Continue reading “Pedagogical applications of proofs of answers to questions”