Entailment-driven Extracting and Editing for Conversational Machine Reading

When I wrote "Are Vitamins Subject to Sales Tax?", I was addressing the process of translating knowledge expressed in formal documents, such as laws, regulations, and contracts, into logic suitable for inference using the Linguist.

Recently, Luke Zettlemoyer, one of my favorite researchers working in natural language processing and reasoning, was among the authors of Entailment-driven Extracting and Editing for Conversational Machine Reading.  This is a very nice turn toward knowledge extraction and inference that improves on superficial reasoning by textual entailment (RTE).

I recommend this paper, which relates to BERT, currently among my favorites in deep learning for NL/QA.  Here is an image from the paper, FYI:

[Figure from the paper: Entailment-driven Extracting and Editing for Conversational Machine Reading]

Natural Intelligence

Deep natural language understanding (NLU) is different from deep learning, as is deep reasoning.  Deep learning facilitates deep NLP and will facilitate deeper reasoning, but it’s deep NLP for knowledge acquisition and question answering that seems most critical for general AI.  If that’s the case, we might call such general AI, “natural intelligence”.

Deep learning on its own delivers only the most shallow reasoning and embarrasses itself due to its lack of “common sense” (or any knowledge at all, for that matter!).  DARPA, the Allen Institute, and deep learning experts have come to their senses about the limits of deep learning with regard to general AI.

General artificial intelligence requires all of it: deep natural language understanding[1], deep learning, and deep reasoning.  The deep aspects are critical but no more so than knowledge (including “common sense”).[2] Continue reading “Natural Intelligence”

Common sense about deep learning

I regularly build deep learning models for natural language processing, and today I tried one that has been the leader on the Stanford Question Answering Dataset (SQuAD).  It is an impressive NLP platform built using PyTorch.  But it’s still missing the big picture (i.e., it doesn’t “know” much).
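As a concrete illustration of what such systems do, here is a minimal sketch of SQuAD-style extractive question answering using the Hugging Face transformers pipeline; the checkpoint name and example text are my assumptions, not the platform discussed above:

```python
from transformers import pipeline

# Hypothetical choice of checkpoint; any SQuAD-tuned extractive QA
# model behaves comparably.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = ("The Federal Reserve raised interest rates in December. "
           "Officials noted the need for caution in the minutes.")

result = qa(question="When did the Federal Reserve raise rates?",
            context=context)

# The model extracts a span from the given text; it does not "know"
# anything beyond that text.
print(result["answer"], round(result["score"], 3))
```

Such a system can locate an answer span when the wording is close to the question, which is exactly the simple-factoid competence discussed below; it has no model of the world behind the text.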

Generally, NLP systems that emphasize Big Data (e.g., deep learning approaches) but eschew more explicit knowledge representation and reasoning are interesting but unintelligent.  Think Siri and Alexa, for example.  They might get a simple factoid question right if a Google search can find closely related text, but not much more.

Here is a simple demonstration of problems that the state of the art in deep machine learning is far from solving…

Here is a paragraph from a Wall Street Journal article about the Fed today, in which the deep learning system has “found” what the pronouns “this” and “they” reference:

The essential point here is that the deep learning system is missing common sense.  It is “the need”, not “a raise”, that is referenced by “this”.  And “they” references “officials”, not “the minutes”.
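To reproduce this kind of experiment, here is a hedged sketch using AllenNLP’s public coreference predictor; the model URL may have moved, and the sentence is a paraphrase of mine, since the original WSJ passage is not reproduced here:

```python
from allennlp.predictors.predictor import Predictor

# Public SpanBERT coreference model (URL assumed current).
predictor = Predictor.from_path(
    "https://storage.googleapis.com/allennlp-public-models/"
    "coref-spanbert-large-2021.03.10.tar.gz"
)

# A sentence of the same shape as the WSJ passage (paraphrased).
doc = ("Officials saw the need for a raise, but the minutes show "
       "they thought this could wait.")

result = predictor.predict(document=doc)

# Each cluster is a list of [start, end] token spans the model
# judges to be coreferent.
for cluster in result["clusters"]:
    print([result["document"][start:end + 1] for start, end in cluster])
```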

Bottom line: if you need your natural language understanding system to be smarter than this, you are not going to get there using deep learning alone.

Ending Back Pain with AI

A nationwide physical therapy business is renowned for eliminating chronic pain.  One unique aspect of this business is that it eliminates pain not by manipulation but by providing clients with expertly selected sequences of exercises that address problems in their functional anatomy.  In effect, the business helps people fix themselves and teaches them to maintain their musculoskeletal function for a pain-free life.

The business has dozens of clinics and many more therapists.  Its expertise comes primarily from its founder and a number of long-time employees who have learned through experience.  We have been engaged on several occasions to assist with several of its challenges.

Continue reading “Ending Back Pain with AI”

It’s hard to reckon nice English

The title is in tribute to Raj Reddy's classic talk about how it's hard to wreck a nice beach.

I came across interesting work on higher-order and semantic dependency parsing today:

So I gave the software a try on the sentence I discussed in this post.  The results, discussed below, were somewhat disappointing but not unexpected.  I then tried a well-known parser, with similar results (also shown below).

There is no surprise here.  Both parsers are marvels of machine learning and natural language processing technology.  It’s just that understanding is far beyond the ken of even the best NLP.  This may be obvious to some, but many are surprised given all the recent hype about Google, Watson, artificial intelligence, and “deep learning”.
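For readers who want to experiment, here is a minimal sketch of obtaining a dependency parse with spaCy; this is my choice of parser for illustration, not one of the systems tested above, and it assumes the en_core_web_sm model is installed:

```python
import spacy

# Small pretrained English pipeline (install with:
#   python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

doc = nlp("It's hard to reckon nice English.")

# Print each token with its dependency label and syntactic head.
for token in doc:
    print(f"{token.text:10} {token.dep_:10} head={token.head.text}")
```

The labels will look plausible, but as argued above, a syntactically correct parse is not the same thing as understanding.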

Continue reading “It’s hard to reckon nice English”

Smart Machines And What They Can Still Learn From People – Gary Marcus

This is a must-watch video from the Allen Institute for AI for anyone seriously interested in artificial intelligence.  It’s 70 minutes long, but worth it.  Some of the highlights from my perspective are:

  • 27:27, where the key reason that deep learning approaches fail at understanding language is discussed
  • 31:30, where the inability of inductive approaches to address logical quantification or variables is discussed
  • 39:30 through 42:00+, where the inability of deep learning to perform as well as Watson, and the inability of Watson to understand or reason, are discussed

The astute viewer and blog reader will recognize this slide as discussed by Oren Etzioni here.

Deep Learning, Big Data and Common Sense

Thanks to John Sowa's comment on LinkedIn for this link, which, although slightly dated, contains the following:

In August, I had the chance to speak with Peter Norvig, Director of Google Research, and asked him if he thought that techniques like deep learning could ever solve complicated tasks that are more characteristic of human intelligence, like understanding stories, which is something Norvig used to work on in the nineteen-eighties. Back then, Norvig had written a brilliant review of the previous work on getting machines to understand stories, and fully endorsed an approach that built on classical “symbol-manipulation” techniques. Norvig’s group is now working with Hinton, and Norvig is clearly very interested in seeing what Hinton could come up with. But even Norvig didn’t see how you could build a machine that could understand stories using deep learning alone.

Other quotes along the same lines come from Oren Etzioni in Looking to the Future of Data Science:

  1. But in his keynote speech on Monday, Oren Etzioni, a prominent computer scientist and chief executive of the recently created Allen Institute for Artificial Intelligence, delivered a call to arms to the assembled data mavens. Don’t be overly influenced, Mr. Etzioni warned, by the “big data tidal wave,” with its emphasis on mining large data sets for correlations, inferences and predictions. The big data approach, he said during his talk and in an interview later, is brimming with short-term commercial opportunity, but he said scientists should set their sights further. “It might be fine if you want to target ads and generate product recommendations,” he said, “but it’s not common sense knowledge.”
  2. The “big” in big data tends to get all the attention, Mr. Etzioni said, but thorny problems often reside in a seemingly simple sentence or two. He showed the sentence: “The large ball crashed right through the table because it was made of Styrofoam.” He asked, What was made of Styrofoam? The large ball? Or the table? The table, humans will invariably answer. But the question is a conundrum for a software program, Mr. Etzioni explained.
  3. Instead, at the Allen Institute, financed by Microsoft co-founder Paul Allen, Mr. Etzioni is leading a growing team of 30 researchers that is working on systems that move from data to knowledge to theories, and then can reason. The test, he said, is: “Does it combine things it knows to draw conclusions?” This is the step from correlation, probabilities and prediction to a computer system that can understand.
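To make Mr. Etzioni’s Styrofoam sentence concrete, here is a hedged sketch that asks a textual-entailment (NLI) model which reading it prefers; the roberta-large-mnli checkpoint is my assumption, and scores will vary by model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # hypothetical choice of NLI model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

premise = ("The large ball crashed right through the table "
           "because it was made of Styrofoam.")

for candidate in ("The table was made of Styrofoam.",
                  "The ball was made of Styrofoam."):
    inputs = tokenizer(premise, candidate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits.softmax(dim=-1).squeeze()
    # roberta-large-mnli label order: contradiction, neutral, entailment
    print(f"{candidate}  entailment={probs[2]:.2f}")
```

Whatever scores come back, the model is matching patterns over text rather than reasoning about the strength of Styrofoam, which is exactly Mr. Etzioni’s point.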

This is a significant statement from one of the best people in fact extraction on the planet!

As you know from elsewhere on this blog, I’ve been involved with the precursor to the AIAI (Vulcan’s Project Halo) and am a fan of Watson.  But Watson is the best example of what Big Data, Deep Learning, fact extraction, and textual entailment aren’t even close to:

  • During a Final Jeopardy! segment that included the “U.S. Cities” category, the clue was: “Its largest airport was named for a World War II hero; its second-largest, for a World War II battle.”
  • Watson responded “What is Toronto???,” while contestants Jennings and Rutter correctly answered Chicago — for the city’s O’Hare and Midway airports.

Sure, you can rationalize these things and hope that someday the machine will not need reliable knowledge (or that it will induce enough information with enough certainty).  IBM does a lot of this (e.g., see the source of the quotes above).  That day may come, but it will happen a lot sooner with curated knowledge.

Deep Parsing vs. Deep Learning

For those of us who enjoy the intersection of machine learning and natural language, including “deep learning”, which is all the rage, here is an interesting paper on generalizing vector space models of words to the broader semantics of English, by Jayant Krishnamurthy, a PhD student of Tom Mitchell at Carnegie Mellon University:

Essentially, the paper demonstrates how the features of high-precision lexicalized grammars allow machines to learn the compositional semantics of English.  More specifically, the paper demonstrates learning of compositional semantics beyond the capabilities of recurrent neural networks (RNN).  In summary, the paper suggests that deep parsing is better than deep learning for understanding the meaning of natural language.
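To see why bag-of-vectors approaches struggle with compositional semantics, consider this toy sketch; the three-dimensional vectors are invented purely for illustration:

```python
import numpy as np

# Toy word vectors (hypothetical 3-d embeddings, for illustration only).
vec = {
    "man":   np.array([0.9, 0.1, 0.0]),
    "bites": np.array([0.0, 0.8, 0.3]),
    "dog":   np.array([0.2, 0.1, 0.9]),
}

def additive_composition(words):
    """Compose a phrase vector by summing its word vectors."""
    return sum(vec[w] for w in words)

s1 = additive_composition(["man", "bites", "dog"])
s2 = additive_composition(["dog", "bites", "man"])

# Additive composition is order-blind: both sentences get the same
# vector, so the model cannot distinguish who bit whom.
print(np.allclose(s1, s2))  # True
```

RNNs are order-sensitive and do better than simple addition, but the paper’s point is that grammar-driven composition captures structure that purely inductive models miss.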

For more information and a different perspective, I recommend the following paper, too:

Note that the authors use Combinatory Categorial Grammar (CCG) while our work uses Head-driven Phrase Structure Grammar (HPSG), but this is a minor distinction.  For example, compare the logical forms in the Groningen Meaning Bank with the logic produced by the Linguist.  The former uses CCG to produce lambda calculus while the latter uses HPSG to produce predicate calculus (ignoring vagaries of underspecified representation, which are useful for hypothetical reasoning and textual entailment).
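To make the distinction concrete, here is a minimal sketch using NLTK’s logic package, showing a lambda-calculus meaning (the style a CCG pipeline produces) beta-reducing to a plain predicate-calculus formula (the style an HPSG pipeline such as the Linguist’s emits); the toy predicates are mine:

```python
from nltk.sem.logic import Expression

read = Expression.fromstring

# Lambda-calculus meaning for the determiner "every", as a CCG
# pipeline might assign it:
every = read(r'\P Q.all x.(P(x) -> Q(x))')

# Apply it to "ball" and "bounces", then beta-reduce to obtain the
# predicate-calculus form:
sentence = every(read(r'\x.ball(x)'))(read(r'\x.bounces(x)')).simplify()

print(sentence)  # all x.(ball(x) -> bounces(x))
```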