Commercial Intelligence


Iterative Disambiguation

In a prior post we showed how extraordinarily ambiguous, long sentences can be precisely interpreted. Here, by request, we take a simpler look.

Let’s take a sentence that has more than 10 parses and configure the software to disambiguate among no more than 10.

Once again, this is a trivial sentence to disambiguate in seconds without iterative parsing!

The immediate results might present:

Suppose the intent is not that the telescope is with my friend, so veto “telescope with my friend” with a right-click.
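The loop described above (show at most N candidates, take a veto or confirmation, show the next N) can be sketched in a few lines of Python. This is only an illustration of the idea, not the Linguist's implementation: the `Parse` class, the `disambiguate` function, the phrase strings other than “telescope with my friend”, and the scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parse:
    """A candidate parse: a ranking score plus the phrases it asserts (hypothetical model)."""
    score: float
    phrases: frozenset


def disambiguate(parses, limit, vetoed=(), confirmed=()):
    """Return up to `limit` top-scoring parses consistent with the user's feedback."""
    vetoed, confirmed = set(vetoed), set(confirmed)
    consistent = [
        p for p in parses
        if not (p.phrases & vetoed)   # drop parses containing any vetoed phrase
        and confirmed <= p.phrases    # keep only parses containing every confirmed phrase
    ]
    return sorted(consistent, key=lambda p: p.score, reverse=True)[:limit]


# Made-up candidates and scores, for illustration only.
parses = [
    Parse(0.9, frozenset({"telescope with my friend"})),
    Parse(0.8, frozenset({"saw with my friend"})),
    Parse(0.7, frozenset({"man with my friend"})),
]

first = disambiguate(parses, limit=2)
second = disambiguate(parses, limit=2, vetoed={"telescope with my friend"})
```

With a limit of 2, the first pass shows only the two highest-ranked candidates. Vetoing “telescope with my friend” removes the top one, and the third candidate, previously outside the window, moves into view. That is the sense in which the process is iterative.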


“Only full page color ads can run on the back cover of the New York Times Magazine.”

A decade or so ago, we were debating how to educate Paul Allen’s artificial intelligence in a meeting at Vulcan headquarters in Seattle with researchers from IBM, Cycorp, SRI, and other places.

We were talking about how to “engineer knowledge” from textbooks into formal systems like Cyc or Vulcan’s SILK inference engine (which we were developing at the time).   Although some progress had been made in prior years, the onus of acquiring knowledge using SRI’s Aura remained too high and the reasoning capabilities that resulted from Aura, which targeted University of Texas’ Knowledge Machine, were too limited to achieve Paul’s objective of a Digital Aristotle.  Unfortunately, this failure ultimately led to the end of Project Halo and the beginning of the Aristo project under Oren Etzioni’s leadership at the Allen Institute for Artificial Intelligence.

At that meeting, I brought up the idea of simply translating English into logic, as my former product called “Authorete” did.  (We renamed it before Haley Systems was acquired by Oracle, prior to the meeting.)


Dictionary Knowledge Acquisition

The following is motivated by Section 6359 of the California Sales and Use Tax.  It demonstrates how knowledge can be acquired from dictionary definitions:

Here, we’ve taken a definition from WordNet, prefixed it with the word being defined followed by a colon, and parsed the result using the Linguist.


‘believed by many’

A Linguist user recently had a question about part of a sentence that boiled down to something like the following:

  • It is believed by many.

The question was whether “many” was an adjective, cardinality, or noun in this sentence.  It’s a reasonable question!


Nominal semantics of ‘meaning’

Just a quick note about a natural language interpretation that came up for the following sentence:

  • Under that test, the rental to an oil well driller of a “rock bit” having an effective life of but one rental is a transaction in lieu of a transfer of title within the meaning of (a) of this section.

The NLP system comes up with many hundreds of plausible parses for this sentence (mostly because it’s considering lexical and syntactic possibilities that are not semantically plausible).  Among these is “meaning” as a nominalization.

From Wikipedia:

  • In linguistics, nominalization is the use of a word which is not a noun (e.g. a verb, an adjective or an adverb) as a noun, or as the head of a noun phrase, with or without morphological transformation.

It’s quite common to use the present participle of a verb as a noun.  In this case, Google comes up with this definition for the noun ‘meaning’:

  • what is meant by a word, text, concept, or action.

The NLP system has a definition of “meaning” as a mass or count noun as well as definitions for several senses of the verb “mean”, such as these:

  1. intend to convey, indicate, or refer to (a particular thing or notion); signify.
  2. intend (something) to occur or be the case.
  3. have as a consequence or result.


Combinatorial ambiguity? No problem!

While translating some legal documentation (sales and use tax laws and regulations) into compliance logic, we came across the following sentence (and many more that are even worse):

  • Any transfer of title or possession, exchange, or barter, conditional or otherwise, in any manner or by any means whatsoever, of tangible personal property for a consideration.

Natural language processing systems choke on sentences like this because of their combinatorial ambiguity and because NLP typically lacks knowledge about what can be conjoined with, complement, or modify what.

This sentence has many thousands of possible parses.  They differ in the scope of each of the ‘or’s, in what is modified by conditional, otherwise, or whatsoever, and in what is complemented by in, by, of, and for.
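To get a feel for how quickly such counts grow, here is a rough sketch using a standard counting result: with no knowledge of what can attach to what, the number of distinct binary bracketings of a run of n constituents is the Catalan number C(n-1). This is an illustration of combinatorial growth only, not the actual parse count for the sentence above nor how the parser enumerates or ranks parses; the function names are ours.

```python
from math import comb


def catalan(n: int) -> int:
    """Return the n-th Catalan number, C(n) = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def bracketings(items: int) -> int:
    """Count the distinct binary bracketings of `items` constituents in a row."""
    return catalan(items - 1)


# Growth of the unconstrained search space as constituents are added.
for items in (4, 6, 8, 10):
    print(items, bracketings(items))
```

Four constituents admit 5 bracketings, while ten admit 4,862. That is why a few vetoes and confirmations, each of which eliminates whole families of bracketings, go such a long way.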

The following shows 2 parses remaining after we veto a number of mistakes and confirm some phrases from the 400 highest ranking parses (a few right or left clicks of the mouse):


Higher Education on a Flatter Earth

We’re collaborating on some educational work and came across this sentence in a textbook on finance and accounting:

  • All of these are potentially good economic decisions.

We use statistical NLP but assist with the ambiguities.  In doing this, we relate questions and answers and explanations to the text.

We also extract the terminology and produce a rich lexicalized ontology of the subject matter for pedagogical uses, assessment, and adaptive learning.

Here’s one that just struck me as interesting.  This is a case where the choice looks like it won’t matter much either way, but …


Deep question answering: Watson vs. Aristotle

At the SemTech conference last week, a few companies asked me how to respond to IBM’s Watson given my involvement with rapid knowledge acquisition for deep question answering at Vulcan.  My answer varies with whether there is any subject matter focus, but essentially involves extending their approach with deeper knowledge and more emphasis on logical in addition to textual entailment.

Today, in a discussion on the LinkedIn NLP group, there was some interest in finding more technical details about Watson.  A year ago, IBM published the most technical details to date about Watson in the IBM Journal of Research and Development.  Most of those journal articles are available for free on the web.  For convenience, here are my bookmarks to them.