Parsing Winograd Challenges

The Winograd Challenge is an alternative to the Turing Test for assessing artificial intelligence.  At its core, the test involves resolving pronouns.  To date, systems have not fared well on it, for several reasons.  Three come to mind:

  1. The natural language processing involved in the word problems is beyond the state of the art.
  2. Resolving many of the pronouns requires more common sense knowledge than state of the art systems possess.
  3. Resolving many of the problems requires pragmatic reasoning beyond the state of the art.

As an example, one of the simpler exemplary problems is:

  • There is a pillar between me and the stage, and I can’t see around it.

A heuristic system (or a deep learning one) could infer that “it” does not refer to “me” or “I” and toss a coin between “pillar” and “stage”.  A system worthy of passing the Winograd Challenge should “know” it’s the pillar.
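The coin-toss heuristic just described can be sketched in a few lines of Python.  This is only an illustration of the shallow strategy, not any real system: the function names and the hand-supplied list of mentions are assumptions made for the example.

```python
import random

# First-person mentions that the neuter pronoun "it" cannot refer to.
FIRST_PERSON = {"i", "me", "my", "we", "us"}

def candidate_antecedents(mentions, pronoun):
    """Return the mentions that survive one shallow filter.

    The only rule applied: "it" cannot refer to a first-person mention.
    """
    if pronoun.lower() == "it":
        return [m for m in mentions if m.lower() not in FIRST_PERSON]
    return list(mentions)

def guess_antecedent(mentions, pronoun, rng=random):
    """Pick uniformly among the surviving candidates (the 'coin toss')."""
    return rng.choice(candidate_antecedents(mentions, pronoun))

# Mentions from the pillar sentence, supplied by hand for illustration.
mentions = ["pillar", "me", "stage", "I"]
print(candidate_antecedents(mentions, "it"))  # ['pillar', 'stage']
print(guess_antecedent(mentions, "it"))       # 'pillar' or 'stage', at random
```

Nothing in this sketch prefers “pillar” over “stage”; that preference has to come from knowledge about pillars, stages and seeing.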

Even this simple sentence presents some NLP challenges that are easy to overlook.  For example, does “between” modify the pillar or the verb “is”?

This is not much of a challenge, however, so let’s touch on some deeper issues and a more challenging problem…

The common sense knowledge or pragmatic inference involved in the pillar example might include:

  • pillars are typically opaque and tall but not too thin
  • a platform on which a spectacle is performed is a stage
  • wanting to see around something suggests that one is not interested in looking at it
  • one sees around something when one can see what is behind it without moving much
  • spectators want to be able to see the performances they attend
  • spectators do not typically want to see around the stage on which the spectacle they attend is performed  (or, more positively, spectators want to be able to see ‘the’ stage or field on which the performance or sport they attend is performed or played)
  • the pillar is blocking my line of sight (i.e., my ability to see what I desire to see)
  • it is the stage that I cannot see
  • I am more interested in seeing the performance than the pillar
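To make the flavor of this concrete, here is a toy Python sketch that hard-codes a few of the facts above and uses them to pick the referent of “it”.  Everything in it (the fact sets, the function names, the single rule) is a hypothetical illustration; a real system would need far broader knowledge and real inference.

```python
# Hypothetical hand-written facts standing in for common sense knowledge.
OPAQUE = {"pillar"}                    # pillars are typically opaque
DESIRED_VIEW = {"stage"}               # spectators want to see the stage
BETWEEN = {("pillar", "me", "stage")}  # (obstacle, viewer, target)

def blocks_view(obstacle, viewer, target):
    """An opaque thing located between viewer and target blocks the line of sight."""
    return obstacle in OPAQUE and (obstacle, viewer, target) in BETWEEN

def resolve_see_around(candidates, viewer):
    """Resolve "it" in "<viewer> can't see around it".

    One sees *around* an obstacle in order to see something else, so keep
    only candidates that block the viewer's line of sight to something the
    viewer wants to see.
    """
    return [
        c for c in candidates
        if any(blocks_view(c, viewer, t) for t in DESIRED_VIEW if t != c)
    ]

print(resolve_see_around(["pillar", "stage"], "me"))  # ['pillar']
```

The rule is trivial once the facts are written down; the hard part is obtaining facts like these from the text and from background knowledge in the first place.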

Well, that’s enough of that, but we could go on, of course.  What motivated this note was a challenge in the natural language processing itself, more than in the knowledge or reasoning required.

Here is one of the exemplary problems:

  • Tatyana knew that Grandma always enjoyed serving an abundance of food to her guests. Now Tatyana watched as Grandma gathered Tatyana’s small mother into a wide, scrawny embrace and then propelled her to the table, lifting her shawl from her shoulders, seating her in the place of honor, and saying simply: “There’s plenty.”

This turned out to be a challenge for one of the NLP systems we employ.  Indeed, it is probably beyond any parser you are likely to come across or have experience with.  By that, I mean to challenge you: post the parse that you obtain from your favorite parser!

Recently, we’ve added hundreds of thousands of names to the parsing system (mostly from Wikipedia), so we had no problem with Tatyana, but we forgot about Grandma!  Actually, we had removed many common proper nouns from the NLP system when we were parsing scientific, medical, recipe, legal and other such content.  Restoring them took only a few minutes, and doing so is critical for the Winograd Challenge, since the problems are quite colloquially phrased.

The use of “Now” at the beginning of the second sentence also required a tweak to the lexicon.  If you look closely, you will see that it introduces a declarative sentence where it would typically introduce an imperative or a verb phrase.  This is just another step towards making our deep parsing system more robust (a lot of which we accomplished when parsing recipes from social media websites for a recent client).

Examining the second sentence more closely, the clauses stand in interesting temporal relationships to one another (especially with regard to aspect).

Note that an appropriate parse of the second sentence should have the grandmother as the subject of a conjunction involving “gathered”, “propelled”, “lifting”, “seating”, and “saying”, as in:
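As a rough illustration only (this is not the output of our parser), the shared-subject structure can be written down as a small Python data structure.  The class names, the `form` labels and the `coordinate` helper are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Clause:
    verb: str
    subject: str
    form: str  # "finite" or "participle", labeled by hand here

@dataclass
class Coordination:
    shared_subject: str
    clauses: List[Clause]

def coordinate(subject: str, verbs: List[Tuple[str, str]]) -> Coordination:
    """Build a coordination in which every clause shares one subject."""
    return Coordination(subject, [Clause(v, subject, form) for v, form in verbs])

# The five verbs named in the text, in sentence order.
parse = coordinate("Grandma", [
    ("gathered", "finite"),
    ("propelled", "finite"),
    ("lifting", "participle"),
    ("seating", "participle"),
    ("saying", "participle"),
])

for clause in parse.clauses:
    print(f"{clause.subject} -> {clause.verb} ({clause.form})")
```

A parse that attaches “lifting”, “seating” or “saying” to Tatyana or to her mother instead would fail this check, which is one quick way to test the output of a parser you try.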

This turned out to require some improvements to our handling of quoted text and the use of iterative re-parsing as described in a prior post on handling combinatorial ambiguity.

Try it yourself and you’ll see not only how challenging the NLP is, but how much common sense knowledge and pragmatic reasoning is necessary just to parse the problem!