While translating legal documentation (sales and use tax laws and regulations) into compliance logic, we came across the following sentence (and many more that are even worse):
- Any transfer of title or possession, exchange, or barter, conditional or otherwise, in any manner or by any means whatsoever, of tangible personal property for a consideration.
Natural language processing systems choke on sentences like this because of their combinatorial ambiguity and because NLP systems typically lack knowledge about what can be conjoined with, complement, or modify what.
This sentence has many thousands of possible parses. They differ in the scope of each ‘or’, in what is modified by ‘conditional’, ‘otherwise’, or ‘whatsoever’, and in what is complemented by ‘in’, ‘by’, ‘of’, and ‘for’.
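To give a feel for how quickly the count grows, here is a minimal sketch that counts purely structural binary bracketings over a sequence of attachable units. It is only an illustration of the combinatorics, not the grammar or parser used here: real parsers prune many bracketings and add others through lexical ambiguity, and the function names are made up for this example.

```python
from functools import lru_cache
from math import comb


def catalan(n: int) -> int:
    """Closed form for the n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)


@lru_cache(maxsize=None)
def count_bracketings(n_items: int) -> int:
    """Count binary trees over n_items leaves by trying every split point."""
    if n_items <= 1:
        return 1
    return sum(
        count_bracketings(k) * count_bracketings(n_items - k)
        for k in range(1, n_items)
    )


if __name__ == "__main__":
    for n in (4, 8, 12):
        # The recursive count over n items equals Catalan(n - 1).
        print(n, count_bracketings(n), catalan(n - 1))
```

With only a dozen units that can attach to one another, the structural count alone is already in the tens of thousands, which is why ranking and interactive pruning matter.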
The following shows the two parses remaining after we veto a number of mistakes and confirm some phrases from the 400 highest-ranking parses (a few right or left clicks of the mouse):
After vetoing the selected parse (which leaves no viable parses among the 400 top-ranked parses), we reparse with what we learned above.
This gives us another 400 parses, and we again veto mistakes and confirm correct phrases (a few more right or left mouse clicks) until we get here:
After we reparse, we have another 400 top-ranked parses that satisfy the constraints accumulated from prior clicks of the mouse.
After we veto the above selection and continue confirming correct parts and vetoing incorrect parts, we find there are 66 syntactically equivalent parses, as shown here:
The remaining ambiguity comes from lexical ambiguity (e.g., what ‘means’ means) and semantic ambiguity, all of which can be resolved using the Linguist.
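The veto-and-confirm loop described above can be sketched as constraint filtering over a ranked list of parses. This is a simplified sketch under stated assumptions, not how the Linguist is implemented: each parse is modeled as a set of (start, end) token spans, and `satisfies` and `top_consistent` are hypothetical names introduced for this example.

```python
from itertools import islice
from typing import FrozenSet, Iterable, List, Set, Tuple

Span = Tuple[int, int]   # (start, end) token offsets of one constituent
Parse = FrozenSet[Span]  # assumption: a parse is just its set of spans


def satisfies(parse: Parse, confirmed: Set[Span], vetoed: Set[Span]) -> bool:
    """A parse survives if it has every confirmed span and no vetoed span."""
    return confirmed <= parse and not (vetoed & parse)


def top_consistent(
    ranked: Iterable[Parse],
    confirmed: Set[Span],
    vetoed: Set[Span],
    k: int = 400,
) -> List[Parse]:
    """Return the k best-ranked parses consistent with the clicks so far."""
    consistent = (p for p in ranked if satisfies(p, confirmed, vetoed))
    return list(islice(consistent, k))


# Toy data: the five binary bracketings of four tokens, best-ranked first.
ranked_parses: List[Parse] = [
    frozenset({(0, 4), (0, 2), (2, 4)}),
    frozenset({(0, 4), (1, 4), (2, 4)}),
    frozenset({(0, 4), (0, 3), (0, 2)}),
    frozenset({(0, 4), (0, 3), (1, 3)}),
    frozenset({(0, 4), (1, 4), (1, 3)}),
]

confirmed: Set[Span] = {(2, 4)}  # one click confirms a phrase
vetoed: Set[Span] = {(0, 2)}     # another click vetoes a mistake
remaining = top_consistent(ranked_parses, confirmed, vetoed)
```

Each click adds one span to `confirmed` or `vetoed`, so the constraints accumulate and every reparse only has to surface candidates that are still consistent with them.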