Commercial Intelligence

Ending Back Pain with AI

A nationwide physical therapy business is renowned for eliminating chronic pain.  One unique aspect of this business is that it eliminates pain not by manipulation but by providing clients with expertly selected sequences of exercises that address problems in their functional anatomy.  In effect, the business helps people fix themselves and teaches them to maintain their musculoskeletal function for a pain-free life.

The business has dozens of clinics with many more therapists.  Its expertise comes primarily from its founder and a number of long-time employees who have learned through experience.  We have been engaged to assist with several challenges on several occasions.

Combinatorial ambiguity? No problem!

While translating some legal documentation (sales and use tax laws and regulations) into compliance logic, we came across the following sentence (and many more that are even worse):

  • Any transfer of title or possession, exchange, or barter, conditional or otherwise, in any manner or by any means whatsoever, of tangible personal property for a consideration.

Natural language processing systems choke on sentences like this because of their combinatorial ambiguity and because NLP typically lacks knowledge about what can be conjoined with, complement, or modify what.

This sentence has many thousands of possible parses.  They differ in the scope of each of the ‘or’s, in what is modified by “conditional”, “otherwise”, or “whatsoever”, and in what is complemented by “in”, “by”, “of”, and “for”.
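
To get a feel for how quickly such ambiguity compounds, here is a rough sketch.  It assumes, purely for illustration, that a run of constituents can be bracketed freely into a binary tree, so the count follows the Catalan numbers.  A real grammar rules out some of these bracketings and adds other kinds of ambiguity, so this is not the count any particular parser would report, and the function names are ours.

```python
from math import comb


def catalan(n: int) -> int:
    """Return the n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)


def bracketings(constituents: int) -> int:
    """Count binary bracketings of a sequence of constituents.

    Illustrative model only: every bracketing is treated as a possible parse.
    """
    return catalan(constituents - 1)


# The count grows from 1 to several thousand by ten constituents.
for k in range(2, 11):
    print(k, bracketings(k))
```

Even under this simplified model, ten freely attachable pieces already yield thousands of candidate structures, which is why ranking parses and vetoing mistakes matter more than enumerating everything.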

The following shows 2 parses remaining after we veto a number of mistakes and confirm some phrases from the 400 highest ranking parses (a few right or left clicks of the mouse):

Simple, Fast, Effective, Active Learning

Recently, we “read” ten thousand recipes or so from a cooking web site.  The purpose of doing so was to produce a formal representation of those recipes for use in temporal reasoning by a robot.

Our task was to produce an ontology by reading the recipes, subject to conflicting goals.  On the one hand, the ontology was to be accurate so that the robot could reason, plan, and answer questions robustly.  On the other hand, the ontology was to be produced automatically (with minimal human effort).[1]

In order to minimize human effort while still obtaining deep parses from which we produce ontology, we used more techniques from statistical natural language processing than we typically do in knowledge acquisition for deep QA, compliance, or policy automation.  (Consider that NLP typically achieves less than 90% syntactic accuracy while such work demands near 100% semantic accuracy.)[2]

In the effort, we refined some prior work on representing words as vectors and semi-supervised learning.  In particular, we adapted semi-supervised, active learning similar to Stratos & Collins 2015 using enhancements to the canonical correlation analysis (CCA) of Dhillon et al 2015 to obtain accurate part of speech tagging, as conveyed in the following graphic from Stratos & Collins:

Of Kalman Filters and Hidden Markov Models

This provides some background relating to some work we did on part of speech tagging for a modest, domain-specific corpus.  The path is from Hsu et al 2012, which discusses spectral methods based on singular value decomposition (SVD) as a better method for learning hidden Markov models (HMM) and the use of word vectors instead of clustering to improve aspects of NLP, such as part of speech tagging.

The use of vector representations of words for machine learning in natural language is not all that new.  A quarter century ago, Brown et al 1992 introduced hierarchical agglomerative clustering of words based on mutual information and the use of those clusters within hidden Markov language models.  One notable difference versus today’s word vectors is that paths through the hierarchy of clusters to words at the leaves correspond to vectors of bits (i.e., Boolean features) rather than real-valued features.
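
The bit-vector reading of a cluster hierarchy can be sketched as follows.  This does not implement the mutual-information clustering of Brown et al; it only assumes a small, hand-built binary tree (the words and its shape are hypothetical) and shows how the path to each leaf becomes a string of Boolean features.

```python
# A toy binary hierarchy of word clusters; the words and shape are hypothetical.
toy_hierarchy = ((("monday", "tuesday"), "june"), ("run", "walk"))


def bit_paths(tree, prefix=""):
    """Map each leaf word to the string of 0/1 branch choices leading to it."""
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    paths = bit_paths(left, prefix + "0")
    paths.update(bit_paths(right, prefix + "1"))
    return paths


def shared_prefix(a, b):
    """Return the leading bits two paths have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


paths = bit_paths(toy_hierarchy)
for word, bits in paths.items():
    print(word, bits)
```

Words that share a long prefix sit in the same small cluster, so prefixes of different lengths can serve as coarse or fine features, whereas real-valued word vectors express similarity by distance instead.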

Artificially Intelligent Physical Therapy

Over the last 25 years we have developed two generations of AI systems for physical therapy.  The first was before the emergence of the Internet, when Windows was 16 bits.  There were no digital cameras, either.  So, physical therapists would take Polaroid pictures (anterior and left and right lateral views) or simply eyeball a patient and enter their posture into the computer using a diagram like the following:

Impressive result from Google

This is pretty impressive work by Google!

They are seeing the objective behind the query.  It’s pretty simple, in theory, to see the verb “read” operating on the object “string” with the source (i.e., “from”) being consistent with an input stream (also handling the concatenated compound).

More impressive is that they have learned, from such queries and from the content that people view after them (perhaps even more deeply), that character streams, scanners, and stream APIs are relevant.

And they have also narrowed my results based on how often I look at Java versus other implementation languages.

Robust Inference and Slacker Semantics

In preparing for some natural language generation[1], I came across some work on natural logic[2][3] and reasoning by textual entailment[4] (RTE) by Richard Bergmair in his PhD at Cambridge:

The work he describes overlaps our approach to robust inference from the deep, variable-precision semantics that result from linguistic analysis and disambiguation using the English Resource Grammar (ERG) and the Linguist™.

It’s hard to reckon nice English

The title is in tribute to Raj Reddy’s classic talk about how it’s hard to wreck a nice beach.

I came across interesting work on higher order and semantic dependency parsing today:

So I gave the software a try for the sentence I discussed in this post.  The results discussed below were somewhat disappointing but not unexpected.  So I tried a well-known parser with similar results (also shown below).

There is no surprise here.  Both parsers are marvels of machine learning and natural language processing technology.  It’s just that understanding is far beyond the ken of even the best NLP.  This may be obvious to some, but many are surprised given all the hype about Google and Watson and artificial intelligence or “deep learning” recently.

Properly disambiguating a sentence using the Linguist™

Consider the following disambiguation result from a user of Automata’s Linguist™.

A pedagogical model by any other name

Educational technology for personalized or adaptive learning is based on various types of pedagogical models.

Google provides the following info-box and link concerning pedagogical models:

Many companies have different names for their models, including:

  1. Knewton: Knowledge Graph
    (partially described in How do Knewton recommendations work in Pearson MyLab)
  2. Khan Academy: Knowledge Map
  3. Declara: Cognitive Graph

The term “knowledge graph” is particularly common, including Google’s.

There are some serious problems with these names and some varying limitations in their pedagogical efficacy.  There is nothing cognitive about Declara’s graph, for example.  Like the others it may organize “concepts” (i.e., nodes) by certain relationships (i.e., links) that pertain to learning, but none of these graphs purports to represent knowledge other than superficially for limited purposes.

  • Each of Google’s, Knewton’s, and Declara’s graphs is far from sufficient to represent the knowledge and cognitive skills expected of a masterful student.
  • Each of them is also far from sufficient to represent the knowledge of learning objectives and instructional techniques expected of a proficient instructor.

Nonetheless, such graphs are critical to active e-learning technology, even if they fall short of our ambitious hope to dramatically improve learning and learning outcomes.

The most critical ingredients of these so-called “knowledge” or “cognitive” graphs include the following:

  1. learning objectives
  2. educational resources, including instructional and formative or summative assessment items
  3. relationships between educational resources and learning objectives (i.e., what they instruct and/or assess)
  4. relationships between learning objectives (e.g., dependencies such as prerequisites)
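
As a minimal sketch of how these four ingredients might be held together, consider the following.  The class and method names, the example resource, and the two prerequisite edges are assumptions for illustration, not the schema of Knewton, Khan Academy, Declara, or any other product.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class LearningObjective:
    label: str


@dataclass(frozen=True)
class Resource:
    title: str
    kind: str  # "instruction" or "assessment"


@dataclass
class PedagogicalGraph:
    # resource -> objectives it instructs or assesses (ingredient 3)
    alignment: dict = field(default_factory=dict)
    # objective -> objectives it depends on (ingredient 4)
    prerequisites: dict = field(default_factory=dict)

    def align(self, resource, objective):
        self.alignment.setdefault(resource, set()).add(objective)

    def require(self, objective, prerequisite):
        self.prerequisites.setdefault(objective, set()).add(prerequisite)

    def study_order(self):
        """Return objective labels with every prerequisite before its dependents."""
        order = TopologicalSorter(self.prerequisites).static_order()
        return [objective.label for objective in order]


reading = LearningObjective("reading & writing whole numbers")
adding = LearningObjective("adding & subtracting whole numbers")
multiplying = LearningObjective("multiplying whole numbers")

graph = PedagogicalGraph()
graph.require(adding, reading)      # hypothetical edge
graph.require(multiplying, adding)  # hypothetical edge
graph.align(Resource("Place-value quiz", "assessment"), reading)  # hypothetical

print(graph.study_order())
```

Note that nothing in this structure says what a “whole number” is or what “adding” means.  The labels are opaque strings, which is exactly the limitation at issue.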

The following user interface supports curation of the alignment of educational resources and learning objectives, for example:

And the following supports curation of the dependencies between learning objectives (as in a prerequisite graph):

Here is a presentation of similar dependencies from Khan Academy:

And here is a depiction of such dependencies in Pearson’s use of Knewton within MyLab (cited above):

Of course there is much more that belongs in a pedagogical model, but let’s look at the fundamentals and their limitations before diving too deeply.

Prerequisite Graphs

The link on Knewton recommendations cited above includes a graphic showing some of the learning objectives and their dependencies concerning arithmetic.  The labels of these learning objectives include:

  • reading & writing whole numbers
  • adding & subtracting whole numbers
  • multiplying whole numbers
  • comparing & rounding whole numbers
  • dividing whole numbers

And more:

  • basics of fractions
  • exponents & roots
  • basics of mixed numbers
  • factors of whole numbers


  • There is nothing in the “knowledge graph” that represents the semantics (i.e., meaning) of “number”, “fraction”, “exponent”, or “root”.
  • There is nothing in the “knowledge graph” that represents what distinguishes whole from mixed numbers (or even that fractions are numbers).
  • There is nothing in the “knowledge graph” that represents what it means to “read”, “write”, “multiply”, “compare”, “round”, or “divide”.

Graphs, Ontology, Logic, and Knowledge

Because systems with knowledge or cognitive graphs lack such representation, they suffer from several problems, including the following, which are of immediate concern:

  1. dependencies between learning objectives must be explicitly established by people, thereby increasing the time, effort, and cost of developing active learning solutions, or
  2. learning objectives that are not explicitly dependent may become dependent as evidence indicates, which requires exponentially increasing data as the number of learning objectives increases, thereby offering low initial and asymptotic efficacy versus more intelligent and knowledge-based approaches

For example, more advanced semantic technology standards (e.g., OWL and/or SBVR or defeasible modal logic) can represent that digits are integers are numbers and an integer divided by another is a fraction.  Consequently, a knowledge-based system can infer that learning objectives involving fractions depend on some learning objectives involving integers.  Such deductions can inform machine learning such that better dependencies are induced (initially and asymptotically) and can increase the return on investment of human intelligence in a cognitive computing approach to pedagogical modeling.
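
The kind of deduction described above can be sketched in a few lines.  Plain Python dictionaries stand in here for OWL or SBVR, the tagging of labels with concepts is done by hand, and all names are hypothetical.  The output is a set of candidate dependencies to inform people and machine learning, not a confirmed prerequisite graph.

```python
# Hypothetical mini-ontology in plain Python (a stand-in for OWL or SBVR).
IS_A = {
    "digit": "integer",
    "whole number": "integer",
    "integer": "number",
    "fraction": "number",
}
# "an integer divided by another is a fraction"
DEFINED_USING = {"fraction": {"integer"}}

# Hand-tagged concepts for three of the labels; the tagging is an assumption.
OBJECTIVES = {
    "multiplying whole numbers": {"whole number"},
    "dividing whole numbers": {"whole number"},
    "basics of fractions": {"fraction"},
}


def kinds(concept):
    """Return the concept together with everything it is a kind of."""
    found = {concept}
    while concept in IS_A:
        concept = IS_A[concept]
        found.add(concept)
    return found


def candidate_dependencies(objectives):
    """Suggest (dependent, prerequisite) pairs implied by concept definitions."""
    pairs = set()
    for dependent, concepts in objectives.items():
        needed = set().union(*(DEFINED_USING.get(c, set()) for c in concepts))
        for other, other_concepts in objectives.items():
            covered = set().union(*(kinds(c) for c in other_concepts))
            if other != dependent and needed & covered:
                pairs.add((dependent, other))
    return pairs


for dependent, prerequisite in sorted(candidate_dependencies(OBJECTIVES)):
    print(f"{dependent!r} may depend on {prerequisite!r}")
```

Because whole numbers are integers and fractions are defined using integers, the sketch proposes that the fractions objective depends on the whole-number objectives without anyone drawing those edges by hand.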

As another example, consider that adaptive educational technology either knows or does not know that multiplying one integer by another is equivalent to computing the sum of one of them taken the other number of times.  Similarly, such technology either knows or does not know how multiplication and division are related.  How effectively can it improve learning if it does not know?  How much more work is required to get such systems to achieve acceptable efficacy without such knowledge?  Would you want your child instructed by someone who was incapable of understanding and applying such knowledge?

Semantics of Concepts, Cognitive Skills, and Learning Objectives

Consider again the labels of the nodes in Knewton/Pearson’s prerequisite graph listed above.  Notice that:

  • the first group of labels are all sentences while the second group are all noun phrases
  • the first group (of sentences) are cognitive skills more than they are learning objectives
    • i.e., they don’t specify a degree of proficiency, although one may be implicit with regard to the educational resources aligned with those sentences
  • the second group (of noun phrases) refer to concepts (or, implicitly, sentences that begin with “understanding”)
  • the labels in the second group that begin with “basics” are unclear as learning objectives or as references to concepts

For adaptive educational technology that does not “know” what these labels mean or anything about the meanings of the words that occur in them, the issues noted above may not seem important, but they clearly limit the utility and efficacy of such graphs.

Taking a cognitive computing approach, human intelligence helps artificial intelligence understand these sentences and phrases deeply and precisely.  A cognitive computing approach also results in artificial intelligence that deeply and precisely understands many additional sentences of knowledge that don’t fit into such graphs.

For example, the system comes to know that reading and writing whole numbers is a conjunction of finer grained learning objectives and that, in general, reading is a prerequisite to writing.  It comes to know that whole numbers are non-negative integers which are typically positive.  It comes to know that subtraction is the inverse of addition (which implies some dependency relationship between addition and subtraction).  In order to understand exponents, the system is told and learns about raising numbers to powers and about what it means to square a number.  The system is told and learns about roots and how they relate to exponents and powers, including how square roots relate to squaring numbers.  The system is told that a mixed number is an integer and proper fraction corresponding to an improper fraction.
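
A few of these statements are simple enough to write down as executable checks.  The sketch below uses Python’s standard fractions and math modules.  The function names are ours, and it only illustrates the arithmetic facts themselves, not how a knowledge-based system would represent or learn them.

```python
from fractions import Fraction
from math import isqrt


def mixed_to_improper(whole, part):
    """Combine a non-negative integer and a proper fraction into one fraction."""
    if whole < 0 or not (0 <= part < 1):
        raise ValueError("expected a non-negative integer and a proper fraction")
    return whole + part


def improper_to_mixed(value):
    """Split a non-negative fraction into its integer and proper-fraction parts."""
    whole, remainder = divmod(value.numerator, value.denominator)
    return whole, Fraction(remainder, value.denominator)


# A mixed number corresponds to an improper fraction, and back again.
seven_thirds = mixed_to_improper(2, Fraction(1, 3))
print(seven_thirds, improper_to_mixed(seven_thirds))

# Subtraction is the inverse of addition.
a, b = 12, 5
print((a + b) - b == a)

# Taking a square root undoes squaring (for non-negative integers).
print(isqrt(7 ** 2) == 7)
```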

Adaptive educational technology either understands such things or it does not.  If it does not, human beings will have to work much harder to achieve a system with a given level of efficacy and subsequent machine learning will take a longer time to reach a lower asymptote of efficacy.