
Smart Machines And What They Can Still Learn From People – Gary Marcus

This is a must-watch video from the Allen Institute for AI for anyone seriously interested in artificial intelligence.  It’s 70 minutes long, but worth it.  Some of the highlights from my perspective are:

  • 27:27 where the key reason that deep learning approaches fail at understanding language is discussed
  • 31:30 where the inability of inductive approaches to address logical quantification or variables is discussed
  • 39:30 through 42+ where deep learning’s inability to perform as well as Watson, and Watson’s inability to understand or reason, are discussed

The astute viewer and blog reader will recognize this slide as discussed by Oren Etzioni here.

Thiel, Creativity, and Jobs

A colleague at knowmatters.com sent me an interview with Peter Thiel as I was reading about a battery-less transceiver, where I found a partial quote from Steve Jobs.  The original quote, which can be found at the Wall Street Journal, goes on to further agree with Mr. Thiel:

Creativity is just connecting things. When you ask creative people how they did something, they feel a little guilty because they didn’t really do it, they just saw something. It seemed obvious to them after a while. That’s because they were able to connect experiences they’ve had and synthesize new things. And the reason they were able to do that was that they’ve had more experiences or they have thought more about their experiences than other people. Unfortunately, that’s too rare a commodity. A lot of people in our industry haven’t had very diverse experiences. So they don’t have enough dots to connect, and they end up with very linear solutions without a broad perspective on the problem. The broader one’s understanding of the human experience, the better design we will have.

As it turns out, my colleagues at Knowmatters worked extensively with Jobs.  I only pursued the quote further because of the stories they have told.

The interview/article states:

Mr. Thiel spends much of his time agitating to change how we educate people and create economic and technological growth. In his book “Zero to One,” written with Blake Masters, Mr. Thiel argues that society has become too rule-oriented, and people need to devise ways to think differently, and find like-minded individuals to realize goals.

And quotes him:

We’ve built a country in which people are tracked, from kindergarten to graduate school, and everyone who is “successful” acts the same way. That is overrated. It distorts things and hurts growth.

This is the “one size fits all” approach to standardized education.  Personalized e-learning promises to disrupt this.  The resulting creativity and growth is aspirational for Knowmatters.

Deep Learning, Big Data and Common Sense

Thanks to John Sowa’s comment on LinkedIn for this link, which, although slightly dated, contains the following:

In August, I had the chance to speak with Peter Norvig, Director of Google Research, and asked him if he thought that techniques like deep learning could ever solve complicated tasks that are more characteristic of human intelligence, like understanding stories, which is something Norvig used to work on in the nineteen-eighties. Back then, Norvig had written a brilliant review of the previous work on getting machines to understand stories, and fully endorsed an approach that built on classical “symbol-manipulation” techniques. Norvig’s group is now working with Hinton, and Norvig is clearly very interested in seeing what Hinton could come up with. But even Norvig didn’t see how you could build a machine that could understand stories using deep learning alone.

Other quotes along the same lines come from Oren Etzioni in Looking to the Future of Data Science:

  1. But in his keynote speech on Monday, Oren Etzioni, a prominent computer scientist and chief executive of the recently created Allen Institute for Artificial Intelligence, delivered a call to arms to the assembled data mavens. Don’t be overly influenced, Mr. Etzioni warned, by the “big data tidal wave,” with its emphasis on mining large data sets for correlations, inferences and predictions. The big data approach, he said during his talk and in an interview later, is brimming with short-term commercial opportunity, but he said scientists should set their sights further. “It might be fine if you want to target ads and generate product recommendations,” he said, “but it’s not common sense knowledge.”
  2. The “big” in big data tends to get all the attention, Mr. Etzioni said, but thorny problems often reside in a seemingly simple sentence or two. He showed the sentence: “The large ball crashed right through the table because it was made of Styrofoam.” He asked, What was made of Styrofoam? The large ball? Or the table? The table, humans will invariably answer. But the question is a conundrum for a software program, Mr. Etzioni explained.
  3. Instead, at the Allen Institute, financed by Microsoft co-founder Paul Allen, Mr. Etzioni is leading a growing team of 30 researchers that is working on systems that move from data to knowledge to theories, and then can reason. The test, he said, is: “Does it combine things it knows to draw conclusions?” This is the step from correlation, probabilities and prediction to a computer system that can understand.

This is a significant statement from one of the best people in fact extraction on the planet!

As you know from elsewhere on this blog, I’ve been involved with the precursor to the AIAI (Vulcan’s Project Halo) and am a fan of Watson.  But Watson is the best example of what Big Data, Deep Learning, fact extraction, and textual entailment aren’t even close to:

  • During a Final Jeopardy! segment that included the “U.S. Cities” category, the clue was: “Its largest airport was named for a World War II hero; its second-largest, for a World War II battle.”
  • Watson responded “What is Toronto???,” while contestants Jennings and Rutter correctly answered Chicago — for the city’s O’Hare and Midway airports.

Sure, you can rationalize these things and hope that someday the machine will not need reliable knowledge (or that it will induce enough information with enough certainty).  IBM does a lot of this (e.g., see the source of the quotes above).  That day may come, but it will happen a lot sooner with curated knowledge.

Nice AI Video

I found the following video in a recent post by Steve DeAngelis of Enterra Solutions:

It’s a bit too far towards the singularity/general AI end of the spectrum for me, but nicely done and fun for many not in the field, perhaps:

Enterra is an interesting company, too, FYI.  They are in cognitive computing with a bunch of people formerly of Inference Corporation, where I was Chief Scientist.  Doug Lenat of Cycorp was one of our scientific advisors.  Interestingly enough, Enterra uses Cyc!

A good example of the need for better assessment

The Andes Physics Tutor [1,2] is a nice piece of work.  Looking at the data from the Spring 2010 physics course at the US Naval Academy [3] there are obvious challenges facing the current generation of adaptive education vendors.

Most of the modeling going on today is limited to questions where a learner gets the answer right or wrong with nothing in between.  Most assessment systems use such simple “dichotomous” models based on item response theory (IRT).  IRT models range in sophistication based on the number of “logistic parameters” that the models use to describe assessment items.  IRT models also come in many variations that address graded responses (e.g., Likert scales [4]) or partial credit, on up to multi-dimensional and/or multi-level/hierarchical and/or longitudinal models.  I am not aware of any model that addresses the richness of pedagogical data available from the physics tutor, however.

Having spent too long working through too much of the research literature in the area, a summary path through all this may be helpful…  There are 1, 2, and 3 parameter logistic IRT models that characterize assessment items in terms of their difficulty, discrimination, and guess-ability.  These are easiest to understand initially in the context of an assessment item that assesses a single cognitive skill by scoring a learner’s performance as passing or failing on the item.  Wikipedia does a good job of illustrating how these three numbers describe the cumulative distribution function of the probability that a learner will pass as his or her skill increases [5].  A 1 parameter logistic (1PL) model describes an assessment item only in terms of a threshold of skill at which half the population falls on either side.  The 2PL IRT model also considers the steepness of the curve.  The steeper it is, the more precisely the assessment item discriminates between above- and below-average levels of skill.  And the 3PL model takes into consideration the odds that a passing result is a matter of luck, such as in guessing the right answer from a multiple choice question.
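As a concrete sketch, the three logistic models can be written directly from the parameters just described; the parameter values below are purely illustrative, not taken from any calibrated item:

```python
import math

def irt_3pl(theta, a, b, c):
    """3PL item response function: probability that a learner with
    ability theta passes an item with discrimination a, difficulty b,
    and guessing parameter c (the lower asymptote)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def irt_2pl(theta, a, b):
    """2PL: no guessing (c = 0)."""
    return irt_3pl(theta, a, b, 0.0)

def irt_1pl(theta, b):
    """1PL (Rasch-like): fixed discrimination, no guessing."""
    return irt_3pl(theta, 1.0, b, 0.0)

# At theta == b, the 3PL curve sits halfway between c and 1:
p = irt_3pl(0.0, a=1.5, b=0.0, c=0.25)  # 0.25 + 0.75 * 0.5 = 0.625
```

Note how 1PL and 2PL fall out as special cases of 3PL, which is why the models form a sophistication ladder rather than three unrelated formulas.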

In the worlds of standardized testing and educational technology, especially with regard to personalized learning (as in adaptive assessment, curriculum sequencing, and intelligent tutoring), multiple choice and fill-in-the-blank questions dominate because grading can be automated.  It is somewhat obvious then that 3PL IRT is the appropriate model for standardized testing (which is all multiple choice) and a large fraction of personalized learning technology.  Fill-in-the-blank questions are sometimes less vulnerable to guessing, in which case 2PL may suffice.  You may be surprised to learn, however, that even though 3PL is strongly indicated, its use is not pervasive because estimating the parameters of a 3PL model is mathematically and algorithmically sophisticated as well as computationally demanding.  Nonetheless, algorithmic advances in Bayesian inference have matured over the last decade such that 3PL should become much more pervasive in the next few years.  (To their credit, for example, Knewton appears to use such techniques [6].)
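To give a flavor of the Bayesian direction without the full MCMC machinery, here is a minimal grid approximation of the posterior over a learner’s ability given responses to 3PL items; the item parameters and the standard-normal prior are illustrative assumptions, not anyone’s production model:

```python
import math

def irt_3pl(theta, a, b, c):
    """3PL item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def posterior_ability(responses, items, grid):
    """Posterior over ability theta on a discrete grid, assuming a
    standard-normal prior.  responses: list of 0/1 scores;
    items: list of (a, b, c) 3PL parameter triples."""
    weights = []
    for theta in grid:
        w = math.exp(-theta * theta / 2.0)  # N(0,1) prior, unnormalized
        for r, (a, b, c) in zip(responses, items):
            p = irt_3pl(theta, a, b, c)
            w *= p if r else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

grid = [i / 10.0 for i in range(-40, 41)]
items = [(1.2, 0.0, 0.2), (0.8, 1.0, 0.25), (1.5, -0.5, 0.2)]
post = posterior_ability([1, 0, 1], items, grid)
theta_hat = sum(t * p for t, p in zip(grid, post))  # posterior mean (EAP)
```

Real systems replace the grid with MCMC or variational methods precisely because this brute-force enumeration does not scale to multi-dimensional models, which is the computational burden mentioned above.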

It gets worse, though.  Even if an ed-tech vendor is sophisticated enough to employ 3PL IRT, they are far less likely to model assessment items that involve multiple or hierarchical skills and assessments other than right or wrong.  And there’s more beyond these complications, but let’s pause to consider two of them for a moment.  In solving a word problem, such as in physics or math, for example, a learner needs linguistic skills to read and comprehend the problem before going on to formulate and solve it.  These are different skills.  They could be modeled coarsely, such as a skill for reading comprehension, a skill for formulating equations, and a skill for solving equations, but conflating these skills into what IRT folks sometimes call a single latent trait is behind the state of the art.
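One common way to make the item–skill relationship explicit is a Q-matrix feeding a compensatory multidimensional logistic model; the skills, loadings, and intercepts below are hypothetical, chosen only to mirror the word-problem example above:

```python
import math

# Hypothetical Q-matrix: which skills each item requires
# (columns: reading comprehension, formulating equations, solving equations)
Q = [
    [1, 1, 1],  # full word problem
    [0, 1, 1],  # problem already stated symbolically
    [0, 0, 1],  # solve a given equation
]

def mirt_prob(thetas, a_vec, d):
    """Compensatory multidimensional 2PL: strength in one skill can
    offset weakness in another via the additive combination."""
    z = sum(a * t for a, t in zip(a_vec, thetas)) + d
    return 1.0 / (1.0 + math.exp(-z))

# Discrimination loadings can be masked by a Q-matrix row (zeros drop out):
a_item0 = [0.9 * q for q in Q[0]]
```

The Q-matrix is exactly the “relationship between assessment items and the cognitive skills they require” discussed below; authoring and normalizing it across thousands of items is where the cost lives.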

Today, multi-dimensional IRT is hot given recent advances in Bayesian methods.  Having said that, it’s worth noting that multiple skills have been on experts’ minds for a long time (e.g., [7]) and have been prevalent in higher-end educational technology, such as intelligent tutoring systems (aka, cognitive tutors), for years.  These issues are prevalent in almost all the educational data available at PSLC’s DataShop [3].  Unfortunately, the need to associate multiple skills with assessment items exacerbates a primary obstacle to broader deployment of better personalized learning solutions: pedagogical modeling.  One key aspect of a pedagogical model is the relationship between assessment items and the cognitive skills they require (and assess).  Given such information, multi-dimensional IRT can be employed, but even articulating a single learning objective per assessment item and normalizing those learning objectives over thousands of assessment items is a major component of the cost of developing curriculum sequencing solutions.  (We’ll be announcing progress on this problem “soon”.)

In addition to multi-dimensional IRT, which promises more cognitive tutors, there are other aspects of modeling assessment, such as hierarchical or multi-level IRT.  Although the terms hierarchical and multi-level are sometimes used interchangeably with respect to models of assessment, we are more comfortable with the former being with respect to skills and the latter with regard to items.  A hierarchical model is similar to a multi-dimensional model in that an item involves multiple skills, but most typically where those skills have some taxonomic relationship.  A multi-level model allows for shared structure or focus between assessment items, including multiple questions concerning a common passage or scenario, as well as drill-down items.  All of the issues discussed in this paragraph are prevalent in the Andes Physics Tutor data.  Many other data sets available at PSLC’s DataShop also involve hierarchical organization of multiple skills (aka, “knowledge components”).

And we have yet to address other critical aspects of a robust model of assessment!  For example, we have not considered how the time taken to perform an assessment reflects on a learner’s skill, nor have we considered graded responses or grades other than pass/fail (i.e., polytomous vs. dichotomous models).  The former is available in the data (along with multiple attempts, hints, and explanations that we have not touched on).  The latter remains largely unaddressed despite being technically straightforward (albeit somewhat sophisticated).  All of these are important, so put them on your assessment or pedagogical modeling and personalized learning checklist and stay posted!
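As a sketch of the polytomous case, Samejima’s graded response model extends the 2PL curve to ordered response categories (wrong / partial credit / right, say); the discrimination and threshold values here are illustrative only:

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Graded response model: probabilities of each ordered response
    category, given increasing thresholds b_1 < ... < b_{k-1}."""
    def p_star(b):  # P(response at or above the boundary b)
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))
    # Cumulative curves, bracketed by 1 (lowest category or above)
    # and 0 (above the highest category):
    cum = [1.0] + [p_star(b) for b in thresholds] + [0.0]
    # Category probabilities are differences of adjacent cumulatives:
    return [cum[i] - cum[i + 1] for i in range(len(cum) - 1)]

# Three categories, e.g., wrong / partial credit / right:
probs = grm_category_probs(0.0, 1.2, [-1.0, 1.0])
```

With only one threshold this collapses back to the dichotomous 2PL, which is why the polytomous extension is technically straightforward even if rarely implemented.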

[1] http://www.andestutor.org/

[2] Vanlehn, Kurt, et al. “The Andes physics tutoring system: Lessons learned.” International Journal of Artificial Intelligence in Education 15.3 (2005): 147-204.

[3] available on-line at the Pittsburgh Science of Learning Center (PSLC) DataShop

[4] Likert scales & items at Wikipedia

[5] the item response function of IRT at Wikipedia

[6] see the references to Markov chain Monte Carlo estimation of IRT models in this post from Knewton’s tech blog

[7] Kelderman, Henk, and Carl PM Rijkes. “Loglinear multidimensional IRT models for polytomously scored items.” Psychometrika 59.2 (1994): 149-176.

Making Cognitive Tutors Easier

There are (essentially) two types of e-learning systems:

  1. Cognitive tutors – which emphasize deeper problem solving skills
  2. Curriculum sequencing – which doesn’t diagnose cognitive skills but adapts engagement in various ways

For a good overview, see:

  • Desmarais, Michel C., and Ryan S. J. d. Baker. “A review of recent advances in learner and skill modeling in intelligent learning environments.” User Modeling and User-Adapted Interaction 22.1-2 (2012): 9-38.

The following is discussed in that article with regard to two approaches to deeper cognitive tutoring.

One of the reasons that cognitive tutors are not more pervasive is that the knowledge engineering required for each problem is significant.  And that knowledge engineering involves a lot of technical skill.

We’re trying to bend the cost curve for deeper (i.e., more cognitive) tutoring by simplifying the knowledge engineering process and skill requirements.  For example, here’s a sentence translated into under-specified logic:

The underspecified quantifier here is a little sophisticated, but the machine can handle it (i.e., the variable ?x7 refers to the pair of angles, not the individual angles).

We’re hoping that a few hundred (or even thousands) of these sentences with the reasoning infrastructure (akin to Watson’s) will allow deeper tutors to be developed more easily by communities of educators.

Personalized e-Learning Styles

Although a bit of a stretch, this post was inspired by the following blog post, which talks about the Facebook API in terms of learning styles. If you’re interested in such things, you are probably also aware of learning record stores and things like the Tin Can API.  You need these things if you’re supporting e-learning across devices, for example…

Really, though, I was looking for information like the following nice rendering from Andrew Chua:

There’s a lot of hype in e-learning about what it means to “personalize” learning.  You hear people say that their engagements (or recommendations for same) take into account individuals’ learning styles.  Well…


Wiley PLUS and Quantum

I was recently pleased to come across this video showing that Quantum has done a nice job of knowledge engineering in the domain of accounting with Wiley.

Most of the success of cognitive tutors has been confined to mathematics, but Quantum has an interesting history of applying knowledge-based cognitive tutoring techniques to chemistry.  In this case, they’ve stepped a little away from abstract math into accounting.

It was a little surprising, but then not, to learn that the folks of Quantum are AI folks dating back to Carnegie Group.  They’re based here in Pittsburgh!

Nice work.  What I find myself wondering is whether they’ve changed the cost curve for cognitive tutors…

Authoring questions for education and assessment

Thesis: the overwhelming investment in educational technology will have its highest economic and social impact in technology that directly increases the rate of learning and the amount learned, not in technology that merely provides electronic access to abundant educational resources or on-line courses.  More emphasis is needed on the cognitive skills and knowledge required to achieve learning objectives and how to assess them.


Cognitive modeling of assessment items

Look for a forthcoming post on the broader subject of personalized e-learning.  In the meantime, here’s a tip on writing good multiple choice questions:

  • target wrong answers to diagnose misconceptions.
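One way to represent this tip in data is to attach a misconception label to each distractor so that a wrong answer yields a diagnosis, not just a score.  The physics item below is invented here purely for illustration (it is not the AAAS item referenced next):

```python
# Hypothetical multiple-choice item: each distractor is tied to a
# diagnosable misconception; the correct choice has none.
item = {
    "stem": "A ball is thrown straight up. At the top of its flight, "
            "what is its acceleration?",
    "choices": {
        "A": {"text": "zero",
              "misconception": "zero velocity implies zero acceleration "
                               "(motion implies force)"},
        "B": {"text": "9.8 m/s^2 downward", "misconception": None},
        "C": {"text": "9.8 m/s^2 upward",
              "misconception": "acceleration points in the direction "
                               "of motion"},
    },
    "key": "B",
}

def diagnose(item, answer):
    """Return the misconception suggested by a wrong answer, if any."""
    if answer == item["key"]:
        return None
    return item["choices"][answer]["misconception"]
```

A tutor using such a structure can branch remediation on the diagnosed misconception rather than merely marking the item wrong.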

Better approaches to adaptive education model the learning objectives that are assessed by questions such as the following from the American Association for the Advancement of Science, which goes the extra mile to also model the misconceptions underlying incorrect answers: