Super Crunchers: predictive analytics is not enough

Ian Ayres, the author of Super Crunchers, gave a keynote at Fair Isaac’s Interact conference in San Francisco this morning.   He made a number of interesting points related to his thesis that intuitive decision making is doomed.   I found his points on random trials much more interesting, however.

In one of his examples on “The End of Intuition”, a computer program using six variables did a better job of predicting Supreme Court decisions than a team of experts.  He focused on the fact that the program “discovered” that one justice would most likely vote against an appeal if the decision below was labeled liberal.  By “discovered” I mean that the decision tree for this justice’s vote had, as its top-level split, whether the lower-court decision was liberal; if it was, the program paid no further attention to any other information.
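To make the decision-tree point concrete, here is a minimal sketch in Python of the kind of model described, using invented feature names and simulated votes rather than the study’s actual six variables:

```python
# Hypothetical sketch of a shallow decision tree over a few case features.
# The features and data are invented for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 500

# Hypothetical case features (1 = yes, 0 = no).
lower_court_liberal = rng.integers(0, 2, n)   # was the decision below "liberal"?
from_ninth_circuit = rng.integers(0, 2, n)    # did the case come from the Ninth Circuit?
criminal_case = rng.integers(0, 2, n)

# Simulate a justice who votes to reverse almost any liberal lower-court decision.
votes_to_reverse = (lower_court_liberal == 1).astype(int)
votes_to_reverse ^= (rng.random(n) < 0.05)    # a little noise

X = np.column_stack([lower_court_liberal, from_ninth_circuit, criminal_case])
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, votes_to_reverse)

# The root split lands on the "liberal decision" flag, mirroring the program's "discovery".
print(export_text(tree, feature_names=[
    "lower_court_liberal", "from_ninth_circuit", "criminal_case"]))
```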

Credit lemmings

Mr. Ayres went on to rationalize such simplistic criteria as indicating that humans tend to underestimate the importance of key variables, in this case the disdain of the Supreme Court for the Ninth Circuit.  Note that Mr. Ayres’s rationalization is closer to the truth than the model, but the model was good enough to do better than the experts at predicting outcomes.  Judge for yourself whether you are comfortable with his conclusion that we should therefore trust such models, even if they lack depth or intuitive aspects, to make decisions for us.

Fair Isaac and others in the predictive analytics / decision management space would have us commit our fortunes to such models, of course.  That is their position, and it is not completely without merit.  But the lemming approach of believing in models is behind the credit crunch and mortgage implosions.  Mr. Ayres mentioned this, but not what it implies had gone wrong with his thesis.  Maybe it’s in the book…

When models fail

Things change.  And models are never perfect in the first place.

  1. One model does not best fit all.
  2. Reality is not stationary.
  3. There are special cases.

How many models do you need?

As an example of one model not fitting all, consider how many different kinds of people there are.  If there were only one kind of person, a single score, perhaps even a FICO score, would be enough to describe how good or bad that person is with respect to, say, creditworthiness.  If there are several kinds of people, we need to identify what kind of person is at hand and apply a model that is specific to such people.  If we don’t have separate models for different kinds of people, then some people will lie outside the assumptions that a model must make in order to produce a prediction with a useful probability.

Note that Mr. Ayres was emphatic about the need for probabilities or other statistical measures describing the uncertainty of a conclusion produced by a model.  For example, a FICO score might be better expressed as a probability of default together with a range, such as the standard deviation of that estimate.  I intend to address this point in more depth separately.
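As a rough illustration of that point (the score bands and counts below are made up, and this is not how FICO actually calibrates its scores), a default estimate can be reported with its uncertainty:

```python
# Minimal sketch: report a band's default rate along with the uncertainty of that
# estimate. The bands and counts are hypothetical.
import math

# (score band, number of borrowers observed, number who defaulted)
bands = [
    ("760-850", 5000, 40),
    ("700-759", 4000, 120),
    ("640-699", 3000, 240),
    ("<640",    2000, 400),
]

for label, n, defaults in bands:
    p = defaults / n                          # estimated probability of default
    se = math.sqrt(p * (1 - p) / n)           # standard error of that estimate
    print(f"{label}: PD = {p:.3%} +/- {1.96 * se:.3%} (95% interval)")
```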

So, one model does not best fit all.  Whether there is one model or many, the uncertainty of a score should increase as the person at hand differs from the population used to develop the model.  But how many models are needed?  Ideally, there would be one model for each distinct kind of person, which brings us back to how many different kinds of people there are.  Answering this question is obviously problematic.  Failing to answer it is also problematic.  There are two things to consider:

  1. Unsupervised learning techniques, such as clustering, can answer the question objectively (see the sketch after this list).
  2. The number of different types of people changes over time.
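Here is the sketch promised above: a minimal example of letting clustering suggest how many distinct kinds of customers the data supports.  The customer features are synthetic stand-ins, and k-means plus a silhouette score is just one of many ways to do this:

```python
# Let an unsupervised method suggest how many distinct customer segments exist.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Three synthetic customer segments with different income/utilization profiles.
segments = [
    rng.normal([30_000, 0.8], [5_000, 0.1], size=(300, 2)),
    rng.normal([70_000, 0.4], [8_000, 0.1], size=(300, 2)),
    rng.normal([120_000, 0.1], [15_000, 0.05], size=(300, 2)),
]
X = StandardScaler().fit_transform(np.vstack(segments))

# Pick the number of clusters that maximizes the silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, "-> suggested number of segments:", best_k)
```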

Models are rarely stationary

The number of different types of people changes as the demographics of a population (e.g., our customers or web site visitors) change.  We either understand their differences and similarities or our models will underperform, especially if they do not convey any uncertainty in their recommendations.
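One hedged sketch of how to notice such drift: compare the population a model was built on against the current population using the population stability index.  The data and the 0.25 rule of thumb below are illustrative only:

```python
# Monitor demographic drift with the population stability index (PSI).
import numpy as np

def psi(expected, actual, bins=10):
    """Population stability index between a baseline sample and a recent sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)   # avoid log(0) and division by zero
    a_pct = np.clip(a_pct, 1e-6, None)
    return np.sum((a_pct - e_pct) * np.log(a_pct / e_pct))

rng = np.random.default_rng(1)
baseline_income = rng.normal(60_000, 15_000, 10_000)   # population at model build time
current_income = rng.normal(70_000, 20_000, 10_000)    # population today

drift = psi(baseline_income, current_income)
# A common rule of thumb: PSI > 0.25 suggests the population has shifted enough
# that the model should be re-examined or re-segmented.
print(f"PSI = {drift:.3f}")
```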

The evolution of people and demographics as an organization, its markets, and the broader political economy change is only one example of change over time.  Consider what caused the mortgage crisis and credit crunch: it was less a change in people than the dynamics of a bubble growing and bursting with respect to credit risk versus return.

Predictive modeling without risk modeling is dangerous

In conversation afterward with the director of analytics at one of the top insurers, we agreed that a good modeler would have avoided much of the mortgage implosion by segmenting the mortgage market into clusters of boom and bust conditions, rather than fitting to a relatively short window that covered only the boom in real estate over the prior ten years.  In his talk, Mr. Ayres also mentioned the implosion of quant funds that take his super-crunching approach, but he did not address why so many quants missed the need for their models to predict conservatively.  The answer is that they did not adequately cluster market conditions into, for example, boom and bust markets, just as we need to cluster people as discussed above.  Many quants’ models did not go back far enough to consider the possibility of conditions like those during the savings and loan crisis.
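As a rough sketch of what “clustering market conditions” could look like (the return history is simulated and the features are deliberately simple), one can cluster periods by rolling return and volatility into regimes before trusting any model fit to a single regime:

```python
# Cluster market history into regimes (e.g., boom vs. bust) using simple
# rolling-window features. The return series is simulated for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Simulate monthly returns: a long boom, a bust, and a recovery.
returns = np.concatenate([
    rng.normal(0.010, 0.02, 120),   # boom
    rng.normal(-0.030, 0.06, 24),   # bust
    rng.normal(0.005, 0.03, 60),    # recovery
])

# Describe each month by rolling mean return and rolling volatility.
window = 12
feats = np.array([
    [returns[i - window:i].mean(), returns[i - window:i].std()]
    for i in range(window, len(returns))
])

regimes = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
for r in np.unique(regimes):
    sel = regimes == r
    print(f"regime {r}: {sel.sum()} months, "
          f"mean return {feats[sel, 0].mean():+.3f}, vol {feats[sel, 1].mean():.3f}")
```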

The preceding paragraphs discuss the need to know (or discover) how many different models are needed for each cluster of distinguishable circumstances.  In addition, they address how such clusters may shift in definition or relevance over time.  In principle, the need for multiple models to cover clusters is independent of the shift in models over time, even if in practice they are correlated.

Models are generally smooth or approximate

The third challenge to accurate modeling above is the existence of special cases.  In the Supreme Court, for example, some justices have well-established positions on certain issues, such as abortion or federalism.  Being able to override the predictive analytic model with rules is a critical improvement, not only for compliance but for incrementally improving the performance of models in ways that are operationally relevant but beyond the ability of statistics or other machine learning to discover, especially given limited data or sudden change.

Incrementally reducing the errors of models is an obvious application of rules.  As another example, consider a stock selection model that identifies a stock outside the boundary of what you are willing to invest in.  Either you have to teach the learning algorithm that you don’t want to consider such targets by providing lots of negative examples, or you simply tell it with a rule.
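A minimal sketch of that last point, with a made-up model, tickers, and limits: the rule filters the model’s picks instead of retraining the model on negative examples:

```python
# A business rule filters out model picks that fall outside the investable universe.
def model_picks():
    """Stand-in for a learned stock-selection model's output."""
    return [
        {"ticker": "AAA", "score": 0.91, "market_cap": 12e9, "price": 45.0},
        {"ticker": "BBB", "score": 0.88, "market_cap": 0.2e9, "price": 2.1},   # micro-cap penny stock
        {"ticker": "CCC", "score": 0.84, "market_cap": 30e9, "price": 110.0},
    ]

def within_investable_universe(stock):
    """Business rule: no micro-caps, no penny stocks, regardless of model score."""
    return stock["market_cap"] >= 1e9 and stock["price"] >= 5.0

picks = [s for s in model_picks() if within_investable_universe(s)]
print([s["ticker"] for s in picks])   # BBB is excluded by the rule, not by retraining
```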

Rules can select models or make them more precise

Rules are a convenient and straightforward means of reducing the false positives of any model, and they can be introduced incrementally as false positives are discovered.  Rules can also address situations where a decision should have been positive but the model produced a negative result (i.e., false negatives).  The problem here is that it is typically more difficult to identify when a decision not to take action was wrong.  For example:

In Fair Isaac’s case, learning that the denial of a credit card or mortgage was a poor decision is unlikely. 

It’s up to us to determine whether to tighten up the model so as to minimize false positives, thereby increasing false negatives, or to settle for a good enough model and then manage false positives downwards using rules to handle special cases.  In either case we can manage false negatives downwards using rules, if we have the insight, or if the impact (e.g., lost potential) of a negative decision can be measured.
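To make the trade-off concrete, here is a small simulation (scores and labels are synthetic) showing how tightening the threshold reduces false positives while increasing false negatives; rules would then handle the remaining special cases:

```python
# Tightening a decision threshold trades false positives for false negatives.
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
truth = rng.random(n) < 0.2                       # 20% of cases truly merit approval
scores = np.where(truth, rng.normal(0.7, 0.15, n), rng.normal(0.4, 0.15, n))

for threshold in (0.45, 0.55, 0.65):
    approve = scores >= threshold
    false_pos = np.sum(approve & ~truth)          # approved but should not have been
    false_neg = np.sum(~approve & truth)          # declined but should have been approved
    print(f"threshold {threshold:.2f}: FP={false_pos}, FN={false_neg}")
```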

Random Trials

As I mentioned, Mr. Ayres’s most interesting points had to do with random trials.  The idea is that experimentation with bounded risk allows insights to be discovered, even when they are counter-intuitive.  As examples, he showed e-commerce trials that the audience agreed would fail, but where experimentation revealed excellent performance.  Although most of his examples were web-oriented, his point is irrefutable.  To quote him:

  • If you are not experimenting using random trials then you are presumptively screwing up.

Champion Challenger

Fair Isaac tries to address this with its concept of champion / challenger.  This is a good but limited approach.  Here’s how it works (sketched in code after the list):

  1. Come up with a new rule that makes a different decision under certain circumstances.
  2. Choose the percentage of applicable cases in which the rule should be applied (i.e., # times to be executed / # times applicable).
  3. Choose a metric for comparing outcomes of the alternative to the original decision.
  4. Measure the outcomes using the metric across a number of decisions.
  5. Decide whether to use the suggested rule 0% or 100% of the time.
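For concreteness, here is a minimal sketch of those steps in Python.  The rule names, the outcome metric, and the 10% split are placeholders of my own, not Fair Isaac’s actual implementation:

```python
# Champion/challenger: route a fixed share of applicable cases to the challenger,
# record an outcome metric for each arm, and compare manually at the end.
import random

CHALLENGER_PCT = 0.10          # step 2: fraction of applicable cases given to the challenger
results = {"champion": [], "challenger": []}

def champion_decision(case):   # existing rule
    return case["score"] >= 650

def challenger_decision(case): # step 1: proposed alternative rule
    return case["score"] >= 620 and case["debt_ratio"] < 0.4

def decide(case):
    arm = "challenger" if random.random() < CHALLENGER_PCT else "champion"
    approved = (challenger_decision if arm == "challenger" else champion_decision)(case)
    return arm, approved

def record_outcome(arm, profit):
    results[arm].append(profit)          # steps 3/4: measure outcomes with a chosen metric

def compare():
    # Step 5: a manual, all-or-nothing decision, which is exactly the limitation
    # pointed out in the next paragraph.
    return {arm: sum(v) / len(v) if v else float("nan") for arm, v in results.items()}

random.seed(0)
for _ in range(1_000):
    case = {"score": random.gauss(660, 40), "debt_ratio": random.random()}
    arm, approved = decide(case)
    if approved:
        record_outcome(arm, random.gauss(100, 50))   # stand-in for a measured profit
print(compare())
```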

The problem here is that there is no learning or predictive analytic benefit.  Everything is manual.

Google Analytics versus trials

Mr. Ayres used Google’s web page optimizer and its support for trials of web ads to demonstrate the benefits of coupling random trials with adaptive decision management (ADM).  Specifically, Google starts with weights on which ad to prefer and then adaptively learns which one is more effective.  It doesn’t stick to the original estimates, as champion-challenger does.  Google changes the weights as experience indicates.  In effect, Google facilitates experimentation by allowing low weights to be given to new ideas.  The ideas with merit naturally evolve into preferences.

Predictive is not adaptive

ADM is focused on closing the loop between random trials and an improving decision process.  In ADM, any number of alternatives can be trialed concurrently according to an initial probability distribution.  The adaptive aspect of ADM then automatically learns how to adjust that probability distribution based on outcomes.  In effect, ADM takes the next step beyond Enterprise Decision Management (EDM), where leaders like Fair Isaac are integrating predictive analytics with rule-based decisions.
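Here is a hedged sketch of what such a closed loop can look like, in the spirit of a multi-armed bandit (Thompson sampling); it is an illustration of the idea, not Google’s or any vendor’s actual algorithm:

```python
# Thompson sampling: variants start with roughly equal weight, and observed
# outcomes automatically shift traffic toward the better performer.
import random

variants = {"ad_A": {"wins": 1, "losses": 1},   # Beta(1, 1) priors: equal initial weight
            "ad_B": {"wins": 1, "losses": 1}}
true_rates = {"ad_A": 0.04, "ad_B": 0.06}       # unknown in practice; simulated here

random.seed(0)
shown = {"ad_A": 0, "ad_B": 0}
for _ in range(20_000):
    # Sample a plausible conversion rate for each variant and show the best draw.
    draws = {v: random.betavariate(s["wins"], s["losses"]) for v, s in variants.items()}
    choice = max(draws, key=draws.get)
    shown[choice] += 1

    # Observe the outcome and update that variant's weight: the "closed loop".
    if random.random() < true_rates[choice]:
        variants[choice]["wins"] += 1
    else:
        variants[choice]["losses"] += 1

print(shown)   # traffic drifts toward ad_B as evidence accumulates
```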

Unlike EDM, ADM actually closes the loop.  EDM gives us the tools, but we still have to drive the nails all on our own.

Mr. Ayres is right on the money with random trials, but recent market experience demonstrates how risky it can be to trust predictive modeling without a deep understanding of clusters and market conditions and how they change over time.

A panelist from a major UK bank later reminded us of the old financial services adage:

  • Past performance is not an indicator of future results.

The bottom line:

Predictive analytics may help make good decisions but adapting makes decisions better.
