In Big Data: A Revolution That Will Transform How We Live, Work and Think, Viktor Mayer-Schönberger and Kenneth Neil Cukier write:
"Society will need to shed some of its obsession for causality in exchange for simple correlations: not knowing why but only what. This overturns centuries of established practices and challenges our most basic understanding of how to make decisions and comprehend reality."
Really? I don’t think so.
Not in risk management.
Not if the financial crisis taught us anything at all.
You may recall that naïve reliance on unrealistic models of collateralized debt obligations blew a mile-wide crater in the financial system.
And models based only on simple correlations are as naïve as they come.
Looking hard at big data
I don’t doubt that big data can be powerful, in some contexts.
Big data helps Amazon understand how its customers respond to different offerings, and Amazon applies this knowledge to increase product sales. Big data helps Google improve the relevance of its search results and increase its ad revenue. Facebook uses its wealth of data on its members to sell ads that are tailored to each member’s revealed preferences.
But naïve correlation-based models are utterly dependent on the persistence of correlations over time.
From Amazon’s perspective, my preferences for books appear stable over time because I like mystery stories and buy them frequently. I have never bought, and never will buy, books on knitting.
Amazon does not need to know why I behave this way. It only needs to know that I will continue to behave this way in the future. That’s a pretty good bet. And even if they are wrong, it won’t cost them very much.
Unfortunately, in banking, reliable persistence of correlations is the exception, not the rule.
How much would you pay for a credit scoring model that says “Mr. Brown has made his mortgage payments every month for the past three years, therefore we predict that he will make all his future mortgage payments”?
How much would you pay for an asset allocation model that says “stocks have gone up for the last 6 months while the US dollar strengthened, therefore we recommend that you sell all your bonds, invest 100% in stocks and short the dollar”?
These examples may seem silly.
However, the same fatal flaw of unreliable correlations may be hiding in some “sophisticated” big data models.
A lesson from the past
Long ago, when I was doing risk modeling, the benefits of diversification were well known. But knowing how to diversify was a black art.
I decided I could improve upon the crude rules of thumb that portfolio managers were using at the time. So I commissioned a statistical correlation analysis of loan defaults across different categories of loans.
These correlations, I thought, would tell me which loans should be underweighted in the portfolio, which should be overweighted, and by how much.
The analysis came back, and it showed that, indeed, some categories of loans had high default correlations with other categories and some had low correlations.
I thought I had cracked the code … until I started asking questions.
What historical time period was used? Answer: “The last five years.”
What about the previous five-year period? Answer: “Oh, yes, we ran that too, but the correlations were completely different.”
Fortunately, I abandoned that project before any harm was done.
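To make the two-period problem concrete, here is a minimal Python sketch using made-up numbers, not the data from that analysis. It assumes two hypothetical loan categories whose monthly default-rate changes both respond to a common economic factor, and it assumes a structural change between periods: category B’s sensitivity to that factor (the hypothetical `loading` parameter) flips sign. The helper names `pearson` and `simulate_period` and the 60-month windows are illustrative choices.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate_period(rng, months, loading):
    """Hypothetical monthly default-rate changes for two loan categories.

    Both series react to a common economic factor plus their own noise;
    `loading` sets how category B reacts to the factor
    (positive = moves with category A, negative = moves against it).
    """
    a, b = [], []
    for _ in range(months):
        factor = rng.gauss(0.0, 1.0)
        a.append(factor + rng.gauss(0.0, 0.5))
        b.append(loading * factor + rng.gauss(0.0, 0.5))
    return a, b


rng = random.Random(42)
# Assumed structural change: category B's response to the common factor
# flips sign between the two five-year (60-month) periods.
earlier_a, earlier_b = simulate_period(rng, months=60, loading=-1.0)
recent_a, recent_b = simulate_period(rng, months=60, loading=1.0)

corr_earlier = pearson(earlier_a, earlier_b)
corr_recent = pearson(recent_a, recent_b)
print(f"previous five years: {corr_earlier:+.2f}")
print(f"last five years:     {corr_recent:+.2f}")
```

The estimation method is identical in both windows. Only the hypothetical process generating the data changed, and that alone is enough to turn a strong positive correlation into a strong negative one. A portfolio weighted on the "last five years" number would be built on the wrong sign.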
The economy isn’t a book preference
One reason that so many correlations in finance are unstable over time is simply that the very structure of the economy is always changing.
The conditions that will produce next year’s correlations may be quite different from the conditions that produced last year’s correlations.
If the economy’s structure were fixed, like a roulette wheel, predicting the odds of future events by looking at the frequency and correlations of past events might yield good risk estimates.
But the structure of the economy is not fixed.
Changes in technology, regulation, demographics, politics, social trends, trade relationships, economic growth and animal spirits all combine with many other factors to frustrate the search for reliably persistent correlations.
When playing roulette, you know all the possible outcomes and the probabilities of those outcomes. Risk assessment is a purely statistical exercise.
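For contrast, here is a rough Python sketch of the roulette case, where looking only at past frequencies does work because the structure never changes. It assumes a single-zero (European) wheel with 37 pockets, 18 of them red. Which physical numbers are red does not matter for the frequency, so the sketch simply labels 18 of the 37 pocket indices as red. The helper `estimate_red_probability` is a hypothetical name used for illustration.

```python
import random

# Assumed layout: a single-zero (European) wheel with 37 pockets, 18 of them red.
POCKETS = 37
RED_POCKETS = 18


def estimate_red_probability(rng, spins):
    """Estimate P(red) purely from how often red came up in past spins.

    Pocket indices 0..17 stand in for the 18 red pockets; the wheel's
    structure is fixed, so every spin is drawn from the same distribution.
    """
    reds = sum(1 for _ in range(spins) if rng.randrange(POCKETS) < RED_POCKETS)
    return reds / spins


rng = random.Random(7)
true_p = RED_POCKETS / POCKETS
estimate = estimate_red_probability(rng, spins=100_000)
print(f"true P(red) = {true_p:.4f}, estimated from history = {estimate:.4f}")
```

Here more history reliably means a better estimate, because every spin comes from the same fixed wheel. The loan-default sketch lacks that guarantee, and so does the economy.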
When playing the economy, you do not know all the possible outcomes, and the probabilities and correlations of those outcomes are ambiguous. Next year may not be at all like last year.
More data may not help at all. Risk assessment is not just a statistical exercise; it is also an exercise in applying seasoned judgment to bridge the inevitable gap between the statistical world and the actual world.
Skate to where the puck is going, not to where it has been.