16 May 2014

Conflict of Interest

A theme of these endeavors, which emerged from them rather than motivated their creation, is that when policy conflicts with data, policy wins and data loses. Now, this observation is, generally, not true when data of the physical world is the subject. Brownian motion and microarrays spring to mind. In general, then, one can observe that data about physical processes is mostly unambiguous, in the sense that the data reflect rules which are external to the process of interest. You can't fool Mother Nature (or God, if you're so inclined).

Data about human processes is a whole other story. As The Great Recession demonstrated, both the data and the rules of engagement of the activities behind the data are subject to change by the humans exercising the engagement. Both data and rules are fungible. In such a circumstance, is data analysis even worth the trouble? To earn a paypacket, I suppose so. To understand, better, the real world, perhaps not so much.

I've been making a desultory trip through Kuhn and Johnson and Hastie, et al over the last while. In a nutshell, the former is more narrative while the latter is more algebra.

What's germane here are a couple of snippets from K&J's Introduction (some, courtesy of my ten fingers). First, they quote from Rodriguez:
Predictive modeling, the process by which a model is created or chosen to try to best predict the probability of an outcome, has lost credibility as a forecasting tool.

Rodriguez continues his paragraph thus:
Overly simplistic models have failed to account for the sheer complexity of human interaction and the degree to which most people behave irrationally. Most predictive economic models presume that people behave rationally most of the time, a premise which is terribly flawed but which serves as the intellectual foundation of many current economic models (See the Wall Street Journal article on this issue, here).

But they then end the section thus:
While the primary interest of predictive modeling is to generate accurate predictions, a secondary interest may be to interpret the model and understand why it works. The unfortunate reality is that as we push towards higher accuracy, models become more complex and their interpretability becomes more difficult. This is almost always the trade off we make when predictive accuracy is the primary goal.

IOW, we know this is crap, but we're going to do it anyway. I'll grant that the authors work in biotech, not finance, so they're (likely) coming at the issue from a justifiable Mother Nature perspective. Still.

This imposed bifurcation didn't exist when I sat my stat and econometrics classes. What was clearly understood then was that human processes aren't Brownian motion, and that prediction was restricted to the range of the independent variable(s). Over the years, quants have been willing to ignore that last restriction and predict way beyond the range of data. It turned out to be a lucrative practice. Until it blew up the world. That practice is one of the reasons they gave us The Great Recession: the assumption that today mostly looks like yesterday, and so on.
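For the code-inclined, here is a minimal sketch of that range restriction. Everything in it is made up for illustration: the `process` function, its numbers, and the point where the rules change are assumptions of mine, not data from anywhere. A straight line is fit only to observations with x between 0 and 10, then asked to predict at x = 20, where the hypothetical rules have already changed.

```python
import numpy as np

def process(x):
    # Hypothetical "human" process (an assumption for illustration only):
    # it rises steadily, then the rules of engagement change at x = 10.
    x = np.asarray(x, dtype=float)
    return np.where(x <= 10.0, 100.0 + 5.0 * x, 150.0 - 8.0 * (x - 10.0))

# We only ever observe the process on the range [0, 10].
x_obs = np.linspace(0.0, 10.0, 21)
y_obs = process(x_obs)

# Ordinary least-squares straight line fit to the observed range.
slope, intercept = np.polyfit(x_obs, y_obs, deg=1)

def predict(x):
    # Linear prediction from the fitted coefficients.
    return intercept + slope * np.asarray(x, dtype=float)

x_in, x_out = 5.0, 20.0  # inside and far outside the observed range
err_in = abs(float(predict(x_in)) - float(process(x_in)))
err_out = abs(float(predict(x_out)) - float(process(x_out)))

print(f"error inside the observed range:  {err_in:.3f}")
print(f"error outside the observed range: {err_out:.3f}")
```

Inside the observed range the fit is essentially perfect. Outside it, the model keeps drawing the same line while the process has moved on, and nothing in the fitted data could have warned you.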

As more failed-out physical scientists moved into the social sciences (the maths are easier over here, and many social scientists don't bother to look behind the curtain), they brought with them the paradigm of inviolate rules of engagement. Thus, house prices can continue to rise beyond reason because they have been and there must be some mechanism, which we don't understand and don't need to understand, which makes it all work out just fine. After all, The Invisible Hand is just like Mother Nature, yes? Until it doesn't.

Without understanding, we get leeching to cure disease and virgin sacrifice to palliate The Gods. Which brings us to Krugman today.
I've been thinking a lot lately about the power of doctrines -- how support for a false dogma can become politically mandatory, and how overwhelming contrary evidence only makes such dogmas stronger and more extreme.

He's ranting about Republicans and climate change (mostly), but the point is more general. I've long forgotten which class/teacher/professor said it but, "Data doesn't displace a theory, only another theory does that". Or words to that effect. Contradictory data may bring a theory into question, but doesn't replace that theory. The problem with policy and data is that policy doesn't rise to the importance of theory. Policy is merely imposition of will. Sometimes that will is of the majority. Sometimes not.
