Thursday, December 6, 2018

per-capita failure (a pre-theory)

Hi all,

I recently posted a short piece describing my thinking about how statistics and science relate to each other. While writing that post, I struggled with whether today's topic fit into the larger 'science/statistics' idea. I eventually decided the best approach was to give the idea its own post, call it a 'pre-theory', and explore it further today.

Before I begin, I should clarify for the reader what I mean by these 'pre-theory' posts. 'Pre-theory' is the best phrase I could come up with to describe my many half-formed ideas. In other words, a 'pre-theory' is like a guess about the rustle you just heard in the bushes. It is obvious something is there, and it might even be possible that the guess is completely accurate. But because the truth remains just beyond my field of vision, it is impossible to be entirely sure exactly what is making all that noise.

Let's hope it isn't a skunk.

Tim

*********

I once addressed the crucial difference in how statistics and science approach the challenge of aggregation. For statistics, aggregation is like air. Without it, the field would cease to exist. But because of this dependence, statistics is very poor at recognizing the critical nature of certain individual observations. As a field, statistics struggles to reconcile these outliers with the theories or practices established as part of its status quo.

In science, aggregation also plays a vital role. But the right observation made under strictly defined conditions is sufficient to undo thousands of years of knowledge. In science, the right single observation is given veto power to reject centuries of convention. Truth is the final arbiter, ahead of other concerns such as consensus or convention, and there are no convenient hiding places to conceal the inconvenient fact (such as the 95% confidence interval).

I have no illusions that my post about science and statistics broke any new ground (longtime readers will know most of what I post here fails to break new ground). But like the case of the unhealthy eater who regularly clears out the fridge of rotting vegetables or the couch potato who pays each month for the yet-unused gym membership, knowing what to do and actually doing it remain unconnected concepts for many. The end result is aggregation applied to tiny sample sizes or a 95% probability becoming synonymous with absolute certainty. What are the obstacles preventing us all, especially those educated in one field or the other, from throwing our hands up and demanding the methods of science or the tactics of statistics be applied to the right problems?

One possible obstacle I worry about is the regularity with which aggregated statistics are used to uniformly describe their varied component parts. We describe 'the average American' so frequently that we never bother to question whether using the collected information about hundreds of millions of people to describe individuals represents clear thinking. Every time the numbers are added up so that individuals become families, families become communities, communities become states, and states become the US, we lose the critical details required to break the data back down to the individual level. And yet, we still divide the numbers back down, perhaps in one fell swoop with the national population serving as the denominator, to produce neat 'per-capita' representations of the American you will most likely run into next, on average, most of the time.
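To make the 'add up, then divide back down' rhythm concrete, here is a minimal Python sketch. The neighborhoods and car counts are made-up numbers chosen only for illustration, and `per_capita` is a hypothetical helper, not any standard function. Two very different groups of households collapse to the same per-capita figure, and no household in either group actually matches it, which is exactly the detail that cannot be recovered once the numbers are aggregated.

```python
def per_capita(values):
    """Aggregate the values, then divide by the number of individuals."""
    return sum(values) / len(values)

# Hypothetical car counts per household (made-up numbers for illustration).
neighborhood_a = [2, 3, 2, 3]  # every household owns two or three cars
neighborhood_b = [0, 0, 5, 5]  # half own no car, half own five

for name, cars in (("A", neighborhood_a), ("B", neighborhood_b)):
    avg = per_capita(cars)
    # Both neighborhoods collapse to the same 2.5 figure, and no single
    # household in either one actually owns 2.5 cars.
    print(name, avg, avg in cars)
```

Running it prints `A 2.5 False` and `B 2.5 False`: the same average describes both neighborhoods while describing no one in them.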

I remember the first time I thought about this. It was during a high school statistics class. I read aloud the following basic detail about the 'average' American family - it owns two and a half cars! The statistic simultaneously applied to everyone and described no one. But instead of talking about why people seek out such figures in the first place or why no one with any power bothered to challenge a system mass-producing such inane metrics, the finding was laughed away as a quirk of statistics.

Silly stats!

They can lie about anything...

How much cuter can statistics get?

It was a tiny, irrelevant lie, and we all accepted it. Unfortunately, accepting a tiny little lie doesn't make it true. And each time it happens, the slightly bigger fib applauds. Students asked to describe the mythical garage of two and a half cars end up inhabiting two worlds. In the first world, they go to mom or dad and ask to borrow the keys for the WHOLE CAR they want to drive. In the second world, they sit in class and describe a world of half-windshields to the fourth decimal place in the name of greater 'precision'.

As with any skill, practice today improves performance tomorrow. Is statistics an exception? I doubt it. So, to me, teaching students how to blubber convincingly about misleading metrics in the present prepares them to blubber convincingly about misleading metrics in the future. The truly creative types might even come up with some such metrics of their own!

Since schools tend to follow a certain progression, most students learn about increasingly important matters with each new grade. Over time, a properly trained 'statistician' emerges, armed with the tools to describe individuals using 'per capita' characteristics gleaned from averaging the larger groups these people belong to. What else explains the stories from just a decade ago of banks that, despite having access to all the statistical know-how needed to make good decisions, approved mortgages for prospective homeowners without asking for documentation of sufficient income, a good credit score, or substantial savings? I mean, not even one bit of evidence? Well, you see, as a group everyone was good, so individually everyone must be good...

I suspect these sorts of things happen because of how casually... the accordion rhythm... of individual... to aggregate... to per-capita... is pounded... into our heads... from the earliest age. Most people are not statisticians and just go along with the methods they see used by the experts every day.

I understood, sort of, how this worked from an early age because of a simple mistake people made about me. Many referenced my Japanese mother as an explanation for my math skills, while others pointed to my active father as the source of my athletic ability. Those comparisons were perfectly fine. My dad still runs today, and I'm sure that if I'd asked, my mom would have revealed that she did long division in her head for fun.

But my mom was all-Japan in high school tennis and my dad's math skills put him into the top percentile among his high school peers. You'd think basic facts about my background available to anyone with the time to ask two questions might be more commonly known. But the per-capita 'statistics' about Asian basketball players were, like the per-capita 'statistics' about Asian math students, more easily obtained. Jeremy Lin didn't become 'LINSANITY' by getting into Harvard.

But even though these little miscues are difficult to understand, they are pretty easy to explain. Most people prefer to make a bad assumption rather than ask an open question. It is certainly true of me, and it's a habit I'm trying to change. But it has been hard work, no doubt. Making assumptions is easy if only because it excuses us from asking open questions. Or maybe the hard part is asking the open questions.

A person who becomes accustomed to making assumptions will become good at making assumptions. After some time, this person will have a tough time even coming up with an open question. Curiosity, like many skills, atrophies if left unused over time. And in a world which seems to encourage per-capita thinking, the first casualty is curiosity about individuals.

I worry this is the cause of many of the problems created by statistics. Many successful applications of aggregation are tempting its proponents to step outside the field's area of expertise. That is not automatically a bad thing: per-capita metrics are useful for explaining or describing the characteristics of a group in a way accessible to individuals. Per-capita measurements can put huge numbers into context and can be a powerful way to explain otherwise difficult concepts. But as a tool for understanding others, the science of putting everything on a curve often falls flat.