True On Average: more or less about my tb test

Morning,

In the past six weeks, I've taken two TB tests (*). The tests are really simple- you get a small injection of whatnot and then, 48 to 72 hours later, the injected area is examined by a nurse. If there is a significant reaction, your test is positive, and you have to follow up with a chest X-ray. If it is negative, you are disease-free and exempt from testing for another year.

*No need to ask why. And TB stands for tuberculosis, BTW- which stands for 'by the way', FYI- which stands...

The first test went fine- left arm, no reaction. The second test, just a month later, should not have happened. It also went fine, though less so- right arm, a reaction, but not enough to be 'positive'. No X-rays for me this year. Still, the nurse made a note on my form about it before I went on my merry way.

On the way home from my second test, I listened to the 'objectively best' podcast in my rotation, More Or Less, a BBC-produced show that primarily dissects how statistics are presented in the news (1). This particular episode broke down Simpson's Paradox, a statistical phenomenon where trends in subgroups disappear or reverse when the subgroups are combined to form a larger group.

The episode used an example of possible gender discrimination to illustrate the concept. This example was perhaps oversimplified yet made the intended point about how easily this paradox can come about.

The example went like this- suppose you are in charge of a chorus and your next assignment is to fill eighteen open positions. Twelve of the positions require lower voices, which tend to attract male candidates, while the remaining six require higher voices, which tend to attract female candidates.

You post the positions and, as the applications pile up, put the most unbiased hiring manager you can find in charge of filling the roles based solely on singing ability. Thanks to the distribution of applicants, you end up with the following results:

12 spots- lower singing voice

8/16 male applicants hired (50%)

4/8 female applicants hired (50%)

6 spots- higher singing voice

4/8 female applicants hired (50%)

2/4 male applicants hired (50%)

Almost an ideal scenario from the perspective of a meritocracy- each subset of candidates was hired at a 50% rate.
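
As a quick check on the figures above, here is a minimal Python sketch that recomputes the hire rate for each subset. The dictionary layout and the name `subgroup_rates` are my own choices for illustration; the counts are the ones listed above.

```python
from fractions import Fraction

# (hired, applicants) for each voice part and gender, copied from the lists above.
# The dictionary layout is just an illustrative choice.
RESULTS = {
    ("lower", "male"): (8, 16),
    ("lower", "female"): (4, 8),
    ("higher", "female"): (4, 8),
    ("higher", "male"): (2, 4),
}


def subgroup_rates(results):
    """Return the exact hire rate (hired / applicants) for every subset."""
    return {key: Fraction(hired, applicants)
            for key, (hired, applicants) in results.items()}


if __name__ == "__main__":
    for (voice, gender), rate in subgroup_rates(RESULTS).items():
        print(f"{voice} voice, {gender}: {float(rate):.0%}")
```

Using `Fraction` keeps the comparison exact, so 'each subset was hired at the same rate' can be checked without any rounding.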

But the next day, you open the newspaper and see- 'Chorus employing discriminatory hiring practices against women' with this breakdown:

18 open positions

10 men hired = 10/18 = 55.6%

8 women hired = 8/18 = 44.4%

Oh boy (*).

*Let's try it from another perspective- let's suppose the following represents the racial breakdown-

12 spots- lower singing voice
9/18 white applicants hired (50%)
3/6 non-white applicants hired (50%)

6 spots- higher singing voice
5/10 white applicants hired (50%)
1/2 non-white applicants hired (50%)

So far, so good...

18 open positions
14 white hires = 14/18 = 77.8%
4 non-white hires = 4/18 = 22.2%

I'm sure there is no need for my speculation on the headline those numbers would produce.

So, what went wrong?

Simpson's Paradox, I suspect, is like many formally named phenomena encountered in fields like statistics- a narrowly defined application of a generalized problem. In this case, the general problem is inconsistent denominators. The example merely highlights the need to keep denominators consistent when making comparisons or conducting analysis.

The 'damning' statistic presented above uses total open positions as the denominator. The denominator relevant to the hiring manager is the number of applicants for a given open position, so the influence the hiring manager has on the 'open positions' denominator is, at best, indirect. Using 'open positions' as the denominator fails to accurately reflect decisions made within the hiring process- unless the hiring process takes the status of other open positions into account, which in this example was not the case.

A more accurate reflection, or at least one which fairly represents the approach of the hiring manager, would compare final hires to the total applicants.

18 total hires / 36 total applicants

10 men hired / 20 male applicants = 50% male applicants hired

8 women hired / 16 female applicants = 50% female applicants hired

This breakdown might be 'more true' but it doesn't change the ten-eight male-female ratio. Statistics is tricky in this way because it is never clear which of many interpretations is the most truthful illustration of a given situation.
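
To see how much the choice of denominator matters, here is a rough Python sketch that runs the same hiring totals through both calculations. The function names `share_of_hires` and `rate_among_applicants` are made up for this post; the counts come straight from the example.

```python
# Totals from the chorus example: hires and applicants per gender.
HIRED = {"men": 10, "women": 8}
APPLICANTS = {"men": 20, "women": 16}


def share_of_hires(hired):
    """Newspaper view: each group's hires divided by all hires."""
    total = sum(hired.values())
    return {group: count / total for group, count in hired.items()}


def rate_among_applicants(hired, applicants):
    """Hiring manager's view: each group's hires divided by its own applicants."""
    return {group: hired[group] / applicants[group] for group in hired}


if __name__ == "__main__":
    shares = share_of_hires(HIRED)
    rates = rate_among_applicants(HIRED, APPLICANTS)
    for group in HIRED:
        print(f"{group}: {shares[group]:.1%} of hires, "
              f"{rates[group]:.0%} of applicants hired")
```

The hires are identical in both calculations. Only the number under the fraction bar changes, and that alone moves the result from 55.6% versus 44.4% to 50% versus 50%.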

What role does the posting of positions play in this story? Is the fact that more of the posted positions 'tend' to attract male candidates evidence of bias from those running the chorus? I suppose it is possible but I don't know for sure (even if it is my hypothetical).

The angle I find more interesting here is to suppose the best, most unbiased intent among the parties involved. One person posts the positions by looking only at the needs of the chorus. The other hires solely based on singing ability. And yet, as we see in the hypothetical, the numbers might come out skewed, hinting at a bias that only emerges when the individual initiatives of two unbiased individuals act in concert (*).

*Pun intended, of course. The pun is always intended around here.

What I like most about the Simpson's Paradox example is how simply it highlights the possibility of even the best intended individual actions aggregating up to system-level problems. Understanding the example makes it easier to appreciate both the magnitude our actions have in the world and how little influence we might have over those impacts.

My mini-adventure with TB testing highlights the idea from a different angle. In this case, the goal of preventing those with TB from further involvement in a certain organization was met imperfectly- those with TB would be caught through multiple screenings, at a cost that is likely higher than that of an alternate method.

In this case, individual units within the organization, each likely tasked with doing its fair share to help the organization reach this goal, acted in the unit's best interest and ended up overlapping or duplicating the efforts of a parallel unit. Although the goal was met, the aggregate of the unit actions is a higher overall cost to the organization.

As I stated at the top, the second test should not have happened. But that is a remark growing entirely out of my own point of view and it is hard to see what changes could prevent unnecessary second tests in the future. The organization, given how it divides itself into units, is opting through its design to suffer duplication errors rather than risk missing out on testing someone entirely. The cost, although certainly higher to the organization, is also passed on to me, someone outside the organization, so the decision makers do not feel the true cost of their system's setup (2).

The argument for the US government continuing to increase its role in health care (and join most of the rest of the developed world in the process) invokes examples like mine to highlight the easy cost reductions a streamlined system could achieve- instead of testing twice, test once, and cut costs in half. Simple.

But health care, or perhaps the financing of health care bills, tends to find much more complex examples. Decisions about research, insurance, end of life care, and much more will increasingly rely on statistical analysis to state their premise and sway the undecided. Having studied the subject as my college major (3), I am very much aware of its appeal and will even concede its occasional usefulness.

But I think statistics is much more frequently used to manipulate, confuse, or distort. It is a tool often wielded by those with no educational or practical background in the subject. The beauty sometimes found in mathematical proofs and theorems is double-edged in the way a figure can seduce someone looking for a simple answer to a complex problem.

What I have always liked most about the More Or Less podcast is its trust of the listener. The process of un-crunching the numbers leaves only the facts. At this point, the listener must decide what is true. It is both educational, even for someone like me who has some grounding in statistical concepts, and refreshing in the way that it strips the sensationalism out of the newsworthy to bring the discussion back to its basic truths.

And sometimes they talk about soccer, too, which I appreciate.

Back on Friday with a quick look back at a couple of life changing books.

Tim

Footnotes???

1. 'More Or Less' tangent...

'Objectively best' is essentially a meaningless term, so I'll clarify- I think this podcast consistently produces the most useful information per episode among the podcasts I listen to. It does not necessarily always entertain, though it usually does, and it does not strike me as over-produced, which I admit for some reason I consider a plus.

It doesn't hurt, either, that, being a UK-based show, it often has enough clever little puns to keep me interested.

2. When the cost is passed along tangent...

One way to manage a system is to track the costs closely and take action when the value generated by a component of the system is offset by the cost. Sounds almost like a perfect approach, no?

One problem with the approach is the underlying assumption that costs are tracked accurately. Another way to look at the same problem is the assumption that value generated is correctly attributed to the merits of the successful system.

The government's approach to addressing income inequality is an easy example to pick on and allows me to highlight both of the above ideas at once. US history is filled with examples of government-initiated programs and policies with the explicit aim of reducing income inequality by addressing both ends of the equation (reducing poverty and redistributing compensation earned in excess of the wealth created). But on the other hand, the government continues to allow state-sponsored lotteries, an institution whose consistent outcome (*) seems to be increasing income inequality.

*Inconsistent outcomes might include development of or preying on gambling addictions. Help is available, though, according to the scrolling text at the bottom of Keno screens.

If you believe, as I do, that government does genuinely wish to reduce income inequality, the existence of state-run lotteries is a constant challenge to that conviction. I imagine the counter-argument from someone adopting the opposite point of view would contend that such lotteries are entirely unrelated to policies addressing income inequality.

It is hard for me to accept this hypothetical argument because a lottery that distributes massive payouts to a few individuals at the cost of dollars to many is the exact opposite of how I understand the concept of addressing inequality. It also ignores the interconnected nature of government programs- lottery tax revenues fund policies which redistribute wealth and treating gambling addictions comes at a cost to government-funded programs.

3. Mathematical sciences tangent...

Sort of. At Colby, Mathematical Sciences was a major identical to Mathematics with one exception- instead of a 400-level (pure) math class, the Mathematical Sciences major could take a 200-level statistics class. This makes it the closest thing to a statistics major I could have done, if memory serves me correctly, but does not actually make it a pure statistics major.

Postscript? Cost or downside tangent...

When the light turns green (*), we go, because on average, it is safe more than 99% of the time. If everyone stopped, got out of their cars, looked both ways...(**)

*Or blue?

**The world would be much safer but we would have more traffic?

At some point, we proceed with less than full information because the cost of acquiring additional units of information becomes too high. Without this ability to act in uncertain circumstances, many of us would become paralyzed in the face of a decision with too many options.

The other angle is downside. If I am wrong about my action, what is the likely result? In cases where the downside is minimal, the action is justifiable.

The TB test story combines the opposites of the two points above- the cost of administering additional screenings is very low while the downside of my having TB is very high- so it is not particularly surprising for such a thing to happen, nor should there be any expectation that a massive effort to improve this system is coming around the corner.

The only way to ever really know for sure is to gather more information. Unfortunately (or perhaps not) most questions cannot ever be fully answered- we can only approach the full answer before deliberations become debilitating.

Tuesday, May 17, 2016
