Monday, July 24, 2023

leftovers - the bias spotlight on the dumb smart trade (qualitative and quantitative evaluation methods)

A final thought here that spans the previous two posts. It occurred to me that one issue in evaluating someone like Smart is that his best qualities as a basketball player are not easily measured by the sport's conventional data analysis methods. This means comparisons against players with more easily measured skills will never quite reach the apples-to-apples standard. In a sense, when you evaluate a player like Marcus Smart, you actually need to evaluate your evaluation methods before you can reach a meaningful conclusion about the player.

This obviously makes little sense, so let's try a metaphor of measuring students. A common method is assigning a letter grade for each course, which is then translated into a cumulative GPA. If you have one student with a 3.6 GPA and another with a 3.3 GPA, the scores offer a strong suggestion of who the better student is. But what if the 3.3 GPA student also took one class per semester pass/fail? All of a sudden, you need to evaluate the evaluation method - how does the pass/fail detail influence the overall evaluation of the student? I see players like Smart as the equivalent of the 3.3 GPA student with a handful of "pass" grades, where we know "pass" puts him at the top of the cohort. How do you compare him to a player whose top attributes are supported by clear data (essentially, the equivalent of a 3.6 GPA) when we know that player's identical "pass" grades should carry less weight than Smart's?
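The GPA metaphor can be made concrete with a small sketch (hypothetical students and course names, assuming a standard 4.0 scale): a pass/fail grade simply vanishes from the cumulative number, no matter how strong the underlying performance was.

```python
# Hypothetical transcripts on a standard 4.0 scale; "P" marks a pass/fail course.
GRADE_POINTS = {"A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0}

def gpa(transcript):
    """Cumulative GPA over letter-graded courses only.

    Pass/fail results carry no grade points, so they are invisible to
    the metric - even if "pass" meant top-of-the-cohort performance.
    """
    points = [GRADE_POINTS[grade] for _, grade in transcript if grade in GRADE_POINTS]
    return round(sum(points) / len(points), 2)

student_a = [("Calc", "A-"), ("Stats", "A-"), ("Econ", "B+"), ("Lit", "A-")]
student_b = [("Calc", "B+"), ("Stats", "B+"), ("Econ", "B+"), ("Hustle", "P")]

print(gpa(student_a))  # 3.6
print(gpa(student_b))  # 3.3 - the "P" in Hustle contributes nothing
```

The point of the sketch is that the summary statistic is well-defined and precise, yet structurally blind to exactly the course where student B might be exceptional.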

The right answer here is probably that you can't make a meaningful comparison, at least with data alone. The wisest course of action would be to reduce the importance of data in the decision. However, I think the data analytics mentality encourages pushing on rather than stepping back, and this insistence is where the method runs up against one of its shortcomings. What if we came up with a way to measure hustle, or dedication, or leadership, with more precision than pass/fail? Why can't we translate pass/fail grades into a GPA-type score? These are reasonable questions, and their implied directions deserve some consideration. However, there is a point where some factors should be acknowledged as inherently qualitative, and it's a major mistake to treat these as quantitative just because you've devised a method for assigning a numeric value. In other words, what I consider the biggest failure of the data analytics mentality is the inability to recognize situations where quantitative data is essentially meaningless because a qualitative method was used to create that data (1).

If you always push forward and try to refine the data so that the numbers will have the last word, then you become a bit like the drunkard refusing to search beyond the glow of the streetlight. What gets lost in the absurdity of the drunkard's tale is a subtle commentary on ideology - by standing under the light, the drunkard demonstrates his conviction that without lighting he will never locate the keys. This means other approaches that do not fit within the ideology - such as a thorough retracing of steps - are dismissed out of hand. Likewise, being data-driven has its benefits as long as the data itself remains a reliable source of unbiased information. But if you don't have a way to include other evaluation styles in your decisions, then you allow a certain bias to infiltrate your evaluations - you will tend to prefer options whose strongest qualities just so happen to be the most easily measurable. If this is how you make data-driven decisions, then what you have is an ideology, not a methodology, and you will find that rather than eliminating bias you are simply trading one form of it for another.

Footnotes

1. TOA was well-reviewed, 4 stars out of 5 

I notice this problem in the average corporate performance review, where a manager is asked to evaluate a team member on a numerical scale of 1-5 for essentially subjective skills such as "communication" or "empathy" or "wastes company time writing TOA". There are some aspects of these (or any) skills where a numerical measurement would be appropriate, but I don't think anyone will convince me that an overall evaluation distilled into a single number has any meaningful value whatsoever, particularly if that number is used to make comparisons against peers in discussions regarding recognition, bonuses, or promotions.