Reading Notes: Weapons of Math Destruction

My first non-fiction of the year is one of the most talked-about books in data circles, Cathy O’Neil’s Weapons of Math Destruction. I don’t necessarily like the fear-mongering metaphor, but the book points out serious repercussions of our society’s increasing reliance on data and algorithms to make decisions that impact people’s lives.

By O’Neil’s definition, a WMD must meet three criteria: opacity, scale, and damage. Her examples include models that assess:

  • Recidivism risk for prisoners
  • University rankings
  • Teacher quality
  • Credit risk
  • Insurance premiums
  • Facebook news algorithms
  • Ad targeting
  • Social credit system

The chapters generally follow a pattern: a brief history of the impetus for each model and its initially envisioned use, then a description of the hidden biases that may not have been apparent to the designers, and finally the model’s disproportionate effect on a particular subgroup. It’s especially alarming because almost all of these examples directly affect me, and for the most part, I have benefited from the algorithms.

Not that I don’t realize the algorithms are imperfect. About ten years ago, a credit card balance of $15.95 that I forgot to pay for three months (I never used the card and simply forgot it was not on autopay) significantly dropped my credit score. I fought to remove the mark to no avail, and it took seven years for the record to finally be expunged from my credit history. Anyone who cared to look at the details would see it was a careless error that says nothing about my financial health or my ability to pay back loans. Thankfully, I did not have to take out any loans during that period, or I would have been screwed over a simple oversight (a sixteen-dollar mistake that could have cost me tens of thousands in higher interest payments). As a finance and accounting major, I would say I’m definitely above average in financial savviness. If I can make a mistake like this through simple oversight, imagine the people who are less educated in these matters and who, on top of that, have far more to worry about in their lives.

I guess I’ve already forgotten that lesson now that my credit score is back up. It’s easy to agree to a system that works in your favor. This book points out a lot of issues that don’t affect people as fortunate as me, but that can be devastating for the already disenfranchised. For example, your credit score can count for a larger portion of your auto insurance premium assessment than your driving record, and people who live in poor areas with lower credit scores end up paying higher premiums simply because they don’t have much choice (and need to drive to get places). Same with ad targeting. I remember telling a friend that I no longer care about giving away data for targeted ads, because at least I’m getting ads tailored to my preferences. But it had not occurred to me that predatory industries such as for-profit universities and payday lenders use that data to target poor people who may not realize they have other choices.

Two ideas in the book were especially thought-provoking for me. One is the “birds of a feather” theory, where predictive models use limited data to extrapolate the tendencies of a small group to a larger one. I just encountered this at work, where a consulting company built an algorithm that was supposed to group our customer base into four segments. But when the algorithm was run back on the original data set, its accuracy was only 40%. That means more people would have been put into the wrong group than the right one! Even with the most accurate models, the ones with accuracy rates of 90%+, what about the 10% of people who end up getting screwed because they happened to have made some mistakes in the past, or lived in the wrong zip code, or simply didn’t provide enough data to land in the right bucket?
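The arithmetic behind that complaint is simple but worth spelling out. A quick sketch with invented numbers (not the consulting firm’s actual data, and a hypothetical customer count):

```python
# Toy numbers, invented for illustration: a four-segment model
# applied to a hypothetical base of 10,000 customers.
customers = 10_000

# At 40% accuracy, wrong placements outnumber right ones.
correct_40 = int(customers * 0.40)     # placed in the right segment
misplaced_40 = customers - correct_40  # placed in a wrong segment

# Even a far better model at 90% accuracy still misplaces a real crowd.
misplaced_90 = customers - int(customers * 0.90)

print(correct_40, misplaced_40, misplaced_90)  # 4000 6000 1000
```

The point isn’t the exact figures but the asymmetry: an aggregate accuracy number hides a large absolute group of individuals sorted into the wrong bucket.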

The other idea, which I had never thought much about, is the increasing accuracy of individual risk prediction in insurance. The whole idea of insurance is pooled risk: a group of people share the burden of paying premiums to guard against a few of them needing help at some point down the line. As insurers gain access to more health data and predictive attributes, they segment users into ever finer buckets, and the highest-risk groups start seeing higher and higher premiums, or denial of coverage. This defeats the purpose of group insurance, and eventually we might reach a point where healthy people decide they don’t need insurance at all, while the people who actually need it can’t afford any.
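To see why finer segmentation unravels the pool, here is a rough sketch with entirely invented cost figures and bucket sizes (not actuarial data):

```python
# Hypothetical sketch of pooled vs. segmented insurance pricing.
# All numbers below are invented for illustration only.
expected_annual_cost = {"low": 500.0, "medium": 1500.0, "high": 6000.0}
bucket_size = {"low": 700, "medium": 250, "high": 50}  # 1,000 insured total

# Pooled pricing: everyone shares the total expected cost equally.
total_cost = sum(expected_annual_cost[b] * bucket_size[b] for b in bucket_size)
pooled_premium = total_cost / sum(bucket_size.values())

# Segmented pricing: each bucket is charged its own expected cost,
# so the highest-risk group loses the benefit of the pool entirely.
print(pooled_premium)               # 1025.0
print(expected_annual_cost["high"]) # 6000.0
```

Under pooling, everyone in this toy example pays $1,025; under perfect segmentation, the high-risk group pays nearly six times that, and the low-risk group may conclude the pool is a bad deal and leave, pushing premiums for everyone remaining even higher.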

As a data scientist, the author does not think we need less data or less modeling. She thinks we can do a better job of monitoring what’s being used, and not let these models become yet another way to perpetuate the victimization of the most vulnerable groups in our society. Among the things she proposes are better (or any) regulation, ethical considerations, and, at the root of it all, reconsidering the key metrics of these models, or perhaps our society’s focus. After all, these models are doing exactly what they were built to do: maximize revenue. If we want more than that, if we want to give people opportunities to lift themselves out of bad situations, then we have to give up some of that short-term revenue and look toward long-term goals.

I had an interesting conversation with some friends on New Year’s Eve about the social credit system China is currently testing. The Black Mirror episode “Nosedive” depicts some of the potential effects of such a system, and during the debate my conclusion was that it’s really the implementation of the system that causes the negative effects, not the idea of the system itself. I guess as a data person, I’ve always believed in the power of more information. I still think data is a powerful tool that can do great, positive things. But like any newfound tool, it’s important for us to think about the ethical implications that come with wielding this new power. Perhaps there is no rush to push our entire society toward data-driven decision-making at lightning speed if it means leaving those with no voice and no power behind, once again.
