Weapons Of Math Destruction: Cathy O’Neil on how people go wrong with Big Data

The Hype Cycle, beloved by technoskeptics such as myself. Big Data is somewhere around the “Peak of Inflated Expectations” point–maybe just starting down towards the trough. Picture source: https://commons.wikimedia.org/wiki/File:Gartner_Hype_Cycle.svg
It’s tough to read technology news these days without hearing about the wonders of Big Data and how it’s going to revolutionize our world.  Apparently it will soon predict epidemics, prevent terrorist attacks, and boost farm production.

In truth, though, it’s not so clear that it’s a great thing.  One of the problems with Big Data is a special case of a general problem in the ethics of technology: the kinds of things that can go wrong when the public’s perception of how well technology performs doesn’t match reality. In particular: when the public thinks that technology performs way better than it does.

You will occasionally hear people talking about how algorithms are going to take our jobs, bring about the zombie apocalypse prematurely, etc. More commonly, technology gee-whizzers will tell you the opposite: that algorithms will remove bias and introduce complete objectivity to sentencing guidelines, for instance.  In fact, an algorithm is nothing more (or less) than a defined set of procedures.  In the case of an algorithm for computing, it’s typically a set of calculations. An algorithm can’t be biased. It can’t be unbiased, either. The data, though: that can be biased.

An example from the interview: train an algorithm to evaluate resumes from applicants for jobs at an engineering firm. You could imagine training it with the resumes of everyone who has ever been hired in the past, plus one piece of information for each person: whether or not they were a successful employee. If the engineering firm is a typical one, those previous hires are mostly going to have been males. Now the program learns the characteristics of a successful hire, and among other things, it will conclude that a successful hire is going to be a male, since that’s all that it’s ever seen.

Is the algorithm biased?  No. Is the person who programmed it biased?  No. What’s biased?  The data. Not biased in the way that a person is biased, but biased in the statistical sense: not every member of the population had an equal likelihood of being included in the training set.
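To make the resume example concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the five-person data set, the feature names, and the scoring rule (how common each of a candidate's feature values was among successful past hires). The interview doesn't describe any particular implementation, and real resume-screening systems are far more elaborate.

```python
from collections import Counter

def train(past_hires):
    """Learn, for each (feature, value) pair, its share among successful hires."""
    successes = [h for h in past_hires if h["successful"]]
    counts = Counter()
    for hire in successes:
        for feature, value in hire.items():
            if feature != "successful":
                counts[(feature, value)] += 1
    return {key: n / len(successes) for key, n in counts.items()}

def score(model, candidate):
    """Average the learned shares over the candidate's feature values.
    A value never seen among successful hires contributes 0."""
    shares = [model.get((f, v), 0.0) for f, v in candidate.items()]
    return sum(shares) / len(shares)

# Hypothetical training set: every past hire happens to be male.
past_hires = [
    {"degree": "engineering", "gender": "male", "successful": True},
    {"degree": "engineering", "gender": "male", "successful": True},
    {"degree": "engineering", "gender": "male", "successful": True},
    {"degree": "physics", "gender": "male", "successful": True},
    {"degree": "engineering", "gender": "male", "successful": False},
]

model = train(past_hires)

# Two candidates who differ only in gender.
score_man = score(model, {"degree": "engineering", "gender": "male"})
score_woman = score(model, {"degree": "engineering", "gender": "female"})

print(f"male candidate:   {score_man:.3f}")
print(f"female candidate: {score_woman:.3f}")
```

Nothing in the code mentions a preference for men. The lower score for the second candidate comes entirely from who was, and wasn't, in the training data.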

Where people get seduced by things with the Big Data label on them is by the bigness. Most people know that, other things being equal, the bigger your data set is, the more reliable the statistical model that comes out of it will be. A lot of people look at Big Data and think: there’s a LOT of data, so it’s GOT to be good. That’s where the trouble comes from: more data shrinks random error, but it does nothing to fix a sample that was unrepresentative to begin with.
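A quick simulation shows why bigness alone doesn't help. This is only a sketch under invented assumptions: suppose 30% of qualified applicants are women but only 5% of past hires are, and we try to estimate the first number. Both percentages and both sample sizes are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

TRUE_SHARE = 0.30    # hypothetical share of women among qualified applicants
BIASED_SHARE = 0.05  # hypothetical share of women among past hires

def sample_share(share, n):
    """Draw n people, each a woman with probability `share`; return the observed share."""
    return sum(random.random() < share for _ in range(n)) / n

small_random = sample_share(TRUE_SHARE, 500)      # small but representative
big_biased = sample_share(BIASED_SHARE, 100_000)  # huge but drawn from past hires

error_small = abs(small_random - TRUE_SHARE)
error_big = abs(big_biased - TRUE_SHARE)

print(f"small random sample: {small_random:.3f} (error {error_small:.3f})")
print(f"big biased sample:   {big_biased:.3f} (error {error_big:.3f})")
```

The 100,000-row sample gives a very precise estimate of the wrong quantity, while the 500-row random sample lands much closer to the truth.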

I like this interview because it’s neither a gee-whiz-this-technology-is-so-great story, nor an ignorant oh-my-God-the-data-miners-are-going-to-kill-us story. The interviewee, Cathy O’Neil, knows what she’s talking about, and she explains it well.  The unbiased sentencing program?  It didn’t work out so great–see a very detailed story about it here.

Link to the interview with Cathy O’Neil:


French notes:

  • le big data: Big Data.
  • les mégadonnées: Big Data.
  • les données massives: Big Data.

English notes:

  • to sentence to (a punishment): to assign a punishment or penalty to someone.  Examples: A 46-year-old man threw feces in a Clark County, Ohio, courtroom Wednesday after learning he was being sentenced to 40 years in prison for armed robbery.  (Story here.)  Alan Turing, the pioneering computer scientist and cryptanalyst who cracked the Nazis’ Enigma code, was sentenced to chemical castration as a punishment for his homosexuality.
  • sentencing guidelines: instructions for how to determine the length of the jail or prison sentence of someone who has been convicted of a crime.  How it was used in the post: More commonly, technology gee-whizzers will tell you the opposite: that they will remove bias and introduce complete objectivity to sentencing guidelines, for instance. 
