The Poisson distribution describes the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event (definition from Wikipedia).
Who cares? As Wikipedia puts it, with some highlighting by me: “The Poisson distribution can be applied to systems with a large number of possible events, each of which is rare. How many such events will occur during a fixed time interval? Under the right circumstances, this is a random number with a Poisson distribution.”
If you’ve been reading this blog for a while, you know that (a) a language has a lot of words, and (b) most of the words in a language are rare–that’s why we can use Zipf’s Law to describe the distribution of words in a language, and that’s why I write this blog, which keeps track of the obscure words that I learn in the course of my day. (Just some of them–there are far too many in any given day for me to track them all.) So, you could imagine using the Poisson distribution to predict things like how many new words I will run into today.
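To make that a little more concrete, here is a minimal sketch of the calculation in Python. It uses the standard Poisson formula, P(k) = e^(-λ) · λ^k / k!, where λ is the average number of events per interval. The rate of 4 new words per day is purely an assumption for illustration, not a number I have measured, and `poisson_pmf` is just a name I made up for the helper.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the average count per interval is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical rate: suppose I run into 4 new words on an average day.
rate = 4.0

# Probability of meeting exactly 0, 1, 2, ... 8 new words today.
for k in range(9):
    print(f"P({k} new words) = {poisson_pmf(k, rate):.3f}")
```

The probabilities peak around the average and then tail off, but they never quite reach zero: a ten-word day is unlikely under this assumed rate, not impossible.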
There are many practical applications of the Poisson distribution. For example, most of my colleagues work with genomic data of one sort or another. Say you’re looking at the number of mutations in a particular stretch of DNA. Mutations are rare. You have a stretch of DNA that you think has a lot of mutations, and you think that you know what caused them. Before you draw conclusions about whether or not the mutations were, in fact, caused by that, you need to be sure that the stretch of DNA couldn’t have acquired that large (you think) number of mutations by chance. The Poisson distribution lets you assign a probability to at least that many mutations occurring by chance in that one stretch of DNA. If that probability is greater than, say, 5%, then you probably shouldn’t draw the conclusion that you were considering concerning what caused them. On the other hand, if that probability is, say, 0.00001%, then you may be onto something.
Poisson distributions have been used in many fields; the most famous application was a study of the number of Prussian soldiers killed by horse-kicks. Suppose that you suddenly have a large number of soldiers being killed by getting kicked by horses. Do you need to be training your soldiers differently? Has someone been selling you lousy horses? If the incidence of deaths by horse-kicks follows a Poisson distribution (and deaths by horse-kick are rare events that are presumably independent of each other, so a Poisson distribution is a reasonable model for them), then you can calculate the probability of that many horse-kick deaths, or more, having occurred by chance. If that probability is large, then you probably don’t need to retrain your soldiers or start looking for a lousy horse-dealer. If it is low, then you might want to look into retraining your soldiers, or reconsidering your horse-buying practices, or whatever. (I don’t know how the study turned out–see this Wikipedia page for a reference to the book.)
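The mutation check described above can be sketched in a few lines of Python. The question is not "what is the chance of exactly this many events?" but "what is the chance of this many or more?", so the sketch adds up the probabilities of every smaller count and subtracts from 1. The expected and observed counts are made-up numbers, and `poisson_tail` is a hypothetical helper name, not something from a genomics library.

```python
import math

def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam): the chance of seeing k or more events."""
    # Add up P(X = 0), ..., P(X = k - 1), then subtract from 1.
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below

expected = 2.0  # hypothetical: mutations expected by chance in this stretch
observed = 9    # hypothetical: mutations actually counted

p = poisson_tail(observed, expected)
print(f"P(at least {observed} mutations by chance) = {p:.6f}")
print("you may be onto something" if p < 0.05 else "could easily be chance")
```

The same function would work for the horse-kicks: plug in the usual number of deaths per year as `expected` and this year's count as `observed`. For extremely small tail probabilities, subtracting from 1 loses precision, so a statistics library would be a better choice there.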
One of the practical consequences of the Poisson distribution is that even rare events will occasionally occur together. The classic example: three rock stars die in the same month. Here are some of the rock stars who died last month (January 2016):
…and there’s your classic three-rock-stars-in-one-month phenomenon. It’s even weirder than that: three rock stars died on a single day that month. January 17th, 2016 saw the loss of Blowfly, Mic Gillette, and Dale Griffin.
What’s going on? Is someone killing off the rock stars of the Anglophone world? Probably not–the Poisson distribution tells us that such events, which are both rare and independent, will sometimes occur in bursts, despite their rarity and independence.
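If bursts of independent events sound implausible, a small simulation may help. This is only a sketch under loudly stated assumptions: I pretend that exactly 12 famous rock stars die per year (a number I invented), that each death lands in a month picked independently and at random, and that all months are equally likely. Fixing the yearly total is a simplification of a true Poisson process, but it shows the same clustering.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

DEATHS_PER_YEAR = 12  # hypothetical: one per month on average
YEARS = 10_000        # number of simulated years

years_with_cluster = 0
for _ in range(YEARS):
    # Each death falls in a month chosen independently and uniformly at random.
    months = Counter(random.randrange(12) for _ in range(DEATHS_PER_YEAR))
    # Did any single month collect three or more deaths?
    if max(months.values()) >= 3:
        years_with_cluster += 1

share = years_with_cluster / YEARS
print(f"Share of simulated years with a 3-death month: {share:.2f}")
```

Under these assumptions, a month with three deaths turns up in a sizeable share of the simulated years, even though nobody in the simulation is killing off rock stars.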
Some implications for the world of Zipf’s Law:
- I have to admit that I’ve been mischaracterizing the Poisson distribution somewhat in previous posts. Briefly: I’ve been ignoring the independence assumption. More on that later, because it’s a really big deal in language in general.
- When you’re learning a second language, you’re going to have some good days and some bad days. On the bad days, you’re going to run across a lot of words that you don’t know. The Poisson distribution tells you to not get down on yourself about this fact: it’s just the nature of rare events (including words) to show up in clusters sometimes.
- All of these dead rock stars have brought a new word into my life: la disparition. As you probably know, this can mean “disappearance.” What you might not be aware of is that it can also mean “death, passing,” or “demise.” So, on the radio this morning, the host of Les Matins de France Culture was talking about la disparition of Umberto Eco.
Reviewing some relevant vocabulary (definitions from WordReference.com):
- disparaître: to disappear; to die out.
- disparu (adj.): vanished
- le disparu: missing person; the deceased.