How to irritate a linguist, Part 4

The conditional probability of “dog” is higher if the preceding word is “my” than if the preceding word is “artichoke.”


Here’s the closest that we come to “complexity” in linguistics: take a big sample of some language. Build a statistical model of the conditional probabilities of all two-word sequences (“conditional” probability is the probability of some word given that the preceding word was X. The conditional probability of dog is higher if the preceding word is my than if the preceding word is artichoke). For that statistical model, you can calculate something called perplexity. It’s as close as linguists come to having any notion of “complexity” of language. Here’s a bit of the Wikipedia page on perplexity:

In natural language processing, perplexity is a way of evaluating language models. A language model is a probability distribution over entire sentences or texts.

Using the definition of perplexity for a probability model, one might find, for example, that the average sentence $x_i$ in the test sample could be coded in 190 bits (i.e., the test sentences had an average log-probability of -190). This would give an enormous model perplexity of $2^{190}$ per sentence. However, it is more common to normalize for sentence length and consider only the number of bits per word. Thus, if the test sample’s sentences comprised a total of 1,000 words, and could be coded using a total of 7.95 bits per word, one could report a model perplexity of $2^{7.95} \approx 247$ per word. In other words, the model is as confused on test data as if it had to choose uniformly and independently among 247 possibilities for each word.

Sheesh.
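For the curious, the two-word (bigram) recipe and the perplexity arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not anyone's reference implementation: the toy training and test sentences are made up, the helper names (`train_bigram`, `bits_per_word`, `perplexity`) are hypothetical, and add-one (Laplace) smoothing is an assumption added here so that word pairs never seen in training, like “artichoke dog,” still get a small nonzero probability. Perplexity is computed as 2 raised to the average number of bits per word, the per-word normalization from the Wikipedia excerpt.

```python
import math
from collections import Counter


def train_bigram(tokens, vocab):
    """Return a function prob(word, prev) estimating P(word | prev).

    Assumption: add-one (Laplace) smoothing, so unseen pairs get a
    small nonzero probability instead of zero.
    """
    pair_counts = Counter(zip(tokens, tokens[1:]))  # counts of (prev, word)
    prev_counts = Counter(tokens[:-1])              # counts of prev as a left neighbor
    v = len(vocab)

    def prob(word, prev):
        return (pair_counts[(prev, word)] + 1) / (prev_counts[prev] + v)

    return prob


def bits_per_word(prob, tokens):
    """Average -log2 P(word | previous word) over the test text.

    Assumption: the first token has no predecessor and is not scored.
    """
    pairs = list(zip(tokens, tokens[1:]))
    total_bits = -sum(math.log2(prob(word, prev)) for prev, word in pairs)
    return total_bits / len(pairs)


def perplexity(bits):
    """Perplexity is 2 raised to the average number of bits per word."""
    return 2 ** bits


# Hypothetical toy corpus, for illustration only.
train_tokens = "i walk my dog . you feed my dog . i cook an artichoke .".split()
test_tokens = "you walk my dog .".split()
vocab = set(train_tokens) | set(test_tokens)

prob = train_bigram(train_tokens, vocab)

# "dog" is more likely after "my" than after "artichoke".
print(prob("dog", "my") > prob("dog", "artichoke"))  # True

# Per-word perplexity of the model on the test sentence.
print(perplexity(bits_per_word(prob, test_tokens)))

# The arithmetic from the Wikipedia excerpt: 7.95 bits per word.
print(round(perplexity(7.95)))  # 247
```

On a real corpus you would train and test on separate, much larger samples; the toy sentences only show the mechanics.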
