How to irritate a linguist, Part 4

The conditional probability of “dog” is higher if the preceding word is “my” than if the preceding word is “artichoke.”


Here’s the closest that we come to “complexity” in linguistics: take a big sample of some language and build a statistical model of the conditional probabilities of all two-word sequences. (A “conditional” probability is the probability of some word given that the preceding word was X; the conditional probability of dog is higher if the preceding word is my than if the preceding word is artichoke.) For that statistical model, you can calculate something called perplexity. It’s as close as linguists come to having any notion of the “complexity” of a language. Here’s a bit of the Wikipedia page on perplexity:
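To make the “conditional probability” idea concrete, here’s a toy sketch in Python. The eight-word corpus is invented purely for illustration; a real model would be estimated from a big sample, and would need smoothing for unseen pairs:

```python
from collections import Counter

# A tiny invented corpus; any large text sample works the same way.
corpus = "my dog saw my dog chase my cat".split()

# Count all two-word sequences, and how often each word appears
# as the first element of a pair.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def cond_prob(word, prev):
    """Estimate P(word | prev) from bigram counts."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

print(cond_prob("dog", "my"))         # high: "my dog" occurs often
print(cond_prob("dog", "artichoke"))  # zero: "artichoke dog" never occurs
```

In this toy corpus, “dog” follows “my” two times out of three, while “artichoke dog” never occurs at all, which is exactly the contrast in the example above.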

In natural language processing, perplexity is a way of evaluating language models. A language model is a probability distribution over entire sentences or texts.

Using the definition of perplexity for a probability model, one might find, for example, that the average sentence x_i in the test sample could be coded in 190 bits (i.e., the test sentences had an average log-probability of -190). This would give an enormous model perplexity of 2^190 per sentence. However, it is more common to normalize for sentence length and consider only the number of bits per word. Thus, if the test sample’s sentences comprised a total of 1,000 words, and could be coded using a total of 7.95 bits per word, one could report a model perplexity of 2^7.95 = 247 per word. In other words, the model is as confused on test data as if it had to choose uniformly and independently among 247 possibilities for each word.
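The arithmetic in that Wikipedia passage is easy to check for yourself: perplexity is just 2 raised to the average number of bits per word. A minimal sketch (the probability list here is made up for illustration):

```python
import math

def perplexity(word_probs):
    """Perplexity = 2 ** (average negative log2-probability per word)."""
    n = len(word_probs)
    bits = -sum(math.log2(p) for p in word_probs)
    return 2 ** (bits / n)

# The Wikipedia example: 7.95 bits per word gives 2**7.95, about 247.
print(round(2 ** 7.95))  # 247

# Sanity check: if the model assigns every word probability 1/2,
# the perplexity is exactly 2 -- a uniform choice between 2 options.
print(perplexity([0.5, 0.5, 0.5]))  # 2.0
```

That last check is what the closing sentence of the quote means: a perplexity of 247 says the model is, on average, as uncertain as a uniform choice among 247 words.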

