Randomly Googling Zipf’s Law, I came across this web page that talks about one aspect of the significance of Zipf’s Law for natural language processing–that is, getting computers to deal with human language.
The page is on the web site for A.L.I.C.E., a computer program that uses frequently occurring patterns to give the appearance of understanding, and replying to, things that are “said” to it in English. The page points out that for A.L.I.C.E., there’s an advantage that comes from Zipf’s Law: it means that a relatively small number of patterns encoded into A.L.I.C.E. allows it to process a very large percentage of the things that people say to it. Here are the most common things that people “say” to A.L.I.C.E.:
531 WHAT IS YOUR NAME
352 WHAT IS MY NAME
171 WHAT IS UP
137 WHAT IS YOUR FAVORITE COLOR
126 WHAT IS THE MEANING OF LIFE
122 WHAT IS THAT
102 WHAT IS YOUR FAVORITE MOVIE
92 WHAT IS IT
75 WHAT IS A BOTMASTER
70 WHAT IS YOUR IQ
59 WHAT IS REDUCTIONISM
(I don’t know what the total count is–it would be nice if the web page gave percentages.) What is “WHAT IS REDUCTIONISM” doing there? I’m guessing that it’s because A.L.I.C.E. is presented as an artificial intelligence application, and reductionism is a theoretical topic in artificial intelligence. (Here’s Neil Rowe’s take on reductionism: “Perhaps the key issue in artificial intelligence is reductionism, the degree to which a program fails to reflect the full complexity of human beings. Reductionism includes how often program behavior duplicates human behavior and how much it differs when it does differ. Reductionism is partly a moral issue because it requires moral judgments. Reductionism is also a social issue because it relates to automation.”) Apparently a lot of geeks like to talk to A.L.I.C.E.–either that, or there are hella people in the world who are interested in reductionism.
Of course, the flip side of Zipf’s Law for natural language processing is that an enormous number of the inputs to your program will only occur very infrequently, and it’s going to be very difficult to cope with all of those. Zipf’s Law cuts both ways.
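Since the page doesn’t give percentages, here is a rough sketch of how you could get a feel for them from the eleven counts quoted above. The big caveat: the true total number of inputs is unknown, so these are shares of the listed “WHAT IS” inputs only, not of everything people say to A.L.I.C.E. The function name cumulative_coverage is just something I made up for illustration.

```python
# Counts copied from the list above. The real total across all inputs
# is not given on the page, so shares are relative to this list only.
counts = [
    (531, "WHAT IS YOUR NAME"),
    (352, "WHAT IS MY NAME"),
    (171, "WHAT IS UP"),
    (137, "WHAT IS YOUR FAVORITE COLOR"),
    (126, "WHAT IS THE MEANING OF LIFE"),
    (122, "WHAT IS THAT"),
    (102, "WHAT IS YOUR FAVORITE MOVIE"),
    (92, "WHAT IS IT"),
    (75, "WHAT IS A BOTMASTER"),
    (70, "WHAT IS YOUR IQ"),
    (59, "WHAT IS REDUCTIONISM"),
]


def cumulative_coverage(counts):
    """Return (rank, text, count, cumulative share) rows, most frequent first."""
    total = sum(n for n, _ in counts)
    running = 0
    rows = []
    for rank, (n, text) in enumerate(sorted(counts, reverse=True), start=1):
        running += n
        rows.append((rank, text, n, running / total))
    return rows


if __name__ == "__main__":
    for rank, text, n, share in cumulative_coverage(counts):
        print(f"{rank:2d}  {n:4d}  {share:6.1%}  {text}")
```

With these numbers, the listed counts add up to 1,837, and the top three inputs alone account for roughly 57% of that. That is the pleasant side of Zipf’s Law; the long tail of rare inputs doesn’t show up in a top-eleven list at all.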
Here are some words that I didn’t know on the French Wikipedia page about artificial intelligence:
- se vouloir: to claim to be. L’intelligence artificielle est le nom donné à l’intelligence des machines et des logiciels. Elle se veut discipline scientifique recherchant des méthodes de création ou de simulation de l’intelligence. “Artificial intelligence is the name given to the intelligence of machines and computer programs. It claims to be a scientific discipline researching methods of creation or simulation of intelligence.”
- abréger: to shorten, abbreviate, abridge, summarize; to make (something) fly by. Le terme « intelligence artificielle », créé par John McCarthy, est souvent abrégé par le sigle « I.A. » (ou « A.I. » en anglais, pour Artificial Intelligence). “The term ‘artificial intelligence,’ created by John McCarthy, is often abbreviated by the acronym ‘I.A.’ (or ‘A.I.’ in English, for Artificial Intelligence).”