Can computers process language any better than humans? Probably not.

Natural language processing (sometimes also known as text mining or computational linguistics, although I disagree with that last one) is the use of computers to process language in some way, such as finding names of businesses, rating reviews as positive or negative, summarizing news stories, etc.

Natural language processing is often done with an approach called machine learning. Machine learning is a set of techniques for letting computers “learn” for themselves how to classify things (e.g., is this word a name or not? Is this review positive, or not? Does this sentence belong in a summary of this news story, or not?), versus having humans write explicit rules for the computer about how to classify things.

In natural language processing in general and machine learning approaches to natural language processing in particular, we often use data that has been labelled by humans with the correct answers. These humans are called “annotators.” For various reasons, we often compute the agreement between the annotators–if two annotators look at the same things, how often do they agree about the classification?

I sent the preceding text, along with the following question, to a company that offers answers to arbitrary questions.  They’re called Wonder:

I’d like as many citations as you can find to support the assertion that inter-annotator agreement is probably an upper bound on possible system performance in natural language processing.  Please give me the citation, and a text snippet that demonstrates that the citation does make this claim.

I was told in grad school that it is probably not possible for a computer program that does natural language processing (and probably machine learning more broadly) to perform at a rate that is higher than the inter-annotator agreement for the task. Do I have any good citations for this assertion? No. Please find me as many as you can. I’d like the citations, along with text snippets from the citations that show that they’re really making this assertion. Bonus points for Wonder if you give me the citations in BibTeX format!

Here’s what I got back from Wonder:

1) “It is generally agreed that human inter-annotator agreement defines the upper limit on our ability to measure automated performance…”

  • Resnik, Philip, and Jimmy Lin. “Evaluation of NLP Systems.” The handbook of computational linguistics and natural language processing 57 (2010): 271.

2) The above source includes a citation from Gale et al. 1992: “our ability to measure performance is largely limited by our ability [to] obtain reliable judgments from human informants.”

  • Gale, William, Kenneth Ward Church, and David Yarowsky. “Estimating upper and lower bounds on the performance of word-sense disambiguation programs.” Proceedings of the 30th annual meeting on Association for Computational Linguistics. Association for Computational Linguistics, 1992.

3) “The average inter-annotator agreement indicated by the kappa values of 0.66 for surface understanding and 0.64 for conceptual understanding should be seen as upper bounds on the accuracy of any classifier.”

  • Ormandjieva, Olga, Ishrar Hussain, and Leila Kosseim. “Toward a text classification system for the quality assessment of software requirements written in natural language.” Fourth international workshop on Software quality assurance: in conjunction with the 6th ESEC/FSE joint meeting. ACM, 2007.

4) “Recent estimations of the inter-annotator agreement when using the WordNet inventory report figures of 72.5% agreement in the preparation of the English all-words test set at Senseval-3 (Snyder and Palmer, 2004) and 67.3% on the Open Mind Word Expert annotation exercise (Chklovski and Mihalcea, 2002). These numbers lead us to believe that a credible upper bound for unrestricted fine-grained WSD is around 70%, a figure that state-of-the-art automatic systems find it difficult to outperform.”

  • Navigli, Roberto. “Meaningful clustering of senses helps boost word sense disambiguation performance.” Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006.

5) “Human judgments of semantic relatedness provide a gold standard for evaluating the results of automatic methods. The inter-annotator agreement defines an upper bound for the evaluation of automatic methods (Resnik, 1995).”

  • Gurevych, Iryna, and Hendrik Niederlich. “Computing semantic relatedness in German with revised information content metrics.” Proceedings of OntoLex. 2005.

6) “We rather measured the agreement between our algorithm M and both the human annotators A and B. Besides the inter-annotator agreement A–B, which serves as an upper bound, we tried the naive baseline approach 0 that always chooses the first target word sense.”

  • Meyer, Christian M., and Iryna Gurevych. “Worth its weight in gold or yet another resource—A comparative study of Wiktionary, OpenThesaurus and GermaNet.” Computational linguistics and intelligent text processing. Springer Berlin Heidelberg, 2010. 38-49.

7) “We also compared our results to the upper bound given by the inter-annotator agreement on the calibration data set.”

  • Padó, Sebastian, and Mirella Lapata. “Cross-linguistic projection of role-semantic information.” Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2005.

As you can see, most of the citations I can find basically make the same assertion (that machine learning cannot outperform inter-annotator agreement) and treat it as fact, without giving much evidence for that assertion. I hope this information was helpful!  –Alexandra G.

I was pretty pleased with these citations.  I should note that Alexandra G. also gave me links to all of the citations.  It’s also important to point out that if you’re going to use Wonder, you need to be careful about (a) what kinds of questions you ask–see their web site for the kinds of questions that they feel they can be helpful with–and (b) asking for exactly what you want.  It took me several tries to get it right; after speaking with someone on their team, I now know how to formulate questions (and the types of questions that I can ask) to get what I’m looking for.  If someone reading this has the evidence that the Wonder person pointed out doesn’t seem to be out there, it would be great if you could add it to the Comments.
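For readers who haven't computed inter-annotator agreement before, here is a minimal sketch in Python of two common ways to measure it: raw observed agreement, and Cohen's kappa, which corrects for the agreement you would expect by chance.  (Kappa is the statistic reported in the Ormandjieva et al. quote above.)  The labels below are made up for illustration, and the function names are my own, not from any particular library.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_observed = observed_agreement(labels_a, labels_b)
    n = len(labels_a)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same label
    # if each labels at random according to their own label frequencies.
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical annotations of ten reviews as positive or negative.
annotator_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
annotator_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "pos"]

print(f"observed agreement: {observed_agreement(annotator_a, annotator_b):.2f}")  # 0.80
print(f"Cohen's kappa:      {cohen_kappa(annotator_a, annotator_b):.2f}")  # 0.58
```

On the upper-bound view described in the citations, a classifier evaluated against labels like these would not be expected to score meaningfully above the level at which the two annotators agree with each other.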

Zipf’s Law and burning at the stake

Cathars being burnt alive at Montségur.

It’s the end of the peak publishing period in France–between late August and early November–and the beginning of the season of literary prizes.  Mathias Enard took the Goncourt prize yesterday for his novel Boussole (“compass; barometer (figurative), indicator”), and this morning all of the guests on my news program were writers.  One of them read an essay on the subject of death as a revolutionary act (my favorite French news show is even more cerebral–like, by a long shot–than National Public Radio, the most intellectual of American news shows).  Along the way, he talked about the shift from cremation in the classical world to burial in the Christian world, pointing out that in the Christian world, only witches and heretics die at the stake.  (If you’re reading this and are not a native speaker of English: the word “stake” typically means a piece of wood that has been sharpened at one end–what you kill a vampire with, right?  If the stake is stuck in the ground and someone is tied to it and burnt alive, it’s called “THE stake.”)

It was a perfect Zipf’s Law moment.  The word for “the stake” is le bûcher.  Is that an obscure word, in the sense of being one that most people wouldn’t know?  Not at all.  Is it a common word?  Not at all.  Why the hell would anyone know this word in a foreign language, though?  Last summer or so, I chanced upon my father reading Le bûcher de Montségur, the classic book on the siege of the Cathars at the Montségur fortress in southern France.  (Yes, there is a classic book on the siege of the Cathars at the Montségur fortress, in English, as well as in French.)  The Cathars were considered heretics, and when they surrendered to royalist troops, about 220 of them were burnt to death in a massive bonfire.  I didn’t know what the word bûcher in the title of the book meant, so I looked it up, then duly memorized it, despite my confidence that I would never see it again.  Over a year later, it’s literary award season, and I run into it again, on the morning news…  Zipf’s Law at its best.

So, in the spirit of the literary prize season, let’s look at some meanings of bûcher, both as a noun and as a verb:

  • le bûcher (tas de bois où on brûle les morts): funeral pyre.  This is presumably the sense in Le bûcher de Montségur.
  • le bûcher (tas de bois où on exécutait les coupables): stake.
  • le bûcher (abri pour bois): woodshed.
  • le bûcheron: lumberjack, woodcutter.  (Like that derived noun?)
  • bûcher: to work your butt off.  (In Quebec, it can also mean “to log, fell trees; to chop wood.”)

If you’re reading this and you’re French: please note that to us Americans, the words bûcher (stake, funeral pyre) and boucher (butcher) sound the same, so please have mercy on us if we say one when you’re sure we mean the other.  Feel free to correct our pronunciation, though–it’s good for us.

Why Paris might have more in common with Manhattan than it does with Clichy-sous-Bois

A cover of France-Amérique, a magazine for French lovers of America. The title of the cover story is "The American heart: an investigation of philanthropy."

Lately I’ve been obsessed with the general incorrectness of the common American belief that the French look down on everything about us.  A couple days ago, I talked about the popularity of English words in France.  The topic of French attitudes about America came up again this morning.  Listening to the news on the way to work, I heard a long segment on the banlieues défavorisées of France–the poor suburbs where much of the French underclass lives.  2015 is the 10th anniversary of the 2005 riots, which were very much a feature of the banlieues (unlike, say, the student riots of 1968, which were very much an urban phenomenon).  There was a guest who had been invited to talk about his theories of the geographic aspects of the banlieues.  His take is that life in the banlieues is what it is in part because they have very little in the way of public transportation–as Wikipedia puts it in its article on the infamous Clichy-sous-Bois banlieue, where the 2005 riots started, “Clichy-sous-Bois is not served by any motorway or major road and no railway and therefore remains one of the most isolated of the inner suburbs of Paris.”  So, you have this paradox that Clichy-sous-Bois is maybe 10 miles from Paris, but has less in common with life in Paris–culturally, politically, economically–than Manhattan does.  Here, the guest saw the situation in America much more positively–his take was that in America, the low-income areas are mostly urban, not suburban; they have the same public transportation as the rest of the city does, and therefore the residents of an American ghetto have the same access to universities, museums, etc., as more well-off residents of the city do.

Obviously, it’s more complicated than that–we have seriously low-income and culturally disconnected little towns scattered throughout Appalachia and elsewhere, and if you live in a poor urban area of an American city, you probably have other obstacles to your access to universities, museums, etc., besides transportation.  But, the guest was right in that the geographic facts that keep the residents of the banlieues isolated from the rest of French life don’t generally have the same ill effects in an American urban ghetto.

Public transportation (transport en commun) is an important aspect of life in France–let’s look at a little bit of vocabulary from the French Wikipedia page on the subject:

Le transport en commun, ou transport collectif, consiste à transporter plusieurs personnes ensemble sur un même trajet. Il est généralement accessible en contrepartie d’un titre de transport comme un billet, ticket ou une carte.

  • le trajet: journey; plane flight; car or bus ride
  • en contrepartie de: in return for, in exchange for
  • le titre de transport: ticket

What’s the difference between billet and ticket?  No clue.  Perhaps someone can tell me in the Comments section?

Resources for learning French: One Thing In A French Day

The world is full of books, YouTube videos, etc. for people who speak English and want to start learning French.  It’s harder to find materials that are suited for somewhat advanced students of the language.  It turns out, however, that there are some very good resources out there, if you can find them.

One that I think is good for people at about the intermediate level is the podcast One thing in a French day.  The podcast consists of a short, read essay (i.e., it was written and then read out loud, versus being spontaneous speech) about something or other in the podcast creator’s day.  Perhaps Laetitia buys a new printer.  Maybe she meets a friend at a patisserie for coffee and a pastry.  Maybe one of her daughters loses her Pass Navigo.  Whatever it is, Laetitia tells you about it, and Zipf’s Law strikes–you are quite likely to learn some new words in every podcast.

The podcasts are free, and a transcription of the beginning of the podcast is available on the web site, also gratis.  New ones come out 2-3 times a week.  For 3 euros a month, Laetitia will email you a full transcript of every podcast, along with some grammatical or vocabulary points of interest, and will answer questions.

As I said, Laetitia reads the essays, and her pronunciation is quite clear.  This makes One thing in a French day quite good for intermediate students, but possibly not as challenging as it could be for advanced students.  (Laetitia does point out that Il est vrai que c’est un texte lu, par contre je ne fais pas d’effort particulier pour ralentir le débit. Le rythme est mon rythme naturel.  “It’s true that it’s a read text.  On the other hand, I don’t make any particular effort to reduce the speed.  The rhythm is my natural rhythm.”) Still, no matter what level you are at, you will learn stuff from the podcasts, and it’s nice to keep up with what’s going on with Laetitia and her family, as well as to get a little glimpse into the life of a normal French family.  To give you the flavor of the podcast, here are some words that I learnt from the most recent one:

  • entrecoupée par: broken up by.
  • ensoleillé: sunny, bathed in sunlight.
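  • la voie verte: I had to write to Laetitia herself for this one.  Here’s her answer: Pour répondre à votre question : La voie verte est une ancienne voie de chemin de fer entre Chalon-sur-Saône et Mâcon qui a été goudronnée et qui est maintenant réservée aux piétons, aux vélos, aux fauteuils roulants ou aux rollers. Il y a plus de quarante kilomètres de promenade.  “To answer your question: the voie verte is a former railway line between Chalon-sur-Saône and Mâcon that has been paved and is now reserved for pedestrians, bicycles, wheelchairs, and rollerblades.  There are more than forty kilometers of path.”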
  • prendre le pli: to get used to something, to get into the groove of something.  Prendre le pli de faire qqch: to get into the habit of doing something.
  • perché: perched, sitting on.

Nous avons fait d’autres belles visites pendant cette semaine en Bourgogne, entrecoupée par deux voyages à Lyon et de longues promenades ensoleillées sur la voie verte. Lisa est une bonne marcheuse, une fois qu’elle a pris le pli. Son record sur la voie verte : sept kilomètres.

Nous avons visité le site médiéval de Brancion. Un petit village perché sur une colline.

That’s 5 words in the first 5 sentences–as I said, you will learn stuff from One thing in a French day! 

It’s tough to be dispositive about dispositif

In which a French word means nothing like the English word that it looks like, and I remain puzzled even after looking it up.

The title of the book means “what is a dispositif?” Read this post and you’ll see why it strikes me as funny. Picture source: cover of the book by Giorgio Agamben.

In France, I often ran across the word dispositif, but somehow never got around to looking it up.  I see now why I had trouble even guessing at its meaning–besides looking like an English word that it has nothing in common with semantically, it has a number of quite different meanings.  Here’s how it showed up in an email at work one day:

Nous avons commencé une expérience sur la collaboration et cohabitation des deux utilisateurs dans un même dispositif immersif (EVE) et on cherche toujours des participants.

In the context of where I work, it’s likely that the intended meaning is a “device, machine, apparatus,” as in this example: C’est un dispositif de chauffage très perfectionné “It’s a very sophisticated heating device.”

If you’re talking about the police or the military, it would translate as “presence,” as in this example: L’Etat a prévu un gros dispositif policier pour le prochain G20 “The government has organized a significant police presence for the next G20.”

It can also mean “plan.”  Here’s an example sentence: Le dispositif de défense aérienne est revu tous les ans “The air defense plan is reviewed every year.”

The meanings can be more diverse than that, though, including things like “measures,” “system,” and others.  Here are some examples:

  • Pour les véhicules équipés d’un dispositif antiblocage…  For vehicles with anti-lock systems…
  • Le Fonds monétaire international (FMI) participera au dispositif de financement et devrait fournir un montant correspondant à la moitié au moins de la contribution de l’UE.  The International Monetary Fund (IMF) will participate in financing arrangements and is expected to provide at least half as much as the EU contribution.
  • Ce sera probablement la mise en place d’un véritable dispositif de financement.  This will probably entail setting up a real funding scheme.

So, what’s the English word that I was confusing this with?  Dispositive.  Something is “dispositive” if it brings something to a resolution–it “disposes” of the issue, in essence.  Here are some examples from the enTenTen corpus (19.7 billion words of English):

  • The dispositive issue in these cases, simply put, is whether, for purposes of allocating its finite resources, a state has a legitimate reason to differentiate between persons who are lawfully within the state and those who are unlawfully there.
  • First, particularly in a highly hierarchical employment setting such as law enforcement, whether or not the employee confined his communications to his chain of command is a relevant, if not necessarily dispositive, factor in determining whether he spoke pursuant to his official duties.
  • To the recently admitted student: embrace your cultural heritage, and know that test scores and GPA were not dispositive factors in your acceptance.
  • One data point is not dispositive.
  • The Court found it dispositive, for instance, that 1-40-121 did not regulate candidate elections, and that the risk of corruption so prevalent in such elections was minimal in the initiative context.

The term for term is terme

The technical terminology of kitchen sinks.

In the United States, many people have the conception that France is somehow opposed to the English language.  This couldn’t be further from the truth.  Sprinkling your French with English is considered cool and au courant; so many French singers now record in English that it’s increasingly difficult for French radio stations to find French-language music to play; and you see advertisements on TV in English in France more than you would believe.  (One morning in Paris this summer, I had the news on the TV while I was eating breakfast.  As usual, I was struggling pretty hard to understand anything.  Suddenly, I was understanding everything, and the past year and a half of intensive French study had clearly paid off, and I was finally, finally, getting it.  Then I realized: I was hearing an advertisement, and it was in English.  Sigh!)

As you might suspect, the area where the greatest incursion of English into French happens is in technical terminology.  The leaders in creating French-language equivalents for English technical terms are actually not the French, but the Canadian folks at the Office Québécois de la langue française.  They maintain the Grand Dictionnaire Terminologique web site.  This is an on-line dictionary that lets you search for technical terms in a specific domain, or in all domains simultaneously.  Jean-Benoît Nadeau and Julie Barlow say in their book The story of French that the French Academy’s web site gets two million hits a year, while the Grand Dictionnaire Terminologique gets fifty million hits a year.  Quebec’s work in keeping French terminology up-to-date and a viable alternative to English terminology has been adopted as an approach by countries all over the world.

  • le terme: term, word; also term, date, or limit.
  • la terminologie: terminology, in the sense of specialized vocabulary.
  • le vocabulaire: vocabulary.
  • le lexique: lexicon, vocabulary; glossary; small pocket bilingual dictionary or phrase book.  I think it’s also the set of words in a text, but I can’t prove that right at this moment.

(Yes, the title of this blog post is an Ursula K. Le Guin reference: The word for world is forest.)