
Recently we’ve been talking about questions: in particular, some of their weird social dynamics (see this, too) and how you can get computers to answer them. In one post, we talked about the fact that you can get computers to answer factoid questions, but the computer has to be able to figure out what you’re asking about. In that post, we talked about how the computer can help itself do this by figuring out le focus and le thème of the question. For example, it can figure out that When was Mozart born? is looking for some expression of time.
Another thing that computers can do is to recognize what we call named entities in the question. A named entity is a mention of a thing that belongs to some specific semantic class. For example, Mozart is a named entity, specifically a person; National Institutes of Health is a named entity, specifically an organization; Paris is a named entity, specifically a place. If a computer knows that something in a question is a named entity, then it knows that it is likely to find the answer to that question in a sentence that contains that named entity. Here’s the intuition behind the approach: given a question like When was Mozart born?, we don’t care that much about sentences that contain the words when or was, but sentences that contain Mozart might be useful to us. The way that the computer can tell that it should care about Mozart more than it cares about when or was is by recognizing that Mozart is a named entity.
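To make that intuition concrete, here is a minimal Python sketch. It is only an illustration, not how any real question-answering system is built: the KNOWN_ENTITIES list is a hand-made stand-in for a real named entity recognizer, and the function names are made up for this example. It keeps only the sentences that mention a named entity from the question, ignoring words like when and was.

```python
import re

# Hypothetical stand-in for a real recognizer: a hand-made list of named entities.
KNOWN_ENTITIES = ["Mozart", "National Institutes of Health", "Paris"]

def named_entities_in(text):
    """Return the set of known named entities mentioned in the text."""
    return {
        name for name in KNOWN_ENTITIES
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    }

def rank_sentences(question, sentences):
    """Keep sentences that share a named entity with the question, best first."""
    question_entities = named_entities_in(question)
    scored = []
    for sentence in sentences:
        # Score = number of the question's named entities found in the sentence.
        score = len(question_entities & named_entities_in(sentence))
        if score > 0:
            scored.append((score, sentence))
    # sorted() is stable, so ties keep their original order.
    return [s for _, s in sorted(scored, key=lambda pair: -pair[0])]

candidates = [
    "It was late when the concert ended.",
    "Mozart was born in Salzburg in 1756.",
    "Paris is a city in France.",
]
print(rank_sentences("When was Mozart born?", candidates))
```

The first candidate contains both when and was, but it gets no credit for that; only the sentence that mentions Mozart survives.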
In a previous post, we ran across the term repérage d’entités nommées, meaning “named entity recognition” (more literally, spotting or finding). Here’s another way of saying the same thing, from the French Wikipedia page on named entity recognition:
• La reconnaissance d’entités nommées: named entity recognition.
La reconnaissance d’entités nommées est une sous-tâche de l’activité d’extraction d’information dans des corpus documentaires. Elle consiste à rechercher des objets textuels (c’est-à-dire un mot, ou un groupe de mots) catégorisables dans des classes telles que noms de personnes, noms d’organisations ou d’entreprises, noms de lieux, quantités, distances, valeurs, dates, etc.
“Named entity recognition is a sub-task of the extraction of information in document corpora. It consists of searching for textual objects (that is to say, a word, or a group of words) categorizable into classes such as names of persons, names of organizations or enterprises, names of places, quantities, distances, values, dates, etc.”
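As a rough illustration of what “searching for textual objects categorizable into classes” can look like, here is a toy Python sketch. Everything in it is an assumption made for the example: the GAZETTEER dictionary, the class labels, and the four-digit-year pattern standing in for dates. Real systems are far more sophisticated than a lookup table and a regular expression.

```python
import re

# Hypothetical lookup table mapping known names to their classes.
GAZETTEER = {
    "Mozart": "PERSON",
    "National Institutes of Health": "ORGANIZATION",
    "Paris": "PLACE",
}

# Very rough assumption: any standalone four-digit number is a date (a year).
YEAR_PATTERN = re.compile(r"\b\d{4}\b")

def tag_entities(text):
    """Return (mention, class) pairs in the order they appear in the text."""
    found = []
    for name, label in GAZETTEER.items():
        # A mention can be one word or a group of words.
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.group(), label))
    for match in YEAR_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    # Sort by position in the text, then drop the position.
    return [(mention, label) for _, mention, label in sorted(found)]

print(tag_entities("Mozart visited Paris in 1778."))
```

Note that the multi-word name National Institutes of Health is treated as a single textual object, just as the definition says: un mot, ou un groupe de mots.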