France’s ability to keep on keeping on after the Paris attacks has been amazing. In that spirit, here’s a post about something other than how horrified I am.
In a recent post, we talked about factoid questions–questions that typically start with words like who, what, when, or where, and typically have answers that are just a short phrase. We’re pretty good at getting computers to answer those kinds of questions.
Once upon a time, the assumption in trying to get computers to answer questions (which we’re going to call question-answering in English, or questions-réponses in French) was that there was a database that contained the answers, and you were going to get the computer to process the question in such a way as to retrieve the answer from the database. Today, the assumption is that there is a web page somewhere that has the answer. So, how do you get to that answer?
The first step in question-answering is usually to figure out what kind of question you’re dealing with. This lets your system know what kind of answer it should be looking for.
“Where is Paris?” and “Where is the spleen?” call for very different kinds of answers. On the other hand, “Where is the capital of France?” and “What is the capital of France?” need the same kind of answer. So, it’s not as simple as just checking whether the question starts with who, what, when, or where. (Of course, there are many other ways to ask a factoid question: “When was Mozart born?” could more or less equivalently be asked as “What year was Mozart born?” You can see how difficult this can get.)
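To make the problem concrete, here is a minimal sketch in Python of the too-simple approach: guessing the kind of answer from the first word alone. The function name and the answer-type labels (PERSON, TIME, LOCATION, UNKNOWN) are my own placeholders for illustration, not taken from any real question-answering system.

```python
# Hypothetical mapping from a question's first word to an expected answer type.
# The labels are illustrative assumptions, not from any particular system.
FIRST_WORD_TO_ANSWER_TYPE = {
    "who": "PERSON",
    "when": "TIME",
    "where": "LOCATION",
    "what": "UNKNOWN",  # "what" alone says almost nothing about the answer
}


def naive_answer_type(question: str) -> str:
    """Guess the expected answer type by looking at the first word only."""
    words = question.strip().split()
    if not words:
        return "UNKNOWN"
    return FIRST_WORD_TO_ANSWER_TYPE.get(words[0].lower(), "UNKNOWN")


if __name__ == "__main__":
    questions = [
        "Where is Paris?",
        "Where is the spleen?",
        "Where is the capital of France?",
        "What is the capital of France?",
        "When was Mozart born?",
        "What year was Mozart born?",
    ]
    for question in questions:
        print(f"{question} -> {naive_answer_type(question)}")
```

Run on the examples above, this lumps “Where is Paris?” and “Where is the spleen?” together, and it gives different labels to the two capital-of-France questions even though they want the same answer. That is exactly the failure described above.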
The French Wikipedia page on questions-réponses talks about some things that are helpful in making these kinds of distinctions between question types (and types of expected answers), and of course Zipf’s Law comes into play, so we’ll need to learn some new words (or, at least, I will–I don’t know about you):
- Le focus: As far as I can tell, this is an unassimilated English loan word that means “focus.” Le focus d’une question correspond à la propriété ou l’entité recherchée par la question. “The focus of a question corresponds to the property or the entity sought by the question.”
- Le thème: theme, subject, or topic. Le thème de la question (ou topic) est l’objet sur lequel se porte la question. “The theme of the question (or topic) is the thing that the question is about.”
|Question|Focus|Theme|
|---|---|---|
|Who is the president of Benin?|Who|the president of Benin|
|When was Mozart born?|When|Mozart|
|When do cells divide?|When|cells|
|How much does a kimono cost?|How much|a kimono|
|How much does an elephant weigh?|How much|an elephant|
You can see from even a few examples that this is hard for a computer to do. “When was Mozart born?” requires a very different answer from “When do cells divide?”, despite the fact that the focus looks the same in both questions. Similarly, “How much does a kimono cost?” and “How much does an elephant weigh?” have focuses (foci?) that look the same, but they require very different types of answers. However, determining the focus and the theme of a factoid question is a good start. We’ll see in another post how identifying what are known as named entities can help to refine our understanding of the question.
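For the curious, here is a rough Python sketch of how you might pull the focus and the theme out of questions shaped like the ones in the table. Everything in it is an assumption made for illustration: the function name, the word lists, and especially the tiny hand-written list of main verbs, which only covers these examples. A real system would need actual syntactic analysis.

```python
from typing import Optional, Tuple

# Question phrases to look for, multi-word ones first so that
# "how much" is matched as a unit. Illustrative, not exhaustive.
WH_PHRASES = ("how much", "how many", "who", "what", "when", "where")

# Auxiliary verbs that may directly follow the question phrase.
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did"}

# Toy list of main verbs that end the example questions (an assumption;
# a real system would use a parser instead of a hand-written list).
TRAILING_VERBS = {"born", "divide", "cost", "weigh"}


def split_focus_and_theme(question: str) -> Tuple[Optional[str], str]:
    """Return (focus, theme) for a simple factoid question.

    The focus is the leading question phrase; the theme is what remains
    after dropping one auxiliary and one known trailing main verb.
    If no question phrase is found, the focus is None.
    """
    text = question.strip().rstrip("?").strip()
    lowered = text.lower()
    for phrase in WH_PHRASES:
        if lowered == phrase or lowered.startswith(phrase + " "):
            focus = text[: len(phrase)]
            rest = text[len(phrase):].split()
            break
    else:
        return None, text

    if rest and rest[0].lower() in AUXILIARIES:
        rest = rest[1:]
    if rest and rest[-1].lower() in TRAILING_VERBS:
        rest = rest[:-1]
    return focus, " ".join(rest)


if __name__ == "__main__":
    for q in ["When was Mozart born?", "When do cells divide?"]:
        print(q, "->", split_focus_and_theme(q))
```

Both Mozart and cells come back with the focus “When,” which is the point: the focus and theme are a start, but on their own they don’t tell the system whether it wants a date or a phase of the cell cycle.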