How to get a computer to answer a factoid question

Computers can now answer factoid questions–if they can tell what’s being asked…

An architecture for a question-answering system. Picture source: https://commons.wikimedia.org/wiki/File:Architecture_question_reponse.png (By Tinmn (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons)

France’s ability to keep on keeping on after the Paris attacks has been amazing.  In that spirit, here’s a post about something other than how horrified I am.

In a recent post, we talked about factoid questions–questions that typically start with words like who, what, when, or where, and typically have answers that are just a short phrase.  We’re pretty good at getting computers to answer those kinds of questions.

Once upon a time, the assumption in trying to get computers to answer questions (which we’re going to call question-answering in English, or questions-réponses in French) was that there was a database that contained the answers, and you were going to get the computer to process the question in such a way as to retrieve the answer from the database.  Today, the assumption is that there is a web page somewhere that has the answer.  So, how do you get to that answer?

The first step in question-answering is usually to figure out what kind of question you’re dealing with, which tells your system what kind of answer it should be looking for.  Where is Paris? and Where is the spleen? call for very different kinds of answers.  On the other hand, Where is the capital of France? and What is the capital of France? need the same kind of answer.  So, it’s not as simple as just checking whether the question starts with who, what, when, or where.  (Of course, there are many other ways to ask a factoid question: When was Mozart born? could more or less equivalently be asked as What year was Mozart born?  You can see how difficult this can get.)
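To make the point concrete, here is a toy, rule-based sketch of that first step in Python. It maps a question to an expected answer type using a few hand-written patterns rather than the first word alone. The answer-type labels and the tiny `BODY_PARTS` lexicon are invented for illustration; a real system would learn these distinctions or draw on much larger resources.

```python
import re

# Hypothetical, hand-made lexicon; a real system would consult a large
# vocabulary (anatomy terms, place names, and so on) instead.
BODY_PARTS = {"spleen", "liver", "heart", "sternum"}


def expected_answer_type(question: str) -> str:
    """Guess what kind of answer a factoid question calls for (toy rules)."""
    q = question.lower().strip().rstrip("?").strip()

    # "Where is the capital of X?" and "What is the capital of X?" both
    # want a city, even though they start with different question words.
    if re.match(r"(what|where) is the capital of ", q):
        return "CITY"

    # "When was X born?" and "What year was X born?" both want a date.
    if re.match(r"(when|what year) was .+ born$", q):
        return "DATE"

    # "Where is X?" depends on what X is: a spleen is not found on a map.
    m = re.match(r"where (?:is|are) (?:the )?(.+)$", q)
    if m:
        if m.group(1) in BODY_PARTS:
            return "ANATOMICAL_LOCATION"
        return "GEOGRAPHIC_LOCATION"

    if q.startswith("who "):
        return "PERSON"
    return "UNKNOWN"


for question in ["Where is Paris?", "Where is the spleen?",
                 "Where is the capital of France?", "What is the capital of France?",
                 "When was Mozart born?", "What year was Mozart born?"]:
    print(question, "->", expected_answer_type(question))
```

Rule order matters here: the capital-of pattern has to be tried before the generic where pattern, or Where is the capital of France? would come out as a plain geographic location.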

The French Wikipedia page on questions-réponses talks about some things that are helpful in making these kinds of distinctions between question types (and types of expected answers), and of course Zipf’s Law comes into play, so we’ll need to learn some new words (or, at least, I will–I don’t know about you):

  • le focus: As far as I can tell, this is an unassimilated English loan word that means “focus.”  Le focus d’une question correspond à la propriété ou l’entité recherchée par la question.  “The focus of a question corresponds to the property or the entity sought by the question.”
  • Le thème: theme, subject, or topic.  Le thème de la question (ou topic) est l’objet sur lequel se porte la question.  “The theme of the question (or topic) is the thing that the question is about.”
Question | Focus | Theme
Who is the president of Benin? | Who | the president of Benin
When was Mozart born? | When | Mozart
When do cells divide? | When | cells
How much does a kimono cost? | How much | a kimono
How much does an elephant weigh? | How much | an elephant

You can see from even a few examples that this is hard for a computer to do.  When was Mozart born? requires a very different answer from When do cells divide?, despite the fact that the focus looks the same in both questions.  Similarly, How much does a kimono cost? and How much does an elephant weigh? have focuses (foci?) that look the same, but they require very different types of answers.  However, determining the focus and the theme of a factoid question is a good start.  We’ll see in another post how identifying what are known as named entities can help to refine our understanding of the question.
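A similarly rough Python sketch can pull out the focus and theme for the questions in the table. It assumes each question follows the simple pattern question word + auxiliary + theme + (verb), and it uses a small hypothetical `FINAL_VERBS` set as a stand-in for a real part-of-speech tagger. Note that it gives When as the focus of both the Mozart question and the cells question, which is exactly the problem described above: the focus alone doesn't tell you what kind of answer is needed.

```python
import re

# Question words we recognize, including the two-word "how much"/"how many".
FOCUS_PATTERN = re.compile(
    r"(how much|how many|who|what|when|where)\s+(.+?)\??$", re.IGNORECASE
)
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did"}
# Hypothetical stand-in for a part-of-speech tagger: verbs that may end
# the questions in the table.
FINAL_VERBS = {"born", "divide", "cost", "weigh"}


def focus_and_theme(question: str):
    """Split a simple factoid question into (focus, theme), or return None."""
    m = FOCUS_PATTERN.match(question.strip())
    if m is None:
        return None
    focus, rest = m.group(1), m.group(2).split()
    # Drop the auxiliary right after the question word ("is", "does", ...).
    if rest and rest[0].lower() in AUXILIARIES:
        rest = rest[1:]
    # Drop a trailing main verb ("born", "cost", ...), leaving the theme.
    if rest and rest[-1].lower() in FINAL_VERBS:
        rest = rest[:-1]
    return focus, " ".join(rest)


print(focus_and_theme("Who is the president of Benin?"))
print(focus_and_theme("How much does an elephant weigh?"))
```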

What’s making us cry today

For context, (a) 9/11 didn’t make me cry per se; I was shocked, I was horrified, I gave blood, but I didn’t cry; and (b) I’m a middle-aged American male and hence have been raised from childhood not to cry.  But, this blog post brought me to tears.  It’s an excellent French lesson from the Lawless French web site.  It’s more or less a perfect lesson for intermediate-to-advanced students of French who already speak English–it includes a video segment, the French transcription, the English translation, a vocabulary section, a grammar points section, and links to further reading.  You couldn’t ask for more.

The tears-inducing part is that the lesson is centered completely around Hollande’s address to the French people right after the Friday 13 November 2015 terrorist attacks in Paris.  What got me was the contrast between the topic of this lesson and the topics of the lessons I had when I was taking French 101 in college.  We had lessons centered around eating in a restaurant, renting an apartment, or meeting interesting French people (a little unrealistic, since Americans in France practically never meet French people on a personal level, but perhaps that’s part of its charm).  That was French language instruction when I was a younger man–French instruction today seems often to be about terrorism.  (If you look back through the previous year and a half’s worth of posts on this blog, you’ll find a disturbing number about vocabulary that I learnt in the context of news stories about terrorist attacks.)  This is not the world that I grew up in…

Some vocabulary from Lawless French’s lesson on expressing sorrow and regret:

  • être désolé(e) de qqch: to be sorry about something.  Desole de ne pas vous avoir donne des nouvelles pendant un mois mais l’explication de ce sejour prolonge sera bientot l’objet d’un prochain article.  [NOTE: the author didn’t use any accents.]  “Sorry not to have given you any news for a month, but the explanation of this prolonged stay will soon be the object of a forthcoming article.”  (Source: tour-du-monde-autostop.fr)
  • être navré(e) de qqch: to be sorry about something.  Étant député d’une région où le taux de chômage est relativement élevé, je suis extrêmement navré de constater que ce gouvernement laisse tomber toute une catégorie de la population.  “As a member of parliament for a region where the unemployment rate is relatively high, I am greatly distressed to see that this government is ignoring a whole category of the population.”  (Source: www2.parl.gc.ca, via linguee.fr)  Je suis navré d’apprendre la mort de L.B. Je le connaissais depuis plus de 30 ans.  “I am very sorry to learn of the death of L.B. I knew him for more than 30 years.”

In American English, “I’m sorry” can be an admission of guilt (“I’m sorry I broke your foot”) or an expression of sympathy (“I’m sorry you broke your foot”).  As far as I can tell, désolé tends more towards the former, but also works for the latter.  Navré seems to be more for the latter.  I haven’t found firm native speaker judgements about this; WordReference.com gives examples of both for désolé.

 

What’s surreal today

Surreal is when you’re memorizing kitchen vocabulary and find out that what you really need to know how to say is “rocket launcher.”

Picture source: screenshot of my cell phone.

Random words that showed up on the screen of my cell phone yesterday as France continues to deal with the aftermath of the 11/13 attacks.  Definitions from WordReference.com.  It seems surreal that lately I’ve been trying to memorize the name of every single thing in my kitchen, while these are apparently the words that I need:

  • la perquisition: police search, police raid
  • le lance-roquettes: rocket launcher

 

One extra thing this time

Sign in Narita Airport baggage claim. Photo source: me.

This is my tenth trip to Japan, so I more or less have the routine down. Turn left out of Arrivals and go around the corner to the ATMs. Figure out which machine both takes American cards and has an English-language interface, since you don’t speak Japanese. Turn around, go back around the corner the other way, and down the escalator. Walk past the cobbler (why the hell is there a cobbler in the airport?) to the Keisei ticket counter. Crap–remember that you’re taking JR this time.  Get to the JR ticket counter. Eigo-ga wakarimasu ka? (Do you understand English?)  “(Giggle) a little.”  “Mishima, please.”  Buy red bean paste manju and a can of coffee.  (Love those coffee vending machines.)  Find that funny glassed-in waiting area that must have once been a smoking lounge and settle in to wait for the train.  Oh, one extra thing to do this time: check the news from Paris to see whether anything new has come out about the terrorist attacks while you were in the air. Fuck–the current bilan (toll) is 129 dead, 352 injured.  Dicks. 

I’m fine, as are my friends in Paris

Thank you to everyone who wrote to check on me.  I’m in the US, and am fine.  My family and friends in Paris are all accounted for. 

  • la fusillade: shootout, exchange of gunfire; fusillade.  Des fusillades ont éclaté vendredi soir vers 21h20 dans le Xe puis dans le XIe arrondissement de Paris, près de la rue Bichat, puis au Bataclan, faisant 42 morts selon la préfecture de police, et des dizaines blessés.  “Shoot-outs broke out Friday evening around 9:20 PM in the 10th arrondissement of Paris, then in the 11th arrondissement, near rue Bichat, then at the Bataclan, killing 42 and injuring dozens, according to police headquarters.”  (Source: Liberation)

Um…

Peplums. Picture source: http://heartsandpurses.blogspot.com/2015/01/peplum.html

I’ve loved a small number of women and an even smaller number of dogs, and I’ve loved them a lot.  But, I’ve never loved anyone as much as I love the word peplum.  Wikipedia defines peplum as “a type of elongated hem resembling a short skirt, worn to lie over another garment, either another skirt such as a petticoat or underskirt, or breeches.”  Mind you, I don’t love peplums–they can go really, really wrong.  It’s the word that makes me smile.

So, you can imagine my excitement when I learnt that peplum (actually, péplum) is a French word.  It refers to a kind of movie–what we would call in English a sword-and-sandal movie.  That’s a genre of movie set in the classical world.  Once I met my love in French, though, I had a problem: how do I pronounce it?

You see, I always get thrown off by the pronunciation of French words that end with -um.  They sound odd–not typically French.  I’ve now looked up a number of them, and it appears that they systematically violate the normal rules of pronunciation of word-final vowel-nasal sequences.  Briefly, they get pronounced with a final [ɔm].

Why this is surprising: usually, if something is spelt in French with a vowel and then a nasal at the end of a word, it’s pronounced as a nasalized vowel.  For example (pronunciations from WordReference.com):

  • un: [œ̃] one; a, an
  • le poing: [pwɛ̃] fist
  • le sein: [sɛ̃] breast
  • le flan: [flɑ̃] pudding, custard
  • la faim: [fɛ̃] hunger, famine, desire

Furthermore, an m at the end of a word would not normally be pronounced as a consonant–it just nasalizes the vowel in front of it:

  • la pomme d’Adam: [pɔm dadɑ̃] Adam’s apple
  • le prête-nom: [pʀɛtnɔ̃] front man, figurehead

Also, the written vowel u is almost universally pronounced as the high tense front rounded vowel that is written as [y] in the International Phonetic Alphabet.  The only counterexample to this that I know of is -um at the end of words, actually, so I won’t even bother with examples.

I found a web page with the amazing title 1,006 French words ending in M.  If the page is to be believed, the majority of French words that end in m actually end in um.  Many of them have the air of the Scrabble word about them, though, and don’t have pronunciations listed in the dictionary.  Here are the ones that I could find–as you can see, the pattern is consistent.

  • le péplum: [peplɔm] sword-and-sandals movie
  • le symposium: [sɛ̃pozjɔm] symposium
  • le forum: [fɔʀɔm] forum; fair, show, exhibition; discussion group (Internet); court
  • l’uranium (m.): [yʀanjɔm] uranium
  • le sébum: [sebɔm] sebum, smegma
  • le sternum: [stɛʀnɔm] sternum, breastbone
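The pattern can be written down as a toy set of spelling-to-sound rules for word endings. This Python sketch is built only from the examples in this post, so it is illustrative rather than a real pronunciation model; it would, for instance, get parfum wrong, since that word ends in a nasal vowel despite its -um spelling. The `ENDING_RULES` table and `final_sound` function are hypothetical names.

```python
# Spelling of a word ending -> pronunciation of that ending, taken only
# from the examples in this post (pronunciations as given above).
ENDING_RULES = {
    "um": "ɔm",    # péplum, forum, sternum: -um keeps a pronounced [m]
    "oing": "wɛ̃",  # poing
    "aim": "ɛ̃",    # faim
    "ein": "ɛ̃",    # sein
    "an": "ɑ̃",     # flan
    "am": "ɑ̃",     # Adam (in pomme d'Adam)
    "om": "ɔ̃",     # prête-nom
    "un": "œ̃",     # un
}


def final_sound(word: str):
    """Predict how a word's ending is pronounced, or None if no rule applies."""
    # Longer endings are tried first, so the rules stay correct if shorter,
    # overlapping endings (say, "-in") are added later.
    for ending in sorted(ENDING_RULES, key=len, reverse=True):
        if word.lower().endswith(ending):
            return ENDING_RULES[ending]
    return None


for w in ["péplum", "sternum", "flan", "faim", "pomme"]:
    print(w, final_sound(w))
```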

So, say the word peplum all you want, but think twice before you wear one–and be careful about asking for one in Paris, because if you ask for a sword-and-sandals movie in a clothing store, you’re going to get some funny looks.

To the store went Curious George

Curious George.  Picture source: http://www.curiousgeorge.com/~/media/sites/CG/Resources/Inspire_Curiosity_Kit.pdf

I just sent this email to a coworker: The data sets are linked to from this article.  If you’re a native speaker of English, that sentence probably doesn’t look particularly unusual, and you probably agree that it could be paraphrased as There are some data sets.  There are links to the data sets.  The links are from this article.

If you’re not a native speaker, though, you might well have stumbled on this sentence due to the sequence to from:

The data sets are linked to from this article.

How can to from possibly be meaningful?  It works because of one of the most important observations in the history of linguistics.  In a 1957 book, Verbal Behavior, the famous psychologist B.F. Skinner, the father of behaviorism, maintained that the syntax of English could be described by something called a finite state machine.  Finite state machines basically describe the possible sequences of something–words, in this case.  In a review that he wrote in 1959, Noam Chomsky poked a huge hole in Skinner’s claim.  He made the point that syntax is not about sequences, per se, at all–rather, it’s about structures.  The data sets are linked to from this article works because the sequence to from doesn’t figure into how that sentence is produced or interpreted.  Rather, the sentence is composed of a set of structures: [[The data sets] [are [linked to] [from [this article]]]].  The sentence is not a sequence of words; to the extent that it’s a sequence of anything, it’s a sequence of phrases (syntactic structures), and it’s really not so much about the sequence of anything as it is about structure.
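One way to see the difference between sequence and structure is to store the bracketing of the example sentence as a nested list and ask where to and from actually meet. In this Python sketch (the helper names are made up for illustration), the two words land in separate constituents, [linked to] and [from [this article]], and the smallest unit containing both is the whole verb phrase.

```python
def parse_brackets(text: str):
    """Turn a bracketed string like "[[a b] [c]]" into nested Python lists."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    stack = [[]]
    for tok in tokens:
        if tok == "[":
            stack.append([])          # open a new constituent
        elif tok == "]":
            done = stack.pop()        # close it and attach it to its parent
            stack[-1].append(done)
        else:
            stack[-1].append(tok)     # a word inside the current constituent
    return stack[0][0]


def words(tree):
    """All words in a constituent, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree for w in words(child)]


def smallest_constituent(tree, a, b):
    """The smallest bracketed unit that contains both words a and b."""
    for child in tree:
        if isinstance(child, list) and a in words(child) and b in words(child):
            return smallest_constituent(child, a, b)
    return tree


def show(tree):
    """Render a nested list back into bracket notation."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(show(child) for child in tree) + "]"


TREE = parse_brackets("[[The data sets] [are [linked to] [from [this article]]]]")
print(show(smallest_constituent(TREE, "to", "from")))
```

Reading the words off left to right gives back the familiar sequence, but nothing in the structure treats to from as a unit.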

Within that sentence, linked to is what most people would call an idiom.  Rather than a preposition, it contains what English-language linguists call a particle.  To see how a particle is different from a preposition, consider these pairs of sentences:

  • Curious George went to the store.
  • To the store went Curious George.

No problem–both of those are perfectly acceptable English.  Now think about these:

  • Curious George linked to the blog post.
  • * To the blog post linked Curious George.

The asterisk (*) is the linguist’s way of indicating that * To the blog post linked Curious George is completely unacceptable–you just can’t say that in English.  Why?  It’s because in go to, the to is a preposition.  In link to, it is a particle.  Particles and prepositions behave differently.  One of the ways in which they behave differently is that prepositions and their objects have some ability to move away from their verb, as in To the store went Curious George.  Particles can’t do that.  So, the native speaker of English has no problem with the to from in The data sets are linked to from this article–there really isn’t any other way that the native speaker can analyze it.  If you want more information on what I’ve called idioms (verb + particle), as well as other English constructions that use particles, see this article on the Linguistics Girl blog.

Chomsky went on to become the most important linguist of the 20th century.  Not the best linguist of the 20th century–but, unquestionably the most important linguist of the 20th century.  He redefined what the field was about, in almost every respect.  Remember this cartoon from a couple days ago?  It was about him.  I can tell you that only a vanishingly small number of linguists are the subjects of cartoons—we just don’t typically get very famous.  “Movement phenomena” like the examples that we saw in this post were a big part of the motivation for his theories, and there are movement phenomena in French, as well as in English.  We’ll see some movement phenomena in the formation of various types of questions, in particular.  But, more on that later–time for dinner.  For now, let’s just remind ourselves that the Curious George books were originally written in French.

Calling questioning into question

It bothers me that this isn't the most disturbing tattoo I've ever heard of.  Picture source: http://dandygoat.com/have-you-seen-anyone-with-a-huge-question-mark-tattoo-on-their-face-heres-what-its-about

As we saw the other day, asking a question can put you in a position of power–it demands a response from another person.  Perhaps that’s why small children will ask why? over and over–it lets them make adults do something for a change–talk.

But, questions are more complicated than that.  Asking a question can also put you in a position of weakness–it’s an admission that there’s something that you don’t know.  It’s not an accident that you’ll rarely see the president of the US ask a question. You might use questions in both of these ways in a single context.  Maybe you’re in a meeting, and you ask a pesky underling a question about what they’ve accomplished that week–you’re showing that you’re above them in the hierarchy.  Then you ask your status-conscious boss a question that implicitly admits that they know more than you do.  Pay attention as you go about your day today, and see if you can guess what motivates people to ask the questions that they ask–what they gain by asking them, and what they surrender.

It’s sort of a stereotype that men are not as willing to ask questions as women, at least in America.  A quick Google search reveals no actual data on this, though, and I would be really surprised if there weren’t all sorts of interactions with many variables that go into determining something like this–is it a male/male, female/female, or male/female conversation; are we talking about behavior in a group, or behavior in private; are we talking about people with higher or lower status; people of the same age, or of different ages; maybe it’s different in different parts of the country.  And, do we count all questions equally?  Is an actual request for information to be counted the same as browbeating?  Is a perfunctory question to be counted the same as a genuine question?  When you start trying to count things in language, it can be a lot harder than you would think.

One of my linguistic themes these past few days has been trying to figure out the many ways to translate the word question from English into French.  Here’s one that you might not have seen before:

  • remettre en cause: to call into question.  Cela inclut la communication de nouvelles informations sur les propriétés dangereuses, ainsi que d’informations susceptibles de remettre en cause la pertinence des mesures de gestion des risques recommandées par le fournisseur.  “This includes the communication of new information on the hazardous properties that become available as well as of information that may call into question the appropriateness of the risk management measures recommended by the supplier.”  (From the linguee.fr web site.  Original source: guidance.echa.europa.eu)

Don’t bother asking your computer “why” just yet

Picture source: screenshot of Google results.

We’re pretty good at getting computers to answer certain kinds of questions.  A good example is what people in my field call factoid questions.  These are questions that have a clear answer, typically some sort of a noun, and usually pretty short.   They tend to start with words like who, what, when, or where.  What year was Mozart born?  Where was the first McDonald’s?  Who wrote Pride and Prejudice?  There’s been a ton of research on how to get computers to answer these kinds of questions, much of it organized by the National Institute of Standards and Technology’s annual Text Retrieval Conference.

Although we’re pretty good at getting computers to answer factoid questions, it’s still really difficult to get a computer to answer a why question.  Unlike factoid questions, whose answers are typically a short phrase, why questions are usually answered by an explanation or a procedure, and these are typically longer than a single phrase–Suzan Verberne, a researcher at Radboud University Nijmegen and an expert on why questions, found that answers to why questions are typically at least one full sentence in length, and can easily be as long as a paragraph.  We’re not nearly as good at finding these longer stretches of text as we are at finding those little “factoids.”
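Because answers to why questions are sentence-sized, a system has to return whole sentences rather than short phrases. Here is a deliberately naive Python sketch of that idea: split a text into sentences and keep the ones containing a causal cue phrase such as because. The cue list and the regex sentence splitter are assumptions made for illustration, not Verberne's method or any particular system's.

```python
import re

# Hypothetical list of phrases that often introduce an explanation.
CAUSAL_CUES = ("because", "since", "due to", "as a result")


def split_sentences(text: str):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def why_answer_candidates(text: str):
    """Return whole sentences that contain a causal cue phrase."""
    candidates = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        # \b keeps "since" from matching inside words like "sincere".
        if any(re.search(r"\b" + re.escape(cue) + r"\b", lowered) for cue in CAUSAL_CUES):
            candidates.append(sentence)
    return candidates


passage = ("Leaves change color in the fall. This happens because trees stop "
           "making chlorophyll. The days also get shorter.")
print(why_answer_candidates(passage))
```

Real why-question systems rank candidate passages with far richer features; the point here is only the unit of the answer, a full sentence rather than a phrase.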

In English, the field of research that deals with getting computers to answer questions is called question-answering.  In French, it’s called questions-réponses.  Here is a verb that I learnt from the French Wikipedia page on question-answering:

  • se fonder sur qqch: to be based on something, to be built on something.

Les Systèmes de réponse à des questions explorent de nouvelles méthodes de recherche d’information exploitant des requêtes formulées à l’aide du langage naturel et non plus en se fondant uniquement sur des mots clés (comme c’est le cas avec les moteurs de recherches actuels).  “Question-answering systems explore new methods of information retrieval, using requests formulated in natural language and not being based only on keywords (as is the case with current search engines).”
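For contrast, here is a Python sketch of the kind of keyword-only matching that, according to the quotation, question-answering systems go beyond. The stop-word list is a tiny hypothetical one. Because the question words are thrown away along with other function words, When was Mozart born? and Where was Mozart born? look identical to this ranker, even though they call for different answers.

```python
import re

# Tiny, hypothetical stop-word list; note that it discards the question words.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "was",
              "what", "who", "when", "where"}


def keywords(text: str):
    """Lowercased word tokens (letters and apostrophes only), minus stop words."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS}


def rank_by_keywords(question: str, passages):
    """Sort passages by how many question keywords they share, most first."""
    q = keywords(question)
    return sorted(passages, key=lambda p: len(q & keywords(p)), reverse=True)


passages = ["Mozart wrote many operas.", "Mozart was born in Salzburg in 1756."]
print(keywords("When was Mozart born?"), keywords("Where was Mozart born?"))
print(rank_by_keywords("When was Mozart born?", passages)[0])
```

The ranker does find the right passage, but it has no idea whether to pull out Salzburg or 1756; that is the job of the question analysis described in the factoid-question post.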

Why ask why?

Picture source: http://michigan.247sports.com/ImageUrl/1180443?View=Detailed

Why do children who can barely speak ask why, even though they probably can’t understand the answer yet?  Why do they ask it over and over again?

Linguists who have talked about this generally think of it as the child having discovered a novel power.  In general, small children can’t really do anything to the world.  Adults, on the other hand, can do pretty much anything–drive, operate a microwave, read, write, produce money, distribute candy, make little children go to bed/get dressed/get in the car seat/you name it.  Small children can do an amazing thing with the word why, though: they can make adults talk.  One little word repeated over and over will make adults talk, and talk, and talk.  On this analysis, the answers don’t matter–what’s important is the child’s ability to affect the world, to “make” adults do something, rather than the other way around.

It’s a hypothesis that can’t be tested, so in some sense it’s not a scientifically interesting one.  It is interesting in another way, though, because it opens up a rich subject: questions.  We tend to think of questions as things that we produce in order to get information, but in reality they are far more complex than that.  More on this after I have a cup of coffee; it’s reeeeeeally early where I live.

There are many, many ways to translate “question” into French, and this is something that I mess up all the time.  Here are some (but not all!) of the options:

  • la question: a query.
    • Mr. Rodger Cuzner: I’ll ask three quick questions and then I’ll step back and let you guys answer. M. Rodger Cuzner: Je vais poser trois courtes questions à la suite et vous laisser répondre. (Source: www2.parl.gc.ca)
    • For two and a half days the recurring question was: “Where are you from?”  Durant deux jours et demi, la question récurrente aura été : “T’es d’où ?” (Source: circostrada.org)
  • le point, une incertitude: a question in the sense of a matter or doubt.
    • This was not a question in the survey, but the overall survey results imply that people are not yet able to fully use such services to their advantage.  Ce point n’a pas fait l’objet d’une question de l’enquête, mais les résultats globaux indiquent que les personnes ne sont pas encore en mesure de tirer pleinement profit de ces services. (Source: unaids.org)