How to get a computer to answer a factoid question

Computers can now answer factoid questions–if they can tell what’s being asked…

An architecture for a question-answering system. Picture source: https://commons.wikimedia.org/wiki/File:Architecture_question_reponse.png (By Tinmn (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons)

France’s ability to keep on keeping on after the Paris attacks has been amazing.  In that spirit, here’s a post about something other than how horrified I am.

In a recent post, we talked about factoid questions–questions that typically start with words like who, what, when, or where, and typically have answers that are just a short phrase.  We’re pretty good at getting computers to answer those kinds of questions.

Once upon a time, the assumption in trying to get computers to answer questions (which we’re going to call question-answering in English, or questions-réponses in French) was that there was a database that contained the answers, and you were going to get the computer to process the question in such a way as to retrieve the answer from the database.  Today, the assumption is that there is a web page somewhere that has the answer.  So, how do you get to that answer?

The first step in question-answering is usually to figure out what kind of question you’re dealing with. This lets your system know what kind of answer it should be looking for.  Where is Paris? and Where is the spleen? call for very different kinds of answers.  On the other hand, Where is the capital of France? and What is the capital of France? need the same kind of answer.  So, it’s not as simple as just checking whether the question starts with who, what, when, or where. (Of course, there are many other ways that you can ask a factoid question: When was Mozart born? could more or less equivalently be asked as What year was Mozart born? You can see how difficult this can get.)
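To make that concrete, here is a minimal sketch of how a system might guess the expected answer type from more than just the first word of the question. Everything in it is an assumption made for illustration: the function name, the type labels, and the tiny word lists are hypothetical, and a real system would use far richer resources.

```python
import re

# Hypothetical toy lexicons; a real system would need much larger resources.
BODY_PARTS = {"spleen", "liver", "heart"}
TIME_NOUNS = {"year", "day", "date", "month"}
PLACE_NOUNS = {"capital", "city", "country"}


def tokens(question):
    """Lowercase the question and split it into word tokens."""
    return re.findall(r"[a-z']+", question.lower())


def expected_answer_type(question):
    """Guess the kind of answer that a factoid question is looking for."""
    words = tokens(question)
    if not words:
        return "UNKNOWN"
    first, rest = words[0], set(words[1:])
    # "When ...?" and "What year ...?" both want a date.
    if first == "when" or (first == "what" and rest & TIME_NOUNS):
        return "DATE"
    # "Where is the spleen?" wants a location in the body, not on a map.
    if first == "where" and rest & BODY_PARTS:
        return "ANATOMICAL_LOCATION"
    # "Where is the capital ...?" and "What is the capital ...?" want a place.
    if first == "where" or (first == "what" and rest & PLACE_NOUNS):
        return "GEOGRAPHIC_LOCATION"
    if first == "who":
        return "PERSON"
    return "UNKNOWN"


for q in ["Where is Paris?", "Where is the spleen?",
          "What is the capital of France?", "What year was Mozart born?"]:
    print(q, "->", expected_answer_type(q))
```

The point of the sketch is only that the question word and the other words in the question have to be looked at together; hand-written lists like these stop working very quickly on real questions.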

The French Wikipedia page on questions-réponses talks about some things that are helpful in making these kinds of distinctions between question types (and types of expected answers), and of course Zipf’s Law comes into play, so we’ll need to learn some new words (or, at least, I will–I don’t know about you):

  • le focus: As far as I can tell, this is an unassimilated English loan word that means “focus.”  Le focus d’une question correspond à la propriété ou l’entité recherchée par la question.  “The focus of a question corresponds to the property or the entity sought by the question.”
  • Le thème: theme, subject, or topic.  Le thème de la question (ou topic) est l’objet sur lequel se porte la question.  “The theme of the question (or topic) is the thing that the question is about.”
Question | Focus | Theme
Who is the president of Benin? | Who | the president of Benin
When was Mozart born? | When | Mozart
When do cells divide? | When | cells
How much does a kimono cost? | How much | a kimono
How much does an elephant weigh? | How much | an elephant

You can see from even a few examples that this is hard for a computer to do.  When was Mozart born? requires a very different answer from When do cells divide?, despite the fact that the focus looks the same in both questions.  Similarly, How much does a kimono cost? and How much does an elephant weigh? have focuses (foci?) that look the same, but they require very different types of answers.  However, determining the focus and the theme of a factoid question is a good start.  We’ll see in another post how identifying what are known as named entities can help to refine our understanding of the question.
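For the curious, here is a rough sketch of how the focus and the theme could be pulled out of simple questions like the ones in the table. The function name, the word lists, and the rules are all assumptions made up for this illustration, not how any particular system does it. Note that it happily returns the same focus for the Mozart question and the cell question, which is exactly the problem described above.

```python
import re

# Toy word lists; assumptions for illustration only.
FOCUS_PHRASES = ("how much", "how many", "who", "what", "when", "where")
DO_AUX = {"do", "does", "did"}
BE_AUX = {"is", "are", "was", "were"}
PARTICIPLES = {"born"}  # hypothetical and far from complete


def focus_and_theme(question):
    """Split a simple factoid question into (focus, theme)."""
    words = re.findall(r"[\w'-]+", question)
    lowered = " ".join(w.lower() for w in words)
    # The focus is the question word (or phrase) at the start.
    focus = next((f for f in FOCUS_PHRASES if lowered.startswith(f + " ")), None)
    if focus is None:
        return None, None
    rest = words[len(focus.split()):]
    if rest and rest[0].lower() in DO_AUX:
        # "does a kimono cost": drop the auxiliary and the final main verb.
        theme = rest[1:-1]
    elif rest and rest[0].lower() in BE_AUX:
        theme = rest[1:]
        if theme and theme[-1].lower() in PARTICIPLES:
            theme = theme[:-1]  # "was Mozart born" leaves "Mozart"
    else:
        theme = rest
    return focus, " ".join(theme)


for q in ["Who is the president of Benin?", "When was Mozart born?",
          "How much does a kimono cost?"]:
    print(q, "->", focus_and_theme(q))
```

Rules this crude break on anything but the simplest questions; a real system would more plausibly lean on a syntactic parser.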

What’s making us cry today

For context, (a) 9/11 didn’t make me cry per se; I was shocked, I was horrified, I gave blood, but I didn’t cry; and (b) I’m a middle-aged American male and hence have been raised from childhood not to cry.  But, this blog post brought me to tears.  It’s an excellent French lesson from the Lawless French web site.  It’s more or less a perfect lesson for intermediate-to-advanced students of French who already speak English–it includes a video segment, the French transcription, the English translation, a vocabulary section, a grammar points section, and links to further reading.  You couldn’t ask for more.

The tear-inducing part is that the lesson is centered completely around Hollande’s address to the French people right after the Friday 13 November 2015 terrorist attacks in Paris.  What got me was the contrast between the topic of this material and the topics of the French lessons that I had when I was taking French 101 in college.  We had lessons centered around eating in a restaurant, renting an apartment, or meeting interesting French people (a little unrealistic, since Americans in France practically never meet French people on a personal level, but perhaps that’s part of its charm).  That was French language instruction when I was a younger man–French instruction today seems often to be about terrorism.  (If you look back through the previous year and a half’s worth of posts on this blog, you’ll find a disturbing number about vocabulary that I learnt in the context of news stories about terrorist attacks.)  This is not the world that I grew up in…

Some vocabulary from Lawless French’s lesson on expressing sorrow and regret:

  • être désolé(e) de qqch: to be sorry about something.  Desole de ne pas vous avoir donne des nouvelles pendant un mois mais l’explication de ce sejour prolonge sera bientot l’objet d’un prochain article.  [NOTE: the author didn’t use any accents.]  “Sorry not to have given you any news for a month, but the explanation of this prolonged stay will soon be the object of a forthcoming article.”  (Source:tour-du-monde-autostop.fr)
  • être navré(e) de qqch: to be sorry about something.  Étant député d’une région où le taux de chômage est relativement élevé, je suis extrêmement navré de constater que ce gouvernement laisse tomber toute une catégorie de la population. “As a member of parliament for a region where the unemployment rate is relatively high, I am greatly distressed to see that this government is ignoring a whole category of the population.” (Source: www2.parl.gc.ca, via linguee.fr)  Je suis navré d’apprendre la mort de L.B. Je le connaissais depuis plus de 30 ans. “I am very sorry to learn of the death of L.B. I knew him for more than 30 years.”

In American English, “I’m sorry” can be an admission of guilt (“I’m sorry I broke your foot”) or an expression of sympathy (“I’m sorry you broke your foot”).  As far as I can tell, désolé tends more towards the former, but also works for the latter.  Navré seems to be more for the latter.  I haven’t found firm native speaker judgements about this; WordReference.com gives examples of both for désolé.

 

What’s surreal today

Surreal is when you’re memorizing kitchen vocabulary and find out that what you really need to know how to say is “rocket launcher.”

Picture source: screen shot of my cell phone.

Random words that showed up on the screen of my cell phone yesterday as France continues to deal with the aftermath of the 11/13 attacks.  Definitions from WordReference.com.  It seems surreal that lately I’ve been trying to memorize the name of every single thing in my kitchen, while these are apparently the words that I need:

  • la perquisition: police search, police raid
  • le lance-roquettes: rocket launcher

 

I’m fine, as are my friends in Paris

Thank you to everyone who wrote to check on me.  I’m in the US, and am fine.  My family and friends in Paris are all accounted for. 

  • la fusillade: shootout, exchange of gunfire; fusillade.  Des fusillades ont éclaté vendredi soir vers 21h20 dans le Xe puis dans le XIe arrondissement de Paris, près de la rue Bichat, puis au Bataclan, faisant 42 morts selon la préfecture de police, et des dizaines blessés.  “Shoot-outs broke out Friday evening around 9:20 PM in the 10th arrondissement of Paris, then in the 11th arrondissement, near rue Bichat, then at the Bataclan, killing 42 and injuring dozens, according to police headquarters.”  (Source: Liberation)

Calling questioning into question

It bothers me that this isn’t the most disturbing tattoo I’ve ever heard of. Picture source: http://dandygoat.com/have-you-seen-anyone-with-a-huge-question-mark-tattoo-on-their-face-heres-what-its-about

As we saw the other day, asking a question can put you in a position of power–it demands a response from another person.  Perhaps that’s why small children will ask why? over and over–it lets them make adults do something for a change–talk.

But, questions are more complicated than that.  Asking a question can also put you in a position of weakness–it’s an admission that there’s something that you don’t know.  It’s not an accident that you’ll rarely see the president of the US ask a question. You might use questions in both of these ways in a single context.  Maybe you’re in a meeting, and you ask a pesky underling a question about what they’ve accomplished that week–you’re showing that you’re above them in the hierarchy.  Then you ask your status-conscious boss a question that shows that you implicitly admit that they know more than you do.  Pay attention as you go about your day today, and see if you can guess what motivates people to ask the questions that they ask–what they gain by asking them, and what they surrender.

It’s sort of a stereotype that men are not as willing to ask questions as women, at least in America.  A quick Google search reveals no actual data on this, though, and I would be really surprised if there weren’t all sorts of interactions with many variables that go into determining something like this–is it a male/male, female/female, or male/female conversation; are we talking about behavior in a group, or behavior in private; are we talking about people with higher or lower status; people of the same age, or of different ages; maybe it’s different in different parts of the country.  And, do we count all questions equally?  Is an actual request for information to be counted the same as browbeating?  Is a perfunctory question to be counted the same as a genuine question?  When you start trying to count things in language, it can be a lot harder than you would think.

One of my linguistic themes these past few days has been trying to figure out the many ways to translate the word question from English into French.  Here’s one that you might not have seen before:

  • remettre en cause: to call into question.  Cela inclut la communication de nouvelles informations sur les propriétés dangereuses, ainsi que d’informations susceptibles de remettre en cause la pertinence des mesures de gestion des risques recommandées par le fournisseur.  “This includes the communication of new information on the hazardous properties that become available as well as of information that may call into question the appropriateness of the risk management measures recommended by the supplier.”  (From the linguee.fr web site.  Original source: guidance.echa.europa.eu)

Don’t bother asking your computer “why” just yet

Picture source: screenshot of Google results.

We’re pretty good at getting computers to answer certain kinds of questions.  A good example is what people in my field call factoid questions.  These are questions that have a clear answer, typically some sort of a noun, and usually pretty short.   They tend to start with words like who, what, when, or where.  What year was Mozart born?  Where was the first McDonald’s?  Who wrote Pride and Prejudice?  There’s been a ton of research on how to get computers to answer these kinds of questions, much of it organized by the National Institute of Standards and Technology’s annual Text Retrieval Conference.

Although we’re pretty good at getting computers to answer factoid questions, it’s still really difficult to get a computer to answer a why question.  Unlike factoid questions, whose answers are typically a short phrase, why questions are usually answered by an explanation or a procedure, and these are typically longer than a single phrase–Suzan Verberne, a researcher at Radboud University Nijmegen and an expert on why questions, found that answers to why questions are typically at least one full sentence in length, and can easily be as long as a paragraph.  We’re not nearly as good at finding these longer stretches of text as we are at finding those little “factoids.”

In English, the field of research that deals with getting computers to answer questions is called question-answering.  In French, it’s called questions-réponses.  Here is a verb that I learnt from the French Wikipedia page on question-answering:

  • se fonder sur qqch: to be based on something, to be built on something.

Les Systèmes de réponse à des questions explorent de nouvelles méthodes de recherche d’information exploitant des requêtes formulées à l’aide du langage naturel et non plus en se fondant uniquement sur des mots clés (comme c’est le cas avec les moteurs de recherches actuels).  “Question-answering systems explore new methods of information retrieval, using requests formulated in natural language and not being based only on keywords (as is the case with current search engines).”

Why ask why?

Picture source: http://michigan.247sports.com/ImageUrl/1180443?View=Detailed

Why do children who can barely speak yet ask why, when they probably can’t understand the answer?  Why do they ask it over and over again?

Linguists who have talked about this generally think of it as the child having discovered a novel power.  In general, small children can’t really do anything to the world.  Adults, on the other hand, can do pretty much anything–drive, operate a microwave, read, write, produce money, distribute candy, make little children go to bed/get dressed/get in the car seat/you name it.  Small children can do an amazing thing with the word why, though: they can make adults talk.  One little word repeated over and over will make adults talk, and talk, and talk.  On this analysis, the answers don’t matter–what’s important is the child’s ability to affect the world, to “make” adults do something, rather than the other way around.

It’s a hypothesis that can’t be tested, so it’s not a scientifically interesting one, in some sense.  However, it’s an interesting one in another way, because it opens up a subject that’s quite interesting: questions.  We think about questions as being things that one produces in order to get information, but in reality, it’s far more complex than that.  More on this after I have a cup of coffee—it’s reeeeeeally early where I live.

There are many, many ways to translate “question” into French, and this is something that I mess up all the time.  Here are some (but not all!) of the options:

  • la question: a query.
    • Mr. Rodger Cuzner: I’ll ask three quick questions and then I’ll step back and let you guys answer. M. Rodger Cuzner: Je vais poser trois courtes questions à la suite et vous laisser répondre. (Source:www2.parl.gc.ca)
    • For two and a half days the recurring question was: “Where are you from?”  Durant deux jours et demi, la question récurrente aura été : “T’es d’où ?”  (Source: circostrada.org)
  • le point, une incertitude: a question in the sense of a matter or doubt.
    • This was not a question in the survey, but the overall survey results imply that people are not yet able to fully use such services to their advantage.  Ce point n’a pas fait l’objet d’une question de l’enquête, mais les résultats globaux indiquent que les personnes ne sont pas encore en mesure de tirer pleinement profit de ces services. (Source: unaids.org)

Jeb jab and the half-mythical 35-hour French work week

Picture source: article.wn.com

It’s the time when the campaigns for the parties’ presidential nominations are in full swing in the US.  The Republican contest has been especially bizarre, featuring rampant attacks by the various and sundry candidates for the nomination against each other.  (Not so many attacks on the presumptive Democratic candidate, Hillary Clinton.  This tendency amongst Republicans to savage each other during the primaries often bites them in the general election, but I guess that’s their business.)  As I mentioned in a previous post, last week there was a debate amongst the contenders for the nomination for Republican presidential candidate.  Jeb Bush (son of US President #41, George H.W. Bush, and brother of US President #43, George W. Bush) attacked his erstwhile protege, and now opponent, Marco Rubio over his attendance record in the Senate, saying “The Senate, what is it like a French work week? You get, like, three days where you have to show up?”  This was an especially timely gaffe, since France is in the midst of a restructuring of its labor laws, one feature of which is modifications to the 35-hour work week for which France is famous/notorious in the US.

As it is, the French 35-hour work week is half-mythical.  Only about 50% of the French work in positions that are subject to the 35-hour definition.  The lab that I visit when I’m in France is eligible for it in theory, but in practice we have a 37.5-hour work week, justified by having more official holidays than is the norm.  In fact, the French typically work just under 40 hours a week–slightly more than the typical German.

I get an article or two about the proposed modifications to the French labor laws in my (English-language) Google News feed every day lately, and there was a segment about them on the news show that I was listening to on the way to work today.  The announcer used the expression heures sup’.  This is an abbreviation for heures supplémentaires: “overtime.”  Here are some examples from the linguee.fr web site:

  • …la progression sur douze mois des gains horaires moyens (heures supplémentaires non comprises) pour les employés permanents… “…that the year-over-year increase in the average hourly wage (excluding overtime) for permanent workers…” (Source: banqueducanada.ca)
  • forfaitaires et les rétributions aux taux horaires concernant les heures supplémentaires prestées par les fonctionnaires et agents… “This appropriation is intended to cover the fixed allowances and hourly-rate remuneration for overtime worked by temporary agents in categories…” (Source: eur-lex.europa.eu)

Jeb has since apologized for his jab at the French work week.  As a commentator on MSNBC pointed out last night, apologizing to the French is not necessarily a great strategy in the struggle for power in the Republican party, but I guess that’s his business.

Zipf’s Law and burning at the stake

Cathars being burnt alive at Montségur. Picture source: http://vivre-au-moyen-age.over-blog.com/article-13095618.html

It’s the end of the peak publishing period in France–between late August and early November–and the beginning of the season of literary prizes.  Mathias Enard took the Goncourt prize yesterday for his novel Boussole (“compass; barometer (figurative), indicator”–definitions from WordReference.com), and this morning all of the guests on my news program were writers.  One of them read an essay on the subject of death as a revolutionary act (my favorite French news show is even more cerebral–like, by a long shot–than National Public Radio, the most intellectual of American news shows).  Along the way, he talked about the shift from cremation in the classical world to burial in the Christian world, pointing out that in the Christian world, only witches and heretics die at the stake.  (If you’re reading this and are not a native speaker of English: the word “stake” typically means a piece of wood that has been sharpened at one end–what you kill a vampire with, right?  If the stake is stuck in the ground and someone is tied to it and burnt alive, it’s called “THE stake.”)

It was a perfect Zipf’s Law moment.  The word for “the stake” is le bûcher.  Is that an obscure word, in the sense of being one that most people wouldn’t know?  Not at all.  Is it a common word?  Not at all.  Why the hell would anyone know this word in a foreign language, though?  Last summer or so, I chanced upon my father reading Le bûcher de Montségur, the classic book on the siege of the Cathars at the Montségur fortress in southern France.  (Yes, there is a classic book on the siege of the Cathars at the Montségur fortress, in English, as well as in French.)  The Cathars were considered heretics, and when they surrendered to royalist troops, about 220 of them were burnt to death in a massive bonfire.  I didn’t know what the word bûcher in the title of the book meant, so I looked it up, then duly memorized it, despite my confidence that I would never see it again.  Over a year later, it’s literary award season, and I run into it again, on the morning news…  Zipf’s Law at its best.

So, in the spirit of the literary prize season, let’s look at some meanings of bûcher, both as a noun and as a verb.  Definitions from WordReference.com:

  • le bûcher (tas de bois où on brûle les morts): funeral pyre.  This is presumably the sense in Le bûcher de Montségur.
  • le bûcher (tas de bois où on exécutait les coupables): stake.
  • le bûcher (abri pour bois): woodshed.
  • le bûcheron: lumberjack, woodcutter.  (Like that derived noun?)
  • bûcher: to work your butt off.  (In Quebec, it can also mean “to log, fell trees; to chop wood.”)

If you’re reading this and you’re French: please note that to us Americans, the words bûcher (stake, funeral pyre) and boucher (butcher) sound the same, so please have mercy on us if we say one when you’re sure we mean the other.  Feel free to correct our pronunciation, though–it’s good for us.

Why Paris might have more in common with Manhattan than it does with Clichy-sous-Bois

A cover of France-Amérique, a magazine for French lovers of America. The title of the cover story is “The American heart: an investigation of philanthropy.” Picture source: france-amerique.com

Lately I’ve been obsessed with the general incorrectness of the common American belief that the French look down on everything about us.  A couple days ago, I talked about the popularity of English words in France.  The topic of French attitudes about America came up again this morning.  As I was listening to the news on the way to work, there was a long segment on the banlieues défavorisées of France–the poor suburbs where much of the French underclass lives.  2015 is the 10th anniversary of the 2005 riots, which were very much a feature of the banlieues (unlike, say, the student riots of 1968, which were very much an urban phenomenon).  There was a guest who had been invited to talk about his theories of the geographic aspects of the banlieues.  His take on it is that part of what makes life in the banlieues what it is is that they have very little in the way of public transportation–as Wikipedia puts it in its article on the infamous Clichy-sous-Bois banlieue, where the 2005 riots started, “Clichy-sous-Bois is not served by any motorway or major road and no railway and therefore remains one of the most isolated of the inner suburbs of Paris.”  So, you have this paradox that Clichy-sous-Bois is maybe 10 miles from Paris, but has less in common with life in Paris–culturally, politically, economically–than Manhattan does.  Here, the guest saw the situation in America much more positively–his take was that in America, the low-income areas are mostly urban, not suburban; they have the same public transportation as the rest of the city does, and therefore the residents of an American ghetto have the same access to universities, museums, etc., as more well-off residents of the city do.
Obviously, it’s more complicated than that–we have seriously low-income and culturally disconnected little towns scattered throughout Appalachia and elsewhere, and if you live in a poor urban area of an American city, you probably have other obstacles to your access to universities, museums, etc., besides transportation.  But, the guest was right in that the geographic facts that keep the residents of the banlieues isolated from the rest of French life don’t generally have the same ill effects in an American urban ghetto.

Public transportation (transport en commun) is an important aspect of life in France–let’s look at a little bit of vocabulary from the French Wikipedia page on the subject (translations from WordReference.com):

Le transport en commun, ou transport collectif, consiste à transporter plusieurs personnes ensemble sur un même trajet. Il est généralement accessible en contrepartie d’un titre de transport comme un billet, ticket ou une carte.

  • le trajet: journey; plane flight; car or bus ride
  • en contrepartie de: in return for, in exchange for
  • le titre de transport: ticket

What’s the difference between billet and ticket?  No clue.  Perhaps someone can tell me in the Comments section?
