Bambi. The text at the top reads “Bambi, tell me your story again.”

Who knows what randomness Zipf’s Law will bring into your day?  The path from the train station to the lab takes you through a bit of beautiful forest.  A couple of weeks ago, one of my coworkers saw a biche in the forest on the way to work.  After some discussion, we concluded that the English word for this is a doe.  This got me thinking about just how rare some of the words that you know in your native language are.  I guess that if you belong to a hunting culture in North America, the words for deer might be a lot more common in your life than they are in mine–I don’t hunt, and the incidence of these words in my world must be incredibly low.  Yet: I have a reasonably well-developed vocabulary for talking about deer, like any other English speaker, I imagine.  Let’s see some French equivalents:


  • le cerf: deer.  (You probably already know that one, but I include it for completeness.)
  • la biche: doe.
  • la venaison: venison.

Where can you run into stars?

In a typical weekend in Paris, I don’t speak to a single person other than (a) anyone that I might call on the telephone, or (b) waiters and sales clerks.  Even during the work week, if my office mate and my host don’t come in and no one has lunch with me, I might not speak to anyone all day.  So, the word that I learnt over lunch today seems quite apropos.

I’ve run across the verb croiser in a couple of senses in the past.  One of the senses is “to cross,” e.g. crossing a street, as well as crossing one’s legs or folding one’s arms.  In biology, it can refer to crossing in the sense of hybridization.  I’ve also heard it used in the technical context of machine learning for natural language processing, as validation croisée, which is what we call “cross-validation” in English.

The meaning that I ran into today shows up when the object of the verb is a person.  In this case, it means “to run into or bump into someone.”  So, the picture of a magazine cover in this post reads “Where can you run into stars?” I was at lunch with some co-workers when one of them mentioned that she had been in the lab on Saturday and had only croiser‘d two people.  So glad that I croiser‘d those coworkers at lunch, since that meant that I actually got to talk to someone today!  However, when I asked my coworker, she said that I can’t say that I croiser’d a new word sense–it has to be a person.

Update, July 2nd, 2015: there will be a segment on the news channel that I watch tonight titled Ils ont croisé les terroristes, about people who in one way or another crossed the paths of various and sundry perpetrators of recent terrorist attacks in France.

Stereotypes about the French–in French

I posted a film about things that people (OK, mostly Americans) shouldn’t do in France.  Here’s one about stereotypes about the French, as explained in French.  I have to say: they nailed it.  Scroll down for some vocabulary, and then watch the video!  (Good luck understanding these folks–reeeeally hard to follow, if you’re not a native speaker, but good practice.)

  • parler fort: to speak loudly.
  • impoli: rude, in the sense of “impolite.”
  • vulgaire, grossier/grossière: rude, in the sense of “crude” or “vulgar.”
  • rude: hard, tough, harsh; dreadful, unkind, coarse; or, in slang, “great.”
  • pervers: perverse, wicked, grotesque; or, in the film, “pervy.”
  • marinière: striped shirt, striped sweater, French sailor’s shirt.
  • le truc: thing, thingy.  (I can’t believe that this word has never come up before–it’s, like, the most French word ever, and they don’t teach it to us in school.)

What not to do in France

This is pretty good!  Scroll down for some vocabulary items, and then watch the video! (If you’re American: the advice is quite good.  If you’re French: it’s probably pretty amusing.)

  • se moucher: to blow your nose, to wipe your nose.
  • There’s some talking about kissing and hugging–the bise that the French are comfortable with and Americans are very uncomfortable with, and the hug that Americans are very comfortable with and the French are very uncomfortable with.  These words are a potential minefield for non-native speakers, and it is not difficult for an American to find themselves talking about fucking someone when they meant to talk about kissing them.  I can’t actually hear the word that the speakers use in the film, but here’s a quick review.  I read somewhere a joke along the lines of French words for this kind of thing having undergone a sort of “promotion,” so that the word that formerly meant “to hug,” embrasser, can still mean that in a literary sense, but is more likely to mean “to kiss,” while the word that they teach us in French class as meaning “to kiss,” baiser, can definitely mean “to fuck.”  If you want to talk about hugging, it’s probably safest to use prendre dans ses bras (to take into one’s arms).  To say hello or goodbye to someone with a kiss is to faire la bise.
  • la joue: cheek.
  • serrer la main à qqn: to shake hands with someone.

Tea for tu and tu for tea

This past week I attended a French conference.  Almost all of the talks were in French, which means that I spent most of the week hurriedly scribbling words in my notebook to look up later.  From a linguistic perspective, the most surprising thing to me was that during the questions-and-answers after a talk, people in the audience addressed the speaker with the informal pronoun tu, and the speaker addressed people in the audience with the informal pronoun tu, as well.  Like many languages, French has formal and informal forms of the word you.  Tu is the informal form of the word, and vous is the formal form of the word.  (Some languages also have distinct plural formal and informal forms of the word you.  In French, however, both of those are vous, the same as the singular formal form.)  You will hear lots of allegedly simple explanations of when to use each one of these, but in practice, it is far more complicated, and I often goof.  Part of the reason that I love my French tutor back home is that the first two words that she taught me were the verbs tutoyer, meaning “to address someone as tu,” and vouvoyer, meaning “to address someone as vous;” these have turned out to be really useful, because often the first thing that someone who I know professionally says to me is “we can tutoyer.”

I asked some native speakers why vous isn’t used in this (pretty) formal situation, i.e. a presentation at an academic conference.  Answers that I got were that everybody at the conference knows everyone else, and it’s weird to call someone that you know vous, and that besides, it’s not that large of a conference, actually.

Other than that particular linguistic observation, I mostly marveled at my own ineptitude.  Why is it that I can read books about lexical semantics in French, but the only spoken things that I understood one particular day were J’ai trop mangé (“I ate too much,” said by someone a couple of seats over from me after lunch), and on t’entend pas (“we can’t hear you”–said by an attendee to a soft-spoken speaker–note the informal t’ (a shorter form that occurs before a vowel), rather than the formal on vous entend pas)?  If it weren’t for the PowerPoint slides, I would be lost all day…

In light of the embarrassment of riches as related to words that I didn’t know over the course of the past week, I’m just going to focus on verbs today:

  • repérer: lots of meanings related to noticing, spotting, or finding things.  In my field, it shows up in the related nominal form repérage d’entités nommées,  which we express as “named entity recognition” in English.
  • nettoyer: to clean or, figuratively, to clean out.  You might need to nettoyer the data from a web page.
  • constater: to note, notice, or observe; to record or certify.  Nous avons évalué l’ensemble de ces résultats et nous constatons une amélioration sur l’acquisition de paraphrases sous-phrastiques.  “We have evaluated the set of these results, and we note an improvement in the acquisition of sub-phrasal paraphrases.”  Bouamor, Max, and Vilnat, Combinaison d’informations pour l’alignement monolingue. 
  • attraper: various meanings having to do with grabbing hold of or catching something.  This came up in the context of a discussion of a French convention that I’m not sure I understand, by which students can choose to skip an exam and take a make-up.  Apparently this happens often at the university level, if I understood correctly.  I also heard réattrapage in the same context.
  • prendre en compte: to take into account, but also to take on board.  Elle permet aussi de prendre en compte les positions relatives du nom et de l’adjectif (postposition ou antéposition) dans le calcul du sens.  “It also allows taking into account the relative positions of the noun and the adjective (postposition or anteposition) in the calculation of the meaning.” Venant (2007).  Utiliser des classes de sélection distributionnelle pour désambiguïser les adjectifs.
  • engendrer: to cause, produce, or create; to lead to or bring about; to engender.  There are also some meanings related to procreation.  STAG a été utilisé avec succès dans une grammaire anglaise qui permet d’engendrer simultanément les analyses syntaxique et sémantique d’une phrase… “STAG has been used successfully in an English grammar, which permits producing syntactic and semantic analyses of a sentence simultaneously.”  Danlos, STAG: un formalisme pour le discours basé sur les TAG synchrones.  (Don’t quote my translation of the relative clause in this one–I’m not sure that I got it right.)
  • disposer de: to have available, or to have at your disposal; to manage, run, or order.  (There are other meanings if you don’t have the preposition, as well as reflexives.)  Ce dont on a le plus besoin en TAL, c’est de disposer de lexiques à large couverture… “What we need the most in [natural language processing] is to have available large-coverage lexicons…”  Maurel and Tran, Prolexbase: Un lexique syntaxique et sémantique de noms propres.
  • se démarquer: to distinguish yourself, differentiate yourself, distance yourself, stand out from; in sports, to free yourself or to get free.  (Very different non-reflexive senses, as well.)  Un accent peut être stigmatisé, dévalorisé et générateur de ségrégation, ou au contraire revendiqué pour affirmer son identité, sa loyauté, son intégration à une communauté et se démarquer d’un autre groupe… “An accent can be stigmatized, deprecated, and a factor of segregation, and quite the opposite, claimed [remember that we saw this word used in a news story that talked about ISIS claiming credit for a terrorist attack] to affirm one’s identity, one’s loyalty, one’s integration into a community, and to differentiate oneself from some other group…” de Mareüil, Vieru-Dimulescu, and Adda-Decker, Accents étrangers et régionaux en français.
  • remarquer: to note or see; to notice; also to relabel or remark.  A l’aide de cette courbe, nous pouvons remarquer que globalement, au dessus de 2000 mots, l’information mutuelle des mots se “stabilise”… “With the aid of this curve, we can see that globally, above 2,000 words, the mutual information of the words ‘stabilizes’…” Brun, Smaili, and Haton, WSIM: une méthode de détection de thème fondée sur la similarité entre mots.  A few related expressions:
    • faire remarquer: to make your point
    • faire remarquer qqch à qqn: to point something out to someone
    • se faire remarquer: to stand out, to get yourself noticed
  • déclencher: to trigger, to cause, or to set something off.  In the conference, it was used in the sense of “triggering” the execution of a rule.  During the time of the conference, a couple of guys attacked an emergency vehicle in southern France, and the verb déclencher was used to describe the action of the police initiating a wide search for the miscreants.  Here’s an example from Twitter, just because I’m getting tired of typing citations from journal articles.  It means “Dylann Roof has admitted that he wanted to set off a race war:”
  • Screenshot 2015-06-28 19.45.42
  • rapprocher: lots of meanings having to do with bringing something closer to you.  (In the reflexive, it’s approaching something.)  …a été effectuée en s’efforçant de rapprocher les jeux d’étiquettes de ces deux corpus…  “…has been carried out while endeavoring to bring together the tag sets of these two corpora…”  Falaise, Intégration du corpus des actes de TALN à la plateforme ScienQuest.
  • ajouter: to add or (in computing) append. …il est utile d’ajouter à l’annotation…  “…it is useful to add to the annotation…”  Bonfante, Guillaume, Morey, and Perrier, Enrichissement de structures en dépendances par réécriture de graphes.

That’s an awful lot of words!  And, that’s just some of the verbs–imagine how many nouns there were, too…  If you don’t recognize the cultural reference in the title to this blog: it comes from an old musical number called Tea for Two.  You can hear Doris Day sing it while wearing a dress with sparkly hems here.  If memory serves, the lyrics include Tea for two, and two for tea, me for you, and you for me…

Charity and leather goods

In America, there is a huge genre of books about France and the French.  What American hasn’t at least heard of Mireille Guiliano’s book French women don’t get fat, and perhaps even read it?  (643 reviews on Amazon, average of four stars.) Many of them are pretty much just full of stereotypes, but some attempt real analysis.  One of the things that I’ve read in such a book is that the French are, in general, less charitable than Americans.  The explanation given is that in America, the assumption is that people in need will be taken care of by their community and religious organizations and the government will just take up the slack, while the assumption in France is that people will be taken care of by the government, while communities and religious organizations will just take up the slack.

This is belied somewhat by the fact that the streets of Paris are full of beggars, and there is money in their cups.  Yesterday I found some more evidence that the French are perhaps not so much less charitable than Americans as is thought to be the case.  The picture in this post was taken from the side of a bin for collecting things for the poor.  Of course we have such bins in America, too, but I think that what it says on the side of this bin is compelling.  Such bins in America typically have a sign saying something like “please do not put trash in this bin.”  The sign on the bin that I saw yesterday instructs you that the following can be placed in the bin:

  • “Clean and dry clothes and household linens in a closed sack”
  • “Shoes tied together by pair”
  • “Leather goods”

What I thought was so striking about this was the idea that people would donate leather goods–presumably jackets and the like.  I can’t imagine an American donating a leather jacket to charity–they’re far too expensive to give away.  Words that I learnt in the course of the day:

  • la maroquinerie: this can mean leather goods, and also a leather goods shop.
  • sortir en boîte: to go clubbing.  This has nothing to do with charity–I saw it in an advertisement in a newspaper that was sticking out of a trashcan.  Yes, linguists collect data constantly, even from newspapers sticking out of trashcans.  No, I did not take it out of the trashcan.

My apartment reeks of camembert and paint fumes

Camembert is sold in wooden boxes.  Here is one with a picture of a poilu, or French soldier from World War I, on the lid.

France has hundreds of cheeses.  You hear lots of exact numbers, but I suspect that no one really knows how many there are.  Camembert is perhaps the most French of the French cheeses–it is the Frenchman’s stereotype of a French cheese.  (If you’re French: Americans think that the stereotypical French cheese is a brie.  We can’t get camembert worth the name in America–raw-milk cheeses aged less than 60 days are illegal.  Yes, illegal.)

Every French cheese has a story.  The story of camembert is that it was created by one Marie Harel when a priest fleeing to England around 1790 gave her some suggestions based on how they made cheese back in his home in Brie.  (The Church was attacked with a vengeance during the French Revolution.  Over 200 priests were killed in the September Massacres in Paris in 1792.  I went to a beautiful Vivaldi concert nearby.)  According to Kathe Lison’s delightful The Whole Fromage: Adventures in the Delectable World of French Cheese, camembert makers distributed it for free to soldiers in the trenches during World War I, hoping to create loyalty, and it worked.

Part of camembert’s charm for Americans (when we can actually buy it, which is when we come to France) is that it smells like we think a French cheese ought to smell: pretty bad.  The hallways in the apartment that I’m renting were just painted, and the combination of the smell of the camembert sitting on my kitchen counter and the fresh paint is…intoxicating, and not in a good way.  Still, the camembert made for a great dinner tonight with the stereotypical baguette and red wine–shoot me, I’m a tourist.  Here are some words that are helpful for reading about camembert:

  • puisque: since, because, seeing as; just as, just like.
  • le convive: guest.

Devenu le symbole de la France avec la baguette de pain et le verre de vin rouge, il a une taille idéale pour un fromage, puisqu’on peut le manger en une seule fois à quatre ou cinq convives.  “Having become the symbol of France along with the baguette and the glass of red wine, it has the ideal size for a cheese, because one can eat it at one sitting with four or five guests.”

Hawaiian shirts turn out not to be the way to go in the Parisian workplace

In France, it’s important not to look like everyone else.  It’s also important to be in style.  This, obviously, creates a conflict.  I was feeling whimsical when I packed, and decided to structure my summer wardrobe around my collection of Hawaiian shirts.  (I can only bring so many pieces of clothes for a six-week stay, so packing well is really an issue.)  Today I happily put on one of those shirts–a bright blue one that matches my eyes.  It didn’t go over well at the lab.

My office mate Brigitte: “So, what’s up with your shirt?”
Me: “I…um…likes Hawaii.”
Brigitte: “That’s not a work shirt, that’s a vacation shirt!  So, you’re here on vacation?”
Me: “I…um…works?”
Brigitte: “You need to change your stock of shirts.”

Brigitte is a scream.  Of course, Zipf’s Law struck in this conversation, as in any other:

  • renouveler: to renew, change, or (in the case of a contract) extend.  This is the verb that Brigitte used.
  • le stock: believe it or not, this is a French word, and it’s spelt stock, which is about as un-French of a spelling as you can imagine.  There are actually some related words:
    • le stockage: storage, store.
    • stocker: to store or hoard; to stock up on; to stock something (with an intent of selling it).

…or even read my own writing

One of the many things that is embarrassing in a foreign language: not being able to read your own writing.  I recently wrote a paper with a couple of my fellow computational linguistics folks, one of whom is French.  I wrote the first draft; she translated it into French and then added more material, made it into a better paper, etc.  It was discouraging when Zipf’s Law struck in a translation of my own writing, and I couldn’t read my own paper!  Happily, the Poisson distribution struck, too, and the great podcast Coffee Break French had a segment on one of the words that I didn’t know: voire.  This word translates as something like or even or and even.  Here’s an example from my paper:

Les chercheurs qui ont organisé la campagne ont également été touchés, voire bouleversés par leur contact avec ce corpus.  “The researchers who organized the project have also been affected, and even devastated, by their contact with this corpus.”  (A corpus is a collection of analyzed linguistic data.)

Or, le Web 2 a permis l’apparition de plate-formes de myriadisation du travail parcellisé (microworking crowd-sourcing), dont Amazon Mechanical Turk, qui proposent à des demandeurs (Requesters) d’accéder à une «foule» de travailleurs (Turkers), qui sont très peu, voire pas du tout, rémunérés.  “But, the Web 2.0 has allowed the appearance of microworking crowd-sourcing platforms, among them Amazon Mechanical Turk, which offer “Requesters” access to a “crowd” of workers (Turkers), who are paid very little, or even not at all.”  (Couillault, A., & Fort, K. (2013, July). Charte Éthique et Big Data: parce que mon corpus le vaut bien!. In Linguistique, Langues et Parole: Statuts, Usages et Mésusages (p. 4).)

Some Twitter examples:

Screenshot 2015-06-04 14.57.52

“I also commit myself to reviewing a little bit every evening in order to pass my bac (high school exit exam) in French with flying colors, and even the science one.”

Screenshot 2015-06-04 14.58.47

“The only interesting courses are those on history and French–the English is at a kindergarten level, or even nonexistent.”

Screenshot 2015-06-04 15.00.25

“There is a chasm–or even two chasms–between the French YouTubers and the English-speaking YouTubers.”

Thanks to Coffee Break French for clearing this Zipf’s Law example up for me–if you’re interested in learning French at any level, from complete beginner to advanced, check out their podcast and web site.

Don’t be shy: Ask your Parisian taxi driver about Uber

I should start by saying that I have had some great experiences with taxi drivers in Paris.  The West African immigrant who got me from the airport to my apartment for 40 euros when it should have cost 50, the guy who plowed through downtown traffic like a crazy man to get me to the opera on time–I’ve never really felt like I was getting ripped off here.

But, who doesn’t hate taking a taxi in a strange city?  I always feel like I have to do something to demonstrate that I’m not some tourist to be driven in circles around the périphérique for two hours.  In Paris, the obvious way for an American to do that is by speaking French.  But, besides the fact that I don’t speak French well, there’s also the issue that unlike in the United States, where you might know your taxi driver’s children’s names and grade point averages by the time you get where you’re going, it’s culturally weird to have a conversation with someone you don’t know here.  So, how to establish your Parisian bona fides?  My latest hypothesis is that you do this by asking your taxi driver if they have Uber here yet.  It turns out that they do, and if your taxi driver is anything like mine was this morning as I made my way into town from the airport, he’ll have a lot to say about it.  I tried the Uber approach this morning.  It did start up a conversation, and when the traffic became completely impossible–the quarter finals of the French Open are today, and the King of Spain is driving through town for some reason–my taxi driver became a madman and got me where I was going faster than might have happened otherwise.  Words that I learnt in the course of the ride from the airport:

  • boucher: this is a noun, meaning “butcher,” but it’s also a verb, with meanings that have to do with blocking things.  So, it can mean “to cork” a bottle, and “to plug” or “to seal” a hole or a crack.  In the case of traffic, it is “to block.”  As my taxi driver said in frustration as he tried yet again to get off of the freeway: C’est bouché partout, partout, partout!  “It’s blocked everywhere, everywhere, everywhere!”
  • le débouché: an outlet, opening, or exit.  There’s also a verb déboucher that means things like “to unblock” and “to uncork.”  When we broke free of traffic thanks to the driver’s heroic exertions, I happily said débouché!  He responded glumly, pour le moment–“for the moment.”