Burstiness in language and the liberation of Paris

In which language displays interesting statistical properties, some people get fired, and I learn a few words about the Army.

Twenty-plus years ago, I got my first job as an actual, card-carrying linguist, working for a company that did things with big collections of linguistic data, using them to improve computer programs that did speech recognition, i.e. figuring out what words a person is saying.

One fine day the people that gave us the vast majority of our income sent their big-collection-of-linguistic-data specialist to visit us. We demonstrated to him the computer program that we had built to answer the question: how can you tell when a big collection of linguistic data is big enough? We pointed out how to spot the tell-tale sign on a graph that means “it’s big enough.” His response: “Oh, that just means that linguistic data is bursty.”

The blue line shows a big collection of linguistic data that is not nearly big enough. The other lines show big collections of linguistic data that are big enough. The telltale sign: a line that has gotten flat. Picture source: Irina Temnikova, Negacy Hailu, Galia Angelova, and K. Bretonnel Cohen. “Measuring closure properties of patent sublanguages.” In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, pp. 659-666. 2013.

What did he mean by “bursty?” We had a guess, but weren’t exactly sure, and given that his company paid us a lot of money and he was their expert, my boss thought it best not to push back. A few months later, they declined to renew our contract, and our owner laid everyone off and went away to do something else. Was it because we didn’t push back on the big-collection-of-linguistic-data expert’s dismissiveness? Probably not–our little company committed far bigger errors, and on a sadly regular basis. Whatever–the job market for computational linguists was not terrible in those days (it’s pretty wonderful now), and I found my second job as an actual, card-carrying linguist pretty quickly. But: burstiness is pretty important, and it continues to bump into my life today, in various and sundry ways, some of which will be of interest to readers of this blog.

What burstiness means: per Wikipedia,

In statistics, burstiness is the intermittent increases and decreases in activity or frequency of an event.

Wikipedia, Burstiness

In plain English: burstiness is present when something doesn’t happen for long periods of time, but then happens a lot, and then goes back to not happening very often. Some things that have this characteristic: hurricanes, and pandemics. Statisticians care about burstiness because bursty things are difficult to characterize with normal statistics, so you have to come up with new techniques to work with them; people like disaster planners and public health experts care about those statistics because it is difficult to predict, and therefore to plan for, things that have weird statistical properties.

From a computational linguist’s perspective, burstiness is important because in big collections of language, you don’t see new words very often, but when you do, sometimes you see a lot of them at once. If you’re trying to do something like build a dictionary for a computer program, you typically do that by finding all of the words in a big collection of linguistic data. But, how do you know when your collection of linguistic data is big enough? See above; the problem is that if you keep growing the collection, you know that there will be bursts of new words, but you can’t keep growing your collection forever–at some point, you have to stop and work with what you have at hand.
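For the programmers in the audience, here is a minimal sketch of the "is it big enough?" graph described above. To be clear, this is not the program that my old company built; the function names (`vocabulary_growth`, `new_words_per_window`, `looks_flat`), the window size, and the toy data are all made up for illustration. The idea is just to count how many distinct words you have seen after every `step` tokens, and then check whether the last few windows contributed anything new.

```python
def vocabulary_growth(tokens, step=1000):
    """Return (tokens_seen, distinct_words_seen) pairs, sampled every `step` tokens."""
    seen = set()
    curve = []
    for count, token in enumerate(tokens, start=1):
        seen.add(token.lower())
        if count % step == 0:
            curve.append((count, len(seen)))
    return curve


def new_words_per_window(curve):
    """How many previously unseen words each sampled window contributed."""
    sizes = [0] + [vocab for _, vocab in curve]
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


def looks_flat(curve, windows=3, tolerance=0):
    """True if each of the last `windows` windows added at most `tolerance` new words."""
    recent = new_words_per_window(curve)[-windows:]
    return len(recent) == windows and all(n <= tolerance for n in recent)


# Toy "corpus": a hypothetical stand-in for a real collection of text.
toy = "a b a b c a b a b a b a b".split()
curve = vocabulary_growth(toy, step=2)
print(curve)                        # the growth curve
print(new_words_per_window(curve))  # a small burst shows up in the third window
print(looks_flat(curve))            # flat over the last three windows
```

Note what the flat line does and does not tell you: it says that nothing new has turned up lately, not that nothing new ever will. That is exactly where burstiness bites.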

Many of our dear fellow readers are engaged in learning a language that they don’t already speak. I am one of them–if you have been reading this blog for a few years, you have followed my feeble attempts to learn la langue de Molière, also known as “French.” By now I know the language well enough that I can pick up a book in it and not have to turn to a dictionary very often. But, when I do, it typically happens like this…

Right at this moment, I’m reading Paris brûle-t-il ?, “Is Paris burning?,” the work of reference on the liberation of Paris. I typically get through about three pages before I have to look up a word. But, then this morning, I’m reading about the French 2nd Armored Division rolling from Normandy to Paris when I come across this sentence. I had to look up all of the words in bold face:

Glissant en silence sur leurs six roues de caoutchouc, les automitrailleuses des spahis à calots rouges, “chiens de chasse” de la division, ouvraient la marche.

Dominique Lapierre and Larry Collins, Paris brûle-t-il ?, published by Robert Laffont in 1964.
  • l’automitrailleuse: a light armored vehicle.
  • le spahi: native cavalry trooper of the Maghreb.
  • le calot: garrison cap in English; when I was in the Navy, we called them “cunt caps.” A calot has no brim or visor, and therefore can be folded flat and tucked under the epaulet of a military jacket.
A Russian garrison cap, or calot in French, pilotka in Russian.

After that, it was back to my normal rate: about one word every three pages. That certainly counts as “not very often,” and is pretty good for a non-native speaker. To then jump to three words in a single sentence, and then go back to my base rate of one word every three pages, is a good example of burstiness. Once again, we see why one might write a blog like this one–a blog about the statistical properties of language and their implications for people who are trying to learn one. What happened to the dismissive big-collections-of-linguistic-data expert? I don’t know for a fact, but I do know that people who are dismissive of the opinions of others don’t typically have much professional success. Personally, I took what I learned from the experience of working at a failed software start-up to do a better job of being a computational linguist, and have had a wonderfully fun time with it. Want to try a career in computational linguistics yourself? Start here if you are not a graduate student, or here if you are, and I hope you have as much fun with it as I have!

French notes:

Despite what its name would lead one to think, an automitrailleuse does not necessarily carry a machine gun. Here are some pictures of modern automitrailleuses. You’ll notice that some of them look a lot like tanks. The salient differences are that (1) they weigh less, and (2) they have wheels, not treads.

How we’re sounding stupid today: On the propriety of examples

My first language is (American) English, but I speak French well enough that if I want French people to believe that I’m an American, I have to convince them of it. Comparing and contrasting French and American political appartenances helps, as does my ability to explain the difference between felonies and misdemeanors and how they affect the length of your prison sentence. Why it doesn’t occur to me to just speak English with them, I couldn’t tell you–I’ll have to try it some time…

My ability to speak French well doesn’t mean that I don’t make absolutely stupid mistakes, though. Case in point: propreté and propriété. One means “cleanliness” and one means “property,” but if I need to say either “cleanliness” or “property” in French, which of those two words propreté and propriété will come out of my mouth is pretty random. How random? I’d guess a 50/50 chance for either of them. So, how often do I say the right/wrong word? Let’s figure it out.

First, we have to make some assumptions. Assumption #1: the probability of me needing to say cleanliness and the probability of me needing to say property are equal. If we don’t make that assumption, then we have to adjust the calculation of how often I say the wrong word to account for how often each of those two words gets said. By me. In French. Complicated? Yes. Hence: Assumption #1.

Assumption #2: the probability of me saying the right word and the wrong word are equal. Otherwise, we have to adjust our calculations of how often I say the wrong word to account for different probabilities for each. By me. In French. Complicated? Yes. Hence: Assumption #1, and Assumption #2.

With those assumptions in place, let’s figure out the possible outcomes in a situation where I need to say one of those words, say “cleanliness”:

  1. I need to say “cleanliness” and I say propreté (the right word)
  2. I need to say “cleanliness” and I say propriété (the wrong word)

We have two possible outcomes (that’s the technical term), and by Assumption #2 they are equally likely, so the probability of either of them is 1/2, or 0.50, or 50%.

It works the same way if I need to say “property”–there are two outcomes:

  1. I need to say “property” and I say propreté (the wrong word)
  2. I need to say “property” and I say propriété (the right word)
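If you want to see what the two assumptions are buying us, here is a small sketch of the arithmetic. It weights the chance of saying the wrong word in each situation by how often that situation comes up, then adds the two together. The function name `p_wrong` and the "relaxed" numbers are made up for illustration; only the 50/50 values come from the assumptions above.

```python
from fractions import Fraction


def p_wrong(p_need_cleanliness, p_wrong_given_cleanliness, p_wrong_given_property):
    """Overall probability of saying the wrong word, across both situations."""
    p_need_property = 1 - p_need_cleanliness
    return (p_need_cleanliness * p_wrong_given_cleanliness
            + p_need_property * p_wrong_given_property)


half = Fraction(1, 2)

# Assumptions #1 and #2 both hold: everything is 50/50.
baseline = p_wrong(half, half, half)

# Hypothetical: Assumption #1 relaxed (I need "cleanliness" 3 times out of 4).
skewed_need = p_wrong(Fraction(3, 4), half, half)

# Hypothetical: both relaxed (and I only botch "cleanliness" 1 time in 4).
both_relaxed = p_wrong(Fraction(3, 4), Fraction(1, 4), half)

print(baseline, skewed_need, both_relaxed)
```

As long as I am equally bad at both words, it doesn’t matter how often I need each one. The calculation only gets complicated once the error rates differ, which is why the assumptions are there.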

Back to our original question: how often do I say the right/wrong word? Well… we need to change the question. To wit: to know how often I say the right/wrong word, we would need to know the probability of me saying every word that I say, and calculate the probabilities of me getting them right/wrong.

However: I don’t give a fuck about that. What seems funny to me about the fact that I am equally as likely to fuck up the words cleanliness and property is that they’re so fucking…common. I mean, I don’t have a problem with the vocabulary for talking about, say, why we have the Electoral College or why Beaux Arts Victorian houses aren’t built any more, but I can’t talk about the fact that if my little corner of New Orleans gets flooded in the next couple days, I am going to have some hot, sweaty, bug-infested work ahead of me as soon as I can get a plane ticket back there. Yes, friends and family: I am safe and sound in Colorado.

Note to self: propreté is pretty close to propre, “clean”–maybe that can help me remember? And for practice (just in case writing this blog post wasn’t enough), here are some sentences to practice with, courtesy of the Sketch Engine web site, your home for fine linguistic corpora and the tools for searching them. Scroll down for the answers:

  1. Je suis en train de vendre ma ______.
  2. Il y a des efforts à faire concernant la ______ de la piscine.
  3. Comment conserver la ______ d’une salle de bain ?
  4. Dans les années 60, on a étudié les ______ de trous noirs.
  5.  La couleur blanche est rattachée généralement à la pureté et à la ______ .
  6. Balai vapeur hyper polyvalent – pour plus de ______ dans la maison !
  7. Ton corps même n’est pas ta ______ ; comment pourrais-tu posséder le Tao ? (See Taoist scripture for an explanation.)
  8. Les jeux vidéo ne sont pas la ______ exclusive de ces hommes blancs cishétéros.
  9. Les oreilles : vérifiez régulièrement la ______ des oreilles de votre chien.
  10. Actuellement la ______ appartient à la commune.
  11. Tous les jeux flash présent sur le site restent la ______ de leurs auteurs respectifs.
  12. Votre langue, cher monsieur Walder, est révélatrice de l’état de ______ du sexe de votre femme, point barre.
  13. Les sanitaires sont d’une ______ immaculée et il y a même des machines à laver.
  14. …les métaphores qui transposent certaines ______ d’une catégorie à une autre : “l’homme est un loup pour l’homme”…
  15. Entretien des trottoirs : Chaque Soiséen est responsable de l’état de ______ du trottoir qui borde sa ______.
  16. Nettoyage: Les frais de nettoyage (50,00 Euros) vous seront rendus à la fin de votre séjour selon l’état de ______ de la ______.
  17. Un matériau aux multiples ______ – résistance, ultra______ – et qui s’adaptent aux dimensions de vos projets.
  18. Il n’a cependant pas les ______ ou la ______ du biométhane naturel.

Picture source: https://www.accuweather.com/en/weather-news/how-a-hurricanes-dirty-side-factors-into-the-storm-surge-it-produces/801756. Scroll down for the answers to the exercise!

  1. Je suis en train de vendre ma propriété.
  2. Il y a des efforts à faire concernant la propreté de la piscine.
  3. Comment conserver la propreté d’une salle de bain ?
  4. Dans les années 60, on a étudié les propriétés de trous noirs.
  5.  La couleur blanche est rattachée généralement à la pureté et à la propreté .
  6. Balai vapeur hyper polyvalent – pour plus de propreté dans la maison !
  7. Ton corps même n’est pas ta propriété ; comment pourrais-tu posséder le Tao ? (See Taoist scripture for an explanation.)
  8. Les jeux vidéo ne sont pas la propriété exclusive de ces hommes blancs cishétéros.
  9. Les oreilles : vérifiez régulièrement la propreté des oreilles de votre chien.
  10. Actuellement la propriété appartient à la commune.
  11. Tous les jeux flash présent sur le site restent la propriété de leurs auteurs respectifs.
  12. Votre langue, cher monsieur Walder, est révélatrice de l’état de propreté du sexe de votre femme, point barre.
  13. Les sanitaires sont d’une propreté immaculée et il y a même des machines à laver.
  14. …les métaphores qui transposent certaines propriétés d’une catégorie à une autre : “l’homme est un loup pour l’homme”…
  15. Entretien des trottoirs : Chaque Soiséen est responsable de l’état de propreté du trottoir qui borde sa propriété.
  16. Nettoyage: Les frais de nettoyage (50,00 Euros) vous seront rendus à la fin de votre séjour selon l’état de propreté de la propriété.
  17. Un matériau aux multiples propriétés – résistance, ultrapropreté – et qui s’adaptent aux dimensions de vos projets.
  18. Il n’a cependant pas les propriétés ou la propreté du biométhane naturel.

English notes: In my defense, a big part of my problem, I would guess, comes from the fact that English has the word propriety. The Merriam-Webster web site gives these synonyms for it: decency, decorum, form. Examples:

  1. Zipf, I’m not sure about the propriety of that example about the cleanliness of Mr. Walder’s tongue.
  2. President Obama was the very *model* of propriety. Never once did he say or do anything to make America ashamed of him. (Source: Twitter)
  3. Even inside the nation’s prominent law firms preparing to help President Trump wage a legal war challenging the results of the election, concerns are intensifying about the propriety and wisdom of working for Trump, the New York Times reports. (Source: a tweet from the San Francisco Chronicle)

Pictures worth a thousand definitions

Google Images is not the best thing to happen to language learning, but it’s pretty fucking good sometimes. Case in point: today I wanted to know what joufflu means in French. I might forget the definition I read, but I won’t soon forget the pictures that Google Images gave me when I searched for it:

Le Président, joufflu aux pommettes rosées, l’air austère, me regarde dans les yeux sans laisser paraître aucun sentiment.

Henri Charrière, Papillon

I wanted to check my guess that a cuistot is closer to a short-order cook than to a chef–Google Images pretty much confirmed it:

Picture source: unevieencuisine.com/actualites/cuisinier-vs-cuistot.html

You’re confused because WordReference says that a flingue is a gun, but the Frenchies around you keep using it to refer to pistols? Google Images will straighten you out–turns out WordReference doesn’t quite have it right this time:

Is, as they say, a picture worth a thousand words? As a scientist, I’m always skeptical of exact numbers, but it’s certainly worth a lot of definitions…

English notes

Joufflu is a noun–there’s also a feminine form, joufflue. According to my Quillet (a damn nice dictionary, by the way), it’s a person qui a les joues pleines. As far as I know, there is no equivalent noun in English. We would use the adjective chubby-cheeked if we didn’t mean anything bad by it, and jowly, from the noun jowl, if we did.

Now, I know what you’re about to ask: does the noun jowl come from joue (French for “cheek”)? I mean, we stole, like, 80% of our vocabulary from French (the percentage varies depending on whether you’re talking about the contents of a good dictionary or the (relatively) common vocabulary of everyday life–we rarely escape from Zipf’s Law), so why not this word, looking as much like joue as it does?

Merriam-Webster says otherwise. It’s helpful here to know that the noun jowl has multiple, but related, meanings. Here’s the most common one:

usually slack flesh (such as a dewlap, wattle, or the pendulous part of a double chin) associated with the cheeks, lower jaw, or throat

Merriam-Webster entry 1

Note that it includes the cheeks–that’s why it comes to mind for me in this context–but, other parts, too. Most pertinent to the current question: the throat. For this specific meaning, Merriam-Webster postulates the following etymology:

alteration of Middle English cholle, probably from Old English ceole throat

Merriam-Webster, again

Where it gets surprising to me is that the second entry has a different etymology for a related, but different, sense. Here’s entry 2 for jowl:

1a: CHEEK sense 1

b: the cheek meat of a hog

2a: JAW, especially MANDIBLE

b: one of the lateral halves of the mandible

more Merriam-Webster

…and its etymology as per Merriam-Webster:

alteration of Middle English chavel, from Old English ceafl; akin to Middle High German kivel jaw, Avestan zafar- mouth

…and yet again, Merriam-Webster

Two distinct etymologies for two pretty clearly related senses of the same word? Well: we are occasionally visited here by an actual lexicologist, and a good one (with whom I had the pleasure of having a nice cup of coffee a couple weeks ago, but that’s another story). See the comments below for his response (I hope)!

Who comes to mind for me when I think of the word “jowly:” Richard Nixon. He used to be thought of as the worst American president EVER–his much-later successor Trump has certainly restored Nixon’s reputation… Picture source: https://www.usnews.com/news/special-reports/the-worst-presidents/articles/2014/12/17/worst-presidents-richard-nixon-1969-1974

…and one more thing, and I’ll shut up. Here’s a recording of Henri Charrière, author of the quote that I gave you above for the word joufflu. Charming Ardèche accent (I think it’s ardèchois–Phil d’Ange?):

How to get a master’s degree in computational linguistics

How do you get a master’s degree in computational linguistics? The first step is to apply to a master’s degree program–and here is one! “Natural language processing” is a field that overlaps a lot with computational linguistics, and many people use the terms interchangeably. (See this post from the Zipf’s Law blog for the subtleties of the differences between the two.) From the details of this call for applications, particularly the mention of internships at local tech companies, one would guess that this program is more oriented towards building software than towards exploring theory, which in my view of the world is the basic difference between the two. You’ll find some notes on the language of higher education in the English Notes at the end of the post. Bon week-end!

CALL for APPLICATIONS: 1 year M.S. Program in Natural Language Processing at UCSC Silicon Valley

Natural language processing (NLP) is a rapidly growing field with applications in many of the technologies we use every day, from virtual assistants and smart speakers to autocorrect. UCSC has created a unique Master’s program in NLP to provide students with the skills and in-depth knowledge of NLP algorithms, technologies, and applications that are in high demand in both industry and academia. Our program goes beyond the classroom by supplying students with industry-relevant projects for the kind of real-world experience that is essential for a successful career in NLP.

Please see the attached brochure and  https://grad.soe.ucsc.edu/nlp/ for details. 

Program Highlights:

– 1-year program, including a 3-quarter capstone project

– Core courses covering all aspects of NLP

– Instruction and capstone project collaborations with experts from industry giants like IBM, Microsoft, Amazon and Bloomberg

– State of the art facilities in the heart of Silicon Valley

We are seeking a diverse pool of applicants to this program.

Some partial fellowships may be available.

Applications are now open. See https://grad.soe.ucsc.edu/nlp/ for instructions. 

Application deadline is February 3, 2021 for Fall 2021 admission.

Questions can be directed to nlp@ucsc.edu

English notes

core course: A class that must be taken by all students in a program of study. In my graduate program, that means (a) an intensive one-semester course that is roughly equivalent to getting a master’s degree in biology–yes, it is fucking hard–and (b) a two-semester course that covers the major fields of computational biology. In a typical linguistics program, the core courses include syntax, phonetics, and phonology.

capstone project: A research or software project that is meant to have you use all of the skills that you learned during your studies. (Capstone is another word for keystone–most literally, the stone in an arch that holds everything else together by being the focal point for the forces in the arch. See the picture at the top of the post.) In a doctoral program, that would be your dissertation; in a master’s degree program, it typically takes the place of writing a thesis.

Questions can be directed to… This is a pretty formal way of telling you who to send your questions to. (If you prefer: to whom to send your questions.) I think “to direct to” is maybe roughly equivalent to s’adresser à in French–a kind native speaker will tell me if I’m wrong about that.

Picture credit, directly from Wikimedia: This image comes from the Lexikon der gesamten Technik (dictionary of technology) from 1904 by Otto Lueger.

I had a dream: Subjunctives in English and elsewhere

So, I laid in bed staring at the ceiling for what felt like an interminable amount of time.  Finally got up and looked at the clock: 4 AM. Before I woke up, I’d been dreaming that I was sleeping really well.  In fact, I wasn’t sleeping well at all.

Before I woke up, I’d been dreaming that I was sleeping really well.  In fact, I wasn’t sleeping well at all: English is my native language, but I’m not sure that what I just said makes sense.  It seems hopelessly unclear.  Is it the case that “in fact” I wasn’t sleeping well in the dream, or that “in fact” I wasn’t sleeping well in real life?  In fact, I wasn’t sleeping well in real life–I was just dreaming that I was.  I don’t know of a way to disambiguate that in English.  What is called for here: a language with a robust past tense of the subjunctive.

The subjunctive mood is the term that is usually given for grammatical structures that express things that are in the realm of wishes, desires, opinions, and possibilities, as opposed to things that are facts.  It just barely exists in English, and as far as I know, in English it is always optional.  To the best of my knowledge, the subjunctive only exists for the verb to be.  Here’s what it looks like, in typical American English and in the Pacific Northwest dialect.  This is a way that you can give someone advice:

  • Typical American: If I was you, I wouldn’t do that.
  • Pacific Northwest: If I were you, I wouldn’t do that.

The difference: in typical American English, you would use the past tense form for the first person singular: was.  In the Pacific Northwest, you use were.  We use was for the past tense, of course–it’s only in the subjunctive that you see this weird use of the were form.  You use it for other persons, too, in the subjunctive:

  • Typical American: If he was smarter, he wouldn’t have done that.
  • Pacific Northwest: If he were smarter, he wouldn’t have done that.

Well: English does not have a robust past subjunctive at all.  Some languages do, though.  How might I talk about my dream in one of them?  Let’s consider some options.

Modern colloquial French does not have a robust past subjunctive at all.  Literary French does, though–a leftover from earlier forms of the language, and what we would be looking at here is an ongoing action, so it would be the subjunctive imperfect that we’d be using.  (I think–again, I’m not a native speaker.)  Here’s an attempt at both of them, neither of which I speak natively, or even well:

Modern colloquial French: Je rêvais que je dormais bien, tandis que de fait je ne dormais point bien.

Literary French: Je rêvais que je dormisse bien, tandis que de fait je ne dormais point bien.

In contrast with modern colloquial French, modern colloquial Spanish does, in fact, have a robust past subjunctive.  “Robust” in the sense that people do actually use it.  Let’s try that:

Soñaba que durmiera bien, aunque de hecho no dormía nada de bien.

…aaaaaand, with that I see that in Literary French and in modern colloquial Spanish, you can express the case where in real life I wasn’t sleeping well at all, but I don’t see a good way in either language to convey the situation where it’s in the dream that I wasn’t sleeping well.  Have I fucked up all four languages (English, modern colloquial French, literary French, and modern colloquial Spanish) here?  Forgive me, ’cause it’s not even 5 AM, and I didn’t sleep well last night.

Scroll down past the video of the somewhat cute song L’Imparfait du subjonctif, “The Imperfect Subjunctive” (Pourtant je le pus et vous pûtes, hee hee hee) if you want to read the English notes.  Otherwise: go back to bed.

English notes

To disambiguate: To differentiate between two possible senses (meanings) of something (“of an utterance,” as a linguist would put it).  In computational linguistics, it usually means to find the intended sense.

  • In the French sentence L’étagère plie sous les livres (‘The shelf is bending under [the weight of] the books’), it is necessary to disambiguate the sense of livres (which can mean ‘books’ or ‘pounds’ and is masculine in the former sense, feminine in the latter) to properly tag it as a masculine noun. (Ide, Nancy, and Jean Véronis. “Introduction to the special issue on word sense disambiguation: the state of the art.” Computational Linguistics 24.1 (1998): 1-40.)
  • Lapata and Brew (1999) and others have shown that the different syntactic subcategorization frames of a verb such as serve can be used to help disambiguate a particular instance of the word. (Gildea, Daniel, and Daniel Jurafsky. “Automatic labeling of semantic roles.” Computational Linguistics 28.3 (2002): 245-288.)
  • When you search for information regarding a particular person on the web, a search engine returns many pages. Some of these pages may be for people with the same name. How can we disambiguate these different people with the same name? (Bollegala, Danushka, Yutaka Matsuo, and Mitsuru Ishizuka. “Extracting key phrases to disambiguate personal names on the web.” International Conference on Intelligent Text Processing and Computational Linguistics. Springer, Berlin, Heidelberg, 2006.)

For example: I giggled about the lyrics Pourtant je le pus et vous pûtes because when spoken, it is ambiguous: it could mean either however, I could and you could (the intended sense) or however, I stink of it and you whore.  In the latter sense–which, I will note, makes no sense, and we will return to that fact momentarily–it would be written pourtant je le pue et vous pute.  So, it’s not ambiguous in writing, but it is à l’oral. 

Now: almost everything that you will say, hear, write, or read today will be ambiguous in some way.  But, humans are so good at disambiguating that we notice that ambiguity only rarely.  How do we do it?  It’s mostly mysterious, but our behavior is consistent with the notion that we calculate the set of possible meanings and select the one which is most probable.  That’s a very different thing from our normal way of thinking consciously about this, in which I might say that “I stink of it and you whore” makes no sense.  “Makes no sense” implies that there is a binary distinction–either something “makes sense,” or it doesn’t.  When you talk in terms of probabilities, then you are thinking of meanings as something that can be more or less, which is very different from being, or not.

How do computer programs do this?  Computational linguists build systems that work more or less the way that we think humans work: determine the set of possible meanings, calculate a probability for each one, and select the most-probable of the set.  What happens if there’s a tie? Well…read this paper by Antske Fokkens.
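Here is a toy sketch of that three-step recipe: list the candidate meanings, turn their scores into probabilities, and pick the most probable. Nothing here is a real disambiguation system; the function name `disambiguate` and the scores are invented for illustration, and a real system would get its scores from data rather than from me. Since ties can happen, the sketch returns every sense that shares the top probability instead of silently picking one.

```python
import math


def disambiguate(scores):
    """Normalize non-negative scores into probabilities and return the top sense(s).

    `scores` maps each candidate sense to a score; the scores are assumed
    to come from elsewhere (made up by hand in the examples below).
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("need at least one candidate with a positive score")
    probs = {sense: score / total for sense, score in scores.items()}
    top = max(probs.values())
    # Keep every sense tied for first place, in a stable (sorted) order.
    best = sorted(s for s, p in probs.items() if math.isclose(p, top))
    return best, probs


# Invented scores: the "stink/whore" reading is improbable, not impossible.
best, probs = disambiguate({
    "I could and you could": 99,
    "I stink of it and you whore": 1,
})
print(best, probs)

# A tie: with no context, 'books' and 'pounds' are scored equally here.
tied, _ = disambiguate({"books": 1, "pounds": 1})
print(tied)
```

Notice that the silly reading still gets a probability, just a small one. That is the "more or less" way of thinking about meaning, as opposed to "makes sense or doesn’t."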

To and fro the hanging men go

I’m not typically a fan of pie charts, but this one is…special… Scroll down for some notes on what to and fro means, as well as musings about potential French equivalents.

Poetically, my latest obsession is François Villon.  He lived in the 1400s, so the details of his life are not super-clear, beyond the facts that he was semi-adopted by an influential clergyman, then well-educated, in the process doing a lot of drinking, fighting, whoring, some theft and a bit of murder.  A couple of pardons from the gallows let him live long enough to go into exile a couple times, in the process of which he disappears from the historical record entirely at the age of 31.  In the meantime, he wrote some truly amazing poetry. If you’re anglophone, you most likely know one line from his poetry, although perhaps nothing else:

…but where are the snows of yesteryear?

On the other hand, if you’re French and you only know of one thing by him, it’s probably La ballade des pendus, “The Ballad of the Hanging Men” (my translation, sorry).

La pluie nous a débués et lavés
Et le soleil desséchés et noircis
Pies, corbeaux, nous ont les yeux cavés
Et arrachés la barbe et les sourcils
Jamais nul temps nous sommes assis
Puis ça, puis là, comme le vent varie
À son plaisir sans cesser nous charrie
Plus becquetés d’oiseaux que dés à coudre
Ne soyez donc de notre confrérie
Mais priez Dieu que tous nous veuille absoudre.

Where this becomes relevant is puis ça, puis là, comme le vent varie.  Here’s my attempt at a translation:

The rain has — and washed us
And the sun dried us out and blackened us
Magpies and crows have dug our eyes out
And yanked out our beards and eyebrows
We can never, ever sit down
To and fro, as the wind varies
Carrying us around as it likes, without end
More pecked-out by birds than thimbles
So, don’t be of our fellowship
But pray to God that he absolve all of us

WordReference.com translates to and fro as d’avant en arrière, which is OK in a literal sense, but doesn’t capture the feeling of it at all.  Then again, I can’t swear that it’s a great translation for puis ça, puis là, either.  Here and there could work (ça et là); hither and yon works, but it’s somewhat humorous, which doesn’t fit here at all.  The mysteries of translation…

I’ll leave you with my favorite reading of La ballade des pendus. It’s by one Gérald Robert, who appears to be a voice actor by profession, and/but does one fuck of a good Ballade.   Thanks for the great pie chart, LJJ, and for telling me about Villon, Phil d’Ange, and if someone can tell me what débué means, I would be very appreciative!

Coronavirus binge-watching: Into The Night

As I write this, most of the US has been under confinement for going on two months. For me, it has been two months, ’cause I spent the week before everything went to shit isolating myself voluntarily–I was coughing like crazy, with what turned out to be whooping cough–and I didn’t want to get stoned to death on the Washington DC Metro.  (Lapider–to stone. Such a beautiful word for such a horrible way to kill someone.)

So, the other night I’m watching a new apocalyptic series on Netflix.  The crisis is not realistic, the most unrealistic of the characters is especially irritating, and…well, in general, it’s just an irritating show.  I put down my iPad in frustration, step out on the porch, and light a cigarette.

I light my cigarette, and I’m thinking about what’s going on in the real apocalypse: people dying. People’s jobs disappearing. And all of it far worse than it has to be, because the Liar-In-Chief is characterologically incapable of seeing that the way for him to handle this is not by incessantly lying through his fucking teeth, but by telling our country the truth. By putting federal dollars into testing, not by claiming that there are plenty of tests available for everyone, which is manifestly false–all while having himself and his suppôts tested daily, while front-line medical personnel go without.  Asshole.

I light my cigarette, and I’m thinking all of that, and I realize: escaping for a little while into the space of an unrealistic apocalypse would feel far better than thinking about the real one…and back into the night I go.

The unrealistic apocalyptic Netflix show to which I am now completely addicted is called Into The Night (Dans la nuit in French).  It’s in French, and in a particularly interesting French, because many of the characters are not natively francophone, so they have accents, and that fucks me up totally.  For my fellow amerloques, here’s a bit of the vocabulary that I had to look up in the first episode.

The passport control guy in the Brussels airport recognizes one of the main characters, sees that she’s flying to Moscow, and asks her:

  • Tu vas mixer ? 
    • “You spinning there?” (from the English-language subtitles, ’cause I couldn’t find mixer in the dictionary)
    • “You DJing there?” (from the British English soundtrack, ’cause see above, plus there’s no American English soundtrack)
  • Non, c’est juste une apparition.
    • No, it’s just an appearance.  (subtitles)
    • No, just publicity. (British English soundtrack, and by the way, non-Americans never believe me when I say that Americans don’t necessarily understand spoken British English, but it’s nonetheless true)

One of the characters is buying a last-minute plane ticket, and the clerk says to him:

  • Le prix s’élève à 4.235 euros.
    • The price comes to 4,235 euros.
  • Je prends.
    • I’ll take it.

s’élever à: to come to, to amount to. You use these expressions in English primarily when a price has multiple components.  So, if you buy a hamburger, and a hamburger costs $5.00, then the kid at the cash register might say: ok, that’s $5.00.  But, if you add cheese at $1.00, a slice of tomato at $0.50, and pickles at $0.50 (I have no clue what the actual prices are–who orders a hamburger at a place like that?? Not that I haven’t worked in a couple of ’em), then the clerk might say: ok, that comes to $7.00.  When do you use s’élever à in French? I have no idea–Phil d’Ange?

Here one of the characters–a Flemish dude with heavily accented French, so I don’t know how correct this is–sees people boarding the plane before him, and says the following.  What I didn’t know the meaning of was ça, alors !

  • Oh, on peut monter avant les premières classes ? Ça, alors !
    • I didn’t think anyone got to board before first class. (subtitles)
    • Looks like some people are better than first class.  You know?  Huh? Huh? (British English soundtrack)

WordReference gives a number of meanings for it, all of which are expressions of surprise.  Of them, the best translation for this case is probably well, I never! …which would typically have some connotations of a disagreeable surprise. Like, someone does something totally rude to you, or tells you a story about something shitty that someone did to them–Trump’s replacement for his original Attorney General just had the charges dropped against a guy who had already pled guilty twice to lying to the FBI about his interactions with the Russians during the transition. –Well, I never!  Of course, that conversation implies that an American exists who can still be shocked by Trump’s betrayals of America…

…and I put out my cigarette, and back to Episode 5 I go.

Conflict of interest statement: I don’t have any conflicts of interest.  I pay for my monthly Netflix subscription just like everybody else, and the tobacco industry sure as hell isn’t giving me any freebies.

Fiche le camp, Jack: English idiomatic expressions with “to hit”

One of the most delightful books I have ever read in French is named Les Mots et la chose–“Words and The Thing.”  “The thing” is a euphemism for “sex.”  The conceit of the book is that an actress who earns her keep by dubbing pornographic movies has grown weary of the limited vocabulary that her job calls for, so she writes to a retired linguist who specialized in words for la chose to ask for suggestions.  He comes through in spades, with separate chapters for all of the relevant body parts, and of course for l’acte itself.  My favorite: Le détroit des Dardanelles,  the Strait of the Dardanelles, for that part of your body where poo comes out and where, between friends, other things might occasionally go in.

I keep seeing all of these articles in the paper about how to fight coronavirus-quarantine-related boredom.  I don’t get it–I haven’t been this busy in ages.  Telecommuting; reminding my father to eat, to take his medicine, and to let me do his laundry; making masked food runs to the grocery store; eating half of a chocolate babka in a single day (damn it, Zipf); sitting on the front porch smoking cigarettes and petting the dog–I barely have time to learn my 10 words per day of French vocabulary.

Of course, none of that has stopped me from spending inordinate amounts of time looking up French-language covers of classic American songs.  For example, Fiche le camp, Jack is a cover of Hit the Road, Jack, a favorite from before my childhood (and hence, a long fucking time ago).  A cover differs from a dubbed version in that where dubbing involves an original video version whose audio track is replaced, a cover is a de novo production.  So, if there is a video involved, too, then it will be shot anew for the new version.

So, the above-mentioned French actress is dubbing movies so that they have a French-language soundtrack, while the video below shows a version of Hit the Road, Jack, nicely covered by Richard Anthony and some great back-up singers. I hope that it brings a smile to your quarantine day.  Scroll down for the English notes if you are so inclined–today we will talk about some idioms involving the verb to hit, as well as discuss American Evangelical beliefs about what’s going to happen to us sinners.

English notes: idioms involving the verb to hit

In the following examples, note that hit is an irregular verb: its present tense, past tense, and past participle are all hit.

to hit the books: to study.

I can’t go to the party tonight–I gotta hit the books.

Gotta is colloquial language for to have to.

to hit the road: to leave.

This has been a great party, but it’s time for me to hit the road–I gotta go study for my stupid linguistics exam.

to hit bottom: to reach a/the really terrible part of your life. It is often used in conjunction with alcoholics and drug addicts–the belief is that before you can get dry (alcoholics)/clean (drug addicts), you have to “hit bottom.”

God had left her alone with the sinners, so she would sin.  But, she hit bottom after going on a drunken binge with two men she met at a Catholic-sponsored conference on Poverty in the World of Change.  She woke up naked in a hotel bathtub.

The Forsaken, Book Two of The Apocalypse Trilogy.  This is an amusing series of American Protestant fundamentalist fiction about The Rapture, an event in which non-sinners will be whisked up to Heaven, while the rest of us are left on Earth.  (I think that we get damned to eternal Hell at some point.)  The extract is fascinating to me, in that in three short sentences it evokes so many of the tropes of American Protestant fundamentalism: anti-Catholicism, resistance to social services for the poor, and of course loathing of sex.

to hit the sack: to go to bed.

I’m gonna hit the sack–I’ll study for that stupid linguistics test tomorrow.

to hit the hay: to go to bed.

Well, Jack finally hit bottom. He went to the party, but he hit the road early to go home and hit the books.  But, instead, he hit the hay and didn’t study at all.  So, he flunked the test, which dropped his final grade in the course, which dropped his overall GPA, so he lost his badminton scholarship.  He went to his professor and asked him to raise his grade, but his professor said “Surely my course isn’t the only one in which you earned a lower grade than you needed?  Why not go to one of your other professors, and ask them to raise your grade?”  I guess you gotta hit bottom before you get sufficiently motivated to get your shit together.

I have changed some details to protect the guilty.  But, yeah–I was the professor.





Prévert and Les mystères de Paris: Best. Vocabulary. Word. Ever.

Normalcy through vocabulary. And poetry.

The fact that covid-19 has 50% of the world’s population under lockdown orders does not change the fact that in the US, it is National Poetry Month.  The French are getting cats to play tic-tac-toe (le morpion in French, which also means [genital] crab, and I cannot stop giggling like a schoolboy about that), Americans are watching Netflix, and the President of the United States is showing himself more and more to be le roi des cons–and Art goes on.

Jacques Prévert’s poem Pater noster has opening lines as good as any in the world of free verse (translations by me, sorry):

Notre Père qui êtes aux cieux
Restez-y

Our Father who art in heaven
Stay there

Et nous nous resterons sur la terre
Qui est quelquefois si jolie

And we’ll stay here on Earth
Which is sometimes so pretty

Avec ses mystères de New York
Et puis ses mystères de Paris

With its mysteries of New York
And then its mysteries of Paris

So, yeah: the cool neighborhood near me is now empty except for the homeless people living under tarps in the sheltered doorways of now-abandoned shops, Macron is urging the French to support health-care workers, and Trump is urging Americans to support airlines; and I am trying to restore some sense of normalcy to my life by learning my usual 10 words of French vocabulary per day.

So, I’m on a French-language furniture web site the other day trying to find a picture of some obscure item of furniture or another that I ran across while reading Colette’s Chéri, when I came across this: the mystères de Paris.  Literally, that means “the mysteries of Paris”–but it means so, so much more…and thus we have the Best. Vocabulary. Word. Ever.

It turns out that there is such a thing as a mystères de Paris–and it is a commode.  Not a commode in the French sense of the word–what’s called in English a dresser–but a commode in the English sense of the word–a bedside chair with a receptacle for pooping.  A bedside toilet, if you will.  It’s not just any kind of commode, though:

  1. It’s a disguised commode.
  2. It is usually made to look like a stack of books.

From the Meubliz.com web site (translations by me, sorry):

Ce siège d’aisance prend la forme d’une pile de livres simulés. La partie supérieure s’ouvre comme un abattant pour laisser apparaître la cuvette. Ce petit meuble repose sur des pieds bas tournés en balustre ou découpés.

Généralement, ce siège de commodité assez original était décoré de belles et luxueuses couleurs.

This commode takes the form of a pile of fake books. The upper part opens as a lid to access the bowl.  This small piece of furniture sits on feet that have been [not sure what those carpentry terms mean].

Typically, this rather unusual commode was decorated with pretty, luxurious colors.

Mystères de Paris bedside toilet. Source: Meubliz.com

If you’ve followed this site, you know that Prévert’s poetry is great for understanding what people mean when they talk about “the impossibility of translation.” This is a great example–I just can’t even imagine a way to render mystères de Paris into English, and forget about maintaining that rhyme:

….sur la terre
Qui est quelquefois si jolie

…on Earth
Which is sometimes so pretty

Avec ses mystères de New York
Et puis ses mystères de Paris

With its mysteries of New York
And then its mysteries of Paris

(Yes, jolie and Paris rhyme in French.)

A Dutch-made mystères de Paris bedside toilet from 1850. Source: Meubliz.com

(Wait, I forgot–more tic-tac-toe-playing cats…)


So…let’s all stay in, stay healthy, thank the people working in the grocery stores, thank the people working in the gas stations, thank the doctors, thank the nurses, thank the respiratory therapists–and ignore les maîtres de ce monde, les maîtres avec leurs prêtres, leurs traîtres et leurs reîtres–a line from later in the poem that is more than evocative of the coronavirus-era Trump.  And let’s take care of each other.

See this post for the full poem, as well as for a discussion of the line that I just mentioned.  You can exercise your oral comprehension skills with an English-language video, complete with subtitles, on how to make your own face mask here.


How to smile your way through the Parisian transit strike: Citymapper

The Internet has given us Trump, revenge porn, and catfishing; in recompense, it has also given us free on-line versions of a number of historical French dictionaries, and a way to weather public transportation strikes with a smile.

Executive summary: there’s an app called Citymapper available on the iPhone and Android that does an excellent job of staying on top of metro, train, and bus line operating hours.  Want to know about (1) linguistic trivia associated with strikes in French, and (2) public attitudes about the current action sociale?  Read on.

One of the things that I find very striking about Paris is that although the building located at any particular spot might change, the function carried out there can remain constant over centuries.  Millennia, even.  For example: the spot where Notre Dame de Paris is located has been a place of worship since the Druids were there.  The Palais de justice was the residence of the Roman administrator, and then the palace of the early French kings, before becoming the center of the French court system.  And, most relevant to today’s ravings: the location of the Parisian City Hall has been the place from which the city is run for as long as Paris has been run by its bourgeois.

City Hall–in French, L’Hôtel de ville–is located on the Right Bank of Paris.  Although the Right Bank is very much the seat of Parisian power today, it started as mostly swampland.  (That fact figures into how the city was taken by the Romans–a story for another time.)  The expansion of Paris from the Left Bank to the Right in the early Middle Ages started with the area where the Hôtel de Ville is located today.  It was an early area of business, and the riverbank–la grève–in front of its current location was a gathering spot for laborers looking for work.  As the story goes (and I’m sorry that I can’t give you a citation for this, but I think that I ran across it in Metronome), over time the word for the place where laborers gathered became associated with strikes by laborers.

There’s some documentary evidence for this association.  Let’s work our way backward.  The Internet has given us Trump, revenge porn, and catfishing; in recompense, it has also given us free on-line versions of a number of historical French dictionaries.  Les voilà.  Starting with the 8th edition of the Dictionnaire de l’Académie française, published 1932-1935, we have the following.  The first sentence is A level, flat surface covered with gravel or sand, going along the edge of a sea or a large river:

Screen shot from TheFreeDictionary.com. In the second paragraph (which I did not translate), they’re not shitting about the executions.  Notable ones that took place there include that of Jacques de Molay, the last grandmaster of the Knights Templar, who was burnt at the stake there on March 18th, 1314; and that of Robert-François Damiens, who was drawn and quartered there on March 28th, 1757.  (The event was extensively documented.  If you have a copy of Michel Foucault’s Discipline and Punish: The Birth of the Prison on your bookshelf, you’ll find an accurate description of the event in the first chapter.  The savagery was difficult to imagine–one of the professional executioners went into retirement after participating.)

Continuing back in time to the 18th century, we have this from Jean-François Féraud’s Dictionnaire critique de la langue française, published 1787-1788.  It contains the definition level and sandy beach:

Linguists will notice the prescriptiveness of the entry, which includes the observation that the verbal form of the word, which means “to harm,” is “not often used outside of the Palace, and in ordinary language is not good style,” as well as the facts that (1) Richelet found it a bit old (Phil d’Ange, who was Richelet?), (2) Trév says that it was becoming a bit outdated (Phil d’Ange: Trév.??), and (3) the Academy includes it without comment. (Do note that he is talking about a verb, not about the “(river) bank” sense of grève.) Source: screen shot from https://fr.thefreedictionary.com/gr%c3%a8ve

Finally, going back to Jean Nicot’s Thresor de la langue française, published in 1606, we have the following, which includes words that I believe to mean “gravel, sand” (gravier and arena):

Nicot’s entry includes another meaning of the noun, which I think is a part of a suit of armor that goes on the legs. Source: https://fr.thefreedictionary.com/gr%c3%a8ve

If you haven’t been reading the news from France lately: public transport workers in and around Paris have been on strike for the past six weeks.  A public transport strike in these parts does not mean a complete cessation, but rather a diminution, of service.  A given metro line might be operating at half capacity, or maybe only 1 out of 3 trains on the line are running; those services might be only available during the morning and evening rush hours (en heures de pointe), or just in the evening.  Trains are packed to bursting, electric scooter rentals are maxed out; Uber is running, but the automotive traffic is so heavy that a 30-minute ride can easily take an hour.  As I write this in mid-January of 2020, the exceptionally convenient low-cost mobility that is such a delight of normal life in the City of Light is only a fond memory.

Are Parisians frustrated by the disruptions caused by the strike?  Of course.  Are they complaining about it a lot?  Not really.  Here are typical comments from my friends about the motivation for the strikes–a proposed reorganization of the admittedly convoluted French retirement system:

  1. The reforms won’t hurt me, personally–but, I’m worried for my child.
  2. The transportation workers are striking for all of us.
  3. The strike has to screw up Paris, or it won’t have any effect.

The comments reflect some underlying widespread French attitudes about their famous work stoppages: (1) Everybody has to earn a living, and (2) Your strike may be screwing up my life today, but my strike will be screwing up yours tomorrow.  So: in general, people are pretty tolerant of this kind of thing.

…and with that, I’m off to check Citymapper to find the best way to get to the Musée de la paléontologie et de l’anatomie comparée, one of the three best museums in the world, in my humble but reasonably informed opinion.

The picture of an écartèlement (“drawing and quartering” in English) at the top of this page is of a bas relief from northeastern Spain. I found it at https://fr.vikidia.org/wiki/%C3%89cart%C3%A8lement.

Conflict of interest statement: I don’t have any.  Citymapper does not pay me, nor do they offer me free services.
