Guillaume Apollinaire: Exercice

Street sign in the Saint-Germain-des-Prés neighborhood.

I grow weary of technical terminology.  I love World War I poetry in English, and thought that I would look at some in French.  Guillaume Apollinaire was a (very) famous French poet who fought in the artillery and in the trenches in WWI.  He suffered a head wound in 1916, never really recovered from it, and in his weakened condition, died in the influenza epidemic of 1918.  Here is one of his poems, Exercice.


Vers un village de l’arrière
S’en allaient quatre bombardiers
Ils étaient couverts de poussière
Depuis la tête jusqu’aux pieds

Ils regardaient la vaste plaine
En parlant entre eux du passé
Et ne se retournaient qu’à peine
Quand un obus avait toussé

Tous quatre de la classe seize
Parlaient d’antan non d’avenir
Ainsi se prolongeait l’ascèse
Qui les exerçait à mourir

•    la poussière: dust.
•    la plaine: plain.
•    se retourner: (tourner la tête) turn around, do a double take; (changer de sens, de position) turn over, toss and turn; (se mettre à l’envers) turn over, overturn
•    la peine: punishment, sorrow, trouble—but, that’s not what it means here—see the next entry.
•    à peine: scarcely, hardly
•    un obus: shell (artillery).
•    tousser: to cough
•    d’antan: of yesteryear, of long ago
•    se prolonger: continue; perpetuate itself; persist; linger; go on; be continued; be extended
•    ascèse: This word is a tough one.  It’s not in any of my French-English dictionaries.  In Anne Greet’s translation (see below), it’s rendered as “ascesis.”  I found it in a monolingual (French-French) dictionary; the definition seemed to be something like asceticism.
•    exercer: to train, exercise, practice

What should we make of the past imperfect tense that is used throughout the poem?
Greet’s notes suggest that it produces a detachment between the poet and the four men: “The poet…is not part of the graphic little scene he is painting.  The verbs, in third person and imperfect tense, indicate that he is an omniscient observer.  This role produces a…fine balance in the poem between compassion and detachment.”

Towards a village in the rear
Marched four bombardiers
And they were covered with dirt
From head to foot

They stared at the vast plain
As they talked about the past
And they barely looked around
When a shell made a coughing sound

All four of class sixteen
Spoke of the past not future time
Thus the ascesis dragged on
That practiced them in dying

Translated by Anne Hyde Greet

Reviewing in French

At conferences in my professional field, the number two topic of conversation over beers is how poorly funded scientific research is in the United States right now.  The number one topic of conversation is complaining about our reviewing loads.  Reviewing papers is one of the constant burdens in academia, but you don’t want to say no to a request to review, as being invited to review papers is one of the things that marks your transition into a fully-functioning professional.  So, we all have constant requests to review conference papers, journal articles, grants, book chapters, book proposals, books—on and on.

I have a paper to review for a French workshop, which led to me getting this email:

Voici un article à relire pour la JE ATALA Ethique et TAL.  Vous avez jusqu’au 20 octobre pour m’envoyer votre retour (accepté ou non, avec un commentaire).  Les articles ne sont pas anonymes, mais votre relecture le sera (vous êtes deux relecteurs par papiers).

First of all, Zipf’s Law brings us some vocabulary issues, as usual:

  • relire: normally, this is to reread, read over, or to proofread.  Here it is “review,” in the sense in which we use that word in academia—to read and provide an assessment.
  • le retour: basically, return, but in this case it means something like feedback: it’s an event that is the argument of another event (pour m’envoyer votre retour).
  • la relecture: normally, this is a rereading, proofreading/editing/revision, or a reinterpretation.  In this case, it’s a review in the academic sense—your assessment of the submission.
  • le relecteur: reviewer.

One of the fun things about today’s words is that they’re phonetically quite interesting. Note the high front rounded vowel in relecture and the mid front rounded vowel in relecteur, both of them before r: these are words that will basically be impossible for an American (like, say, me) to pronounce.  Another fun point is that the e after the initial r in all of these words can be deleted in the spoken form (click through for the transcriptions), leading to an initial rl cluster. If you think that the French r is tough for Americans to pronounce, try putting it in a consonant cluster!

Next, there are some grammatical issues:

  • What is the function of the le in votre relecture le sera?
  • Note the use of par to mean “per.”  In a previous post, we looked at the use of par to mean “by”—here we have another sense.

You might be wondering how you could possibly review an article that’s not written in your native language.  In fact, scientific papers are routinely reviewed by non-native speakers, at least in English.  It’s the international language of science today, and many reviewers are not native speakers.  Without these non-native-speaker reviewers, the system couldn’t possibly handle the strain of the amount of reviewing that needs to be done.

A flaw named “Poodle”

One of the things that tickles me about written French is the accents.  I love writing them.  So, when I got an email this morning about a computer security flaw, the beginning actually made me smile:

Screenshot of web page giving information about the Poodle security flaw

Hier a été révélée une faille de sécurité…

(Yesterday a security flaw was revealed…)  That’s a high density of accents aigus!  Let’s see what Zipf’s Law brings us in this email.  First, there’s the subject line:

Une faille nommée poodle

  • la faille: a flaw or loophole; in geology, a fault or rift.
  • nommé: named, called.

Hier a été révélée une faille de sécurité de SSL v3 qui affecte principalement les postes clients (bref, votre ordi) et pourrait permettre à un attaquant (au hasard entre votre ordi et votre banque) de vous forcer à utiliser ce protocole SSLv3 pour récupérer quelques informations intéressantes sensées être cryptées (mot de passe, code de carte bleue).

  • révéler: to reveal
  • se révéler: to turn out, prove to be, reveal itself; to come to fame
  • le poste client: this one has engendered a lot of chat on translation fora (forums?).  The consensus is that depending on context, it can be either a workstation or a client (in the computing sense of that term).
  • un attaquant: attacker, assailant
  • au hasard: at random, aimlessly
  • récupérer: get back, retrieve, salvage, recover
  • une information: in this case: detail, data
  • sensé: this one turns out not to be straightforward.  I wrote to a native speaker about it, who had this to say: “Well, ‘sensé’ (sensible, meaningful) is a word which is often confused with ‘censé’ (supposed, assumed), the first one being quite common, and the second mostly used in specific constructs. Also, ‘sensé’ is easily traced back to ‘sens’, whereas ‘censé’ needs to go back to Latin [as in English ‘census’] or linked to words whose meanings are more distant, such as ‘recenser’, or of a higher register (‘censément’). Here [the author] should have written ‘censées’, meaning ‘supposed to be’.”
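The email describes an attacker forcing your machine to fall back to SSLv3. One way a client can guard against that is simply to refuse the old protocol. Here is a minimal sketch, assuming Python's standard ssl module; the function name make_client_context is made up for illustration, and this is not what the email's author recommended, just one plausible way to do it.

```python
import ssl


def make_client_context():
    """Build a client-side context that will not negotiate SSLv3 (hypothetical helper)."""
    # Start from the library's recommended defaults (certificate checks on).
    context = ssl.create_default_context()
    # Refuse anything older than TLS 1.2, which rules out an SSLv3 fallback.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
print(context.minimum_version.name)
```

The choice of TLS 1.2 as the floor is an assumption on my part; the point is only that the floor sits above SSLv3, so a forced downgrade fails instead of succeeding quietly.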

Accidental versus necessary

Every language has rules that don’t seem that explainable to a linguist (or anyone else, I presume). One such rule in French is that if you have a plural noun that’s modified by a prenominal adjective (i.e. one of the few French adjectives that come before the noun), then you use de, not the plural form des, which you would otherwise expect: for example, de belles maisons rather than des belles maisons.

So, we know that French adjectives generally are postnominal (after the noun). When do you put them in front of the noun? The acronym BAGS is one suggested way of remembering at least some of the adjectives that go before the noun:

  • Beauty
  • Age
  • Good and bad
  • Size (except grand for humans)

According to one explanation I found, this phenomenon is related to inherent versus non-inherent properties of the noun that is being modified. That distinction is similar to necessary versus accidental qualities, which, according to Pustejovsky’s The Generative Lexicon, goes back to Aristotle. Pustejovsky points out that adjectives describing necessary versus accidental qualities behave differently in the progressive aspect in English: adjectives describing accidental qualities are grammatical in the progressive, while adjectives describing necessary qualities are not. So you can say (the relevant adjectives are gentle, angry, and impatient):

  • The horse is being gentle with her rider.
  • You’re being so angry again!
  • Stop being so impatient.

…but, according to Pustejovsky, you can’t say:

  • * John is being tall today.
  • * Aren’t you being beautiful tonight!
  • * Stop being so intelligent.

(In linguistics, the * before a sentence means that you can’t say it in the language in question. Note that there’s no claim that you can’t say Aren’t you beautiful tonight—the claim is only about the progressive aspect, indicated in these examples by the verb form being.  You are free to argue with Pustejovsky’s claims about whether or not the starred sentences are really ungrammatical.  Note also that there are French adjectives whose meaning changes depending on whether they’re prenominal or postnominal—more on those another time.)

So, why the de in front of plural nouns that have prenominal adjectives?  I have no idea.  The interesting thing to me is not so much the specifics of the rule (use de, not des, in front of a plural prenominal adjective) as what it suggests about the representation that underlies the rule, the qualities that the language has to have in order for a rule to be able to make reference to those qualities: in this case, something like the distinction between inherent versus non-inherent or necessary versus accidental qualities.
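The rule itself is mechanical enough to write down as code. This is a toy sketch, assuming you already know whether the adjective is prenominal (the code does not figure that out for you) and assuming a crude vowel check for elision that ignores mute h. The function name and the example words are mine, chosen for illustration.

```python
VOWELS = "aeiouyàâéèêëîïôöùûü"


def indefinite_plural_phrase(noun_plural, adjective_plural, prenominal):
    """Build an indefinite plural noun phrase using the de/des rule (toy sketch)."""
    if prenominal:
        # Prenominal adjective: des reduces to de, and de elides before a vowel.
        # A straight apostrophe is used here for simplicity.
        article = "d'" if adjective_plural[0].lower() in VOWELS else "de "
        return f"{article}{adjective_plural} {noun_plural}"
    # Postnominal adjective: the ordinary plural form des.
    return f"des {noun_plural} {adjective_plural}"


print(indefinite_plural_phrase("maisons", "belles", True))
print(indefinite_plural_phrase("maisons", "blanches", False))
print(indefinite_plural_phrase("idées", "autres", True))
```

Notice that the function needs prenominal as an input: the rule cannot be stated without referring to that property, which is exactly the point about what the underlying representation has to contain.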

The ethics of crowdsourcing for linguistic resource construction in French

One of the major trends in my field today is the use of Amazon Mechanical Turk (AMT) to create linguistic resources, particularly for natural language processing.  Using AMT, tasks that require human intelligence (for example, deciding which synonym of a word is being used in a particular context, or labeling a photograph with the things that it pictures, or deciding whether or not a web page is relevant to a search query) are given to humans in very small increments, usually with the goal of using the humans’ data to train a computer to do the same task.  It is a form of crowdsourcing: using the public to do a (typically large) job in (typically) small amounts, e.g. Wikipedia.

Karën Fort of the Sorbonne and Gilles Adda of LIMSI have researched the ethics of the AMT model for work and for remuneration.  The AMT model turns out to raise many issues, including a number of ethical ones.  Karën and Gilles have worked to develop a charter for ethical use of this and other crowdsourcing platforms.  (Full disclosure: Karën and Gilles and I published an editorial on the use of Amazon Mechanical Turk in our field in the journal Computational Linguistics.)   If you click on the picture, it will take you to a set of slides that Karën prepared for a talk on the subject.  Zipf’s Law strikes in the domain of ethics as much as anywhere else—here are some words that I had to look up to read the slides:

▪    une ombre: shade, shadow.
▪    la zone d’ombre: gray zone.
▪    promouvoir: to promote.
▪    le vaut bien: to be worth it.
▪    la plate-forme: platform.
▪    la myriadisation: crowdsourcing.
▪    délocalisé: outsourced.
▪    la foule: crowd, mob, masses.
▪    le travail parcellisé: microworking.
▪    découpé: cut into pieces.

That’s ten words just to get to slide 10 out of 30, but that’s about all I can handle in a single day—more Zipf’s Law words next time.

Les sous-langages: sublanguages

The lines represent growth in the number of word types as increasing numbers of tokens are observed. The blue line (BNRC) is unrestricted Bulgarian text. The red line (epicrises) is Bulgarian clinical documents. The clinical documents show lexical constraints–for a given number of tokens, the number of word types is much smaller, and tends toward finiteness.

I had to look up all of these words today in order to be able to explain just one aspect of my research. One of the things that I work on is the topic of sublanguages (explained below). Looking for material on the subject in French, I came across the doctoral dissertation Sur la notion de sous-langage, by Roland Dachelet. Even in the context of discussing my own research, Zipf’s Law strikes.

  • le sous-langage: sublanguage.
  • le domaine: domain. A sublanguage is a variety of language associated with a specific domain—medicine, biology, weather, sports reporting.
  • spécialisé: specialized. Being related to a specific domain, a sublanguage is specialized.
  • la contrainte: constraint. Sublanguages are generally associated with constraints—constraints on the kinds of subjects and arguments that a verb in the domain can have, for instance; constraints on syntactic structures; constraints on the set of words.
  • le lexique: in this case, the set of words in a text—vocabulary. It has other meanings, too, such as bilingual dictionary. Typically the set of words in a sublanguage is constrained.
  • la morphologie: morphology (how words are put together).
  • une ambiguïté: ambiguity. The fundamental problem of language processing—if most things in language didn’t have multiple possible interpretations, computers could just look everything up.
  • la variabilité: variability. The other major problem of language processing—there are so many ways to express the same thing.
  • la caractérisation: characterization. The current challenge in sublanguages is to characterize them automatically—that is, with a computer, as opposed to a human doing it manually.
  • la syntaxe: syntax. This is how phrases are structured.
  • syntaxique: syntactic.
  • une analyse syntaxique: syntactic analysis.
  • la structure: structure. Syntax is mostly about structure.
  • la sémantique: semantics.
  • sous-jacent: below, underlying, implicit (the sense in which I need it). Important aspects of language, such as syntactic structure, are implicit in the sense that they are not visibly indicated in the stream of language.
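The lexical constraint shown in the figure (fewer word types for the same number of tokens) is easy to measure. Here is a minimal sketch of a type-growth curve, assuming whitespace tokenization and lowercasing. The two tiny sample texts are invented for illustration and are not the Bulgarian data from the figure.

```python
def type_growth(tokens):
    """Return the number of distinct word types seen after each token."""
    seen_types = set()
    curve = []
    for token in tokens:
        seen_types.add(token.lower())
        curve.append(len(seen_types))
    return curve


# Invented toy data: ten tokens of unrestricted text and ten of clinical-style text.
open_text = "the cat sat on a mat while dogs barked outside".split()
constrained_text = (
    "patient stable patient afebrile patient stable no fever patient stable".split()
)

print(type_growth(open_text))
print(type_growth(constrained_text))
```

With real corpora you would plot the two curves against each other. A curve that flattens out early is one sign of the constrained lexicon that characterizes a sublanguage.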