Zipf’s Law, the Poisson Distribution, reflexive verbs, and terrorism in the age of social media

“Islamic State dramatizes the macabre will and testament of the terrorists of the Paris attacks.” This is the mettre en scène (non-reflexive) form of the expression.  Picture source: screen shot of http://www.bfmtv.com/societe/dans-une-nouvelle-video-daesh-met-en-scene-les-auteurs-des-attentats-de-paris-946083.html.

By now, we know what goes hand-in-hand with Zipf’s Law: the Poisson Distribution.  Zipf’s Law explains why we run into words that we don’t know in a foreign language every stinking day, and the Poisson Distribution shows how even rare events can come in clusters.  Three rock stars die in one month, and the like.  This morning I ran into two occurrences of an expression that I’d never seen before at all.  There was an interesting twist, in that it’s actually two expressions, one with a regular verb, and one with the reflexive form of the same verb.  Reflexive verbs in French can refer to performing an action on oneself–je me mouche “I blow my nose,” je mouche le bébé “I blow the baby’s nose” (no, I didn’t make that up–look here).  In this case, the meaning of moucher is the same–it’s just a question of whose nose is getting blown.  However, non-reflexive and reflexive verbs can also have different meanings, and that’s the case with the expression that had me going to the dictionary before I even had breakfast this morning.
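If you want to see the arithmetic behind "rare events can come in clusters," here is a minimal Python sketch. The rate of one event every two months is a made-up number for illustration (I have no real data on rock-star deaths), and the sketch assumes that the events are independent, which is what the Poisson Distribution requires.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def prob_at_least(k, lam):
    """Probability of k or more events: 1 minus P(0) + ... + P(k-1)."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

# Hypothetical rate: on average, one such event every two months.
RATE_PER_MONTH = 0.5

# Chance that any given month contains three or more events.
p_cluster = prob_at_least(3, RATE_PER_MONTH)

# Expected number of such "cluster" months over ten years.
months = 120
print(f"P(3 or more in a given month) = {p_cluster:.4f}")
print(f"Expected cluster months in {months} months = {months * p_cluster:.2f}")
```

With that assumed rate, a three-event month is unlikely in any single month, but over ten years you should expect to see one or two of them. That is the sense in which clusters of rare events are unsurprising.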

Mettre is one of those common and rather irregular verbs that shows up in a bazillion expressions.  This one has two forms.  The non-reflexive, mettre en scène, means to stage or to dramatize (definitions from WordReference.com).  The reflexive form, se mettre en scène, is to put on a performance.  I saw it on Twitter today: L’atroce vidéo de l’Etat Islamique montre que nous avons changé d’époque.  L’ultraviolence la plus sordide se met en scène façon Hollywood.  “The atrocious Islamic State video shows that we have changed eras.  The most sordid ultraviolence puts on a show Hollywood-style.”

I wish that I had some clever way to wrap up this discussion, but learning yet more vocabulary by way of terrorism just depresses me.  Yuck.  If you’re interested in theorizing about terrorism and the media in general and social media in particular, try this Wikipedia page for starters.  Sigh…

  • mettre en scène: to stage, to dramatize.
  • se mettre en scène: to put on a performance.
  • la mise en scène: staging, directing; dramatization; stageplay, stage direction.


Doing computational lexical semantics with your web browser: An approach to using data to build semantic representations

Here’s how you can do computational lexical semantics in the comfort of your own home–and how to talk about it in French.

A lot of my work involves something called lexical semantics.  Lexical semantics is the study of how words mean things.  (That means that there’s some interaction with the question of how sentences mean things, since part of the meaning of a sentence comes from the words that it contains, but in lexical semantics, the focus is on the words and how they contribute to and interact with the semantics and the syntax (the phrasal relationships) of a sentence.)  In particular, I do something called computational lexical semantics.  That means that I use large bodies of data as a crucial part of my work, and I evaluate my work in part by trying to use it as the basis of computer programs.  If that doesn’t work, then I figure that what I’ve done needs to be improved.

My advisor is one of the world experts on computational lexical semantics.  (I won’t name her, since I try to keep this blog anonymous.)  As far as I know, she was the first person to demonstrate that large bodies of naturally-occurring data can, in fact, be used to test theories of lexical semantics.  This was important because semantic theories often haven’t really been tested in any way that would count as a “test” in science, and as we’ve seen in other posts, linguistics is the scientific study of language.  She often says that semantics is not a suitable subject of study for linguistics, since it’s so subjective.  I’m never sure whether or not she’s kidding; regardless of whether she is or not, one of my professional ambitions is to take the subjectivity out of computational lexical semantics.

Part of my approach to that has been to try to develop a systematic methodology for developing semantic representations of words.  In particular, I work with verbs, and with nouns that are derived from those verbs–for example, the verb phosphorylate (I specialize in biomedical language) and the related noun phosphorylation, or the verb receive and the related noun receptor.  (You’ll notice that there are different relationships between the verb and the noun in the two examples–phosphorylation is a noun that refers to the action of the verb, while receptor is a noun that refers to the thing that does the receiving.)

One of the bedrocks of my approach is that I try to base my representations of the meanings of words on data that I didn’t come up with myself.  (Note that I didn’t invent this idea, or any of the other aspects of the approach that I describe here—this is just my recipe for putting them all together.)  I mostly work with scientific journal articles.  There are two parts to what I do:

  1. Coming up with the representation of the meaning of the verb (or noun).
  2. Coming up with examples that let me test the representation, both by providing examples of the different effects that I think that the meaning of the verb has on how it behaves in sentences, and also doing a quick check to make sure that I don’t see any examples that argue against my representation of the meaning of the verb.

This is a pretty iterative, complementary process–I typically start out by looking at a bunch of examples of the verb to get a general sense of how it works, then write up a quick representation of the semantics, and then look for examples more systematically to see if my representation works.  Some of the goals that I keep in mind when I’m searching for these examples are:

  1. I want to know whether or not humans can be the agent of this verb.
  2. I want to be sure to get the full range of prepositional complements, as these can mark a variety of semantic relations.
  3. I want to get a variety of semantic classes as the subject and the object of the verb.
  4. If there is a deverbal nominalization, I want to get that, too.

Knowing whether or not humans can be the agent of the verb is important to me for a number of reasons.  People often question whether or not humans can perform the actions of particular verbs in the biomedical domain.  For example, Wikipedia describes the action of the verb phosphorylate as “the addition of a phosphoryl group (PO₃²⁻) to a molecule.”  That doesn’t sound like something that a human could do, right?  But, you can find sentences like this:

In order to determine the number of phosphorylated sites in human cardiac MyBP-C samples, we phosphorylated the recombinant MyBP-C fragment, C0-C2 (1-453) with PKA using (gamma32)P-ATP up to 3.5 mol Pi/mol C0-C2.

–Source: http://www.ncbi.nlm.nih.gov/pubmed/18573260

I’m trying to represent the semantics of the language, not the semantics of phosphorylation, so I need to take into account all of the data about the language, and that includes this kind of counter-intuitive use of the verb.  Why do we care about humans so much, though?  It’s because humans are the prototypical example of things that act with volition, or as a result of their will–what we call agents in the English-language terminology of linguistics–and agents get represented in a special way in lexical semantics.  So, we need to know if there can be a human agent for these biomedical verbs so that we can know if they can have agents at all, essentially.

Getting examples of the full range of prepositional complements (e.g. phosphorylate at, phosphorylate to) is important to me because different prepositions sometimes mark different aspects of the semantics of the verb.  For example, when we investigate phosphorylate at, we see that the semantics of phosphorylation involve a specific location on a molecule, and when we investigate phosphorylate to, we see that the semantics of phosphorylation involve something becoming something else–the to marks not a location at which the molecule ends up, but what the molecule becomes–like I had it converted to a round-trip ticket, in “normal” English.

Getting examples of a variety of semantic classes as the subject and the object of the verb is important to me for two reasons.  One reason is that I’m doing computational lexical semantics, specifically, which, as you might recall, means that I test my semantic representations by trying to use them as the basis of a computer program.  I know that it can be important to know what kinds of things are taking part in the action of a verb in order to know how to interpret both the verb itself, and the sentence that it occurs in.  Imagine these situations: the author finished the book, the student finished the book, and the goat finished the book.  In the first, this means that the author completed the writing of the book.  In the second, this means that the student completed the reading of the book.  In the third, this means that the goat completed the eating of the book.  Can there be other interpretations of these sentences?  Of course–authors also read, students also write, and in a work of fiction, you could certainly imagine a goat reading a book.  But, none of these are the intuitively obvious interpretations of those sentences, and the reason for that is the expectations that the different subjects–author, student, goat–lend to our interpretations of the sentences.  The other reason that I want to get a decent range of the types of semantic classes that can be the subjects and the objects of a verb is that I work with ontologists quite a bit.  I find that their models of the domain often don’t seem to have taken full advantage of what the literature of the domain has to say about how those models would need to look if they’re going to be adequate, and collecting examples of lots of different semantic classes taking part in an action is my stab at being helpful.
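To make the author/student/goat point concrete, here is a toy Python sketch of how a program might use the semantic class of the subject to pick the default reading of finished the book. The table and the function name are invented for illustration. This is not a real lexical resource, and a real system would learn such preferences from data rather than hard-coding them.

```python
# Hypothetical table: the activity that each kind of subject most
# typically performs on a book. Invented for illustration only.
DEFAULT_ACTIVITY = {
    "author": "writing",
    "student": "reading",
    "goat": "eating",
}

def interpret_finish(subject, obj="book"):
    """Return the default reading of '<subject> finished the <obj>'."""
    activity = DEFAULT_ACTIVITY.get(subject)
    if activity is None:
        # No expectation for this subject: leave the activity unspecified.
        return f"the {subject} finished (doing something to) the {obj}"
    return f"the {subject} finished {activity} the {obj}"

for who in ("author", "student", "goat", "linguist"):
    print(interpret_finish(who))
```

The fallback branch matters: when the program has no expectation about the subject, the honest answer is that the sentence is underspecified, which is why collecting a wide range of subject and object classes is worth the trouble.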

So, how does one go about doing this with a minimum of subjectivity and a maximum of data-centeredness?  I follow roughly the following steps, pretty much in this order, allowing for some going back and forth between them as I fine-tune things:

  1. Look at what other people have done.  I didn’t always do this, as I wanted to see how different what I came up with was from what other people had come up with, but by now I have a decent feel for what kinds of differences there are likely to be (they’re related both to the different content matter and to the different writing styles that my work and previous work are based on), and I usually start by looking at the representations in the Unified Verb Index.  (Search for the verb of your choice.)
  2. Look at some random examples of the verb in use to get a general sense for how well the representation in the Unified Verb Index matches up with biomedical data.  I use the Sketch Engine interface to do my search for random examples, but you can use Google, specialized textual search tools, or whatever is easy for you.
  3. Look for examples of human agents.  I usually go to Google for this one, as the data that I have uploaded to Sketch Engine doesn’t have very many humans, in general.  My two tricks:
    1. I use Google’s site: operator to search just within the National Library of Medicine’s web site.  That way I can be almost positive that I’ll get examples of how the word is used in the biomedical domain.
    2. The first thing that I try is a Google exact phrase search with we plus the past tense of the verb.  You mark a phrasal search by putting the exact phrase that you’re looking for in double quotes.  So, my search for we phosphorylated looked like this: site:http://www.ncbi.nlm.nih.gov/pubmed/ "we phosphorylated"
  4. Look for the full range of prepositional complements.  I do this with Sketch Engine’s word sketch function.
  5. Look for a variety of semantic classes as the subject and the object of the verb.  Again, I use Sketch Engine’s word sketch function for this.
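The human-agent search in step 3 is easy to script if you have a lot of verbs to check. Here is a minimal Python sketch that builds the query string shown above and turns it into a search URL. The function names are mine, and the `https://www.google.com/search?q=` URL pattern is an assumption about how you would hand the query to a browser, not something the steps above depend on.

```python
from urllib.parse import quote_plus

# The site restriction used in the example above.
PUBMED_SITE = "http://www.ncbi.nlm.nih.gov/pubmed/"

def human_agent_query(past_tense_verb, site=PUBMED_SITE):
    """Build an exact-phrase query for 'we <verb>' restricted to one site."""
    return f'site:{site} "we {past_tense_verb}"'

def search_url(query):
    """URL-encode the query so that it can be pasted into a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)

for verb in ("phosphorylated", "received"):
    query = human_agent_query(verb)
    print(query)
    print(search_url(query))
```

You still have to supply the past tense yourself, and you still have to read the hits: the query only finds candidate sentences, and deciding whether the we in each one is really a volitional agent is a human judgment.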

Then it’s time to see if the semantic representation actually covers everything that I’ve found using the strategy above.  If it does, then we’ll do a larger-scale project of marking up all of the examples of the verb in some large body of data, followed by trying to write a computer program that can make use of the representations and the examples to learn how to identify the semantics of the verb when shown new examples.

Here is some of the vocabulary that you will need if you’re going to talk about this kind of stuff in French.  Here is some data from the French Wikipedia page about semantics.  This will give us some of the vocabulary of semantics in general–then we’ll move on to lexical semantics.

La sémantique est une branche de la linguistique qui étudie les signifiés, ce dont on parle, ce que l’on veut énoncer. Sa branche symétrique, la syntaxe, concerne pour sa part le signifiant, sa forme, sa langue, sa graphie, sa grammaire, etc ; c’est la forme de l’énoncé.

  • la sémantique: semantics
  • le signifié: the “signified,” the concept or mental representation that is the locus of meaning.  (I should point out that it is unfortunately rare for English-speaking linguists to use this old Saussurean terminology, at least in my corner of linguistics.)
  • énoncer: to formulate, state, or pronounce (definition from Wikipedia.org).
  • la syntaxe: syntax.
  • le signifiant: the “signifier,” the spoken (or, in my field, written) form that corresponds to the signifié or “signified.”  (See above about unfortunate tendencies to not use Saussurean terminology.)
  • la graphie: written form (definition from WordReference.com).
  • un énoncé: in linguistics, this usually corresponds to the technical term “utterance,” but since we’re talking specifically about syntax here, it may be better translated as “wording” (see WordReference.com).

Now let’s move on to some vocabulary that’s more specific to lexical semantics.  We’ll take this material from the book Introduction au TALN et l’ingénierie linguistique, by Isabelle Tellier.

La sémantique lexicale est l’étude du sens des “mots” -ou plutôt des morphèmes- d’une langue. Cette définition est en réalité assez problématique, puisque la notion même de “sens” n’a rien d’évidente. Le problème tient précisément à ce que, pour définir le “sens” d’un mot, on recourt en général à d’autres mots. Pourtant, la consultation d’un dictionnaire d’une langue donnée est de bien peu d’utilité si on n’a pas déjà d’un minimum de connaissance de cette langue. Comment échapper à cette “circularité du sens” ? Nous envisageons dans ce chapitre (et le suivant) diverses tentatives qui peuvent être regroupées en trois familles…

  • la sémantique lexicale: lexical semantics.
  • le morphème: morpheme.
  • avoir rien d’évidente: I don’t know!  Can someone help out with this?
  • tenir à qqch:  to come from, stem from, arise from.  (Note: tenir à has a bazillion other meanings–see WordReference.com for this definition and many others.)
  • recourir à: to resort to, appeal to.  (Definition from WordReference.com.)

Want to learn more about the kind of approach to (computational) lexical semantics that I’m talking about here?  Check out my advisor’s book on the subject–Martha Palmer, Daniel Gildea, and Nianwen “Bert” Xue’s Semantic role labeling.  (I’m not telling you which of these people was my advisor–still anonymous!)

Waiting for the train confuses me way more than it ought to

Waiting for things in France involves confusing vocabulary.

Sign announcing incoming RER B trains. Amazingly, I found a picture of the sign for my train online, although you can tell that this is from a different stop by the fact that it has a time for a train headed to my station. Picture source: http://www.francetravelplanner.com/go/paris/trans/air/choose_train.html.

The last time I was in Paris, I tried to figure out the optimum time to leave my apartment in order to minimize my wait for the train to the little town where I work.  This requires complicated (at least for a humanities major like me) record-keeping in which I track the time that I leave the house, the time that the metro comes to the metro station by my apartment, the time that I get to the actual train station, and the time that my actual train shows up.

All of this involves a certain amount of time staring at electronic signs telling me the wait time for the next train, and that’s often where Zipf’s Law enters my day.  There are a few words with very similar appearances, but very different meanings, and I confuse them constantly.  They have so many related nouns, so many reflexive forms, and so many related colloquialisms that I’m going to start with just the verbs.  (Definitions from WordReference.com, with some editing.)

  • attendre: to wait, to wait for; to expect.
  • attenter à: to make an attempt on (to attack).
  • s’attendre à: to expect (definition from Phildange).
  • attenter à: to be a slur on something/one.
  • atteindre: to reach, to get to (a place); to achieve, to meet (a goal); to affect or harm (someone).
  • étendre: to stretch out; to spread out; to open out (definition from Phildange).
  • s’étendre: to lie down; to talk at length; to pervade, etc. (definition from Phildange).
  • éteindre: to extinguish, to put out (a fire, a cigarette); to turn off (a light, a machine).
  • s’éteindre: to die, to go out, to switch (oneself) off.
  • s’éteindre: to pass away.
  • s’entendre: to get along (with each other) (definition from Phildange).

Bottom line: leave the house at 07:55 and I get to work in an hour and a quarter, with very little of a wait at the train station–that’s important when it’s cold.  Leave the house at 08:00 and it’s a totally different story–over a 20-minute wait at the train station, and who knows how long it takes to get to work.

 

The first rule of talking about how people talk in Cincinnati is, don’t talk about how people talk in Cincinnati

I get disoriented, Zipf’s Law shows up, and I have breakfast.

Grits. Photo source: me.

I walked out of my room today–I’m on the road, visiting a research center with which I have a long-standing collaboration–and ran into a local.  We greeted each other politely, saying “good morning” and remarking on the exceptionally cold weather–see this post on the subject of saying hello to strangers in the US–and it was clear from his accent, as well as the accent of the other people that I ran into on my way out of the building, that I was in Kentucky.  I walked outside, looked around, and it was immediately clear that I wasn’t in Kentucky at all, but rather Ohio.  Southern Ohio, specifically–and therein lay my accent mix-up.

Southern Ohio has an identity issue, especially in the moderately large city in which I find myself: is it in the North, or is it a part of Appalachia?  The local Kentucky-like dialect is very strongly socially marked, and people around here do not like having their dialect remarked on, especially if they speak that particular dialect.  Ohio dialects are actually quite diverse–Columbus, in the middle of the state, has four dialect boundaries, roughly corresponding to the four parts of the city divided up at the intersection of High St. and Broad St.–and around here in Cincinnati, there is a long history of prejudice related to social class.  Around here, that social class is reflected most strongly by which dialect you speak, or at least which dialect you speak in public.

The Zipf’s Law connection: I stopped by the cafeteria in the research center to pick up some breakfast, and was happy to see a big vat of Cream of Wheat, a childhood favorite with which my father’s second wife often fed me.  There was something wrong with it, though–what were all of those little yellow specks in it?  A quick look at the menu confirmed my suspicion: it was not Cream of Wheat at all, but rather grits.  Grits is a food of the southern United States, similar to a thin polenta.  Staring at a vat of grits immediately raises a question: how do you say grits in French?  It turns out that you don’t.  It’s actually a complicated issue.  The word can be singular or plural in English, and the French Wikipedia article on it starts out like this:

Le ou les Grits est une préparation culinaire…

“The (singular) or the (plural) Grits is a culinary preparation…”

That is, the word the shows up twice–once in a singular form, le, and once in the plural form, les.  To find some actual French Zipf’s-Law-type words related to grits, let’s look at a couple of sentences from the French Wikipedia article, this time on the subject of the manufacture of grits:

Le grits trouve son origine dans la préparation du maïs par les amérindiens. Traditionnellement, la semoule du grits est réalisée par un moulin en pierre qui broie le maïs. On tamise ensuite et la poudre la plus fine est utilisée comme farine, alors que la plus grossière est destinée au grits.

“Grits originates in the preparation of corn by the Native Americans.  Traditionally, the semolina of grits is produced by a stone mill that grinds/crushes the corn.  It is then sifted and the finest powder is used as flour, while the coarsest powder is reserved for grits.”

Let’s just focus on the verbs.  Definitions from WordReference.com:

  • réaliser: to make, produce, or create.  (Several other meanings, too, but that’s the one here.)
  • broyer: transcription: [bʀwaje].  To grind or crush; figuratively, to destroy or wreck.
  • tamiser: to sift, to sieve.
  • destiner à: to reserve for.

Click here for a collection of materials from different Ohio dialects.  And yes, the title of this post is a reference to the book/movie Fight Club.

 

 

Molière, Tartuffe, Dr. Seuss, and the Grinch Who Stole Christmas

An unexpected connection between Dr. Seuss and one of the greatest French dramatists of all time.

The Grinch, from Dr. Seuss’s “How The Grinch Stole Christmas.” Picture source: http://www.playbuzz.com/nedbullock10/how-much-of-a-grinch-are-you.

Molière was one of the great French dramatists.  He lived just after Shakespeare, and you can compare them quite a bit in terms of their skill with language–reading his play Le Tartuffe in the original was almost adequate recompense for two years of studying French, mostly in the hours before sunrise.

Dr. Seuss is one of the most beloved American children’s authors.  His classic Green Eggs and Ham was the first book my child ever read out loud, and the notorious American politician-nihilist Ted Cruz read it out loud on the floor of the Senate during his attempt to shut down the American government by filibuster.

One of Dr. Seuss’s most famous characters is the Grinch.  In his book How The Grinch Stole Christmas, the Grinch is a nasty, bitter character who decides to ruin everyone else’s Christmas by stealing their Christmas presents.

Reading the commentaries on Le Tartuffe, I was thrilled to see the character of Mme. Pernelle, the curmudgeonly mother of one of the main characters, described as grincheuse.  WordReference.com defines grincheux/grincheuse (male and female forms) as “grumpy, grouchy, or cranky.”  Could this be the origin of the name “Grinch”?  Wikipedia says yes!  Who would’ve guessed?

 

I’m going to die in 2028

Michel de Montaigne, 16th-century French philosopher and essayist. We call him Eyquem around the office. Picture source: http://hua.umf.maine.edu/Reading_Revolutions/MontaigneT.html.

I’m going to die in 2028.  I calculated this by starting with my year of birth, adding the median of my father’s and paternal grandfather’s ages at the time of their first heart attacks, subtracting a bit for my smoking as a teenaged problem child, adding a bit for my maternal grandfather’s long life despite his smoking (a friend of my mother’s once described my grandfather’s apartment to me as “nothing but books and cigarette smoke”), subtracting a bit for the deleterious effects of years of high cortisol production due to years of incredible stress (I have my own problem child), and adding a bit for the salutary effects of ten years of intensive study of the incredibly physically demanding sport of judo.  2028 works out great for me–I can retire and spend a couple years sitting around reading, and then croak right about the time that my paltry retirement savings run out.

The verb mourir, “to die,” turns out to be quite irregular in French, and since we’ve been working our way through irregular verbs, il serait séant qu’on l’étudiât.  (That’s a little French morphosyntax joke.  A double joke, actually, since séant means both the very literary “fitting, seemly” and “backside, behind”–roughly fesses, if you’re French.)  As far as I can tell, some verbs have similar conjugations, but no other French verb is conjugated quite the same as mourir.  Let’s tour the various and sundry tenses and aspects.

Present indicative

Mourir has a vowel change in the present tense that is almost all its own–the only similar verb that I can think of is émouvoir. But, émouvoir is quite different even in this tense, as the third person plural form (ils/elles) has the root-final consonant of the other plurals, rather than the (lack of a) root-final consonant in the singulars, which is how mourir works.

je meurs nous mourons
tu meurs vous mourez
on meurt ils/elles meurent

Imperfect indicative

As far as I know, even irregular verbs are all regular in the imparfait, or imperfect indicative.

je mourais nous mourions
tu mourais vous mouriez
on mourait ils/elles mouraient

Passé simple (past historic)

Mourir is irregular in its own special way in the passé simple.  It takes the same endings as a set of irregular verbs that have past participles that end with u, but its past participle does not end with u.  (It’s mort(e).)  (See Laura Lawless’s page on the passé simple on About.com for the full set.)

je mourus nous mourûmes
tu mourus vous mourûtes
on mourut ils/elles moururent

Passé composé (compound past)

Of course, mourir has to be different and make the compound past with être, rather than as most verbs do, with avoir:

je suis mort nous sommes morts
t’es mort vous êtes morts
on est mort ils sont morts

Futur simple (future indicative)

This is one of those double-rr-in-the-future-tense verbs that we ran into in a recent post on irregular future-tense verbs.

je mourrai nous mourrons
tu mourras vous mourrez
on mourra ils/elles mourront

Present subjunctive

Mourir has the same unusual root vowel change in the present subjunctive as it has in the present indicative.

je meure nous mourions
tu meures vous mouriez
on meure ils/elles meurent

Imparfait du subjonctif (imperfect subjunctive)

Mourir is irregular in the imperfect subjunctive in the same way that it’s irregular in the passé simple, which is to say that it has a u in the stem:

je mourusse nous mourussions
tu mourusses vous mourussiez
on mourût ils/elles mourussent

Participles

There are only three French verbs that have irregular present participles, and amazingly, mourir isn’t one of them.  The past participle is irregular, though–it doesn’t end with the i that a regular -ir verb would end with, but rather with a t(e) (depending on whether we’re talking about something grammatically male or grammatically female):

present: mourant past: mort

Imperatives

Weird stem vowel change, once again:

mourons!
meurs! mourez!

I believe it was the famous French philosopher and essayist Montaigne who said that “to learn to philosophize is to learn how to die.” (An interesting contrast with my peeps at the café philo who felt that the point of philosophy is to learn how to live.)  Any way you slice it, if you’re going to die (and at some point you are), now you know how to talk about it in French.

I am getting soooo tired of learning vocabulary related to terrorist activity

It’s impossible to keep up with current events these days without a robust vocabulary related to terrorism and police actions.

Rocket launcher seized in a police raid in Lyon three days after the November 13, 2015 attacks in Paris. Picture source: http://www.lefigaro.fr/actualite-france/2015/11/26/01016-20151126ARTFIG00199-des-perquisitions-contestees-mais-efficaces-selon-les-policiers.php.

With the world apparently going violently crazy–this year is only 14 days old, and so far we’ve had terrorist attacks in Istanbul, Tel Aviv, Jakarta, Hurghada (Egypt), Marseille, Kabul–more than I have the patience to list, actually (Wikipedia says 28 of them), and that’s not even counting the nutjobs occupying federal property in Oregon, who haven’t actually killed anyone yet–the news from France is full of stories of searches, arrests, and the like.  A bit of trivia: according to Wikipedia, the English word terrorism comes from the French word terrorisme, which originally referred to the policies of the revolutionary government during the Reign of Terror (1793-1794).  Here’s some of the relevant vocabulary that keeps showing up in the news these days.  Definitions from WordReference.com:

  • la garde à vue: custody.  Not to be confused with:
  • le garde-à-vous: attention (the military posture).  If you’re French: to us Americans, this sounds just like garde à vue–really.
  • être placé en garde à vue: to be held in custody.
  • mettre en garde à vue: to put in custody or to send back to custody–I’m not sure about this one.
  • perquisitionner: to search.
  • la perquisition: a police raid, a police search.
  • faire une perquisition: to carry out a search.
  • le mandat de perquisition: search warrant.
  • les menottes (f): handcuffs.
  • passer les menottes à qqn: to handcuff someone.

 

 

Linguists versus normal people

Linguists and normal people can be quite different. Here’s one way.

A difference between linguists and normal people: non-linguists get excited about irregular things in language, while linguists mostly don’t.  I can hardly go to a wedding, bar mitzvah, or quinceañera–any place where you meet new people, basically–without someone saying “isn’t it funny how the plural of foot is feet?”… or something along those lines.  From a linguist’s point of view, the irregular ones are the easy ones–for a child to learn them, all they have to do is remember them.  In contrast, for the regular forms, the child actually has to figure out a system–a much more abstract problem.  From another perspective, suppose that your job (like mine) has to do with figuring out how to make computers process language.  The irregular things are easy–there’s a limited number of them, so the program can just look them up.  In contrast, the regular ones are essentially infinite (languages add new words all the time, and they’re almost always regular), so the computer has to be able to figure them out somehow (and that’s how I stay employed).  So, linguists mostly aren’t that interested in irregular forms–regular ones are much more what we’re trying to figure out. But, one needs to know the irregular verbs–hence this post about verbs that are irregular in the future tense.  (See here for verbs that are regular in the future tense.)  Of course, like the majority of verbs, the frequency of any of these (irregular) verbs is quite low, so Zipf’s Law comes into effect–see below for how that led to an embarrassing incident for me in a bookstore.

The good news about verbs that are irregular in the future tense in French is that the inflections stay the same.  The bad news is, there are still a lot of verbs that manage to be irregular.  I’ll try to arrange them in some sort of structured way that makes it easier to see what kinds of patterns tend to recur in the irregularities.  Note: Watch the pronunciation of these–some of them are non-intuitive.  I suggest listening to the recordings on this page on the Tex’s French Grammar web site.

There are two things that make memorizing the irregular future tense forms a bit less intimidating:

  • The inflections (endings for person and number) are the same even in irregular futures.
  • There’s always going to be an r.

OK, let’s try to group these. What we’d like to find is groups where a particular infinitive form maps to a particular irregularity–not always possible, but when we can, it should help us in our memorization. Throughout, I’ll give the il/elle/on form of the conjugated verb.  Note that some verbs may show up in more than one grouping. Don’t see that as a problem–see it as an extra opportunity to remember the form.

One pattern is that roots that do not have a d get a d added in the future tense.

venir viendra
tenir tiendra
obtenir obtiendra

Do those have anything in common?  Hells yeah–the infinitive ends with -enir.

Some verbs with an l lose it when this picking-up-a-d thing happens:

falloir faudra
vouloir voudra

Another pattern is that some verbs end up with a double r, sometimes losing the final consonant of the root in the process:

mourir mourra
courir courra
envoyer enverra
pouvoir pourra
voir verra

Here’s a pattern where oi goes away, leaving behind a v:

devoir devra
pleuvoir pleuvra
recevoir recevra

There’s one that I can’t make fit into any other grouping, and it’s one with an embarrassing story attached to it.  I went into a great travel book store in Paris (Librairie Ulysse on the Ile Saint-Louis) and asked for books about Benin.  (If it’s in italics, it happened in French.)  Are you going to Benin?  asked the owner.  I don’t know–I have an application in to a volunteer program there, I answered.  When will you saurez?, she asked.  I stood there with a panicked look on my face until I remembered that saurez is the irregular future tense of the verb savoir, “to know.”

savoir saura
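Since I claimed that irregular forms are the easy case for a computer, here is a minimal sketch of what "just look them up" could look like in Python. The table holds only the verbs grouped in this post; the names IRREGULAR_FUTURE_STEMS, ENDINGS, and future are my own inventions for illustration, and the fallback rule for regular verbs is a simplification (it knows nothing about spelling changes, or about impersonal verbs like falloir and pleuvoir that only occur with il).

```python
# Irregular future stems for the verbs listed in this post (lookup table).
# Every stem ends in r, as promised above.
IRREGULAR_FUTURE_STEMS = {
    # picks up a d
    "venir": "viendr", "tenir": "tiendr", "obtenir": "obtiendr",
    # loses an l, picks up a d
    "falloir": "faudr", "vouloir": "voudr",
    # double r
    "mourir": "mourr", "courir": "courr", "envoyer": "enverr",
    "pouvoir": "pourr", "voir": "verr",
    # oi goes away, v stays
    "devoir": "devr", "pleuvoir": "pleuvr", "recevoir": "recevr",
    # the bookstore verb
    "savoir": "saur",
}

# The endings are the same for regular and irregular verbs.
ENDINGS = {
    "je": "ai", "tu": "as", "on": "a",
    "nous": "ons", "vous": "ez", "ils/elles": "ont",
}


def future(infinitive, pronoun="on"):
    """Return the futur form: look up irregular stems, else apply the regular rule."""
    stem = IRREGULAR_FUTURE_STEMS.get(infinitive)
    if stem is None:
        # Regular rule (simplified): keep the infinitive, dropping the final e of -re verbs.
        stem = infinitive[:-1] if infinitive.endswith("re") else infinitive
    return stem + ENDINGS[pronoun]


print(future("savoir", "vous"))
print(future("venir"))
print(future("nager"))
```

The lookup is one line; the rule is where the real work would go if this had to handle every verb in the language.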

There are some super-irregular ones that are very important, just because they occur very frequently: être (to be), aller (to go), and faire (to do, to make).  This post is long enough already, so we’ll come back to these another time.

Remember: from a linguist’s point of view, the irregular verbs are the easy ones. So: no complaining–just memorize them with me! And, if you can come up with any patterns/groupings that I missed, I would love to hear about them.

The actor, the drug lord, and Zipf’s Law

Screen shot of Rolling Stone’s tweet announcing the Sean Penn/El Chapo interview. That’s Penn on the left and El Chapo on the right.

The French press is just as abuzz about the whole Sean Penn/El Chapo thing as the American press is.  In case you’re reading this 10 years from now: “El Chapo” is the nickname of Joaquin Guzman, until recently one of the biggest drug dealers in the world–he may still be, although from behind bars.  He was arrested and imprisoned in 1993, and then escaped in 2001.  He remained free until 2014, when he was recaptured and, again, imprisoned.  In 2015 he escaped again, this time apparently being driven down an underground tunnel on the back of a motorcycle.

All was well for him until he decided that a movie should be made about his life.  He reached out to the motion picture industry, which somehow led to him being “interviewed” by American actor Sean Penn, the resulting interview being published in Rolling Stone magazine.  This exposure led to him being captured once again, and at the time that I write this, he is still in jail, hoping like hell he doesn’t get extradited to the United States.

We actually have two Zipf’s Law connections with this story–one regarding Sean Penn, and one regarding El Chapo.

Some time ago, we read about the filles du roi–the French orphans who were sent to Canada to get married and increase the French-speaking population in North America between 1663 and 1673.  You might recall from that post that Madonna is descended from a fille du roi.  In 1985, she broke the heart of every young man in America by marrying Sean Penn.

El Chapo’s Zipf’s Law connection is lexical–i.e., one of those bazillion words that is not particularly unusual, but that you nonetheless almost never hear (in the big scheme of things).  As I said, El Chapo is all over the news, and he is almost always referred to by his name–Joaquin Guzman–followed by surnommé El Chapo, meaning “nicknamed El Chapo.”  Here’s an example from France 24‘s web site:

Les autorités mexicaines ont annoncé vendredi l’arrestation de Joaquin Guzman, surnommé “El Chapo”, le plus important narcotrafiquant mexicain, qui s’était évadé de façon rocambolesque d’une prison de haute sécurité le 11 juillet dernier.

On Friday, Mexican authorities announced the arrest of Joaquin Guzman, nicknamed “El Chapo,” the most important Mexican drug trafficker, who had escaped in an extraordinary fashion from a high-security prison last July 11th.

Surnommer is a great example of Zipf’s Law–not particularly unusual, but low-frequency enough that I haven’t run into it in two years of studying French quite seriously.  Again, it means “to nickname.”  I’m not telling you any of my nicknames…

 

 

The US small arms market will sell $3,985,000,000 worth of firearms in 2020: the future tense in French

The Crystal Ball, by John William Waterhouse. La boule de cristal, in French. Picture attribution: John William Waterhouse [Public domain], via Wikimedia Commons

I’ve been reading predictions about the United States and France in 2020.  The Euromonitor International website says that the US had the largest economy in the world in 2010, but will have slipped to #2 (behind China) in 2020; it says that France had the 7th-largest economy in 2010, but will have been displaced by Brazil to drop to #8 by 2020.  According to the Congressional Research Service, France got 10.3% of its energy from renewable sources in 2005, and has a target of 23% by 2020. Statista.com thinks that the US small arms market will sell $3,985,000,000 worth of firearms in 2020, and that the total population of France (currently 64.21 million) will be 65.7 million.  (The US should be a bit above 330 million.) Americans are projected to eat 5,404 metric tons of cheese in 2020; Statista.com cut me off before I could get the relevant numbers for France, but revenues from cheese sales in France last year should have been about $37.71 billion.

Talking about any of these predictions requires that we be able to use the future tense.  And, there’s no time like the present to talk about the future, right?

For starters, let’s look at the inflection of a regular verb or two. Well: three… We’re going to concentrate on the particular future tense called simply the futur, as opposed to the futur proche.  As the French Crazy web site explains it: while the futur proche or “near future” tense (the one formed with the verb aller) is used to refer to “events that are certain to occur and are happening relatively soon”, the futur “is used to talk about more general or distant future events. These events are slightly more uncertain because the amount of time needed to elapse is greater than the near future.”  Here is the paradigm for the futur.  I recommend that you listen to the pronunciations of even the regular -er verbs on the Tex’s French Grammar web site. For your convenience, I’m going to use the same examples as Tex:

pronoun    nager (to swim)  réfléchir (to think)  rendre (to give back)
je         nagerai          réfléchirai           rendrai
tu         nageras          réfléchiras           rendras
on         nagera           réfléchira            rendra
nous       nagerons         réfléchirons          rendrons
vous       nagerez          réfléchirez           rendrez
ils/elles  nageront         réfléchiront          rendront
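For the programmers in the audience, the paradigm above can be sketched as a tiny rule in Python. This is only an illustration under a simplifying assumption: the stem is the whole infinitive, minus the final e for -re verbs. The names FUTURE_ENDINGS, future_stem, and conjugate_future are hypothetical, made up for this example, and the rule covers regular verbs only.

```python
# Endings of the futur, attached to a stem that always ends in r.
FUTURE_ENDINGS = {
    "je": "ai",
    "tu": "as",
    "on": "a",
    "nous": "ons",
    "vous": "ez",
    "ils/elles": "ont",
}


def future_stem(infinitive):
    """Stem of a regular verb: the infinitive, minus the final e of -re verbs."""
    return infinitive[:-1] if infinitive.endswith("re") else infinitive


def conjugate_future(infinitive):
    """Map each pronoun to the futur form of a regular verb."""
    stem = future_stem(infinitive)
    return {pronoun: stem + ending for pronoun, ending in FUTURE_ENDINGS.items()}


# Print the same three verbs as in the table above.
for verb in ("nager", "réfléchir", "rendre"):
    forms = conjugate_future(verb)
    print(verb, " ".join(forms.values()))
```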

It’s way too easy to confuse the future with the conditional, and we’re going to need both of them to form the compound tenses that we’ve been talking about lately, so let’s look at the potential points of confusion between the two.

The potential problem comes from the fact that both the futur and the conditionnel maintain the r sound of the infinitive.  One of the linguist’s best approaches to everything is to look at “minimal contrasts,” so let’s try that.

In the first person singular (je), the futur and the conditionnel sound the same.  This screws me up constantly when I’m listening to someone else.  In writing, though, they are differentiated by the presence of a (silent) s in the conditional:

tense        nager (to swim)  réfléchir (to think)  rendre (to give back)
Future       je nagerai       je réfléchirai        je rendrai
Conditional  je nagerais      je réfléchirais       je rendrais

The tu and on (second person singular informal and third person singular) inflections are not very confusable, but the nous (first person plural) inflections are.  Here the difference is that the conditional has an i:

tense        nager (to swim)  réfléchir (to think)  rendre (to give back)
Future       nous nagerons    nous réfléchirons     nous rendrons
Conditional  nous nagerions   nous réfléchirions    nous rendrions

There’s a similar “minimal contrast” in the vous (second person singular formal or second person plural) inflections:

tense        nager (to swim)  réfléchir (to think)  rendre (to give back)
Future       vous nagerez     vous réfléchirez      vous rendrez
Conditional  vous nageriez    vous réfléchiriez     vous rendriez

The ils/elles (third person plural) forms are pretty distinct, so we’ll skip those, too.
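If you want to see all of the future/conditional pairs side by side, including the ones I skipped, here is a rough Python sketch. It assumes the same simplified regular-verb stem as the paradigm above (the infinitive, minus the final e of -re verbs), and the names FUTURE, CONDITIONAL, stem, and contrast are hypothetical, invented for this illustration.

```python
# Endings for the two tenses; both attach to the same r-final stem.
FUTURE = {
    "je": "ai", "tu": "as", "on": "a",
    "nous": "ons", "vous": "ez", "ils/elles": "ont",
}
CONDITIONAL = {
    "je": "ais", "tu": "ais", "on": "ait",
    "nous": "ions", "vous": "iez", "ils/elles": "aient",
}


def stem(infinitive):
    """Simplified stem for regular verbs only."""
    return infinitive[:-1] if infinitive.endswith("re") else infinitive


def contrast(infinitive):
    """Return (pronoun, future form, conditional form) for every pronoun."""
    s = stem(infinitive)
    return [(p, s + FUTURE[p], s + CONDITIONAL[p]) for p in FUTURE]


# One row per pronoun, future on the left and conditional on the right.
for pronoun, fut, cond in contrast("nager"):
    print(f"{pronoun:10}{fut:12}{cond}")
```

Swapping in réfléchir or rendre gives the other two columns of the tables above.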

There are tons of verbs that are irregular in the future, so we’ll come back to the future in a future post.  (Sorry.)  There are also a number of differences in when the future tense versus the present tense get used in English versus French, and we’ll come back to those, too.  In the meantime: I’m going to have a cup of coffee.  (How many different ways have I formed the English future tense in this paragraph?)

 

 
