Zipf’s Law English: reduction

Spoken American English can be very difficult to understand. Here’s a video to help you cope with one of the problems therewith.

Walking out of the exam on oral comprehension during the testing for the Diplôme approfondi de langue française a couple months ago, I found a very unhappy-looking young man waiting for the elevator.  Are you OK?  He shook his head glumly: I flunked again, I know it.  I made sympathetic noises.  Was this your first time taking the test?  I responded in the affirmative.  He gave me a look of pity–clearly the expectation was that I was going to find the experience as brutal as he had.  Repeatedly, apparently.

Indeed, the oral comprehension exam got me my worst score out of the whole test.  Spoken French and spoken English can both be brutally difficult to understand if they’re not your native language, and for many of the same reasons.  One of those is their sets of vowels–both languages have vowel “inventories” (the technical term) that are shared by relatively few languages.  Another is a process called reduction, which leads to things having a range of ways that they could be pronounced, some of which are less distinct than others.  For example, in French, some unstressed vowels are optional in casual spoken language, so that cheveux is often pronounced chveux, matelot can be pronounced matlot, and so on.  Furthermore, the sounds that are “left behind” can be changed as a result, so that, for example, the j in je becomes pronounced as ch when je suis is “reduced” to chuis.  So, when I describe this as becoming “less distinct,” think about this.  In French, there are these two words, and the difference between them is the sound of j versus the sound of ch:

  • le jar: secret language, argot
  • le char: chariot; in Canada, car.

When j becomes ch, as in chuis, the difference between the two sounds goes away, and in that sense, a “reduced” word is less distinct from other words than it might have been.

Reduction processes are rampant in spoken American English, and they can make the language pretty difficult to understand if you’re not a native speaker.  I’m trying my hand at putting some videos together that aim to help people learn to understand these reductions.  You can find the first one, on the topic of the reduction of let me to lemme, at the link below.  If you’re as mystified by spoken American English as I am by spoken French, check it out–I’d love to have feedback on what does and doesn’t work, whether that be here on this blog, or in the Comments section on YouTube.  Unfortunately, I haven’t figured out the whole subtitle thing, and I’d like to know to what extent that does or doesn’t interfere with the effectiveness (or lack thereof) of the video.  Any input at all would be appreciated, though!

Vocabulary hoax and social stratification

Recommended readings on language: the Great Eskimo Vocabulary Hoax, and what’s on the fourth floor.

Apropos of nothing, here is a blog post with this week’s suggested readings for a class that I’m teaching.  Some of them are quite interesting and not at all technical.  In particular, the Great Eskimo Vocabulary Hoax piece by Geoffrey Pullum talks about issues that have come up multiple times in the Comments section of this blog, and the Social stratification of (r) in New York City department stores piece by William Labov is quite fascinating if you’re interested in language and society.

Suggested readings for Weeks 2 and 3, natural language processing

English notes

apropos of nothing: used to introduce a new topic that isn’t related to anything that’s previously been under discussion.  Examples:

How it was used in the post: Apropos of nothing, here is a blog post with this week’s suggested readings for a class that I’m teaching.  

My last assclown

Winter will be past before we know it.  I’ll see the chestnuts blooming in the Place Cambronne on my way home from work (on my way to work, I study vocabulary, and don’t notice them), and rejoice in the knowledge that they will survive even the zombie apocalypse.  Not far behind will be National Poetry Month.  In anticipation of that, and after a long weekend of contemplating what exactly it means to have a thin-skinned assclown, a man who rages in response to tweets and threatens the press when he doesn’t like their reporting, with his fingers on the most powerful nuclear arsenal in the world, I propose a timely bit of Robert Browning.  Follow this link if you’d like to hear a pretty good recording thereof.  It’s pretty disturbing in and of itself, and all the more so with Trump in the presidency.  I gave commands;  then all smiles stopped together. There she stands  as if alive….Notice Neptune, though…thought a rarity, which Claus of Innsbruck cast in bronze for me!  (Rough translation: I had her killed.  Hey, look at this great thing that I have!)

My Last Duchess

Robert Browning

That’s my last Duchess painted on the wall,
Looking as if she were alive. I call
That piece a wonder, now; Fra Pandolf’s hands
Worked busily a day, and there she stands.
Will’t please you sit and look at her? I said
“Fra Pandolf” by design, for never read
Strangers like you that pictured countenance,
The depth and passion of its earnest glance,
But to myself they turned (since none puts by
The curtain I have drawn for you, but I)
And seemed as they would ask me, if they durst,
How such a glance came there; so, not the first
Are you to turn and ask thus. Sir, ’twas not
Her husband’s presence only, called that spot
Of joy into the Duchess’ cheek; perhaps
Fra Pandolf chanced to say, “Her mantle laps
Over my lady’s wrist too much,” or “Paint
Must never hope to reproduce the faint
Half-flush that dies along her throat.” Such stuff
Was courtesy, she thought, and cause enough
For calling up that spot of joy. She had
A heart—how shall I say?— too soon made glad,
Too easily impressed; she liked whate’er
She looked on, and her looks went everywhere.
Sir, ’twas all one! My favour at her breast,
The dropping of the daylight in the West,
The bough of cherries some officious fool
Broke in the orchard for her, the white mule
She rode with round the terrace—all and each
Would draw from her alike the approving speech,
Or blush, at least. She thanked men—good! but thanked
Somehow—I know not how—as if she ranked
My gift of a nine-hundred-years-old name
With anybody’s gift. Who’d stoop to blame
This sort of trifling? Even had you skill
In speech—which I have not—to make your will
Quite clear to such an one, and say, “Just this
Or that in you disgusts me; here you miss,
Or there exceed the mark”—and if she let
Herself be lessoned so, nor plainly set
Her wits to yours, forsooth, and made excuse—
E’en then would be some stooping; and I choose
Never to stoop. Oh, sir, she smiled, no doubt,
Whene’er I passed her; but who passed without
Much the same smile? This grew; I gave commands;
Then all smiles stopped together. There she stands
As if alive. Will’t please you rise? We’ll meet
The company below, then. I repeat,
The Count your master’s known munificence
Is ample warrant that no just pretense
Of mine for dowry will be disallowed;
Though his fair daughter’s self, as I avowed
At starting, is my object. Nay, we’ll go
Together down, sir. Notice Neptune, though,
Taming a sea-horse, thought a rarity,
Which Claus of Innsbruck cast in bronze for me!

English notes

assclown: “someone who, wrongly, thinks his actions are clever, funny, or worthwhile.”  “someone who seeks an audience’s enjoyment while being slow to understand how it views him.”  A specific kind of asshole, defined as “A person counts as an asshole, when and only when, he systematically allows himself to enjoy special advantages in interpersonal relations out of an entrenched sense of entitlement that immunizes him against the complaints of other people.”  Sources: John Kelly on the Strong Language blog, and Aaron James, in his book Assholes: a theory of Donald Trump.

Fra: “used as a title equivalent to brother preceding the name of an Italian monk or friar” (Merriam-Webster).  My best guess is that it’s used here to suggest that the Duke thinks that the painter was overly familiar (brother) with his wife, and/or that his wife was overly familiar with the painter.

familiar: a word with at least two parts of speech (adjective, of course, but also noun).  In the (attempt at an) explanation above, it’s used with this range of meanings, again from Merriam-Webster: (a) being free and easy, as in the familiar association of old friends; (b) marked by informality, as in a familiar essay.

For my money, I really need to get more sleep

I can’t sleep, which leads to tokenization issues and the definition of “for my money.”

I don’t sleep well.  That is to say: I don’t sleep very much.  Not at night, anyway.

In the best-case scenario, the middle of the night, when in theory I should be sleeping, is my time to study vocabulary or to read.  In the worst-case scenario, the middle of the night is when I return emails from people who are in North America, and therefore awake.

Tonight’s email brought a help-wanted ad from the School of Informatics at the University of Edinburgh, posted by the amazing Mirella Lapata.  (I say “amazing” because her paper with Regina Barzilay at the Association for Computational Linguistics annual meeting in 2005 opened my eyes to the possibilities for inventive evaluation strategies in computational linguistics in a way that my eyes had not previously been opened.)  For my money, the University of Edinburgh’s graduate program in computational linguistics is the best in the world, so I forwarded Mirella’s email to the students in our program, most of whom are not computational linguists, but most of whom would be quite suited for one of the advertised jobs in the School of Informatics.  I added the following introduction to the email:

Picture source: me.

This got me the following response from one of my students in the US (and therefore awake):

Picture source: also me.

Now, I love getting this kind of question, for many reasons.  It lets me repay the apparently endless patience of my colleagues in France for my crappy command of their language.  It lets me be the person who knows the answer to a question about language, which in French happens exactly never.  It gives me a socially acceptable excuse for talking about language, which I enjoy way more than is cool.  It suggests that someone actually both read and thought about what I wrote.  (You pick whichever one you think portrays me in the best light.)  In fact, I love that kind of question so much that I will often go out and find naturally-occurring examples, which, like any good linguist these days, I do on the Interwebs.  A trip to the Sketch Engine web site and a search of the Open American National Corpus found me these:

Picture source: screen shot of the Sketch Engine web site.

…which, of course, like most things of interest, leads to a question. In this case, the question is: what’s wrong with the Sketch Engine web site?  Where did all of those spaces come from?  

The answer: there’s nothing wrong with the Sketch Engine web site.  Part of any analysis of written data is choosing an answer to this question: what is a word?  It’s not typically obvious what the answer is.  Give students in a beginning language processing class this sentence, and ask them what the words are:

My dog has fleas.  

(For reasons that are obscure to me but that I think have something to do with playing the ukulele, that is a famous sentence.)  Ask them what the words are, and the first answer will be anything separated by white space:

My dog has fleas.

…at which point they quickly realize that they’ve just posited that fleas. is a word, and they modify their hypothesis, to be anything separated by white space and stripped of punctuation: 

My dog has fleas .

(I’m not making this up–in fact, I did it in class last Tuesday.)  Next they figure out that they probably want My and my to be considered the same word, which means that they need to do something about the case of letters, and if they speak any of the bazillion languages that have more inflectional morphology (example in a minute) than English does, then they might want to do something with aller/allais/allai/allasse, etc.

Things get pretty complicated pretty quickly, though.  Suppose that you’re dealing with English.  What do you do with

wouldn’t don’t haven’t didn’t

Seems pretty straightforward–you want something like this:

would n’t do n’t have n’t did n’t

…except that it’s not straightforward at all, because then you have to propose

wo n’t

…which people generally aren’t happy about.
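Here is a minimal sketch, in Python, of the hypotheses that the students work their way through.  The function names are mine, made up for illustration, and the regular expressions are just one way of doing it–they are not what Sketch Engine or any other tool actually uses.

```python
import re

def whitespace_tokens(text):
    """First hypothesis: a word is anything separated by white space."""
    return text.split()

def split_punctuation(text):
    """Second hypothesis: punctuation is split off into tokens of its own."""
    # Runs of letters/digits/apostrophes stay together; any other
    # non-space character becomes a separate token.
    return re.findall(r"[\w'’]+|[^\w\s]", text)

def fold_case(tokens):
    """Treat My and my as the same word by lowercasing everything."""
    return [token.lower() for token in tokens]

def split_negation(token):
    """Split a contracted negation into a stem plus n't (straight or curly apostrophe)."""
    match = re.fullmatch(r"(\w+)(n['’]t)", token)
    return [match.group(1), match.group(2)] if match else [token]

print(whitespace_tokens("My dog has fleas."))
print(fold_case(split_punctuation("My dog has fleas.")))
for word in "wouldn’t don’t haven’t didn’t won’t".split():
    print(word, "becomes", split_negation(word))
```

The last line of output is the awkward one: the same rule that gives you would + n’t also gives you wo + n’t, and wo is not a word that anybody wants in their vocabulary.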

The table of contents of “Le mot,” by Maurice Pergnier. The point of the picture is that the first 46 pages of the book address the various arguments for and against the whole idea of the word. Picture source: me.

There are a variety of ways to answer these sorts of questions, and it does actually matter.  From a practical point of view, the choices that you make about how you do this–the process is called tokenization–are important enough that they affect the performance of computer programs that do things with language.  (Here’s a recent paper on the topic.)  From a theoretical point of view, your choice takes a position on a hugely controversial topic in linguistics: what a word is.  (The best discussion of the controversy that I’m aware of is in the book Le mot, by Maurice Pergnier.)

So, why are those spaces there in the Sketch Engine output?  Let’s look at it again:

Picture source: screen shot of the Sketch Engine web site.

One of the immediately obvious things is that they have “tokenized” the punctuation off, so that “personal growth” becomes “ personal growth ” and (1995) becomes ( 1995 ).  The next thing that you might notice is that there is some ambiguity in the output.  Look at what happens to that’s and people’s…

…which become

that ‘s and people ‘s

Now we have two ‘s …and they are different, but look the same.  What is a computer program to do with that?  Welcome to my world.  Nobody said that computational linguistics was going to be all about suicide prevention and curing cancer, right?
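To make the ambiguity concrete, here is a small Python sketch.  The function name and the regular expression are my own invention for illustration, not anything taken from Sketch Engine; the only point is that once the ’s has been split off, the contracted verb and the possessive marker are the same string.

```python
import re

def split_clitic_s(token):
    """Split a trailing 's (straight or curly apostrophe) off into its own token."""
    match = re.fullmatch(r"(.+?)(['’]s)", token)
    return [match.group(1), match.group(2)] if match else [token]

contraction = split_clitic_s("that’s")    # here the ’s is a contracted verb
possessive = split_clitic_s("people’s")   # here the ’s marks possession

print(contraction, possessive)
# The two second tokens are indistinguishable as strings:
print(contraction[1] == possessive[1])
```

Telling the two apart takes information from the context, which a tokenizer on its own doesn’t have.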

The Regina Barzilay and Mirella Lapata paper that I mentioned above:

Regina Barzilay and Mirella Lapata. 2005. Modeling Local Coherence: An Entity-based Approach. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, 141-148. Ann Arbor.

The declaration of competing interests: I don’t have any.  Sketch Engine doesn’t pay me–I pay them, and I get a hell of a lot of use out of it.  

French notes

— Je vous regardais tout à l’heure, vous étiez marants tous les deux le flicmane et vous.

— A tes yeux, dit la veuve Mouaque.

— A mes yeux?  Quoi, “à mes yeux”?

— Marants, dit la veuve Mouaque.  A d’autres yeux, pas marants.

— Les pas marants, dit Zazie, je les emmerde.

Raymond Queneau, Zazie dans le métro

How would you say for my money in French, or more generally, label something as someone’s opinion, yours or otherwise?  There are a lot of options, and unfortunately, I don’t know the status of any of them with respect to register of language, contexts in which they are or aren’t appropriate, etc.  Here’s what I’ve come across so far, and I should point out that I also don’t know which of these can only gracefully be used to introduce your own opinion, versus which could also be used to introduce someone else’s opinion.  I’ll also mention (and then I’ll shut up) that of all of these, I’ve heard the first one (à mon avis) the most, the second one exactly once (in Raymond Queneau’s Zazie dans le métro), and the rest never, as far as I know.  If any of you native speakers out there can offer suggestions about when and where to use which of these, it would be great.

  • à mon avis
  • à mes yeux
  • selon moi 
  • à mon gré
  • de mon point de vue
  • d’après moi
  • d’après mon point de vue
  • à mon sens
  • de l’aveu de qqn: I think that this one implies something negative, along the lines of “as Chomsky himself admits,” as opposed to relaying an opinion about which you’re not necessarily making any judgment one way or the other.

In which I repay a random act of kindness by being a jerk

“Wakarimasen” means “I don’t understand” or “I don’t know.”

I was walking down the street in Tokyo this morning when a fellow foreigner acknowledged my existence.

This is a far rarer occurrence than you might think in this country with a very low immigration rate, where running into another “Western” foreigner is pretty uncommon outside of tourist areas, and you might expect that it would lead to at least a smile, if not an actual conversation.  I’ve had many occasions when Japanese who spoke some English struck up random chats with me, but I’ve noticed that the few foreigners who you run into in Japan will, in general, resolutely avoid meeting your eyes.  (Note that I’m talking about foreigners who live here–not tourists.)  Why?  I can only guess.  OK, my guess: foreigners here in Japan struggle so very hard to integrate themselves into the culture that I suspect that they’re loath to, in some sense, admit that they are “others” by sharing in the otherness of some random visitor such as myself.

So, when a clearly foreign guy caught my eye and smiled at me this morning on my way back from a morning visit to the neighborhood shrine, I was so surprised that I don’t think I smiled back.  Then I felt like a total jerk.  Maybe being someone who lives here–you don’t come out of the very busy Ochanomizu station at that time of the morning unless you’re going to work, so I’m guessing that he does–he’s used to getting that reaction from other foreigners.  Still: I felt like even more of an asshole than I usually do.

French notes

le sanctuaire shinto: Shinto shrine

English notes

to meet someone’s eyes: to look directly into someone’s eyes, acknowledging the contact.

How it was used in the post: I’ve noticed that the few foreigners who you run into will, in general, resolutely avoid meeting your eyes.

to be loath to: to be deeply unwilling to do something.  (Definition adapted from Merriam-Webster.)

to loathe: to dislike to the point of disgust. 

Keeping track of the difference between these two is actually quite difficult even for native speakers.  You can read an article about the history of the problem here on the Merriam-Webster web site.  There are two parts to it.  One is keeping straight the fact that the verb ends with an e, and the adjective doesn’t.  The other is that the verb is pronounced with the th of this and the, while the th of the adjective can be pronounced with the th of this and the, or with the th of thin.    

How this showed up in the post: foreigners here in Japan struggle so very hard to integrate themselves into the culture that I suspect that they’re loath to, in some sense, admit that they are “others” by sharing in the otherness of some random visitor such as myself.

Why you can’t unsee things: compositionality and Haussmann’s apartment buildings

You can unscrew a lightbulb, you can unplug your monitor, and you can unbuckle your suspenders, so why can’t you unsee things?   It has to do with the prefix un- when it’s attached to verbs.  In order to be able to un- a verb:

  • The verb has to refer to changing the state of something.  So, you can undress yourself (changing your state from being dressed to not), you can unclog a pipe (changing its state from being clogged to not), and you can unlock a door (changing its state from being locked to not).
  • The state has to be reversible.  So, you can dress/undress yourself, you can clog/unclog a pipe, and you can lock/unlock a door.  But: you can bake a cake, but can’t unbake it; you can dry a shirt, but as far as I know, you can’t undry it; you can break an egg, but you can’t unbreak it.

So: you can see something, but you can’t unsee it, because when you see something, you’re not changing its state, and that’s the sine qua non of verbs that can take un-.

Ack–data!  I almost forgot that I’m an empiricist!  In fact, the verb to unsee occurs a lot.  It occurs with a frequency of 0.02 occurrences per million words in the enTenTen13 corpus (19.7 billion words of English, available on the Sketch Engine web site).  But, it’s cool: it doesn’t mean to undo the seeing of something.  When we talk about unseeing things, we’re usually talking about the very fact of not being able to unsee them, and what that actually means is this: we can’t forget them, and/or we can’t move beyond whatever we learned from what we saw.
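If you want to turn that relative frequency back into a raw count, the arithmetic is short.  This is a back-of-the-envelope sketch in Python: both numbers are the ones quoted above, the variable and function names are made up, and since 0.02 is itself a rounded figure, the result is only an estimate.

```python
CORPUS_SIZE_WORDS = 19.7e9   # size of enTenTen13 as quoted above
FREQ_PER_MILLION = 0.02      # frequency of "unsee" as quoted above (rounded)

def estimated_occurrences(freq_per_million, corpus_size_words):
    """Convert a per-million-words frequency into an approximate raw count."""
    return freq_per_million * corpus_size_words / 1_000_000

print(round(estimated_occurrences(FREQ_PER_MILLION, CORPUS_SIZE_WORDS)))
```

That works out to something on the order of 394 occurrences.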

In fact, the interwebs are full of talk about things that can’t be “unseen.”  Some examples:

Why does unsee work so well for this use, when it can’t have the meaning that you would think it would?  I suspect that it’s precisely because (a) it’s basically an impossible verb, and (b) it’s used only to describe an impossible action.  And, the fact that the meaning of unsee is not the meaning of see plus the meaning of un- is important here.  We’ve talked often about the basic principle of compositionality–the idea that meaning in language comes from something like “adding together” the meanings of different things.  Here is a case where the meaning is clearly not compositional–to unsee something, were it possible, would not be what it is if it were compositional.  (Were it possible explained below in the English notes.)  So: cool, if you think that it’s cool to violate the expectations of linguistics, computer science, and philosophy.  (I do think it’s cool, but maybe that’s why I’m single.)

What I can’t unsee: pierres d’attente.  I took a guided tour of Haussmannian Paris the other day.  What that means: the enormous redesign of Paris in the 3rd quarter of the 19th century, when huge swaths of the city were torn down and rebuilt into the stereotype that you’re thinking of when you visualize Paris today.  (See here for a post about the typical Haussmannian streets and how they relate to your ability to survive the zombie apocalypse in Paris, as well as here for a post about the typical Haussmannian apartment buildings and how they, too, relate to your ability to survive the zombie apocalypse in Paris.)

The new Haussmannian buildings went up in the order in which their lots were appropriated, the old buildings torn down, and the new buildings financed.  That meant that it was often the case that buildings were put up that one day would have neighbors, but didn’t yet.  In anticipation of the need to line up with adjacent buildings–lining up with things was very important in Haussmann’s Paris–the front-facing walls of the buildings had projections that were meant to facilitate alignment with future neighbors.  So, pierre d’attente: “waiting stone,” I guess.  (I think they can also be called pierres d’accord.)

Now, at some point, architects realized that if you have pierres d’attente sticking out of the side of your building, they catch rain, and then it can run into your walls, and that is most definitely not a good thing for your building.  So, people started cutting them off, which is why you will see things like this:

Apartment building with pierres d’attente removed. Picture source: me, on the rue La Fayette.

But: not everyone was happy about this.  Haussmannian apartment buildings are part of our patrimoine, and pierres d’attente are part of Haussmannian apartment buildings, so those pierres d’attente are part of our patrimoine, and no asshole should be cutting them off, right?  Point taken, and cutting off your pierres d’attente is apparently no longer allowed.  But, hey, this is France, and we’re logical–so, what you can do is, you can cut them so that there’s a pente, a slope, on the top edge.  (I just had to throw the French word in there, on account of the fact that when I memorized it, I thought that I would never, ever get to use it–and there, my friends, is a very concrete example of Zipf’s Law in action.)

The guided tour was great.  Seulement voilà (the thing is)…the tour guide explained pierres d’attente to us, and now I can’t stop seeing them.  It’s OK–frankly, the more there is to occupy my fevered little brain, the better…

English notes

Anglophone students of French whine about the French subjunctive, and frankly, I’m not sure that Francophone professors are thrilled about teaching it to us, but: the fact is, English has a subjunctive mood, too.  Or, more accurately: it can.  This varies quite a bit by dialect, but English can have a subjunctive, in at least the following circumstance: talking about things that are not real at the moment.  For example, here are some options, with and without the subjunctive:

  • If I were you, I wouldn’t tell him to fuck off–he’s a lot bigger than you are.
  • If I was you, I wouldn’t tell him to fuck off–he’s a lot bigger than you are.

You can recognize the subjunctive by the weird agreement of If I were you, rather than If I was you.  Both are correct, and most Americans would say If I was you, but If I were you is more natural in my dialect.  (I come from a relatively obscure area in the northwest of the country.)

  • Would you prefer that he give you a pat on the back, or a kick in the ass?
  • Would you prefer that he gives you a pat on the back, or a kick in the ass?

Again, you can recognize the subjunctive by the weird agreement of he give you versus he gives you.  

How the subjunctive was used in the post: Here is a case where the meaning is clearly not compositional–to unsee something, were it possible, would not be what it is if it were compositional.  I chose obscenity-laden examples to make clear that this isn’t a formality thing–the subjunctive is just more natural in my dialect.  Again, most American speakers of English would say the form of these two sentences without the subjunctive, but both are fine.  I have no idea how this works in the United Kingdom–can any of you Brits comment on this?


If Paris were full of the living dead

Who among us has not looked across the majestic sweep of the Place de la Concorde, up the stretch of the Champs Elysées, or through the luxurious Luxembourg Gardens and wondered: what will this place look like when it’s overrun by zombies?

I first published this on November 13, 2015, from Denver, Colorado. Not long afterwards, phone calls and texts started coming in fast and furious: relatives who were hearing about the Islamic State terrorist attacks that would kill 130 people and injure another 368 that evening.  The post didn’t seem so funny in that context, and I took it down after an evening of trying to reach family and friends in Paris.  14 months later, Paris has brushed off her shoulders and kept walking, as she always does, and I am ready to play my infinitesimally small part in that.

Who among us has not looked across the majestic sweep of the Place de la Concorde, up the stretch of the Champs Elysées, or through the luxurious Luxembourg Gardens and wondered: what will this place look like when it’s overrun by zombies?  Who among us has not looked down an unending line of the 7-story Haussmannian apartment blocks that make Paris look like Paris and thought: it would really suck to have to clear 7-story building after 7-story building–with optional basement–of zombies…

The English Wikipedia page on zombies is quite long, and discusses zombies from every angle that one could think of–folklore, the evolution of the zombie archetype, the zombie in modern fiction, the significance of the zombie apocalypse, and the zombie in popular culture–each with its sections and subsections.  In contrast, the French Wikipedia page on zombies is pretty much just this sentence:

Un zombie (ou zombi) est, dans le folklore, un mort-vivant ou un individu infecté d’un virus nuisible à certaines parties du cerveau.

Of course, even with just one sentence, Zipf’s Law brings us some new vocabulary items:

  • le mort-vivant: living dead.
  • nuisible: harmful, damaging, injurious; pest.

I have no idea what it means that there is a long English Wikipedia page on zombies and a very short French one.  Probably something profound about France and America, but I don’t know what.  I do know this: I hate zombies.

About 14 months later, the French Wikipedia page on zombies is considerably longer, and I’ve reached a new level in my thinking about the relationship between zombies and those Haussmannian apartment buildings: they will contain the zombies nicely, so they’re actually going to be a big help in recovering from the zombie apocalypse.  However, I’m leaving this post as it was on November 13th, 2015–a fond memory of a more insouciant time.

I see you’re wearing your fat clothes: suicide isn’t about what you think it is

Things are only interesting when they’re different from what you expect. Here’s an example. Two, actually.

One December night a couple years ago I came home from a pleasant evening spent wandering the Christmas market on the Champs Elysées with some friends to find a call from my mother.  My father’s heart had stopped in the emergency room–twice.  By an amazing stroke of luck, his cardiologist had been passing through at the time, and he had resuscitated him.  They pumped on his chest, they shocked him–lots of times.  Now he was on a ventilator (a machine that breathes for you) in the intensive care unit, and it wasn’t clear whether he would survive the night.

I got on the phone with the airline, threw some clothes in a suitcase, and the next morning I was on the first plane out of Paris.  After crossing the Atlantic, and then North America, and then switching planes for the last leg from San Francisco to our home town in Oregon, I finally landed in Portland.  I grabbed a rental car and sped to the hospital.

I got to my father’s room.  He had survived the night.  He was doing well, all things considered.  The breathing machine had been removed.  The number of IV tubes, monitors, and other beeping and buzzing things that he was attached to was not enormous, given the circumstances.  He was still hoarse from the tube that had been in his throat as he greeted me in his own special way:

I see you’re wearing your fat clothes.

(Your “fat clothes” are the clothes that a person whose weight tends to go up and down wears when their weight is up.)

This didn’t feel anywhere near as bad as it must sound.  In fact, my reaction was: Okey-dokey–looks like he’s doing fine!  And, I actually wasn’t anywhere close to as fat as I usually am, so it seemed like a win, as far as I was concerned.  It might not actually be the case that every cloud has a silver lining, but you can at least try to ignore the fucking cloud, right?

I’m guessing that you laughed at my story.  Possibly you’re crying, remembering your own parents criticizing your weight, or your choice of clothes, or your choice of boyfriend, career, or political party–if so, I apologize.  In either case, my story probably made an impression.  Why?  Because it is so entirely different from what one would expect.

It’s differences that make things interesting.  I say that not as a statement about the value of diversity (although diversity is valuable) or about the value of surrounding oneself with dissenting opinions (although dissenting opinions are valuable), but as an assertion about why we are interested in things, and in particular, about why we read what we read.  Presumably you don’t pick up the newspaper in the morning to see what was the same yesterday as the day before–you pick it up to see what was different yesterday from what usually happens.  Roger Schank (famous artificial intelligence guy once upon a time, not-quite-so-famous Trump University guy more recently) has a whole theory about this being the reason that time seems to go by faster as we get older–the more we’ve already experienced, the fewer new things there are to notice, and so we just don’t notice time going by the way that we did when we were younger.  The excellent book They say/I say is based entirely on the notion that academic writing—you could generalize it to what the authors call persuasive writing, in which you try to convince the reader of your particular take on something—is most convincingly done by starting out showing how the position that you’re going to take runs counter to positions that have been taken previously.  Americans think that the French are rude, but they’re actually hyper-polite–you just have to know the differences between American etiquette and French etiquette to recognize it.  Paris is always portrayed as the city where everyone strolls leisurely in a state of Zen-like relaxation, but Monday through Friday, we’re all just rushing to work as fast as we can and hoping that nobody gums up the works by throwing themselves on the train tracks.  “French people are so nice” is not an interesting topic.  “Americans think that French people are rude, but they’re actually very polite”–that’s a bit more interesting.  Why?  
Because of the contrast between what you thought was true, and what’s going to be asserted.

Something that everyone knows is true: suicide is the coward’s way out.  I went looking for pictures to illustrate this statement with on Google Images, and they are legion.  Among the memes, posters, and tweets that I found were sentiments communicated by the following:

  • Albert Camus, second-youngest winner of the Nobel Prize for Literature
  • Seneca (not sure whether the Elder or the Younger–ironic, if the latter)
  • Some sort of fuzzy baby bird
  • Some entertainer I’ve never heard of
  • A pretty girl
  • A handsome guy
  • A guru

However: despite the fact that “everyone knows” that suicide is for cowards, this turns out to be bullshit.  In fact, the distribution of suicide in society (in American society, at any rate) is not random: people with positions in life that we typically think of as requiring extra amounts of courage—police, military people and military veterans, prisoners, murderers–are more likely than the average person to kill themselves.  Lemme run a list of occupations with elevated suicide rates by you (lemme and to run something by someone explained below in the English notes):

  • Military veterans
  • Police
  • Prisoners (an atypical study in that it includes data from 12 countries)
  • Among prisoners, people convicted of murder or manslaughter have higher rates of suicide than others
  • Sex workers (not sure why this paper gets cited a lot when people write about suicide rates in sex workers, but it does–my literature search kept leading me back to it.  Google Scholar shows that it’s been cited 373 times.  For comparison: my most heavily-cited article has been cited 275 times.)
  • Physicians

Think about this: the instinct to preserve your life is pretty much the strongest instinct that any living thing has.  It takes a tremendous amount of courage to go against it.  Over and over, when you look at the statistics from the studies that I listed above, what you see is the following: people who are a lot tougher than you and me are at a higher risk for suicide than you are.  It’s not a game for cowards–it’s really and truly a game for people who can look death right in the eye, and step forward, against every living being’s strongest instinct.

Thomas Joiner’s book Why people die by suicide explores this issue and its implications in depth.  He makes the point that killing oneself requires overcoming what may be the strongest drive in human beings: self-preservation.  (You don’t buy it?  Pick up a razor and see if you can slit your wrists.  No, not your wrists–just a little cut someplace where it won’t do you any damage.  Did you do it?  I thought not.)  His theory is that self-harm is essentially a learned behavior—that you must, in essence, be trained, or train yourself, to have the capacity to kill yourself.  You must be fearless, and you must be able to tolerate pain.  See that photo of a Japanese army officer about to kill himself?  See that diagonal straight line to the left of the photograph?  It’s a rifle.  After the officer opens himself up with his sword, that guy shoots him in the head to put him out of his misery.  (Back in the day, after you cut yourself open, one of your samurai buddies took your head off with a sword.)  Yes, I spared you the next picture in the series.  If you want to see it, follow the link.

So, why do people kill themselves?  I don’t claim to know.  I’ll give you a quote from the Harvard University Press notes on Joiner’s book:

Among the many people who have considered, attempted, or died by suicide, he finds three factors that mark those most at risk of death: the feeling of being a burden on loved ones; the sense of isolation; and, chillingly, the learned ability to hurt oneself.

(You might notice that I’m not crediting the sources of any of the “suicide is for cowards” memes that I included earlier in this post–from my point of view, the authors are welcome to shove the lack of a citation up their butts.)

If you know someone who killed themselves, the two questions that you’ve been asking yourself ever since are probably:

  • Why did they do it?
  • Could I have prevented it?

With respect to the first question: probably nobody knows but the person themself.  With respect to the second question: probably not.  We can’t know what could have happened, right?  But, I can tell you this: psychiatrists, psychologists, and licensed clinical social workers spend years learning how to prevent suicide, but they definitely cannot always do it.  I don’t know how you could expect yourself to do any better than a trained psychiatrist.

You might be able to do something about somebody else’s suicide, though.  For psychiatric disorders, language plays a central role in diagnosis. Applying language technology in this domain could potentially have an enormous impact.

Seulement voilà–the thing is–if you want to use computers to do things with language, then you need language data with which to train and evaluate the computer.  Until recently, if you wanted to get your hands on actual data, here’s what was available: you could obtain a set of suicide notes collected and annotated by my colleague John Pestian at Cincinnati Children’s Hospital Medical Center (and me and a bunch of other people). That data has been revealing, and we’ve learnt things about suicide from that data that we didn’t know.  But, that data was hard to come by.  Putting that data set together took years (if you can read French, you can find a paper here on some of the issues), and if you want to get your hands on it, you need to go through some hoops to demonstrate that you have a legitimate research interest, that you will not be posting people’s suicide notes on Facebook or Pinterest, and so on.

Social media has completely changed the landscape of the availability of linguistic data, including linguistic data related to depression and suicide.  In fact, the past couple years have seen an explosion of work on the linguistic characteristics of mental states associated with mental illness, including suicidality.  But, you can’t just grab it–just because people post their lives on social media doesn’t mean that it’s OK for you to use that stuff for your own purposes.  Ethical questions abound, and that’s just as true for the tweets, posts, or whatever of the psychiatrically healthy controls as it is for those with mental illness, suicidal behavior, or whatever.  And that’s where you come in.  There is a group that collects social media data, particularly linguistic data, for use in doing research like the stuff that I’ve described here with the goal of suicide prevention.  They want your data if you have ever flirted with suicide, but they want your data if you haven’t, too–you always need something to compare to, and people like me need data from non-suicidal people to compare to the data from suicidal people.  That could be you!  Check it out:

Not that you care about my point of view, but: I support people’s right to kill themselves. As the famous suicidologist Ed Shneidman put it in an interview with my colleague John Pestian: you ask me how many suicides I want?  I want zero.  But, I support the right to do it.  

This is a pretty prevalent attitude amongst suicide researchers.  My goal here is to give the person a chance to be shown that they have some options that they might not know they have–but, in truth, my motivation is less to prevent your death than it is to spare the people that you would leave behind the pain of losing you.  Ultimately, you have the right to end your life, if you choose to do so.  But: you probably won’t do it unless you believe that the lives of your loved ones will be improved by your death.  They won’t be, and it’s actually for them that I do the work that I do in this area.

English notes (no French notes today)

  • lemme: an informal way of writing the informal pronunciation of “let me.”  Don’t use this in work- or school-related emails, but it’s totally fine in casual written communication.
  • to run something by someone: to get someone’s input or permission.  I’m going to run my abstracts by Pierre and see what comments he has.

In fact, there are a bazillion expressions with run and a preposition.  (You might remember that bazillion is a word that means a large, but unspecified, number.)  Off the top of my head:

  • to run through [a person]: to pierce completely, going in one side and out the other, as with a sword or spear.
  • to run through [information, instructions]: to discuss or present, typically all of it, but not necessarily in a lot of depth.  “Before we get on the boat, let’s run through what to do in case someone falls overboard.”
  • to run by [location]: to go someplace, but not stay there very long.  “I’m going to run by the 7-11 and pick up a lightbulb.”
  • to run [something] by [someone]: to get someone’s input, or permission, or opinion.  “I’m going to run my abstracts by Pierre and see what comments he has.”
  • to run over [someone/something]: to pass over with a car.  “Crap, I ran over a skunk, and now my car stinks to high heaven.”
  • to run over [information]: to “go over” information quickly.  “I’ll just run over my notes quickly, and then I’ll go to the presentation.”
  • to run up [a bill]: to accumulate charges.  “I ran up a phone bill like you wouldn’t believe in Guatemala–insane roaming charges…”
  • to run down: to locate by searching, with implication that the searching is long or laborious.  “I finally ran down the guy who could issue my carte de séjour.”  “The police finally ran him down.”

The most boring neighborhood in Paris

I live in the most boring neighborhood in Paris, but that doesn’t mean there’s nothing going on.

My little street, Christmas Eve or so.

The 15th arrondissement, where I live when I’m in France, is so boring that it typically doesn’t show up in guidebooks for tourists.  A friend, Paris born and raised, once said this to me about the 15th: the rest of us don’t even think about it.  

And yet: one of the things that makes Paris what it is to me is that anywhere you go, there’s a story. This morning on the way to the metro, I heard music and turned to see a taxi driver waiting at the stand–playing an electric guitar in the driver’s seat. Further down the block from my apartment is a little park. There used to be a château there, but after the revolution of 1789 it got turned into a gunpowder factory, and early one morning, it blew up. There were surprisingly few casualties–about a hundred–but they say that people found bits of clothes and body parts across the Seine in what is now the 16th.

Continue down the street and you get to the Dupleix metro station.  It’s on the number 6 line, which follows one of the old city walls, and right outside the exit of the metro station was, for a long time, the place where you got taken to face the firing squad.

Turn left and you’ll soon find the rue du Commerce on your right. The famous British author George Orwell washed dishes there before he became a famous British author–if you are a Parisophile and you haven’t read his book about that time of his life, Down and out in Paris and London, you really should.  And, although it would be tough to get further from an haute couture neighborhood than mine, this morning I was treated to the sight of a little old lady coming down the street in a full-length leopard skin coat. Matching high-heeled leopard skin boots.  Oh–and matching leopard skin shopping bag.

Indeed, there’s a story everywhere you go in this city, and sometimes that story is personal. The 16th arrondissement (where the body parts landed when the gunpowder factory blew up in the 15th) really is the most boring arrondissement in Paris, but I never mind going there, because it’s where my grandfather lived.

I edited just a bit what my Paris-born-and-raised friend said. What she really said was this: people who live in the 15th love it, but the rest of us don’t even think about it.   She’s definitely right about one thing–those of us who live here love it.

English notes (French notes follow)

To show up: to appear. Usually the subject is a human:

  • Party at my place Saturday night!  Show up at 8…means that you should arrive at my house at 8.
  • Fifty percent of life is just showing up…means something like a lot of what it takes in life is to just try.  (A variant of a line usually attributed to Woody Allen, I think.)

…but the subject doesn’t have to be human, by any means:

  • My dog ran off last night, but thank God, he showed up on the back porch this morning, smelling like a garbage dump and looking pretty pleased with himself.
  • I was freaked because I lost my wallet, but then it showed up on my desk.
  • How it was used in the post: The 15th arrondissement, where I live when I’m in France, is so boring that it typically doesn’t show up in guidebooks for tourists.  

to be born and raised somewhere: to be completely native to a place, because of having been born there and also having grown up there.  You can use it with a normal sentence structure:

    • (The Steelers are the football team of the city of Pittsburgh, in Pennsylvania. In reality, the Eagles are the best football team in Pennsylvania, of course.)

There’s a more elegant construction that I like, in which the location precedes born and raised:

    • (Cleveland is a city in the north of the state of Ohio.)

    • (The Bay Area is the area around San Francisco. “Sammich” is slang for “sandwich.”)

  • (“GF” is girlfriend.  Oakland is a city in California.)

  • How it was used in the post: I edited just a bit what my Paris-born-and-raised friend said. What she really said was this: people who live in the 15th love it, but the rest of us don’t even think about it.   She’s definitely right about one thing–those of us who live here love it.

French notes:

le parigot/la parigote: Parisian.  Pejorative.  I wear it with pride.

très 16e: “very 16th”–in English, we would probably say “bouge,” or “boozh,” or something.


Light verbs and suicide

If you go to PubMed/MEDLINE, the US National Library of Medicine’s giant repository of (and search engine for) biomedical publications, and look around for papers on language and suicide, you won’t find that much on what you’re probably expecting: research on the language of suicidal people.  What you will find is papers on how we talk about suicide.

The major issue has to do with the ways that we refer to the act of suicide.  In English, your basic options are:

  • to commit suicide
  • to kill oneself
  • to take one’s own life
  • to do oneself in
  • to die by one’s own hand

The problem is that first one: to commit suicide.  People who work in the field of suicide in any capacity–prevention, treatment, research, whatever–aren’t very fond of it.  The reason: it stigmatizes the act.  In English, things that you commit are bad.

Now, you’re thinking: commit isn’t always bad, right?  You can commit to doing something, commit to someone, commit something to memory.  No question!  But, we’re seeing two very different linguistic phenomena here.  The bad commit has a very specific kind of structure: it’s what we call a light verb.  

Light verbs are a special kind of verb.  They don’t have very much meaning by themselves.  Rather, they occur with some other word, and it’s that word that gives the expression its meaning.  For example: in English, the verb to take can be a light verb.  (It isn’t always a light verb–but, there are many times when it is.)  Here are some English-language expressions in which to take is a light verb:

  • to take a bath
  • to take a beating
  • to take a break

What does to take mean in to take a bath, to take a beating, and to take a break?  I suggest to you that it doesn’t mean very much at all.  Rather, it’s bath, beating, and break that contain the meanings of those expressions.  (I’ve put a technical definition at the end of the page, if you are into that kinda thing.)

Light verb constructions are not a rare phenomenon.  Here are a bunch more expressions in English that are what we call light verb constructions (that means the light verb plus whatever it is that it combines with–in English, typically a noun) in which the light verb is take:

  • to take a breather
  • to take a bus/taxi/shuttle/plane/train
  • to take a dump
  • to take a gander (at)
  • to take a minute
  • to take a pee
  • to take a piss
  • to take a shit
  • to take a vacation
  • to take a walk
  • to take pity (on)

…and, it’s not like take is the only light verb in English.  In fact, we have several.  Some examples:

  • to make a decision, to make an offer, to make haste, to make peepee
  • to give a shit, to give (someone) a hand, to give a damn, to give a fuck, to give birth (plus some obscene ones that I’m leaving out)
  • to get dressed, to get ready, to get angry/mad, to get nasty, to get drunk, to get high, to get sober
  • to do battle, to do business, to do your business (yes, those are different)
  • to have a ball, to have a blast, to have fun, to have a good time, to have a headache, to have mercy, to have sex
  • to take action, to take a seat, to take one’s time, to take note, to take notes (yes, those are different), to take a look (at)

With that data in hand, we can see the difference between the commit of to commit suicide and the non-bad senses of commit in to commit to memory, to commit to a person, to commit to a deadline…  none of those have that verb + noun structure.  They’re all commit + to something.

Some more (or less) useful light verb constructions in French:

  • faire la vaisselle: to do the dishes
  • faire la lessive: to do the laundry
  • faire [+université]: to go to a university (J’ai fait William and Mary, I went to William and Mary)
  • faire du [+musical instrument]: to play an instrument (as in to do so habitually)
  • faire du diabète: to have diabetes

One of the interesting things about light verb constructions is that they don’t show a basic characteristic that we expect to see in language: compositionality.  To paraphrase from a previous post:

Compositionality is the process of meaning being produced by something that you could think of as similar to addition (technically, it’s a more general “function,” but “addition” will work for our purposes–linguists, no hate mail, please).  Take a situation where my dog stole some butter.  The semantics are: there’s a dog, it’s my dog, there’s some butter, and the butter was taken, by the dog, without permission.  (You can’t believe how horrible the poo that I had to pick up over the course of the next 24 hours was.)  My dog’s name is Khani, so I might say something like this: Khani stole some butter.  The idea behind compositionality is that the meaning of Khani stole some butter is the adding together of the meanings of Khani, steal, butter, and the meaning of being in the subject position versus the object position of an active, transitive sentence.
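For the programmers in the audience, here is a minimal sketch of that idea in Python.  Everything in it is made up for illustration (the tiny lexicon, the Event record, the compose function); it is a toy, not a claim about how any real semantic theory or system does it.  The point is only that the meaning of the sentence is computed from the meanings of the words plus their positions, and from nothing else:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A toy meaning representation: who did what to what."""
    predicate: str
    agent: str
    theme: str


# Hypothetical toy lexicon: each word contributes one small piece of meaning.
LEXICON = {
    "Khani": "my_dog",
    "stole": "take_without_permission",
    "butter": "butter",
}


def compose(subject: str, verb: str, obj: str) -> Event:
    """Build the meaning of an active, transitive sentence from its parts.

    The subject position supplies the agent and the object position
    supplies the theme; nothing is looked up for the sentence as a whole.
    """
    return Event(
        predicate=LEXICON[verb],
        agent=LEXICON[subject],
        theme=LEXICON[obj],
    )


meaning = compose("Khani", "stole", "butter")
# Same words, different positions: a different meaning falls out.
reversed_meaning = compose("butter", "stole", "Khani")
```

Swapping the subject and the object changes the result, which is the "meaning of being in the subject position versus the object position" part of the story.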

So: we have this basic expectation that meaning in language will be compositional, and as linguists, as computer science people who work with human language, and as philosophers, we have a hell of a lot riding on that expectation.

In that context, the cool thing about light verb constructions is this: they’re not compositional.  There is pretty much no way to get any systematic interpretation of the combinations of light verbs and their nouns.  Pause and ponder:

  1. To make peepee and to take a piss: they mean the same thing (the difference is that one is child language and the other is too impolite to say in front of your grandmother).  Peepee and piss mean the same thing–again, one is child language, and the other one is too impolite to say in front of your grandmother.  Your assignment: tell me what make and take contribute to the meaning of those expressions–that is, explain to me what their contribution is to the composition of the verb and the noun.  My point: peepee and piss mean the same things, and to make peepee and to take a piss mean the same things–how do you explain that, if make and take each contribute something to the meaning of those expressions?
  2. Consider to take a bath and to take a bus.  One of those is what you might think of as an event–an act of bathing.  The other is a big, smelly thing that takes you to work.  (See how I slipped another take in there?  Different take–nobody said that linguistics was going to be easy.)  In take a bath and take a bus, your relationship with the two things is pretty different.  In the first case, you’re participating in an event, while in the second case, you’re making use of a mode of transportation.  Your assignment: tell me how that difference comes from the verb to take.

The answers:

  1. Trick question: as far as I know, make and take don’t contribute anything to the meanings of those expressions.  The meanings of the expressions are not compositional.
  2. Trick question: as far as I know, take doesn’t contribute anything to the meanings of those expressions.  The meanings of the expressions are not compositional.
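If you like to think in code, one way to picture the non-compositional story is as a lookup table: the verb + noun pair is stored whole, and nothing is computed from the verb.  The sketch below is a toy under that assumption; the table, its meaning labels, and the function name are all invented for the example.

```python
from typing import Optional

# Hypothetical toy lexicon: each (light verb, noun) pair is stored as a unit.
LVC_LEXICON = {
    ("make", "peepee"): "urinate",
    ("take", "piss"): "urinate",
    ("take", "bath"): "bathe",
    ("take", "bus"): "travel_by_bus",
}


def lvc_meaning(verb: str, noun: str) -> Optional[str]:
    """Look the whole construction up; return None if the pair is not stored."""
    return LVC_LEXICON.get((verb, noun))


same_meaning = lvc_meaning("make", "peepee") == lvc_meaning("take", "piss")
unlisted = lvc_meaning("make", "piss")  # not in the table, so no meaning comes back
```

Two different verbs land on the same meaning, one verb lands on two unrelated meanings, and a pair that isn't in the table gets nothing at all.  That is what "you just have to remember it" looks like as a data structure.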

So, back to to commit suicide.  As you might have noticed, the relationships between light verbs and their nouns are things that a child learning their native language just has to remember.  There’s nothing that the kid learns about their language that would let them infer or guess that they’re making peepee now, but they’ll be taking a piss when they grow up: they have to remember it when they’re exposed to it.  (I don’t mean to suggest here that children learn language by remembering stuff to which they’ve been exposed–we’ve talked about how very little of language-learning for children works that way.)

So, the verb + noun combinations in light verb constructions are pretty random.  The thing about commit is this: it’s a light verb, too, in constructions like to commit suicide.  Its noun is of a very specific kind, though: it’s something bad.  Compare that with the light verb to have.  You can have a heart attack, you can have a migraine, or you can have a good time, or have sex.  No particular semantic consistency there–could be bad (heart attack, migraine), or it could be good (a good time, sex).  Here’s a list of the words that are statistically most closely associated with the verb to commit in the enTenTen corpus (a collection of 19.7 billion words of written English, available on the Sketch Engine web site).  (Git is a computer science thing; see this post.)

Top objects and subjects of the verb “commit” in 19.7 billion words of English. From the enTenTen corpus on the Sketch Engine web site.

What kinds of things get committed?  Crime, sin, murder, fraud, atrocity.  Who commits things?  Offender, defendant, criminal.  Not good.
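If you want a feel for how you would even start to ask "what gets committed?" of a pile of text, here is a deliberately crude Python sketch.  The five sentences are made up for the example (they are not from enTenTen), the "object" is just the first word after a form of commit that isn’t an article, and the result is raw counts, not the association statistic that a real corpus tool would report.

```python
import re
from collections import Counter

# Made-up toy sentences, purely for illustration.
TOY_SENTENCES = [
    "He committed a crime last year.",
    "They commit fraud every day.",
    "She committed murder.",
    "Nobody commits a crime by accident.",
    "I committed the poem to memory.",
]

COMMIT_FORMS = {"commit", "commits", "committed", "committing"}
ARTICLES = {"a", "an", "the"}


def commit_objects(sentences):
    """Count the first non-article word following each form of 'commit'."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            if token in COMMIT_FORMS:
                for following in tokens[i + 1:]:
                    if following not in ARTICLES:
                        counts[following] += 1
                        break
    return counts


counts = commit_objects(TOY_SENTENCES)
```

Notice that the heuristic happily counts poem from "committed the poem to memory," which is the other commit.  Telling the two apart takes real parsing, and that is part of why people use proper corpus tools for this.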

So, what are we to make of to commit suicide?  Many people who work in the field (go do your own search on PubMed/MEDLINE if you’re interested, or just see here and here for examples) are of the opinion that use of the expression to commit suicide has the effect of stigmatizing the person who killed themself.  Is that a bad thing?  They think it is.  I think it is, too.  Now, does the person who killed themself care how you talk about them?  Certainly not–they’re dead.  But, that person’s mother, husband, son, daughter, cousin, aunt, uncle, best friend…  They do, and they’re not dead.  So, many people who work with suicide in some capacity would like to see that expression go away.  Here’s a very eloquent expression of the idea, from Doris Sommer-Rotenberg:

The expression “to commit suicide” is morally imprecise. Its connotation of illegality and dishonour intensifies the stigma attached to the one who has died as well as to those who have been traumatized by this loss. It does nothing to convey the fact that suicide is the tragic outcome of severe depressive illness and thus, like any other affliction of the body or mind, has in itself no moral weight.   —Doris Sommer-Rotenberg, Suicide and language

Who cares?  Sommer-Rotenberg again:

The rejection of the term “commit suicide” will help to replace silence and shame with discussion, interaction, insight and, ultimately, successful preventive research.

Now, I know what you’re thinking: you’re thinking that the language that we use doesn’t affect the way that we think.  You know what?  I agree with you.  However: even if how we talk about suicide doesn’t change the way that we think about it, I suggest to you–as a fellow student said to me in the basement of Oxley Hall one evening in our graduate student days: the fact that the language that we use doesn’t change our reality doesn’t change the fact that you can make someone feel bad with the language that you use, and you wouldn’t want to do that, right?  Well, of course not.  (I love my job, but my fellow ex-student’s job is definitely cooler–she’s the only speech therapist in New York City whose practice is exclusively concerned with transgendered people.)

There’s also controversy/discussion around the ways that we talk about what happens when people try to kill themselves, but don’t succeed–attempted suicide, unsuccessful attempt, failed attempt, failed suicide, and failed completion, versus completed suicide–the idea is that these expressions model suicide as a desirable act.  (If you fail to have a good time, fail to get into medical school, fail to convince someone to marry you, that’s a bad thing.) See here for a fuller discussion.

So, yeah: there’s probably more stuff in the National Library of Medicine’s repository on how we talk about suicide than there is on how people who are suicidal talk.  It would be great to change that, because if we knew more about how people who are suicidal use language, then we might be able to do a better job of preventing suicide.  Now, that means that we need language data from people who are or have been suicidal, right?  But, we also need data from people who aren’t suicidal–if you want to understand something, you usually need to compare it to something else, and in this case, that means comparing the language of suicidal people to the language of people who aren’t suicidal.

It happens that there’s an enormous amount of real, live language out there in the world on social media platforms.  It would be great for suicide researchers to have it, but there are ethical issues involved–just because someone puts their life out there on the web doesn’t give you the right to just grab it and do stuff with it.  However: you can donate your social media data to a group that collects language from all kinds of people for social media research.  You can sign up with them here.  As the character Père LeFève says in Anne Marsella’s wonderful short story The Mission San Martin:

Best wishes to all of you who are still alive.  And if you’re yet alive, please give.

–Anne Marsella, The lost and found and other stories

No English notes as such today.  Instead, here’s some extra stuff for those of you who like to dive deeply into the linguistics of things.

One way of defining light verb construction:

Definition of light verb construction (LVC) from Light verb constructions in Romance: A syntactic analysis, by Josep Alba-Salas.

On the mechanics of how the meaning gets out of the noun and into the verb, so to speak:

Contrary to “prototypical” verbal constructions where the verb is the syntactic and semantic head of the sentence and its syntactic dependents are also its semantic arguments, in LVCs, one of the syntactic dependents of the verb, generally its direct object, functions as the semantic head, projecting its own argument structure, while the verb, which is semantically “light”, bears only inflection and projects no argument structure.

− Given the fact that the verb has no semantic contribution or rather its semantic contribution is quite weak, it cannot be selected lexically, that is on the basis of its semantic contribution. The combination of a particular predicative noun (PN) with a particular light verb (LV) is thus a matter of idiosyncrasy: The noun and the verb form a collocation that must be stored in the lexicon.

Pollet SAMVELIAN, Laurence DANLOS, and Benoît SAGOT, On the predictability of light verbs

Why would you need to posit the existence of such a thing?  From Samvelian et al.:

  1. l’agression de Luc contre Marie (the attack of Luc against Mary)
  2. Luc a agressé Marie (Luc attacked Mary)
  3. Luc a commis une agression contre Marie (Luc committed an attack against Mary)
  4. l’agression que Luc a commise contre Marie (the attack Luc committed on Mary)

Note: you can also se commettre avec quelqu’un.