January 2016 – Zipf's Law

Never a lovely so real: talking about Nelson Algren’s Chicago in French

Picture source: “AWalkOnTheWildSide” by Source. Licensed under Fair use via Wikipedia – https://en.wikipedia.org/wiki/File:AWalkOnTheWildSide.jpg#/media/File:AWalkOnTheWildSide.jpg

“Like loving a woman with a broken nose, you may well find lovelier lovelies. But never a lovely so real.” –Nelson Algren, Chicago: City on the Make (1951)

I’m reading Bernard-Henri Lévy’s American Vertigo, his reprise of Tocqueville’s journey through the United States as described in the famous Democracy in America. In the chapter that I’m reading, he’s talking about meeting Chicago mayor Richard Daley, and the contrast between Chicago as Daley wanted it to be seen and the Chicago of Otto Preminger and James T. Farrell. He talks about the Chicago that was described in the novels of the great American novelist and Chicago-lover Nelson Algren. I was pleased to read this, as I had read a couple of Algren’s novels as a teenager. His shitty world of junkies, drunks, murderers, thieves, and other assorted low-lifes had a certain resonance for me as a problem child. I was struck by how much his photo on the back cover of one of his books looked like my father–short hair, glasses, borderline angry, smoking a cigarette, a book sticking out of the pocket of his Army field jacket (Algren was a stretcher-bearer during World War II)–and later, in my early college years, I flirted with the look myself, wanting to look like my father and hoping for a little bit of that Algren magic. (The book sticking out of my pocket was most likely Nausea, by Jean-Paul Sartre, for whom Algren’s long-time lover Simone de Beauvoir left him. The book sticking out of my father’s pocket… Hard to say. I’m going to guess the poetry of Fernando Pessoa, but it could have been anything. I’m typically a cheerful person, and left the borderline-angry part out, myself.)

Trying to find that old back-cover photograph of Algren, I read his Wikipedia page, and got the best treat of this whole trip down memory lane. It turns out that he is the source of one of my favorite quotes–see above about lovelies. Of course, there is no discovery without Zipf’s Law, and here are some of the words that I had to look up in Lévy’s description of Chicago:

…le Chicago des camés, des paumés, des putes, des freaks et des voyous peints par Nelson Algren.

…the Chicago of the junkies, of the lost souls, of the whores, of the freaks, and of the thugs painted by Nelson Algren.

–Bernard-Henri Lévy, American Vertigo

le camé: junkie.
le paumé: lost soul, drop-out.
le voyou: thug.

Definitions from WordReference.com.

Drunks on the subway: Paris edition

You wouldn’t be surprised to hear a raving drunk shush someone else on the Paris métro. Here’s why.

drunk dog — Picture source: https://www.pinterest.com/pin/430727151831775356/.

One evening in Paris I was riding the métro home, minding my own business, when a very, very drunk man got on. He was carrying an open bottle of some sort of hard liquor, and occasionally took a swig. He was so plastered that he could barely stay on his seat as the train swerved. He ranted incoherently–really incoherently. (After he left, I asked the guy next to me: Pardon me sir, was he speaking French? He gave me that look that people in Paris (and New York) give you when approached by a stranger before deciding that you’re OK, and then said: Of a sort.)

A young woman got on the train and took a seat. She had her phone to her ear, and was talking. The drunk, ranting guy leaned over, put his finger to his lips, and said: Shhhhhh.

What the hell, you’re wondering? In a Parisian context, this actually wasn’t surprising at all. The general French approach to politeness (as I understand it) is: don’t do anything that would inconvenience the other person. This gets actualized in many ways—one that’s a surprise for us Americans is that you don’t bring a bottle of wine to dinner at someone’s house, because it would be a pain in the ass for the hostess to open it.

A very noticeable way that this works out is that in general, the French tend to communicate more quietly than Americans do. If you hear people talking and laughing as they walk down the street, they’re probably not French. And if you see someone talking on the phone on the train, the chances are excellent that they’re not French. It’s just not done.

So, in a French context, it wasn’t that bizarre for a shitfaced lunatic to interrupt his raving to say shhhhh to someone talking on the phone on the métro—he might have been hammered, but she was being rude. In America, someone would have said some equivalent of “it’s a free country, she can talk on the phone if she wants to.” People did hush him up when he got too carried away, but no one criticized him for saying shhhhh to the girl on the phone–that’s just logical.

All of this came to mind this morning because I happened to be thinking about how much I don’t care for hanging around loud people–and then thought about how much louder I am than anyone I know in France. When I get animated in a conversation in a cafe or something, I’m constantly having to remind myself to quiet down. I can’t imagine how miserably rude the people who are too loud for me must come across in France.

Having given you three adjectives that all mean drunk in (American) English slang, here are some French words related to inebriation. If you have some to add to the comments, it would be much appreciated!

ivre: drunk. (Not slang.)
soûl or saoul: drunk. (Also not slang.)
beurré: drunk. (Slang. Literally, “buttered.”)
bourré: drunk. (Slang. Literally, “stuffed” or “filled.”)

I should point out that American college students notoriously have far more slang words for “drunk” than the rest of us; I imagine that that’s more or less universal. (A common Linguistics 101 exercise is to send students home to collect those words in their dorm.) In the Navy, we might have said “three sheets to the wind,” but that’s almost literary for any other English speaker, I suspect.

Why English speakers like to speak French

Despite all of the trash-talking about the French by Republicans around the time of Bush’s ill-fated invasion of Iraq, Americans love to speak French. Here’s why.

One of my former coworkers, a linguist, used to say that her major annoyance in life was “linguists who speak French when it’s not necessary.” Despite all of the trash-talking about the French by (American) Republicans around the time of George Bush’s ill-fated invasion of Iraq, Americans love to speak French, whenever and to the fullest extent that they can. Here’s why. (Thanks to my French tutor (who shall remain anonymous, like this blog) for editing the French. Any remaining obscenities are purely my responsibility.)

En 1066, les Norvégiens ont attaqué le nord de l’Angleterre, alors que les Normans assaillaient le sud. Harold II, roi d’Angleterre, a vaincu les Norvégiens, mais il est arrivé tard sur la côte sud du pays, où il s’est battu contre l’armée de Guillaume le Conquérant, et est mort avec une flèche dans l’œil, écartelé par les chevaliers de Guillaume, selon… Tout le monde sait que l’anglais et le français ont beaucoup de vocabulaire en commun, et ça remonte à cette époque. Entre les 11ème et 14ème siècles, l’Angleterre était gouvernée par une aristocratie française. Pendant cette période, beaucoup de mots français sont entrés dans la langue anglaise.

Les langages changent au fil du temps, et les sens des mots changent aussi. Il arrive souvent que des mots similaires qui sont partagés par deux (ou plusieurs) langues évoluent différemment dans chaque langue. Alors, aujourd’hui il y a beaucoup de mots dans l’anglais et le français qui sont “les mêmes,” mais ils ont des sens différents. On appelle ces mots de “faux amis,” ou en anglais, “false cognates.” Ce sont les fléaux des étudiants du français (et aussi de l’anglais, j’imagine). Quelle française n’a pas dit “I don’t have any money” avec vingt dollars dans la main? Quel américain n’a pas dit “préservatif” au lieu de “conservateur?” Cependant, les liens entre les mots français et les mots anglais ne sont pas aléatoires. Il arrive souvent que les mots français en anglais appartiennent à un registre plutôt élevé. On a ces mots anglais qui sont français de souche : abattoir, ablution, abrasion. On a aussi des mots équivalents germaniques de souche—slaughterhouse, washing, scrape—et/mais, les mots anglais qui sont de souche française ont une saveur plus raffinée, plus formelle, plus…poétique.

C’est ce lien entre l’origine des mots et le feeling des mots en anglais qui donne la saveur de la langue française chez les Américains. En anglais je parle de the shade d’une couleur—mot de souche germanique. Mais, en français—la nuance. La nuance, c’est un mot qui existe en anglais, mais avec un autre sens et d’un registre très élevé. Personne ne parle de nuances en anglais, sauf les écrivains, les diplomates, les philosophes. Oh, là là—je me sens plus chic chaque fois que je dis nuance en français. Ça arrive même entre les activités les plus quotidiennes. En anglais, j’achète du gas—un mot court, rude, avec des liens avec l’indigestion et les pets. En français: de l’essence. Qui parle d’essence en anglais? Les parfumeurs, les chefs de cuisine, les philosophes.

La langue française, alors, chez nous les Américains, c’est comme de la poésie. Parlons des accros. En parlant des accros, d’abord on a le mot héroïne. Le mot indiquant la drogue la plus impliquée dans la toxicomanie, et je l’écris avec un tréma. Quels mots ont un tréma en anglais? (Abélard et) Heloïse, c’est tout. Pure poésie. Envisageons les voleurs. Qu’est-ce qu’un voleur fait en français? Selon une analyse statistique, il présume, il dérobe, il pénètre. Qui presumes en anglais? I would never presume to be so forward—très élégant. Qui disrobes en anglais? Les patients chez le médecin, les mannequins chez l’atelier du peintre. (Morphologie un peu différente, sens très différent, sonorité très similaire.) Quoi penetrates en anglais? Light, sunlight, intelligence—pas quelque enculé de voleur. En effet, “la poésie, c’est la langue française.”

In 1066, the Norwegians attacked the north of England, while the Normans attacked the south. Harold II, the king of England, defeated the Norwegians, but arrived late to the southern coast of the country, where he fought the army of William the Conqueror and died with an arrow in his eye, or quartered by William’s knights, depending on which story you believe… Everyone knows that English and French share a lot of vocabulary, and this is due to that outcome. Between the 11th and 14th centuries, England was governed by a French aristocracy. During that period, many French words entered the English language.

With the passage of time, languages change, and the meanings of words change, as well. It often happens that similar words that are shared by two (or more) languages evolve differently in each language. So, today there are a lot of words in English and in French that are “the same,” but that have different meanings. We call these words faux amis (“false friends”), or in English, “false cognates.” They are the curse of students of French (and also of English, I imagine). What French person hasn’t said “I don’t have any money” with $20 in her hand? (In French, monnaie is change.) What American hasn’t said “condom” when he meant to say “preservative”? (In French, préservatif means condom. A preservative is conservateur.) However, the links between the French words and the English words aren’t random. It often happens that French words in English belong to a higher register. We have these French-origin words in English: abattoir, ablution, abrasion. We also have equivalents of Germanic origin: slaughterhouse, washing, scrape. The French-origin words have a flavor that is more sophisticated, more formal, more…poetic.

It’s this relationship between the origin of words and the feeling of words in English that gives the French language its flavor for Americans. In English, I talk about the shade of a color–a word of Germanic origin. But, in French: la nuance. Nuance is a word that exists in English, but that has another meaning and is of a higher register. Nobody talks about nuances in English except writers, diplomats, and philosophers. Oh-la-la–I feel more chic every time I say nuance in French. This phenomenon occurs even during the most banal activities. In English, I buy gas–a short word, with associations to indigestion and farts. In French, I buy l’essence. Who speaks of essence in English? Perfumers, chefs, philosophers.

So, for us Americans, the French language is like poetry. Let’s talk about junkies. When we talk about junkies, the first word that we come across is going to be héroïne. It’s a word that means the drug that is most involved in drug addiction–and you write it with an umlaut. What words have umlauts in English? Outside of the New Yorker magazine (a famously highbrow publication), it’s (Abélard and) Heloïse, and that’s it. Pure poetry. Consider thieves. What does a thief do in French? I did a statistical analysis of the language that is used to talk about thieves in French. In French, a thief présume, he dérobe, he pénètre. Who presumes in English? I would never presume to be so forward–very elegant. Who disrobes in English? Patients in a doctor’s office, or a model in a painter’s studio. (Different morphology, very different meaning–dérober is “to steal”–but very similar sound.) What penetrates in English? Light, sunlight, intelligence–not some fucking thief. Truly, “poetry is the French language.” (Yes, that’s a Camus reference.)

A historian, a philosopher, and some prison guards

Alexis de Tocqueville’s book Democracy in America was written in the 1830s, when our country was young and democratic government was only about 50 years old both in the United States and in France. It is one of the classic books on American society. Wikipedia says that it is still required reading for political science and social science majors in American universities, and you probably at least heard of it in college.

Although de Tocqueville wrote about pretty much every aspect of American society, his actual mission was to study our prison system. So, when the French philosopher Bernard-Henri Lévy came to the US to repeat de Tocqueville’s journey, he started at Riker’s Island, the infamous New York prison. I’m reading the book that he wrote about his visit to the US, and of course Zipf’s Law is a prominent part of the experience. Here are some of the words that I had to look up before BHL even got past the reception area:

Barbelés électrifiés. Hauts murs. Un check-point, comme à l’orée d’une zone de guerre, où se croisent les matons, presque tous noirs, qui viennent prendre leur service et, en sens inverse, entassés dans des bus grillagés qui ressemblent à des autocars scolaires, les prisonniers…

Electrified barbed wire. High walls. A check-point, like at the edge of a war zone, where the screws, almost all black, who are coming on duty cross paths with the prisoners, who are headed in the opposite direction, crammed into fenced-in buses that resemble school buses…

–Bernard-Henri Lévy, American Vertigo

l’orée (n.f.): concrete meanings: edge, fringe, periphery, outskirts. Abstract meanings: brink, cusp, threshold, verge.
à l’orée de: at the edge of, on the outskirts of.
le barbelé: barbed wire. Pronunciation from Collins: [baʀbəle].
le fil barbelé: barbed wire.
barbelé(e): barbed.
le/la maton(ne): “screw” (slang word for a prison guard).
en sens inverse: in the opposite direction. (Linguee.fr)
entasser dans: to cram into.
grillager: to put wire fencing around.

Un dortoir plus soigné, aux draps nets, où un écriteau indique, comme dans les bars de Manhattan, que la zone est « smoke free ».

A neater dormitory, with clean sheets, where a sign indicates, like in the bars of Manhattan, that the zone is “smoke-free.”

–Bernard-Henri Lévy, American Vertigo

le dortoir: dormitory.
les draps (n.m.pl.): sheets, bed linens.
net: clean. (Lots of other meanings, too.)
l’écriteau (n.m.): banner, sign.

We haven’t even gotten to the interesting words yet–handcuffs, razor blade, newbie, punch. I think you can see where this is going. In the meantime: if you haven’t clicked on the link to Riker’s Island in this post, you really should.

Note: definitions from WordReference.com.

Brigitte gets her hair cut, I say something stupid, and we explore causation in French

One day Brigitte walked into the office looking even more fetching than usual. T’as coupé les cheveux? I asked–did you cut your hair? Je me suis fait couper les cheveux, she corrected me–I had my hair cut. In English, you could say either (as well as some other stuff, like I got my hair cut or I got a haircut or (for a woman, but not a man) I had my hair done, although that’s a bit different, as it could involve things like curling without actually cutting), but in French, if it’s a “caused action,” you have to use the faire construction.

This can actually be a fairly complex construction, in French as well as in English. Laura Lawless’s page on About.com breaks it down into four possibilities:

The thing that is being acted on is being expressed, but not the thing that is doing the action.
The thing that is doing the action is being expressed, but not the thing that is being acted on.
The thing that is doing the action and the thing that is being acted on are both expressed.
The one exceptional expression faire voir, “to let someone see something” or to “show someone something.”

Let’s work through these. They all have one thing in common: the verb faire will be followed by an infinitive. So: Je me suis fait couper les cheveux. If you’re doing Laura Lawless’s first option–only mentioning the thing to which the action will be done–you have this formula: faire + infinitive + object. For example:

A l’international, Interflora vous permet de faire livrer des fleurs dans plus de 140 pays grâce à un réseau mondial qui regroupe 45 000 artisans fleuristes.

“Internationally, Interflora lets you have flowers delivered in more than 140 countries, thanks to a world-wide network that brings together 45,000 florist artists.”

–Source: http://www.interflora.fr/

Let’s suppose that you’re only going to mention the person (or whatever) you’re going to cause to do the action. It’s actually the same formula: faire + infinitive + actor. Google gave me this autocomplete:

Screenshot 2016-01-26 04.26.10 — “How to make a teenager study.” Picture source: Google autocomplete screen shot.

What if we want to express both the person (or thing) who we’re going to make do the action, and also the thing to which the action will be done? Now the formula gets interesting. (OK: I freely admit that my definition of “interesting” might be a bit different from yours.) Now we have faire + infinitive + object + à/par + actor. What that means: we’ll have faire + infinitive, as always. Then we have the person or thing that is being acted on. Then we have à or par, followed by the actor. Let’s see some examples. I’m going to borrow/steal them from Laura Lawless’s page, because searching for these things is beastly and I prefer not to make up examples myself:

Je fais laver la voiture par/à David.
I’m having David wash the car.

Il fait réparer la machine par/à sa sœur.
He’s having his sister fix the machine.
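To make the three formulas concrete, here is a rough sketch in Python that just glues the pieces together in the right order. The function name and its parameters are my own invention for illustration, and it is deliberately naive: it assumes you hand it an already-conjugated form of faire, and it knows nothing about pronouns, negation, or contractions like à + le = au.

```python
def faire_causatif(faire_form, infinitive, acted_on=None, actor=None, preposition="par"):
    """Assemble a simple faire causatif clause from its parts.

    Hypothetical helper: handles only full noun phrases, with no pronouns,
    no negation, and no contraction of 'à' with a following article.
    """
    if acted_on is None and actor is None:
        raise ValueError("Give at least the thing acted on or the actor.")
    if preposition not in ("par", "à"):
        raise ValueError("The actor is introduced by 'par' or 'à'.")

    # faire + infinitive always come first.
    parts = [faire_form, infinitive]
    if acted_on and actor:
        # Both expressed: object, then par/à, then the actor.
        parts += [acted_on, preposition, actor]
    else:
        # Only one expressed: it simply follows the infinitive.
        parts.append(acted_on or actor)
    return " ".join(parts) + "."


print(faire_causatif("Je fais", "laver", acted_on="la voiture", actor="David"))
print(faire_causatif("Il fait", "réparer", acted_on="la machine",
                     actor="sa sœur", preposition="à"))
```

The two calls rebuild the Laura Lawless examples above, one with par and one with à. Leave out either acted_on or actor and you get the shorter faire + infinitive + object/actor pattern.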

How about if we have pronouns? Negation? Reflexives? (Advice from a linguist: when you’re learning a new verbal construction, learn the negated, pronominal, and reflexive forms sooner rather than later.)

Here’s a good example of a negation. Moral of this story: the negative goes on the verb avoir.

En deux ans et demi mon travaille est irréprochable mais j’ai eu quelques rares absenses dut à une maladie, que je n’ai pas fait attester par un médicin.

For two and a half years my work has been flawless but I have had some occasional absences due to an illness that I didn’t have a doctor vouch for. Note: absenses should be absences, travaille should be travail, and I think dut should be dû.

Source: http://www.lesocial.fr/forums/19-2147-5-contrat-vacataire-en-mairie

Here’s a good example of a pronominal actor, from this web page on how to apologize via text message. Moral of the story: the pronoun goes on faire.

Je ne voulais pas te faire souffrir. S’il te plaît pardonne moi. Je ne sais pas comment te dire que je suis vraiment désolé.

I didn’t want to make you suffer. Please forgive me. I don’t know how to tell you that I’m really sorry.

Here’s a good example of a pronominal acted-on. Moral of the story: it’s a direct object pronoun–in this case, la.

Screenshot 2016-01-26 16.26.11 — “Can you send me your corrected composition? I’d like to have my students read it.” Picture source: Screen shot of an email from my French tutor.

Here’s an example of a situation where you’re going to have something done to or for yourself. Moral of the story: the reflexive particle (in this case, me) goes on faire.

Arrête-moi, je vais me faire tatouer…
Stop me, I’m going to have myself tattooed…

Source: http://bescherelletamere.fr/arrete-moi-je-vais-me-faire-tatouer/

The final faire causatif construction that Laura Lawless tells us about is faire voir: let (me) see. Faites voir! is the formal form, and fais voir! is the informal form. They both mean “let me see!”

Can these constructions be ambiguous? I’ll bet they can. Consider this example from ibo.org, which I found thanks to the marvelous linguee.fr web site:

Puis-je faire envoyer mon relevé de notes à mon adresse personnelle ?

Can I have my transcript sent to my personal address?

Could that possibly be interpreted as “Can I have my transcript sent by my personal address?” I’ll have to ask native speakers to jump in here, but it almost certainly can. A human wouldn’t make this mistake, but a computer has no way to avoid it without some knowledge of the kinds of things that can send things, and the kinds of things that can be sent. We talked about this kind of issue for computer interpretation of language here. (I saw a great example of this with Hollande, the current president of France, but can’t find it now.) There’s plenty more to know about the faire causatif construction–what if you’re having something done to or for yourself? What if there are two pronouns (“I’m making her do it”)? How about passives? If you want to know more about how all of these things work in the faire causatif, do check out Laura Lawless’s page on About.com–it’s really very clear.

Zipf’s Law, the Poisson Distribution, reflexive verbs, and terrorism in the age of social media

Screenshot 2016-01-25 23.09.23 — “Islamic State dramatizes the macabre will and testament of the terrorists of the Paris attacks.” This is the *mettre en scène* (non-reflexive) form of the expression. Picture source: screen shot of http://www.bfmtv.com/societe/dans-une-nouvelle-video-daesh-met-en-scene-les-auteurs-des-attentats-de-paris-946083.html.

By now, we know what goes hand-in-hand with Zipf’s Law: the Poisson Distribution. Zipf’s Law explains why we run into words that we don’t know in a foreign language every stinking day, and the Poisson Distribution shows how even rare events can come in clusters. Three rock stars die in one month, and the like. This morning I ran into two occurrences of an expression that I’d never seen before at all. There was an interesting twist, in that it’s actually two expressions, one with a regular verb, and one with the reflexive form of the same verb. Reflexive verbs in French can refer to performing an action on oneself–je me mouche “I blow my nose,” je mouche le bébé “I blow the baby’s nose” (no, I didn’t make that up–look here). In this case, the meaning of moucher is the same–it’s just a question of whose nose is getting blown. However, non-reflexive and reflexive verbs can also have different meanings, and that’s the case with the expression that had me going to the dictionary before I even had breakfast this morning.
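If you want to see the clustering claim in numbers, here is a small sketch of the Poisson arithmetic. The rate is purely an assumption that I made up for illustration (one encounter with a given rare expression every ten days, or 0.1 per day), not something I measured.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_at_least(k, lam):
    """Probability of k or more events: one minus the mass below k."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


# Assumed rate, for illustration only: 0.1 encounters per day.
rate_per_day = 0.1
p_cluster = prob_at_least(2, rate_per_day)

print(f"Chance of two or more sightings on a given day: {p_cluster:.4f}")
print(f"Expected number of such days per year: {365 * p_cluster:.1f}")
```

With that made-up rate, a "cluster day" has a probability of a bit under half a percent, which still works out to one or two such days a year. Rare, but not spooky.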

Mettre is one of those common and rather irregular verbs that shows up in a bazillion expressions. This one has two forms. The non-reflexive, mettre en scène, means to stage or to dramatize (definitions from WordReference.com). The reflexive form, se mettre en scène, is to put on a performance. I saw it on Twitter today: L’atroce vidéo de l’Etat Islamique montre que nous avons changé d’époque. L’ultraviolence la plus sordide se met en scène façon Hollywood. “The atrocious Islamic State video shows that we have changed eras. The most sordid ultraviolence puts on a show Hollywood-style.”

I wish that I had some clever way to wrap up this discussion, but learning yet more vocabulary by way of terrorism just depresses me. Yuck. If you’re interested in theorizing about terrorism and the media in general and social media in particular, try this Wikipedia page for starters. Sigh…

mettre en scène: to stage, to dramatize.
se mettre en scène: to put on a performance.
la mise en scène: staging, directing; dramatization; stageplay, stage direction.


Doing computational lexical semantics with your web browser: An approach to using data to build semantic representations

Here’s how you can do computational lexical semantics in the comfort of your own home–and how to talk about it in French.

A lot of my work involves something called lexical semantics. Lexical semantics is the study of how words mean things. (That means that there’s some interaction with the question of how sentences mean things, since part of the meaning of a sentence comes from the words that it contains, but in lexical semantics, the focus is on the words and how they contribute to and interact with the semantics and the syntax (the phrasal relationships) of a sentence.) In particular, I do something called computational lexical semantics. That means that I use large bodies of data as a crucial part of my work, and I evaluate my work in part by trying to use it as the basis of computer programs. If that doesn’t work, then I figure that what I’ve done needs to be improved.

My advisor is one of the world experts on computational lexical semantics. (I won’t name her, since I try to keep this blog anonymous.) As far as I know, she was the first person to demonstrate that large bodies of naturally-occurring data can, in fact, be used to test theories of lexical semantics. This was important because semantic theories often haven’t really been tested in any way that would count as a “test” in science, and as we’ve seen in other posts, linguistics is the scientific study of language. She often says that semantics is not a suitable subject of study for linguistics, since it’s so subjective. I’m never sure whether or not she’s kidding; regardless of whether she is or not, one of my professional ambitions is to take the subjectivity out of computational lexical semantics.

Part of my approach to that has been to try to develop a systematic methodology for developing semantic representations of words. In particular, I work with verbs, and with nouns that are derived from those verbs–for example, the verb phosphorylate (I specialize in biomedical language) and the related noun phosphorylation, or the verb receive and the related noun receptor. (You’ll notice that there are different relationships between the verb and the noun in the two examples–phosphorylation is a noun that refers to the action of the verb, while receptor is a noun that refers to the thing that does the receiving.)

One of the bedrocks of my approach is that I try to base my representations of the meanings of words on data that I didn’t come up with myself. (Note that I didn’t invent this idea, or any of the other aspects of the approach that I describe here—this is just my recipe for putting them all together.) I mostly work with scientific journal articles. There are two parts to what I do:

Coming up with the representation of the meaning of the verb (or noun).
Coming up with examples that let me test the representation, both by providing examples of the different effects that I think that the meaning of the verb has on how it behaves in sentences, and also doing a quick check to make sure that I don’t see any examples that argue against my representation of the meaning of the verb.

This is a pretty iterative, complementary process–I typically start out by looking at a bunch of examples of the verb to get a general sense of how it works, then write up a quick representation of the semantics, and then look for examples more systematically to see if my representation works. Some of the goals that I keep in mind when I’m searching for these examples are:

I want to know whether or not humans can be the agent of this verb.
I want to be sure to get the full range of prepositional complements, as these can mark a variety of semantic relations.
I want to get a variety of semantic classes as the subject and the object of the verb.
If there is a deverbal nominalization, I want to get that, too.

Knowing whether or not humans can be the agent of the verb is important to me for a number of reasons. People often question whether or not humans can perform the actions of particular verbs in the biomedical domain. For example, Wikipedia describes the action of the verb phosphorylate as “the addition of a phosphoryl group (PO₃²⁻) to a molecule.” That doesn’t sound like something that a human could do, right? But, you can find sentences like this:

In order to determine the number of phosphorylated sites in human cardiac MyBP-C samples, we phosphorylated the recombinant MyBP-C fragment, C0-C2 (1-453) with PKA using (gamma32)P-ATP up to 3.5 mol Pi/mol C0-C2.

–Source: http://www.ncbi.nlm.nih.gov/pubmed/18573260

I’m trying to represent the semantics of the language, not the semantics of phosphorylation, so I need to take into account all of the data about the language, and that includes this kind of counter-intuitive use of the verb. Why do we care about humans so much, though? It’s because humans are the prototypical example of things that act with volition, or as a result of their will–what we call agents in the English-language terminology of linguistics–and agents get represented in a special way in lexical semantics. So, we need to know if there can be a human agent for these biomedical verbs so that we can know if they can have agents at all, essentially.

Getting examples of the full range of prepositional complements (e.g. phosphorylate at, phosphorylate to) is important to me because different prepositions sometimes mark different aspects of the semantics of the verb. For example, when we investigate phosphorylate at, we see that the semantics of phosphorylation involve a specific location on a molecule, and when we investigate phosphorylate to, we see that the semantics of phosphorylation involve something becoming something else–the to marks not a location at which the molecule ends up, but what the molecule becomes–like I had it converted to a round-trip ticket, in “normal” English.

Getting examples of a variety of semantic classes as the subject and the object of the verb is important to me for two reasons. One reason is that I’m doing computational lexical semantics, specifically, which, as you might recall, means that I test my semantic representations by trying to use them as the basis of a computer program. I know that it can be important to know what kinds of things are taking part in the action of a verb in order to know how to interpret both the verb itself, and the sentence that it occurs in. Imagine these situations: the author finished the book, the student finished the book, and the goat finished the book. In the first, this means that the author completed the writing of the book. In the second, this means that the student completed the reading of the book. In the third, this means that the goat finished the eating of the book. Can there be other interpretations of these sentences? Of course–authors also read, students also write, and in a work of fiction, you could certainly imagine a goat reading a book. But, none of these are the intuitively obvious interpretations of those sentences, and the reason for that is the expectations that the different subjects–author, student, goat–lend to our interpretations of the sentences. The other reason that I want to get a decent range of the types of semantic classes that can be the subjects and the objects of a verb is that I work with ontologists quite a bit. I find that their models of the domain often don’t objectively seem to have taken full advantage of what the literature of the domain has to say about how those models would need to look if they’re going to be adequate, and collecting examples of lots of different semantic classes taking part in an action is my stab at being helpful.
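Here is a toy sketch of what it might look like for a program to use the subject’s semantic class to pick the default reading of finished the book. The table and the function name are made up for this illustration. A real system would draw on a much richer lexicon, and as noted above, these are only the default readings, not the only possible ones.

```python
# Toy lookup table, invented for illustration: the default event that each
# kind of subject is expected to "finish" when the object is a book.
DEFAULT_EVENT = {
    "author": "writing",
    "student": "reading",
    "goat": "eating",
}


def interpret_finished(subject, obj="the book"):
    """Spell out the default reading of '<subject> finished <obj>'."""
    event = DEFAULT_EVENT.get(subject)
    if event is None:
        # Without an expectation about the subject, no default reading is chosen.
        return f"the {subject} finished {obj} (no default reading known)"
    return f"the {subject} completed the {event} of {obj}"


for who in ("author", "student", "goat", "ontologist"):
    print(interpret_finished(who))
```

The interesting case is the last one: with no expectations attached to the subject, the sketch has nothing to go on, which is exactly why I want a wide range of semantic classes in my examples.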

So, how does one go about doing this with a minimum of subjectivity and a maximum of data-centeredness? I follow roughly the following steps, pretty much in this order, allowing for some going back and forth between them as I fine-tune things:

Look at what other people have done. I didn’t always do this, as I wanted to see how different what I came up with was from what other people had come up with, but by now I have a decent feel for what kinds of differences there are likely to be (they’re related both to the different content matter and to the different writing styles that my work and previous work are based on), and I usually start by looking at the representations in the Unified Verb Index. (Search for the verb of your choice.)
Look at some random examples of the verb in use to get a general sense for how well the representation in the Unified Verb Index matches up with biomedical data. I use the Sketch Engine interface to do my search for random examples, but you can use Google, specialized textual search tools, or whatever is easy for you.
Look for examples of human agents. I usually go to Google for this one, as the data that I have uploaded to Sketch Engine doesn’t have very many humans, in general. My two tricks:
1. I use Google’s site: operator to search just within the National Library of Medicine’s web site. That way I can be almost positive that I’ll get examples of how the word is used in the biomedical domain.
2. The first thing that I try is a Google exact phrase search with we plus the past tense of the verb. You mark a phrasal search by putting the exact phrase that you’re looking for in double quotes. So, my search for we phosphorylated looked like this: site:http://www.ncbi.nlm.nih.gov/pubmed/ "we phosphorylated"
Look for the full range of prepositional complements. I do this with Sketch Engine’s word sketch function.
Look for a variety of semantic classes as the subject and the object of the verb. Again, I use Sketch Engine’s word sketch function for this.
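The human-agent search in the steps above (the site: operator plus an exact phrase) can be sketched in a few lines. This is a minimal sketch, not a description of any tool actually used here: the function names `build_query` and `google_search_url` are made up for illustration, the past-tense form is supplied by hand rather than generated, and the `https://www.google.com/search?q=` URL pattern is an assumption about how one would open the query in a browser.

```python
from urllib.parse import quote_plus


def build_query(site: str, phrase: str) -> str:
    """Combine a site: restriction with an exact-phrase search in double quotes."""
    return f'site:{site} "{phrase}"'


def google_search_url(query: str) -> str:
    """Percent-encode the query so it can be pasted into a browser (assumed URL pattern)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# "we" plus the past tense of the verb, typed in by hand.
query = build_query("http://www.ncbi.nlm.nih.gov/pubmed/", "we phosphorylated")
print(query)
print(google_search_url(query))
```

Swapping in a different verb only means changing the phrase, e.g. `build_query(site, "we methylated")`.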

Then it’s time to see if the semantic representation actually covers everything that I’ve found using the strategy above. If it does, then we’ll do a larger-scale project of marking up all of the examples of the verb in some large body of data, followed by trying to write a computer program that can make use of the representations and the examples to learn how to identify the semantics of the verb when shown new examples.

Here is some of the vocabulary that you will need if you’re going to talk about this kind of stuff in French. Here is some data from the French Wikipedia page about semantics. This will give us some of the vocabulary of semantics in general–then we’ll move on to lexical semantics.

La sémantique est une branche de la linguistique qui étudie les signifiés, ce dont on parle, ce que l’on veut énoncer. Sa branche symétrique, la syntaxe, concerne pour sa part le signifiant, sa forme, sa langue, sa graphie, sa grammaire, etc ; c’est la forme de l’énoncé.

la sémantique: semantics
le signifié: the “signified,” the concept or mental representation that is the locus of meaning. (I should point out that it is unfortunately rare for English-speaking linguists to use this old Saussurean terminology, at least in my corner of linguistics.)
énoncer: to formulate, state, or pronounce (definition from Wikipedia.org).
la syntaxe: syntax.
le signifiant: the “signifier,” the spoken (or, in my field, written) form that corresponds to the signifié or “signified.” (See above about unfortunate tendencies to not use Saussurean terminology.)
la graphie: written form (definition from WordReference.com).
un énoncé: in linguistics, this usually corresponds to the technical term “utterance,” but since we’re talking specifically about syntax here, it may be better translated as “wording” (see WordReference.com).

Now let’s move on to some vocabulary that’s more specific to lexical semantics. We’ll take this material from the book Introduction au TALN et l’ingénierie linguistique, by Isabelle Tellier.

La sémantique lexicale est l’étude du sens des “mots” -ou plutôt des morphèmes- d’une langue. Cette définition est en réalité assez problématique, puisque la notion même de “sens” n’a rien d’évidente. Le problème tient précisément à ce que, pour définir le “sens” d’un mot, on recourt en général à d’autres mots. Pourtant, la consultation d’un dictionnaire d’une langue donnée est de bien peu d’utilité si on n’a pas déjà d’un minimum de connaissance de cette langue. Comment échapper à cette “circularité du sens” ? Nous envisageons dans ce chapitre (et le suivant) diverses tentatives qui peuvent être regroupées en trois familles…

la sémantique lexicale: lexical semantics.
le morphème: morpheme.
avoir rien d’évidente: I don’t know! Can someone help out with this?
tenir à qqch: to come from, stem from, arise from. (Note: tenir à has a bazillion other meanings–see WordReference.com for this definition and many others.)
recourir à: to resort to, appeal to. (Definition from WordReference.com.)

Want to learn more about the kind of approach to (computational) lexical semantics that I’m talking about here? Check out my advisor’s book on the subject–Martha Palmer, Daniel Gildea, and Nianwen “Bert” Xue’s Semantic role labeling. (I’m not telling you which of these people was my advisor–still anonymous!)

Waiting for the train confuses me way more than it ought to

Waiting for things in France involves confusing vocabulary.

Sign announcing incoming RER B trains. Amazingly, I found a picture of the sign for my train online, although you can tell that this is from a different stop by the fact that it has a time for a train headed to my station. Picture source: http://www.francetravelplanner.com/go/paris/trans/air/choose_train.html.

The last time I was in Paris, I tried to figure out the optimum time to leave my apartment in order to minimize my wait for the train to the little town where I work. This requires complicated (at least for a humanities major like me) record-keeping in which I track the time that I leave the house, the time that the metro comes to the metro station by my apartment, the time that I get to the actual train station, and the time that my actual train shows up.

All of this involves a certain amount of time staring at electronic signs telling me the wait time for the next train, and that’s often where Zipf’s Law enters my day. There are a few words with very similar appearances, but very different meanings, and I confuse them constantly. They have so many related nouns, so many reflexive forms, and so many related colloquialisms that I’m going to start with just the verbs. (Definitions from WordReference.com, with some editing.)

attendre: to wait, to wait for; to expect.
attenter à: to make an attempt on (to attack).
s’attendre à: to expect (definition from Phildange).
attenter à: to be a slur on something/one.
atteindre: to reach, to get to (a place); to achieve, to meet (a goal); to affect or harm (someone).
étendre: to stretch out; to spread out; to open out (definition from Phildange).
s’étendre: to lie down; to talk at length; to pervade, etc. (definition from Phildange).
éteindre: to extinguish, to put out (a fire, a cigarette); to turn off (a light, a machine).
s’éteindre: to die, to go out, to switch (oneself) off.
s’éteindre: to pass away.
s’entendre: to get along (with each other) (definition from Phildange).

Bottom line: leave the house at 07:55 and I get to work in an hour and a quarter, with very little of a wait at the train station–that’s important when it’s cold. Leave the house at 08:00 and it’s a totally different story–over a 20-minute wait at the train station, and who knows how long it takes to get to work.

The first rule of talking about how people talk in Cincinnati is, don’t talk about how people talk in Cincinnati

I get disoriented, Zipf’s Law shows up, and I have breakfast.

I walked out of my room today–I’m on the road, visiting a research center with which I have a long-standing collaboration–and ran into a local. We greeted each other politely, saying “good morning” and remarking on the exceptionally cold weather–see this post on the subject of saying hello to strangers in the US–and it was clear from his accent, as well as the accent of the other people that I ran into on my way out of the building, that I was in Kentucky. I walked outside, looked around, and it was immediately clear that I wasn’t in Kentucky at all, but rather Ohio. Southern Ohio, specifically–and therein lay my accent mix-up.

Southern Ohio has an identity issue, especially in the moderately large city in which I find myself: is it in the North, or is it a part of Appalachia? The local Kentucky-like dialect is very strongly socially marked, and people around here do not like having their dialect remarked on, especially if they speak that particular dialect. Ohio dialects are actually quite diverse–Columbus, in the middle of the state, has four dialect boundaries, roughly corresponding to the four parts of the city divided up at the intersection of High St. and Broad St.–and around here in Cincinnati, there is a long history of prejudice related to social class. Around here, that social class is reflected most strongly by which dialect you speak, or at least which dialect you speak in public.

The Zipf’s Law connection: I stopped by the cafeteria in the research center to pick up some breakfast, and was happy to see a big vat of Cream of Wheat, a childhood favorite that my father’s second wife often fed me. There was something wrong with it, though–what were all of those little yellow specks in it? A quick look at the menu confirmed my suspicion: it was not Cream of Wheat at all, but rather grits. Grits is a food of the southern United States, similar to a thin polenta. Staring at a vat of grits immediately raises a question: how do you say grits in French? It turns out that you don’t. It’s actually a complicated issue. The word can be singular or plural in English, and the French Wikipedia article on it starts out like this:

Le ou les Grits est une préparation culinaire…

“The (singular) or the (plural) Grits is a culinary preparation…”

That is, the word the shows up twice–once in a singular form, le, and once in the plural form, les. To find some actual French Zipf’s-Law-type words related to grits, let’s look at a couple of sentences from the French Wikipedia article, this time on the subject of the manufacture of grits:

Le grits trouve son origine dans la préparation du maïs par les amérindiens. Traditionnellement, la semoule du grits est réalisée par un moulin en pierre qui broie le maïs. On tamise ensuite et la poudre la plus fine est utilisée comme farine, alors que la plus grossière est destinée au grits.

“Grits originates in the preparation of corn by the Native Americans. Traditionally, the semolina of grits is produced by a stone mill that grinds/crushes the corn. It is then sifted and the finest powder is used as flour, while the coarsest powder is reserved for grits.”

Let’s just focus on the verbs. Definitions from WordReference.com:

réaliser: to make, produce, or create. (Several other meanings, too, but that’s the one here.)
broyer: transcription: [bʀwaje]. To grind or crush; figuratively, to destroy or wreck.
tamiser: to sift, to sieve.
destiner à: to reserve for.

Click here for a collection of materials from different Ohio dialects. And yes, the title of this post is a reference to the book/movie Fight Club.

Molière, Tartuffe, Dr. Seuss, and the Grinch Who Stole Christmas

An unexpected connection between Dr. Seuss and one of the greatest French dramatists of all time.

The Grinch, from Dr. Seuss’s “How The Grinch Stole Christmas.” Picture source: http://www.playbuzz.com/nedbullock10/how-much-of-a-grinch-are-you.

Molière was one of the great French dramatists. He lived just after Shakespeare, and you can compare them quite a bit in terms of their skill with language–reading his play Le Tartuffe in the original was almost adequate recompense for two years of studying French, mostly in the hours before sunrise.

Dr. Seuss is one of the most beloved American children’s authors. His classic Green Eggs and Ham was the first book my child ever read out loud, and the notorious American politician-nihilist Ted Cruz read it out loud on the floor of the Senate during his attempt to shut down the American government by filibuster.

One of Dr. Seuss’s most famous characters is the Grinch. In his book How The Grinch Stole Christmas, the Grinch is a nasty, bitter character who decides to ruin everyone else’s Christmas by stealing their Christmas presents.

Reading the commentaries on Le Tartuffe, I was thrilled to see the character of Mme. Pernelle, the curmudgeonly mother of one of the main characters, described as grincheuse. WordReference.com defines grincheux/grincheuse (masculine and feminine forms) as “grumpy, grouchy, or cranky.” Could this be the origin of the name “Grinch”? Wikipedia says yes! Who would’ve guessed?
