Zipf’s Law, the Poisson Distribution, reflexive verbs, and terrorism in the age of social media

Screenshot 2016-01-25 23.09.23
“Islamic State dramatizes the macabre will and testament of the terrorists of the Paris attacks.” This is the mettre en scène (non-reflexive) form of the expression.  Picture source: screen shot of http://www.bfmtv.com/societe/dans-une-nouvelle-video-daesh-met-en-scene-les-auteurs-des-attentats-de-paris-946083.html.

By now, we know what goes hand-in-hand with Zipf’s Law: the Poisson Distribution.  Zipf’s Law explains why we run into words that we don’t know in a foreign language every stinking day, and the Poisson Distribution shows how even rare events can come in clusters.  Three rock stars die in one month, and the like.  This morning I ran into two occurrences of an expression that I’d never seen before at all.  There was an interesting twist, in that it’s actually two expressions, one with a regular verb, and one with the reflexive form of the same verb.  Reflexive verbs in French can refer to performing an action on oneself–je me mouche “I blow my nose,” je mouche le bébé “I blow the baby’s nose” (no, I didn’t make that up–look here).  In this case, the meaning of moucher is the same–it’s just a question of whose nose is getting blown.  However, non-reflexive and reflexive verbs can also have different meanings, and that’s the case with the expression that had me going to the dictionary before I even had breakfast this morning.
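If you want to see the "rare events come in clusters" claim in numbers, here is a minimal sketch. The rate is an assumption that I made up for illustration (one such event every two months, on average); it is not a real statistic about rock stars or anything else. The two helper functions are hypothetical names, built only on Python's standard math module.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def prob_at_least(k: int, lam: float) -> float:
    """Probability of k or more events: 1 minus P(0), P(1), ..., P(k-1)."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

# ASSUMPTION: an average of 0.5 events per month (invented for illustration).
rate_per_month = 0.5
p_cluster = prob_at_least(3, rate_per_month)
print(f"P(3 or more in a given month) = {p_cluster:.4f}")  # about 0.0144
```

With that assumed rate, any given month has roughly a 1.4% chance of containing three or more events. That's small, but over ten years of months you would expect it to happen once or twice, which is why the clusters feel spooky without being surprising.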

Mettre is one of those common and rather irregular verbs that shows up in a bazillion expressions.  This one has two forms.  The non-reflexive, mettre en scène, means to stage or to dramatize (definitions from WordReference.com).  The reflexive form, se mettre en scène, is to put on a performance.  I saw it on Twitter today: L’atroce vidéo de l’Etat Islamique montre que nous avons changé d’époque.  L’ultraviolence la plus sordide se met en scène façon Hollywood.  “The atrocious Islamic State video shows that we have changed eras.  The most sordid ultraviolence puts on a show Hollywood-style.”

I wish that I had some clever way to wrap up this discussion, but learning yet more vocabulary by way of terrorism just depresses me.  Yuck.  If you’re interested in theorizing about terrorism and the media in general and social media in particular, try this Wikipedia page for starters.  Sigh…

  • mettre en scène: to stage, to dramatize.
  • se mettre en scène: to put on a performance.
  • la mise en scène: staging, directing; dramatization; stageplay, stage direction.


Doing computational lexical semantics with your web browser: An approach to using data to build semantic representations

Here’s how you can do computational lexical semantics in the comfort of your own home–and how to talk about it in French.

A lot of my work involves something called lexical semantics.  Lexical semantics is the study of how words mean things.  (That means that there’s some interaction with the question of how sentences mean things, since part of the meaning of a sentence comes from the words that it contains, but in lexical semantics, the focus is on the words and how they contribute to and interact with the semantics and the syntax (the phrasal relationships) of a sentence.)  In particular, I do something called computational lexical semantics.  That means that I use large bodies of data as a crucial part of my work, and I evaluate my work in part by trying to use it as the basis of computer programs.  If that doesn’t work, then I figure that what I’ve done needs to be improved.

My advisor is one of the world experts on computational lexical semantics.  (I won’t name her, since I try to keep this blog anonymous.)  As far as I know, she was the first person to demonstrate that large bodies of naturally-occurring data can, in fact, be used to test theories of lexical semantics.  This was important because semantic theories often haven’t really been tested in any way that would count as a “test” in science, and as we’ve seen in other posts, linguistics is the scientific study of language.  She often says that semantics is not a suitable subject of study for linguistics, since it’s so subjective.  I’m never sure whether or not she’s kidding; regardless of whether she is or not, one of my professional ambitions is to take the subjectivity out of computational lexical semantics.

Part of my approach to that has been to try to develop a systematic methodology for developing semantic representations of words.  In particular, I work with verbs, and with nouns that are derived from those verbs–for example, the verb phosphorylate (I specialize in biomedical language) and the related noun phosphorylation, or the verb receive and the related noun receptor.  (You’ll notice that there are different relationships between the verb and the noun in the two examples–phosphorylation is a noun that refers to the action of the verb, while receptor is a noun that refers to the thing that does the receiving.)

One of the bedrocks of my approach is that I try to base my representations of the meanings of words on data that I didn’t come up with myself.  (Note that I didn’t invent this idea, or any of the other aspects of the approach that I describe here—this is just my recipe for putting them all together.)  I mostly work with scientific journal articles.  There are two parts to what I do:

  1. Coming up with the representation of the meaning of the verb (or noun).
  2. Coming up with examples that let me test the representation, both by providing examples of the different effects that I think that the meaning of the verb has on how it behaves in sentences, and also doing a quick check to make sure that I don’t see any examples that argue against my representation of the meaning of the verb.

This is a pretty iterative, complementary process–I typically start out by looking at a bunch of examples of the verb to get a general sense of how it works, then write up a quick representation of the semantics, and then look for examples more systematically to see if my representation works.  Some of the goals that I keep in mind when I’m searching for these examples are:

  1. I want to know whether or not humans can be the agent of this verb.
  2. I want to be sure to get the full range of prepositional complements, as these can mark a variety of semantic relations.
  3. I want to get a variety of semantic classes as the subject and the object of the verb.
  4. If there is a deverbal nominalization, I want to get that, too.

Knowing whether or not humans can be the agent of the verb is important to me for a number of reasons.  People often question whether or not humans can perform the actions of particular verbs in the biomedical domain.  For example, Wikipedia describes the action of the verb phosphorylate as “the addition of a phosphoryl group (PO₃²⁻) to a molecule.”  That doesn’t sound like something that a human could do, right?  But, you can find sentences like this:

In order to determine the number of phosphorylated sites in human cardiac MyBP-C samples, we phosphorylated the recombinant MyBP-C fragment, C0-C2 (1-453) with PKA using (gamma32)P-ATP up to 3.5 mol Pi/mol C0-C2.

–Source: http://www.ncbi.nlm.nih.gov/pubmed/18573260

I’m trying to represent the semantics of the language, not the semantics of phosphorylation, so I need to take into account all of the data about the language, and that includes this kind of counter-intuitive use of the verb.  Why do we care about humans so much, though?  It’s because humans are the prototypical example of things that act with volition, or as a result of their will–what we call agents in the English-language terminology of linguistics–and agents get represented in a special way in lexical semantics.  So, we need to know if there can be a human agent for these biomedical verbs so that we can know if they can have agents at all, essentially.

Getting examples of the full range of prepositional complements (e.g. phosphorylate at, phosphorylate to) is important to me because different prepositions sometimes mark different aspects of the semantics of the verb.  For example, when we investigate phosphorylate at, we see that the semantics of phosphorylation involve a specific location on a molecule, and when we investigate phosphorylate to, we see that the semantics of phosphorylation involve something becoming something else–the to marks not a location at which the molecule ends up, but what the molecule becomes–like I had it converted to a round-trip ticket, in “normal” English.

Getting examples of a variety of semantic classes as the subject and the object of the verb is important to me for two reasons.  One reason is that I’m doing computational lexical semantics, specifically, which, as you might recall, means that I test my semantic representations by trying to use them as the basis of a computer program.  I know that it can be important to know what kinds of things are taking part in the action of a verb in order to know how to interpret both the verb itself, and the sentence that it occurs in.  Imagine these situations: the author finished the book, the student finished the book, and the goat finished the book.  In the first, this means that the author completed the writing of the book.  In the second, this means that the student completed the reading of the book.  In the third, this means that the goat finished the eating of the book.  Can there be other interpretations of these sentences?  Of course–authors also read, students also write, and in a work of fiction, you could certainly imagine a goat reading a book.  But, none of these are the intuitively obvious interpretations of those sentences, and the reason for that is the expectations that the different subjects–author, student, goat–lend to our interpretations of the sentences.  The other reason that I want to get a decent range of the types of semantic classes that can be the subjects and the objects of a verb is that I work with ontologists quite a bit.  I find that their models of the domain often don’t objectively seem to have taken full advantage of what the literature of the domain has to say about how those models would need to look if they’re going to be adequate, and collecting examples of lots of different semantic classes taking part in an action is my stab at being helpful.
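To make the finished the book example concrete, here is a toy sketch of how the semantic class of the subject could drive the default interpretation in a program. It is only an illustration under my own assumptions: the lookup table and the function name are hypothetical, the three pairings are just the ones from the examples above, and a real system would need far more than a dictionary.

```python
# Toy lexicon: the default activity that each kind of subject performs on a book.
# ASSUMPTION: these three pairings are only the ones from the examples in the text.
DEFAULT_EVENT = {
    "author": "writing",
    "student": "reading",
    "goat": "eating",
}

def interpret_finished(subject: str, obj: str = "book") -> str:
    """Spell out the implicit event in '<subject> finished the <obj>'."""
    event = DEFAULT_EVENT.get(subject)
    if event is None:
        # No expectation recorded for this subject, so leave the event open.
        return f"the {subject} finished (doing something to) the {obj}"
    return f"the {subject} finished {event} the {obj}"

for who in ("author", "student", "goat"):
    print(interpret_finished(who))
```

The fallback branch is the interesting part: when the program has no expectation about the subject, it can't pick a reading, which is exactly why I want examples covering lots of different semantic classes.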

So, how does one go about doing this with a minimum of subjectivity and a maximum of data-centeredness?  I follow roughly the following steps, pretty much in this order, allowing for some going back and forth between them as I fine-tune things:

  1. Look at what other people have done.  I didn’t always do this, as I wanted to see how different what I came up with was from what other people had come up with, but by now I have a decent feel for what kinds of differences there are likely to be (they’re related both to the different content matter and to the different writing styles that my work and previous work are based on), and I usually start by looking at the representations in the Unified Verb Index.  (Search for the verb of your choice.)
  2. Look at some random examples of the verb in use to get a general sense for how well the representation in the Unified Verb Index matches up with biomedical data.  I use the Sketch Engine interface to do my search for random examples, but you can use Google, specialized textual search tools, or whatever is easy for you.
  3. Look for examples of human agents.  I usually go to Google for this one, as the data that I have uploaded to Sketch Engine doesn’t have very many humans, in general.  My two tricks:
    1. I use Google’s site: operator to search just within the National Library of Medicine’s web site.  That way I can be almost positive that I’ll get examples of how the word is used in the biomedical domain.
    2. The first thing that I try is a Google exact phrase search with we plus the past tense of the verb.  You mark a phrasal search by putting the exact phrase that you’re looking for in double quotes.  So, my search for we phosphorylated looked like this: site:http://www.ncbi.nlm.nih.gov/pubmed/ "we phosphorylated"
  4. Look for the full range of prepositional complements.  I do this with Sketch Engine’s word sketch function.
  5. Look for a variety of semantic classes as the subject and the object of the verb.  Again, I use Sketch Engine’s word sketch function for this.
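The human-agent search in step 3 is easy to script if you want to try it on a whole list of verbs. Here is a minimal sketch: the function names are hypothetical, you have to supply the past tense yourself (I'm not attempting English morphology here), and the search URL shape is an assumption about how a browser passes a query to Google, not an official API.

```python
from urllib.parse import urlencode

# The site restriction is copied from the search shown in step 3.
PUBMED_SITE = "http://www.ncbi.nlm.nih.gov/pubmed/"

def human_agent_query(past_tense_verb: str, site: str = PUBMED_SITE) -> str:
    """Build the search string: a site: restriction plus an exact phrase in double quotes."""
    return f'site:{site} "we {past_tense_verb}"'

def search_url(query: str) -> str:
    """URL-encode the query so it can be pasted into a browser (ASSUMED URL shape)."""
    return "https://www.google.com/search?" + urlencode({"q": query})

query = human_agent_query("phosphorylated")
print(query)
print(search_url(query))
```

The first line printed is the same search string as in step 3; the second is something you can paste into your browser's address bar.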

Then it’s time to see if the semantic representation actually covers everything that I’ve found using the strategy above.  If it does, then we’ll do a larger-scale project of marking up all of the examples of the verb in some large body of data, followed by trying to write a computer program that can make use of the representations and the examples to learn how to identify the semantics of the verb when shown new examples.

Here is some of the vocabulary that you will need if you’re going to talk about this kind of stuff in French.  Here is some data from the French Wikipedia page about semantics.  This will give us some of the vocabulary of semantics in general–then we’ll move on to lexical semantics.

La sémantique est une branche de la linguistique qui étudie les signifiés, ce dont on parle, ce que l’on veut énoncer. Sa branche symétrique, la syntaxe, concerne pour sa part le signifiant, sa forme, sa langue, sa graphie, sa grammaire, etc ; c’est la forme de l’énoncé.

  • la sémantique: semantics
  • le signifié: the “signified,” the concept or mental representation that is the locus of meaning.  (I should point out that it is unfortunately rare for English-speaking linguists to use this old Saussurean terminology, at least in my corner of linguistics.)
  • énoncer: to formulate, state, or pronounce (definition from Wikipedia.org).
  • la syntaxe: syntax.
  • le signifiant: the “signifier,” the spoken (or, in my field, written) form that corresponds to the signifié or “signified.”  (See above about unfortunate tendencies to not use Saussurean terminology.)
  • la graphie: written form (definition from WordReference.com).
  • un énoncé: in linguistics, this usually corresponds to the technical term “utterance,” but since we’re talking specifically about syntax here, it may be better translated as “wording” (see WordReference.com).

Now let’s move on to some vocabulary that’s more specific to lexical semantics.  We’ll take this material from the book Introduction au TALN et l’ingénierie linguistique, by Isabelle Tellier.

La sémantique lexicale est l’étude du sens des “mots” -ou plutôt des morphèmes- d’une langue. Cette définition est en réalité assez problématique, puisque la notion même de “sens” n’a rien d’évidente. Le problème tient précisément à ce que, pour définir le “sens” d’un mot, on recourt en général à d’autres mots. Pourtant, la consultation d’un dictionnaire d’une langue donnée est de bien peu d’utilité si on n’a pas déjà d’un minimum de connaissance de cette langue. Comment échapper à cette “circularité du sens” ? Nous envisageons dans ce chapitre (et le suivant) diverses tentatives qui peuvent être regroupées en trois familles…

  • la sémantique lexicale: lexical semantics.
  • le morphème: morpheme.
  • n’avoir rien d’évident: to be anything but obvious, to be far from obvious.  (The adjective agrees with the subject, which here is la notion, hence évidente.)
  • tenir à qqch:  to come from, stem from, arise from.  (Note: tenir à has a bazillion other meanings–see WordReference.com for this definition and many others.)
  • recourir à: to resort to, appeal to.  (Definition from WordReference.com.)

Want to learn more about the kind of approach to (computational) lexical semantics that I’m talking about here?  Check out my advisor’s book on the subject–Martha Palmer, Daniel Gildea, and Nianwen “Bert” Xue’s Semantic role labeling.  (I’m not telling you which of these people was my advisor–still anonymous!)

Waiting for the train confuses me way more than it ought to

Waiting for things in France involves confusing vocabulary.

train schedule sign
Sign announcing incoming RER B trains. Amazingly, I found a picture of the sign for my train online, although you can tell that this is from a different stop by the fact that it has a time for a train headed to my station. Picture source: http://www.francetravelplanner.com/go/paris/trans/air/choose_train.html.

The last time I was in Paris, I tried to figure out the optimum time to leave my apartment in order to minimize my wait for the train to the little town where I work.  This requires complicated (at least for a humanities major like me) record-keeping in which I track the time that I leave the house, the time that the metro comes to the metro station by my apartment, the time that I get to the actual train station, and the time that my actual train shows up.
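For fellow humanities majors, here is a minimal sketch of what that record-keeping could look like as a program. The four timestamps per row match the four things that I track; the rows themselves are invented placeholders, not my real log, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def minutes_between(start: str, end: str) -> float:
    """Minutes from one HH:MM time to a later HH:MM time on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60

# (left the house, metro came, reached the train station, train showed up)
# ASSUMPTION: invented placeholder rows, for illustration only.
LOG = [
    ("07:55", "08:01", "08:20", "08:23"),
    ("07:55", "08:02", "08:21", "08:23"),
    ("08:00", "08:07", "08:26", "08:48"),
]

def mean_station_wait(log):
    """Average wait at the train station, grouped by the time I left the house."""
    waits = defaultdict(list)
    for left_house, _metro, at_station, train in log:
        waits[left_house].append(minutes_between(at_station, train))
    return {left: mean(values) for left, values in waits.items()}

print(mean_station_wait(LOG))
```

Grouping by departure time is the whole trick: once there are enough rows, the departure time with the smallest average wait is the one to aim for.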

All of this involves a certain amount of time staring at electronic signs telling me the wait time for the next train, and that’s often where Zipf’s Law enters my day.  There are a few words with very similar appearances, but very different meanings, and I confuse them constantly.  They have so many related nouns, so many reflexive forms, and so many related colloquialisms that I’m going to start with just the verbs.  (Definitions from WordReference.com, with some editing.)

  • attendre: to wait, to wait for; to expect.
  • s’attendre à: to expect (definition from Phildange).
  • attenter à: to make an attempt on (to attack); to be a slur on (something or someone).
  • atteindre: to reach, to get to (a place); to achieve, to meet (a goal); to affect or harm (someone).
  • étendre: to stretch out; to spread out; to open out (definition from Phildange).
  • s’étendre: to lie down; to talk at length; to pervade, etc. (definition from Phildange).
  • éteindre: to extinguish, to put out (a fire, a cigarette); to turn off (a light, a machine).
  • s’éteindre: to go out, to switch (itself) off; to die, to pass away.
  • s’entendre: to get along (with each other) (definition from Phildange).

Bottom line: leave the house at 07:55 and I get to work in an hour and a quarter, with very little of a wait at the train station–that’s important when it’s cold.  Leave the house at 08:00 and it’s a totally different story–over a 20-minute wait at the train station, and who knows how long it takes to get to work.

 

The first rule of talking about how people talk in Cincinnati is, don’t talk about how people talk in Cincinnati

I get disoriented, Zipf’s Law shows up, and I have breakfast.

IMG_3370
Grits. Photo source: me.

I walked out of my room today–I’m on the road, visiting a research center with which I have a long-standing collaboration–and ran into a local.  We greeted each other politely, saying “good morning” and remarking on the exceptionally cold weather–see this post on the subject of saying hello to strangers in the US–and it was clear from his accent, as well as the accent of the other people that I ran into on my way out of the building, that I was in Kentucky.  I walked outside, looked around, and it was immediately clear that I wasn’t in Kentucky at all, but rather Ohio.  Southern Ohio, specifically–and therein lay my accent mix-up.

Southern Ohio has an identity issue, especially in the moderately large city in which I find myself: is it in the North, or is it a part of Appalachia?  The local Kentucky-like dialect is very strongly socially marked, and people around here do not like having their dialect remarked on, especially if they speak that particular dialect.  Ohio dialects are actually quite diverse–Columbus, in the middle of the state, has four dialect boundaries, roughly corresponding to the four parts of the city divided up at the intersection of High St. and Broad St.–and around here in Cincinnati, there is a long history of prejudice related to social class.  Around here, that social class is reflected most strongly by which dialect you speak, or at least which dialect you speak in public.

The Zipf’s Law connection: I stopped by the cafeteria in the research center to pick up some breakfast, and was happy to see a big vat of Cream of Wheat, a childhood favorite with which my father’s second wife often fed me.  There was something wrong with it, though–what were all of those little yellow specks in it?  A quick look at the menu confirmed my suspicion: it was not Cream of Wheat at all, but rather grits.  Grits is a food of the southern United States, similar to a thin polenta.  Staring at a vat of grits immediately raises a question: how do you say grits in French?  It turns out that you don’t.  It’s actually a complicated issue.  The word can be singular or plural in English, and the French Wikipedia article on it starts out like this:

Le ou les Grits est une préparation culinaire…

“The (singular) or the (plural) Grits is a culinary preparation…”

That is, the word the shows up twice–once in a singular form, le, and once in the plural form, les.  To find some actual French Zipf’s-Law-type words related to grits, let’s look at a couple of sentences from the French Wikipedia article, this time on the subject of the manufacture of grits:

Le grits trouve son origine dans la préparation du maïs par les amérindiens. Traditionnellement, la semoule du grits est réalisée par un moulin en pierre qui broie le maïs. On tamise ensuite et la poudre la plus fine est utilisée comme farine, alors que la plus grossière est destinée au grits.

“Grits originates in the preparation of corn by the Native Americans.  Traditionally, the semolina of grits is produced by a stone mill that grinds/crushes the corn.  It is then sifted and the finest powder is used as flour, while the coarsest powder is reserved for grits.”

Let’s just focus on the verbs.  Definitions from WordReference.com:

  • réaliser: to make, produce, or create.  (Several other meanings, too, but that’s the one here.)
  • broyer: transcription: [bʀwaje].  To grind or crush; figuratively, to destroy or wreck.
  • tamiser: to sift, to sieve.
  • destiner à: to reserve for.

Click here for a collection of materials from different Ohio dialects.  And yes, the title of this post is a reference to the book/movie Fight Club.

 

 

Molière, Tartuffe, Dr. Seuss, and the Grinch Who Stole Christmas

An unexpected connection between Dr. Seuss and one of the greatest French dramatists of all time.

Grinch 8d6b92b2-a41e-4740-8689-e986a12416fd
The Grinch, from Dr. Seuss’s “How The Grinch Stole Christmas.” Picture source: http://www.playbuzz.com/nedbullock10/how-much-of-a-grinch-are-you.

Molière was one of the great French dramatists.  He lived just after Shakespeare, and you can compare them quite a bit in terms of their skill with language–reading his play Le Tartuffe in the original was almost adequate recompense for two years of studying French, mostly in the hours before sunrise.

Dr. Seuss is one of the most beloved American children’s authors.  His classic Green Eggs and Ham was the first book my child ever read out loud, and the notorious American politician-nihilist Ted Cruz read it out loud on the floor of the Senate during his attempt to shut down the American government by filibuster.

One of Dr. Seuss’s most famous characters is the Grinch.  In his book How The Grinch Stole Christmas, the Grinch is a nasty, bitter character who decides to ruin everyone else’s Christmas by stealing their Christmas presents.

Reading the commentaries on Le Tartuffe, I was thrilled to see the character of Mme. Pernelle, the curmudgeonly mother of one of the main characters, described as grincheuse.  WordReference.com defines grincheux/grincheuse (male and female forms) as “grumpy, grouchy, or cranky.”  Could this be the origin of the name “Grinch”?  Wikipedia says yes!  Who would’ve guessed?

 

I am getting soooo tired of learning vocabulary related to terrorist activity

It’s impossible to keep up with current events these days without a robust vocabulary related to terrorism and police actions.

lance-roquettes-perquisition
Rocket launcher seized in a police raid in Lyon on November 16, 2015, three days after the November 13 attacks in Paris. Picture source: http://www.lefigaro.fr/actualite-france/2015/11/26/01016-20151126ARTFIG00199-des-perquisitions-contestees-mais-efficaces-selon-les-policiers.php.

With the world apparently going violently crazy–this year is only 14 days old, and so far we’ve had terrorist attacks in Istanbul, Tel Aviv, Jakarta, Hurghada (Egypt), Marseille, Kabul–more than I have the patience to list, actually (Wikipedia says 28 of them), and that’s not even counting the nutjobs occupying federal property in Oregon, who haven’t actually killed anyone yet–the news from France is full of stories of searches, arrests, and the like.  A bit of trivia: according to Wikipedia, the English word terrorism comes from the French word terrorisme, which originally referred to the policies of the revolutionary government during the Reign of Terror (1793-1794).  Here’s some of the relevant vocabulary that keeps showing up in the news these days.  Definitions from WordReference.com:

  • la garde à vue: custody.  Not to be confused with:
  • le garde-à-vous: attention (the military posture).  If you’re French: to us Americans, this sounds just like garde à vue–really.
  • être placé en garde à vue: to be held in custody.
  • mettre en garde à vue: to put in custody or to send back to custody–I’m not sure about this one.
  • perquisitionner: to search.
  • la perquisition: a police raid, a police search.
  • faire une perquisition: to carry out a search.
  • le mandat de perquisition: search warrant.
  • les menottes (f): handcuffs.
  • passer les menottes à qqn: to handcuff someone.

 

 

The actor, the drug lord, and Zipf’s Law

Screenshot 2016-01-12 13.31.17
Screen shot of Rolling Stone’s tweet announcing the Sean Penn/El Chapo interview. That’s Penn on the left and El Chapo on the right.

The French press is just as abuzz about the whole Sean Penn/El Chapo thing as the American press is.  In case you’re reading this 10 years from now: “El Chapo” is the nickname of Joaquin Guzman, until recently one of the biggest drug dealers in the world–he may still be, although from behind bars.  He was arrested and imprisoned in 1993, and then escaped.  He remained free until 2014, when he was recaptured and, again, imprisoned.  In 2015 he escaped again, this time apparently being driven down an underground tunnel on the back of a motorcycle.

All was well for him until he decided that a movie should be made about his life.  He reached out to the motion picture industry, which somehow led to him being “interviewed” by American actor Sean Penn, the resulting interview being published in Rolling Stone magazine.  This exposure led to him being captured once again, and at the time that I write this, he is still in jail, hoping like hell he doesn’t get extradited to the United States.

We actually have two Zipf’s Law connections with this story–one regarding Sean Penn, and one regarding El Chapo.

Some time ago, we read about the filles du roi–the French orphans who were sent to Canada to get married and increase the French-speaking population in North America between 1663 and 1673.  You might recall from that post that Madonna is descended from a fille du roi.  In 1985, she broke the heart of every young man in America by marrying Sean Penn.

El Chapo’s Zipf’s Law connection is lexical–i.e., one of those bazillion words that is not particularly unusual, but that you nonetheless almost never hear (in the big scheme of things).  As I said, El Chapo is all over the news, and he is almost always referred to by his name–Joaquin Guzman–followed by surnommé El Chapo, meaning “nicknamed El Chapo.”  Here’s an example from France 24‘s web site:

Les autorités mexicaines ont annoncé vendredi l’arrestation de Joaquin Guzman, surnommé “El Chapo”, le plus important narcotrafiquant mexicain, qui s’était évadé de façon rocambolesque d’une prison de haute sécurité le 11 juillet dernier.

On Friday, Mexican authorities announced the arrest of Joaquin Guzman, nicknamed “El Chapo,” the most important Mexican drug trafficker, who had escaped in an extraordinary fashion from a high-security prison last July 11th.

Surnommer is a great example of Zipf’s Law–not particularly unusual, but low-frequency enough that I haven’t run into it in two years of studying French quite seriously.  Again, it means “to nickname.”  I’m not telling you any of my nicknames…

 

 

How to irritate a linguist, Part 2: I probably shouldn’t have cursed in that work-related email

Screenshot 2016-01-09 09.46.08
Picture source: screen capture by me of Sketch Engine’s analysis of the behavior of the word “differentiate” in a set of documents about mouse genomics.

One of my pet peeves is people making spurious claims that there are some languages in which there are ideas that just can’t be expressed. This is often preceded by uninformed crap like “there are primitive languages that only have 100 words, right?” (There are no such languages.)  Do I know words in language X that we don’t have in English? Sure. For example, there is no equivalent single word in English for the Yiddish word נחת (nakhes).  Does that mean that you can’t express the idea in English?  Certainly not.  Nakhes is a mixture of pride and pleasure–in the prototypical case, you get it when your children do something good.

Now, you may be thinking of a Wittgensteinian counter-argument to this.  My French tutor has the following quote from Wittgenstein in her email signature: The limits of my language are the limits of my world.  What did Wittgenstein mean by this?  He didn’t mean (and, yes, I will have this post reviewed by a philosopher, as I am not one) that it’s the limits of his language that form the limits of his world, and that if he spoke some other language, perhaps the limits of his world would be different.  Rather, his point was that the limits of language in general are the limits of philosophy–he maintained that what we can’t talk about, we can’t think about, and so–as they might put it on the French high school exit exam–philosophy is doomed to always be betrayed by language.  Wittgenstein is not claiming that his language limits his “world”–he’s claiming that all language in general limits the ability to think, and hence to philosophize.

The Zipf’s Law connection: I needed a word today, and I couldn’t find it in my native language, but I did find exactly what I needed in French.  I was writing an email in which I tried to explain the good points and the bad points of a web-based tool that I use in my work, some of which involves formalizing the semantics of the language of bioscience.  As an example, I pointed out what hints it gives me about the kinds of prepositions that can go with the word differentiate, and in particular, how they encode the fact that a cell can differentiate in a particular location–cortex and gonad in the examples above–but that if it differentiates into something, that has nothing to do with location at all, but rather with the outcome of the differentiation–in the examples, we’re talking about cells differentiating into different kinds of cells.  (It’s the ability of stem cells to do this that makes them stem cells.  Cellules souches, they’re called.)  This is a specific weirdness of the word differentiate–it has a very word-specific relationship with the preposition into.  What to call that “specific weirdness”?  Here’s the best that I could come up with:

Screenshot 2016-01-09 10.01.00

I was using the French word spécificité here.  La spécificité refers to a special feature of something.  You could translate it with the word specifics, but that wouldn’t go right in this sentence.  I could imagine using the word idiosyncrasies here, but that has an implication of abnormality that wasn’t quite right for the context, either–as a linguist, I don’t think in terms of normativity where language is concerned.  Spécificité has no such connotation in French, as far as I know–for example, when people describe me in French in terms of my profession, they often say that one of my spécificités is that I work on biomedical language.

So: on my journey to learn to speak French, I’ve finally come upon one little thing that I know how to say in French, but not in English (my native language).  If you don’t mind, I’m going to think about this not as a reflection of my inarticulateness in English, but as a mark of improvement in my French.  Undoubtedly this bit of hubris will be repaid by the embarrassment of my not understanding a simple question about the location of the bathroom at some point in the course of the day, but I’ll live with that.

Want to try Sketch Engine?  Follow this link.

How to irritate a linguist

strikeback
Picture copyright: The Speculative Grammarian http://www.specgram.com

Want to irritate a linguist?  Caution: this technique will get you an immediate lecture, and you’re probably not going to find that lecture very interesting.  Here’s what you do: ask them how many languages they speak.  This will irritate the shit out of us.  We will respond something like this: “linguists are not people who speak lots of languages.  People who speak lots of languages are called polyglots.  Linguists are people who study the phenomenon of language as a system–not necessarily any particular language, but language in general.”  For example: my master’s thesis was about vowels and the evolution of the vocal tract.  My doctoral dissertation was about using linguistic field work techniques to test software programs.

Personally, if you push me, I will disappoint you by responding “one–English.”  I would never say that I “speak” Spanish, although I’m quite at home in it, or that I “speak” French, even though I’m reaching the point where I can use it professionally.  This is a guaranteed technique for irritating a linguist!  It’s not the only one, though–watch this space for others.

  • Le/la polyglotte: polyglot.  Note that it can be either grammatically masculine or grammatically feminine.  This democratization of the grammatical gender of words for professions that don’t have explicit gender-marking is a hot topic in France right now, particularly in the case of Madame le ministre versus Madame la ministre for government ministers like Ségolène Royal.  (By the way: I just Googled her to make sure that I was spelling Royal correctly, and discovered that Google’s autocomplete suggestions are consistent with the hypothesis that people often look for naked pictures of her.)  The French Academy declines to endorse the Madame la ministre construction, but nobody listens to the French Academy much anyway.

 

It’s just a jump to the left and then a step to the right of Notre Dame

denton
Picture source: http://www.shawnhalldesign.com/denton-billboard/.

Charlemagne (allegedly) said that “To have another language is to possess a second soul.” Countless people have expressed similar sentiments.  Fellini: “A different language is a different vision of life.”  Delacroix: “The individual’s whole experience is built upon the plan of his language.”

I’ve always thought that this kind of thing was bullshit.  However, on Christmas night, my (limited) French skills did, indeed, show me a new world.

At the Studio Galande, a tiny movie theater on a little back street in the Latin Quarter–almost across the river from Notre Dame (42 rue Galande, 75005)–you can see the Rocky Horror Picture Show on Friday and Saturday nights.  The movie has French subtitles (crappy translation, actually), and–more importantly–a highly rehearsed group of people who act out the movie, singing along with it and offering non-stop commentary, almost entirely in French.

Of course, there is the usual throwing of rice, although the water pistols of my teenaged years and their gentle sprinkle have been replaced by 2-liter bottles of water and some serious soaking.  When I was a teenager, going to the Rocky Horror Picture Show meant knowing a few things to shout in unison with the rest of the audience at appropriate times–today, that’s basically a semi-professional production, and that’s what the highly rehearsed group of people do.

In Paris, this was a far more surreal experience than it is in the US, and it’s pretty surreal in the US.  I’ll try to paint the picture.  Since it’s me that’s doing the painting, it’s mostly a picture of language.  Here’s what you have: (a) the movie, in English.  (b) The subtitles, in French.  (c) The performers (I’m not actually sure what to call them) doing their accompaniment in French.  (d)  The performers also doing their accompaniment in heavily accented English.

It’s an incredibly rich linguistic hodgepodge, and it all comes at you in this confused, non-stop torrent.  It was far too much for me to be able to retain very many data points for you, but here’s a nice example of a way that the humor combined English and French.  There’s a point in the film that shows a map featuring a city called Denton.  So, you’ve got “Denton” splashed prominently across the screen.  Now, if “Denton” were French, it would be pronounced like dans ton–“in your.”  Right at that point in the movie, one of the performers holds up a sign with the word cul right next to the name “Denton.”  Cul means “ass,” so now you have dans ton cul–“in your ass.”

If I may be permitted a bit of hubris: the thing that I was proudest of came during the Time Warp.  If you haven’t seen the Rocky Horror Picture Show: there’s a big dance number featuring–yes–the Time Warp.  (It’s just a jump to the left, and then a step to the right.)  At various points in this, it would not be inappropriate to yell “ooh-ah.”  So, at one of those points, one of the performers says Et maintenant, on va dire ooh-ah en verlan: ah-ooh.  What that means: “And now we’re going to say ‘ooh-ah’ in Verlan: ‘ah-ooh.’”  Why I was so proud of catching that–it requires some cultural knowledge, which is: what is Verlan?  Verlan is the name of a kind of French slang that originated in the banlieues défavorisées (bad neighborhoods, mostly surrounding Paris) and has since become pretty broadly popular.  Words are formed in Verlan by reversing the order of their syllables–verlan is itself l’envers, “backwards,” reversed.  So: what would ooh-ah be in Verlan?  It would be ah-ooh.

So, that’s my bit of hubris.  Hubris is always followed by a fall, and no doubt my hubris will be followed by me not understanding however it is that someone will say “good morning” to me tomorrow.  As Brad says in the French subtitles to his song in the floor show scene: ça me dépasse–“It’s beyond me.”

An evening at the Rocky Horror Picture Show would be an unforgettable and unique Parisian experience, and I recommend it highly.  I’ll leave you with two pieces of advice on the subject.  (a)  If you haven’t seen the movie before, the whole evening will mostly be lost on you, but if you have, this is an experience that you are unlikely to forget; (b) don’t, don’t, don’t sit in the front row (rang 1).  Trust me on this.

  • le travesti : transvestite.  At one point in the movie, you are to shout Garçon, garçon, il y a un travesti dans ma soupe–waiter, waiter, there’s a transvestite in my soup.  In the context of the film, it makes perfect sense.
  • travesti (adj.): disguised; in fancy dress; in costume.
  • Ça me dépasse: it’s beyond me.