I saw a guy peeing on the street today: Philip II, modern art, and the smells of Paris

The smell of pee can be your connection to centuries of Parisian history.

The Centre Georges Pompidou, France’s national gallery of modern art. Picture source: http://www.rpbw.com/project/3/centre-georges-pompidou/.

I saw a guy peeing on the street today.  It was on one of those concrete things outside of the Centre Georges Pompidou, France’s national museum of modern art.  It seemed odd, because there was a free public toilet not 5 meters away.  I could smell the large–and rapidly growing–puddle of piss as I walked by.

Paris has always smelled, and by “smelled,” I mean stunk.  Philip II (Philip Le Dieu-donné) tried to deal with the stench from the streets by paving them, sometime between 1180 and 1223.  It didn’t work–the description of sources of the stench in one neighborhood in the 1680s or so (I think it was near what’s now Place d’Italie) included “a stream that served as public sewer and a waste dump for the Gobelins factory, a pig farm, a neighborhood tanner, a starch maker, and an abattoir which emptied blood into the street.”  (Robert Cole’s excellent A traveller’s history of Paris, the source of this quote, proceeds by historical period; each chapter details the stench situation at that point in time.)  Louis XIV moved his residence from the Louvre to Versailles in 1682 in part to get away from fractious Parisian mobs, but also to get away from the stench of Paris.

Poop can still be an issue.  Edmund White used to walk his dog by the Centre Pompidou expressly to have it poop in the ventilation duct of the office of a guy who had refused to give him a writing job.  However, in these days of modern sewer systems, the main issue is pee, and there’s not actually that much of an issue with that.  Despite everything that you hear about Parisian men peeing all over the place, you are only really likely to smell it in the metro stations.  It’s claimed that a couple liters of perfume are dumped into the Métro ventilation system every day in an attempt to cover the smell of pee.  (See here for the closest I’ve been able to come to verifying this.)

Really, in a European context, France is not that bad in the urine-smelling department.  In Belgium, I once had to pee in the kitchen of a decent restaurant–because that’s where the urinal was.

  • faire pipi: to urinate.
  • pisser: to urinate.
  • uriner: to urinate–I found this in my medical dictionary.
  • la miction: urination.  Also from my medical dictionary, so it might just be a technical term.  The English cognate is micturition.
  • pisser dans un violon: to waste your breath, to talk to a wall.  Literally: “to piss in a violin.” 
  • gey kakn afn yam: “go shit in the ocean.”  This is Yiddish, not French, but it’s really the only Yiddish you need to know.

Bilingual dictionaries: how to pick them, how to use them

I was in the Navy with an Armenian woman.  (No, you don’t have to be a citizen to serve in the American military, and that’s probably true in most countries.  In France, you can get citizenship by serving in the military–you are français par le sang versé, “French by spilt blood.”  This isn’t the case in the United States–you can apply for citizenship as a member of our military, but there actually isn’t any guarantee that you’ll get it.) We’ll call her Nairi (not her real name).  Like many members of the Armenian diaspora, Nairi was massively multilingual–she spoke Armenian, Arabic, and Spanish natively, and French and English as very strong second languages.  (I once saw her mother test her to make sure that she wasn’t forgetting any of them.)  One day Nairi came back from leave (what we call vacation in the military) with a seven-language dictionary.  I admired it, and she insisted that I take it.  I refused, she insisted, I refused, she insisted, I refused, she insisted, and finally, I took it.  What I didn’t realize was that in Armenian culture, if someone admires something of yours, you must insist that they take it.  Armenians know that they most certainly should not take it–I didn’t.  Now I do.  Stupid me–every time I see that dictionary on my bookshelf, I feel like a total jerk.

In a recent post, we talked about monolingual dictionaries–that is, dictionaries that list words in some language and give definitions of them in that same language.  Today, let’s talk about bilingual dictionaries–that is, dictionaries that list words in some language and give corresponding words in another language.  Of course, anything that we might say about bilingual dictionaries applies equally to dictionaries with even more languages, like the one that I stupidly took from poor Nairi.

I carefully said “corresponding” words just above–I carefully didn’t say “equivalent” or “the same” words.  This is because it’s often the case that there isn’t a single translation from one word in one language to one word in another language.  Even when there is one, it doesn’t necessarily “mean” the same thing, in some sense of the word “meaning.”  To give you an example from my college French 101 textbook: a fenêtre in French is a window in English–fine so far.  But, say window in English, and the referent is most likely a sash window, specifically–one that slides up and down.  Say fenêtre in French, and the referent is most likely a window that opens in the middle–horizontally.  (We would call this a French window in English.  See this post for a list of things that we call French something-or-other in English that aren’t called anything of the sort in French.)  And, as I said, there often isn’t just one.  A language that I worked on in grad school has more than one word for invert.  Which one you use depends on what you’re inverting.  If you’re inverting a hollow object, that’s one verb–if you’re inverting a solid object, it’s another verb.  French has maybe two words for snow–la neige, and la poudreuse (powder snow).  Depending on how you count, English has 13 or 55 or 120 (scroll down past the Inuit words) or 182 words for snow.  So: not a 1-to-1 correspondence.

Having at least mentioned some of the theoretical issues, let’s look at the practical points of buying and using a bilingual dictionary.  In these days of Amazon, you can use reader reviews in a way that we never could before–it’s really a nice advantage over the old pre-Internet days.  However, there are also some specific things to look for.

  • Example sentences: you want a dictionary with example sentences, at least in the language that’s foreign to you.
  • Verb + preposition combinations: a good dictionary should tell you which prepositions, if any, go with which verbs.  You need to know, for instance, that in English you shoot at something, you lean toward (have a preference for) something, and you stop doing something, with no preposition.  Likewise, in French you need to know that you tirer sur or tirer contre (shoot “on” or shoot “against”) something, you pencher pour (lean “for”) something, and you arrêter de (stop “from”) doing something.
  • If you are working with language(s) that have gender, you want the gender to show up both in the Language1 -> Language2 section and in the Language2 -> Language1 section.  If you look up kitchen towel and find that the translation to French is torchon, you don’t want to then have to go to the French -> English section to see whether it’s le torchon (it is) or la torchon (it isn’t).
  • This might seem obvious, but make sure that the pronunciation is given for the words in any language whose pronunciation isn’t obvious from the spelling–and, yes, that includes both English and French.
  • This takes a while, but: when you find the word that you’re looking for in the other language, you might want to look it up in the other direction.  For example: suppose that you look up the English word towel in a crappy bilingual English/French dictionary.  In a crappy dictionary, you might find the following: serviette, torchon.  Both of those can, indeed, be used to translate towel from English to French–but, they’re not equivalent.  Serviette is for a bath or beach towel, while torchon is for a kitchen towel.  You want a dictionary that will distinguish between the various possible translations.  It’s often useful to look the French words up in turn (or the English words, if you’re going from French to English).  If you do that, you’ll find that a serviette can be a towel, but also a napkin, or a briefcase.  A torchon, you’ll find, can also be a messy document, or a rag.  It’s good to be on top of this kind of thing when you’re trying to choose between supposed synonyms.
  • Labelling of registers, or levels of appropriateness: you most definitely want a dictionary that includes slang, obscenities, informal words, etc., or you’re not going to get very far in real life.  However, you also want a dictionary that labels words that are non-standard–offensive words, etc.  This kind of thing can be really, really hard to catch when you’re learning a language from movies, your neighbors, etc.
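For the programmers in the audience, here is a minimal sketch of the look-it-up-in-the-other-direction trick from the towel example above.  The toy word list contains only the translations mentioned in this post, and the names EN_TO_FR, invert, and round_trip are mine, not anything from a real dictionary's software.

```python
from collections import defaultdict

# Toy English -> French entries, limited to the towel example in this post.
EN_TO_FR = {
    "towel": ["serviette", "torchon"],
    "napkin": ["serviette"],
    "briefcase": ["serviette"],
    "rag": ["torchon"],
    "messy document": ["torchon"],
}


def invert(dictionary):
    """Build the reverse direction: each target word maps to its source words."""
    reverse = defaultdict(list)
    for source, targets in dictionary.items():
        for target in targets:
            reverse[target].append(source)
    return dict(reverse)


def round_trip(word, forward, backward):
    """For each translation of `word`, list everything it translates back to."""
    return {t: backward.get(t, []) for t in forward.get(word, [])}


FR_TO_EN = invert(EN_TO_FR)
for french, english in round_trip("towel", EN_TO_FR, FR_TO_EN).items():
    print(f"{french}: {', '.join(english)}")
```

Each French candidate comes back with a different set of English words, which is exactly the hint that the two are not interchangeable.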

The always-awesome Lawless French web site has a good page on the subject of how to use a bilingual dictionary, and it has much better examples than I do.  You can find it here.

So, what are some good bilingual English/French dictionaries?  Here are some options.

  • The best thing out there these days is almost certainly WordReference.com.  It has lots of language pairs, example sentences, colloquial expressions, pronunciations, male and female forms of adjectives, plurals, a verb conjugator, and a reverse look-up feature that does exactly what I suggest you do in the look-it-up-in-the-other-direction item above.  The auto-complete feature in the search box saves me enormous amounts of time (and guessing about spellings).  There’s an excellent WordReference iPhone app.  Be aware, though, that the iPhone app will not generally let you look up obscenities–you have to go to the web site for that.
  • For the Kindle or for the Kindle app on your phone, the Collins English-French and French-English dictionaries are quite good.  They’re quite highly rated on Amazon.com.  I have the Collins dictionaries on my phone, and use them whenever I don’t have Internet access and therefore can’t get to WordReference.com.  The Collins dictionaries also have an advantage over WordReference: they don’t give as many super-subtle translations.  The only bad thing about WordReference is that it can sometimes give an overwhelming number of other-language translations.  That’s great when you want it, but when you don’t, you might prefer the Collins dictionary.  As it happens, there is a Collins dictionary tab on the WordReference site, and it’s easy to click on that.
  • Linguee.fr is fantastic for seeing things in context.  You will generally get lots of example sentences.  There’s an iPhone app for that, too.
  • Reverso.net is another good one for seeing things in context.  It sometimes has better coverage of colloquial, slang, and obscene language than Linguee does.  Again, there’s an iPhone app.

I found Nairi on Facebook recently.  I sent her a friend request–no response.  Is it because she doesn’t remember who the hell I am?  Is it because she hates me for taking her dictionary?  I have no idea.  Nairi, if you’re reading this: I’m sorry!

Refugees are dying and I can’t understand the word for “capsize”

Refugees and migrants are dying in shocking numbers in the Mediterranean. Here is some vocabulary that you’ll need to know to talk about the tragedy in French.

Map of the European migrant crisis as of 2015. Picture source: https://en.wikipedia.org/wiki/European_migrant_crisis#/media/File:Map_of_the_European_Migrant_Crisis_2015.png.

One of the ways that the world is sucking right now is the migrant crisis in Europe.  As I write this (in April 2016), there are tens of thousands of refugees and migrants stranded in Greece.  Many of these people cross from Turkey to Greece by boat, and many go from North Africa to Italy by ship.  Tragically high numbers of these sink; in April of last year, five vessels sank, with a death toll of about 1,200 people.

The other day I was listening to the news on the radio.  It was yet another story about the refugee crisis.  The word aufrage kept coming up, but I couldn’t find it in my dictionary.  Un aufrage, I kept hearing.  Looking up similar stories on line solved the mystery: it was not un aufrage, but un naufrage–a capsizing or shipwreck.  I had “segmented” (as linguists say) the n of naufrage as part of a separate word, coming up with un aufrage. 

This isn’t an uncommon phenomenon.  One of the surprises for students in introductory linguistics classes is that in speech, there are no breaks between words–if I showed you a spectrogram (a sort of recording of a sound wave) of a sentence, you would see a continuous sound.  “Segmenting” that stream of speech into smaller units is something that humans do–it’s not something that’s there in the acoustics.
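If you like to see this kind of thing as code, here is a toy sketch of the segmentation problem.  It assumes a made-up transcription in which "U" stands for un before a consonant, "Un" stands for un with its n pronounced before a vowel, and "Z" stands for the last sound of naufrage; the lexicon includes the non-word aufrage only because that is what I thought I was hearing.  Real speech segmentation is much messier than matching strings against a word list.

```python
def segmentations(stream, lexicon):
    """Return every way to split `stream` into a sequence of lexicon entries."""
    if not stream:
        return [[]]
    results = []
    for end in range(1, len(stream) + 1):
        prefix = stream[:end]
        if prefix in lexicon:
            for rest in segmentations(stream[end:], lexicon):
                results.append([prefix] + rest)
    return results


# Hypothetical transcriptions mapped to spellings (see the caveats above).
LEXICON = {
    "U": "un",
    "Un": "un (n pronounced)",
    "nofraZ": "naufrage",
    "ofraZ": "aufrage (not a real word)",
}

for parse in segmentations("UnofraZ", LEXICON):
    print(" + ".join(LEXICON[piece] for piece in parse))
```

The same unbroken stream yields both un + naufrage and un + aufrage; nothing in the string itself says which split is the right one.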

Occasionally speakers of a language will, over time and as a community, “reanalyze” words in a way that changes the segmentation, and eventually the pronunciation.  The word uncle is a word that has undergone this process.  A variant of the word in English is nuncle.  Oxford describes it as archaic or dialectal, but it’s there.  You can see it in Shakespeare:

Can you make no use of nothing, nuncle?

–King Lear, Act 1, Scene 4

The word is thought to have come from a segmentation of phrases like mine uncle as my nuncle, thine uncle as thy nuncle, etc.

The same thing can happen in other languages, too–any time people speak, there’s an opportunity for segmentation errors.  Children who are learning their mother tongue often try out different segmentations.  For example: in a past post, we looked at some bear-related vocabulary in French and English.  Here are various and sundry relevant phrases:

  • un ours: a male bear.
  • une ourse: a female bear.
  • un ourson: a baby bear; a teddy bear.
  • un nounours: a teddy bear.

I once read a great blog post in which a French guy wrote about his toddler producing three different pronunciations of the word ours (male bear) in one day: ours, nours, and I believe lours (the last one would be a reanalysis of l’ours, “the bear”).  (Sorry I’m guessing about that last one–I can’t find the guy’s post.)

Linguistics geekery, which you should feel free to skip: one of my homeworks in Phonetics 101 was to look at spectrograms and find indications of syllabic association, which can correspond to word segmentation, on occasion.  It’s possible to do so–sometimes.  For nasals in French, as far as I know, it would be restricted to some variability in when a vowel is nasalized before a nasal consonant, versus when it’s produced as a sequence of an unnasalized vowel before a nasal consonant.  American English speakers, who have no contrast in nasalization versus lack of nasalization before a vowel, are unlikely to be able to perceive it, and I don’t know at what age a French kid would be likely to acquire it.

I have no clue how the current situation will or should be resolved.  Obviously, if your town is being destroyed by the Syrian government, or ISIS, or whatever other assholes are causing death and misery in the Middle East these days, it makes sense that you would take your family and go elsewhere, and it’s simple human decency to shelter people in that situation.  However, the situation is not clear in other ways–even the fact that the Wikipedia article on the subject is titled European migrant crisis and not European refugee crisis is a loaded choice, and one that has implications about how the people who are affected should be treated.  The situation continues to evolve, with European and world sympathies tilting now one way and now the other–in favor of sheltering the affected people after a tragedy like the widely-publicized drowning of a Syrian toddler, and in opposition to it after the despicable assaults on women by crowds of migrant men last New Year’s Eve in Germany.  Certainly the situation will have long-range effects on Europe.  I began this post by talking about one of the ways in which the world sucks right now–the existence of this crisis.  One of the ways in which the world doesn’t suck right now is that many people in many countries have been very active in welcoming refugees, providing real support services for them, and generally acting like decent human beings.  This will get worked out.


It’s raining, it’s pouring, the old man is snoring: how to talk about rain in English and French

How to talk about rain in English and French.

It’s raining, it’s pouring, the old man is snoring,

He went to bed and he bumped his head and he didn’t get up ’til the morning.

–Children’s song

Adam Gopnik once described Paris as “a scowling gray universe, relieved by pastry.”  The “gray” part comes from the observation that it’s very often cloudy here.  Actually, one of the things that I love about Paris is that it rains here.  In the US, I live in a very sunny, dry part of the country–300 days of sunshine a year.  However, I grew up in a very, very wet part of the country, and I miss that.  So, coming to Paris in March and seeing flowers bursting from wet earth on my walk to work through the forest is a real treat.

Being from a very wet place, I have a large vocabulary for talking about rain in English.  Here are some examples of relevant verbs.  These are all impersonal verbs, using what linguists call a pleonastic pronoun, i.e. it:

  • to rain: the default verb.
  • to pour: to rain hard–see the children’s song above.
  • to rain cats and dogs: to rain hard.
  • to rain/pour buckets: to rain hard.
  • to mist: to rain very lightly.
  • to drizzle: to rain, especially if it’s cold.  (I’ve seen a couple definitions of this as “to rain lightly.”)
  • to sprinkle: to rain, especially for a short period of time.
  • to storm: to rain very hard, often with thunder and lightning.

Usage examples:

  • pleuvoir: to rain.  Il pleut: it’s raining.  (I always seem to confuse this with il pleure, “he’s crying.”)
  • Il pleut à verse: it’s pouring.  (Native speakers: can we do the liaison here?, i.e. il pleu tà verse?)
  • Il pleut des cordes: it’s raining cats and dogs, it’s pouring rain.
  • Il tombe des cordes: same thing.
  • Il bruine: it’s misting.
  • Il crachine: it’s sprinkling.
  • y avoir de l’orage: to storm.
  • faire de l’orage: to storm.

I’ve focussed entirely on verbs here.  For lots of nouns and adjectives related to rain in English, see this great post from the EngVid.com web site.



Dictionaries and sexism

One day a friend and his wife dropped by my office to share the good news that they’d just seen an ultrasound of their baby-to-be.  They didn’t speak English, so we spoke Spanish.  Is the baby a macho or a hembra?, I asked–a boy, or a girl?  My friend and his wife cracked up (American English for “started laughing hard,” although it can also mean “to go crazy”–be careful).  It turns out that macho and hembra are used only in what you might think of as a biological sense–that is, to refer to male or female animals.  A baby boy or baby girl human is a niño or niña.  There’s a similar set of words in French for describing biological sex, as distinct from gender, and that set of words can come in handy. We’ll see more on this below, but first some big-picture issues.

Most dictionaries today are descriptive, rather than prescriptive, meaning that their goal is to describe how language is used, rather than to try to prescribe the way that the editors think that it should be used.  With that goal in mind, what should the editorial stance be towards the ways that language reflects society, and in particular, shitty things in a society–say, sexism in America and the United Kingdom?  Here’s an article on the subject from the New Yorker, and if you like it, be sure to follow the link in it to Deborah Cameron’s article–she is an amazing linguist.  (Full disclosure: I took sociolinguistics from her as an undergrad.  Favorite quote: “Well, that rather fucks the theory up, now, doesn’t it, Kevin?”)


Relevant French vocabulary, with a quote from the French Wikipedia page on sexism:

Le sexisme est une attitude discriminatoire adoptée en raison du sexe.

La critique du sexisme dénonce l’idée selon laquelle les caractéristiques différentes des deux genres masculin et féminin impliqueraient l’attribution de rôles, droits et devoirs distincts dans la société. Elle dénonce cette construction de la société qui attribue un caractère, un rôle, des prédispositions physiques et affectives selon le sexe. La notion de sexe n’est alors plus une notion de sexe biologique (mâle et femelle) mais une construction sociale du genre féminin et du genre masculin limitant par là même le développement de l’individu sur les plans personnel, affectif, professionnel et social.

  • dénoncer: to denounce or condemn; to back out of, to renege on.
  • le devoir: duty, obligation; homework, assignment.
  • affectif: emotional.
  • le mâle: male, in a biological sense.  Slang: studmuffin.
  • la femelle: female, in a biological sense.  Slang: bitch.

None of this stuff is simple or straightforward. As a sociolinguist once said to me: if a language reflects sexism, homophobia, or whatever other nastiness, that’s data. The claim of some of the people interviewed in the article is that when a lexicographer includes sexist language in a dictionary, they’re not just describing it, even if they think that that’s what they’re doing–they’re endorsing it. A good descriptive lexicographer would protest against that claim–see this recent post. How does the person on the street see it? Is the interviewee right in asserting that people perceive the dictionary as an authoritative stamp of approval on the language, rather than seeing it as descriptive of the language, like the lexicographer does? That’s an empirical question, and I don’t know the answer. If you go out and do a survey on this, please let the rest of us know the result…

What to expect as a graduate student (in my lab, anyways): computational linguistics/natural language processing edition

When you’re looking for an advisor, it’s good to get a realistic picture of what their expectations are. Here’s a computational linguist’s take on that.

I’m meeting with a prospective student today.  That means that I’ll want to know a lot about her motivations and background.  It also means that I’ll want her to walk out of my office with a realistic picture of what my expectations are for students and post-docs.  I thought that I’d share them here not because I’m sure that they’re valuable, but so that I can get feedback from people in similar positions–and from students.  Some of the specifics of this are only relevant to people interested in computational linguistics or natural language processing, of course.  Some of the more general stuff might apply to anyone in graduate school.  You choose.

My expectations basically fall into three categories.

  1. Things that you need to do in order to become an independent researcher, find a teaching job after you graduate, and be professionally successful as an academic.
  2. Things that you need to do in order to be able to get a job in industry, if that’s the direction that you decide to go, and to be an efficient and productive researcher if you stay in academia.
  3. Things that you will need to do in order to participate in the (intellectual) life of the lab.

Things that you need to do in order to become an independent researcher, find a teaching job after you graduate, and be professionally successful as an academic:

  • Learn something about language.  Learn facts about how it works (to the extent that we know how it works), and learn something about what the interesting open questions are–and how you might try to answer some of those questions from a computational perspective.
  • Take a course in natural language processing.  Most people don’t come to this field with a background in hand, at least not in the biomedical world that I live in.  Formal coursework with homework, tests, etc. is a good way to get on your feet.
  • Publications, publications, publications.  Any time that you’re thinking about a project, think about how you’re going to publish it.  That means rotations, comps, and anything that you let yourself get dragged into just because it looks interesting.  If  you can’t even imagine a way to publish the work, consider finding a new topic for your rotation/comps/whatever.
  • When you’re thinking about projects, don’t fall into the trap of “just” developing technology.  There’s almost always an interesting scientific question that your work could be relevant to–figure out what it is.  If you can’t figure out what it is, consider dropping the project.
  • Read David Sternberg’s How to complete and survive a doctoral dissertation, or some equivalent.  You need more advice than I will ever think to give you.  You also need better advice than I claim to know how to give.  Sternberg’s book is pretty much how I approached my own graduate school experience, and it worked.  It’s also a pretty fair approximation of what I’ll expect from you, and a good approximation of what you should expect from me (and demand, if I don’t give it to you).  I typically keep a couple copies in my office, and hand one to anyone who wants to join my group as a doctoral student.

Things that you need to do in order to be able to get a job in industry, if that’s the direction in which you decide to go, and to be an efficient and productive researcher if you stay in academia:

  • Test and document your code and your projects.  Learn to use a unit testing framework and a version control system.
  • Learn something about databases.  This is the most common thing that I see in job ads that grad students often don’t have.
  • Learn one architecture.  Some choices are UIMA, GATE, and BioC.  Industrial-strength language processing often requires industrial-strength software architectures.  You should know one of them.
  • Learn some programming languages.  You’re very likely to need one object-oriented, compiled language, and one scripting language.  I don’t really care which, but you need to become comfortable in something.
  • Learn some of the useful open source tools of our profession.  You don’t need to know all of these by any means, but learn at least some of the following:
    • Lucene
    • R
    • NLTK, or alias-i, or Stanford CoreNLP (I use all three pretty routinely)
    • See above about databases
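To make the "test your code" item concrete, here is a minimal sketch using Python's built-in unittest framework.  The tokenize function is a deliberately simple stand-in that I made up for illustration, not a recommendation for how to tokenize real text.

```python
import re
import unittest


def tokenize(text):
    """Split text into lowercase word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


class TokenizeTest(unittest.TestCase):
    """Each test states one expectation about tokenize()."""

    def test_drops_punctuation_and_lowercases(self):
        self.assertEqual(
            tokenize("It's raining, it's pouring."),
            ["it's", "raining", "it's", "pouring"],
        )

    def test_empty_input_gives_no_tokens(self):
        self.assertEqual(tokenize(""), [])
```

Save this in a file (say, test_tokenize.py, a name I picked arbitrarily), run python -m unittest in the same directory, and commit both the code and the tests to your version control system.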

Things that you will need to do in order to participate in the (intellectual) life of the lab:

  • Plan to report to me and to everyone else in the lab what you’ve accomplished in the past week and what you plan to do in the week to come.  This will not only shame you into making regular progress, but if you go into industry, this is the best tool that I know of for maintaining a good professional relationship with your boss and with your colleagues more generally.  Even today, I meet with my boss on a weekly basis to report what I’ve done, pass on what I plan to do, and ask their opinion about the latter.  This way, you both have the same conception of what the priorities are, and if you are running into trouble, everyone else will know–and figure out how they can be helpful–early in the game.
  • Learn to be comfortable with asking questions.  Learn to be comfortable with asking for help.  Regarding the former: the absolute best graduates that our program has ever produced have been people who showed up in my office every week not to report on what they’d done, but to ask questions.  Regarding the latter: as a mentor once told me, it’s never a sin to ask for help, but it’s always a sin to wait to ask for help until it’s too close to the deadline for anyone to do anything to help. 
  • Participate actively in lab meetings and our reading group.  That means being prepared to explain what you’re doing and ask for input from your colleagues; taking an active interest in what your colleagues are doing and asking them questions about it; keeping an eye on the literature, finding stuff that you think is interesting, and volunteering to present it.
  • Be prepared to be independent to some degree.  I will do my best to keep on top of what you’re doing and how you’re progressing with it, but you need to be the one who makes sure that you show up for work every day (whether that’s in the lab or at your kitchen table) and put in a solid day’s effort.  (That can and should include spending time discussing your work and your fellow students’ work–that always counts, actually.  But, in order to do that, you usually need to be physically in the department.  Or, the campus pub.  Whatever, as long as you’re physically there, and you’re there to work.)  You need to go into the relationship with your advisor knowing that if they’re worth committing several years of your life to, then they’re probably pretty busy, and you need to be able to drive yourself.

I’d love to hear feedback on this stuff, both from people who educate grad students and from grad students themselves.  What works?  What doesn’t?  What am I leaving out?  As Kurt Vonnegut used to put it: thanks for your time and attention.

How linguists think about dictionaries: why the first word that we look up is f…

Hand a linguist a dictionary and the first thing they’ll do is look up a vulgarity. Here’s why.

Trigger warning: this post contains a vulgar word related to sex.

Linguists have a complicated relationship with dictionaries.  On the one hand, many of us were the kinds of kids who would sit around flipping through dictionaries, reading the definitions of random words just for fun.  On the other hand, most of us are very aware of the many deficiencies of dictionaries–it’s the kind of thing that you talk about in a Linguistics 101 class.

When I say “dictionary,” I mean a resource that gives definitions for words.  This is what would typically be called a “monolingual dictionary.”  There are other kinds of dictionaries.  For example, “bilingual dictionaries” give not a definition per se, but an equivalent in some other language.  “Visual dictionaries” give an image of the thing that is represented by the word, rather than a textual definition, per se.

The “Dictionary of the French Academy,” probably the best-known prescriptive dictionary in the world. Picture source: http://www.dictionnaires.culture.fr/fichedic/Dico-academie.html.

If you hand a linguist a dictionary, the first thing that they’re likely to do–assuming that it’s a dictionary of English–is to look for the definition of the word fuck.  This is not because of a puerile obsession with sex, but because knowing whether or not the dictionary includes fuck tells you where the dictionary fits with respect to the most primary distinction: between descriptive dictionaries, and prescriptive dictionaries.  In general, a prescriptive dictionary lays out the editors’ take on the proper use of the words in the language.  A prescriptive dictionary would, thus, be unlikely to include a definition of the word fuck. 

Webster’s 3rd, probably the best-known descriptive dictionary in the world. Picture source: http://www.britannicaindia.com/html/products/print/dictionary/MW%20International%20dictionary.html.

In contrast, a descriptive dictionary has no conception of properness or correctness–its goal is not to “prescribe” how the language should be used, but rather to “describe” how it is used.  You can imagine uses for both of these.  Personally, I almost always want a descriptive dictionary–if I need to look up an English word that I don’t understand (it does happen, even in your native language), I don’t care about how it’s “supposed” to be used–I want to know how it is used, so that I can get back to reading my book.  On the other hand, imagine that I’m trying to write a professional email in French.  I have very little awareness of the social significance of any of the words or constructions that I know in French, and I would love to be able to make informed decisions about whether to use a normal-register word like désormais (now and into the future) or a higher-register word like dorénavant (also now and into the future, but more refined, formal, elegant–or so I’m told).  That’s the kind of thing that would create one of the very rare situations when I would want a dictionary with a prescriptive bent.  Most English-language monolingual dictionaries today are descriptive, but that has not always been the case, and when the monumental Webster’s 3rd came out in 1961–a descriptive dictionary, unlike the preceding Webster’s 2nd, which was prescriptive–it was hugely controversial.  Wikipedia describes it as “the opening shot in the culture wars, as conservatives detected yet another symbol of the permissiveness of society as a whole and the decline of authority.”

Another important distinction between dictionaries is how they order their definitions.  There are at least three options:

  • Historical
  • Centrality-based
  • Frequency-based

It’s the nature of human languages (or, more accurately, the nature of human languages in a social context) to change with the passage of time.  In a dictionary that orders its definitions historically, you have the oldest definition first, and the most recent definition last.

It’s pretty common to think of some definitions of a word as more “central” to the meaning of the word than others.  (We had a couple of blog posts a few weeks ago on the role of the prototype in theories of how words have meanings; see here and here.)  For example, we recently talked about the French word fort, which can mean “strong,” and also “very” or “quite.”  You might think of “strong” as more central, with “very” being related to that in some way, but not really a “central” or principal meaning of the word.  Centrality-ordered dictionaries give you the most central meaning(s) first, followed by the increasingly peripheral ones.

Thanks in part to the existence in recent years of large bodies of text that have been “annotated” (manually marked) with specific meanings of words that have multiple meanings, we can now do a fair amount of statistical study of the distributions of the frequencies of word meanings, as opposed to just the frequency of the words.  Things like this make it possible for us to order the definitions in a dictionary by how common they are in actual language.  Of course, this raises all kinds of issues: e.g., when we say “common in actual language,” what do we mean by “actual language”?  It’s not like we have a stratified random sample of every word of English out there, and even if we did, it would take an enormous amount of time to mark which words had which meanings, even if we knew what all of the meanings of the words were.  (Often there is not even agreement about how many meanings a word has, let alone what they are.  Susan Brown of the University of Colorado has researched this extensively.)
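
For the programmers in the audience, here is a minimal sketch of how the three orderings differ.  Everything in it is made up for illustration: the Sense class, the three placeholder senses, and all of the dates, ranks, and counts are hypothetical, not taken from any real dictionary or annotated corpus.

```python
from dataclasses import dataclass


@dataclass
class Sense:
    gloss: str
    first_attested: int  # hypothetical year the sense first shows up
    centrality: int      # hypothetical editorial rank, 1 = most central
    corpus_count: int    # hypothetical count in a sense-annotated corpus


# Three invented senses of one invented word.
SENSES = [
    Sense("sense A", first_attested=1300, centrality=2, corpus_count=50),
    Sense("sense B", first_attested=1500, centrality=1, corpus_count=20),
    Sense("sense C", first_attested=1900, centrality=3, corpus_count=300),
]


def historical(senses):
    """Oldest sense first."""
    return sorted(senses, key=lambda s: s.first_attested)


def by_centrality(senses):
    """Most central sense first."""
    return sorted(senses, key=lambda s: s.centrality)


def by_frequency(senses):
    """Most frequent sense first."""
    return sorted(senses, key=lambda s: s.corpus_count, reverse=True)


for order in (historical, by_centrality, by_frequency):
    print(order.__name__, [s.gloss for s in order(SENSES)])
```

With these invented numbers, the same three senses come out in three different orders, which is why you can’t tell which kind of dictionary you have just by looking at a single entry.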

How do you tell which of these you are holding in your hand–historically ordered, centrality ordered, or frequency ordered?  You pretty much have to read the introduction to find that out.  Yes, dictionaries do typically have introductions–and, yes, there do exist geeks such as myself that read them.

Which dictionary do I use?  Probably not a shocker: I have many monolingual English dictionaries lying around my place, and there are some electronic ones that I use, as well.  As we’ve seen, dictionaries can fill different kinds of needs; the kinds of classifications that we talked about today can help you pick the one (or several) that is a good match for what you’re trying to do.


Parallel corpora, collocations, and crazy people on the Métro

In which an encounter with a crazy guy on the subway leads to a statistical analysis of French adverbs.

One evening I was riding the metro home when a guy got into the car with some used books to sell.  A man sitting across the aisle from me asked to see them.  He flipped through one of them, then took a pen out of his jacket pocket and began circling words–in this book that the other guy was trying to sell.  Are you going to buy that?, the would-be bookseller asked the guy with the pen.  They exchanged words–the bookseller was not happy about having his books marked up.  The bookseller said something that Mr. Pen apparently thought was obvious or stupid.  Il est fort, lui, he snorted–he’s a sharp one. 

The central meaning of fort/forte is “strong,” but it can also be used adverbially.  You hear it a lot that way, and I’ve been trying to figure out exactly when you can use it in that way–it’s often the case that there are word combinations that are possible in a language, but that don’t sound right.  Rather, there are particular words that are conventionally used in very specific combinations.  Violeta Seretan of the University of Geneva gives some examples of English words that are used to describe the magnitude of various nouns.  The semantics of each of these is the same, but the words that are typically used are quite different.  We talk about big problems, heavy rain…  How about injury?  (Answer below.)  It would certainly be possible to say large problem, but it’s nowhere near as likely, and to a native speaker it sounds odd.  I wanted to be able to demonstrate that this corresponds to some actual statistical tendency, not just my intuitions, so I searched the enTenTen corpus, a collection of almost 20 billion words of written English, looking for big problem and large problem.  Here are the frequencies:

  • big problem: occurs 6 times per million words.
  • large problem: occurs 0.5 times per million words.

Big problem occurs twelve times more often than large problem–the latter is possible, but it’s not really what you would expect to hear from a native speaker.  We call things like big problem “collocations”–combinations of words that occur statistically more often than you would expect by chance.
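
If you want to check the arithmetic, here is a small sketch.  The two per-million figures are the ones reported above; the corpus size is an assumption on my part (I rounded “almost 20 billion” up to 20 billion), so the raw counts that it prints are rough estimates, not numbers read off of the corpus.

```python
BIG_PROBLEM_PER_MILLION = 6.0    # reported above
LARGE_PROBLEM_PER_MILLION = 0.5  # reported above
CORPUS_SIZE = 20_000_000_000     # assumption: "almost 20 billion", rounded up


def frequency_ratio(per_million_a, per_million_b):
    """How many times more frequent a is than b."""
    return per_million_a / per_million_b


def estimated_count(per_million, corpus_size):
    """Rough raw count implied by a per-million frequency."""
    return per_million * corpus_size / 1_000_000


print(frequency_ratio(BIG_PROBLEM_PER_MILLION, LARGE_PROBLEM_PER_MILLION))
print(estimated_count(BIG_PROBLEM_PER_MILLION, CORPUS_SIZE))
print(estimated_count(LARGE_PROBLEM_PER_MILLION, CORPUS_SIZE))
```

Working with per-million frequencies rather than raw counts is what lets you compare numbers across corpora of very different sizes.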

You can find collocation dictionaries for English, and they’re quite useful for second-language learners.  I don’t know of any for French, though, or at least not where to find them in the US, which is where I am at the moment.  (I’ve seen similar things in Canada.)  I additionally want to know how these adverbial uses of fort should be translated into English, so I need a way to figure this kind of thing out for myself.

First step: find a whole lot of French text in some easily searchable form.  I started with the French section of EUROPARL–a collection of documents from the European Parliament, translated to/from a wide variety of languages.  The French section of EUROPARL contains about 59 million words–so, a whole lot–and you can access it through the Sketch Engine web site–so, easily searchable.  A quick search showed me that fort is quite common in that data set:

Screenshot 2016-04-10 13.23.54
Fort shows up 17,130 times in the French section of the EUROPARL corpus–257 times per million words.  That’s pretty frequent.

Once I know that, I know that there will be enough data to calculate the collocations–recall that this is a statistical thing, so you need plenty of data.  The Sketch Engine interface gives me a number of options for how to do the calculations (scroll down to get past the screen shot):

Screenshot 2016-04-10 13.26.44

…which I show you just so that you’ll see that there are a lot of approaches to doing this. I just went with the defaults.

The calculations yielded quite a few possibilities.  Here are some of them:

Screenshot 2016-04-10 13.30.59

If you’re a stickler for data, you might have noticed that the collocations are ordered by the log of the Dice coefficient, which you could think of as a measure of the statistical effect, I guess.  I am really looking for the most common collocations involving fort, though, so I’ll reorder by the cooccurrence count, i.e. the raw count of how often the collocations occurred:

Screenshot 2016-04-10 13.53.36

Crap–that basically tells me nothing.  Why not?  Zipf’s Law.  Remember that Zipf’s Law tells us not only that most words are pretty rare, but also that some words are really, really common, and in French, that certainly includes de (“of”), et (“and”), une (“a”), and the rest of what we’re seeing here.  (Moral of the story: don’t expect the most frequent things in a language to necessarily be the most revealing things in a language.)  If I scroll down a bit, though, I see bien on the list.  683 examples of this–a frequency of 10.25 per million words.  Bien is often an adjective, which would presumably make fort adverbial in these cases, so we’re on to something now.  Let’s check out some of those examples:

Screenshot 2016-04-10 13.58.14.png
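
For the curious, here is a rough sketch of why ordering by raw count and ordering by the log of the Dice coefficient give such different answers.  It uses the logDice formula as I understand it, 14 + log2(2·f(x,y) / (f(x) + f(y))).  The counts for fort (17,130) and for fort with bien (683) come from the searches above; the overall counts for bien and de, and the count for fort with de, are numbers that I made up just to show the effect.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2 of the Dice coefficient."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


F_FORT = 17_130  # from the corpus search above

# collocate -> (co-occurrence count with fort, overall count of the collocate)
CANDIDATES = {
    "bien": (683, 100_000),     # 683 is from the search; 100,000 is hypothetical
    "de": (3_000, 3_000_000),   # both numbers are hypothetical
}

by_count = sorted(CANDIDATES, key=lambda w: CANDIDATES[w][0], reverse=True)
by_logdice = sorted(
    CANDIDATES,
    key=lambda w: log_dice(CANDIDATES[w][0], F_FORT, CANDIDATES[w][1]),
    reverse=True,
)

print("by raw count:", by_count)
print("by logDice:  ", by_logdice)
```

A very frequent word like de co-occurs with everything, so it wins on raw count; the Dice coefficient penalizes it for how often it shows up without fort.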

So, now I have some cases where it would make sense to use fort, but I want to know how they would correspond to English, too.  This requires that I have access to the corresponding English text.  No problem–recall that the EUROPARL corpus is multilingual.  In particular, it is what is known as a parallel corpus, which means that it contains the same contents in multiple languages, not just similar contents (although that kind of corpus can be useful, too).  I searched for the phrase fort bien.  Here’s an example of the output:

Screenshot 2016-04-10 14.12.24

So, now I have some French/English equivalents for fort bien:

  • Étant donné les prévisions de la politique structurelle – que je connais fort bien: “With these forecasts of the structural policy – which I know very well”
  • ce que Jean-Pierre Chevènement a fort bien nommé récemment…: “referred to recently, and very aptly, by Jean-Pierre Chevènement”
  • C’est pourquoi, comme l’a déjà fort bien expliqué M. Kalas: “Hence, as Mr Karas has stated to his credit”
  • je comprends fort bien la préoccupation…: “I have a great deal of sympathy for the unease”
  • Vous savez fort bien que…: “You know very well that”
  • non seulement parce que le président le connaît fort bien…: “…not only because the President is very familiar with it…”
  • Il est fort bien d’organiser des réunions, mais ce sont les résultats qui comptent.: “Meetings are all very well, but it is the result that counts.”
  • ils se tirent fort bien d’affaire.: “…they are managing really rather well.”
  • et je les comprends fort bien.: “…which I fully understand.”
  • Ils les connaissent fort bien et un par un.: “They recognise each and every one of them very well.”
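
What the parallel-corpus search is doing can be sketched in a few lines.  This is only an illustration, not how Sketch Engine is actually implemented: I’m assuming that the corpus is a list of already-aligned (French, English) sentence pairs, and PAIRS and search are names that I made up.  The first two pairs are from the list above; the third contains fort but not fort bien.

```python
import re

# A tiny hand-built "parallel corpus": aligned (French, English) pairs.
PAIRS = [
    ("Vous savez fort bien que", "You know very well that"),
    ("et je les comprends fort bien.", "which I fully understand."),
    ("Nous estimons fort positif que", "We see it as a very positive sign that"),
]


def search(pairs, phrase):
    """Return the aligned pairs whose French side contains the phrase."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [(fr, en) for fr, en in pairs if pattern.search(fr)]


for fr, en in search(PAIRS, "fort bien"):
    print(fr, "=>", en)
```

The hard part of building a real parallel corpus is the alignment itself, i.e. deciding which English sentence goes with which French sentence; the search is the easy part.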

I’m feeling good about how to use fort bien now, but I want to know about other ways that fort could be used with an adjective.  So, I’ll do another search of the parallel corpus (i.e. the matched French and English texts), but this time I’ll just search for fort, and I’ll specify that I want it to be an adverb.  Here are some of the results:

Screenshot 2016-04-10 13.39.56

Now I have some general examples of how to use fort:

  • Nous estimons fort positif que: “We see it as a very positive sign that”
  • Le rapporteur constate également fort justement que: “The rapporteur has also quite rightly stated that”
  • Ce que nous faisons maintenant est probablement fort important…: “What is being done may well be very important”
  • …l’Union européenne a fort justement octroyé: “…the European Union was right to support…”
  • nous entretenons des relations bilatérales fort satisfaisantes avec: “…We have very satisfactory bilateral relations with”

I don’t know every adjective with which it would be OK to use fort, but I know one more than I did when I got out of bed this morning, and I’m cool with that–one less time when I’ll have to use très, which is all that they teach us in school.

A colleague had some observations on this:

On top of being used in collocations, it also marks a style / genre which is somewhat formal or elevated (“soutenu”). This might explain why it remains frequent mostly in collocations and is less frequent (or more marked) in freer combinations. This gives the expression a literary turn or a pretense to a higher register.  Both in speech and in writing, it is “soutenu.”

Another native speaker had this to say about it:

“Fort” is used as a synonym of “très”, before adjectives or adverbs.  You can use it in about any case, it’s just more elegant than “très”, but not really literary.

The Mr. Pen guy on the subway turned out to be pretty crazy, as far as I could tell.  At one point he snapped at my adorable cousin, who happened to be visiting, and I told him to cut it out.  This was followed by an initially amusing conversation between him and me that at some point degenerated into a loud tirade on his part.  I kept telling him that my French wasn’t that good and I couldn’t understand him, but he just kept going and going.  Eventually French people around us began telling him to stop being an asshole and words to that effect, so I assume that it wasn’t very nice, but honestly, I couldn’t tell you.  At some point a large and very drunk French guy got on the subway car, and started seriously getting in Mr. Pen’s face–it was clear that this was going to turn violent.  Mr. Pen was a very diminutive Haitian man, and I wasn’t going to watch him get the shit beaten out of him no matter how bizarre he was being, so I got involved.  The train stopped, Mr. Pen jumped out, and Mr. Drunk Guy launched into an animated discussion with me about American heavy metal, punctuated by snatches of Metallica songs.  All in all, an unusual evening on the metro, but not an unpleasant one by any means–just part of life in The Big City, as we say in English.

Oh: it’s serious injury.



The Great Sardine Can Mystery

My search for a healthier breakfast leads to a three-day investigation of a one-syllable word.

2016-03-16 17.33.33
Picture source: me.

I’ve been struggling to get up the hill on the way to work lately.  I decided that this was due to my proteinless French breakfast of coffee, bread, and Nutella, and stopped at the little store just outside the train station and picked up a can of sardines for after the climb.  Zipf’s Law being what it is, this set off a three-day struggle to figure out how to read the label on the can.  I spend a lot of time in France trying to differentiate and remember the meanings of words that look alike, and this was one of those occasions.  After three days of this, I still don’t have it all figured out.  At its base, this is an issue of various and sundry words that look or sound like forms of the word arrêter.  Read on if you want to feel my pain.

2016-03-16 17.33.08
“Nothing stops you.”  Picture source: me–it’s a water bottle that I got from the cafet’.

arrêter: The basic meaning of this verb is “to stop,” which is simple enough, but there are some subtleties involving the pronominal form of the verb (s’arrêter) and “direct” versus preposition-specific forms of the verb.

arrêter: The verb can also mean “to arrest,” as in taking someone into police custody.  Scroll down–this picture is too good to shrink.

Screenshot 2016-03-27 03.15.40

Screenshot 2016-03-27 03.17.13
“VIDEO. Ukraine: Chewbacca arrested after campaigning for Darth Vader.” Picture source: http://www.lexpress.fr/insolite/video-ukraine-chewbacca-arrete-apres-avoir-fait-campagne-pour-dark-vador_1729418.html.

arrêter de: this is followed by an infinitive, and would translate as “to stop verbing.”

Screenshot 2016-03-25 00.52.09
“The ten rules for stopping smoking.” Picture source: screen shot of http://www.stop-tabac.ch/fr/10-regles-pour-arreter-1.
Screenshot 2016-03-25 00.54.53
“I’m going to stop judging myself.” Picture source: screen shot of https://www.facebook.com/pages/Jarr%C3%AAte-de-me-juger/634289229988133.

l’arrêt: a stop, as in a bus stop, a work stoppage, etc.  Also: a decision, as of a court.

Medical-care-specific: The verb arrêter can have a very specific meaning, which is to put someone on sick-leave.  The subject of the verb presumably has to be someone who is capable of putting you on sick-leave.  (Linguistics geekery: this kind of behavior, where the meaning of a verb can change substantially depending on the subject and/or object of the verb, is probably best accounted for by the Generative Lexicon theory, pioneered by James Pustejovsky of Brandeis, and more recently elaborated by Elisabetta Jezek of the University of Pavia.)

Screenshot 2016-03-25 00.59.40
Mon médecin m’a arrêté: “my doctor put me on sick leave.” Picture source: screen shot of http://www.viedemerde.fr/travail/7503604, the #shitlife web site.  (Sorry–that’s what vie de merde means.)

However, just because “doctor” is the subject doesn’t mean that the verb necessarily has that meaning:

Screenshot 2016-03-25 01.02.41
“It’s because my doctor took me off of The Pill.” Picture source: screen shot of a page from the book Lettre ouverte d’un médecin à une société malade, by Alain Bellaiche, taken from Google Books.

Here, the m’ is an indirect object pronoun, and it’s la pilule (“The Pill”) that is the direct object.  (That bit of extra information probably doesn’t help much–sorry!)

l’arrêté (n.m.): this is a noun meaning “order” or “decree.”  It shows up quite a bit in official communications of various and sundry sorts–see below.

Screenshot from 2016-03-25 11:34:02
“The decree concerning fighting termites.” Picture source: screen shot from http://www.ville-guyancourt.fr/Cadre-de-vie/Urbanisme/L-arrete-de-lutte-contre-les-termites.

l’arête (n.f.): another noun, meaning (among other things) “fishbone.”  This is the one that finally drove me over the edge to look up all of these various and sundry meanings.  I would’ve gotten this one a lot quicker, but it took me, like, three days to realize that there was only one R.

2016-03-16 17.33.33
“Sardines in extra-virgin olive oil–boneless.” Picture source: me.

  • un arrêt: a judgement or decision, in a legal context.  Un arrêt de la cour: a court ruling.
  • un chien d’arrêt: a pointing breed of dog.
  • être à l’arrêt: to be on point (of a dog).
  • Le chien est à l’arrêt: the dog is on point.

(Thanks to a native speaker for these.)

Even after this in-depth investigation, I don’t understand all of the various and sundry permutations of these words with their as, their rs, their ês, and their ts.  Here’s an example that I came across–I have no clue whatsoever what it means.  (Various native speakers have suggested that it’s an error–see the Comments section.)  PS: in the end, sardines aren’t that great of a solution to the whole I-need-more-protein problem–they smell, and I do have office mates.  Time to ramp up my already-enormous cheese consumption yet again, perhaps…

Screenshot 2016-03-25 00.47.33
Picture source: screen shot of the lemonde.fr web site.


I guess it’s not so secret any more: why linguists study children’s language games

Why would a linguist study children’s language games? It turns out that they can tell us a lot.

Screenshot 2016-04-08 13.19.56
Picture source: screen shot from the LINGUIST mailing list.

I subscribe to a mailing list that gets me news about current events in linguistics–upcoming conferences, tables of contents of new journal issues, fellowship opportunities–and notices about newly published books.  Often I look at some of this stuff and wonder: what the hell must non-linguists think when they see something like this?  Today’s email brought me an excellent example of the phenomenon: the book notice that you see above.  How could a Berber equivalent of the Pig Latin that most of us learned as kids (unless you’re French, in which case maybe you learnt Louchebem) possibly be worth a book-length treatment?  Actually, secret languages, also known as language games (described as game-like variants on some actual language, typically used by kids to mystify the uninitiated), can be quite interesting, from a linguistic point of view.  For example: in teaching introductory linguistics, many of my fellow grad students would use an example from Pig Latin to illustrate a non-intuitive fact about the English sound system.  In English, the sound that we spell ch is actually a combination of two sounds–t, as in tip, and sh, as in ship.  Say t-ship with the t and the sh immediately next to each other–tship–and you’ll find that it comes out as chip.  If you survey a large class of Ohio State undergraduates, you’ll find some for whom the Pig Latin word for chip is ipchay.  For others, it’s shiptay.  What does that tell you?  For the Buckeyes (Ohio natives) with the shiptay form of chip, it’s pretty clear that ch is, on some level, represented mentally as the sequence of two sounds that it actually is.
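
Here is a toy version of the argument.  It assumes a simplified Pig Latin rule (move the first sound to the end and add ay; plenty of people move the whole initial consonant cluster instead), and the way that I’ve chopped chip into sounds is hand-coded, not computed.  The function name pig_latin is my own.

```python
def pig_latin(sounds):
    """Move the first sound to the end of the word, then add 'ay'.

    `sounds` is a list of sound segments, written here in ordinary
    spelling, e.g. ["ch", "i", "p"].
    """
    first, rest = sounds[0], sounds[1:]
    return "".join(rest) + first + "ay"


# Speaker 1: ch is a single unit.
print(pig_latin(["ch", "i", "p"]))       # ipchay

# Speaker 2: ch is stored as t followed by sh.
print(pig_latin(["t", "sh", "i", "p"]))  # shiptay
```

Same rule, different answers: the only thing that changes between the two speakers is how the word is divided into sounds, and that is the point of the example.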

Googling around a bit for examples of the use of secret languages in linguistic research, I came across this paper from Ruth Day at the famous Haskins Labs.  Day developed a simple secret language (take English words and substitute an r for every l and an l for every r) and taught it to subjects.  She also put them through what are known in the psychology literature as dichotic fusion tests.
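
The secret language itself is simple enough to sketch in a couple of lines.  One caveat: this version swaps letters, which is only an approximation, since a language game like this is about sounds rather than spelling.  The names SWAP and to_secret are mine, not Day’s.

```python
# Translation table: every r becomes l and every l becomes r.
SWAP = str.maketrans("rlRL", "lrLR")


def to_secret(text):
    """Swap r and l throughout the text (spelling-based approximation)."""
    return text.translate(SWAP)


print(to_secret("really"))             # learry
print(to_secret(to_secret("really")))  # really
```

Applying the swap twice gets you back where you started, so the same rule works for both “encoding” and “decoding.”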

normal bimodal distribution
Normal distribution (upper left): results cluster around a single typical value, plus or minus a bit.  Bimodal distribution (lower right): results fall into one of two groups, plus or minus a bit.  Picture source: click here.

Dichotic fusion tests assess how people process sounds.  They have an unusual property.  Most tests of sound processing have what’s called a normal distribution.  This means that there’s some typical result, and the results mostly cluster around that value.  In contrast, dichotic fusion tests are bimodally distributed–rather than everyone clustering around some typical value, people fall into one of two categories.  Day found that some of the subjects were good at learning the secret language, and some of them weren’t.  She also found a relationship between how people behave on dichotic fusion tests and how adept they are at learning the secret language: people who were good at learning the secret language mostly fell into one group on the dichotic fusion test, and people who were bad at learning the secret language mostly fell into the other group on the dichotic fusion test.  She speculates that this might be related to individual differences in how “bound” different speakers are by the nature of what the pioneering structural linguist Ferdinand de Saussure called langue and parole–two very different ways of categorizing language.  It’s not immediately obvious that these two different categories exist, and you could interpret Day’s experimental findings as being consistent with the hypothesis that they do.  (And, yes–linguistics as we know it today was invented by a French-speaking Swiss guy.  Even the English-language technical vocabulary of linguistics has kept Saussure’s original French terms, langue and parole.)

Language games are sometimes presented as a form of evidence regarding speakers’ models of syllable structure.  You didn’t know that you have a model of syllable structure?  That’s the nature of knowledge about language–it’s mostly not conscious, and, as we say, “not accessible to introspection”–meaning, even if you think about the rules of language and try to figure them out, you mostly can’t.  (If you’re an English speaker: can you explain when to use the and when to use a?  Probably not, but you certainly know, on some unconscious level, how to do so, and you certainly recognize when someone who doesn’t natively speak a language that has an equivalent of the and a messes them up in English.)  The shiptay speakers were surprised to have this pointed out.  It wasn’t something that they were consciously aware of, but on some level, they seemed to “think” of ch as a sequence of t and sh.

The French connection: there’s a form of slang in France called Verlan.  It’s not clear whether Verlan should be considered a secret language/language game as such, versus a form of slang, but even if it should be considered a slang, it is clear that its words are formed by a language game.  Phildange explained a bit about how it works in his comments on a recent post.  From a cross-linguistic perspective, it’s quite unusual.  If you observe secret languages from around the world, they tend to work on the basis of one or two of four different kinds of phonological processes (phenomena involving doing something to sounds):

  • insertion of sounds
  • rearrangement of sounds
  • substitution of sounds
  • deletion of sounds

From the perspective of this kind of classification, France’s Verlan is unusual in that it combines a multitude of different kinds of phonological processes.  For lots of details, see this set of lecture notes from Stuart Davis of Indiana University.
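
To make the four kinds of processes a bit more concrete, here is a toy sketch of each.  The rules are simplified, and they work on spelling rather than on sounds: the insertion rule is loosely modeled on the English game Ubbi Dubbi, the rearrangement rule on the simplest kind of Verlan word (with the syllables divided by hand), and the substitution and deletion rules are invented purely for illustration.  All of the function names are mine.

```python
VOWELS = set("aeiou")


def insert_before_vowels(word, infix="ub"):
    """Insertion: put an extra chunk before every vowel letter."""
    return "".join(infix + ch if ch in VOWELS else ch for ch in word)


def reverse_syllables(syllables):
    """Rearrangement: say the syllables in the opposite order."""
    return "".join(reversed(syllables))


def substitute(word, table):
    """Substitution: replace some letters with others."""
    return word.translate(str.maketrans(table))


def delete_first(word):
    """Deletion: drop the first letter."""
    return word[1:]


print(insert_before_vowels("cat"))       # cubat
print(reverse_syllables(["mé", "tro"]))  # tromé
print(substitute("cat", {"a": "o"}))     # cot
print(delete_first("chip"))              # hip
```

A game that uses one or two of these is the typical case; chaining several of them together on the same word is what makes for the more unusual systems.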

I hope that no Berbers were planning on using the waw/ra? secret language to pass messages around linguists in the future, as I guess it’s not so secret any more.  Are you thinking that the book about it would make a great Christmas present for someone?  You can pick up a copy here.
