Too many Killians, or the weirdest relative pronouns on Twitter

In which a relative pronoun about which I know nothing turns out to show up even in the most illiterate tweets imaginable.

Yesterday we met the various forms of the relative pronoun lequel:

          Masculine   Feminine
Singular  lequel      laquelle
Plural    lesquels    lesquelles

 We saw how when it’s the object of the preposition à, we get various derived forms:

          Masculine   Feminine
Singular  auquel      à laquelle
Plural    auxquels    auxquelles

There’s a similar set for the situation where it is the object of the preposition de.  Here’s the paradigm:

          Masculine   Feminine
Singular  duquel      de laquelle
Plural    desquels    desquelles

You don’t think anyone ever actually uses these?  Neither did I–Laura Lawless says that these are the hardest pronouns in French for English speakers, and I believe it.  I didn’t think that French people used them much, either.  Then I did a Twitter search.  Holy cow–they’re everywhere!  How did I manage not to run into these before?

Screenshot 2015-12-30 23.44.06
“There are too many Killians in this world–damn, I never know which one we’re talking about.”
Screenshot 2015-12-30 23.46.22
“A passenger on the Air France flight on board which a suspicious object was discovered has been remanded to custody.”
Screenshot 2015-12-30 23.49.49
“Yeah, it’s cool to find a person who you don’t get tired of, but if she gets tired of you, it’s hard.”  (se lasser de: to get tired of, to get bored with)
Screenshot 2015-12-30 23.52.23
“The message of John Lennon: no religion in the name of which you are ready to kill or ready to die.”
Screenshot 2015-12-30 23.54.50
“Everything depends on which ones we’re talking about.”
Screenshot 2015-12-31 00.02.17
“Liberty, equality, education…Themes about which to think.”
Screenshot 2015-12-31 00.09.51
“Grand values in the name of which we are at war against Daesh, if I remember correctly.”
Screenshot 2015-12-31 00.11.40
“Pretending to not recognize the toxic people from whom I have drawn away when I run across them in the street. #Resolution2016”

Take this relative pronoun with a grain of salt

Just because you have a sweet voice doesn’t mean that you won’t use a relative pronoun here and there. Or poison someone.

I’m a sucker for female singer-songwriters with a certain kind of voice.  Ingrid Michaelson has it.  Yael Naïm, too.  Regina Spektor has it, sometimes.  Carole King has it sometimes now that she’s older, but didn’t when she was younger.

Recently I’ve found a French-Canadian woman who has it in spades.  Ingrid St-Pierre is a woman from Quebec who studied psychology, then went on to become a singer.  Why she didn’t win when she competed on the Canadian equivalent of “The Voice,” I’ll never know; she has had a successful recording career, just having released her third album; gets daily airtime on Canadian radio; and is touring like crazy.

Her songs are generally serious, but she has one very funny one, Pâtes au basilic (“Basil pasta”), that is really quite cute.  See the video above–it’s about a psycho-killer ex-girlfriend.  The Zipf’s Law connection for the day: the first verse of the song uses a type of word that I’m not very comfortable with.  It can function as a relativizer, i.e. it can mark a relative clause; it can also be used to ask questions.  (Hence the fact that it is commonly called an interrogative something-or-other.)  In this case, it’s a marker of a relative object, i.e. it’s standing in for a relativized thing, and the thing that is modified by the relative clause is the object, rather than the subject, of the verb.

Enough grammatical terminology–let’s turn to Ingrid’s song.

Mon amour, je t’ai préparé des pâtes au basilic / My love, I’ve made you basil pasta
j’ai pris soin d’y mélanger les trucs auxquels t’es allergique / I’ve taken care to mix in all of the things that you’re allergic to
faut surtout pas t’inquiéter pour l’arrière goût qui pique / It’s especially important that you not worry about the aftertaste that stings
j’espère que j’ai bien dosé les gouttes d’arsenic / I hope that I’ve measured out the drops of arsenic well

(Translation by me, so take it with a grain of salt.)  There are sooo many things that we could talk about just in this one verse, but let’s stick with the relative.

The word auxquels is related to the word lequel and its different gender-related and number-related forms.  For starters, let’s review those:

          Masculine   Feminine
Singular  lequel      laquelle
Plural    lesquels    lesquelles

 We typically see those forms when the word is being used as an interrogative (Je veux un de ces livres.  Lequel?  “I want one of these books.  Which one?”)  or as a subject relative.  However, even with object relatives, we can see this form in the feminine singular: J’aime la femme à laquelle je pense en ce moment “I love the woman about whom I’m thinking at this moment.”  When the word is being used to relativize an object, we’ll typically have a preposition in front of it, and if that preposition is à, then that undergoes the typical à + le = au and à + les = aux mergers.  So, that’s how we get auxquels in the song.  We have lesquels for trucs “things,” and we’re talking about the person being allergic “to” those things.  À plus les gives us aux, so we have les trucs auxquels t’es allergique–“The things to which you’re allergic.”  For the sake of completeness, here’s the full paradigm:

          Masculine   Feminine
Singular  auquel      à laquelle
Plural    auxquels    auxquelles

There’s a similar set for the situation where the preposition is de, but let’s do those another time.  In the meantime, enjoy the Ingrid St-Pierre video.

It’s just a jump to the left and then a step to the right of Notre Dame

Charlemagne (allegedly) said that “To have another language is to possess a second soul.” Countless people have expressed similar sentiments.  Fellini: “A different language is a different vision of life.”  Delacroix: “The individual’s whole experience is built upon the plan of his language.”

I’ve always thought that this kind of thing was bullshit.  However, on Christmas night, my (limited) French skills did, indeed, show me a new world.

At the Studio Galande, a tiny movie theater on a little back street in the Latin Quarter–almost across the river from Notre Dame (42 rue Galande, 75005)–you can see the Rocky Horror Picture Show on Friday and Saturday nights.  The movie has French subtitles (crappy translation, actually), and–more importantly–a highly rehearsed group of people who act out the movie, singing along with it and offering non-stop commentary, almost entirely in French.

Of course, there is the usual throwing of rice, although the water pistols of my teenaged years and their gentle sprinkle have been replaced by 2-liter bottles of water and some serious soaking.  When I was a teenager, going to the Rocky Horror Picture Show meant knowing a few things to shout in unison with the rest of the audience at appropriate times–today, that’s basically a semi-professional production, and that’s what the highly rehearsed group of people do.

In Paris, this was a far more surreal experience than it is in the US, and it’s pretty surreal in the US.  I’ll try to paint the picture.  Since it’s me that’s doing the painting, it’s mostly a picture of language.  Here’s what you have: (a) the movie, in English.  (b) The subtitles, in French.  (c) The performers (I’m not actually sure what to call them) doing their accompaniment in French.  (d)  The performers also doing their accompaniment in heavily accented English.

It’s an incredibly rich linguistic hodgepodge, and it all comes at you in this confused, non-stop torrent.  It was far too much for me to be able to retain very many data points for you, but here’s a nice example of a way that the humor combined English and French.  There’s a point in the film that shows a map featuring a city called Denton.  So, you’ve got “Denton” splashed prominently across the screen.  Now, if “Denton” were French, it would be pronounced like dans ton–“in your.”  Right at that point in the movie, one of the performers holds up a sign with the word cul right next to the name “Denton.”  Cul means “ass,” so now you have dans ton cul–“in your ass.”

If I may be permitted a bit of hubris: the thing that I was proudest of came during the Time Warp.  If you haven’t seen the Rocky Horror Picture Show: there’s a big dance number featuring–yes–the Time Warp.  (It’s just a jump to the left, and then a step to the right.)  At various points in this, it would not be inappropriate to yell “ooh-ah.”  So, at one of those points, one of the performers says Et maintenant, on va dire ooh-ah en verlan: ah-ooh.  What that means: “We’re going to say ‘ooh-ah’ in Verlan now: ‘ah-ooh.’”  Why I was so proud of catching that–it requires some cultural knowledge, which is: what is Verlan?  Verlan is the name of a kind of French slang that originated in the banlieues défavorisées (bad neighborhoods, mostly surrounding Paris) and has since become pretty broadly popular.  Words are formed in Verlan by reversing their syllables–verlan is itself l’envers, “backwards,” reversed.  So: what would ooh-ah be in Verlan?  It would be ah-ooh.

So, that’s my bit of hubris.  Hubris is always followed by a fall, and no doubt my hubris will be followed by me not understanding however it is that someone will say “good morning” to me tomorrow.  As Brad says in the French subtitles to his song in the floor show scene: ça me dépasse–“It’s beyond me.”

An evening at the Rocky Horror Picture Show would be an unforgettable and unique Parisian experience, and I recommend it highly.  I’ll leave you with two pieces of advice on the subject.  (a)  If you haven’t seen the movie before, the whole evening will mostly be lost on you, but if you have, this is an experience that you are unlikely to forget; (b) don’t, don’t, don’t sit in the front row (rang 1).  Trust me on this.

  • le travesti : transvestite.  At one point in the movie, you are to shout Garçon, garçon, il y a un travesti dans ma soupe–waiter, waiter, there’s a transvestite in my soup.  In the context of the film, it makes perfect sense.
  • travesti (adj.): disguised; in fancy dress; in costume.
  • Ça me dépasse: it’s beyond me.

The worst thing about being Jewish

It’s not what you’re thinking.

I’ve been exploring the back streets of my neighborhood.  This being France, that means a lot of houses of worship (for a very interesting reason, in this very secular country–another time, perhaps), and this being Paris, they are quite diverse.  When you think France, you think Gothic, but the other day, I visited an Art Deco church down the street from my house.  (Turns out Art Deco comes from France–who knew?  The name is short for arts décoratifs.)  On Saturday morning, I went to services at a shul (synagogue) a few blocks from my apartment.  I don’t have a religious bone in my body, but I like to sit in a synagogue every once in a while and be surrounded by the murmur of one of the languages of my childhood, and I find it interesting to see the varieties of Judaism in different countries.

France has tremendous importance in the intellectual history of Judaism.  This is due to the work of the medieval scholar Rashi.  He lived from 1040-1105 in Troyes, in the Champagne region of France, and is thought to have made his living as a vintner (wine maker).

Rashi’s importance comes from the set of commentaries that he wrote on the Bible and the Talmud, a 62-book set of volumes that is one of the central texts of Judaism.  His commentaries are clear and insightful.  They always serve to clarify, but also often form the basis of complex interpretations by later scholars.  In a traditional Jewish context, it would be strange to study either the Bible or the Talmud without consulting Rashi’s commentary–if you have a traditional religious education, you start with him as soon as you start studying the Bible in grade school, and continue with him until you die.  Rashi’s commentaries sometimes incorporate translations into Old French from a time when that language was not written very often, so they are one of the sources that scholars of Old French have for the pronunciation and lexicon of that language.  As an American kid, you just kinda gloss over those, but from my perspective as an adult and a linguist, I realize how precious they are.

So, yes–I went to services the other day.  The congregation that I went to is associated with what’s called the Conservative movement, which, despite the name, is the progressive movement in Judaism.  The people were about the same as what you would find in a typical Conservative shul in the United States–mostly white, a couple blacks; the main difference was that in the US, there would have been some Chinese girls, while here in France, there were quite a few North Africans.

A common courtesy in a shul is to offer a visitor an aliya–the honor of saying a blessing during the reading of the Torah (bible), and I was offered one.  More on this later.

A prominent feature of the Conservative movement is a strong commitment to gender equality, and the Torah readers included a little old French lady who was so tiny that she had to be helped up on a stool to be able to reach the upper part of the scroll.  She was helped up politely, and then her pronunciation while reading was corrected just as diligently as that of the men.  (Torah scrolls are written without vowels, and two people on either side of the lectern follow along during the reading in regular books that do have vowels in order to make sure that there are no pronunciation errors.)

So, back to the worst thing about being Jewish: the worst thing about being Jewish is that sometimes you have to stand up and sing in public.  No karaoke, no musical instruments accompanying you–it’s just you and your voice, and I truly can’t sing.  Assuming no genocides, the Jewish male has three trials in his life.  Number one: eight days after birth, he gets circumcised.  Number three: at his wedding, he has to stomp on, and break, a glass, with everyone watching and without slicing a tendon.  Number two: in the middle, at the age of 13 (12 for girls in those communities in which girls do this), he has his bar mitzvah, which means that he has to stand up in front of everyone and chant and sing.  (Totally different tonal systems, and you have to do both.)

Now, one of the problems with visiting congregations other than your own is this: the tunes are different.  I sound different from most Jews to begin with, since when I read Hebrew I have a Yiddish accent, while most Jews today read Hebrew with a Middle Eastern accent, and that is definitely the case in France, where there is a very heavy North African presence in the Jewish community.  (In turn, about 10% of Israelis speak French, again due to the large number of North Africans there.  I think it’s the third-most-studied foreign language in Israeli schools, after English and Arabic.  See Jean-Benoît Nadeau and Julie Barlow’s The story of French for the numbers.)  You can fake the tune OK in a different congregation if you’re lost in the crowd and you know the words already, but if you have to stand up and sing then you stand out, and that was exactly what I had to do when I said the blessing during the bible reading–stand up in front of the congregation and sing alone.  So, what to do about the different tunes?  I couldn’t possibly fake the French one, so I knew that I had to do the American one.  And, in case I haven’t been clear about this: I can’t sing a note, and I hate to do it in public.  Happily, my long and pathetic history of losing judo matches has made me comfortable with looking like an idiot in front of crowds, so I went up to the lectern, gave my name and my father’s name when asked (easy in French), and got ready to do my thing.  I fixed my eyes firmly on the siddur (prayer book) so that I wouldn’t see all the pretty French girls and get any more nervous than I already was, and belted it out–in my heavily Yiddish-accented Hebrew, and with the American tune.

“You’re American?”, asked the guy doing the reading–in English.  Bien sûr, I answered.  “I could hear it,” he said.  Oh, well.

Zipf’s Law comes up in a house of worship as much as it does anywhere else.  Consequently, here are some of the words that I had to look up later:

  • l’office (masculine noun): religious service.
  • Moïse: Moses.
  • le cercueil: coffin, casket.  (Pronunciation: [sɛʀkœj].)  The sermon was a long analysis of the significance of the close conjunction in the Torah reading of two occurrences of the Hebrew word aron with its two different meanings: ark, and coffin.  Oddly, the French word also refers to some kind of beverage made by just dumping a bunch of different kinds of booze together.

Some days the bear eats you, some days you eat the bear

Some days the bear eats you, some days you eat the bear: bear-related vocabulary in French.

Some days the bear will eat you,

Some days you eat the bear.

–Joan Armatrading, Eating the Bear

Les deux oursons, “The two bear cubs,” a tux/party dress rental store near the apartment that I rent when I’m in Paris. Picture source: me.

There’s a store in my little neighborhood whose sign has always puzzled me somewhat.  In order to understand it, I needed to learn bear-related vocabulary in French:


  • l’ours (nom masculin): male bear; boor, curmudgeon.
  • l’ourse (nom féminin): female bear.  Pronounced [urs], not [urz].
  • l’ourson (nom masculin): bear cub; teddy bear.
  • le nounours: teddy bear.  Pronounced [nunurs]–the s is not silent, as you would expect it to be.

L’ourson (teddy bear) is not to be confused with:

  • l’oursin (nom masculin): sea urchin.  Yes, they are eaten in France, not just in Japan.

Some of the words meaning “bear” are used in idiomatic expressions:

  • un vrai nounours: “a real teddy bear.”  The French Etc. web site explains the expression like this: “un vrai nounours means ‘a real gem’…un vrai nounours is used to describe a person who is really sweet, going along with everything easily.”
  • un vrai ours: “a real bear.”  French Etc. explains it like this: “‘a real boor’, a person who isn’t sociable.”

“Bear” has some interesting slang meanings in English.  Here are a couple:

  • A difficult situation or thing.  “I have to finish a project proposal by New Year’s Eve–it’s a bear, because I have to find a way to smoothly integrate high-throughput assay analysis and theoretical linguistics.”
  • A gay man who is big, bearded, and hairy.

Mostly failures, but the occasional great satisfaction

In learning a language, sometimes it’s the smallest triumphs that feel the biggest.

Sometimes it’s the smallest triumphs that feel the biggest. One of the disorienting things about being in a foreign country is not understanding any of the little conversations going on around you. Yesterday, however, while hitting the ATM in my little Parisian neighborhood, I followed a conversation between two bums squatting next to me on the sidewalk. One of them was absolutely shit-faced drunk, and was obsessing out loud, endlessly and in French, about what day of the week New Year’s Day would fall on. Was it going to be this week, or next week? (As I said, this guy was really drunk.) Would it be Monday, or Tuesday? Finally he turned to me: what day of the week will New Year’s Day be: Monday, or Tuesday?  I didn’t have the heart to tell him that it would be a Friday, so I bought him a baguette and went on my way. And so goes my battle to learn to speak French–little trials, little triumphs; mostly failures, but the occasional great satisfaction.

Rentrez: More text mining from the biomedical literature with the R programming language

An introduction to the rentrez package for text mining from PubMed/MEDLINE. Natural language processing is not a natural task for the R programming language, but it can do a lot to help.

My work involves mining information from text, mostly from scientific journal articles in the PubMed/MEDLINE collection of documents.  PubMed/MEDLINE contains references to about 23 million articles in the domain of biomedical science, broadly construed.  R is a programming language that is becoming very popular, and I love it.  It’s not necessarily the first language that I would think of for text processing–nowhere near, in fact–but there are some “libraries” or “packages” that give it some nice abilities in that area.  In a previous post, we had a little tutorial on the pubmed.mineR package.  In this one, we’ll look at a package called rentrez.  For more information about rentrez, see the documentation here.

The purposes of the two packages are quite different.  The pubmed.mineR package helps you process documents once you’ve gotten them–rentrez helps you get them in the first place.  rentrez is intended for querying National Library of Medicine databases in general.  One of those databases is PubMed/MEDLINE, and we’ll concentrate on that functionality here.

The most basic rentrez function for our purposes is the one that lets us search an NCBI database.  This function is entrez_search().  At a minimum, it takes two arguments: the name of the database that you want to search, and a set of search terms.  The name of the database is passed as the argument db, and the set of terms is passed as the argument term.  Let’s try it.  Note that we can print the variable to which we assigned the results of the search, and it will give us useful information:

Screenshot 2015-12-06 12.12.53
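In case the screenshot doesn't come through, here is a minimal sketch of what such a call looks like. The query string is a placeholder chosen for illustration, not necessarily the one in the screenshot. The live call needs the rentrez package and a network connection, so it is wrapped in a check for an interactive session.

```r
# The two required arguments: which database to search, and what to search for.
# "text mining" is a placeholder query chosen for illustration.
db_name <- "pubmed"
query   <- "text mining"

# The live call needs the rentrez package and a network connection,
# so it only runs in an interactive session.
if (interactive()) {
  res <- rentrez::entrez_search(db = db_name, term = query)
  print(res)  # prints a short summary of the search
}
```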

I’d like to have some indication of whether or not I can broadly have faith in the results, so let’s try an easy form of metamorphic testing–we’ll change something for which we can predict in a general way (a) whether or not there should be a change in the results, and (b) the trend of the results.  I’m going to try a query that (a) should give me a different set of results, and (b) has the property that I expect it to give me a smaller set of results, specifically.

Screenshot 2015-12-06 12.16.40

Indeed, the result set is (a) different, and (b) smaller, so I can move forward with some assurance that something sensible is going on behind the scenes.
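The check itself can be sketched as a small function. This is an illustration under assumptions: the function name narrowing_looks_sane and both query strings are made up for the example, and the only thing it relies on is that a search result carries a count and a vector of ids.

```r
# Metamorphic check: a narrower query should give a different, smaller result set.
# `broad` and `narrow` are lists with `count` and `ids` elements,
# like the objects that entrez_search() returns.
narrowing_looks_sane <- function(broad, narrow) {
  different <- !setequal(broad$ids, narrow$ids)
  smaller   <- as.numeric(narrow$count) < as.numeric(broad$count)
  different && smaller
}

# Live comparison (placeholder queries; needs rentrez and a network connection).
if (interactive()) {
  broad  <- rentrez::entrez_search(db = "pubmed", term = "text mining")
  narrow <- rentrez::entrez_search(db = "pubmed", term = "text mining AND cancer")
  print(narrowing_looks_sane(broad, narrow))
}
```

Adding a term with AND can only restrict the set of matching documents, which is what makes the direction of the change predictable in advance.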

All of this has been blazingly fast so far.  However, rentrez is apparently only returning us 20 of those thousands of PMIDs.  The retmax argument will let us change that–as the name suggests, it defines the maximum number of results that you want to get back.

Screenshot 2015-12-06 12.23.10.png
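As a rough sketch of the same idea in code: retmax is passed straight to entrez_search(), and a one-line helper can tell you whether the search matched more records than it handed back. The helper name is_truncated and the query are illustrative assumptions.

```r
# TRUE when the search matched more records than the number of IDs returned.
is_truncated <- function(res) length(res$ids) < as.numeric(res$count)

# Live comparison (placeholder query; needs rentrez and a network connection).
if (interactive()) {
  res_default <- rentrez::entrez_search(db = "pubmed", term = "text mining")
  res_100     <- rentrez::entrez_search(db = "pubmed", term = "text mining",
                                        retmax = 100)
  # Number of IDs actually returned by each call
  print(c(length(res_default$ids), length(res_100$ids)))
  print(is_truncated(res_100))
}
```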

Still blazingly fast.  However, when we try to turn the retmax up to a realistic value, things don’t go so well:

Screenshot 2015-12-06 12.24.32

Let’s see what an acceptable value for retmax is (or, more accurately, the smallest unacceptable value).  Here’s a little experiment:

Screenshot 2015-12-06 12.30.59
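An experiment along these lines might look like the following sketch. It is not the code from the screenshot: the function name probe_retmax, the list of values, and the query are assumptions, and it treats an R error as the sign of failure, which may not be how the problem actually shows up. The search function is passed in as an argument so that the loop can be tried out without a network connection.

```r
# Try a series of retmax values and record whether each search completes.
# `search_fn` is any function with the same arguments as entrez_search().
probe_retmax <- function(values, term, search_fn) {
  ok <- vapply(values, function(n) {
    res <- tryCatch(search_fn(db = "pubmed", term = term, retmax = n),
                    error = function(e) NULL)
    !is.null(res)  # NULL means the call raised an error
  }, logical(1))
  data.frame(retmax = values, ok = ok)
}

# Live run (needs rentrez and a network connection).
if (interactive()) {
  print(probe_retmax(c(1000, 10000, 50000, 99999, 100000),
                     "text mining", rentrez::entrez_search))
}
```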

Here are the results:

Screenshot 2015-12-06 12.32.35.png

It looks fine!  What the heck?  I have no clue.  [See below for a comment from one of the rentrez creators about this.]
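One thing worth looking at, given the suspicion in the comment at the end of this post, is how R renders a large number as text. The sketch below only shows R's own formatting behavior; whether this is what goes wrong inside the API call is not confirmed here. The helper name retmax_as_text is made up for illustration.

```r
# Render a number as plain digits, never in scientific notation.
retmax_as_text <- function(n) format(n, scientific = FALSE, trim = TRUE)

print(as.character(100000))    # a double: R may choose scientific notation
print(as.character(100000L))   # an integer: plain digits
print(retmax_as_text(100000))  # plain digits, whatever the type
```

If the number formatting does turn out to matter, passing retmax as an integer literal (100000L) is one thing to try.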

Let’s look at the results object in a bit more detail.  It’s a type of R data structure called a list:

Screenshot 2015-12-06 12.40.29.png

We can get the keys of the list with the summary() command:

Screenshot 2015-12-06 12.41.39.png

Here’s what those actually are:

  • $ids: a vector of the PubMed identifiers (PMIDs) that were actually returned.
  • $count: how many records match the search.  Note that this has no necessary connection with the size of $ids.  While $count tells you how many PMIDs could be gotten in response to this query, the length of $ids tells you how many have actually been returned.  If you don’t change the value for $retmax (see immediately below), $ids won’t hold any more than 20.
  • $retmax: what the retmax argument (see above–it controls how many PMIDs you get back) was set to.
  • $QueryTranslation: the query with any modifications that PubMed/MEDLINE might have made to it, such as adding synonyms or MeSH terms.
  • $file: from the documentation: “either an XMLInternalDocument xml file resulting from search, parsed with xmlTreeParse or, if retmode was set to json a list resulting from the returned JSON file being parsed with fromJSON.”
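To make the relationship between those fields concrete, here is a small sketch that summarizes a result in one sentence. The function name describe_result and the query are illustrative assumptions.

```r
# Summarize how many IDs came back versus how many records matched.
describe_result <- function(res) {
  sprintf("%d of %d matching records returned (retmax = %d)",
          length(res$ids), as.integer(res$count), as.integer(res$retmax))
}

# Live example (placeholder query; needs rentrez and a network connection).
if (interactive()) {
  res <- rentrez::entrez_search(db = "pubmed", term = "text mining", retmax = 50)
  print(names(res))            # the keys of the list
  print(describe_result(res))
}
```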

Don’t underestimate how much work rentrez saved us in identifying and retrieving those documents.  I’d like to save you some work myself, so here’s a little script that I wrote to read in queries from a file and write the resulting PubMed IDs to their own files.  (If you use it for a publication, please cite this blog post.)  You’ll want a text file containing the queries, one per line, with the header QUERIES:

Screenshot 2015-12-23 20.39.05

We set the whole thing up with a couple of function definitions:

Screenshot 2015-12-24 12.06.13

Screenshot 2015-12-24 12.07.16

Now do the work:

Screenshot 2015-12-24 12.09.01
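In case the screenshots don't display, here is a minimal sketch of a script along these lines. It is a reconstruction under assumptions, not the original script: the function names, the file-naming scheme, the default retmax, and the file and directory names in the usage section are all illustrative. The search function is passed in as an argument, which makes it possible to try the file handling without touching the network.

```r
# Turn a query into a safe file name,
# e.g. "breast cancer[mesh]" becomes "breast_cancer_mesh_.txt".
query_to_filename <- function(query) {
  paste0(gsub("[^A-Za-z0-9]+", "_", query), ".txt")
}

# Run one search and write the returned IDs to a file, one per line.
write_ids_for_query <- function(query, out_dir, search_fn, retmax = 1000L) {
  res <- search_fn(db = "pubmed", term = query, retmax = retmax)
  out_path <- file.path(out_dir, query_to_filename(query))
  writeLines(as.character(res$ids), out_path)
  out_path
}

# Read the queries file (header QUERIES, then one query per line)
# and process each query in turn. Returns the paths that were written.
run_queries <- function(queries_file, out_dir, search_fn) {
  lines   <- readLines(queries_file)
  queries <- trimws(lines[-1])          # drop the QUERIES header
  queries <- queries[nzchar(queries)]   # skip blank lines
  vapply(queries, write_ids_for_query, character(1),
         out_dir = out_dir, search_fn = search_fn, USE.NAMES = FALSE)
}

# Live run (hypothetical file and directory names; needs rentrez and a network connection).
if (interactive()) {
  dir.create("pmids", showWarnings = FALSE)
  print(run_queries("queries.txt", "pmids", rentrez::entrez_search))
}
```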

[Note: I wrote to one of the creators of the package about the odd behavior when retmax is set to 100000, and got this response: “This behaviour with retmax is very odd.  Diffing a little, I suspect the 100,000 is being represented as 1e5 when the API  call is being built. I will test this and report it as a possible bug to the maintainer of the package that’s handling all of that internally.”  He also had the following comment on large result sets: “For using very large queries, if you want to do anything other than download a list of IDs it’s probably a good idea to use the webhistory features ( these makes talking back and forth w/ NCBI much quicker.”]
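The web history suggestion can be sketched roughly as follows. This is an illustration under assumptions rather than a tested recipe: the helper batch_starts, the batch size, the cap of 150 records, and the query are all made up for the example. The idea is to ask NCBI to keep the result set on its side, then pull the records in batches by pointing at that stored set instead of sending long lists of IDs back and forth.

```r
# Zero-based starting offsets for fetching `total` records in batches.
batch_starts <- function(total, batch_size) {
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = batch_size)
}

# Live example (needs rentrez and a network connection).
if (interactive()) {
  # use_history = TRUE asks NCBI to store the result set on its servers.
  res <- rentrez::entrez_search(db = "pubmed", term = "text mining",
                                use_history = TRUE)
  total <- min(as.numeric(res$count), 150)  # keep the example small
  for (start in batch_starts(total, 50)) {
    # Refer to the stored result set rather than passing IDs explicitly.
    batch <- rentrez::entrez_summary(db = "pubmed",
                                     web_history = res$web_history,
                                     retmax = 50, retstart = start)
    print(length(batch))
  }
}
```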