Zipf’s Law and my walk to the lab

You know one of the consequences of Zipf’s Law, which describes one aspect of the statistical distribution of the lexicon of a language, namely that it’s a power law (a few words are very common, but most words occur only very rarely): if you’re learning a second language, it’s likely that there will never be a day of your life when you don’t come across words that you don’t know.  I took a different route up the hill to the lab today, which meant that I passed by a lot of houses, rather than walking through the woods.  With the winter at an end, there’s lots of work starting, leading me to run into a lot of large and small construction projects–and all of these new words for me.
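For the programmers in the audience, here is a minimal sketch of what "a few words are very common, but most words occur only very rarely" looks like when you count. It's Python; the function names and the toy sentence are invented for illustration, and a sample this small can only hint at the shape of the distribution, not demonstrate a power law.

```python
from collections import Counter
import re


def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [(rank, word, n)
            for rank, (word, n) in enumerate(counts.most_common(), start=1)]


def hapax_share(text):
    """Fraction of distinct words that occur exactly once in the text."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    once = sum(1 for n in counts.values() if n == 1)
    return once / len(counts)


# Toy text, invented for illustration; far too small to show a real power law.
sample = "the cat saw the dog and the dog saw the bird on the roof"

for rank, word, n in rank_frequency(sample)[:3]:
    print(rank, word, n)
print(hapax_share(sample))
```

Run this on a real text of any size and the share of words seen only once tends to stay stubbornly large, which is why the index cards never run out.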





Metro sight of the day

Just another beautiful spring day in Paris.  Metro sight of the day: a one-eyed min-pin (miniature Doberman pinscher) being carried by a young guy–in one hand.  In the other: a 6-pack of beer–with one missing.

le Pinscher nain: miniature Doberman pinscher.  How you pronounce pinscher in French: I haven’t a clew.  (I know how to spell, at least in English–that’s British.)

Global warming: At least I’m messing up a better class of verbs

Pride comes before a fall, and sometimes the fall is worse than others.

Most mornings, I sit with my first cup of coffee and a stack of index cards and look up all of the words that I ran into the day before and didn’t know.  My 15 minutes or so of vocabulary every morning is a given–I typically learn about 10 new words a day, which means that despite having grammar that makes my French tutor shudder and an accent like fingernails on a blackboard, I know three ways to say “unremittingly.”

Everything else–conjugation, grammar, pronunciation–I rotate between.  Which is to say: I try to make sure that every week I spend a day on some new verb form, a new tense I don’t know, the order of double pronominal preverbal objects (my current bugaboo–il me le rend? Il le me rend?  FUCK), or something of that ilk.  Hence, I know lots of obscure things to say–but, I don’t necessarily know how to say them, if that makes any sense.

The other morning my plane landed in Paris after a long weekend in the US.  (A work thing, and then I surprised my father for his birthday.  We made fried matzah with schmaltz, which is to say: rendered chicken fat.)  On your first day in Europe, the challenge is to stay awake–fall asleep when you get off the plane and you’ll find yourself in a cycle of décalage horaire-induced (that is, jet-lag-induced) sleep disturbance that you won’t work your way out of for a week.  Sundays and Wednesdays it’s easy–there’s a market under the Metro tracks down the block, and getting out in the fresh air and sunshine is a good way to keep yourself moving and conscious.

On market days, I actually start not at the market, but at the fromagerie at the Dupleix metro station.  (Right outside the station was the spot where you were most likely to get taken to face the firing squad, at least as recently as 1871, the last date of which I’m sure.)  Although as an American, I had no clue about this ’til I got here, it turns out that cheeses have seasons; the first thing that I do when I get to Laurent Dubois is check the ardoise in the window to see what’s just come in.

This week: 3 “rare” cheeses.  Bleu du Nil, an obscure tomme, and something even more obscure that had already sold out.  Now, you’ll hear numbers about how many cheeses France has, but in truth, no one really knows how many cheeses France has.  Like the apocryphal Eskimo words for snow (that’s bullshit, by the way), some say 200, some say 300, some say 350…  In truth, there’s no way to know, because it’s not clear how to define “a cheese.”  In the limiting case, since every farmwife who still makes her own cheese is making a cheese unlike any other, the cheeses of France are essentially uncountable. (That’s not to say that there’s an infinite number–uncountable and infinite are different things.  I remember well being baffled by the idea of being countably infinite versus uncountably infinite as a graduate student.  As my wife of the moment said to me: Kevin, if you can’t wrap your head around this, you just can’t take any more math classes.  I thought that that was adorable, since I haven’t taken a math course since the obligatory algebra and trig course in college, and in fact am completely innumerate.)

But, back to the fromagerie.  My copy of Marie-Anne Cantin’s Guide de l’amateur de fromages (“Cheese-lover’s guide”) lists somewhere around 200 or so French cheeses, but it doesn’t list any of the cheeses that had come in this week, so I asked the adorable pixie-cut saleslady to tell me about them.  It developed that the name of one of them comes from the valley where the cows from whose milk it is made graze.  Except…she didn’t use the word graze, and I didn’t catch the word that she did use.  No problem–I recently learnt the verb to graze.  “Where they paissent?” …I asked, using the verb paître–a favorite of mine, because I love circumflex accents.  Seulement voilà, the only thing is: I’d never had the opportunity to use this delightful lexical item before, and I screwed it up.  I should have said paissent–but, my mind wandered off into the delights of that circumflex, and instead I said paîtent.  Which sounds like pètent…  Which means that I had just asked the nice lady if she were referring to where the cows fart.  Damn it.  Pride before a fall, and all that.  She had the good grace not to laugh.  At least, I think she didn’t–I was too embarrassed to look at anything but the floor.

In the English notes, we talk about the little-known English subjunctive.  The French notes are, of course, devoted to the verb paître.  The bleu du Nil comes from exactly one farm, in Brittany–see the picture above.  It’s delicious–as creamy as butter, with little bits of fenugreek.

English notes

Anglophones complain constantly about the French subjunctive.  Even French teachers get into it, commiserating with us about its chiant existence and teaching us ways to avoid it.  In reality, this most charming of the conjugations of the French language is not one that is completely foreign to us.  Although it’s not widespread, my dialect still has a subjunctive.  It’s easiest to see in the case of the verb to be.  Here’s how it showed up in this post:

I had just asked the nice lady if she were referring to where the cows fart.  

The subjunctive here is were.  You would expect was:

I had just asked the nice lady if she was referring to where the cows fart.

…and indeed, (a) you most certainly could say that, and (b) I would guess that most Americans would say that.  (I hate to guess, but I don’t have any statistics on this–sorry.)  You can find some exercises on the use of the subjunctive in English here, if you’d like to pursue this.  Be aware that there are some differences between American and British English in the use of the subjunctive–the Wikipedia page on the English subjunctive goes into them at some length.

French notes

Paître is the kind of delightfully irregular verb that I just adore.  Native speakers don’t seem to agree on whether paître and its relative repaître–either, both, or neither of them–can be used for humans, or just for cows and the like; whether either, both, or neither of them can be transitive only, intransitive only, or both; or in which tenses the i gets its little chapeau chinois.  (From what I can tell, the Academy’s decision on this has not always been gracefully accepted.)  My Bescherelle maintains that (a) it doesn’t have any of the compound tenses, and (b) le participe passé pu, invariable, n’est utilisé qu’en termes de fauconnerie (the past participle pu, invariable, is used only in falconry terms)…and if you can find a verb that’s cooler than that, I will buy you a beer–and if you’re a woman, I’ll marry you.

Three ways to say unremittingly: 

  • sans trêve
  • sans répit
  • sans cesse

Dulce et decorum est

Engine room of the Liberty Ship John W. Brown. Picture source:

I spent my last few years in the Navy working in cardiac catheterization labs, doing physiological monitoring–mostly hemodynamics and electrophysiology.  I started out on a ship, though–working in the engine room of a guided missile cruiser.  It was hot down there–the coolest that it got was under one of the giant air vents, where it was about 100 degrees Fahrenheit (37.8 Celsius).

The good side of being on a naval vessel is visiting cool ports of call, seeing the sun and the moon both in the sky at sunrise after a midnight watch, eating roasted squid in Spain–stuff like that.  The bad side is the Navy’s equivalent of combat training.  At the time, that involved a few weeks in the Caribbean, doing “casualty drills” over, and over, and over again, until you could get them right over, and over, and over again.

When a ship is on the receiving end of a nuclear, chemical, or biological attack, you put on gas masks, and you shut its openings down completely while some guys put on serious protective gear, go up on the decks, and scrub down every inch of the vessel.  This takes a long time–my ship, the USS Biddle, was 547 feet (167 meters) long.

So, there you are down in the engine room.  The alarm for an NBC attack (nuclear/chemical/biological) is sounded.  You put on your gas mask and you shut off all of the ventilation–in a space where it was already 100 degrees Fahrenheit in the cool places.  Then you watch the temperature start to rise.

When you learn how to put on a gas mask, they teach you what to do if you vomit with the gas mask on.  (If you can think of a way to not end that sentence with a preposition, I’d love to hear about it.)  This is super-necessary, because one of the old dirty tricks of gas warfare is to mix in an emetic with the gas.  An emetic is something that makes you vomit–the idea is to get you vomiting in the hopes that you’ll rip your gas mask off, in which case you’re toast.  (To be toast explained in the English notes below.)  So, you know what to do if you vomit–but, there’s nothing to be done about what happens when you’re in a super-hot space with a gas mask on, which is that the parts around your eyes fill up with sweat to the point that you can’t see anymore.  We’d sit there, our eye ports filling up with sweat, and watch the thermometer go up, and up, and up.

The temperature in an unventilated engine room goes up fast.  Around 120-130 degrees, the air is so hot that it’s painful to breathe, and when I say painful, I mean really painful.  It’s very quickly so hot that they call it a day for the engineering spaces and let you take off your gas mask and open the ventilation again.  Now, as I said, it takes a long time to scrub down a ship–but, it was clear to everyone that there was no way that we were going to be able to survive that long with the ventilation shut down.  So much for What To Do In Case Of NBC Attack.

Today’s bow to National Poetry Month is about a gas attack.  It’s by Wilfred Owen, an Englishman who fought in the Great War.  You tend to think of poets as wispy, ethereal types who you wouldn’t want backing you up in a tight spot; Wilfred Owen is a pretty good counter-example to this widely-held prejudice.  He was one of the great poets of the First World War–the poem that you’ll find below was the first one I ever memorized that didn’t involve words that I don’t really dare to put on this blog.  He was also a gay man who wrote what appears to be a poem about anonymous sex–and a decorated military hero who volunteered to go to the front repeatedly, even after being wounded twice in artillery shellings.  He was awarded the Military Cross–here’s the text of the commendation:

For conspicuous gallantry and devotion to duty in the attack on the Fonsomme Line on October 1st/2nd, 1918. On the company commander becoming a casualty, he assumed command and showed fine leadership and resisted a heavy counter-attack. He personally manipulated a captured enemy machine gun from an isolated position and inflicted considerable losses on the enemy. Throughout he behaved most gallantly.  (Copied from Wikipedia.)

Before the war, he spent several years teaching French and English in France.  He died there just days before the Armistice–in action.


by Wilfred Owen

blood-shod is an amazing neologism.  “To be shod” means something like “to have footgear (e.g. shoes, boots, and the like).”  Examples below in the English notes.

Bent double, like old beggars under sacks,
Knock-kneed, coughing like hags, we cursed through sludge,
Till on the haunting flares we turned our backs
And towards our distant rest began to trudge.
Men marched asleep. Many had lost their boots
But limped on, blood-shod. All went lame; all blind;
Drunk with fatigue; deaf even to the hoots
Of gas-shells dropping softly behind.

Gas! GAS! Quick, boys!—An ecstasy of fumbling
Fitting the clumsy helmets just in time,
But someone still was yelling out and stumbling
And flound’ring like a man in fire or lime.—
Dim, through the misty panes and thick green light,
As under a green sea, I saw him drowning.

In all my dreams before my helpless sight
He plunges at me, guttering, choking, drowning.

“…the white eyes writhing in his face…” Victim of gas attack by the Syrian government. Picture source:

If in some smothering dreams you too could pace
Behind the wagon that we flung him in,
And watch the white eyes writhing in his face,
His hanging face, like a devil’s sick of sin,
If you could hear, at every jolt, the blood
Come gargling from the froth-corrupted lungs,
Obscene as cancer, bitter as the cud
Of vile, incurable sores on innocent tongues,—
My friend, you would not tell with such high zest
To children ardent for some desperate glory,
The old Lie: Dulce et decorum est
Pro patria mori.

You can find readings of this very famous poem here.

English notes

Examples of shod:

  • All horses’ hooves are healthier without shoes, and barefoot horses are healthier than shod horses.  (Source)
  • The foot evolved to function unshod. (Source)
  • Paul says in verse 15 that we are to have our feet “shod with the preparation of the gospel of peace.” (Source)

to be toast: to be in a bad situation, to be in trouble.



Juventud, divino tesoro

…con el cabello gris, me acerco a los rosales del jardín…

Darío lived in Paris for 4 months–who knew? 4, rue Herschel, 75006. Picture source: DanielaBPSept.

For National Poetry Month, here’s some Rubén Darío.  I first came across this poem sitting in a night class at Old Dominion University, purveyor of fine educational experiences to a wide range of traditional and non-traditional students, including a hell of a lot of sailors.  The first stanza was carved into the top of the desk at which I was sitting (the desk I was sitting at, more commonly):

Juventud, divino tesoro,
¡ya te vas para no volver!
Cuando quiero llorar, no lloro…
y a veces lloro sin querer…

Youth, divine treasure, // you’re gone, never to return! // When I want to cry, I can’t… // and sometimes I cry without wanting to….

I thought it was the saddest thing I’d ever read, and in isolation, it most certainly, certainly is.

In isolation.  And, oddly: even more so at 25 or so than at 55.

Eventually, I tracked down the rest of the poem–much harder back in those pre-Google days–and made it to the end.

Mas a pesar del tiempo
terco, mi sed de amor no tiene fin;
con el cabello gris, me acerco a los
rosales del jardín…

But despite pig-headed // time, my thirst for love is endless; // gray-haired, I approach the rose-bushes in the garden…

(Don’t feel bad–I had to look up terco, too.)

As the grandson of a man who started a new family in the United States in his 60s (¡muy fuerte!, say my Mexican buddies when I tell them the story–I’ll spare you the accompanying gesture of admiration), I think I get the metaphor.  You go, pépère.  You go, Rubén.  Do I ever cry without wanting to?  Rarely–I am certainly an American male of my generation–but, yeah: it happens.  Nonetheless: I’m headed out to the back porch for a cigarette next to the lilacs, and the plum tree, and the flowering chestnut…


Juventud, divino tesoro,
¡ya te vas para no volver!
Cuando quiero llorar, no lloro…
y a veces lloro sin querer…

Plural ha sido la celeste
historia de mi corazón.
Era una dulce niña,
en este mundo de duelo y de aflicción.

Miraba como el alba pura;
sonreía como una flor.
Era su cabellera obscura
hecha de noche y de dolor.

Yo era tímido como un niño.
Ella, naturalmente, fue,
para mi amor hecho de armiño,
Herodías y Salomé…

Juventud, divino tesoro,
¡ya te vas para no volver!
Cuando quiero llorar, no lloro…
y a veces lloro sin querer…

Y más consoladora y más
halagadora y expresiva,
la otra fue más sensitiva
cual no pensé encontrar jamás.
Pues a su continua ternura
una pasión violenta unía.
En un peplo de gasa pura
una bacante se envolvía…

En sus brazos tomó mi ensueño
y lo arrulló como a un bebé…
Y te mató, triste y pequeño,
falto de luz, falto de fe…

Juventud, divino tesoro,
¡te fuiste para no volver!
Cuando quiero llorar, no lloro…
y a veces lloro sin querer…

Otra juzgó que era mi boca
el estuche de su pasión;
y que me roería, loca,
con sus dientes el corazón.

Poniendo en un amor de exceso
la mira de su voluntad,
mientras eran abrazo y beso
síntesis de la eternidad;

y de nuestra carne ligera
imaginar siempre un Edén,
sin pensar que la Primavera
y la carne acaban también…

Juventud, divino tesoro,
¡ya te vas para no volver!
Cuando quiero llorar, no lloro…
y a veces lloro sin querer.

¡Y las demás! En tantos
climas, en tantas tierras siempre son,
si no pretextos de mis rimas
fantasmas de mi corazón.

En vano busqué a la princesa
que estaba triste de esperar.
La vida es dura. Amarga y pesa.
¡Ya no hay princesa que cantar!

Mas a pesar del tiempo
terco, mi sed de amor no tiene fin;
con el cabello gris, me acerco a los
rosales del jardín…

Juventud, divino tesoro,
¡ya te vas para no volver!
Cuando quiero llorar, no lloro…
y a veces lloro sin querer…
¡Mas es mía el Alba de oro!

Keeping your…together: Reproducibility in computational research

A failure to archive some data leads to discussion of a colorful American English expression. Trigger warning: rampant obscenity referring to poop.

You’ve probably heard: there is a crisis in science.  You don’t have to be on top of the literature to be aware of this–it’s covered in the popular press, too.  This piece in Forbes is representative: How the reproducibility crisis in academia is affecting scientific research.  The term “crisis” might be a bit overblown, but certainly researchers in many fields have recently been paying a lot more attention to planning their analyses for reproducibility, which can sometimes mean planning the experiments that precede the analysis for replicability (also known as repeatability).  The contrast between these is that you can think of reproduction as arriving at the same values, the same findings, or the same conclusion as an earlier study; replication, on the other hand, refers to the ability to repeat the initial experiment.  Replicability is important for a number of reasons, one of them being that as an initial attempt to assess the reproducibility of a study, you might want to see if you can replicate the results when you repeat the original experiment.

Lately I’ve been talking and writing about this kind of thing a lot.  When I do that, I’ve found that what audiences and reviewers seem to enjoy the most is when I give details on my own failures to be able to repeat my own studies.  The irony is lost on exactly no one, including (obviously) me: in theory, I have some expertise on the relevant issues, and yet I struggle just to keep my own shit together in this regard.  (To keep one’s shit together explained in the English notes below.)

So: for your amusement, I present today’s reproducibility fail.  To wit: I just had a paper accepted that involved doing manual examination of hundreds and hundreds of words, all of which started with letters that could be one of the negating morphemes of English. (A morpheme is a part of a word.  For example, cat has one morpheme, while cats has two: cat, and the plural -s.)  When I say negating morpheme, I mean things like the prefix de in deoxygenate, or the prefix in in inefficient.  

Now, I said that we were examining words that start with letters that could be one of the negating morphemes of the English language.  Those strings are not always negative–think of examples like these:

  • ineffective (not effective) versus intuitive (nothing negative in there)
  • unclear (not clear) versus uncle (nothing negative in there)
  • deactivate (cause to not be active) versus deal (nothing negative in there, although the word’s current association with Donald Trump–the molesting, draft-dodging, tax-dodging, race-baiting, disabled-mocking, religiously bigoted, lying assclown that is now the president of my fatherland–makes it somewhat nauseating for me to type it)

…the moral of which is that you can’t find all of the words with negative prefixes in a text just by starting with a list of negative prefixes and looking for all words that start with them.  Doing this would lead you to count intuitive, uncle, and deal as words that start with negatives, which they are not.

affix: something that cannot be a word, but can be added to one.  English examples: un-, pre-, -’s.

prefix: an affix that is added to the beginning of a word.  English examples: un-, pre-, pro-.

suffix: an affix that is added to the end of a word. English examples: -’s, -ing, -ed.

So, when I wanted to find out how the incidence of affixal negation compares between different kinds of biomedical texts–I care about that kind of thing because my job involves researching computer programs that do things with biomedical texts, and I need to know things like how much does negation add to the burden of understanding medical texts by patients’ family members?–I knew that I could write a program to pull out all of the words that start with things like de-, un-, in-, and anti-, but I also knew that I would have to have actual human beings look at those lists and mark which ones actually started with negative prefixes, and which didn’t.
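The "pull out all of the words that start with things like de-, un-, in-, and anti-" step can be sketched in a few lines of Python. This is an illustration only, not the program actually used for the paper: the function name and the word list are invented here, and the words themselves are the examples from earlier in this post.

```python
# The four prefix strings named in the post; a real study would use a longer list.
NEGATIVE_PREFIXES = ("de", "un", "in", "anti")


def candidate_words(words, prefixes=NEGATIVE_PREFIXES):
    """Return the words that merely *start with* one of the prefix strings.

    This finds candidates only: it cannot tell 'unclear' from 'uncle'.
    """
    # str.startswith accepts a tuple and returns True if any element matches.
    return [w for w in words if w.lower().startswith(prefixes)]


words = ["ineffective", "intuitive", "unclear", "uncle",
         "deactivate", "deal", "cat"]
candidates = candidate_words(words)
print(candidates)
```

Everything except cat comes back, including intuitive, uncle, and deal. That over-generation is exactly why the list then has to go in front of human beings.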

Now, when you do something like this–that is to say, when you have humans look at data (linguistic or otherwise) and make judgments about it–you typically want to have more than one person do it.  Then you calculate how often they agree with each other.  If they agree with each other, say, 90% of the time, then you probably have pretty good judgments in hand.  On the other hand, if they agree with each other only 60% of the time, then you’ve got a problem.  Maybe you’ve defined a task that’s just too difficult for humans to do consistently, in which case you want to redefine it in a way that makes more sense.  Maybe you wrote crappy instructions, in which case you want to improve them.  Maybe one of your humans is smoking what we call in France shit (marijuana–no, I do not indulge).  In any case, it’s that calculation of agreement between the humans that lets you decide whether or not you have a problem that needs to be dealt with.
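For the curious, the agreement calculation can be sketched like this. The judgments below are made up for illustration (they are not the data from the paper), and the function names are invented. The first function is the plain "how often do they agree" figure described above; the second, Cohen's kappa, is not something this post discusses, but it is a common companion measure that discounts the agreement you would expect by chance alone.

```python
def percent_agreement(a, b):
    """Share of items on which two annotators gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators."""
    n = len(a)
    observed = percent_agreement(a, b)
    # Agreement expected if each annotator labeled at random,
    # keeping their own overall label proportions.
    expected = sum((a.count(label) / n) * (b.count(label) / n)
                   for label in set(a) | set(b))
    if expected == 1:
        return 1.0  # both used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented judgments: "neg" = really a negative prefix, "not" = just looks like one.
annotator_1 = ["neg", "neg", "not", "not", "neg", "not", "neg", "not", "not", "neg"]
annotator_2 = ["neg", "neg", "not", "not", "neg", "not", "not", "not", "not", "neg"]

print(percent_agreement(annotator_1, annotator_2))
print(cohens_kappa(annotator_1, annotator_2))
```

In this toy case the two annotators disagree on one item out of ten, so raw agreement is 90%; kappa comes out lower because half of that agreement could have happened by chance.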

Coincidentally, at the moment I’m teaching a course on what I do for a living, and I wanted to give my students the opportunity to get some hands-on practice with the process of making the human judgments that provide the data that we use to do our research.  This little project seemed like a good one to offer them, for a number of reasons:

  1. It’s relatively straightforward (we got good agreement on the original project)…
  2. …while still difficult enough to be challenging (we had to take a couple passes at developing the instructions, and even then, we didn’t have complete agreement on everything)…
  3. …plus, you don’t need a special program to record the judgments, while more complicated tasks frequently do require that the human learn a complicated program in order to record their analyses (did you notice that little subjunctive?  …that a human learn… versus …that a human learns…?)…
  4. …and I actually need the data for future research, which means that I’ll use it to write papers, which means that the students will have the opportunity to participate in writing the papers, and for students, published papers are the key to getting your doctorate and getting the hell outta Dodge.

Now, because I care about reproducibility of research, I use a publicly available web site to archive the code (computer programs) that I use to do my analyses.  (You can find the stuff for the project that I’m talking about here.)  So, getting my students started seemed like it would be straightforward: send them to the web site, and tell them to download the instructions, the list of words that needed judgments, and the actual judgments of the two analysts so that they could use those to figure out how to use the analysis program and to evaluate their own judgments.

It happens that I was one of those analysts, and that a colleague who happens to be a practicing emergency room physician was the other.  It also happens that we annotated a randomized mixture of text from two sources: from scientific journal articles, and from the clinical records (totally anonymized, and available free to researchers) of actual patients.  In the case of the clinical records, I found when doing the analysis of our agreements and disagreements that when we disagreed, I mostly thought that he was right and I was wrong.  (Not surprising, since he is currently practicing, and I haven’t touched a patient since 1991.)  In contrast, I tended to be right when we disagreed on the scientific journal articles–not surprising, either, since I spend all day, every day with my nose stuck deep in them.  So: it was super-important to me that my students have access to both of our data, both so that they could compare their own judgments to it, and so that they could see what kinds of things we had disagreed on.  (It’s usually the differences in the world that are the most interesting, right?)

Seulement voilà, the thing is: when I went to the web site where I had archived all of the code and the data on which the analysis was based, I saw that I had totally forgotten to put the other analyst’s data there.  Think about the context of this:

  1. In theory, I have some expertise on issues of reproducibility in computational science.
  2. I was very deliberately making an effort to make this experiment as repeatable as possible.

…and yet, I still screwed it up.  This is important in that when you read about reproducibility problems in science, sometimes you’ll see–often implicitly, and sometimes even explicitly–the view that reproducibility problems come from deliberately deceptive actions on the part of the researcher.  Now, I know that a certain amount of self-deception can take place pretty easily in research, typically taking the form of screwing around with statistical tests of significance. But, that’s a pretty different thing from deliberately publishing crap research.  When you consider that someone who is pretty deeply invested in doing, and in promoting, reproducible research–that is: me–can still fail to archive everything that would be needed to repeat one of his own experiments, it gives you an object example of how difficult it can be to ensure even the less-ambitious goal of repeatability of one’s work…and a fortiori, reproducibility of one’s results.
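One cheap guard against this particular kind of fail is to write down what the archive is supposed to contain and check it mechanically. What follows is a minimal sketch under loud assumptions: the file names are hypothetical (this post does not say what the archived files are actually called), the function name is invented, and the demonstration runs against a throwaway directory rather than any real archive.

```python
import tempfile
from pathlib import Path

# Hypothetical file names, invented for this sketch; the post does not say
# what the archived files are actually called.
REQUIRED_FILES = [
    "instructions.txt",
    "words_to_judge.txt",
    "judgments_annotator_1.tsv",
    "judgments_annotator_2.tsv",
]


def missing_artifacts(archive_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in archive_dir."""
    root = Path(archive_dir)
    return [name for name in required if not (root / name).is_file()]


# Demonstration on a throwaway directory that lacks the second analyst's file.
with tempfile.TemporaryDirectory() as tmp:
    for name in REQUIRED_FILES[:3]:
        (Path(tmp) / name).write_text("placeholder\n")
    missing = missing_artifacts(tmp)

print(missing)
```

A check like this only catches what you remembered to put on the list, of course, which is its own small lesson in how repeatability fails.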

In French, there are some very interesting things associated with affixal negation, including the phenomenon of verbs like dératiser and décafardiser that we talked about in the post that you can find here.  Several of the English-language examples in this post come from this paper on affixal negation by Chantal van Son, Emiel van Miltenburg, and Roser Morante, all of the Vrije Universiteit Amsterdam.

English notes

One of the expressions in this post, along with its many relatives, strikes me as interesting because it contains the word shit, which is almost always an “inherently negative” word, and yet it describes a desirable state.  The expression in question: In theory, I have some expertise on the relevant issues, and yet I struggle just to keep my own shit together in this regard.   (I should point out that you can only use these expressions in contexts, and with people, such that it would be acceptable to use obscenity.  So, I would use this with my siblings and cousins, maybe or maybe not with my aunts, depending on which one, and most definitely not in front of my grandmother.)  Unless otherwise stated, the examples here come from the OPUS2 English corpus, a collection of 19.7 billion words of English texts.  I searched it through the Sketch Engine web site, purveyor of fine linguistic corpora and tools for searching them.

to have one’s shit together is the most basic of this surprisingly large family of expressions.  In its most central sense, it means something like to be functioning in an efficient way.  Here are some examples of how it’s used.

  • That’s because I have my shit together and I prioritize properly.
  • If you don’t have your shit together chances are it’s because you surround yourself with people who don’t have their shit together  (Twitter) (Note: “chances are” means “probably.”)
  • I know I probably sound like I have my shit together, but really I feel confused inside.
  • And pretending I have my shit together when it comes to deadlines and paperwork is one of my specialties, a skill to which I probably owe every job I’ve ever had.
  • I thought, perhaps naively, that by almost a year along I would have my shit together – or at least have some sort of clue and I do not.
  • Turns out she’s sharp as a tack and really has her shit together.  (Note: “to be sharp as a tack” means “to be quite intelligent.”)
  • And so perhaps this is why he doesn’t find himself attracted to his students, and instead finds himself attracted to Audrey’s silver hair and faintly lined face: these things signify a woman who has her shit together, who has moved on to the next level.

With that established: if to have one’s shit together means to be in a particular state–the state of having one’s shit together–to keep one’s shit together means to maintain that state.  Some examples:

  • I am now one of the countless unemployed because I could not keep my shit together.
  • I’m trying to navigate the holiday season as a crafter, keep my shit together at work, plan the holidaze both for Thanksgiving at my mom’s house and what will surely be a painful Christmas Eve . . .if we could decide who’s host/essing it.
  • “As long as you keep improving.” I raised my eyebrows. “Is that your way of telling me to keep my shit together?”  (Michelle Hodkin, The evolution of Mara Dyer)
  • Ruby Pelletier put her hands on her skinny hips, threw her head back, and bellowed laughter.  “You think you can keep your shit together when twelve CB cowboys pull in all at once and order scrambled eggs, bacon, sausage, french toast, and flapjacks?”  (Stephen King, The dead zone.  CB cowboy is a truck driver–very, very old slang, although not quite as old as I am.  1970s, I would say.  French toast is pain perdu.  Flapjacks are pancakes.)

There are several more of these odd expressions where shit means something positive–so many that if I tried to get them all into one post, I would be writing this for the next two weeks.  Watch this space for more, as the spirit moves me–and don’t say this stuff in front of my grandmother.


Bonnets, lice, and dialect differentiation

You’ll often hear people bemoan the loss of dialects due to television. No worries: it’s not happening. Also, a cute kid and some off-color language.

Just in case you’re worried that television is taking away all of the dialects: every study I’ve ever heard of concludes the opposite.  What’s different today from 100 years ago is that geographic proximity has become somewhat less important in dialect differentiation relative to other social factors.  For example, all of the action in American dialects at the moment is in increasing differentiation between urban and rural speech.  All other things being equal, the dialects spoken within an American city are more likely to change to become more like dialects in other cities than they are like the dialects in the immediately surrounding rural areas.  That’s not to say that they’re all becoming the same, either!  But, what drives dialect differentiation in early-21st-century America is increasingly about the urban/rural split.  (Note that I’m not claiming that urban/rural distinctions in America or anywhere else are new–this is about a shift in degrees of relative importance, not a wholesale appearance of a new phenomenon.)

In fact, regional dialects of English can still be sufficiently distinct that native speakers of one regional dialect of English can’t necessarily understand native speakers of other regional dialects of English.  Hilarious stories of the ensuing misunderstandings abound, but few of them (or few of my stories of the ensuing misunderstandings, at any rate) are as adorable as this video of a little Scottish girl speaking with her (much bigger) Scottish father.  Take a look/listen–if you can’t follow it, there’s a version with subtitles floating around out there somewhere.  Scroll past the video for linguistic details (and considerably less cute examples of the local dialect), if you’re interested.

Linguistics trivia, for those who are geeky enough to care

It’s actually difficult to find evidence for television playing a role in language acquisition–that is, learning by children of their native language(s).  However, here’s a nice paper by Jane Stuart-Smith and her colleagues that shows a role for television in increasing dialect differentiation amongst adolescents.

Here is a master’s thesis written by Michaela Zikmundová, a graduate student in the Czech Republic at the time, on language in the novel Trainspotting.  It is a very Scots novel, and Americans have to watch the movie version with subtitles.  She concludes that Scots is itself composed of so many different dialects that it should be considered a language, not a dialect of English.  (This is not actually a valid argument, for my money–see this blog post on the surprising irrelevance of linguistics to the definitions of the terms language and dialect.)

Family trivia, for those of us who are related closely enough to me to care

My father is extremely fond of mixing languages with…wild abandon, I guess one might say.  A typical email from him might have six, I would suppose–English (obviously), Portuguese, Hebrew, French, Latin, and Yiddish being the standards, with occasional guest appearances by Hawaiian (my father remains convinced that the similarity between the Hawaiian word for priest (kahuna) and the Hebrew word for priest (kohen) is evidence for one of them being the original human language, and I’ll give you a hint: he’s not thinkin’ Hawaiian.  More on this another time), Polish, and a déclinaison of Scandinavian tongues.

Despite this linguistic profligacy, none of those languages is ever Scots.  In fact, my childhood exposure to Scots consists all and only of the following:

O wad some Pow’r the giftie gie us

To see oursels as ithers see us!

…which is from Robert Burns, and therefore, I hear, linguistically suspect as to authenticity (whatever exactly that means in this context).  However: it’s a damn good line or two to keep in mind, nonetheless.

More Scots listening practice

Here’s an interview with a Scottish airport worker, just after he attacked a terrorist who had just dumped gasoline over himself and gone after a police officer; from what I’ve read, he kicked him in the balls bollocks couilles beitzim cojones testicles so hard that he broke his foot.  Crucially (from our perspective), he did this after yelling something at the guy in Scots which I understand every Scottish male would take to mean let’s fight!, but that is so unlike any dialect that I speak that I can’t remember it.  (Native speakers?  C’mon’en, or something like that?)  I can understand maybe a bit more than 90% of this, I would guess–for perspective, I’ve been a native speaker of English for my entire life, and I’m old.  (And, for more perspective: this guy’s a native speaker of English, too.  See this blog post for how that’s an example of using what’s called mutual intelligibility for labelling things as languages versus dialects.)



Red, but not off-color

I can’t find a picture of Dimitar Pantaleev, so here’s a picture of Schliemann’s wife. I’ve loved it since childhood, though I couldn’t tell you why.  Here she’s dressed in jewelry that they dug up in Turkey. Picture source: Storia Illustrata n. 167. Public domain.

My father once told me that the archaeologist Heinrich Schliemann spoke 14 languages, all of which (other than his native German) he learnt by memorizing a book in the language.  Whether or not this is true, I don’t know–my father’s level of willingness to just make things up is non-zero (although never malicious).  But, memorizing things in your language of choice makes as much sense to me as any other way of learning a language, and it’s certainly more fun than memorizing long lists of vocabulary.  Unfortunately, my choices of what to memorize are mostly drawn from the stuff that I like to read, which means that (from what I’m told), way too much of what comes out of my mouth is either off-color (Céline, Queneau) or marivaudage (Laclos, Molière).  (See the English notes below for what off-color means.)

As National Poetry Month continues, here’s the first poem that I ever tried to memorize in Bulgarian.  I only got as far as the first stanza, which may explain why my Bulgarian sucks (see here for a good example of the trouble you can get into when you don’t speak Bulgarian quite as well as you think you do).  By Dimitar Pantaleev, in theory it’s a Communist poem, although I don’t understand why, since it’s entirely anti-authoritarian–the title means I cross against a red light.  Also, as far as I can tell, he was considered a formalist, and Communists (Reds, if you will) were pretty anti-formalism, to the best of my (very limited) knowledge.

Amazingly, the poem was recorded as a song by the officially government-sanctioned rock group Diana Express.  The production is as unfortunate as most music recorded in the 1980s that wasn’t either Joan Armatrading, Simon and Garfunkel, or Elton John, which isn’t to say that you shouldn’t spend $0.99 (literally) to buy it on Amazon, just for its inherent cool-value.  Amazingly, these guys are still around–here’s footage of them giving a concert two months ago in Atlanta, Georgia.

минавам на червена светлина

by Dimitar Pantaleev

Аз пазя свято земните закони,
но в дни, когато леден дъжд се рони
и трябва да спася една старица,
едно дете, една ранена птица
или една разплакана жена –
минавам на червена светлина.

Когато гинат младите тополи,
когато нечий глас за помощ моли
или когато в топлата ни есен
внезапно слъхва млада чиста песен
по чужда необмислена вина –
минавам на червена светлина.

А някой път, когато трябва смело
да се спаси едно човешко дело,
една любов или една страна,
провиквам се на кръстопътя ясно:
минете, въпреки, че е опасно,
минете на червена светлина.

English notes

Off-color means something like not quite obscene, but not quite OK, either, at least not for the context.  Here are some examples from the OPUS2 and enTenTen13 corpora, collections of 1.1 million and 19.7 billion words of English, respectively, that I searched through the Sketch Engine web site, purveyors of fine linguistic data in more languages than I care to count.  You’ll notice that it frequently modifies either joke or remark, and almost always a noun whose semantics have inherently to do with communication–

  • It’s an off-color remark, it was highly inappropriate.
  • Tom never tells off-color jokes.
  • You also need to be careful of the language you use – nothing off-color, or discriminatory.
  • For the most part, Smith said she overlooked the off-color jokes, sexist remarks and rituals that permeated the fighter pilot culture.
  • The event host was Leonard Maltin who remained professional during an event riddled with technical problems and a few off-color moments.
  • on many other websites normal people converse sans real names and do so without rancor, without hostility, without profanity, without racism, without sexism, without misogyny, without venom, without bile, without hatred, bigotry, obscenity and lame off-color jokes.
  • A disgruntled employee, or one with an off-color sense of humor, could post something reckless under the company’s name.
  • One of the most vivid characters in the show, whose off-color tantrums have become an audience favorite the way Kramer’s clumsy entrances once were.
  • This is the time to Get Yer Ya-Ya’s Out! in terms of cursing and off-color talk.
  • Crow is the most likely of the four movie-riffers to make off-color or lewd comments during the film, and receives frequent scoldings from Joel, Mike, and occasionally Tom because of this habit (see Crow Syndrome ).
  • And we hear what is believed to be Tiger telling an off-color joke.
  • Sure this is an extreme case but it’s a reminder that we all eventually have that moment when we get a complainer, an off-color remark, or misleading information posted by users on our social media sites.
  • The humor here is ribald and off-color and noone is safe from abuse including tuners, parents and vendors.
  • You’ll be teaching him the principles of keyword searching; at the same time, you’ll be able to steer him away from off-base or off-color content.

To get a really solid sense of how to use off-color, it’s useful to look at the other words that it occurs with.  (With which it occurs, if you prefer your sentences non-preposition-final.)  Here’s a screen shot of something called a “word sketch”–again, from the Sketch Engine web site.  Scroll down past the figure and I’ll talk you through it.

[Screenshot: Sketch Engine word sketch for off-color]
Picture source: screen shot from the Sketch Engine web site.

At the top left, you see the adverbial modifiers that are most commonly associated with off-color.  Note that they are similar–mildly and slightly.  What you’re not seeing here are intensifiers–you wouldn’t typically say that something is very or horribly “off-color.”  Why?  I don’t know–that’s just the statistical tendency with this adjective.  You certainly could say that–but, a native speaker probably wouldn’t.

In the next column, you see the nouns that have the strongest statistical associations with off-color.  You won’t be surprised to see that the most common ones are joke, remark, and humor.  Most of the other words with strong statistical associations are other nouns that refer to humor–gag, limerick, hilarity, quip, banter, antic, and pun.
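If you're curious what "strongest statistical association" might look like in practice, here's a minimal sketch. Sketch Engine documents its word-sketch score as logDice; everything else here is my own toy setup, not their implementation: the five trimmed sentences, the function names, and the shortcut of treating "the word directly after off-color" as the modified noun. There's no lemmatization, so joke and jokes are counted separately.

```python
import math
import re
from collections import Counter

# Toy corpus: a few of the example sentences quoted above, lightly trimmed.
# (Assumption: a real word sketch is computed over billions of words.)
SENTENCES = [
    "It's an off-color remark, it was highly inappropriate.",
    "Tom never tells off-color jokes.",
    "Smith said she overlooked the off-color jokes and sexist remarks.",
    "And we hear what is believed to be Tiger telling an off-color joke.",
    "A disgruntled employee with an off-color sense of humor could post something reckless.",
]


def tokenize(sentence):
    """Lowercase and split into words, keeping internal hyphens and apostrophes."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", sentence.lower())


def log_dice(pair_freq, freq_x, freq_y):
    """logDice = 14 + log2(2 * f(x, y) / (f(x) + f(y)))."""
    return 14 + math.log2(2 * pair_freq / (freq_x + freq_y))


def right_neighbours(sentences, target):
    """Score each word that directly follows `target`, highest logDice first."""
    word_freq = Counter()
    pair_freq = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        word_freq.update(tokens)
        for left, right in zip(tokens, tokens[1:]):
            if left == target:
                pair_freq[right] += 1
    scored = [
        (word, count, round(log_dice(count, word_freq[target], word_freq[word]), 2))
        for word, count in pair_freq.items()
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


for word, count, score in right_neighbours(SENTENCES, "off-color"):
    print(word, count, score)
```

The score rewards pairs that show up together often relative to how often each word shows up at all, which is why a handful of humor nouns float to the top even though off-color itself is rare.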

The next column over is a nice example of how you can get insight into a word by seeing what other words it’s joined with by and or or.  Most of the words that are joined with off-color in this way fall into one of two categories: they’re either related to humor, or they’re clearly negative.  The first category is related to the fact that off-color itself is so often used to modify joke and, as we saw in the preceding paragraph, other words that refer to humor.  In that category, we have:

  • hilarious
  • quirky
  • humorous
  • funny

In the second category, we have:

  • tasteless
  • vulgar
  • inappropriate
  • incorrect
  • crude
  • offensive
  • racist
  • rude
  • dirty
  • racial

If you had any questions about whether being off-color is good or bad, this should make it pretty clear to you that it’s not good.

In case you’re wondering: no, Sketch Engine does not pay me to shill for them.  In fact, I pay them quite a bit of money every year for access to their corpora and search engine.

Bon ménage

It’s amazing how many Republican politicians have gone down in flames over the years because they talked a lot of shit about immigration and then turned out to have an illegal housecleaner.  Some examples:

  • Meg Whitman, Republican candidate for governor of California, 2010
  • Andy Puzder, Trump pick for Labor Secretary, 2017
  • Tom Tancredo, long-time Republican congressman from Colorado and one of the worst hypocrites when it comes to pushing anti-immigrant policies and then hiring immigrants.  He bragged about turning in a high school student after an article about the student receiving an honors scholarship mentioned that the student was in the US illegally–and then got busted hiring illegal immigrants to work on his mansion.

I’ll point out here that two of Bill Clinton’s nominees for Attorney General (the highest law-enforcement office in the United States) went down over illegal nannies–and I’ll also point out that unlike the Republicans, they were not hiring illegal immigrants while hypocritically talking trash about hiring illegal immigrants.  Of course, most past misdeeds seem less relevant under the Trump administration, which seems positively gleeful about being a bunch of crooks, bigots, and–I suspect we’ll soon know clearly–traitors.

On that note, here’s a nice post from the France Says blog on the subject of French vocabulary related to people who clean things.  Enjoy–and if you’re going to hire illegal aliens to work for you, have the grace not to build your career on talking about how bad they are!

Source: Bon ménage
