Walk of Shame defined: How to show that your research topic is novel

On some interpretation of the word “scientist,” I am a scientist.  In practical terms, that means that I publish a lot, and that when I publish, it’s in journals with names like Genome Biology or Suicide and Life-threatening Behavior or in conferences with names like American Medical Informatics Association Annual Symposium.  That whole “publish a lot” thing is important, and in order to convince people to publish scientific work, you need to convince them that what you’re writing about is something new.

A common trope for convincing other scientists that what you’ve done is new is to say some version of “no one has ever published anything about this before.”  …or at least, not very much–not enough.  But, how do you convince an editor that no one else has ever written about your topic?  The best way that I know of is to tell the reader what has been written about the topic that’s close to, and relevant to, what you’re doing, but not quite what you’re doing.

Here’s a version of the convince-the-reader-that-they-should-trust-that-you-know-what-is-and-isn’t-novel-because-you’ve-read-a-ton-about-the-topic strategy in action.  Note that the author doesn’t assert that what they’re writing about has never been written about before–they tell you about a bunch of stuff that has been written about on closely related topics (citing eight specific other works on the subject), and then assert that there’s still an open question (which, of course, they will answer in their paper):

Screen Shot 2017-11-10 at 05.13.52

Having been beaten about the head and shoulders with eight citations on the topic, the reader is pretty likely to accept that you know what has and hasn’t been done here.

Here’s another take on the strategy.  The following paragraph follows a fairly detailed discussion of 76 (that’s seventy-six, for those of you who, like me, are so old that you have trouble reading numerals on screens) related papers, books, etc.:

Screen Shot 2017-11-10 at 05.23.22

The author isn’t letting you decide for yourself what that pile of references means from the perspective of their work–they’re telling you what it means.  That extra step of summarizing is actually the author taking the opportunity to place their work relative to everybody else’s, which lets them show you how their work is different from all of the stuff that’s been done before.

These laborieuses pensées (never miss a Montaigne allusion, say I) came to mind while ordering breakfast the other day in Ted’s Bulletin, a cool restaurant in Dupont Circle (or maybe Adams Morgan–I never know where to draw the line between them) with a damn fine breakfast menu.  Perusing said breakfast menu, I noted the following:


Now, I know what a burrito is.  I also know what a “breakfast burrito” is–you would expect it to have eggs and other traditionally American breakfast stuff within.  I also know what the Walk of Shame is.  Here are some definitions, scraped from the wonder that is the Internet:


Screen Shot 2017-11-10 at 05.32.54
Source: screen shot of Google’s suggested definition.
Screen Shot 2017-11-10 at 05.36.29
Screen shot from urbandictionary.com


Casting about for reliable sources to cite in this blog post, I came across the following in Brett Lunceford’s paper The walk of shame: A normative description, in the journal ETC: A Review of General Semantics.

Screen Shot 2017-11-10 at 05.39.49
Source: Lunceford, Brett. “The walk of shame: A normative description.” ETC: A review of General Semantics 65.4 (2008): 319-329.

One notable thing about this paper is that in it, Lunceford takes a nice approach to the why-you-should-believe-me-when-I-say-that-no-one-has-done-this-before move.  Here’s where he asserts that what he’s doing is novel (new):

Screen Shot 2017-11-10 at 05.43.31.png

Did you catch that footnote (endnote, actually) at the end of the first sentence?  Let’s track it down:

Screen Shot 2017-11-10 at 05.46.36.png

The “only” in I could find only one article implies “I looked hard.”  The description of what the article does and doesn’t say on the subject implies “I actually did read the paper.”  Personally, as an editor, or as a reviewer, or as a reader, I find that convincing.  Sure, it’s possible that the writer of something like this just did a really shitty job of searching–but, that’s not the most obvious conclusion to draw, and most people are going to accept this support for the novelty of the work if they don’t actually, personally, know of relevant prior work that you’ve missed–and if they do, they’re doing you a favor by telling you about it, right?  Brett (the author of the Walk of Shame paper) sees it this way himself.  As he put it to me in a recent email:

I was wary of writing that I couldn’t find anything else on the topic, because there is always a risk that someone will then respond, “You idiot! How could you not know about XYZ’s article on this very topic?” But Penn State had an amazing library with a wide range of sources available to me, so I figured if I couldn’t find it there it was likely not available. But that risk is why most people seem to hedge their bets, as we saw in some of the earlier examples here.

(I didn’t tell Brett that the earlier examples were all from my papers.  I also didn’t tell him that I didn’t realize that it sounded like I was hedging my bets–d’oh!)

As it turns out, Brett’s paper is also an interesting example of how you find a novel topic, as well as the role of your literature review in convincing not just the reviewers, but yourself that you are following a new line of thought.  Brett himself again:

In the field of rhetoric, we are much less concerned with novelty in the subject than we are in a new and different reading of the topic. After all, there are many people still combing over speeches by MLK and Lincoln. So I was under less pressure to find novelty than someone in the sciences. For me, it was genuine surprise – how have we as a scholarly community managed to overlook this? I did not set out to discuss the walk of shame, though. Rather, the walk of shame came to me. When I was teaching a course in small group communication, one assignment was to present an infomercial of a product, real or imagined. One student group presented a “walk of shame” kit. I asked if this was a major issue on campus, and they replied that some people would get up early to taunt those returning to the dorms. This was what got me thinking about the notion of shame. When I looked to see what had been written on the topic, I was surprised at how difficult it was to find anything at all. I found stuff in Cosmo and other popular magazines, but nothing else. It’s rare to find a topic so common, but so overlooked, so I wrote the article.

Picture source: imgflip.com

(That’s my emphasis.)  Now, just because you’ve established that something hasn’t been done before doesn’t mean that you’ve established that it would be worth doing it.  Brett takes this up head-on in the conclusion of his article.  He actually combines it nicely with a technique called introducing a nay-sayer—that is, explicitly pointing out a potential counter-argument to what you’re asserting (or in this case, to the idea that anybody should be bothering to read what you’ve written).  The conclusion to his article is as good of an ending to this blog post as I could have written, so I’ll leave you with his words:

The walk of shame may seem an inconsequential matter but linguistic practices that work to police female sexual behavior in this way are links in the chain of female oppression.  These chains can–and should–be broken through critical evaluation of sexual norms and a redefinition of female and male sexual behavior.  But before we can act, we must first begin with how we think about these norms–in short, we must begin with the very words we use to define ourselves, our actions, others, and our relationships.

French notes

la référence or la référence bibliographique : a citation, as in an academic paper.  From Wikipedia: Une référence bibliographique est un ensemble de données permettant d’identifier un document publié, ou une partie de ce document, et d’y faire référence.

Not to be confused with…

la citation : a quote.  From Wikipedia: Une citation est la reproduction d’un court extrait d’un propos ou d’un écrit antérieur dans la rédaction d’un texte ou dans une forme d’expression orale.

How to irritate a linguist, Part 4

The conditional probability of “dog” is higher if the preceding word is “my” than if the preceding word is “artichoke.”

Screen Shot 2017-11-27 at 18.59.13Screen Shot 2017-11-27 at 18.59.24

Here’s the closest that we come to “complexity” in linguistics: take a big sample of some language.  Build a statistical model of the conditional probabilities of all two-word sequences (“conditional” probability is the probability of some word given that the preceding word was X.  The conditional probability of dog is higher if the preceding word is my than if the preceding word is artichoke).  For that statistical model, you can calculate something called perplexity.  It’s as close as linguists come to having any notion of “complexity” of language.  Here’s a bit of the Wikipedia page on perplexity:

In natural language processing, perplexity is a way of evaluating language models. A language model is a probability distribution over entire sentences or texts.

Using the definition of perplexity for a probability model, one might find, for example, that the average sentence x_i in the test sample could be coded in 190 bits (i.e., the test sentences had an average log-probability of -190). This would give an enormous model perplexity of 2^190 per sentence. However, it is more common to normalize for sentence length and consider only the number of bits per word. Thus, if the test sample’s sentences comprised a total of 1,000 words, and could be coded using a total of 7.95 bits per word, one could report a model perplexity of 2^7.95 = 247 per word. In other words, the model is as confused on test data as if it had to choose uniformly and independently among 247 possibilities for each word.
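If you'd like to see the recipe above in action, here is a minimal sketch in Python.  Everything in it is my own illustration, not anything from the Wikipedia page: the toy "corpus" is made up, the function names are hypothetical, and I've assumed add-one smoothing so that a two-word sequence that never shows up in the sample (like artichoke dog) gets a small probability instead of zero.  A real model would be built from a far bigger sample and a smarter smoothing scheme.

```python
import math
from collections import Counter

def train_bigram_model(tokens):
    """Count how often each word appears as a preceding word, and each two-word sequence."""
    context_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts

def conditional_probability(word, prev, context_counts, bigram_counts, vocab_size):
    """P(word | prev), with add-one smoothing (an assumption made for this sketch)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)

def perplexity(tokens, context_counts, bigram_counts, vocab_size):
    """Per-word perplexity: 2 raised to the average number of bits per predicted word."""
    pairs = list(zip(tokens, tokens[1:]))
    total_bits = -sum(
        math.log2(conditional_probability(word, prev, context_counts, bigram_counts, vocab_size))
        for prev, word in pairs
    )
    return 2 ** (total_bits / len(pairs))

# A made-up toy sample; a real one would be millions of words.
corpus = "my dog chased my cat and my dog ate the artichoke".split()
vocab_size = len(set(corpus))
context_counts, bigram_counts = train_bigram_model(corpus)

p_my = conditional_probability("dog", "my", context_counts, bigram_counts, vocab_size)
p_artichoke = conditional_probability("dog", "artichoke", context_counts, bigram_counts, vocab_size)
print(p_my > p_artichoke)  # "dog" is more probable after "my" than after "artichoke"

toy_perplexity = perplexity(corpus, context_counts, bigram_counts, vocab_size)
print(toy_perplexity)

# The arithmetic in the quoted example: 7.95 bits per word
# corresponds to a per-word perplexity of 2 ** 7.95.
wikipedia_example = 2 ** 7.95
print(round(wikipedia_example))
```

Note that the perplexity here is computed on the same tiny sample that the counts came from, which flatters the model; normally you would measure it on text that the model has never seen.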


But you have summers off!

I didn’t shower.

I spent enormous amounts of time in grad school watching things like this go across the screen, labelling the points where various and sundry parts of the vocal tract are at their peak heights. Today I would just write a program to do it for me, but that was a loooong time ago… Picture source: http://www.phon.ucl.ac.uk/courses/spsci/expphon/week7.php

When I was a first-year graduate student, I did not realize that the first-year winter break was the last time off that I would have until I walked out the door of the lab for the last time–several years later.  Consequently, I spent it reading a book about linguistics–that is to say, I spent my last vacation for several years reading a book about the subject that I was studying during all of my work hours.  On the plus side, it was a great book, and I was thrilled to find a copy in the original French in the basement of one of the Giberts in the Quartier latin last year.  But, had I known that I would be spending the next several years in a basement watching jaws go up and down (literally), I think I might’ve taken a walk instead, or something.

The thing that makes academics howl the most is people saying it must be so nice to have summers off!  In reality, the summertime is when you work frantically on your research.  Research is, in theory, what you’re in academia to do.  In practice, events usually conspire to keep you doing everything but your research.  What kind of events?  Read on.  The following is a recent Facebook post from Karin Verspoor, a colleague upon whom I rely to kick my ass, teach me things, and basically be a bossy but much-adored big sister (despite the fact that she’s a lot younger than I am).  One of the many reasons that Karin is my hero is that she does smokin’ research in computer science (smokin’ explained in the English notes below) while raising three damn fine kids, which means that her research work typically starts around 10 PM.  Along the way, she got a doctorate at the best computational linguistics program in the world, ran labs at various and sundry research facilities in the US and Australia (hence the weird spelling in what follows), became a full professor at the University of Melbourne, and taught me a bit about statistical distributions of language (probably the most difficult of all of the things that she’s done–and, of course, the topic of this blog).  Karin’s Facebook post is about as accurate of a portrait of what the daily life of an academic looks like as one could hope for.  As Karin puts it: it’s not a whine–it’s just an attempt to answer the question Where does the time go?

This is not a whine, simply an exploration of the question “Where does the time go?”

Today I had no meetings (a rare occurrence indeed) so I decided to stay home to catch up on some work. I started the day with ~150 unread messages in my inbox. During the day I received another 111, most of which I processed but not all, and ended up sending 125 messages. I am now near the end of the day with 61 messages still marked unread (I have read nearly all of them, but these are ones that I want to follow up on one way or another at some future time; yes there is probably a better approach).

I finalised a paper due today and a presentation that someone is presenting on my behalf at 6am tomorrow Melbourne time in Bethesda MD (I will join by teleconf for Q&A), with a lot of back-and-forth with the students involved. I processed paperwork for a visitor and a new post-doc including checking references, tweaked [“tweak” explained in the English notes below –Zipf] position descriptions for positions we are getting ready to advertise (come work with us!), checked up on my seats and meals for my Friday flight, confirmed next week’s meetings, looked at flight options for my next-next trip, dealt with editorial work for 3 papers in 2 journals, contributed to the agenda for a meeting next week which I will miss anyway, inventoried and downloaded all the papers I need to review in the next couple of weeks (a lot), reviewed and gave feedback on several pieces of student work, made plans for some research assistant work while I’m out of town, made progress on a promotion assessment I’m working on, followed up on inquiries from prospective students.

I didn’t get to start packing for my trip, which I wanted to do, and didn’t quite manage to finish our Australian taxes which are due *very soon now*. I didn’t change out of my pyjamas until 3:30pm when it was time to meet Consuela after school (I obviously didn’t walk her there). I didn’t shower.

Of note: I had no time to work on any research today, as is the case most days. Research, in theory, is my job. I’m not even teaching this semester.

The glamorous life of an academic.


English notes

to tweak: This delightful monosyllable has a number of meanings in English.  I’ll give you the sense with which Karin used it, and then my favorite.

Karin’s: to make usually small adjustments in or to; to fine-tune (Merriam-Webster).  I think it’s somewhere between peaufiner and fignoler, perhaps.  Some examples:

How Karin used it: I tweaked some position descriptions for positions we are getting ready to advertise (come work with us!).

My favorite: to be high on, or doing, meth.  It can also be a noun, with a closely related sense.  Some examples:


smokin’: This is an adjective, and it’s quite positive. I don’t have a specific translation for it—it’s just really good.

How I used it in the post: One of the many reasons that Karin is my hero is that she does smokin’ research in computer science while raising three damn fine kids, which means that her research work typically starts around 10 PM.  

Here we go ’round the mulberry bush

I admit it: I’m an old fuddy-duddy. 

I admit it: I’m an old fuddy-duddy.  [Half of this post seems to have ended up in the English notes below, beginning with fuddy-duddy.] Case in point: I fought like hell the introduction of wikis into the daily working life of our lab.  Eventually, I folded, and it did not go without notice: one day in a lab meeting I mentioned my delight at having edited something on Wikipedia, and one of our graduate students was delighted to point out the unavoidable contrast with my oft-stated position on such things: Ha, Kevin’s using a wiki!  

Today I am delighted once again: I just edited my first French-language Wikipedia entry!  Here’s the before-and-after.  Can you see the word whose spelling I edited–twice?  (You’ll find it in the French notes below.) Of course, being excited about having spotted a French-language spelling error means that before noon I’ll have said something stupid enough to make the entire lab bust out laughing, mais c’est normal, ça, it happens every day anyway…


Screen Shot 2017-11-24 at 07.06.19


Screen Shot 2017-11-24 at 07.08.10

English notes

old fuddy-duddy: “one that is old-fashioned, unimaginative, or conservative” (Merriam-Webster)


like hell: an intensifying adverb.  It means something like very hard.

How I used it in the post: I fought like hell the introduction of wikis into the daily working life of our lab.  Eventually, I folded, and it did not go without notice.

There’s another use, meaning “very much,” often appearing with the verb to hurt:


…and there’s another use, explained by Merriam-Webster like this:

— used to say in an angry and forceful way that one will not do something, does not agree, etc. “It’s your fault!” “Like hell it is!”


to fold: to give in, to give up, to surrender.

How I used it in the post: I fought like hell the introduction of wikis into the daily working life of our lab.  Eventually, I folded, and it did not go without notice.

to bust out: to suddenly begin doing something–intensely, I think.  You might bust out laughing, for example–suddenly you start laughing, hard.  This feels pretty informal, limite slang, because of the word bust, which is a low-register word (versus the word to break, which would be a more appropriate choice at work, in a classroom, etc.).

How I used it in the post: Before noon I’ll have said something stupid enough to make the entire lab bust out laughing.

French notes

le mûrier : blackberry bush; mulberry tree

Picture source: Par LPLT — Travail personnel, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=23966478

How we’re sounding stupid today: “their”

The next morning I’ll show up at the lab and someone will say “good morning” to me in a way that I haven’t heard before, and I’ll stand there like a drooling idiot trying to figure out what the hell I just heard. 

It never fails.  One evening I’ll be sitting around with friends discussing how the American right wing versus the American left wing defines “the establishment,” or the contrasting French and American understandings of religious freedom, or the lack of correspondence between the French and American categories of liberal, conservative, leftist, rightist, or something equally complex.  I’ll think that I’m finally starting to be able to function in the French language.  Then the next morning I’ll show up at the lab and someone will say “good morning” to me in a way that I haven’t heard before, and I’ll just stand there like a drooling idiot trying to figure out what the hell I just heard.  (For lots of ways to say “hello” in English, see the English notes below.)

Picture source: https://fr.slideshare.net/surco2011/presentacion-basica

Case in point: it was recently pointed out to me that I essentially always get the word their wrong.  It has two forms, leur and leurs, neither of which I ever get right–instead, I say/write son/ses.  Those are the singular forms.  Crazy–any child could get this right…

OK: enough self-castigation.  Let’s practice.  We’ll use a technique that I learned from the book Português Contemporâneo, by Maria Abreu and Cléa Rameh.  Amazon describes it like this:

This is the first volume of a basic course organized around the concept that to learn another language is to internalize another set of linguistic rules.

How do you do that?  Abreu and Rameh do it with drills.  Not tests, really–drills.  When you have things that contrast with each other–say, the first-person singular form of a verb (I like you) and the first-person plural form of a verb (We like you)–you repeat each one several times in a very similar sort of sentence.  Only after that do you start mixing them together.  If memory serves, you do this in sets of half a dozen or so.  For example:

Ça c’est son livre.

  1. stylo : Ça c’est __________ stylo.
  2. porte-monnaie : Ça c’est __________ porte-monnaie.
  3. sac à dos : Ça c’est __________ sac à dos.
  4. père : Ça c’est __________ père.
  5. chien : Ça c’est __________ chien.
  6. bateau : Ça c’est __________ bateau.

Now, remember: I belong very firmly to the “write about what you don’t know” school of…well, of writing.  So, I may wander into error very quickly here–I depend on you native speakers to keep me straight here.  This is about to get to be a problem, because…

There can be quite a bit of variation between languages in how nouns get inflected–or don’t–when you’re talking about a group of individuals, each of which has something.  For example, suppose you’re talking to a group of kids, and their faces are dirty, and you’re going to tell them.  Here is how that works in English, and in Spanish:

  • Wash your faces.
  • Lávense la cara.

Notice that in English, faces is plural, while in Spanish, la cara is singular.  It’s pointless to talk about logic or lack thereof here–this isn’t about logic.  It’s just a fact about the individual language.  How does it work in French?  I actually have no idea, but I bet I can screw it up pretty easily.  In order to try to avoid that, I’m going to hunt about for real-life examples.  But–I’ll bet that native speakers disagree about this kind of thing.  I’d love to hear your opinions on this in the Comments, if they differ.  In order to try to get the most Académie-ish usage, I’ve taken these examples from the EUROPARL corpus, a set of speeches in all of the languages of the European Union.  These texts come from either native-speaker politicians or professional EU translators, so they should, in theory, be close to the spoken standard.


  1. travail : Je le déplore car nous devons agir du fait que d’autres n’ont pas fait __________ travail.
  2. choix : Nous espérons que ceux qui envisagent de voter contre ces amendements pourront donner de bonnes raisons de __________ choix au Parlement et aux citoyens qui sont à la recherche d’un emploi.
  3. but : Pourtant, __________ but originel est d’indiquer la direction à suivre et de fixer des priorités.
  4. part : Les États membres ont aussi __________ part de responsabilité dans ce domaine et ils ne doivent pas l’oublier.
  5. région : Les gens réagissent en “votant avec leurs pieds”, en quittant __________ région dans l’espoir de trouver de meilleurs revenus.
  6. taux : Je pense que nous en avons tous le plus grand besoin si nous considérons les résultats des dernières élections européennes, si nous considérons __________ taux de participation.

Now: that probably looks super-simple, but it’s probably six times more than I’ve used leur in the entire past six months–apparently, I get this wrong all the time.  It’s about to get a lot harder for me to set up the drill, because I have to figure out a way to cue whether the possessors are singular or plural, while simultaneously keeping the structures as similar as possible–difficult (for me, at any rate) in a language in which the verb agrees with the number (i.e., singular or plural) of the subject.  Let’s see what we can do here…

Il culpabilise son prof.

Ils culpabilisent leur prof.  (Note that we have to make the assumption that “they” share the same (single) professor–this is different from the “wash your faces” case, where everyone has their own, individual face.)

  1. il, cousin : Il culpabilise __________ cousin.
  2. ils, père : Ils culpabilisent __________ père.
  3. ils, président : Ils culpabilisent __________ président. 
  4. il, président : Il culpabilise __________ président.
  5. il, chien : Il culpabilise __________ chien.
  6. ils, chef : Ils culpabilisent __________ chef. 

OK, that wasn’t so bad…  Let’s do another drill of six examples.  This time, though, I’m going to use feminine nouns, so the singular will be sa.

  1. Les deux frères, mère : Les deux frères aiment __________ mère.
  2. Les deux maîtresses d’école, directrice : Les deux maîtresses d’école aiment __________ directrice.
  3. il, mère : Il aime __________ mère.
  4. ils, tante : Ils aiment __________ tante.
  5. il, voiture : Il aime __________ voiture.
  6. Le jeune homme et sa meuf, voiture : Le jeune homme et sa meuf aiment __________ voiture.

(OK, quick question for my fellow anglophones: are these more straightforward or less straightforward when I can find actual nouns (les deux frères) instead of pronouns (ils)?)

Notice that we haven’t even touched the form leurs, which is used when multiple things are owned–that gets us straight into the question of what you do in cases where everyone has their own of something (remember the example wash your faces in English versus lávense la cara in Spanish that we talked about above).  We’re up to 1200 words already, so let’s leave that for another day–if I can get this straight with the singular leur, maybe I’ll at least be making 50% fewer errors today, at any rate…

English notes

Here are a bunch of ways to say hello.  Some of them you would only use in the morning–I’ve listed them separately.  Good afternoon and good evening sound pretty formal–in America, we tend to use separate greetings for the morning only.  For the rest of the day, we mostly just use some form of hello. 

  1. Hello
  2. Hi
  3. Hey
  4. Hey there
  5. What’s up?
  6. ‘Sup?  (for young people only–I’ve probably never said this in my life.  It’s a form of what’s up?)
  7. What’s goin’ on?
  8. Howdy (don’t even think about using this one unless you want to make people smile/laugh, in which case: go for it)
  9. Howdy, pardner (see note above–and, yes, you have to pronounce it pardner, with a d, to get the full effect)

For the morning only:

  1. Good morning
  2. ‘mornin’
  3. mmmphglrg (or words to that effect–I’m (mostly) joking here)

The Holy Brotherhood

Nobody who’s anybody walks in LA.

The Missing Persons song got it right: nobody walks in LA.

It’s time to renew my visa, which means a flight to Los Angeles to render myself to the French consulate tomorrow morning (you are assigned to one consulate or another depending on where in the US you live–mine is in Los Angeles), which means that I spent three hours this afternoon photocopying every @#$% document that the application requires, arranging them all in my little French plastic sleeve in the exact order in which they appear on the instructions page on the consulate web site, imploring the poor lady at FedEx to take my mug shot in such a way that I might appear adorable, or at least not hideous; and walking.  Possibly the Missing Persons lyrics should have been “nobody who’s anybody walks in LA,” ’cause I wasn’t actually the only one. There was the enormously, enormously, enormously obese white woman wearing a halter top and a muumuu, sitting in front of a house that must have cost several million dollars (I shit you not), with all of her belongings in three very chaotic-looking shopping carts, singing softly to herself.  The black lady of my age or so sitting at an empty table in Starbucks, staring at nothing, her lips silently moving as her legs twitch like… well, I suck at analogies, but the poor lady’s legs twitched non-stop.  The oddly-well-groomed-despite-wearing-shorts-and-sneakers-with-tube-socks white guy of my age or so pacing the sidewalk with a blank canvas under his arm, becoming increasingly agitated as he stops by my table again and again to ask if it’s not the case that the car parked in front of the cookie shop is there illegally.  The thin black woman of my age or so (what the fuck is going on with the people my age in LA??) sitting on a bench, waving her hands and having an animated conversation with someone visible only to herself; on her lap is a checklist on which is written חֶבְרָה קַדִישָא, which is Aramaic for “The Holy Brotherhood,” which is the term for a Jewish volunteer burial society.  
(Just don’t fucking ask why I can read Aramaic well enough to catch things written on random strangers’ checklists, OK?)

The streets of Paris are full of beggars (see this post for information on why that’s the case, and why it has been the case for centuries).  What the streets of Paris are not full of, though, is vulnerable psychotic people.  Why?  In the United States, we have no national health care system.  In France, there is a national health care system.  Want to know which other first-world countries don’t have national health care systems?  None.  And what are the Republicans hot to do?  Get rid of the closest to national health care that we’ve ever been able to get.  Vote in 2018…

The folks at the consulate were super-nice, and I’m happily re-established in Paris–legal until the end of April, yay!

English notes

I shit you not: I’m not kidding you; I’m telling you the truth.

Mystery solved: the Paris edition

Filled-in windows next to the gate of the Cordeliers campus of the École de médecine, Paris. Picture source: me.

Adam Gopnik’s Paris to the moon is the best writing that I know of on the experience of being an American ex-pat in Paris.  He maintains that the only really important difference between Paris and New York is the latitude: Paris is, in fact, so far to the north that in the wintertime, days are super-short here.  For me, it’s the one and only problem with this place—the winter darkness is crushing, a weight that I often think I can feel physically. 

Screen Shot 2017-11-22 at 05.17.57
Google’s autocompletes when I looked for the origin of Paris’s nickname, “The City of Light.” Note: Paris is NOT dirty.

In this city often called the City of Light, light actually is often an issue.  If you live on one of the lower floors of the typical 7-story Haussmannian apartment buildings that make up about 60% of Paris, the sunlight only actually shines into your home for a short time every day, even in the summer.  In the winter, it’s a sort of perpetual gloom, even if you have big windows, just because of the height of the surrounding buildings.  Your windows are everything here, as far as I’m concerned.

Bricked-up windows, someplace or other in Paris. Picture source: me.

Consequently, it’s always been a surprise to me to see things like you see in the photo to the left.  You’ll notice that a number of the windows have been bricked up.  In a city where sunlight is at a premium, why the hell would you do that?

A wonderful tour guide told me the answer: once upon a time, buildings were taxed by the number of windows.  Brick up your windows, and you paid less in taxes.  At the time, Parisians mostly rented their apartments (today it’s common to own your apartment), and from the landlord’s perspective, it made sense—if you didn’t think that you could make up the tax difference by charging more rent, you might as well brick up your windows, pay less taxes, and your renters be damned.

Bricked-up windows overlooking a little café on the rue des Écoles. I accidentally learned the word “braguette” here when I walked inside to pay–with mine open. Picture source.

Interesting, but I was never able to find any documentation of the old tax rule that the tour guide had told me about, and I don’t typically write about things on this blog if I can’t find a source to cite.  Fast-forward a few months, though, and I find myself reading Victor Hugo’s Les misérables.  I was expecting a nasty cop trying to throw a guy in prison for stealing two loaves of bread; instead I’ve read  chapters and chapters about a really nice priest.  Is this book ever going to go anywhere?  I have no clue.  But, then I came across this.  Remember: as I said, the priest is really nice.  At one point, he gives this sermon:

« Mes très chers frères, mes bons amis, il y a en France treize cent vingt mille maisons de paysans qui n’ont que trois ouvertures, dix-huit cent dix-sept mille qui ont deux ouvertures, la porte et une fenêtre, et enfin trois cent quarante mille cabanes qui n’ont qu’une ouverture, la porte. Et cela, à cause d’une chose qu’on appelle l’impôt des portes et fenêtres. Mettez-moi de pauvres familles, des vieilles femmes, des petits enfants, dans ces logis-là, et voyez les fièvres et les maladies ! Hélas ! Dieu donne l’air aux hommes, la loi le leur vend. Je n’accuse pas la loi, mais je bénis Dieu. Dans l’Isère, dans le Var, dans les deux Alpes, les hautes et les basses, les paysans n’ont pas même de brouettes, ils transportent les engrais à dos d’hommes ; ils n’ont pas de chandelles, et ils brûlent des bâtons résineux et des bouts de corde trempés dans la poix résine. C’est comme cela dans tout le pays haut du Dauphiné. Ils font le pain pour six mois, ils le font cuire avec de la bouse de vache séchée. L’hiver, ils cassent ce pain à coups de hache et ils le font tremper dans l’eau vingt-quatre heures pour pouvoir le manger. – Mes frères, ayez pitié ! voyez comme on souffre autour de vous. »  – Victor Hugo, Les misérables


Hugo, a champion of the poor, had it right: search for impôt des portes et fenêtres and you’ll find the Wikipedia page on the subject.  Turns out the tax was first instituted during the Revolution of 1789, but it comes from an older Roman tax scheme called the ostiarium.  In effect until 1926, the original goal was to have a progressive tax, i.e. one that falls proportionally more heavily on richer people.  As it turned out, it had a bad effect on the renters.  From Wikipedia:

Cet impôt fut accusé de pousser à la construction de logements insalubres, avec de très petites ouvertures, donc sombres et mal aérés, et il conduisit à la condamnation de nombreuses ouvertures, ainsi qu’à la destruction, par les propriétaires eux-mêmes, des meneaux qui partageaient certaines fenêtres en quatre, ce qui augmentait substantiellement l’impôt.  Wikipedia

Sombres et mal aérés–exactly as Hugo described them.

On the plus side, the lack of any prolonged sunshine on my windows means that my apartment never gets very hot in the summertime.  When the days get short, I pull a light box out from under my little water heater (which turns out to be related to another Parisian mystery, but more on that another time), and half an hour a day in front of that makes the crushing winter darkness feel less…crushing.  Spring will be here before we know it.

French notes

Dieu donne l’air aux hommes, la loi le leur vend.  “God gives men the air, the law sells it to them.”  What interests me about this is the double pronominal objects: le leur vend, “sells it to them.”  I have a terrible time with that kind of double-pronominal construction, and as it turns out, a lot of French people do, too–ask someone how to say I give myself to you and you give yourself to me, and I’ll bet that they have to think about it for a minute.  The most common answers that I get are along the lines of Je me donne à toi and Tu te donnes à moi, where the indirect object pronoun (in this case, the person to whom something is being given) is not placed in front of the verb, but rather after it, in a prepositional phrase–contrast those with le leur donne, where both of the pronouns are pre-verbal (before the verb).  Native speakers, got any help for us anglophones here?

The basic principle of shopping in a market

Expecting everyone in the file d’attente behind me to be groaning at my idiocy in not being prepared to pay, I dug out my wallet and started digging through it frantically.  “Relax, it’s Sunday,” said the nice lady behind the counter. 

The basic principle of shopping in a marché (market) is this: look for the longest line, and get in that one.  If there are lots of little old ladies in it: all the better.

So, it’s my turn at the chosen fromager’s kiosk, and madame is weighing my little Vacherin.  Because there are tons of people in line behind me, I’ve got my money right there in my hand, waiting to pay as soon as I have the goods in hand.  Seulement voilà (the thing is), when the fromagère tells me the price, it turns out to be twice what I thought I remembered from last year.  Expecting everyone in the file d’attente behind me to be groaning at my idiocy in not being prepared to pay, I dug out my wallet and started digging through it frantically.  Relax, it’s Sunday, said the nice lady behind the counter.  (If it’s in italics, it happened in French.)  But, this gentleman in line behind me, this lady–I don’t want to inconvenience them.

Oh, no–madame is right, it’s Sunday.  No one is in a hurry, said the gentleman.  He smiled.  The lady behind him smiled.  The fromagère smiled.  I even smiled.  I got my Vacherin, said au revoir to everyone, and walked away.  Have a good Sunday, said the fromagère.

Explain to me again why you think that French people are rude??

The reason that I hadn’t boughten a Vacherin for a year: it’s a winter cheese.  (Boughten discussed in the English notes below.) Yes, cheeses have seasons, and this one shows up around the time that the days start to get depressingly short and you wonder whether or not you can find last year’s gloves.  According to my copy of Marie-Anne Cantin’s Guide de l’amateur de fromages: Il est de nos jours un des rares fromages saisonniers (“Nowadays it is one of the rare seasonal cheeses”).  (The kid at the fromagerie that I usually go to–it’s about a 20-second walk from where the firing squads used to do their thing up against the wall of the Fermiers généraux, as recently as 1871–told me one day about some of the tricks that are now used to get sheep to produce milk outside of the lambing season.  It’s not cruel, but not exactly appetizing, either.)

Picture source: http://www.haussimont.com/

Also known as Mont d’or, I think it’s hyper-bon, and apparently a lot of other people do, too, because at this time of year, it’s stocked more heavily than anything else.  As you can see in the photo (taken on my kitchen table), it comes in a box (and it must come in a box), and the box is made of épicéa (spruce) (and it must be made of épicéa).  Cantin says that it’s from the spruce that the unusual taste of a Vacherin comes.

As you can see from the picture, a good Mont d’or has undulations on the surface–des vagues (waves), Cantin calls them.  It’s a very soft cheese, to the point that if you buy a larger one, it typically comes with a wooden spoon–indeed, you can just scoop it out, and it spreads more easily than butter.  (One of my friends insists that the only way to eat a Mont d’or is to pour some white wine on top, put it in the oven for a bit, and then pour the melted cheese over boiled potatoes.  Cantin sees it my way, though, and what my friend doesn’t know, won’t hurt her.)

In the time that it’s taken me to write this post, I’ve eaten approximately 25% of my Vacherin, and you know what?  I don’t care.  The other day I calculated how many more weekends I have to live: 680.  Probably sounds morbid, but it inspired me to work not more than, say, 30 minutes all of this weekend, which happens, like, never–did you calculate how many weekends you have left yesterday, and if not, what did you do this weekend?  Carpe diem, baby!

French notes

l’épicéa (n.m.) : spruce.

le vacher : cowherd.  Le vacherin était autrefois le fromage des vachers (“Vacherin used to be the cowherds’ cheese”).  As Cantin explains this: back in the day, Comté was made in the mountains while the cows did their summer grazing.  In the winter, the cows would be back in the stables, and the milk quantity and quality decreased.  Additionally, the roads could be impassable.  So, rather than taking the milk to a cheese-maker, the farmers made their own cheese out of it–hence Vacherin being a vacher’s cheese.

English notes

boughten: yes, boughten is English.  More commonly, it’s bought, but you will run into the boughten form in some dialects–the Midwest and the Northeast, mostly, I think, although I couldn’t swear to that.

Picture source: http://izquotes.com/quote/230318
Picture source: bittersoutherner.com


A day in the life of a puppy

There was a leaf and I sniffed it and there was kibble and I peed and I took a nap and there was a bird and I barked at it and we played King of the Hill and I peed and I took a nap and…

J’ai bien reçu cet SMS du chiot d’un pote aujourd’hui :

yavê une feui é jl’é reniiiphlé é yavê dé krokè é sétè miam miam et pui jé fê pipi é popo et pui jmesui fê un peti som et pui jé fê pipi et pui yavê un ouoyzo et jé hurlé trê férossmen et ysanètalé et pui on a jwé roy de la montanie et pui jé fê un peti som et pui jé fê pipi é pui ma seur m’a bouskulé et jé di “iapiapiap” é pui on a rennniphlé un ballllon é pui on a jwé loup-garou é pui jé fê pipi é pui jmsui fê un peti som et pui jé fê pipi

Nul en ponctuation mais il se débrouille pour bien exprimer ce qui lui passe quand même, hein ?

French notes: You’ll find the names of some dog breeds below.  (I have no idea how one pronounces chow-chow in French.)  One of the delightful aspects of regulation in France: you can call your purebred dog  (le chien de race) anything you want, but its registry name has to start with the official letter of that year.  2017 is N, so look for lots of registry names like Ninou, Nanette, or Nadine until fin décembre, and then lots of puppies named things like Olivier, Odette, or Ourson beginning January 1st…  One of the delightful aspects of the language itself is the occasional presence of names for males and females of a given breed, but the only one that I know off the top of my head is levrette–no limite remarks necessary on that one, but I’d love to hear about others in the Comments section…

Picture source: http://www.ikonet.com/fr/ledictionnairevisuel/images/qc/races-de-chiens-290540.jpg
Picture source: http://www.ikonet.com/fr/ledictionnairevisuel/images/qc/races-de-chiens-290541.jpg

Some things are just WRONG: How to set your y-axis range, and why

I recognize the difficulty of defining things like good and evil.  I recognize the hazards of binarity: a lot of things in life just aren’t simple enough to talk about in terms of yes/no, up/down, right/left.  Nonetheless: some things are just WRONG.

Source: me.

Case in point: labelling your y-axis in graphs.  The y-axis of a graph is the vertical part.  It typically indicates a quantity of something.  Most quantities in life have some natural range of values.  For example:

  • Percentages can’t go lower than zero, or higher than 100.  That’s just part of the definition of “percent.”
  • Accuracy can’t go lower than zero, or higher than 1.  That’s just part of the definition of “accuracy.”  (Yes, accuracy does have a definition.)
  • Minutes in an hour can’t go lower than zero, or higher than 60.

Not everything has a fixed range, right?  For example: temperature has a minimum (the point at which all molecular movement ceases), but (as far as I know), it doesn’t have a maximum.  Age can’t be a negative number, but if there’s an upper limit on how old something can be, I don’t know what it is.  (Obviously, there’s a limit on how old I can be, and I won’t mind hitting it.)

Now, if you are graphing something that has a fixed range–percentages, say–you should have a y-axis that corresponds to that range.  You’re graphing the accuracy of your man-eating-rabbit detector?  No accuracy at all (i.e. always being wrong) is an accuracy of zero, and always being right is an accuracy of 1, so your y-axis should go from 0 to 1.  You’re graphing the percentage of recently-fed post-pubescent man-eating rabbits that your man-eating-rabbit-electrocutor successfully electrocutes?  The smallest possible percentage is 0, and the largest possible percentage is 100, so your y-axis should go from 0 to 100.
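If you like your rules spelled out in code, here is a minimal sketch of that one in Python.  Everything in it is my own invention for illustration: the table NATURAL_RANGES, the function y_axis_limits, and the quantity names are hypothetical, and the table only covers the examples from this post.  An upper bound of None stands for "no maximum that I know of," in which case the top of the axis falls back to the largest value in the data.

```python
# Hypothetical lookup table of natural ranges, taken from the examples in
# this post. None as an upper bound means "no known maximum".
NATURAL_RANGES = {
    "percentage": (0.0, 100.0),
    "accuracy": (0.0, 1.0),
    "minutes_in_hour": (0.0, 60.0),
    "age": (0.0, None),
}


def y_axis_limits(quantity, values):
    """Return (bottom, top) for the y-axis of a graph of `values`.

    If the quantity has a natural range, use it; if it has no natural
    range at all, fall back to the span of the data.
    """
    if quantity not in NATURAL_RANGES:
        # Nothing known about this quantity: use the data's own span.
        return (min(values), max(values))

    low, high = NATURAL_RANGES[quantity]
    for v in values:
        # A value outside the natural range means the data or the label is wrong.
        if v < low or (high is not None and v > high):
            raise ValueError(f"{v} is outside the natural range of {quantity}")

    top = high if high is not None else max(values)
    return (low, top)


print(y_axis_limits("accuracy", [0.81, 0.88]))   # (0.0, 1.0)
print(y_axis_limits("percentage", [80, 88]))     # (0.0, 100.0)
```

With a plotting library such as matplotlib, the resulting pair is the sort of thing you would hand to the axes' set_ylim method, rather than letting the library pick a range from the data.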

Why do I say this?  Where the hell do I, an English major and an acknowledged innumerate, get off telling anybody how to draw their graphs?  Who the fuck do I think I am–some Strunk and White of data visualization?  No, not at all.  You use y-axis ranges that reflect the natural range (if there be one) in your data for two reasons.

Reason number 1: Using a natural range helps your reader understand what you’ve found.  The top and bottom of the y-axis are visual landmarks that your user uses to get the instinctive feel that graphics are so good for about the size of whatever effect it is that you’d like to convince them that you found.  (English note: I don’t have a clue about whether your reader is male or female, so in my dialect, you use the “generic plural” them to refer to him/her/it.)  Not using a natural range for your y-axis is failing to help your reader get your point.

Reason number 2: Not using a natural range can be completely misleading.  I’ve never seen anybody use an enormously larger range than they should have.  No: when people don’t use a natural range for the y-axis, they’re using a smaller range than they should have.  What is the effect of using a smaller range than you should?  It makes things look more different from each other.  Why does looking more or less different matter?  Because science is typically about finding the differences between things.  These cancer cells got treated with Drug A and didn’t die, these cancer cells got treated with Drug B and did die–that’s a difference that will save some lives.  What happens when your y-axis doesn’t show the full range of possible values?  It makes it look like there are differences where there really aren’t any differences.  It’s misleading, and if you’re going to claim that you’re doing science, you should avoid misleading people.

Yes: I am ranting.  In fact, this is the kind of thing that I like to rant about when I teach.  (Well, I don’t like to rant about it–it irritates me, and I would prefer that it not exist.  But, rant about it I do.  (See the end of this blog post for an explanation of that weird construction rant about it I do.))  But, somehow, when I’m teaching, I never seem to have an example handy, and I sometimes wonder if my students think that I’m ranting about a phenomenon that doesn’t actually exist.  (I’m pretty sure that they think that when I rant about man-eating rabbits, but I haven’t actually asked.)  So, when I happened to come across a stellar example today, I took a screen shot of it.  I’m not going to tell you what the source was–to protect the guilty.  Check it out.  What do you think the point is of those columns labelled ours?  It’s that their numbers are bigger than the other guys’, and therefore better.  But, are they, really?

Graph with a y-axis range from 80 to 88 when it should be 0 to 100.

I think not.  Are their bars higher than the other guys’?  Yep.  But, look at the values on the y-axis–the range is only from 80 to 88.  They’re graphing something called F-measure, which has a range of possible values from 0 to 100.  (Actually, it’s from 0.0 to 1.0, but I’ll let that go for the moment.)  Their crappy y-axis range makes it easy to think that there’s some interesting difference there, but actually, there’s barely any difference at all between their score and the next-best score–that tiny difference means essentially nothing, and is more likely to be statistical “noise” than it is to reflect anything real.  Note that these guys could have made the range be from 80 to 89 just as easily, but making it go from 80 to 88 means that if you don’t look closely, it looks like they have a perfect score–the orange bar goes all the way up to the top of the graph, right?
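To put a rough number on how much that 80-to-88 axis exaggerates things, here is a small sketch in Python.  The two scores are hypothetical (I am not giving you the real ones, for the same reason that I am not giving you the source), and the function names drawn_height and apparent_ratio are mine.  The idea: a bar's drawn height is the fraction of the axis that it covers, so the ratio between two drawn heights tells you how much taller one bar looks than the other.

```python
def drawn_height(value, y_min, y_max):
    """Fraction of the plot's height that a bar for `value` covers."""
    return (value - y_min) / (y_max - y_min)


def apparent_ratio(a, b, y_min, y_max):
    """How many times taller the bar for `a` looks than the bar for `b`."""
    height_b = drawn_height(b, y_min, y_max)
    if height_b <= 0:
        raise ValueError("the bar for b has no visible height on this axis")
    return drawn_height(a, y_min, y_max) / height_b


# Hypothetical F-measure scores, on the 0-to-100 scale used in the graph.
ours, theirs = 88.0, 86.0

truncated = apparent_ratio(ours, theirs, 80, 88)   # axis as published
full = apparent_ratio(ours, theirs, 0, 100)        # natural range

print(f"80 to 88: ours looks {truncated:.2f} times as tall")   # 1.33
print(f"0 to 100: ours looks {full:.2f} times as tall")        # 1.02
```

Under these made-up numbers, a two-point difference looks like a one-third advantage on the truncated axis and like a 2% advantage on the natural one.  Note also that drawn_height(88.0, 80, 88) is exactly 1.0: a score of 88 fills the whole plot, which is the "perfect score" illusion.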



