Academia, industry, and graduate degrees: the speech/language technology version

People often ask what the job situation in language processing and speech technology is like for people who don’t have PhDs. Here’s my perspective.

No French language stuff in this post–it’s a response to a question that I get fairly often from people who are thinking about leaving graduate programs in natural language processing or speech technology.

I was recently asked: What’s it like to work in the speech and/or language technology industries with a master’s degree?  To understand the question, you have to realize that the alternative would be to work in the speech and/or language technology industry with a PhD.  People in these fields will typically have one graduate degree or another (or both)–the question relates to differences in how you will experience the industry world depending on which of the two you have.  To understand the context of my answer, it’s probably helpful to know that I got a master’s degree, went into industry, went back into academia with a master’s degree and became a researcher, and then got a PhD.  So, I have a little bit of familiarity with academia and with industry, as well as with the experience of working when you have a PhD versus working when you don’t.

The short answer: there isn’t necessarily any difference between what it’s like to work in industry with a master’s degree and what it’s like to work in industry with a PhD.  If you can take a problem and come up with an answer by yourself, and the answer works; if you can identify useful new problems; and if you can propose, implement, and design the evaluation for some project yourself, then you’re going to be treated like a PhD, and if you negotiate reasonably well, you’re going to get paid like one.  Now, the same is actually true in academia, the primary difference being that with a master’s degree, you’re very likely to be working in someone else’s lab and writing grants for someone else, versus having your own lab and writing grants for yourself. (I did this for quite a while, and I loved it.)  This is also true of working in a private think tank like MITRE or BBN, except that you can write your own grants in places like that, at least for internal funding–external agencies like NSF and NIH are not very likely to fund you if you don’t have a PhD.

A longer answer: in industry, having a master’s degree versus having a PhD is likely to affect the position into which you are first hired, but it won’t necessarily have an impact on what your job is like.  Companies that do speech and/or language processing work are accustomed to using the level of your graduate degree as an indicator of how likely you are to be able to do independent work when they’re considering hiring you, but once you’ve talked your way into a job, they generally care much more about the results that you do (or don’t) produce than they do about your academic pedigree.  Here’s an example.  A friend’s ex-husband tried to get a PhD in computer science, failed his comps, and then went into the robotics industry (a good example of a field that’s very oriented towards writing code and towards engineering, but that also has a real need for research in addition to development).  He did quite well in industry, eventually starting his own company, which he sold for “major bank,” as the kids say (the expression means “a lot of money”).  My friend described his experience like this:

His fear was that he’d never be able to be the principal investigator on a grant (still the case, I believe) and that he wouldn’t be taken seriously.  I think that was the case initially, too…certainly in terms of starting position and compensation.  He had to prove himself once in the door and build credibility that was conveyed more automatically with a PhD (though I’m sure a PhD could lose that credibility if they didn’t prove worthy).  At this point, I don’t think he’s in a different place than he would have been with a PhD.  He just had to have more to offer to get those initial doors open than he would have with a PhD.

A master’s degree won’t keep you from rising through the ranks of the industry world, and a PhD won’t keep you from getting fired–I’ve certainly seen both of these happen.

You’ll find that what you learnt in graduate school can be super-helpful in industry.  The opposite is also true–the time that I spent in industry was the best thing that ever happened to my academic career.  Another opposite is true, too–the things that you learn in grad school can hurt you in industry.  I’ll give you some examples of both.

Things you learn in graduate school that can be helpful in industry: the ability to define a problem, state it clearly, figure out how to evaluate it, and communicate what you’ve done are super-useful in industry (and in life in general!).  In fact, I suspect that a lot of my success in industry (to the extent that I had it–objectively, I have been offered a promotion in every industry job I’ve ever had, which I guess is a sort of success) was related to the fact that if I had an idea, I could communicate it more clearly than pretty much everyone else–not because I’m smarter than anyone else, but because I have a degree in English (among other things), and in the process of getting that degree, I learnt to write (reasonably) clearly, and quickly.  All other things being equal, a well-crafted email will generally trump a crappily written email.

Things you learn in graduate school that will hurt you in industry: this may be very culture-specific, but the picture that I got as a student in my linguistics graduate program was that you need to stake out a position and then defend it to the death, and you don’t get to change that position very often.  Again, this may be specific to the culture of linguistics, or even just specific to the culture of linguistics at the time when I was in graduate school–certainly the community around Chomsky and the subfield of linguistics in which he specialized (syntax) is pretty notorious for brutal fights around theoretical issues.  Be aware that this is an attitude that will hurt you in industry.  Theory can be important in industry–often a company has a stake in some particular kind of approach to a problem, at least in very broad terms–but theoretical purity per se is not typically valued in industry.  In fact, probably the opposite is true: the path to success in industry lies more in humility about your ideas and a willingness to seriously consider the other person’s take on things than it does in defending whatever your take on a question happens to be.  Industries that don’t do this get overthrown, companies that don’t do this fail, and engineers who don’t do this get fired.

From the perspective of quite a few years of doing natural language processing and computational linguistics in an environment that has a hell of a lot more physicians and biologists in it than it has computational people, I’m starting to wonder if this isn’t just a matter of cultural mores, but of differences in philosophy of science.  The classic linguist’s philosophy of science is that of Thomas Kuhn, where the conception (if I understand it correctly, which is not a given) is that science advances when the old ideas collapse under the weight of their clearly stupid inadequacies, and the new ideas succeed by being brilliant, and right, and new–even sui generis.  In contrast, you could say that the industrial world is underlyingly more influenced by the philosophy of science of Karl Popper, where the idea (again, if I understand it correctly, and again, that’s not a given) is that science proceeds only by falsifiability.  On this model, you should be happy if your hypothesis is not supported, because now you really know something, and you can move forward.  I’m not claiming that this happens universally in daily life in industry–you bet you will run into people who will get pissed if they’re questioned about the approach that they’re taking to something, or if the testers uncover a bug in their implementation, or whatever.  But, underlyingly, in industry you want someone to find your problems–before the customer does.  I may be overthinking the issue with respect to my philosophy of science explanation–Chris Brew, who knows both the academic and the industry computational linguistics and language processing worlds very well and has been a long-term mentor of mine, sees it like this:

I’m not totally clear on the “why” of the bad thing that linguistics graduate school teaches you. There is certainly a mismatch in culture, and humility and listening is rewarded in industrial settings. But I think the central factor is not philosophy of science but willingness to step into the other person’s shoes and awareness that their perfectly valid priorities may not be your priorities.  Laser-like focus on a personal agenda also sucks in academia, but is more likely to be tolerated if you are somehow brilliant and high-status.  “Not a team player” is the standard industry complaint about ex-academics.

Things you learn in industry that will help you in graduate school: see the entire preceding section.  As far as I can tell, picking a position and sticking to it come hell or high water won’t actually help you do better science in academia any more than it will help you build good software in industry.  A little humility about your ideas can go a really long way towards helping you understand whatever it is that your science is about understanding.  There are really practical things that industry will help you with once you get back to graduate school, too (or even if you don’t).  One of these is deadlines.  Industry and academia are both pretty deadline-driven, and that was never apparent to me in graduate school, or at least not until it was too late.  Some time in industry helped me understand the role of deadlines, and also helped me develop methods for making sure that I never missed them–methods that worked for me, at any rate.  Another practical thing is the importance of all of those things that they teach you in software engineering classes–documenting your code, testing your code, defining requirements at the beginning of a project (or having a solid plan for how you’ll do it iteratively throughout the project, if you do something like the Agile method of software development), testing your code, taking usability issues very seriously, and testing your code.  For me, thinking a lot about testing my code led to me thinking a lot about how similar the theoretical bases of software testing are to the theoretical bases of linguistics, and that ultimately led to a bunch of publications on the subject and to me doing my dissertation on approaching software testing as a problem in descriptive linguistics.

Another important lesson to take back to academia from your time in industry is the importance of edge cases and the phenomena in the “long tail.”  You probably know the joke about what happens if you ask a phonetician, a phonologist, and a syntactician if all odd numbers are prime–the phonologist says “one is an odd number and one is prime, 3 is an odd number and 3 is prime, 5 is an odd number and 5 is prime, 7 is an odd number and 7 is prime, 9 is an odd number and–9 is not prime, but if we say that it’s not prime, then we miss the generalization that a lot of odd numbers are prime, so let’s just assume that 9 is prime.”  In linguistics, we tend to like generalizations, and generalizations by their nature tend to cover most of the data points in question, not just a few of them, or they wouldn’t be generalizations.  The infrequent phenomena that don’t fit our analysis but that don’t seem to be very central, we tend to leave to the side, at least in non-empirical approaches to linguistics.  This does not fly in the industry world, ever.  You have to take care of every case, and that includes the special cases, and if you need some special code for them, some special ad hoc solution for them, then that’s just the way it is.  I don’t care if your F-measure is 0.98 or your word error rate is not statistically significantly different from zero: show your product to a potential customer, and the first thing they’re going to do is to ask it to run on their name, or the name of their company, or their birthday, and if that doesn’t work, then you will be shown the door.  I’ve seen this over and over: in industry, you have to account for everything.

It turns out that this is a good attitude to take back to academia with you.  I’m reminded of an anecdote involving the phonologist Michael Broe.  I once saw him give a talk that was focused on an analysis of “regularly irregular” forms in some language or another–that is, things that are irregular, but that are irregular in a way that is similar to the way that some other things are irregular.  Think about mouse/mice and louse/lice in English, or the very few French verbs in the -ir class that take the same present indicative inflectional morphemes as -er verbs, or what have you.  Michael was asked why he was bothering to work on these regularly irregular forms when they’re so uncommon in the language that he was interested in.  I’ve never forgotten his answer: It depends on what you think the goal of phonology is–is the goal of phonology to understand patterns in sound systems, or is the goal of phonology to understand frequent patterns in sound systems?  Many people in my field have some story like this: a physician or a biologist asks you to build a system to do something or other.  You build one, and you have an amazingly high F-measure, or an amazingly low word error rate, or whatever.  You proudly demo your system for the physician or the biologist.  On the entire computer screen, there are tons of correct outputs, and one fucking error.  The physician or biologist points at the error, shrugs, and says, OK, let me know when it works.  If you want your research to actually fix some problems in the world, and you want those solutions to be taken seriously by the people who might actually have a use for them, then you need to think about those edge cases, those exceptions, those rare events, even if taking care of them means sacrificing some purity of theory or some elegance of design.  Those edge cases, “exceptions,” and rare events may well be the things that uncover the flaw in your theory, and you should be very happy to come across them.

This little essay started with a very specific question: What’s it like to work in the speech and/or language technology industries with a master’s degree?  I want to generalize it a bit, and answer a broader question: What’s it like to work in industry with a master’s degree?  The short answer to this more general question is the same as the short answer to the more specific one: which graduate degree you have doesn’t necessarily make any difference with respect to what it’s like to work in industry.  However, the long answer is somewhat different.  This is probably counter-intuitive to academics, but in industry in general, it can actually be easier to get a job with a master’s degree than with a PhD.  Now, I’m talking here specifically about high-tech industries where you’re basically being hired to write computer programs of one sort or another, as opposed to jobs that are closer to the research and development end of the continuum of high-tech jobs.  In these environments, having a master’s degree or not isn’t necessarily considered important, one way or the other: they want to know whether you have the technical skills that they need and whether they think you’ll be non-painful to work with, and that’s about it.  On the other hand, having a PhD is often not looked on kindly by private companies.  Your potential future co-workers may be (and, in my experience, often are) quite suspicious of people with doctorates, suspecting that they might be strong on theory but weak on implementation, and, worse, rigid defenders of whatever their position happens to be, unwilling to seriously consider alternate approaches.  I didn’t just make up the phenomena that I described in the paragraphs above!  Now, this is not true of the speech and/or language technology industries, where there’s a long tradition of interesting, innovative, and successful work coming from PhDs.  But in the broader industrial world, the skepticism-about-PhDs phenomenon is widespread.





Getting shot in the leg with a .22 is better than being hit in the head with a 2×4

English has a number of words that are made of numbers. Here are some of them.

No French in this post–this is all about obscure English vocabulary that you can bet Zipf’s Law will bring into your life sooner or later.

I recently wrote a post about what we call in the US 3x5s (pronounced “three by fives”), and that got me thinking about words in English that are formed in similar ways.  There are a number of them, and if you can use them, it will definitely add an American flavor to your English.

  • 2×4 (pronounced “two by four”): a kind of wooden board that measures about two inches by four inches, and about six feet in length.  In America, 2x4s are commonly used in the construction of homes and the like.
  • 4×4 (pronounced “four by four”): a kind of truck or similar vehicle that can provide power to all four wheels simultaneously.  (More traditionally, cars would only power the front axle or the back axle.)
  • 8×10 (pronounced “eight by ten”): a particular size of photographic print, measuring eight inches by ten inches.
  • 24/7 (pronounced “twenty-four seven”): absolutely constantly.  24 hours in a day, and seven days in a week, so 24/7 is all the time.
  • 7-11 (pronounced “seven eleven”): originally the name of a convenience store that was once open from 7 AM to 11 PM.  Today you can use it to refer to pretty much any 24-hour convenience store, I think.
  • 69 (pronounced “sixty-nine”): a verb referring to a specific sexual act.  Consider the relationship between the numbers 6 and 9 and you can probably figure it out for yourself, which will save me from feeling like I have to put a trigger warning on a blog post about numbers.
  • soixante-neuf: same thing.  Yes, we can use it in English, and if you’re sleeping with people who are educated enough to know what it means, then you probably already know what it means yourself.
  • 10-4 (pronounced “ten four”): I heard you, I got your message; there’s also some implication that you agree.  Often heard in the contexts 10-4, good buddy, or that’s a big 10-4, or, if you wear a cap with the name of a feed company on the front, or just listened to a lot of AM radio in the 1970s, that’s a big 10-4, good buddy.  That’s how we said it when I was a little tyke, at any rate.  OK: a teenager.
  • .45 (pronounced “forty-five”): a kind of pistol, known for its “stopping power”–that is, if someone is charging you, a shot from one of these things will keep them from moving forward if it hits them.  The projectiles are short, but very big around, and heavy.  It’s not very accurate at a long distance, but at a short distance, it’s very effective at what it’s intended for.
  • .32 (pronounced “thirty-two”): another kind of pistol.  They’re also not very accurate, as they usually have a pretty short barrel, and they really don’t have any use other than killing people at close range, as far as I know.
  • .36 (pronounced “thirty-six”): another kind of pistol.  They sometimes have longer barrels, in which case you can use them to kill people at close range, and also a bit further away.
  • .25 (pronounced “twenty-five”): another kind of pistol.  I don’t recall ever actually seeing one.
  • .22 (pronounced “twenty-two”): another kind of pistol, and also a small-caliber rifle.  The bullet is quite small, and unless you get shot in some place really vital–head, heart, an artery–it may not do that much damage.  On the other hand, I did once see a young guy who shot himself in the head with one of these–he didn’t die immediately, but he sure as hell died eventually.  Again, there is not much that’s actually useful about these things…

You can use the number-by-number construction productively (in the linguistic sense of the word productive, which means that the construction can be used to produce new things) to talk about the sizes of pieces of wood in general.  However, if you’re not talking about 2x4s specifically, the context needs to be clear if you want to be understood.  The assumption is that the pieces of wood in question will be 6 feet long unless otherwise specified, so if you ask for a 3×6 (pronounced “three by six”) in a lumber yard, people will know what size board to give you.   For a version of this kind of construction that is also productive, although quite obscure, see this entry from the Urban Dictionary for a description of how it is used to refer to pairs of male characters in the Gundam Wing anime series.

Having gotten the basics out of the way, here’s a useful expression: to get/be hit with a 2×4.  You know what a 2×4 is by now–a solid wooden board.  If someone smacks you upside the head with it, you will have been smacked really, really hard.  To get/be hit with a 2×4 means to be stunned by something that you’ve learnt.

Here are some real-life examples.  This woman wrote on her blog about needing to be forced to face the facts that she was (a) eating too much, and (b) not exercising enough:

Picture source: screen shot of

This blog post suggests that if you “get hit with a 2×4,” the answer is to just surrender to whatever God’s wishes for you might happen to be:

Picture source: screen shot of×4/7_8_13-what-to-do-when-you-get-hit-with-a-2×4/.

Here’s a story from the Washington Post (a reputable and very famous American newspaper known for its coverage of national politics) about Chris Christie, describing the experience of the explosion of the Bridgegate scandal during the period before his unsuccessful run for the Republican presidential nomination:

Picture source: screen shot of×4/.

Here, a bigwig of the investment world talks about the effects of getting two pieces of bad news about the financial world, one right after the other:

Picture source: screen shot of

So, now you see why it would, in fact, probably be better to get shot in the leg with one of those tiny little .22s than to get hit in the head with a 2×4.  Native speakers of English (or non-native speakers who just like to collect funny words), do you have any other all-number words to add to the list?

Full-screen and coffee breaks: conferences in France

There is essentially nothing that I do in France that doesn’t involve an encounter with Zipf’s Law.  One thing that I find quite useful here in France is to go to talks, conferences, and what are called journées–literally “days,” but in practice, a day-long mini-conference on some subject or another.  It’s a good way to learn the technical vocabulary of my field in French, and also to have casual conversations with my peers about it.  The other day, I went to a journée on natural language processing (what I do for a living) and artificial intelligence.

As far as I can tell, French researchers (at least in my field) primarily publish in English.  My field is much more oriented around conference papers than around journal papers–our conferences are peer-reviewed and often quite competitive, while our journals are more oriented towards essentially archival coverage of long-term research projects.  So, the latest and greatest research shows up in conferences, not journals.  The conference papers are published, and they’re cited quite a bit more than journal articles.  Although my French colleagues do primarily publish in English, there are also French conferences and journals in my field.  The French conferences and journals ask for papers written in French from Francophones, but allow non-Francophone scientists to submit work in English.  Being able to read some French has opened up quite a bit of stuff to me that I wouldn’t otherwise have been able to read (and cite).  I especially enjoy some of the work in French on lexical semantics; it isn’t necessarily any different in terms of topics, approaches, or the flavor of the results, but some of it is written so much more clearly than similar stuff that I’ve read in English.

One thing that still surprises me about French conferences is that during the question-and-answer period after a talk, the speaker and members of the audience address each other as tu, using the informal pronoun.  You can read more about this phenomenon of French conference participation here, along with some speculations about where it comes from.

For official purposes, the French system often differentiates between French conferences and what the paperwork refers to as “international” conferences, which in practice seems to mean any conference outside of France.  (That’s not obvious–for example, a conference in Germany, attended primarily by a local audience, apparently would count the same as a conference like the Association for Computational Linguistics annual meeting, which is attended by people from all over the world.  I suppose that evidence of an international reputation is, indeed, supplied by presentation of your work anywhere outside of your home country.)

Just following the schedule gave me trouble, which doesn’t exactly make me feel bright.  Here are some really basic words that I came across in the course of the day:

  • la pause-café: this is what gives as the translation of “coffee break.”
  • plein écran: full screen.

How to sound French: March 2016 edition

This winter, the expression on everyone’s lips seems to be pas mal.  We learn this in college as meaning not bad.  However, colloquially it can also mean something like a lot.  In fact, I’ve been hearing it quite a bit even from a friend who prefers the 17th-century language of Molière to the language of today and doesn’t speak very colloquially at all.  You can find good examples of how to use pas mal in this sense on Laura Lawless’s Lawless French web site.  Here are some more:

“I saw Manon this morning and we talked a lot it’s cool” Picture source: Twitter screen shot.

If you use pas mal to modify nouns, it’s a quantity term, and is followed by de:

“I think that we have a lot of things in common” Picture source: Twitter screen shot.

Be aware that even the expression as we learn it in school–that is, with the meaning not bad–can be difficult to understand, with the intended meaning being conveyed in part by intonation.  There’s a very nice video on the intonational subtleties of the expression here, on the Français avec Pierre YouTube channel.

Spleens, 3x5s, Molière, and French grad students

3x5s or index cards. The name “3×5” comes from their size, which is 3 inches by 5 inches.

My morning routine includes studying French vocabulary, which means flash cards.  I make my flash cards from what we call in English index cards or 3x5s (pronounced “three by fives”–they take that name from the fact that they normally measure 3 inches by 5 inches).  Recently I’ve been amusing my younger coworkers by sharing my current vocabulary flash cards, and I have been impressed beyond belief by the breadth and depth of the English vocabulary that these kids have.  “Talon”?  No problem.  “Greenhouse”?  They’ve got it.  Yesterday I ran into the word rate, “spleen,” in Le malade imaginaire, a 17th-century play by Molière.  One of them explained to me the various ways in which the English word “spleen” can be translated into French.  The word has at least five meanings in English.  The most common meaning is the internal organ that most vertebrates have, located on the left side in humans near the stomach and playing a role in a variety of processes, including ridding the body of old red blood cells and being involved in the immune response.  Another meaning, which is not nearly as common but is still found in the language, is given by Merriam-Webster as “feelings of anger or ill will often suppressed.”  These two meanings get split into two different words in French.

  • la rate: spleen (the internal organ).
  • le spleen: melancholy or ennui, and archaically, the same anger-related meaning as in English.
The spleen, or la rate. Picture source: Public Domain,

It was quite impressive to hear a computer scientist explain the 17th-century meaning of the word–that’s not something that I would expect an American computer science grad student to be able to do for the English equivalent.  I’ve been reading Molière, and apparently she has done so, too–again, I wouldn’t expect an American computer science grad student to be familiar with Shakespearean vocabulary.

It amazes me that there seems to be no French equivalent to the 3×5.  (I saw friends exchange the sidelong glances that I so often inspire here by accidentally saying inappropriate things when I referred to index cards by the Canadian term, fiches vierges–it can mean “blank cards,” but also “virgin cards.”)  I only survived my education by using these things to obsessively memorize pretty much every term, equation, and random fact that I was taught.  Considering the very demanding nature of the French educational system, I’m baffled by how French students manage to pass the exams that are required to progress through the system without some equivalent of index cards.  I throw several packs of them into my luggage every time that I come to France, and can’t imagine learning as much as I do without them.

No clever title about the bombings in Brussels, but here’s some relevant vocabulary

Scene from Zaventem airport in Brussels after the bombings of March 22, 2016. There is some concern that with the security in restricted areas of airports as high as it is now, terrorists will now start attacking the public areas, as they did today. Picture source:

The radio show that I listen to in the mornings (Les matins de France Culture) starts with the various and sundry reporters going around and saying a few words about what they’ll be talking about.  Yesterday one of the reporters said this: I’ll be talking about the attacks in, um…in, um…well, there are so many of them.  [Nervous chuckle.]  This morning I woke up to the news of the latest attacks in Brussels: two bombings at the airport, then one at a metro station.  All major European capitals are on heightened security at the moment, especially at transit points, but other than that, planes are flying (except to and from Brussels), the trains are moving (except in Brussels, where all subways, buses, and trams are shut down, and people are being advised to stay at home), etc.

It’s depressing to note that I now know most of the words in any given story about a terrorist attack.  However, Zipf’s Law never really goes away, so here are some words from stories about this morning’s bombings.  For the full story from the source, click here.

  • survenir: to occur, to arise.

 Une explosion survenue dans le métro, à la station Maelbeek, aurait fait une dizaine de morts et de nombreux blessés.

“An explosion that occurred in the metro, at the Maelbeek station, reportedly caused about ten deaths and numerous injuries.”

  • réaffirmer: to reaffirm.

Nos pensées vont naturellement aux victimes, à leurs proches ainsi qu’à l’ensemble des autorités belges auxquelles nous réaffirmons notre solidarité.

“Naturally, our thoughts are with the victims, with their loved ones, and with all of the Belgian authorities, with whom we reaffirm our solidarity.”

  • faire le point sur: to take stock of, to review.

Nous venons de faire le point sur notre dispositif en place aux frontières et dans les transports.

“We have just reviewed the security measures that we have in place at the borders and in the transportation system.”

  • rehausser: to raise, to boost.

Nous n’avions pas attendu cette attaque pour réhausser notre niveau de sécurité.

“We didn’t wait for this attack to boost our level of security.”  Note: the news story spells the word réhausser, but the standard dictionary spelling is rehausser.


How we’re sounding stupid today: #JeSuisCirconflex

“The last day of the fast.” Picture source: picture by me of an advertising poster in the train station.

The military has this problem.  People transfer from one “duty station” to another fairly often, and you need to be able to get them integrated quickly–you can’t have someone taking up unproductive space on a ship or on a base for very long.  The US military has gotten this integration process down to a science.  Basically, when you show up at a new command, you’re given a check-sheet.  You take it around to various and sundry places–the medical clinic, the pay clerk, the base library, etc.  The people who work there do whatever has to be done to get you integrated into the unit.  They sign your check-sheet, and you go on to the next place.  It takes maybe two days to get totally set, and then you’re productive.

The place where I work when I’m in France has a similar system.  By now, je connais déjà cette musique–I know the drill–and I can usually get all of my administrative stuff done the first day back in the lab.  There’s only one problem: I have to successfully pick up the check-sheet.  The issue is that it’s called a feuille jaune–a “yellow piece of paper” (it is indeed a piece of paper, and it is indeed yellow), and I constantly mess up and ask the administrator for a feuille jeune.  Only a one-vowel difference, but it means “young piece of paper,” not “yellow piece of paper.”  This gets me confused looks, or by now a smile.

I was reminded of just how ambiguous this really is on the way to work this morning, when I saw the poster that you can see at the top of this post.  What’s interesting about it is the word jeûne, which is pronounced the same as the word jeune, but spelt differently–notice the circumflex accent in the former.  As I said, they’re pronounced the same, but jeune (no accent) is “young” or “youngster,” while jeûne (with accented û) is a fast (that is, when you don’t eat).

The pair of words has been much in the news lately.  The issue here is that the French government will be instituting a spelling reform at the beginning of the next school year.  Among other derangements of the current system, some words with circumflex accents will be losing them.  There is a major Twitterstorm about this.  One funny tweet that I read pointed out that the circumflex accent on jeûne (a fast) is the only difference between je vais me faire un petit jeûne (I’m going to take a little fast) and je vais me faire un petit jeune (I’m going to have myself a little youngster).

Now, “fast” is a perfect Zipf’s Law sort of word: it certainly is not common, but it also is certainly not particularly weird in any way–any native speaker knows it.  I had never run into jeûne before the Twitterverse went crazy about the spelling reform, and in fact, that’s how I learnt it.  Now it’s a couple of months later, and there it is: right there in my face as I went to work this morning.  Zipf’s Law!

  • jaune (adj.): yellow.
  • le jaune: scab, strikebreaker.
  • le/la jeune: young person.
  • le jeûne: fast, fasting.


Fun (and possibly useful) facts about the Paris metro

Musician playing an interesting variant of the violin on the RER B train to Roissy. Picture source: me.

In a previous post, we saw some basic principles of politeness on the Paris métro.  Here are some other fun facts about the subway experience in the City of Light.

  • There’s a whole genre of panhandling that takes place on the métro and on Paris-area trains.  Someone will get into the car and declaim a speech.  “Declaim” might not be the best verb here, as it’s typically done in something of a monotone.  It typically begins something like this: Mesdames et messieurs, je suis désolé de vous déranger pendant votre trajet.  “Ladies and gentlemen, I am sorry to disturb you during your journey.”  It then proceeds along the following lines:
    1. I am homeless/I have 5 children/I have lost my job and am unemployed.
    2. I need a place to stay tonight/food for my children/money to pay my rent.
    3. I would appreciate money/lunch tickets/a few sous.
  • The mendicant then walks through the car holding out his hand (it’s primarily men that do this) or a paper coffee cup.  If it’s a young street person carrying a backpack and a leather jacket, he’s probably not going to get much.  If it’s an old person, a few people will give him money.  Then he gets off at the next stop and gets on another car.  Linguistic note: it has taken me over a year and a half to be able to understand one of these guys.
  • Musicians on the subway and on local trains are a real thing, not just a cute cliché.  Sometimes they will be playing the accordion that we’ve all seen a thousand times in the movies, but they also may sing, or play any of a variety of musical instruments.  You’ll see this not just in touristy areas, but even on the train that I take out to the suburbs with tons of other people on their way to work or to the various and sundry universities south of Paris.  I would guess that they are organized in some way, as the majority of them show up with the same loudspeaker and canned background music.  (See above for one of the more bizarre things that I’ve seen in this respect.) I encourage you to give these folks some money–they’re out there working for a living, and they’re one of the things that give Paris its usually-wonderful ambience.  In addition, it sometimes takes some courage for these folks to do what they do, e.g. the guy who plays the recorder in one of the stations, frequently including Hatikvah (the Israeli national anthem) in his repertoire.  This is risky in a city in which anti-Jewish violence has recently been pretty severe.  One of my favorites is a guy who often sits in the Cluny-La Sorbonne station, west-bound quai, on Saturdays.  He alternates between playing the erhu, playing the flute, and singing truly impressive Chinese songs.
  • There’s usually an excellent map of the immediate neighborhood somewhere in a métro station.  However, its location is somewhat unpredictable.  Sometimes you’ll find it on the actual subway platform, and sometimes outside of the turnstiles.  Check before you leave the platform; if you don’t find it there, try again once you exit.
  • In his book Five Nights in Paris: After Dark in the City of Light, John Baxter claims that massive amounts of perfume are pumped into the métro system on a daily basis.  I can’t say that I’ve ever smelt it, but I did find a couple of articles from 1998 that are consistent with the claim.  (Here’s one.  Here’s another.)
  • It’s said that within Paris, you’re usually not more than 10 minutes’ walking distance from a metro station.  This is probably true.  If you’re near a metro station with multiple lines, you’re probably about 30 minutes door-to-door from any place in Paris.  Add a bit if you’re near a metro station that only serves one line.

Paris metro etiquette

If you follow Parisian rules of etiquette on the metro, your visit will go more smoothly. Here’s how to do it.

“He who travels with his back loaded removes his back pack in order to be less bothersome.” Picture source: photo that I took of a sign in a metro station.

If you look on question-answering web sites like Quora, or even just do a quick Google search, you’ll see many people asking this question: Why do Parisians hate tourists?  The answer: Parisians do not hate tourists.  On the contrary–Parisians are very aware that tourists are part of the life-blood of the city, and they are happy to have foreign visitors in droves.

However: there are definitely things that tourists do that can interfere with the flow of daily life here, and those things can be irritating.  A lot of those things happen on the métro, the Paris subway system.  Every morning, 2.5 million people get on that thing for their commute to work, and then they do it again in the evening.  In hopes of preventing you from being one of those tourists who irritate the locals, here are some notes on metro etiquette.  You’ll note here many instances of what I understand to be the basic principle of French behavior: don’t inconvenience the other guy.

  • Entering/exiting: When you’re waiting to enter a subway car or train, stand off to the side of the door so that you’re not blocking it.  People will exit through the center, and then you enter at the side, or through the center if it’s not obstructed.  (This is a really common rule for tourists to break, and it’s really irritating during rush hour when lots of people are trying to get off and into the cars.  Don’t be that tourist.)
  • Hold gates open: When you go through the gate, hold it open behind yourself for the next guy.  You don’t have to stand there and wait for him, but if there’s someone right behind you, this is the polite thing to do.
  • Luggage: Be considerate about trying to wrestle your big, bulky suitcase through the turnstile when there are lots of people trying to go through it–wait until traffic lets up, so that you’re not keeping everyone else from getting to their train.  Some stations also have a space next to the turnstile for luggage, so look for those.  Also note that you shouldn’t be taking your @#$% luggage on the metro anyway–see this post for reasons why that’s a bad idea.
  • Strapontins: Subway cars in Paris typically have a couple of folding seats right next to the doors, called strapontins, believe it or not (it’s the general word for a folding chair).  If the car is crowded, don’t sit in them–you will see French people who are using them stand up when a bunch of people enter the car.
  • Be quiet: If you hear someone talking or laughing loudly on the metro, they will probably be speaking English.  If you hear someone speaking on a cell phone, they will probably be a foreigner.  In general, Parisians tend to be quiet on public transportation.  See here for a funny story about what can happen if you’re not.
  • Backpacks off: If a car is crowded, take off your backpack–even if it’s a little one.  If you don’t, it’s super-awkward, both for you and for everyone else.
  • Offer your seat to pregnant/elderly people: Even kids from bad neighborhoods will offer their seats to an old person or a pregnant woman on the métro.  You should do the same.
  • Say pardon: You will often have to squeeze by a few people to get on or off of a crowded car.  The polite thing to do is to say pardon when you do so.
  • Don’t block the quai, stairs, or escalators–move to the right.  If you’re walking along the subway platform or going up/down stairs and escalators, stay to the right.  People will pass you on the left.  Similarly, if you need to stop and figure out which way to go, don’t stop in the middle of heavy foot-traffic–get out of the way while you get your bearings.

  • le strapontin: folding seat, jump seat.
  • le quai: platform, dock, quay.
  • le wagon: subway car, train car.


Don’t take the train from the airport to Paris

Any guidebook will tell you that you can take the train from the airport to Paris. What they don’t tell you is that you can–BUT, YOU SHOULDN’T.

Stairs at the Abbesses metro station. Picture source:

I happened to be on the metro on the way home from work during rush hour yesterday.  Onto the packed train climbed a young woman wearing an enormous backpack.  Her travelling companion was similarly encumbered, and was also carrying an enormous, ornately embroidered, blue velvet sombrero.

Their backpacks took up at least as much space as two additional people would have.  Furthermore, since they were wearing them on their backs, there was no way that they could move without smacking someone with 65 liters’ worth of stuff, and no way that they could maneuver out of anyone else’s way in those tight quarters.

You didn’t even have to be able to hear them to know what their nationality was–just by watching their mouths, it was pretty clear that they were Americans.  (Yes, the mouths of American English speakers move quite differently from the mouths of French speakers.)  They had probably read their Paris guide books closely–and they had been led astray by them.

Any Paris guidebook will tell you that you can get from Charles de Gaulle Airport (or Roissy, as the locals call it) to wherever you’re staying by taking the train into Paris and then switching onto the subway.  This is certainly true.  What the guidebooks don’t tell you is that even though you can, you should not do this.

There are two basic problems with the take-the-train-to-the-subway plan, and I see those two problems raise their ugly heads all the time.

  1. There are a heck of a lot of stairs in some of those stations.  I can’t tell you how often I have come across someone trying to struggle up–or down–a long flight of stairs in a Parisian metro station with a huge suitcase.  (Oh, there really ARE nice people in the world, said one old lady with an absolutely enormous suitcase who I found almost in tears at the top of a loooong flight of stairs in a metro station.  If I hadn’t recently been training to fight in Nationals, I don’t think that I could have carried that big honking thing down the stairs, either.)  Even if your plan is to take the train to someplace from where you can catch a taxi, versus transferring to a subway, you are not going to escape the stairs in the train station.
  2. Trains and metro cars can both be absolutely packed with people.  Want to get stared at with deep dislike?  Try to squeeze your suitcase with a week’s worth of vacation wear into a subway wagon filled with people jammed [trying to think of a non-vulgar way to put this] together like sardines already.  Enjoy the welcoming looks of all of the people on their way to/from work as they try to squeeze around your giant suitcase, which is now mostly blocking the door.  Roll your 49-pound suitcase over the toes of some nice Parisian woman’s $835 asymmetric-strap Manolo Blahniks.  Or, roll your 49-pound suitcase over the toes of some nice Parisian woman’s $10 Converse knock-offs after she’s just spent all day on her feet at her job as a cashier.  Or…well, the possibilities for pissing off people on a crowded train or metro car are endless, really.

Is that really how you want to start (or end) your vacation?  Probably not.  I recommend that instead, you spring for a taxi.  Due to a recently-passed law, the price of a taxi ride from Charles de Gaulle Airport into Paris proper is fixed: 50 euros to the Right Bank, 55 euros to the Left Bank.  You get in the taxi, the total price shows up on the meter, and that’s it.  After a looong flight across the Atlantic, this is the only civilized way to start your Parisian adventure.

I ate my dinner last night while mulling over what the heck that kid could possibly have been doing with that giant sombrero.  Using it to cover his eyes while he slept on the plane?  Using it to cover his entire body while he slept on the plane?  Bringing it as a present for some unsuspecting Parisian who couldn’t possibly have enough room in their tiny Parisian apartment for a Sombrero of Unusual Size?  Hard to say…

  • fourvoyer: to mislead; to lead astray.  The American tourists with their giant backpacks (and their giant sombrero) were led astray by their guide books.
  • se fourvoyer: to be mistaken, to get something completely wrong; to get lost, to stray from one’s path.  I do this pretty much all the time.