Becoming a computational linguist without double-majoring in linguistics and computer science

You’re an undergraduate, and you want to become a computational linguist? Here’s how to do it.

People who want to become computational linguists usually get a PhD in the subject.  Every once in a while, though, you run into someone who wants to study computational linguistics as an undergraduate.  In the United States, that means a student in what we call “college” and the rest of you call “university” (or, if you’re French, la fac).  Undergraduate students in the US have one, and sometimes two, “majors”–the topic in which they will do the most coursework, and whose name will appear on their official paperwork when they graduate.  To “double-major” is to have two majors, rather than the usual one.  It’s not super-unusual to do this–I had a double major, in English and linguistics–but it’s worth doing a double major only if it’s really necessary, as it’s a hell of a lot of work.

If you’re getting a bachelor’s degree and want to be a computational linguist, a double major in computer science and linguistics is probably overkill.  (Overkill discussed in the English notes below.)  The most efficient way to become a computational linguist would be to get a degree in linguistics in a department that has computational linguists on the faculty, such as the University of Colorado at Boulder, or Ohio State University. If you want to try to become a computational linguist in a university that doesn’t have computational linguists in any department: first of all, your major should probably be linguistics, not computer science—computational linguists are a kind of linguist, right? (They are—I’m a computational linguist, and I’m a linguist.) You’ll want to do some coursework in the computer science department, but I wouldn’t actually recommend even a minor in computer science—that will probably require you to take some courses that won’t be the most useful ones for you, while taking up time that you could have been using to take courses that would be useful for you.

What should those courses be?  As many as possible from this list:

  • Corpus linguistics (usually offered in the linguistics department, but if your university doesn’t have such a course in the linguistics department, look for courses in the social science, communications, or media departments, possibly with names like “content analysis”)
  • Statistics (best in a linguistics or speech & hearing department–the traditional psychology department or agriculture school courses will kill you)
  • Machine learning (usually offered in a computer science department)
  • Natural language processing (presumably not what you meant by “computational linguistics,” or you would have said so)
  • Automatic speech recognition, if and only if you seriously think that you want to work in this area (often offered in the electrical engineering department)
  • Speech synthesis, if and only if you seriously think that you want to work in this area (again, often offered in the electrical engineering department)

Notice what’s not on this list: programming courses.  Take those if you know that you need them, but if you don’t know that you need them, then don’t take them.  Notice that I also haven’t said anything about linguistics courses: we’re assuming here that linguistics is your major, and you’re going to get a solid and well-rounded background in that field.

English notes:

overkill: doing way too much.  Examples:

How I used it in the post: If you’re getting a bachelor’s degree and want to be a computational linguist, a double major in computer science and linguistics is probably overkill. 


You won't learn to speak another language unless…

You won’t learn to speak another language unless you’re willing to do two things: memorize a LOT of vocabulary, and…

You won’t learn another language unless you’re willing to do these two things:

  1. Develop the sitzfleisch to memorize a lot of vocabulary.
  2. Make a fool of yourself–over, and over, and over again.

The only known predictor of success in learning a second language is this: motivation.  There are lots of things that you have no control over whatsoever that can tip the odds in your favor a little bit–already being bilingual, having had exposure to native languages other than your own in childhood, being quite young when you begin–but, in the end, the only thing with enough of an effect to be predictive is having sufficient motivation.

What do you do with that motivation?  Everyone who’s successful at second language acquisition develops their own tricks.  But, there are two things that are essential–without them, it’s just not going to happen.  You must use your motivation to make yourself do two things:

A graph showing the Zipfian distribution of words in 30 languages on a log-log scale. Source: Wikipedia.
  1. Memorize an enormous amount of vocabulary.  Knowing a sufficient amount of the grammar of the language is necessary, but it’s having lexical items (words) to plug into those grammatical structures that makes the difference between being able to function in the language, or not.  And: if you’ve been following this blog for a while, you know that it’s a very basic fact about the statistics of language that you need to memorize not just the common words, but an enormous amount of rare words, too–because about 50% of the words that you will run into on any given day are going to be statistically of very low frequency.  (That’s the Zipf’s Law in the title of this blog.)
  2. You have to be able to tolerate feeling like an idiot.  Specifically, you must use your motivation to force yourself to take the opportunities that you get to practice the language that you’re trying to learn.
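
If you’d like to see what that first point looks like in numbers, here’s a minimal Python sketch.  It assumes an idealized Zipf distribution (a word’s frequency is proportional to 1 divided by its frequency rank) and a made-up vocabulary size of 50,000 words; real text only approximates this, and the function name zipf_coverage is just an illustrative choice.

```python
def zipf_coverage(top_k: int, vocab_size: int) -> float:
    """Share of running words covered by the top_k most frequent words.

    Assumes an idealized Zipf distribution: the word at frequency
    rank r occurs with a frequency proportional to 1 / r.
    """
    if not 0 <= top_k <= vocab_size:
        raise ValueError("top_k must be between 0 and vocab_size")
    # Relative frequency weight of each rank, from most to least common.
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    return sum(weights[:top_k]) / sum(weights)


if __name__ == "__main__":
    vocab = 50_000  # assumed vocabulary size, purely illustrative
    for k in (100, 1_000, 10_000):
        share = zipf_coverage(k, vocab)
        print(f"top {k:>6} words cover {share:.0%} of running text")
```

Under these assumptions, a small number of very common words covers a large share of what you read, while the rest is spread thinly over tens of thousands of rare words.  That long tail is why the vocabulary grind never really ends.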

I happen to know from experience that you can feel really stupid without (so far, at least, in my case) dying from it.

This month finds me in Wuhan, a city in roughly central China of 10 million inhabitants, essentially none of whom are occidentals.  There’s a charming aspect to this–people will literally ask me to pose for pictures with their children.  Fat, bald old me.  There being no occidentals here to speak of, the people in stores, restaurants, etc. rarely speak English, so studying Mandarin (or the local dialect, which is not mutually intelligible with Mandarin) is a necessity.

If you have not tried to live with just a tiny bit of a language, you might be surprised how little you can get by with.  For example, a couple days ago I had my first Mandarin conversation.  It involved dropping off my laundry, and went like this:

Me: Míng tiān ma? (Tomorrow?)

Nice laundry lady: Míng tiān.  (Tomorrow.)

Now, after a couple weeks of me struggling to communicate in Mandarin when dropping off my laundry, the laundresses du coin are a lot less nervous about dealing with a hairy barbarian and have progressed to giggling and trying to teach me new words.  Strangers: a different story.

Today I’m sunning on a terrace with a cup of coffee (a luxury here–that cup of coffee cost more than the large, delicious, and healthy meal that I had just eaten) when I notice a couple girls adjusting, readjusting, and re-readjusting their…berets.  Not a huge shocker, as the stereotypical Parisian tourist is now Chinese, but still: they were looking at each other’s berets, then looking at their own berets using their cell phone cameras in lieu of mirrors (yes, in lieu of is English), then looking at each other’s berets, then touching up their lipstick, and then starting all over again.

Obviously I needed a picture of this, but how to get it?  I mean, it’s not like you can go around taking photos of women you don’t know without risking an ass-kicking.  Ah–but, at 56, I am totally accustomed to making a fool of myself.  We have the following (one-sided) conversation:

Me (fat old bald guy, remember): Duì bu qǐ (‘excuse me’).  Patting myself on the chest: wǒ shì fǎ guó rén (‘I am French’–not true, but I am of French descent, so…).  Pointing at each of their hats: fǎ guó, fǎ guó (‘France, France’).  Then I mime taking a picture with my camera.

Them: Speaking to each other for a while, then looking at me like I’m insane, or an idiot, or both.

Me: Patting myself on the chest: wǒ shì fǎ guó rén (‘I am French’).  Pointing at each of their hats: fǎ guó, fǎ guó (‘France, France’).  Then I mime taking a picture with my camera.  (Yes, this is exactly what I said the first time.)

Them: Talking to each other again for a while, then they shrug at each other–and pose for a couple pictures.  

Me: Xiè xie (‘thank you’).

Them: Walking away in silence.  Do I want to know what they’re thinking?  Definitely not.

Now, bear in mind: the purpose of my relating this attempt at conversation is not to brag about how great my Mandarin is.  The opposite–the point is how bad (nonexistent, really) my Mandarin is.  And yet: I know that…

  1. …if I’m willing to keep making a fool out of myself, I might actually get comfortable with the language (in, say, SEVERAL YEARS), and…
  2. …if I’m not willing to keep making a fool out of myself, I will never get comfortable with the language, and…
  3. …making a fool out of myself did not kill me.  Embarrassing?  Yes.  Fatally so?  No.

Two beret-wearing Huazhong Agricultural University students being very tolerant of a fat old bald guy outside the Luckin Coffee cafe. Picture source: me.

So, the next time you’re trying to work up the courage to practice your language of choice, just remember this: at least you’re not a fat old bald guy like that funny-sounding Zipf fella.

English notes

sitzfleisch: The perseverance to just sit and plug along at a task.  I learned it from my master’s thesis advisor, who often pointed out that two hours in the library can save you four months in the lab–suddenly the word has become popular.  I have no clue why.

a couple: ‘a couple of.’  This is one of those things that other native speakers give me shit for saying.  What can I tell you–like my hero Tonya Harding, I’m Oregonian trailer trash.  And, yes–you should go see the movie.  It’s really good.

I am the walrus, Part I

Let’s do the obvious thing: talk about French vocabulary related to walruses.

It’s 4 AM where I am, and I’m awake and definitely not getting back to sleep, and for the first time in several weeks I have no looming deadlines, so let’s do the obvious thing: talk about French vocabulary related to walruses.

le morse: walrus

First of all: what are they?  From Wikipédia:

Le morse (Odobenus rosmarus) est une espèce de grands mammifères marins, unique représentant actuel de son genre Odobenus, ainsi que de sa famille, celle des Odobenidae.

  • le mammifère: mammal.
  • le représentant/la représentante: representative.

Marine mammals (mammifères marins) are anatomically unusual for a number of reasons, one of which is their teeth: in general, they tend to be homodonts, meaning that their teeth are all of the same kind.  Walruses have their tusks, which are very different from the rest, but the rest of their teeth are pretty much undifferentiated.  Here’s a photo of a walrus mandible–note that the teeth are all pretty similar:

Source: Mike Peel.

Here’s a nice in-your-face photo of the dentition of a more familiar marine mammal, the dolphin–note that they’re all the same:


…and another marine mammal, the orca or killer whale (go ahead and try to find a better picture than this of orca teeth without spending 15 minutes plowing through memorabilia of the movie Jaws–go ahead, I dare you…). Like the dolphin, this fellow is a total homodont–all of his teeth are the same:

Orca skull. Source:

…and compare those with the teeth of some non-marine mammals. Your garden-variety mammal is a heterodont, and has up to four kinds of specialized teeth: incisors, canines, premolars, and molars.


So, you compare a morse to your typical mammifère marin and they look well-endowed in the tooth variety department, but compare ’em to a primate or a feline and they look pretty impoverished.  And what are those tusks (défenses) for?

On a longtemps supposé que ces défenses étaient utilisées pour déterrer les proies des fonds marins. Mais l’étude de l’abrasion des défenses indique que celles-ci traînent simplement dans les sédiments lorsque le bord supérieur du museau est utilisé pour creuser, et qu’elles ne s’usent alors que dans leur partie supérieure. Les individus aux défenses cassées peuvent donc continuer à s’alimenter.

  • déterrer: to extract, unearth, dig up; to exhume.
  • le fond marin: seafloor.
  • traîner: a verb that never fails to fuck me up… I think that in this case it’s the sense of dragging (Je traîne la table dans la pièce voisine), or of hanging down to a lower level (Les rideaux traînent sur le sol de la salle).  I have a lot of trouble with traîner, which I associate always and only with what you should not do when there are zombies around (Traînez pas, y’a des zombies partout; sorry if the French is wrong–I just made that up).
  • le sédiment: …just ’cause I didn’t know about the accent, nor the gender.
  • creuser: another one of those verbs that has a thousand senses.  I think that this is the one that WordReference gives as “to dig,” although I think that it might be closer to “to furrow.”  Do you creuser a hole, or something longer in one direction than the other, like a sillon, or a creux, or a fossé? Native speakers?
  • s’user: …because this verb is so confusing for us poor anglophones: it means to get worn out, worn down, worn thin.
  • s’alimenter: …just ’cause it’s such a pretty verb, and I wanna remind myself to use it.

…and with that, it’s 5:20 AM, and my sleep deprivation is nearing psychosis-level, and I’m definitely not getting back to sleep, and my sleep deprivation is nearing psychosis-level, and I couldn’t get the pictures of walrus-calf teeth to upload (they have deciduous (“milk”) teeth, which makes for a very confusing picture, and how the fuck do you say “milk teeth” in French?), and my sleep deprivation is nearing psychosis-level, and we haven’t even gotten around to the walrus’s wrist structure, and my sleep deprivation is nearing psychosis-level, and je laisse à part les fièvres et les pleurésies, et…

Comment parler à un alien ?

Aliens land. How do you communicate with them? Read this book on language and linguistics in science fiction by Roland Lehoucq.

I got this message this morning via an email list for francophone specialists in natural language processing, the use of computers to do things with language.  If you’re a regular reader of this blog, you’ll probably find it interesting, and it has some grammatical constructions and vocabulary items that I don’t understand, so if you’re an anglophone reader, you might learn something from it, as I did… I’ve interspersed my comments with the text of the email, and the vocabulary notes show up at the end of the post, after the email.

 Date: Thu, 18 Oct 2018 11:25:52 +0200
From: Frederic Landragin

Chers collègues,

Le livre “Comment parler à un alien ? Langage et linguistique dans la
science-fiction” vient de paraître aux éditions du Bélial’, dans la
collection de vulgarisation “Parallaxe”, dirigée par Roland Lehoucq.

Is the family name Lehoucq composed of le + houcq? Not as far as I can tell—I haven’t found dictionary entries for houcq, houc, or houq.  If it is, indeed, so composed, apparently the h of houcq was an h aspiré, or we would see l’houcq, right??

Imaginez : les extraterrestres sont là ! Sur Terre. À côté de chez
vous… Et d’emblée se pose la question cruciale qui accompagne
l’extraordinaire événement : comment leur parler ? Comment s’en faire
comprendre ? Le langage sera sans doute d’une importance cruciale. La
science-fiction, domaine réflexif par essence, l’a compris depuis ses
origines et en a fait l’un de ses sujets de prédilection, tant au cinéma
qu’en littérature, de “Babel 17” à “Premier Contact”, de
“L’Enchâssement” aux “Langages de Pao”.

This paragraph contains lots of instances of that pronominal bugaboo of us anglophones, en. S’en faire comprendre: where does that en come from?  Is it an anaphor for “by them”?  Native speakers?  The en of La science fiction…en a fait l’un de ses sujets de prédilection seems straightforward-ish: I think it refers back to le langage in the preceding sentence.  (By the way: most computer programs for “resolving” anaphora would get this one wrong, basically because they typically don’t look as far back as the beginning of a preceding sentence.  If they do, and there is a candidate at the end of the preceding sentence (in this case, une importance cruciale) as well as one at the beginning, they tend to guess that the referent is the one at the end.)
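
To make that failure mode concrete, here is a toy Python sketch of a pure recency heuristic: it picks whichever candidate antecedent occurs closest before the pronoun.  It is not how any particular system works.  Mention and most_recent_antecedent are hypothetical names, the token positions are approximate, and the candidates are limited to the two noun phrases from the preceding sentence discussed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    text: str
    sentence: int  # index of the sentence containing the mention
    position: int  # approximate token offset within that sentence


def most_recent_antecedent(
    pronoun: Mention, candidates: List[Mention]
) -> Optional[Mention]:
    """Pick the candidate closest before the pronoun (naive recency heuristic)."""
    preceding = [
        c for c in candidates
        if (c.sentence, c.position) < (pronoun.sentence, pronoun.position)
    ]
    return max(preceding, key=lambda c: (c.sentence, c.position), default=None)


# Sentence 0: "Le langage sera sans doute d'une importance cruciale."
# Sentence 1: "La science-fiction ... en a fait l'un de ses sujets de prédilection"
candidates = [
    Mention("le langage", sentence=0, position=0),
    Mention("une importance cruciale", sentence=0, position=5),
]
en = Mention("en", sentence=1, position=12)

guess = most_recent_antecedent(en, candidates)
print(guess.text if guess else None)  # prints "une importance cruciale"
```

The heuristic lands on une importance cruciale because it is the nearer candidate, while the reading suggested above is le langage.  Getting that right would take more than recency, for instance some notion of which candidate is the topic of the preceding sentence.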

Sommaire :
– Avant-propos
– Introduction
– Chapitre 1 : De la science-fiction à la linguistique-fiction
– Chapitre 2 : Origine et évolution des langues naturelles
– Chapitre 3 : Des langues artificielles, mais pour quoi faire
– Chapitre 4 : Les éléments constitutifs d’une langue
– Chapitre 5 : Premier contact avec des extraterrestres
– Anticipons !
– Notes
– Bibliographie

What does pour quoi faire mean in the title of Chapter 3?  I have no idea.  If it’s “why make artificial languages,” wouldn’t that be pourquoi en faire ? As I said: en really screws up us anglophones…

La collection : la parallaxe est un changement de perception de notre
environnement dû à un changement de point de vue. En utilisant le
“cognitive estrangement”, la science-fiction observe notre monde sous un
angle différent et l’interroge. L’ambition de la collection Parallaxe
est de montrer qu’il est possible de faire un détour par l’imaginaire
pour parler de sciences et comprendre notre monde.

Question: as far as I know, French (unlike English, where it’s possible but definitely optional) generally repeats the preposition when there’s a conjoined phrase (“to talk about science and understand our world”); if I’m right about that, then why does the paragraph contain pour parler de sciences et comprendre notre monde, rather than pour parler de sciences et pour comprendre notre monde, which is what I would have expected?

Bien cordialement,
Frédéric Landragin.

Message diffuse par la liste Langage Naturel

La liste LN est parrainee par l’ATALA (Association pour le Traitement
Automatique des Langues)

ATALA décline toute responsabilité concernant le contenu des
messages diffusés sur la liste LN

French vocabulary:

enchâssement: encastrement dans une châsse

The English translation of this word on WordReference makes no sense to me, but I never pass up an opportunity to use the word châsse. 

encastrement: insertion d’un objet dans un autre.  Nous avons opté pour l’encastrement de l’électro-ménager dans les meubles de notre cuisine. (Example sentence also from WordReference)

 The English translation on WordReference seems right for their example sentence, but not for their French-language definition of enchâssement.  Maybe châsse has a meaning besides the one that I know, which is a synonym of reliquaire?  Not according to WordReference, whose English-language translation is, once again, at odds with their French-language definition: the French-language definition is coffret pour reliques précieuses, but they translate châsse into English as shrine, when it should be reliquary.    



American English reading practice: John McCain, Trump, and torture

I’m a US military veteran, and proud of it. If anyone hates torture more than a military person, I don’t know who it is.

John McCain was shot down and held prisoner for 5 and a half years by the North Vietnamese. He never recovered physically from the frequent and lengthy torture sessions that he underwent. The son of an admiral, he was offered early release, but refused to be set free until all of his fellow prisoners were. Meanwhile, Trump avoided the draft, later bragged about it repeatedly in public, and attacked McCain repeatedly as a candidate and as president. Asshole.

Afin de travailler votre amerloque, voilà un reportage sur la torture, John McCain, et Trump.  On débute avec du vocabulaire, et puis je vous invite à suivre le lien vers l’article dans son intégralité.

For more on a proud US military veteran’s opposition to Trump’s immoral ideas about torture, see this post.  Do you have corrections for my crappy French?  The Comments section awaits you.

Speaking out on torture and a Trump nominee, ailing McCain roils Washington

to speak out: to say something by way of a public statement, typically criticizing something.  Note that the preposition here is on, but it could also be about, and possibly others.

ailing: sick.  If English had the concept of langage soutenu, this would be soutenu, like many of the words in this article.

to roil: to stir up, to disturb, to put in a state of disorder (see Merriam-Webster, sense 2)

Sen. John McCain is 2,200 miles from Washington and hasn’t been on Capitol Hill in five months, but he showed this week that he remains a potent force in national politics and a polarizing figure within the Republican Party.

potent: powerful

polarizing: “to break up into opposing factions or groupings: a campaign that polarized the electorate” (Merriam-Webster, sense 3). Today’s Republican Party can generally be divided into people who like McCain, a war hero and basically OK guy right up to his recent death–versus immoral shitbags who cravenly support Trump no matter how low he stoops into the mud.  Thus: he’s a polarizing figure within the party.

But his declaration Wednesday in opposition to Gina Haspel, President Trump’s nominee for CIA director, has uniquely roiled the political scene. The denunciation has prompted reactions from fellow senators and a former vice president, as well as intemperate remarks from some Republicans aligned with Trump, including a White House aide.

to prompt: “to serve as the inciting cause of: evidence prompting an investigation” (Merriam-Webster, sense 3).

intemperate: not temperate, where “temperate” means “(a) keeping or held within limits: not extreme or excessive: mild; (d) marked by an absence or avoidance of extravagance, violence, or extreme partisanship” (Merriam-Webster, senses 2a and 2d).

It has revived the fierce debate over torture and its effectiveness in extracting information in the years since the Sept. 11 terrorist attacks — from a man who speaks from experience. McCain was held for 5½ years in a North Vietnamese prison, often deprived of sleep, food and medical care, after a jet he piloted was shot down over Hanoi.

No need for translation here, but for context, it’s worth knowing that McCain was a war hero and a staunch supporter of the US military–and hugely, vocally opposed to torture.  In contrast, Trump the draft-dodger (réfractaire, I think) has long advocated it.  Asshole.

Click here for the complete article in the Washington Post.

What a linguist would name a store if a linguist owned a store

to delight in: to really enjoy doing something; to like a thing very, very much. Examples:

Trump delights in insulting people who are less powerful than he is. Fucking bully–nothing more despicable than a fucking bully.


Trump delights in his ability to insult women’s appearance on the world stage. What a loser.


How it’s used on the sign: Delight in treasures old and new.


Brought to you by the Anglophone Association for the Promotion of Weird Prepositions.

Shit Interpreters Say


Thanks to the person who posted this on my Facebook timeline–you shall remain anonymous, since you probably would not want it known that you know me far too well.

English notes

This little gem of humor about the realities of translation/interpretation uses a number of devices from very colloquial written English.  Three of them:

Wanna: want to. “Now I really wanna see a horrible faltering translation from one of these movies…”

Cuz: because.  Can also be written ’cause or cos, and cuz can also be “cousin.”

The thing is:  This is used to introduce an assertion that … hm… states some kind of problem or complication with whatever it is that is under discussion.  For example: Zipf, are you going to the lab meeting?  Well…the thing is, I double-booked myself at 1.  In the material, when the person says (I’m going to insert some punctuation, which will make it a lot easier to follow) the thing is, in one dialect this word is the name of a terrifying Demon but in a completely different language from the same area that… the “thing under discussion,” if you like, is the fact that the person is being expected to be able to translate this stuff (but there’s this complication related to the multiple possible meanings of the word in question).

Note that if you’re being really casual, you can shorten this to just thing is… omitting the “the.”