Becoming a computational linguist without double-majoring in linguistics and computer science

You’re an undergraduate, and you want to become a computational linguist? Here’s how to do it.

People who want to become computational linguists usually get a PhD in the subject. Every once in a while, though, you run into someone who wants to study computational linguistics as an undergraduate. In the United States, that means a student in what we call “college” and the rest of you call “university” (or, if you’re French, la fac’). Undergraduate students in the US have one, and sometimes two, “majors”–the topic in which they will do the most coursework, and whose name will appear on their official paperwork when they graduate. To “double-major” is to have two majors, rather than the usual one. It’s not super-unusual to do this–I had a double major, in English and linguistics–but it’s best to double-major only if you really need to, as it’s a hell of a lot of work.

If you’re getting a bachelor’s degree and want to be a computational linguist, a double major in computer science and linguistics is probably overkill.  (Overkill discussed in the English notes below.)  The most efficient way to become a computational linguist would be to get a degree in linguistics in a department that has computational linguists on the faculty, such as the University of Colorado at Boulder, or Ohio State University. If you want to try to become a computational linguist in a university that doesn’t have computational linguists in any department: first of all, your major should probably be linguistics, not computer science—computational linguists are a kind of linguist, right? (They are—I’m a computational linguist, and I’m a linguist.) You’ll want to do some coursework in the computer science department, but I wouldn’t actually recommend even a minor in computer science—that will probably require you to take some courses that won’t be the most useful ones for you, while taking up time that you could have been using to take courses that would be useful for you.

What should those courses be?  As many as possible from this list:

  • Corpus linguistics (usually offered in the linguistics department, but if your university doesn’t have such a course in the linguistics department, look for courses in the social science, communications, or media departments, possibly with names like “content analysis”)
  • Statistics (best in a linguistics or speech & hearing department–the traditional psychology department or agriculture school courses will kill you)
  • Machine learning (usually offered in a computer science department)
  • Natural language processing (presumably not what you meant by “computational linguistics,” or you would have said so)
  • Automatic speech recognition, if and only if you seriously think that you want to work in this area (often offered in the electrical engineering department)
  • Speech synthesis, if and only if you seriously think that you want to work in this area (again, often offered in the electrical engineering department)

Notice what’s not on this list: programming courses.  Take those if you know that you need them, but if you don’t know that you need them, then don’t take them.  Notice that I also haven’t said anything about linguistics courses: we’re assuming here that linguistics is your major, and you’re going to get a solid and well-rounded background in that field.

Picture source: Mariana Romanyshyn, Grammarly, Inc. https://www.slideshare.net/MarianaRomanyshyn/nlp-a-peek-into-a-day-of-a-computational-linguist-71510838


English notes:

overkill: doing way too much.  Examples:

How I used it in the post: If you’re getting a bachelor’s degree and want to be a computational linguist, a double major in computer science and linguistics is probably overkill.

You won’t learn to speak another language unless…

You won’t learn another language unless you’re willing to do these two things:

  1. Develop the sitzfleisch to memorize a lot of vocabulary.
  2. Make a fool of yourself–over, and over, and over again.

The only known predictor of success in learning a second language is this: motivation. There are lots of things that you have no control over whatsoever that can tip the odds in your favor a little bit–already being bilingual, having been exposed in childhood to languages other than your native one, being quite young when you begin–but in the end, the only thing with enough of an effect to be predictive is having sufficient motivation.

What do you do with that motivation? Everyone who’s successful at second language acquisition develops their own tricks. But two things are essential; without them, it’s just not going to happen. You must use your motivation to make yourself do both:

  1. Memorize an enormous amount of vocabulary. Knowing enough of the grammar of the language is necessary, but it’s having lexical items (words) to plug into those grammatical structures that makes the difference between being able to function in the language and not being able to. And if you’ve been following this blog for a while, you know that it’s a very basic fact about the statistics of language that you need to memorize not just the common words, but an enormous number of rare words, too–because about 50% of the words that you will run into on any given day are going to be statistically of very low frequency. (That’s the Zipf’s Law in the title of this blog.)
  2. Tolerate feeling like an idiot. Specifically, you must use your motivation to force yourself to take every opportunity you get to practice the language that you’re trying to learn.

A graph showing the Zipfian distribution of words in 30 languages on a log-log scale. Source: Wikipedia, https://commons.wikimedia.org/wiki/File:Zipf_30wiki_en_labels.png.
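To see why the rare words matter so much, here is a minimal Python sketch. It assumes an idealized Zipf distribution (the word at rank r has frequency proportional to 1/r) and a hypothetical vocabulary of 50,000 word types; both are simplifying assumptions for illustration, not measurements from any real corpus. The function names zipf_coverage and hapax_fraction are made up for this example.

```python
from collections import Counter


def zipf_coverage(top_k: int, vocab_size: int, exponent: float = 1.0) -> float:
    """Share of running text (tokens) covered by the top_k most frequent word types.

    Assumes an idealized Zipf distribution: the word at rank r has frequency
    proportional to 1 / r**exponent, over a vocabulary of vocab_size types.
    """
    if not 0 < top_k <= vocab_size:
        raise ValueError("need 0 < top_k <= vocab_size")
    weights = [1.0 / r**exponent for r in range(1, vocab_size + 1)]
    return sum(weights[:top_k]) / sum(weights)


def hapax_fraction(tokens: list[str]) -> float:
    """Fraction of distinct word types that occur exactly once (hapax legomena)."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    once = sum(1 for c in counts.values() if c == 1)
    return once / len(counts)


if __name__ == "__main__":
    V = 50_000  # hypothetical vocabulary size, chosen for illustration only
    for k in (100, 1_000, 10_000):
        share = zipf_coverage(k, V)
        print(f"top {k:,} words: {share:.0%} of tokens; {1 - share:.0%} are rarer words")
    toy = "the cat saw the dog and the bird saw a cat".split()
    print(f"toy text: {hapax_fraction(toy):.0%} of distinct words occur only once")
```

Under these assumptions, the 1,000 most frequent words cover only about two-thirds of running text, so roughly one word in three comes from the long tail of rarer words. hapax_fraction measures something related but different: the share of distinct words in a text that show up only once, which in real corpora is usually a large share.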

I happen to know from experience that you can feel really stupid without (so far, at least, in my case) dying from it.

This month finds me in Wuhan, a city in roughly central China with 10 million inhabitants, essentially none of whom are occidentals. There’s a charming aspect to this–people will literally ask me to pose for pictures with their children. Fat, bald old me. There being no occidentals here to speak of, the people in stores, restaurants, etc. rarely speak English, so studying Mandarin (or the local dialect, which is not mutually intelligible with Mandarin) is a necessity.

If you have not tried to live with just a tiny bit of a language, you might be surprised how little you can get by with.  For example, a couple days ago I had my first Mandarin conversation.  It involved dropping off my laundry, and went like this:

Me: Míng tiān ma? (Tomorrow?)

Nice laundry lady: Míng tiān.  (Tomorrow.)

Now, after a couple weeks of me struggling to communicate in Mandarin when dropping off my laundry, the laundresses du coin (“of the neighborhood”) are a lot less nervous about dealing with a hairy barbarian and have progressed to giggling and trying to teach me new words. Strangers: a different story.


Today I’m sunning on a terrace with a cup of coffee (a luxury here–that cup of coffee cost more than the large, delicious, and healthy meal that I’ve just eaten) when I notice a couple girls adjusting, readjusting, and re-readjusting their…berets. Not a huge shocker, as the stereotypical Parisian tourist is now Chinese, but still: they’re looking at each other’s berets, then looking at their own berets using their cell phone cameras in lieu of mirrors (yes, in lieu of is English), then looking at each other’s berets again, then touching up their lipstick, and then starting all over again.

Obviously I needed a picture of this, but how to get it?  I mean, it’s not like you can go around taking photos of women you don’t know without risking an ass-kicking.  Ah–but, at 56, I am totally accustomed to making a fool of myself.  We have the following (one-sided) conversation:

Me (fat old bald guy, remember): Duì bu qǐ (‘excuse me’).  Patting myself on the chest: wǒ shì fǎ guó rén (‘I am French’–not true, but I am of French descent, so…).  Pointing at each of their hats: fǎ guó, fǎ guó (‘France, France’).  Then I mime taking a picture with my camera.

Them: Speaking to each other for a while, then looking at me like I’m insane, or an idiot, or both.

Me: Patting myself on the chest: wǒ shì fǎ guó rén (‘I am French’). Pointing at each of their hats: fǎ guó, fǎ guó (‘France, France’). Then I mime taking a picture with my camera. (Yes, this is exactly what I said the first time.)

Them: Talking to each other again for a while, then they shrug at each other–and pose for a couple pictures.  

Me: Xiè xie (‘thank you’).

Them: Walking away in silence.  Do I want to know what they’re thinking?  Definitely not.

Now, bear in mind: the purpose of my relating this attempt at conversation is not to brag about how great my Mandarin is.  The opposite–the point is how bad (nonexistent, really) my Mandarin is.  And yet: I know that…

  1. …if I’m willing to keep making a fool out of myself, I might actually get comfortable with the language (in, say, SEVERAL YEARS), and…
  2. …if I’m not willing to keep making a fool out of myself, I will never get comfortable with the language, and…
  3. …making a fool out of myself did not kill me.  Embarrassing?  Yes.  Fatally so?  No.
Two beret-wearing Huazhong Agricultural University students being very tolerant of a fat old bald guy outside the Luckin Coffee cafe. Picture source: me.

So, the next time you’re trying to work up the courage to practice your language of choice, just remember this: at least you’re not a fat old bald guy like that funny-sounding Zipf fella.

Scroll down for the English notes!


English notes

sitzfleisch: The perseverance to just sit and plug along at a task. I learned it from my master’s thesis advisor, who often pointed out that two hours in the library can save you four months in the lab. Suddenly the word has become popular, and I have no clue why.

a couple: ‘a couple of.’  This is one of those things that other native speakers give me shit for saying.  What can I tell you–like my hero Tonya Harding, I’m Oregonian trailer trash.  And, yes–you should go see the movie.  It’s really good.
