How to write a personal statement for a grad school application

There is a bit of an art to writing a personal statement for a graduate school application. Here’s how to do it.

Applying to a graduate program means filling out a lot of paperwork–and writing a thing or two yourself. One of those things is called a personal statement, and there is a bit of an art to writing one.  Here’s some advice for doing it.

The first thing to know about a personal statement is this: it’s not actually personal.  Your goal in a “personal statement” is not to tell the admissions committee who you are “as a person,” but rather to take advantage of this opportunity to speak to them to show that you would be a good fit for their program.

What that means: you want the admissions committee member who is reading your statement to finish saying this to themself: oh–they could work with our faculty member Dr. Zipf [insert some actual faculty member of the institution in question, unless you’re applying to my institution].   (The pronoun themself is explained in the English notes below.)

How you lead them to that happy conclusion: don’t tell them, but show them.  Here are some things that you can do:

  1. State that you are interested in one or two specific areas of research of that department.
  2. State that you became interested in that topic/those topics when doing a research project on it/them…
  3. or, if you have not done research on that topic, then that you got interested in it/them while doing research on some other topic and coming across a paper on the topic by some member of the faculty of the department to which you are applying.
  4. List some areas of specialization within that topic or some related topics that you would be interested in working on, where those specializations or related topics are actually areas of research that members of the department to which you are applying work within.

Why I say one or two: you very much want to avoid a situation where (a) only one person in the department works on a topic, and (b) you don’t know it, but that person is getting ready to retire/move to another institution/begin a three-year period as the Associate Dean for Reproducibility, or something.  You avoid that situation by either (a) talking about a topic that two or more people in the department actually work on, or (b) talking about more than one topic.

Now, you may be asking yourself: what if I can’t find anyone in the department who works on my area of interest? The answer:

If you cannot find anyone in the department who works in your area of interest, then that department is not a good fit for you.

…and that’s exactly what the department wants to know.  In fact, if you apply to a graduate school and they don’t accept you, it is entirely reasonable to assume until proven otherwise that they’re not rejecting you, but just don’t see their department as the right place for you.

Need to know how to ask for a letter of recommendation for graduate school?

Click here.

This post is written on the basis of my time on the admissions committee of a medium-sized graduate program in computational biology.  If you have other perspectives/opinions on the subject, please add them to the comments below!


English notes

When you get deep into the weeds of the English language, one of the things that you run into is dialectal variation in pronoun use.  For example:

Dative pronouns in conjoined subject noun phrases: In the Pacific Northwest region of the United States, if you have a subject with two or more people joined by a conjunction (e.g. and or or), then the pronouns are in the dative form, not the subject form.  For example, look at these contrasts:

  • I’m going to the store.  (subject)
  • He’s going to the store.  (subject)
  • Me and him are going to the store. (dative)
  • Him and me are going to the store. (dative)
  • Anaïs is going to the store. (subject)
  • They are going to the store. (subject)
  • Anaïs and them are going to the store. (dative)

Even in the Pacific Northwest, you don’t have to talk this way–it’s pretty regionally specific, and people will understand you just fine if you say he and I are going to the store.  But, if you are in that part of the country, you have to be able to understand it.

Atypical reflexive pronouns: Other oddnesses have to do with the reflexive forms of pronouns.  For example, in my dialect, the third-person plural forms they/them/their are used if you don’t know the gender of the referent.  Straightforward enough–that usage goes back centuries in English. But: in a reflexive context (i.e. when the subject is doing something to itself or for itself), you get a variety of forms, depending on number:

  1. You want the admissions committee member who is reading your statement to finish saying this to themself: oh–they could work with our faculty member Dr. Zipf [insert some actual faculty member of the institution in question, unless you’re applying to my institution].  That is obscure enough that it does not even show up in Merriam-Webster’s online dictionary.
  2. My aunt and uncle bought themselves a new copy of the compact edition of the Oxford English dictionary. This plural form is totally standard American English.
  3. My aunt and uncle each bought themselfs a new pair of sunglasses. …and that one, again, does not show up in Merriam-Webster.

This raises a question: how would someone who doesn’t speak a dialect like this say (1) and (3)? I’m pretty sure that in (3), they would say themselves.  But, (1)?  I don’t know another way of saying it–native speakers?

The picture at the top of this post is of Oxley Hall on the Ohio State University campus. I had the pleasure of getting a master’s degree in linguistics there in the 1990s. Mostly we hung out in the basement analyzing spectrograms, but we would occasionally sneak up into the tower.  Fun.

 

Jokes that can’t be translated

Sucking the joy out of language for 30 years

Things written in [square brackets] are in the International Phonetic Alphabet.

French: A farmer in Picardy takes his pig to the vet.  The vet says to him: c’est tatoué?  The farmer says: ben sûr c’est à mwé!

English: What’s black and white and [rɛd] all over?  A newspaper.

American Spanish: How is a cat like a priest? Ambos [kasan].  


The French joke relies on a regional dialect where oi is at least sometimes pronounced wé rather than wa.  The vet asks the farmer is it tattooed? in standard French, but the farmer understands it in the regional dialect as is it yours?, and answers of course it’s mine!

The English joke relies on the homophony between the color red and the past tense of the verb to read.  This riddle puzzled the shit out of me when I was a small child, which in retrospect I should have realized meant that I was never going to be a very good linguist.

The Spanish joke relies on the American Spanish non-distinction between the pronunciation of z and s.  (“American Spanish” means Spanish as spoken in the Americas, i.e. South, Central, and North America.)  A cat caza (hunts), while a priest casa (marries).  They’re written differently, and in Spain (and maybe some upper-class American dialects, but I can’t swear to it) are pronounced differently, but they’re pronounced the same in the Americas.

Sucking the joy out of language since 1989,

Beauregard Zipf


English notes

vet: This word can mean two things in American English:

  • veterinarian, as in the joke.  Examples:
    • took my dog to the vet just to find out he’s sick af (af = “as fuck,” an adverb meaning “a lot”)
    • My dog hates going to the vet.
    • Ask a cat vet online now
  • veteran, or former member of the military.  Examples:
    • She and other vets said there’s frustration that the President is quick to claim credit for successes and happy to bask in the reflection of the military’s luster but doesn’t follow through on tough issues. 
    • Vets groups decry hatred, racism in wake of Charlottesville violence (Source: headline here.  Charlottesville is a city in Virginia where the president of the United States of America defended a white supremacist rally at which an anti-racism protester was killed.)
    • The veteran’s voice is crucial to changing the hate rhetoric directed at Muslims. “When I served in the United States Marine Corps, I took an oath to the Constitution of the United States. There is a First Amendment, which respects religious tolerance and freedom of speech,” stated John Amidon, Vietnam vet and member of Veterans For Peace.

Picture source: https://www.memecenter.com/search/vet

What computational linguists actually do all day: The variable reuse edition

I know, I know–computational linguistics seems like the most glamorous job in the world, right?

Dearly Beloved Colleagues,

I just spent several frustrating hours trying to fix a bug in my code.  In the end, the bug was purely a logic bug, and it was purely the product of poor variable-naming.

Code is the instructions that you write in a computer language, for a program to execute.

Here’s what happened.  I’m writing the world’s simplest script–I just need to read in some files that contain values for features for individual files–or, to put it better: for individual papers that I want to classify.

script is a kind of computer program, typically one that does a relatively simple task.

…and, with that, I think you can already guess what happened.  I was opening files that contained features that I had extracted from other files, and I reused a variable name.  Consequently, once my script reached some critical length, I could no longer keep track in my own head of the code that I was editing.  So, my test cases found a simple bug, and in the process of fixing that bug, I got myself so confused that I was mixing up the “files” in the sense of “papers that I’m classifying” and the “files” in the sense of “files containing feature values from papers,” and the next thing you know, several hours have gone by.

variable is something in a computer program whose value can be changed.  It’s the opposite of a constant, which is something whose value cannot be changed.  For example, the number 3 is a constant–its value will always be 3.  On the other hand, a computer program might contain something called length_of_word, intended to store the length of some word that you’re looking at, and that length could be anything, in principle.  (Really?  How about 0?  Or a negative number?  This kind of unstated assumption is one way that computer programs can go wrong.)

This is one of those things that gets fixed by (1) printing out my code on actual paper, noticing the same variable name in two clearly-marked-off-as-different sections of the code, and thinking “Zipf, you might be even more stupid than you knew…”; (2) sitting in the Philadelphia sun with a pack of cigarettes and a quality zombie novel for a while (Déchirés, by Peter Stenson–the zombie apocalypse comes and the only people who survive are meth addicts–I think you can come to your own conclusion about the metaphor a lot quicker than I fixed my code); and then (3) going back to look at the code and seeing immediately how I managed to confuse the heck out of myself.

My error here was in reusing my variable to store two different kinds of information.  This is a classic error in computer programming. I either didn’t notice that I was doing it when I moved from the first part of the program to the second part, or more likely, noticed it but didn’t think that it would be a problem because the script was relatively short and simple.  The problem with variable reuse is not for the program itself; rather, the problem is for the programmer, because variable reuse is a great way to confuse yourself.  That’s exactly what I did–bad Zipf, bad!
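To make the trap concrete, here is a minimal sketch in Python (not the Perl of the script described above). Every name and all of the toy data are invented for illustration. The first function reuses one name, files, for two different kinds of things; the second gives each kind of thing its own name. Both return the same result, which is the point: the reuse doesn't bother the program, only the programmer.

```python
# Hypothetical toy data: each feature file maps a paper ID to one feature value.
FEATURE_FILES = {
    "length.tsv": {"paper1": 3200, "paper2": 4100},
    "citations.tsv": {"paper1": 12, "paper2": 40},
}


def collect_features_confusing(feature_files):
    """One name, `files`, ends up meaning two different things."""
    files = sorted(feature_files)            # here: names of feature files
    features = {}
    for f in files:
        for paper, value in feature_files[f].items():
            features.setdefault(paper, {})[f] = value
    files = sorted(features)                 # now: IDs of papers (same name!)
    return files, features


def collect_features_clear(feature_files):
    """Same logic, but each kind of thing gets its own name."""
    feature_file_names = sorted(feature_files)
    features_by_paper = {}
    for feature_file_name in feature_file_names:
        for paper_id, value in feature_files[feature_file_name].items():
            features_by_paper.setdefault(paper_id, {})[feature_file_name] = value
    paper_ids = sorted(features_by_paper)
    return paper_ids, features_by_paper
```

Reading the second version, you never have to ask which kind of "file" a line is talking about.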

Happy Saturday from Penn Student Housing, where either the kid in D3 is going to stop throwing rotting chicken in the communal trash can or he’s going to wake up with it in his bed,

Zipf

I notice that I’ve been writing a lot of whiny posts about computational linguistics lately.  In fact I LOVE my job, enough so that I am probably one of the happiest people you know–or don’t know.  Want the English-language version of Déchirés?  Here it is: Fiend.  I read it three times in English before I read it in French, so it MUST be good, right?

Billet-doux: love letter

This is a love letter.  It’s not to my grandmother, although it could be.  My favorite memories of her: sitting together on her front porch in the morning, sharing a cup of coffee and a cigarette, talking about nothing–or just not talking at all.

Prévert in Paris, 1946. Photographer: unknown. Cat: unknown.

This is a love letter.  It’s not to Jacques Prévert, although it could be.  I’m usually up at daybreak, and sometimes as the sun peeks over the horizon I’ll go outside to have a smoke and read his Encore une fois sur le fleuve.  I’ve read some of his poems so often that they form a sort of soundtrack in my head as I walk the streets.  In his photographs, he looks like the uncle you always wanted–a face that you can tell is just barely hiding a smile, a cigarette in his hand–or just hanging from his lips.

This is a love letter.  It’s not to my grandmother, although it could be.  When she died, I found her long white evening gloves and her cigarette holder.

This is a love letter.  It’s not to my grandfather, but it could be.  One of my mother’s friends told me this about him: his apartment was nothing but books and cigarette smoke.  

This is a love letter to cigarettes.  Yeah, I know: they’re gonna kill me.  Hell–if I didn’t smoke, I might live two years longer!  Two years against some connection, any connection, with the French grandfather who had my mother when he was as old as I am now (very), and died before I was born.  Two years against Jacques Prévert in my head when I walk the streets in Paris, or anywhere in the world, really.  Two years against that memory of my grandmother, the warm Florida mornings, the ashtray that my father made for her in summer camp.  Seems like I come out ahead on this one.

The picture at the top of this page is not my grandmother, but the American actress Carole Landis, photographed in 1946 for a Kislav glove ad.  Photographer: unknown.


English notes:

To walk the streets: be careful with this one.  It can mean walking nowhere in particular–not flâner, as it connotes a certain intensity and solitariness that is lacking in flâner.  It can also mean living by prostitution–compare the noun streetwalker, a prostitute qui fait le trottoir.  Yet another meaning: to be free after a time in prison.

  • How I used it in the post: I’ve read some of his poems so often that they form a sort of soundtrack in my head as I walk the streets.
  • With the “out of prison” meaning: Many are outraged that the convicted killer will be walking the streets after spending just two years in prison. (Source: the Farlex Free Dictionary.)
  • With the “prostitution” meaning, in a slightly different construction: 52 and still working the streets.

French notes:

le billet-doux: an old term for a love letter.  I understand that you can use it for comic effect.  But, compared to la lettre d’amour, I like the sound of billet-doux much more.  Doux: it just sounds…right.  (Phil dAnge, can you comment?)

What computational linguists actually do all day: The recursion edition


I know, I know: computational linguistics sounds like the world’s most glamorous profession, right?  You imagine a bunch of geeks in hip glasses sitting around talking about Sanskrit is-aorist verbs, playing a little foosball after a free sushi lunch in the Google cafeteria, and then writing code to translate Jacques Prévert into idiomatic American English with a little stock ticker in the upper-right corner of their screen so that they can watch the value of their vested options go up, and up, and up, and…

In reality, I’m sitting in the international student dormitory of a well-known East Coast American university.  Yesterday was a good day, because the shitwad in room D2 left his dirty dishes in the sink for the full 48 hours that let me feel fine about throwing the reeking things in the trash can.

But, then I realized something: I can only get easy copyright releases for the book I’m writing for papers published in 2016 or later.  That means that I need to do a serious analysis of what I’m citing in the book, which means…writing code (the computer language that makes up a program) to go through a bunch of citations to figure out what year they were published, in which conference or journal, etc., etc., etc.

That means that I write stuff that looks like this:

open (IN, "/Users/transfer/Dropbox/Scripts-new/bioNLP.bib") || die "Couldn't open input file...\n";

…and then spend a lot of time looking at the error message “Couldn’t open input file”, ’cause I was missing the slash at the beginning of this:

/Users/transfer/Dropbox/Scripts-new/dummy.bib

…which I was happy to figure out, but didn’t really find all that interesting.
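One way to make that kind of mistake announce itself sooner is to put the path and the operating system's reason into the error message. Here is a sketch of that idea in Python rather than Perl; the function name is made up for illustration, and the message wording just echoes the one above.

```python
import os


def open_bibliography(path):
    """Open a file for reading, failing with a message that names the path."""
    try:
        return open(path, encoding="utf-8")
    except OSError as err:
        # err.strerror is the operating system's own explanation of the failure.
        kind = "absolute" if os.path.isabs(path) else "relative"
        raise SystemExit(
            f"Couldn't open input file {path!r} ({kind} path): {err.strerror}"
        )
```

A path that has lost its leading slash gets reported as a relative path, which is a pretty strong hint about where to look.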

Then I spent a lot of time writing things like the following:

    if ($line =~ /title.*=\{(.*)\},$/) {
        $DEBUG && print "TITLE: $1\n";
        $entry{"TITLE"} = $1;
    }

…which wasn’t particularly difficult, but caused a little pinprick in my soul, ’cause I knew as I was writing it that it would mess up any time that I had a title with a curly-brace in it ({}), and practicing your profession shittily never feels good.  For reasons that we need not go into, having curly-braces in the title of a work happens a hell of a lot more often than you might think, and fixing that little flaw would require writing something called a recursive function, which really shouldn’t be that complicated for a computational linguist (recursion is one of the fundamental properties of language (the picture at the top of this page is a humorous illustration of recursion (which is probably oxymoronic (and as you might have guessed, these embedded parentheticals are themselves an example of recursion (as is the second sentence of this post (an example, that is–not necessarily a humorous one (unlike the cartoon))))))), and yet still, is more than my little brain de pois chiche (garbanzo bean) can handle on a Sunday morning.
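For the curious, a recursive function of the kind mentioned above might look something like this sketch. It is in Python rather than Perl, the function names are invented, and it makes simplifying assumptions: the whole field sits on one line, and backslash-escaped braces are not treated specially.

```python
def find_closing_brace(text, open_index):
    """Return the index of the '}' that matches the '{' at open_index.

    Recursive: every nested '{' met along the way is handled by a
    recursive call that skips over its whole group.
    """
    if text[open_index] != "{":
        raise ValueError("open_index must point at '{'")
    i = open_index + 1
    while i < len(text):
        if text[i] == "{":
            i = find_closing_brace(text, i)  # jump to the nested group's '}'
        elif text[i] == "}":
            return i
        i += 1
    raise ValueError("unbalanced braces")


def extract_title(line):
    """Return the braced value of a BibTeX-style title field, or None."""
    key_index = line.find("title")
    if key_index == -1:
        return None
    open_index = line.find("{", key_index)
    if open_index == -1:
        return None
    close_index = find_closing_brace(line, open_index)
    return line[open_index + 1:close_index]
```

Given a line such as title = {The {BioNLP} shared task}, the inner braces are skipped as a unit, so the value comes back with its nested braces intact.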

Then, in order to be able to see any actual output, I had to write code like the following:

        my $output = "";
        for my $field (@fields) {
            #print "$entry{$field}\t";
            $field .= $entry{$field} . "\t";
        }
        $field =~ s/\t$//;
        print "$field\n";
    }

…which was neither particularly challenging nor particularly interesting, but caused my program to crash quite rudely, ’cause for reasons that we need not go into, I should have written

        my $output = "";
        for my $field (@fields) {
            #print "$entry{$field}\t";
            $output .= $entry{$field} . "\t";
        }
        $output =~ s/\t$//;
        print "$output\n";
    }

That gave me the first thought I’d had all morning that was actually interesting, as I contemplated how hard I’m pretty sure that it would have been–how impossible I at least hope it would be, for the moment at any rate–for a computer to find and fix that particular bug.
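For comparison, here is a sketch of the same output step in Python. Joining the values with a separator needs neither an accumulator variable nor the trailing-tab cleanup, so there is no second name to mix up with the loop variable. The field names and the sample entry are made up.

```python
def format_entry(entry, fields):
    """Return the entry's values for the given fields as one tab-separated line."""
    # Missing fields become empty strings so that the columns stay aligned.
    return "\t".join(str(entry.get(field, "")) for field in fields)


FIELDS = ["TITLE", "YEAR", "VENUE"]  # hypothetical field names
entry = {"TITLE": "An example title", "YEAR": 2016, "VENUE": "BioNLP"}
print(format_entry(entry, FIELDS))
```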

Another half hour or so of work, and now I can actually see what I wanted to know, which is the venues where the works that I cite were published.  This was useful, in that I noticed that one that should be heavily represented in my bibliography in fact barely figures there at all.  But, what it meant was that I needed to Google hither and yon to find out how to search Google Scholar (we’re just getting more and more meta here all the time) by name of conference.  Not particularly challenging; but, not particularly interesting, either.
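The tallying itself is the easy part. Here is a minimal sketch in Python, assuming the citations have already been parsed into dictionaries; the entries below are invented, and the 2016 cutoff is the copyright-release year mentioned earlier.

```python
from collections import Counter

# Hypothetical parsed entries; the real ones would come from the .bib file.
entries = [
    {"TITLE": "Paper A", "YEAR": "2014", "VENUE": "BioNLP"},
    {"TITLE": "Paper B", "YEAR": "2016", "VENUE": "BioNLP"},
    {"TITLE": "Paper C", "YEAR": "2017", "VENUE": "AMIA"},
    {"TITLE": "Paper D", "VENUE": "LREC"},  # no year recorded
]


def count_by(entries, field):
    """Count entries per value of one field, e.g. per venue or per year."""
    return Counter(entry.get(field, "UNKNOWN") for entry in entries)


def published_since(entries, first_year):
    """Return the entries whose year is known and is first_year or later."""
    recent = []
    for entry in entries:
        year = entry.get("YEAR", "")
        if year.isdigit() and int(year) >= first_year:
            recent.append(entry)
    return recent


for venue, count in count_by(entries, "VENUE").most_common():
    print(f"{venue}\t{count}")
```

Entries with no usable year are left out of published_since rather than guessed at, so they still have to be checked by hand.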

This is a whiny post, right?  Totally tongue in cheek, though.  Actually, I have the incredible good luck to love what I do, and the book in question really is a labor of…a labor of love.

 


English notes

Something in this post that is perfectly fine English but that I probably would not have written if I didn’t spend a lot of time writing (poorly) in French these days:

I noticed that a publication venue that should be heavily represented in my bibliography in fact barely figures there at all.

An educated speaker of the langue de Molière will be aware that figurer sur une liste is perfectly natural (as far as I know) French.  What I wrote is perfectly fine English, but I would suspect that it doesn’t occur very often, even in written academic or official English.  Why did it pop out of my mouth (well…fingers) today?  French-language interference, which is funny, ’cause in language teaching we often talk about first-language interference (carrying over aspects of the grammar of your native language, such that they fuck up your mastery of a foreign or second language), but I can’t recall ever running into the concept of second-language interference, and French is most definitely a second language for me, not my first.  Go figure…

go figure is an expression that expresses surprise about something that you’ve just been talking about, or an assertion that you are about to make.  How I used it in the post:

I can’t recall ever running into the concept of second-language interference, and French is most definitely a second language for me, not my first.  Go figure…

 

 

How to design the methods for a data science, machine learning, or natural language processing project: Part I

Yay—I have a data science/machine learning/natural language processing project! Now what do I do??

I occasionally use this blog to try out materials for something that I will be publishing.  This post is a casual version of something that will go into a book that I’m writing about…writing.

So, you’re going to do a data science project.  Maybe you’re going to use natural language processing (processing: using a computer program to do something; natural language: human language, as opposed to computer languages) to analyze social media data because you want to find out how veterans feel about the medical care that they receive through the Veterans’ Administration.  (Spoiler alert: a number of my buddies are vets, and they do indeed use the Veterans’ Administration health care system, and they both (a) are happy with it, and (b) recommend it to the rest of us.)  Maybe you’re doing it as a project for a course; maybe you’re doing it as your first assignment at your high-paying brand-new data scientist job; maybe you’re planning to write a research paper for a journal on military health care.  How do you go about doing it?


An excellent piece of advice when you’re trying to figure out how to do any research project: write out what you’re going to do, in prose, before you start doing it.  As my colleague Graciela Gonzalez, of the Health Language Processing Laboratory at the University of Pennsylvania School of Medicine, puts it:

Most of us make some mistakes in the process of thinking through how we will test our hypothesis.  The advantage of writing down what you’re going to do–the Methods section of a research paper, the design of your research project–before you do it is that when you see it on paper, spelled out explicitly and step by step, you will often notice the logical or procedural errors in what you were thinking, and then you won’t spend weeks making those errors before realizing that they were never going to get you where you wanted to go.

OK, so: you know that you’re going to write out your methods, very explicitly and in the order in which you will do them.  But, how do you figure out what those methods should be?


An efficient way to go about this is to read research papers by other people who have done similar things.  As you read them, you’re going to look for a general pattern–think of this as an example of the frameworks that we’ve talked about in other parts of this book.  Returning to our example of using natural language processing to analyze social media data, you might go to PubMed/MEDLINE, the National Library of Medicine’s database of 27 million biomedical research articles, and search for papers that mention either natural language processing or text mining, and also have the words social media in the title or abstract.  (Click here if you would like to see the set of 190+ papers that this search would find.)
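If you would rather build that search yourself than click, here is a sketch in Python. The exact query behind the link isn't given, so the search string below is an approximation of the description above. The endpoint is NCBI's E-utilities ESearch service; the function names are invented, and nothing here touches the network: the code only assembles the URL.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query():
    """Approximate the search described above as a PubMed query string."""
    methods = '("natural language processing" OR "text mining")'
    topic = '"social media"[Title/Abstract]'
    return f"{methods} AND {topic}"


def build_esearch_url(term, retmax=200):
    """Return an ESearch URL for the term (no request is sent)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_query())
print(build_esearch_url(build_query()))
```

Writing the query down as a string also gives you something exact to paste into your Methods section later.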

The results of that search will include these three papers, which study problems similar to yours: they’re using natural language processing to find women talking about their pregnancy, people talking about adverse reactions to drugs, or people talking about abuse of prescription medications–not exactly what you need to do, but similar. You’ll see two steps that are carried out in all of them.  I’ve highlighted the points where they’re mentioned in the abstracts of the three papers:

METHODS: Our discovery of pregnant women relies on detecting pregnancy-indicating tweets (PITs), which are statements posted by pregnant women regarding their pregnancies. We used a set of 14 patterns to first detect potential PITs. We manually annotated a sample of 14,156 of the retrieved user posts to distinguish real PITs from false positives and trained a supervised classification system to detect real PITs. We optimized the classification system via cross validation, with features and settings targeted toward optimizing precision for the positive class. For users identified to be posting real PITs via automatic classification, our pipeline collected all their available past and future posts from which other information (eg, medication usage and fetal outcomes) may be mined.

Sarker, Abeed, Pramod Chandrashekar, Arjun Magge, Haitao Cai, Ari Klein, and Graciela Gonzalez. “Discovering cohorts of pregnant women from social media for safety surveillance and analysis.” Journal of medical Internet research 19, no. 10 (2017): e361.

METHODS: One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies.

Sarker, Abeed, and Graciela Gonzalez. “Portable automatic text classification for adverse drug reaction detection via multi-corpus training.” Journal of biomedical informatics 53 (2015): 196-207.

METHODS: We collected Twitter user posts (tweets) associated with three commonly abused medications (Adderall(®), oxycodone, and quetiapine). We manually annotated 6400 tweets mentioning these three medications and a control medication (metformin) that is not the subject of abuse due to its mechanism of action. We performed quantitative and qualitative analyses of the annotated data to determine whether posts on Twitter contain signals of prescription medication abuse. Finally, we designed an automatic supervised classification technique to distinguish posts containing signals of medication abuse from those that do not and assessed the utility of Twitter in investigating patterns of abuse over time.

Weissenbacher, Davy, Abeed Sarker, Tasnia Tahsin, Matthew Scotch, and Graciela Gonzalez. “Extracting geographic locations from the literature for virus phylogeography using supervised and distant supervision methods.” AMIA Summits on Translational Science Proceedings 2017 (2017): 114.

Now we can abstract out the two steps that we found in all three papers:

  1. The authors built a data set.
  2. The authors used a technique called classification–a form of machine learning–to differentiate between the social media posts that did and did not talk about a person’s own pregnancy, or an adverse reaction to a medication, or abuse of prescription medications.

So, now you have a basic outline of your methodology.  Your goal being to use natural language processing to investigate, using social media data, how veterans feel about the care that they receive through the Veterans’ Administration health care system, maybe your methodology will look like this:

  1. Create a data set containing tweets in which veterans are talking about how they feel about the care that they receive in the VA health care system.
  2. Use machine learning to classify those tweets into ones where the vets feel (a) positive, (b) negative, or (c) neutral about that care.

OK, so: now you can expand that.  You’re quickly going to realize that Step 2–classifying those tweets–is actually going to require you to be able to do three classifications:

  1. You have to be able to differentiate tweets written by veterans from tweets written by everybody else.
  2. You have to be able to differentiate tweets where the vets are talking about the VA health care system from where they’re talking about things other than the VA health care system.
  3. You have to be able to classify whether the feelings that they express about the VA health care system are positive, negative, or neutral.
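The three steps above can be sketched as a small pipeline. This is Python, and the three toy functions are hand-written keyword rules invented purely for illustration; in a real project each one would be a classifier trained on annotated data.

```python
def classify_tweet(tweet, is_by_veteran, is_about_va_care, sentiment_of):
    """Apply the three classification steps in order.

    Each argument after `tweet` is a function standing in for one classifier.
    """
    if not is_by_veteran(tweet):
        return "not written by a veteran"
    if not is_about_va_care(tweet):
        return "not about VA health care"
    return sentiment_of(tweet)  # "positive", "negative", or "neutral"


# Toy stand-ins, invented for illustration only.
def toy_is_by_veteran(tweet):
    return "when i served" in tweet.lower()


def toy_is_about_va_care(tweet):
    return " va " in f" {tweet.lower()} "


def toy_sentiment_of(tweet):
    words = set(tweet.lower().split())
    if words & {"great", "helpful", "recommend"}:
        return "positive"
    if words & {"awful", "slow", "terrible"}:
        return "negative"
    return "neutral"
```

Seeing the steps chained like this makes one thing obvious: a mistake in the first step is inherited by everything after it.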

Now that you’ve started to flesh out your methodology, you realize something: creating that data set is going to take a really long time, since you essentially have to be able to label three different kinds of things in the social media posts.  You have a finite amount of time and resources with which to do it, so how are you going to make that possible?

Faced with an enormous amount of work to accomplish with limited time and resources, the most sane approach is this: go to your supervisor, show them your detailed methods plan, and let them come to the conclusion that they had better either (a) give you a lot more resources, or (b) modify your assignment.  Having gone through this multiple times over the course of my career, I can tell you that (b) is a hell of a lot more likely.  What is the modified assignment going to look like?  It’s probably going to be a reduction of the task to “just” the task of detecting tweets that were and weren’t written by veterans.  Now you can go back to your outline, and modify it:

  1. Create a data set containing tweets written by veterans, and tweets written by anybody else.
  2. Use machine learning to classify those tweets into the ones that were written by veterans, and the ones that weren’t.
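To give a feel for what step 2 involves, here is a toy sketch in Python of one simple kind of classifier, a naive Bayes model over word counts with add-one smoothing. Nothing in this post says which method you should use; this one was picked only because it fits in a few lines. The training tweets are invented, and six examples are nowhere near enough for real work.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train(labeled_tweets):
    """Count words per label; labeled_tweets is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability under naive Bayes."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data, for illustration only.
TRAINING_TWEETS = [
    ("served two tours before i got out in 2004", "veteran"),
    ("my old unit is having a reunion this year", "veteran"),
    ("used my gi bill after i got out", "veteran"),
    ("can not wait for the new mario game", "other"),
    ("traffic this morning was terrible", "other"),
    ("new coffee place downtown is great", "other"),
]

model = train(TRAINING_TWEETS)
print(classify("reunion with my old unit", model))
```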

This is going to be hard enough, believe me.  Here are some examples of what those tweets might look like–I made them up, but they’re totally plausible:

  1. HM1 Zipf here, USS Biddle 1980-1982–BT3 Raven McDavid, you out there?
  2. AFOSC raffle drawing at 1500–win that lawnmower and help us buy books for the squadron?
  3. FTN today, FTN tomorrow, FTN and fuck Chief Chomsky til I get out this motherfucker
  4. Mario Brothers, still nothin like it, bitchboys

Have you figured it out?  Here are the answers:

  1. Clearly written by a veteran.
  2. Almost certainly written by the spouse of an active duty Air Force officer, so not written by a veteran.
  3. Clearly written by a sailor who is still on active duty, so not written by a veteran.
  4. No clue who it was written by, and/but there’s no reason whatsoever to think that it was written by a veteran, so it should be classified as not written by a veteran.

What’s that you say?  It wasn’t clear to you at all?  Think about this: if it wasn’t clear to you, it’s certainly not going to be clear to a computer program, so your classification step is going to be difficult.  In fact, if it’s not clear to you, you’re going to have a hell of a difficult time building the data set–time to go back to your supervisor and ask for the resources to hire some veterans to help you out!

…and (4) raises a super-difficult question: what the hell counts as a reasonable experimental control for this research project?  (Spoiler: I don’t know, and I have a doctoral degree in this particular topic.)


All of this to say:

  1. Your redefined project is going to be plenty hard, thank you very much.
  2. You wouldn’t know how crucial it was to redefine said project if you hadn’t started the process of writing out what exactly you’re going to do.

…and hell–you hadn’t even gotten to the “exactly” part yet!  So: take Graciela’s point seriously, and write some things down before you start doing anything else.

…and now you can think about what you’re going to measure to figure out whether or not you were successful in doing what you were trying to do.
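
For a yes/no classification task like this one, a common choice (my suggestion, not the only reasonable one) is precision, recall, and F1 for the "written by a veteran" class. Here is a minimal Python sketch. The gold labels follow the answers given for the four example tweets; the predictions are hypothetical output from an imagined system that says "veteran" whenever it sees military jargon.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall, and F1 for the positive (True) class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)        # true positives
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)    # false positives
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)    # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Gold labels for the four example tweets: only the first is by a veteran.
gold = [True, False, False, False]
# Hypothetical predictions from a system fooled by military jargon.
predicted = [True, True, True, False]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

With these labels the imagined system finds the one veteran (recall of 1.0) but is right only one time in three when it says "veteran" (precision of about 0.33), which is exactly the kind of thing you want to have decided to measure before you start.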


Linguistic geekery: Raven McDavid was a dialectologist back in the day.  He is said to be the inspiration for the Harrison Ford character in Raiders of the Lost Ark.  Chomsky is Noam Chomsky, the most important (although not the best, in my humble opinion) linguist of the 20th century.  Where they appear in the post:

  1. HM1 Zipf here, USS Biddle 1980-1982–BT3 Raven McDavid, you out there?
  2. AFOSC raffle drawing at 1500–win that lawnmower and help us buy books for the squadron?
  3. FTN today, FTN tomorrow, FTN and fuck Chief Chomsky til I get out this motherfucker
  4. Mario Brothers, still nothin like it, bitchboys

We who smell of anchovy pizza salute you

I thought how horrible it would be to kiss someone whose beard smelled of anchovy pizza.

The adventure of the moment has me in Philadelphia, one of the great eastern cities of the Colonial era, where I have the privilege of spending the summer as a guest of the Health Language Processing Laboratory of the University of Pennsylvania School of Medicine.  I’m living in a dormitory on the edge of the campus, which is a beautiful part of town–a beautiful part of town that borders what we call in French a quartier sensible.  In plain English: a shitty neighborhood.  The grocery store that I go to is right on the border between the two, so the clientele is a mix of two very different populations: hip young college students and faculty, and the kinds of people who live in shitty neighborhoods.  (Word to the wise: don’t go past 45th Street.)

So, yesterday I’m standing in line in said grocery store.  In front of me is a smart-sounding young woman.  Being an American, she’s talking on her cell phone.  Being an American, she’s talking loudly enough for everyone else to hear.  Being American, we’re all listening.  (In France, we do not have loud conversations on the phone in public–rude, rude, rude.)  She’s talking about what medical specialty she’ll be going into–it’s clear which side of the border she’s from.

In the line next to us is an old dude who clearly is from the other side of the border.  He stands in line with his hands behind his back.  Being an American, he’s fat.  Ratty jeans, and his nasty t-shirt reveals a jailhouse tattoo on his arm that says FTW, which for those of us of a certain age stands for Fuck The World.  His beard looks like it probably smells of anchovy pizza.

The smart-sounding young lady is talking about what medical specialty she’ll be going into.  Funny thing is, she’s not just talking about it–she’s apologizing for it.  Mom, dermatologists help people, too.  I mean, at least in small ways, we can affect their lives a little bit.  …No, it’s not like saving lives all day in the emergency room, but we can make a little bit of a difference for people. 


So, I’m standing there in line, and I notice something: my favorite soap is on sale for a dollar off.  Computational linguistics is not nearly as remunerative as one might think, and that dollar off could be translated into the luxury of a cup of coffee, so as the smart-sounding young lady says her goodbyes and hangs up, I think: I’m getting out of this line and I’m grabbing a bottle of soap, and the wait be damned.  Then the nasty-looking old dude with his hands behind his back says something, and I think: hearing this one out is going to be worth spending an extra dollar for a bottle of soap–stay right where you are.

  • “Are you a doctor?  You sound like a doctor.”
  • Well…I’m an intern.  I just finished medical school.
  • “If my grandma has a melanoma, is that a bad thing?”
  • She should definitely go to a doctor.  It can be fatal.
  • “Like, a skin doctor?”
  • Yes, a dermatologist.
  • “A…dermalologist?”
  • Yes, a dermatologist.  Melanoma can be fatal, but if it’s caught in time, it could save her life.
  • “So, a skin doctor could save someone’s life?  Who woulda thought?”  

For the first time, she looked at him, right in the face.  I’m guessing that she was thinking something like what I was thinking: nobody as ancient as this old fuck has a living grandmother.  What the hell?

“I sure do love my grandma,” he said.  And he smiled.

She looked at him for a while.  You could see the wheels turning.  And then she smiled, too.  Thank you, she said.

She paid for her groceries, and left.  The old fuck turned to put his groceries on the counter, and I saw what was in those hands behind his back: a book about existentialist perspectives on psychoanalysis.  I looked at his FTW tattoo.  I thought how horrible it would be to kiss someone whose beard smelled of anchovy pizza.  I bet his dead grandma didn’t mind, though.


to say something in plain English: to say something clearly, with no big words or complicated sentences.  (Phil d’Ange: is there a French equivalent?) Examples:

 

 

funny thing is: …means that you’re about to describe something that is strange about a situation.  Could also be weird thing is, strange thing is, and you could also put “the” in front of it.  Examples:

 

 

  • How I used it in the post: Funny thing is, she’s not just talking about it–she’s apologizing for it.

remunerative: adjective that means that something pays a good salary.  How I used it in the post:

  • Computational linguistics is not nearly as remunerative as one might think, and that dollar off could be translated into the luxury of a cup of coffee, so…

(dollar) off: “Off” here means the amount of a reduction in price.  Examples:

 

 

to hear something out: to listen to the end of a discourse of some kind–an idea, an explanation.  Similar construction: to hear someone out, which means to listen until they have said all that they have to say. Examples:

 

 

  • How I used it in the post: Then the nasty-looking old dude with his hands behind his back says something, and I think: hearing this one out is going to be worth spending an extra dollar for a bottle of soap–stay right where you are.

woulda: in informal spoken American English, “would have.”

sure do + verb: an emphatic construction.  Has a very rural flavor.

Looks like [it] smells of anchovy pizza is a marvelously evocative description, and I wish I could say that I came up with it myself, but I didn’t: it’s from an old Berke Breathed cartoon, where Opus the penguin uses it to describe Bruce Springsteen.