Computational linguistics and misinformation

Computational linguistics takes on the infected swamp that the World-Wide Web has become

In the late 1990s, I worked at a start-up. At the time, it was one of the 25 largest web sites in the world.

Why “largest web site,” and not “biggest web site?” English tends to use “big” to refer to physical objects, and “large” to refer to abstract concepts. Note that I said “tends to”–this is a statistical tendency, not an absolute.

Like a lot of people working for internet-related businesses or causes, we thought that we were making the world a better place. The World-Wide Web was going to democratize so much–access to information, everyone’s ability to communicate their message to a broader world.

20+ years later, we all realize that everyone includes a lot of assholes. From a former president of the United States to random evil-doers in the former Soviet Union, there are people who use the technologies that so many of us well-intentioned people worked so hard on to spread hate, to attack democracy, to spread lies.


Misinformation: things that are not true. Disinformation: deliberately created untrue things. Unlike a simple mistake, misinformation is widely spread about. Unlike a lie, disinformation is widely spread about, too. “Diffused,” if you prefer a technical term. “Propagated.”


People like me who had a hand in developing the kinds of technologies that assholes use to propagate misinformation and disinformation have–belatedly, I would say–begun to try to address the kinds of problems that we helped create. One of these is a shared task on detecting online misinformation. A “shared task” involves a bunch of computer-sciencey-types getting together to define a task–say, finding emails that would be relevant to a court case. They come to an agreement about the definition of the task, about the right contents for a shared data set on which to evaluate performance on that task, and a metric for evaluating performance on it. You put together a schedule, everybody goes off and builds a computer system for doing the task, you distribute the data, and on some agreed-upon date, everybody submits their systems’ output to the people who organized the task. Then everyone gets together for a workshop in which we compare systems, compare outputs, and see what we can learn from those comparisons.


A day or two ago, an email appeared in my inbox about just such a shared task. Its goal is to deal with misinformation on the Internet. That’s a pretty goddamn big thing to take on, though, isn’t it? So, the participants agreed on a subpart of the misinformation problem that is a bit more tractable:

The TREC Health Misinformation track fosters research on retrieval methods that promote reliable and correct information over misinformation for health-related decision making tasks.

https://trec-health-misinfo.github.io/

Right away, we know some of the ways that the organizers have defined their hopefully-tractable task:

  1. The word retrieval suggests that participants will be given a set of documents, and that their output should be documents from that set. This mimics the basic structure of the World-Wide Web: a set of documents (on a loose definition of the term “document”) that users search in order to find information.
  2. The word health-related suggests that participants will not need to be able to deal with every possible kind of misinformation–only health-related misinformation. This makes the task considerably more (potentially) achievable, and given the amount of misinformation that has recently been spread on health-related issues such as the current global COVID19 pandemic, there is potential benefit to the world as a whole if it can be accomplished. (Notice how I snuck in there the inference that health-related is a word, not a something…more than a word? I don’t actually think that–just showing you how discourse works.)
  3. Promote reliable and correct information over misinformation refers to a common aspect of any “retrieval” task (see #1 above): your system is expected to present not just a set of documents, but a ranked list of those documents. Think about it like the page of results that Google gives you when you do a search: you want the most relevant web page to be at the top of the page, not at the bottom, right? So, that’s what the shared task organizers are asking your system to do: rank correct information over misinformation. Of course, if all of the web pages that your system presents to the user are correct, then that is wonderful. (Normally only the top results are considered in terms of scoring your system’s performance.)
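If you like to see that kind of thing in code, here is a minimal sketch of scoring a ranked list by looking only at its top k results. To be clear, this is my own illustration: the function name precision_at_k, the labels "correct" and "misinformation," and the choice of metric are all assumptions on my part, not the track’s official evaluation measure.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked documents that are labeled "correct".

    ranked_labels: labels in ranked order, best-ranked document first.
    The label strings are hypothetical; a real data set defines its own.
    """
    top = ranked_labels[:k]
    if not top:
        return 0.0
    return sum(1 for label in top if label == "correct") / len(top)


# Two systems return the same four documents in different orders.
good_ranking = ["correct", "correct", "misinformation", "misinformation"]
bad_ranking = ["misinformation", "misinformation", "correct", "correct"]

print(precision_at_k(good_ranking, 2))
print(precision_at_k(bad_ranking, 2))
```

Both systems retrieved exactly the same documents, but only the first one put the correct ones where a user would actually see them, and a top-k score rewards that.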

Want more details? See the TREC Health Misinformation Track web page. Note that all opinions expressed in this post are mine, and they especially do not represent those of TREC, the Text Retrieval Conference, an organization that has run shared tasks for…over twenty years now, wow… And if you feel like slapping a computational person of my advanced age for having helped to create the stinking swamp that the World-Wide Web has become: go for it. But, also recognize that computational linguists are trying to do something to…wait for it…drain that swamp.

The picture at the top of this post is from an article published by the New York Times on April 13th, 2020.

How to learn conjugation: the Kaqchikel edition II

Want to learn how to conjugate verbs in French? No problem–you can look them up lots of places on line, you can buy a book on the topic at pretty much any train station in France, and the French Verb Forms app will give you many happy hours of practice. (Seriously, I use it often.)

Want to learn to conjugate verbs in a less-commonly-studied language? Good fucking luck. You have two basic options:

  1. Get a “FLAS.” Foreign Language and Area Studies fellowships fund students to do intensive courses in languages that the US has a national interest in having Americans know how to speak. (Seriously, it’s not a coincidence that they used to be called National Defense Foreign Language fellowships.) You either have the wonderful luck to be at a university that offers courses in your (relatively) obscure language of choice, or you snag a $2,500 summer grant to go take an intensive course somewhere. (Indiana University Bloomington is currently offering FLAS language courses in Akan, Bosnian, Czech, Dari, Estonian, Finnish, Georgian, Hungarian… You get the picture.)
  2. Figure it all out for yourself.

Option #2, “figure it all out for yourself,” is the one that you pick if you are a bald old fat fuck such as myself who is not going to be getting a FLAS summer fellowship any time soon. Option #2 begins with figuring out what the characteristics of verbs are in your language of choice, such that those characteristics might affect how you go about learning to conjugate them. We worked our way through this in a previous post. I’ll sum up the outcome like this:

The main things that differentiate amongst verbs in Kaqchikel are

  1. Whether they are transitive, or intransitive
  2. Whether they begin with a consonant, or with a vowel

Last time we worked on consonant-initial intransitive verbs. This time, we’ll move on to vowel-initial intransitive verbs. First thing we need: a list of such verbs. To put one together, we’ll go through the glossary at the end of the only English-based Kaqchikel textbook that I know of: ¿La Ütz Awäch?, by R. McKenna Brown, Judith M. Maxwell, Walter E. Little, and Angelika Bauer.

I am simplifying the Kaqchikel transitivity situation quite a bit. This will not shock Americanists (linguists who work on the indigenous languages of the Americas).

-achik’       to dream
-ach’ixïn     to sneeze
-ajan         to brush, to sculpt
-ajilan       to count
-ajin         to be Ving (a progressive)
-ak’walan     to procreate
-aläx pe      to be born, to sprout
-animäj       to run from, to flee
-animajin     to escape, to flee
-anin         to run
-apon         to arrive there
-aq’ab’an     to rise before dawn

All of these verbs begin with -a because I started at the beginning of the glossary, and it happens to be in alphabetical order. But, the observant reader will also have noted that many of these verbs end with n. Is this significant? Probably, but I don’t yet know why.

Want to know how to pronounce these verbs? See this post on my adventures in learning the pronunciation of Kaqchikel consonants.


OK, we’ve got some vowel-initial verbs, and we did consonant-initial ones last time, so let’s compare them. We’ll start with the first person singular of the present tense. Last time we learned the prefix yi- for consonant-initial verbs; for vowel-initial ones, we instead use yin-. Let’s look at examples of them side by side, and then we’ll practice the vowel-initial variant using the same technique that we learned last time. Remember that you will look at the example, then cover the second column of the table and work your way down it row by row, doing whatever the example showed you to do.

-sik’in (to read)     yisik’in    -aq’ab’an (rise before dawn)  yinaq’ab’an
-tzijon (to talk)     yitzijon    -achik’ (to dream)            yinachik’
-tz’ib’an (to write)  yitz’ib’an  -ach’ixïn (to sneeze)         yinach’ixïn
-samäj (to work)      yisamäj     -ajilan (to count)            yinajilan
-wa’ (to eat)         yiwa’       -anin (to run)                yinanin
-b’iyin (to walk)     yib’iyin    -apon (to arrive there)       yinapon

Example: -oq’ (to cry)           yinoq’
-achik’ (to dream)               yinachik’
-ach’ixïn (to sneeze)            yinach’ixïn
-ajilan (to count)               yinajilan
-anin (to run)                   yinanin
-apon (to arrive there)          yinapon
-aq’ab’an (to rise before dawn)  yinaq’ab’an

Following the same strategy that we learned last time, we will now alternate between the two of them…

Example: -wär                    yiwär
-b’ixan (to sing)                yib’ixan
-ach’ixïn (to sneeze)            yinach’ixïn
-kemon (to weave)                yikemon
-atin (to bathe)                 yinatin
-wa’ (to eat)                    yiwa’
-aq’ab’an (to rise before dawn)  yinaq’ab’an

…and then mix them randomly:

Example: -oq’ (to cry)           yinoq’
-samäj (to work)                 yisamäj
-tz’ib’an (to write)             yitz’ib’an
-atin (to bathe)                 yinatin
-kemon (to weave)                yikemon
-aq’ab’an (to rise before dawn)  yinaq’ab’an
-ach’ixïn (to sneeze)            yinach’ixïn

…and after all of that, now we can conjugate both consonant-initial AND vowel-initial intransitive verbs IN THE FIRST PERSON SINGULAR PRESENT TENSE ONLY. Persistence, persistence!
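For anyone who would rather check their answers with a script than with a sheet of paper, here is a minimal sketch of the one rule that we have practiced so far: yi- before a consonant, yin- before a vowel. The function name and the vowel inventory (plain vowels plus the ones written with a dieresis) are my own assumptions, and like everything else in this post, it covers the first person singular present tense of intransitive verbs only.

```python
import unicodedata

# Assumed vowel letters: plain vowels plus those written with a dieresis.
VOWELS = set("aeiouäëïöü")


def first_person_singular(stem):
    """Prefix yin- to vowel-initial stems and yi- to consonant-initial ones.

    The stem may be written with a leading hyphen, as in a glossary.
    """
    # Normalize so that a vowel typed as letter + combining dieresis
    # is treated the same as the precomposed character.
    root = unicodedata.normalize("NFC", stem).lstrip("-")
    if not root:
        raise ValueError("empty verb stem")
    prefix = "yin" if root[0].lower() in VOWELS else "yi"
    return prefix + root


for stem in ["-wär", "-samäj", "-atin", "-oq’", "-aq’ab’an"]:
    print(stem, first_person_singular(stem))
```

It only looks at the first letter of the stem, which is all that this particular rule needs.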

The picture at the top of this page is of a Kaqchikel singer I like a lot named Sara Curruchich. You can buy her stuff on Apple Music, and I’m sure elsewhere, as well. Picture source: https://assembly.malala.org/stories/kaqchikel-artist-guatemala. No English or French notes today, but here is one of her songs–I’ll post the words in Kaqchikel and in Spanish below.

IXOQI/MUJERES

xub’ij ri wati’t’ chuwe’: (Mi abuela me dijo:)

Noya, at achiel, at achi’el ri ruwächulew (Sos como la madre tierra)

at achi’el ri ruwächulew (Sí, como la tierra)

CORO.

Ri niya’on riquchuq’a (Quien nos da fuerza, valentía)

Ri niya’on rutz’intz’ojil ri qak’aslem (quien alumbra nuestra vida)

Ri qanaoj, chuqa k’a ri qab’ey (siembra pensamientos, sabidurías y los caminos plurales)

Ri niya’on ri ya’, ri kaq’ïq’, ri q’aq’ (Como agua, viento y fuego)

At keri’ rat wal (así sos)

At keri’ rat wal (“Así, así sos) nïm riaq’ij wal (Es inmenso e importante tu existir”)

Xcha’ ri watit pa jun wachik’ (Dijo mi abuela en mis sueños)

Man junb’ey xtnumestaj ta (Nunca olvidaré su palabra)

Jantape’ k’o pa nuk’u’x re (Se han aferrado a mi corazón)

CORO.

Ri niya’on riquchuq’a (Quien nos da fuerza, valentía)

Ri niya’on rutz’intz’ojil ri qak’aslem (quien alumbra nuestra vida)

Ri qanaoj, chuqa k’a ri qab’ey (siembra pensamientos, sabidurías y los caminos plurales)

Ri niya’on ri ya’, ri kaq’ïq’, ri q’aq’ (Como agua, viento y fuego)

At keri’ rat wal (así sos)

Cada paso que doy

Me acerca a mis hermanas

A la igualdad soñada

Merecida y trabajada

Cada paso que doy

Deja una huella sutil

Un camino que puede seguir

Un destino

CORO.

Ri niya’on riquchuq’a (Quien nos da fuerza, valentía)

Ri niya’on rutz’intz’ojil ri qak’aslem (quien alumbra nuestra vida)

Ri qanaoj, chuqa k’a ri qab’ey (siembra pensamientos, sabidurías y los caminos plurales)

Ri niya’on ri ya’, ri kaq’ïq’, ri q’aq’ (Como agua, viento y fuego)

At keri’ rat wal (así sos)

K’aslem (Vida)

A Cada paso que das

How to learn conjugation: the Kaqchikel edition

Learning verb conjugations in any language requires memorization, practice, and more memorization. With commonly-studied languages, you can find books with page after page of verb conjugations; but, if you are trying to learn a less-commonly-studied language, you will need to put those together yourself. The process of practicing those conjugations is the same, though. I will show you a system here that I picked up from the textbook Português Contemporaneo, by Maria Abreu and Cléo Rameh.

Kaqchikel is spoken in Guatemala, in the purple-colored region at the lower left of the map. Map source: online Kaqchikel Dictionary Project.

Being a big believer in writing about what you don’t know, I will illustrate the process with Kaqchikel, a Mayan language spoken by 400,000–500,000 people in the western highlands of Guatemala. There is an excellent textbook on Kaqchikel, called ¿La ütz awäch?, by R. McKenna Brown, Judith Maxwell, Walter Little, and Angelika Bauer. Other didactic materials are hard to come by outside of Guatemala, though–and in particular, there is no 501 Kaqchikel Verbs. (Fat old fucks such as myself grew up using books of verb conjugations with the standard title X01 [language name] verbs, where X is typically a 2 or a 5–for example, 501 French Verbs. They were published by a company called Barron’s. Still are, although I wouldn’t swear that they sell very many copies these days.) Hence: this post. Ready? Let’s do this!

Step One: Figure out what categories of verbs exist.

In Kaqchikel, that will be the following (at least to a first approximation):

  • Transitive verbs versus intransitive verbs
  • Verbs that start with consonants versus verbs that start with vowels

This is a very language-specific thing. Autrement dit: you have to do this for every language. In Spanish, the classes would be different:

  • Verbs that end with -ar in the infinitive, versus ones that end with -er and ones that end with -ir
  • Verbs that are regular in the tense that you’re learning, versus verbs that are irregular in that tense

In French, there would be so many categories that if a student working on an undescribed language told me that “their” language worked that way, I would tell them to go away and come back when they could prove it, ’cause languages like French just are not all that plausible. (The only similar one that I can think of is Dinka–1.3 million speakers in Sudan and South Sudan.)

Back to Kaqchikel… We’ll start with the intransitive verbs. (Left to my own devices, I prefer to start with transitive verbs, but the textbook that I’m using starts with the intransitives, and I’m gonna bet that the people who wrote the textbook know a hell of a lot more about how to learn Kaqchikel than I do.) The other thing that matters to us today is whether the verb starts with a consonant or with a vowel, and here I genuinely have no preferences, so we’ll start with the verbs that begin with consonants, just like the textbook does.

Step Two: Pick a “person.”

I’m gonna start with the first person singular, i.e. “I,” because I know that that’s what my teacher will ask me about first. I am actually a fan of the third-person singular, i.e. “he/she/it,” when (trying to) learn a language where that’s a regular one. (The third person singular is not necessarily regular in any commonly understood sense of that word. It’s not difficult to find languages with over a dozen forms for the third person singular. Swahili (50-100 million speakers in East Africa–Swahili is so widely used as a lingua franca that it is difficult to know who to count as a speaker) is a common one, and has about 15, plus some more for plurals.)


So, the first-person singular of consonant-initial intransitive verbs: the marker is the prefix yi-. To practice it, I will put together a table like the following. The first row gives an example of what I need to do. You read it like this: when prompted with the verb wär, which means ‘sleep,’ add rïn yi- to it to make rïn yiwär, ‘I sleep.’ Then I have six repetitions. For each one, I cover up the answer, then do the thing, then uncover the answer to check whether I got it write. (Ummmm: right. Damn homophones…) Ready? Let’s do this shit!

Example: wär (sleep)                   rïn yiwär
xajon (dance)                          rïn yixajon
b’e (go)                               rïn yib’e
tz’ib’an (write)                       rïn yitz’ib’an
tzijon (to talk)                       rïn yitzijon
käm (to die)                           rïn yikäm
k’ask’o’ (to recover from an illness)  rïn yik’ask’o’

You see the system? Now I’ll try the second-person singular, i.e. “you.” This time, we’re going to add rat ya-.

Example: wär  rat yawär
xajon         rat yaxajon
b’e           rat yab’e
tz’ib’an      rat yatz’ib’an
tzijon        rat yatzijon
käm           rat yakäm
k’ask’o’      rat yak’ask’o’

Second person singular present tense, consonant-initial intransitive verbs

Now it’s time to mix the two together:

rïn/wär                                 rïn yiwär
rïn/b’e (go)                            rïn yib’e
rat/b’e (go)                            rat yab’e
rïn/tzijon (talk)                       rïn yitzijon
rat/tz’ib’an (write)                    rat yatz’ib’an
rïn/käm (die)                           rïn yikäm
rat/k’ask’o’ (recover from an illness)  rat yak’ask’o’
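If typing up these tables by hand gets old, a few lines of code can build the mixed drill for you. What follows is a minimal sketch under some assumptions of mine: the names PERSON_MARKERS, conjugate, and mixed_drill are made up for illustration, and the rule is only the one shown above (rïn yi- and rat ya-, consonant-initial intransitive verbs, present tense).

```python
import random

# Pronoun -> prefix, taken from the two tables above.
PERSON_MARKERS = {"rïn": "yi", "rat": "ya"}


def conjugate(pronoun, verb):
    """Build e.g. 'rïn yiwär' from the pronoun 'rïn' and the verb 'wär'."""
    return f"{pronoun} {PERSON_MARKERS[pronoun]}{verb}"


def mixed_drill(verbs, seed=0):
    """Return shuffled (prompt, answer) pairs covering every pronoun and verb."""
    rng = random.Random(seed)  # fixed seed so that the drill is repeatable
    pairs = [(pronoun, verb) for verb in verbs for pronoun in PERSON_MARKERS]
    rng.shuffle(pairs)
    return [(f"{p}/{v}", conjugate(p, v)) for p, v in pairs]


for prompt, answer in mixed_drill(["wär", "b’e", "tzijon", "käm"]):
    print(prompt, answer)
```

Cover the second column of the output, work your way down, and change the seed when you start recognizing the order.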

…and, then we add in the third-person singular, and we’re done for the day–plurals can wait until tomorrow. If you don’t want to slog through those with me, I’ll just point you towards these videos on the conjugation of intransitive verbs in Kaqchikel–if you do want to do some slogging, page down past the video links!

Videos about intransitive verb conjugation in Kaqchikel:

https://www.youtube.com/watch?v=q1Yzi6IFcec

The picture at the top of this post shows eight Kaqchikel women from the village of San Marcos La Laguna, in the Lake Atitlán region. They were recipients of a micro-loan that enabled them to go into business selling the fabric that they make at home. Their clothes are the everyday wear of Kaqchikel women in that area. Picture source: here.

The Navy SEALS broke into tears: irregular past-tense verbs in English

This post contains material from the New York Times article Anguish and Anger From the Navy SEALs Who Turned In Edward Gallagher, by Dave Philipps, published on December 27, 2019. The post will help you learn to use irregular past-tense verbs in English, and to understand why the US military voted for Joe Biden, 50%-45%.

So, one day I land at an airport in the US, and I jump in a taxi, and the driver is an Oromo guy. He’s listening to some American talk radio program, and as we get to chatting, I see that he speaks excellent American English. One thing, though: he gets lots of common irregular past tenses wrong. Eated, speaked, seed–stuff like that. And I wonder: you listen to American English radio all day, you speak to anglophones all day, you’re totally immersed in the language–how do you still mess up common irregular forms? Not in a critical way, right? Non-rhetorical question: how does one manage to speak a language pretty well while still fucking up really common aspects of the language?

Six years go by, and I go from having taken one semester of French in college in the early 1990s to speaking French well enough that when I meet a Frenchie for the first time, I typically have to convince them that I’m an American. (No French person ever thinks that I’m from where they’re from, but, yeah–I usually have to tell them that I’m an American, and I usually have to insist.)

About 80% of French verbs form their past tense (participle, actually, but whatever) by adding é at the end–the past participle lu of the verb to read is highly irregular.

And yet: yesterday I’m talking to a francophone friend, and I ask him the French equivalent of Did you readed the article that I sent you? (T’as li le reportage que je t’ai envoyé, rather than T’as lu le reportage que je t’ai envoyé). I watch YouTube videos about every subject under the sun in French, I read Wikipédia in French, I occasionally go up to two weeks without speaking anything but French–in other words, I am every bit as immersed in French as that Oromo guy is in English, but I still fuck up really frequent irregular forms.

From Wikipedia: “The United States Navy Sea, Air, and Land (SEAL) Teams, commonly known as Navy SEALs, are the U.S. Navy’s primary special operations force and a component of the Naval Special Warfare Command. Among the SEALs’ main functions are conducting small-unit special operation missions in maritime, jungle, urban, arctic, mountainous, and desert environments. SEALs are typically ordered to capture or to eliminate high level targets, or to gather intelligence behind enemy lines.”

French speakers have been incredibly kind and patient about correcting me for such things for several years, and as a measure of respect and thanks, I offer herewith a little exercise on English verbs that are irregular in the past tense. As material, we’ll use an article published by the New York Times just under a year ago. It’s an excellent piece for this exercise because it includes a number of irregular past tenses. It was published on the occasion of Donald Trump’s interference in a military trial. The circumstances: a US Navy SEAL murdered a prisoner. His own troops turned him in. In a surprise twist, one of the witnesses claimed that he, not the SEAL in question, had murdered the prisoner, and the SEAL was acquitted on most charges. He was convicted on a relatively minor charge, and was demoted as punishment. President Trump reversed the demotion–an excellent way to weaken any military force is to destroy its mechanisms of discipline, and Trump socked the US military in the gut with that move. It wasn’t the end of the story, either, but we’ll get to that later. With that context: let’s get to some verbs.

Present  Past                           Past participle
break    broke                          broken
say      said                           said
be       was (singular), were (plural)  been
can      could                          been able to
tell     told                           told

Irregular past tenses and past participles of several common English verbs

The Navy SEALs showed up one by one, wearing hoodies and T-shirts instead of uniforms, to tell investigators what they had seen. Visibly nervous, they shifted in their chairs, rubbed their palms and pressed their fists against their foreheads. At times they stopped in midsentence and broke into tears.

“Sorry about this,” Special Operator First Class Craig Miller, one of the most experienced SEALs in the group, said as he looked sideways toward a blank wall, trying to hide that he was weeping. “It’s the first time — I’m really broken up about this.”

Video recordings of the interviews obtained by The New York Times, which have not been shown publicly before, were part of a trove of Navy investigative materials about the prosecution of Special Operations Chief Edward Gallagher on war crimes charges including murder.

They offer the first opportunity outside the courtroom to hear directly from the men of Alpha platoon, SEAL Team 7, whose blistering testimony about their platoon chief was dismissed by President Trump when he upended the military code of justice to protect Chief Gallagher from the punishment.

“The guy is freaking evil,” Special Operator Miller told investigators. “The guy was toxic,” Special Operator First Class Joshua Vriens, a sniper, said in a separate interview. “You could tell he was perfectly O.K. with killing anybody that was moving,” Special Operator First Class Corey Scott, a medic in the platoon, told the investigators.


OK, we’ve seen the examples: now let’s practice using these irregular past tenses. We’ll practice using a technique called a cloze. It involves filling in a blank; it’s a common testing technique in foreign language teaching, and 40+ years ago when I was a young sailor, it was used for teaching pretty much anything via programmed learning. I’ll give you the material from the original article, but with the past-tense verb under test replaced by its infinitive form; you will replace it with the past tense form.


The Navy SEALs showed up one by one, wearing hoodies and T-shirts instead of uniforms, to tell investigators what they had seen. Visibly nervous, they shifted in their chairs, rubbed their palms and pressed their fists against their foreheads. At times they stopped in midsentence and break into tears.

“Sorry about this,” Special Operator First Class Craig Miller, one of the most experienced SEALs in the group, say as he looked sideways toward a blank wall, trying to hide that he was weeping. “It’s the first time — I’m really broken up about this.”

Video recordings of the interviews obtained by The New York Times, which have not been shown publicly before, be part of a trove of Navy investigative materials about the prosecution of Special Operations Chief Edward Gallagher on war crimes charges including murder.

They offer the first opportunity outside the courtroom to hear directly from the men of Alpha platoon, SEAL Team 7, whose blistering testimony about their platoon chief be dismissed by President Trump when he upended the military code of justice to protect Chief Gallagher from the punishment.

“The guy is freaking evil,” Special Operator Miller tell investigators. “The guy was toxic,” Special Operator First Class Joshua Vriens, a sniper, say in a separate interview. “You could tell he was perfectly O.K. with killing anybody that was moving,” Special Operator First Class Corey Scott, a medic in the platoon, tell the investigators.
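For the programmers in the audience, here is a rough sketch of how a cloze like the one above could be generated automatically. The names PAST_TO_BASE and make_cloze are hypothetical, the verb list is just the handful from the table, and unlike my hand-made exercise (where I only swapped some of the verbs), this version swaps every occurrence that it finds, including every was.

```python
import re

# Past-tense form -> base form, for the verbs in the table above.
PAST_TO_BASE = {
    "broke": "break",
    "said": "say",
    "was": "be",
    "were": "be",
    "told": "tell",
}


def make_cloze(text, past_to_base=PAST_TO_BASE):
    """Replace each listed past-tense form with its base form.

    Returns the cloze text and the answer key (the original forms, in order).
    Matching is case-sensitive and on whole words only.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, past_to_base)) + r")\b")
    answers = []

    def swap(match):
        answers.append(match.group(1))
        return past_to_base[match.group(1)]

    return pattern.sub(swap, text), answers


sentence = "At times they stopped in midsentence and broke into tears."
cloze, key = make_cloze(sentence)
print(cloze)
print(key)
```

The answer key comes back in the order in which the verbs appear, so you can check yourself after working through the passage.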


Vocabulary notes

  • midsentence: How it was used in the NY Times article: At times they stopped in midsentence and broke into tears. More examples:
    • Trump has provided a dark, dank hole into which these folks can dump whatever it is they’re mad about. Even contradictory views, since Trump frequently changes viewpoint in midsentence, can happily nest there, swelling and breeding like poison fungus. (Source: Twitter)
    • Giuliani is literally trying to backtrack midsentence as he’s realized what already came out of his mouth? (Source: Twitter)
  • trove: How it was used in the NY Times article: Video recordings of the interviews obtained by The New York Times, which have not been shown publicly before, were part of a trove of Navy investigative materials about the prosecution of Special Operations Chief Edward Gallagher on war crimes charges including murder. More examples:
    • While clearly using Trump-friendly words here, Kelly knows national security agencies have a trove of incriminating information on Trump & Co. which will be revealed during an orderly transition. The walled-in White House is burning up the shredders… (Source: Twitter)
    • The BP Senate Report provides a treasure trove of new details abt Donald Trump’s relationship with Moscow, & says that a Russian National, Konstantin Kilimnik, who worked closely with Trump’s presidential campaign in 2016 was a career intelligence officer. (Source: Twitter)
  • blistering: How it was used in the NY Times article: They offer the first opportunity outside the courtroom to hear directly from the men of Alpha platoon, SEAL Team 7, whose blistering testimony about their platoon chief was dismissed by President Trump when he upended the military code of justice to protect Chief Gallagher from the punishment. More examples:
    • Leading health experts have delivered a blistering rebuke of Donald Trump’s decision to halt U.S. funding for the World Health Organization. (Source: Twitter)
    • Conservative Judge issues blistering rebuke of Supreme AG Barr (Source: Twitter)
  • freaking: In this context, it is a euphemism for the adjective fucking. How it was used in the NY Times article: “The guy is freaking evil,” Special Operator Miller told investigators. Some more examples:
    • John McCain already told everyone he graduated last in his class but look at a war HERO he became. You do not deserve to lick the dirt on John McCain’s boots. Trump, you are filthy disgusting lowlife and an freaking TRAITOR. You whine like a little girl. (Source: Twitter)
    • Just saw Giuliani on Wolf Blitzer. ….what a freaking idiot. (Source: Twitter)

I had a dream: Subjunctives in English and elsewhere

So, I lay in bed staring at the ceiling for what felt like an interminable amount of time.  Finally got up and looked at the clock: 4 AM. Before I woke up, I’d been dreaming that I was sleeping really well.  In fact, I wasn’t sleeping well at all.

Before I woke up, I’d been dreaming that I was sleeping really well.  In fact, I wasn’t sleeping well at all: English is my native language, but I’m not sure that what I just said makes sense.  It seems hopelessly unclear.  Is it the case that “in fact” I wasn’t sleeping well in the dream, or that “in fact” I wasn’t sleeping well in real life?  In fact, I wasn’t sleeping well in real life–I was just dreaming that I was.  I don’t know of a way to disambiguate that in English.  What is called for here: a language with a robust past tense of the subjunctive.


The subjunctive mood is the term that is usually given for grammatical structures that express things that are in the realm of wishes, desires, opinions, and possibilities, as opposed to things that are facts.  It just barely exists in English, and as far as I know, in English it is always optional.  To the best of my knowledge, the subjunctive only exists for the verb to be.  Here’s what it looks like, in typical American English and in the Pacific Northwest dialect.  This is a way that you can give someone advice:

  • Typical American: If I was you, I wouldn’t do that.
  • Pacific Northwest: If I were you, I wouldn’t do that.

The difference: in typical American English, you would use the past tense form for the first person singular: was.  In the Pacific Northwest, you use were.  We use was for the past tense, of course–it’s only in the subjunctive that you see this weird use of the were form.  You use it for other persons, too, in the subjunctive:

  • Typical American: If he was smarter, he wouldn’t have done that.
  • Pacific Northwest: If he were smarter, he wouldn’t have done that.

Well: English does not have a robust past subjunctive at all.  Some languages do, though.  How might I talk about my dream in one of them?  Let’s consider some options.

Modern colloquial French does not have a robust past subjunctive at all.  Literary French does, though–a leftover from earlier forms of the language, and what we would be looking at here is an ongoing action, so it would be the subjunctive imperfect that we’d be using.  (I think–again, I’m not a native speaker.)  Here’s an attempt at both of them, neither of which I speak natively, or even well:

Modern colloquial French: Je rêvais que je dormais bien, tandis que de fait je ne dormais point bien.

Literary French: Je rêvais que je dormisse bien, tandis que de fait je ne dormais point bien.

In contrast with modern colloquial French, modern colloquial Spanish does, in fact, have a robust past subjunctive.  “Robust” in the sense that people do actually use it.  Let’s try that:

Soñaba que durmiera bien, aunque de hecho no dormía nada de bien.

…aaaaaand, with that I see that in Literary French and in modern colloquial Spanish, you can express the case where in real life I wasn’t sleeping well at all, but I don’t see a good way in either language to convey the situation where it’s in the dream that I wasn’t sleeping well.  Have I fucked up all four languages (English, modern colloquial French, literary French, and modern colloquial Spanish) here?  Forgive me, ’cause it’s not even 5 AM, and I didn’t sleep well last night.

Scroll down past the video of the somewhat cute song L’Imparfait du subjonctif, “The Imperfect Subjunctive” (Pourtant je le pus et vous pûtes, hee hee hee) if you want to read the English notes.  Otherwise: go back to bed.


English notes

To disambiguate: To differentiate between two possible senses (meanings) of something (“of an utterance,” as a linguist would put it).  In computational linguistics, it usually means to find the intended sense.

  • In the French sentence L’étagère plie sous les livres (‘The shelf is bending under [the weight of] the books’), it is necessary to disambiguate the sense of livres (which can mean ‘books’ or ‘pounds’ and is masculine in the former sense, feminine in the latter) to properly tag it as a masculine noun. (Ide, Nancy, and Jean Véronis. “Introduction to the special issue on word sense disambiguation: the state of the art.” Computational Linguistics 24.1 (1998): 1-40.)
  • Lapata and Brew (1999) and others have shown that the different syntactic subcategorization frames of a verb such as serve can be used to help disambiguate a particular instance of the word. (Gildea, Daniel, and Daniel Jurafsky. “Automatic labeling of semantic roles.” Computational Linguistics 28.3 (2002): 245-288.)
  • When you search for information regarding a particular person on the web, a search engine returns many pages. Some of these pages may be for people with the same name. How can we disambiguate these different people with the same name? (Bollegala, Danushka, Yutaka Matsuo, and Mitsuru Ishizuka. “Extracting key phrases to disambiguate personal names on the web.” International Conference on Intelligent Text Processing and Computational Linguistics. Springer, Berlin, Heidelberg, 2006.)

For example: I giggled about the lyrics Pourtant je le pus et vous pûtes because when spoken, it is ambiguous: it could mean either however, I could and you could (the intended sense) or however, I stink of it and you whore.  In the latter sense–which, I will note, makes no sense, and we will return to that fact momentarily–it would be written pourtant je le pue et vous pute.  So, it’s not ambiguous in writing, but it is à l’oral. 

Now: almost everything that you will say, hear, write, or read today will be ambiguous in some way.  But, humans are so good at disambiguating that we notice that ambiguity only rarely.  How do we do it?  It’s mostly mysterious, but our behavior is consistent with the notion that we calculate the set of possible meanings and select the one which is most probable.  That’s a very different thing from our normal way of thinking consciously about this, in which I might say that “I stink of it and you whore” makes no sense.  “Makes no sense” implies that there is a binary distinction–either something “makes sense,” or it doesn’t.  When you talk in terms of probabilities, then you are thinking of meanings as something that can be more or less, which is very different from being, or not.

How do computer programs do this?  Computational linguists build systems that work more or less the way that we think humans work: determine the set of possible meanings, calculate a probability for each one, and select the most-probable of the set.  What happens if there’s a tie? Well…read this paper by Antske Fokkens.
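As a rough sketch of that determine-score-select recipe, here is a toy version in Perl.  The two senses and their probabilities are invented for illustration (a real system would estimate them from data), and most_probable_sense is a made-up name, not anybody's actual API.  Ties are broken alphabetically, which is an arbitrary choice whose only virtue is that it gives the same answer every time.

```perl
use strict;
use warnings;

# Invented numbers, for illustration only: how likely each reading of the
# spoken form is in context. A real system would estimate these from data.
my %sense_probability = (
    'could' => 0.97,    # je le pus (pouvoir)
    'stink' => 0.03,    # je le pue (puer)
);

# Pick the sense with the highest probability.
# Ties are broken alphabetically (an arbitrary but repeatable choice).
sub most_probable_sense {
    my ($probability_of) = @_;
    my @ranked = sort {
        $probability_of->{$b} <=> $probability_of->{$a} or $a cmp $b
    } keys %$probability_of;
    return $ranked[0];
}

print most_probable_sense(\%sense_probability), "\n";
```

With these made-up numbers the program picks could, which is the reading that the song intends.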


What computational linguists actually do all day: The read-between-the-lines edition

Watch a movie like Arrival and you’ll get the impression that linguists spend their professional lives sitting around speculating about Sanskrit etymologies and the nature of the relationship between language and reality.  I’m not saying that we never do such things, but, no: that’s not what we do with our typical workdays.  I’m a computational linguist, which among other things means that what I do involves computers, which among other things means that I spend a certain amount of my time sitting around writing computer programs that do things with language.  Often, those programs are doing things that do not look very…exciting.  Not to the untrained eye, at any rate.

For other glimpses into the daily life of computational linguists, click here.

Case in point: yesterday I wanted to see how the statistical characteristics of language are affected by different decisions about what you consider a “word.”  You would think that the word “word” would be easy to define–in fact, not only do linguists not agree on what a word is, but you would have a hard time getting all linguists to agree that words even exist.  (One of the French-language linguistics books that I have my nose stuck in the most is Maurice Pergnier’s Le mot, “The Word.”  The first 50 pages (literally) are devoted to theoretical controversies around the question of whether words actually exist–or not. Want a good English-language discussion of the issues?  See Elisabetta Ježek’s The lexicon: An introduction.)

So, yesterday I got to thinking about one of the questionable cases in English: contracted negatives of auxiliary verbs (the modals among them).  Here’s what that means.

In English, there is a small number of frequently-occurring verbs that can (and do) get negated not by a separate word like not, but by adding a special ending, spelled -n’t:

  • is/isn’t
  • did/didn’t
  • have/haven’t
  • could/couldn’t
  • would/wouldn’t
  • does/doesn’t

Note that British English has another form:

I’ve not

…which means I haven’t.

Now, if you care about statistics, you care about counting things.  Think about how you would count the number of words in these examples:

  1. I want to go.
  2. I do want to go.
  3. I do not want to go.
  4. I don’t want to go.

(3) and (4) are both perfectly acceptable ways of negating (1) and (2).  How would they affect a program that counts the number of words?  It depends.  Here are the straightforward cases: if (1) has four words (I, want, to, and go), then (2) has five (add do to the previous four), and (3) has six (add not to the previous five).

The questionable case is (4).  You could make a reasonable argument that don’t is a single word.  You also could make a reasonable argument that don’t should be counted as two words.  But, which two words?  A reasonable person could propose do and n’t–just split the “stem” do from the negative n’t.  

Fine.  But, let’s look at a little more data:

  1. I will go.
  2. I will not go.
  3. I won’t go.
  4. I can go.
  5. I cannot go.
  6. I can’t go.

Clearly (1) has three words–I, will, and go.  (2) adds one more, with not.  What about (3), though?  Is it inconsistent to count will not as two words, but won’t as one?  Maybe.  If you’re going to split it into two “words,” what are they?  Presumably wo and n’t?  But, what the hell is wo?  Is it the same “word” as will?  Notice that we’ve now had to start putting “word” in “scare quotes,” which should tell you that knowing what, exactly, a “word” is isn’t quite as simple as it might appear at first glance.  Think about this: in science you need to know what, exactly, the thing that you’re studying is, which implies that you can recognize the boundary between one of those things and another.

What’s the right answer?  Hell, I don’t know.  I do know this, though: if you’re interested in the statistics of language (wait–what’s you’re?  Hell, what’s what’s?), then you have to be able to count things, so you have to make some decisions about where the boundaries between them are.  My issue du moment is actually not choosing between the options, but rather seeing what the consequences of those specific decisions would be for the resulting statistical measures, so I need to be able to test the effects of different ways of splitting things up (or not), so I need to write some code…
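To make the consequences of those decisions concrete, here is a minimal sketch of a word counter with a switch for the questionable case.  It assumes that “words” are whatever is separated by whitespace once sentence-final punctuation is stripped, and that contractions are typed with a plain ASCII apostrophe.  The function name count_words and its second argument are hypothetical; the list of stems is the same one that the Perl script in this post ends up with.

```perl
use strict;
use warnings;

# Count whitespace-separated tokens in a sentence.
# $split_negatives is a hypothetical switch: when true, a contracted
# negative such as "don't" is first split into "do n't" (two tokens).
sub count_words {
    my ($sentence, $split_negatives) = @_;
    $sentence =~ s/[.?!]+\s*$//;    # drop sentence-final punctuation
    if ($split_negatives) {
        $sentence =~ s/\b(ca|wo|do|did|has|had|have|would|could|should|might)(n't)\b/$1 $2/gi;
    }
    my @tokens = split ' ', $sentence;
    return scalar @tokens;
}

for my $sentence ("I want to go.", "I do want to go.",
                  "I do not want to go.", "I don't want to go.") {
    printf "%-22s one-word policy: %d  two-word policy: %d\n",
        $sentence, count_words($sentence, 0), count_words($sentence, 1);
}
```

The first three sentences come out the same under both policies (four, five, and six words).  Only I don't want to go changes: five words if don't is one word, six if it is two.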


What you see below is me using a computational tool called a “regular expression” to find words that have a negative thing attached at the end (e.g. n’t) and separate the negative thing from the rest of the word.  So, given an input like didn’t, I want my program to (1) recognize that it has a negative thing at the end, and (2) split it into two parts: did, and n’t.  Grok (see the English notes for what grok means) the code (code means “instructions in a programming language”–here I’m using one called Perl), and then scroll down past it for an explanation of how it illustrates a piece of advice that I often give to students…

# this assumes input from a pipe...
while (my $line = <>) {

    print "Input: $line";

    # this doesn't work--why?
    #$line =~ s/\b(wo|ca|did|could|should|might)(n't)\b$/\$1 $\2)/gi;
    # works...
    #$line =~ s/(a)(n)/a n/gi;
    # this does what I want...
    #$line =~ s/(a)(n)/$1 $2/gi;
    # works...
    #$line =~ s/(ca)(n't)/$1 $2/gi;
    # works...
    #$line =~ s/(ca|wo)(n't)/$1 $2/gi;
    # works...
    #$line =~ s/\b(ca|wo)(n't)\b/$1 $2/gi;
    # works...
    #$line =~ s/\b(ca|wo|did)(n't)\b/$1 $2/gi;
    # works...
    #$line =~ s/\b(ca|wo|did|could)(n't)\b/$1 $2/gi;
    # works...
    #$line =~ s/\b(ca|wo|did|could|should|might)(n't)\b/$1 $2/gi;
    # works...
    #$line =~ s/\b(ca|wo|did|had|could|should|might)(n't)\b/$1 $2/gi;
    # works...
    #$line =~ s/\b(ca|wo|did|had|have|could|should|might)(n't)\b/$1 $2/gi;
    # works...
    #$line =~ s/\b(ca|wo|do|did|has|had|have|could|should|might)(n't)\b/$1 $2/gi;
    # and finally: this pretty much looks like what I started with, but
    # what I started with most definitely does NOT work...  what the fuck??

    $line =~ s/\b(ca|wo|do|did|has|had|have|would|could|should|might)(n't)\b/$1 $2/gi;

    print $line;

} # close while-loop through input

The “regular expressions” in this code are the things that look like this:

s/\b(wo|ca|did|could|should|might)(n't)\b$/\$1 $\2)/gi

…or, in the case of a much shorter one, like this:

s/(a)(n)/a n/gi

(Note to other linguists: yes, I know that technically, the regular expression is just the part between the first two slashes, i.e. the (a)(n) in s/(a)(n)/a n/gi in the second example.  Don’t hate on me–I’m trying to make this at least somewhat clear.) The lines that start with # are my notes to myself–the “reading between the lines” that you have to do to see how irritating it can be to troubleshoot this kind of thing.
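For anyone who wants to poke at the difference between the first attempt and the last one, here is a hedged sketch that isolates two of the ways in which they differ.  One plausible reading of the failure: the $ after \b ties the match to the end of the line, and the backslash in \$1 makes Perl insert the two literal characters $1 instead of the captured text.  (The $\2) in the original replacement is not a reference to the second group either, and it drags a stray closing parenthesis along.)  The function names are made up for this illustration, and everything assumes plain ASCII apostrophes.

```perl
use strict;
use warnings;

# The version that works, wrapped in a function so that it can be tested.
sub split_negative {
    my ($text) = @_;
    $text =~ s/\b(ca|wo|do|did|has|had|have|would|could|should|might)(n't)\b/$1 $2/gi;
    return $text;
}

# Same pattern plus the trailing $ from the first attempt: now the
# contraction has to be the very last thing on the line.
sub split_negative_at_end_only {
    my ($text) = @_;
    $text =~ s/\b(ca|wo|do|did|has|had|have|would|could|should|might)(n't)\b$/$1 $2/gi;
    return $text;
}

# A backslash in front of the dollar sign turns $1 into literal text.
sub split_negative_escaped {
    my ($text) = @_;
    $text =~ s/\b(did)(n't)\b/\$1 $2/gi;
    return $text;
}

print split_negative("I didn't go."), "\n";
print split_negative_at_end_only("I didn't go."), "\n";
print split_negative_at_end_only("No, I didn't"), "\n";
print split_negative_escaped("I didn't go."), "\n";
```

The first call splits didn't into did n't.  The second leaves the sentence untouched, because the contraction is not at the end of the line; the third does split it, because there it is.  The fourth produces a literal $1 where did should be.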


A regular expression is a way of describing a set of things.  What makes it “regular”–a mathematical term–is that those things can only occur in a very limited number of relationships.  In particular, that limited number of relationships does not include some phenomena that are very important in language, such as agreement between subjects and verbs–think of Les trois soeurs de ma grand-mère m’ont toujours aimé, “my grandmother’s three sisters have always loved me.”  The issue here is that regular expressions can only describe sequences of things that you might think of as “next to” each other; les trois soeurs is separated from the verb avoir, which must be in the third person plural form ont, by ma grand-mère, which would require the third person singular form a.  (Linguists: I know.)

Regular expressions, and the “regular languages” that they can describe, became of importance in linguistics in the 1950s, when finite-state models of that kind were in the air as a way of describing human languages from a mathematical perspective.  This caught the attention of one Noam Chomsky, who pointed out the inadequacy of regular languages as a description of human language, first in a 1956 paper and then in his 1957 book Syntactic Structures.  (His famous review of B.F. Skinner’s book about the psychology of language–yes, Skinner the famous psychologist–came in 1959, and its target was behaviorism, not regular languages.)  That work brought him a lot of notice, and he went on to develop those ideas into the most widespread and influential linguistic theory since the Tower of Babel.  Today, if you’ve only heard of one linguist, it was almost certainly Chomsky.

Chomsky’s critique of “regular languages” included the observation that there are perfectly natural things that can be said in any human language that can’t be described by a regular language.  For example:

Me, my brother, and my sister went to William and Mary, Indiana University, and Virginia Tech, respectively.

The problem that this illustrates for regular languages is that they don’t have a mechanism for accounting for the fact that you can have sentences where you have a list of things in an early part of the sentence, and then must have a list of things of the same length in a later part of the sentence.  Don’t believe me? Go read a book on “formal languages,” and then try it.


Linguistic geekery

Regular expressions are pretty natural tools for people who work with textual data, and they’re especially natural for linguists.  This is a surprise to a lot of computer scientists, some of whom are masters of regular expressions, but some of whom find them irritatingly bewildering.  It turns out that if you take a course on the “formal foundations” of linguistics, i.e. its groundings in logic and set theory, you will run across regular languages, which fact makes regular expressions pretty easy to learn.  And, for textual data, they are really useful even despite their limits–so much so that a programming language (named Perl) was created expressly for the purpose of making it easy to use regular expressions to “parse” textual data.  So, when I found myself wanting to be able to rip through a bunch of textual data and find the negative things like n’t, Perl and its regular expressions were a logical choice.


 

Why the fuck would you…

So, I’m wandering around backstage in a theater trying to keep my cousins from making me help build props when I come across the following sign on a storage locker:

…and I wonder: why the fuck would you tell someone to spray a paint can?

I slowly digest the bright-red color of the cabinet. I slowly digest the “FLAMMABLE” signs. I slowly digest the fact that I am apparently becoming senile.


A paint can:

Picture source: https://www.google.com/imgres?imgurl=https://shop.thepurplepaintedlady.com

A can of spray paint:

The verb “to spray:”

An apparently senile computational linguist [PHOTO OMITTED TO PROTECT PUBLIC SENSIBILITIES]

Nelson Algren’s “The man with the golden arm” and a problem in semantic theory

She’s the kind got the sort of heart you can walk in ‘n out of with boots on.

He brushed his shot glass off the table and stood up.

When I took the GREs–the Graduate Record Examinations, the test that you take in the US when you want to go to graduate school–I scored in the top 1 percent on vocabulary.  I say that not to brag, but to give you some quantitative measure for when I say that in English, I know a lot of words.  That doesn’t mean that I never have to look anything up, though.

Molly could not see him weaving against the table out there in the dark while he was trying to understand to himself whether it was time for him to leave, before she saw him, or time to go to her before he lost her again.


Eeyore is frustrated. The subtitle, “Le mot n’est pas la chose,” says “The word is not the thing.”

From a linguist’s point of view, the challenge of definition is not to say what a thing is.  (Please, no hate mail–yes, I know that we define words, not things.)  Rather, the challenge of definition is to say what it is not.  I don’t mean this in a Saussurean sense, necessarily, but just from a practical point of view: tell me what a chair is.  OK, I get that you are not talking about a bed.  But, is what you are describing distinguishable from a couch?  How about from a bench?  A loveseat?  A stool?  A recliner?  A doll-sized chair?  A toilet? The table below gives you an example of the kinds of definitional gymnastics that you find yourself going through in such exercises.  I have adapted this from Sandrine Zufferey and Jacques Moeschler’s Initiation à l’étude du sens : sémantique et pragmatique, the best introductory text on semantics that I’ve seen thus far.  Unfortunately my copy is sitting at the base of the Rocky Mountains right now, so I made up the details.  Oh, yeah–and unlike their table, mine’s in English.

                                 chair  stool  armchair  couch  loveseat  bench
must have back                     x      -       x        x       x        -
armrests                           -      -       x        x       x        -
room for two people but no more    -      -       -        -       x        -
room for more than two people      -      -       -        x       -        x
can have as few as three legs      -      x       -        -       -        -
He felt a sickening sort of shame, this was just the way he wished not to be in finding her again: broke, sick and hunted.  What was it someone had said of her long ago?  “She’s the kind got the sort of heart you can walk in ‘n out of with boots on.”

So, today I wake up at 4 AM, as I often do.  Normally I start my day with the American news, but the country that I love so much is falling apart so quickly these days that I felt like I needed a few minutes to prepare myself before facing the latest revelations regarding Trump helping Putin with his little Ukrainian problem.  I pulled out the novel that I’ve almost finished–Nelson Algren’s The man with the golden arm.  I laid it down last night at a point where our hero, on the lam from the coppers, has gone looking for his lost love in a bar in an even seedier part of town than his own.  There’s a sort of burlesque show in the bar, and he spots his flower in the chorus line.  He is in big trouble, he’s starting to jones for his next fix (that’s junkie slang: he is going into withdrawal and needs a hit of morphine: broke, sick and hunted), and he is truly at the end of his rope.  A lifesaver: he’s found his girl.  But: as she leaves the stage, he knows full well that he does not want her to see him like this.

Then the act was done and she was gone, they were all gone as if they hadn’t been there at all.  As though the whole act had been a kickback from an overcharge, something he’d formed in his brain out of beer fumes and smoke.


Herbert Terrace’s book on the topic. Spoiler alert: “not as far as I can tell.”

Being a linguist and knowing the primacy of not specification, but rather differentiation, in matters of definition, it bugs the shit out of me that I know lots of words such that I know what category of thing they are, but I could not begin to tell them apart from other things of the same class–by very venerable linguistic theory, this should not happen.  For example: I know that amaryllis, dahlia, and freesia are all flowers, but I could not point any of those three out to you on a bet.  I know that opal, tourmaline, and amethyst are gemstones, but again–hand me three gemstones and ask me if one of them is a tourmaline or not, and I’m just gonna scratch my beard and excuse myself to go to the bathroom.  (Minus the beard-scratching, that last tactic for dealing with social discomfort turns out to be a pretty plausible example of how people end up claiming that they have taught chimpanzees American Sign Language.  A story for another time, perhaps.)


Yet went weaving heavily through smoke and fumes toward the tiny dressing room offstage.

Wearing army brogans on his feet.

OK, so… I already know that brogans are a kind of footwear–it’s not like I’ve never run into the word before.  But, I couldn’t tell you what kind.  The character is a recently-discharged World War II veteran, and his brogans have been mentioned many times in this novel.  From other references over the course of the novel to his heavy-footed walking, I infer that they are…well, heavy.  But, Algren didn’t say a few sentences earlier that his love was “the kind got the sort of heart you can walk in ‘n out of with boots on,” and then specify what kind of footwear he’s wearing as he walks into her dressing room after not having seen her for months, by accident.  (Algren was a treasure of the post-war American novel–he doesn’t do shit like that by accident.  A French connection: he was Simone de Beauvoir’s other lover.  Of course she left him for Sartre, who had translated Algren’s novel Never come morning into la langue de Molière.)

So, off I go to the dictionary.  And to Wikipedia.  And to Google Images, too, ’cause it is sometimes a damn fine resource for jury-rigged visual definitions.  (A little topical reference there: jury-rigged, which means something like “improvised with whatever happens to be at hand,” is said to be derived from the wartime slang term to jerry-rig.)  What I find: a brogan is a low-topped boot.  The picture at the top of the page shows a pair of WWII-era US Army brogans.  The gaiters worn above them were made redundant when combat boots became standard issue–they’re higher, so you don’t need the gaiters to “blouse” your trouser legs.  A contemporary reader would have known what he meant; reading the book today, which was written before I was born–a very long time ago–I knew that brogans were footwear, but hadn’t a clue what kind.  So: top 1 percent on the vocabulary portion of the GRE (don’t be too impressed–I was around the 50th percentile on math, maybe even lower), but I had to look a word up.


That’s being a linguist for you… The beauty of it is that you’re constantly immersed in your data, and the horror of it is that you’re constantly immersed in your data.  As far as definitions go: as my colleague Orin Hargraves, a fine lexicographer, pointed out to me while we were working on our paper Three dimensions of reproducibility in natural language processing, in which we and a cast of thousands of other colleagues proposed a set of definitions for talking about the results of experiments–trying to propose definitions might be somewhat pointless anyways, as in the end word meanings are determined by how they are used within the structure of the language, not by any prescriptive authority.  Did my linguisticness interfere with my enjoyment of Algren’s finely-wrought prose?  Did it actually make me more aware of its beautiful craftsmanship?  I don’t know.  What I do know: now I’m going to go see what happens when he gets to her dressing room.

 


Want to know more about the myriad complications of thinking about definitions?  See Elisabetta Ježek’s excellent book The lexicon: An introduction.

Source of the picture of a pair of brogans at the top of the page: Eastman Leather Clothing Blog, blog.eastmanleather.com/view-post/the-us-combat-boot.


English notes

He was trying to understand to himself whether it was time for him to leave, before she saw him, or time to go to her before he lost her again. 

…is weird.  I have never heard the construction understand to [someone].  A quick search on Sketch Engine, purveyor of fine linguistic corpora and the tools for searching them, reveals nothing similar (yes, I did a Word Sketch, too):

[Screenshot: Sketch Engine search results]

What you do on Saturday night if you have no life whatsoever

That’s a whole lotta accents…

If you have no life whatsoever, what you do on Saturday night is (a) study French verb conjugations, and (b) binge-watch the excellent Netflix series Criminal: France–and not necessarily in that order, either.

I’ve recently been working on the passé simple, a French tense that’s used in some genres of writing, but only very rarely in the spoken language.  I love les chapeaux chinois (circumflex accents), and one of the nice things about the passé simple is that it uses them.  Specifically, they appear in the nous and vous forms: nous aimâmes/finîmes/prîmes, vous aimâtes/finîtes/prîtes.

Find a verb with a circumflex accent in the stem, and it gets really fun.  So, it’s Saturday night, and I’m sitting on the back porch smoking a cigarette and doing some exercises on the French Verb Forms iPhone app (no, I am not sponsored by Netflix, French Verb Forms, or Apple–I pay for that stuff just like everyone else), when I am presented with the verb apprêter “to prepare” to conjugate: Circumflex City!

How to write a personal statement for a grad school application

There is a bit of an art to writing a personal statement for a graduate school application. Here’s how to do it.

Applying to a graduate program means filling out a lot of paperwork–and writing a thing or two yourself. One of those things is called a personal statement, and there is a bit of an art to writing one.  Here’s some advice for doing it.

The first thing to know about a personal statement is this: it’s not actually personal.  Your goal in a “personal statement” is not to tell the admissions committee who you are “as a person,” but rather to take advantage of this opportunity to speak to them to show that you would be a good fit for their program.

What that means: you want the admissions committee member who is reading your statement to finish saying this to themself: oh–they could work with our faculty member Dr. Zipf [insert some actual faculty member of the institution in question, unless you’re applying to my institution].   (The pronoun themself is explained in the English notes below.)

How you lead them to that happy conclusion: don’t tell them, but show them.  Here are some things that you can do:

  1. State that you are interested in one or two specific areas of research of that department.
  2. State that you became interested in that topic/those topics when doing a research project on it/them…
  3. or, if you have not done research on that topic, then that you got interested in it/them while doing research on some other topic and coming across a paper on the topic by some member of the faculty of the department to which you are applying.
  4. List some areas of specialization within that topic or some related topics that you would be interested in working on, where those specializations or related topics are actually areas of research that members of the department to which you are applying work within.

Why I say one or two: you very much want to avoid a situation where (a) only one person in the department works on a topic, and (b) you don’t know it, but that person is getting ready to retire/move to another institution/begin a three-year period as the Associate Dean for Reproducibility, or something.  You avoid that situation by either (a) talking about a topic that two or more people in the department actually work on, or (b) talking about more than one topic.

Now, you may be asking yourself: what if I can’t find anyone in the department who works on my area of interest? The answer:

If you cannot find anyone in the department who works in your area of interest, then that department is not a good fit for you.

…and that’s exactly what the department wants to know.  In fact, if you apply to a graduate school and they don’t accept you, it is entirely reasonable to assume until proven otherwise that they’re not rejecting you, but just don’t see their department as the right place for you.

Need to know how to ask for a letter of recommendation for graduate school?

Click here.

This post is written on the basis of my time on the admissions committee of a medium-sized graduate program in computational biology.  If you have other perspectives/opinions on the subject, please add them to the comments below!


English notes

When you get deep into the weeds of the English language, one of the things that you run into is dialectal variation in pronoun use.  For example:

Dative pronouns in conjoined subject noun phrases: In the Pacific Northwest region of the United States, if you have a subject with two or more people joined by a conjunction (e.g. and or or), then the pronouns are in the dative form, not the subject form.  For example, look at these contrasts:

  • I’m going to the store.  (subject)
  • He’s going to the store.  (subject)
  • Me and him are going to the store. (dative)
  • Him and me are going to the store. (dative)
  • Anaïs is going to the store. (subject)
  • They are going to the store. (subject)
  • Anaïs and them are going to the store. (dative)

Even in the Pacific Northwest, you don’t have to talk this way–it’s pretty regionally specific, and people will understand you just fine if you say he and I are going to the store.  But, if you are in that part of the country, you have to be able to understand it.

Atypical reflexive pronouns: Other oddnesses have to do with the reflexive forms of pronouns.  For example, in my dialect, the third-person plural forms they/them/their are used if you don’t know the gender of the referent.  Straightforward enough–that usage goes back centuries in English. But: in a reflexive context (i.e. when the subject is doing something to itself or for itself), you get a variety of forms, depending on number:

  1. You want the admissions committee member who is reading your statement to finish saying this to themself: oh–they could work with our faculty member Dr. Zipf [insert some actual faculty member of the institution in question, unless you’re applying to my institution].  That is obscure enough that it does not even show up in Merriam-Webster’s online dictionary.
  2. My aunt and uncle bought themselves a new copy of the compact edition of the Oxford English dictionary. This plural form is totally standard American English.
  3. My aunt and uncle each bought themselfs a new pair of sunglasses. …and that one, again, does not show up in Merriam-Webster.

This raises a question: how would someone who doesn’t speak a dialect like this say (1) and (3)? I’m pretty sure that in (3), they would say themselves.  But, (1)?  I don’t know another way of saying it–native speakers?

The picture at the top of this post is of Oxley Hall on the Ohio State University campus. I had the pleasure of getting a master’s degree in linguistics there in the 1990s. Mostly we hung out in the basement analyzing spectrograms, but we would occasionally sneak up into the tower.  Fun.

 
