How to irritate a linguist, Part 5: English irregular past-tense verb practice

You’re at a party, sipping a dark beer and minding your own business, when the host introduces you to another cheerful attendee.  Biggie, meet Zipf.  He’s a linguist. …and disappears.

You just stand there, silent–trying not to glare balefully, either at your new interlocutor or–more likely–at the back of your departing host.  This is a mistake.

Conditional probability is the likelihood of some event given some other event.  For example: the probability of the word barf being said is, in the absence of any other information, equal to the frequency of the word barf being said divided by the frequency of any word whatsoever being said.  To put numbers on that, I went to the Sketch Engine web site, your home for fine linguistic corpora and the tools to search them with, searched a collection of 15.7 billion words of English scraped from the Web in 2015, and found that the word barf occurred 0.12 times per million words.  In other words: in the absence of any other information about what’s being said, you can expect to run into the word barf once every 8 million words or so.

If dogs are being talked about, the situation changes.  If you look only in the vicinity of the word dog, then the frequency of barf is 2.41 times per million words.  In other words, when dogs are under discussion, you will run into the word barf every 415,000 words or so.  So: the probability of the word barf is 0.12 per million words, and the conditional probability of the word “barf” given that you have seen the word “dog” is 2.41 per million words, about 20 times higher.
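To check the arithmetic in the last two paragraphs, here's a small Python sketch. The function names are made up for illustration; the only inputs are the two per-million frequencies from the Sketch Engine search above.

```python
def per_million_to_probability(freq_per_million: float) -> float:
    """Convert a frequency per million words into a per-word probability."""
    return freq_per_million / 1_000_000


def words_per_occurrence(freq_per_million: float) -> float:
    """On average, how many running words go by per occurrence of the word."""
    return 1_000_000 / freq_per_million


# Frequencies per million words, from the Sketch Engine search described above.
for label, freq in [("barf, no context", 0.12), ("barf, near dog", 2.41)]:
    print(f"{label}: p = {per_million_to_probability(freq):.2e}, "
          f"about once every {words_per_occurrence(freq):,.0f} words")
```

Running it prints a probability of 1.20e-07 (once every 8,333,333 words) for barf with no context, and 2.41e-06 (once every 414,938 words) for barf near dog: dog makes barf roughly 20 times more likely, since 2.41 / 0.12 ≈ 20.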

An aside: it isn’t necessarily the case that having seen some word tells you anything about the probability of seeing another word.  For example, the probability of the word barf and the probability of the word barf given that you have seen the word the are probably equal.  When the probability of some event (say, seeing some word) and the probability of that event given some other event (say, having seen some other word) are equal, we say that they are conditionally independent.  When the probability of some event is not the same as the probability of that event given some other event, we say that they are conditionally dependent.  (Strictly speaking, statisticians call these plain independence and dependence, and reserve conditionally independent for a three-way relationship: two events that are independent once you know whether some third event happened.  I’ll stick with the looser labels here.)
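Here's a toy Python sketch of that comparison. The exact way Sketch Engine defines "the vicinity of" a word isn't described here, so the window parameter (how many tokens on either side count as nearby), the function names, and the tiny made-up corpus are all assumptions for illustration. If the two numbers come out about the same over a large corpus, the two words look independent; if they differ a lot, they look dependent.

```python
def relative_frequency(tokens: list[str], target: str) -> float:
    """Share of all tokens that are `target`."""
    return tokens.count(target) / len(tokens)


def frequency_near(tokens: list[str], target: str, context: str, window: int = 5) -> float:
    """Share of the tokens within `window` positions of `context` that are `target`.

    A hypothetical definition of "vicinity"; real corpus tools may differ.
    """
    nearby = set()
    for i, token in enumerate(tokens):
        if token == context:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            nearby.update(j for j in range(lo, hi) if j != i)
    if not nearby:
        return 0.0
    return sum(tokens[j] == target for j in nearby) / len(nearby)


# A made-up 30-token "corpus", split on whitespace.
tokens = ("my dog will barf on the rug if my dog eats grass . "
          "the news said the market rose . "
          "the weather was fine and the park was quiet .").split()

print(relative_frequency(tokens, "barf"))        # barf anywhere
print(frequency_near(tokens, "barf", "dog", 3))  # barf within 3 tokens of dog
```

On this 30-token toy text, barf is 1/30 of all tokens but 1/10 of the tokens within three words of dog. A corpus this small proves nothing about English, of course; the point is only the shape of the comparison.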

The probability of someone saying Wow–irregular verbs, huh?  Aren’t they weird? is equal to the frequency of Wow–irregular verbs, huh?  Aren’t they weird? being said, divided by the frequency of anything whatsoever being said.  In other words: vanishingly small.  However, the probability of someone saying Wow–irregular verbs, huh?  Aren’t they weird? given that you have just been introduced to them as a linguist is not vanishingly small–it is much, much larger than vanishingly small.

Just to be sure that we’re all paying attention here:

  1. Suppose that the probability of the word barf is not equal to the probability of the word barf given that the word dog has been said.  These two words are:
    1. conditionally dependent
    2. conditionally independent
  2. Suppose that the probability of the word barf is equal to the probability of the word barf given that the word the has been said.  These two words are:
    1. conditionally dependent
    2. conditionally independent
  3. The probability of someone saying Wow–irregular verbs, huh?  Aren’t they weird? is much higher if they have just been told that you are a linguist than the probability of someone saying Wow–irregular verbs, huh?  Aren’t they weird? …given no additional information.  Those two events are:
    1. conditionally dependent
    2. conditionally independent

Answers: (1) conditionally dependent, (2) conditionally independent, (3) conditionally dependent.

What’s so irritating about this?  The answers to that question are probably as numerous as the number of linguists in the world (which is to say: not enormous, but not zero, either), but here are my top 3 explanations:

  1. Your question is looking for a specific answer–yes, they sure are weird–but I do not, in fact, think that English irregular past-tense verbs are weird, so I feel pressured into lying, so fuck you.
  2. Talking about what’s interesting about English irregular past-tense verbs (I said interesting, not weird) would require me to find a napkin and a pen with which to draw on it, and no one seems to carry pens anymore, so I would have to wander around the party like a bumbling idiot, breaking up innumerable conversations while I looked for one; plus, I have facial hair, so I really need my napkin.
  3. A reasonable linguist would suspect that if they engage with you on this question, then they’re going to find themselves in an annoying conversation with you about linguistic complexity, and that would really ruin their evening, which, given that I was just sipping a dark beer and minding my own business, seems pretty unfair.

English is the language of my profession, and I know an enormous number of non-native speakers who can read and write it close to perfectly.  But: drink a couple of beers at the Association for Computational Linguistics convention meet-and-greet, get into an animated conversation about the inability of Big Data to demonstrate causality, and anyone will start to trip over irregular forms.  If you’re speaking English, that’s probably mostly going to involve irregular past-tense verbs.  But: practice makes perfect (OK: better), so: let’s practice!

Today we’ll look at irregular past-tense verbs that follow a specific pattern.  In this pattern, a verb with the vowel [i] (International Phonetic Alphabet) in the present tense has the vowel [ɛ] in the past tense.  Examples:

  • feed/fed
  • lead/led
  • meet/met
  • read/read

Notice that I’m grouping these verbs by pronunciation, not by spelling–our goal here is to help you develop spoken habits.  (Mécanisation, i.e., making them automatic–thanks, Phil d’Ange!)  The astute reader (OK, a linguist) might also have noticed that those verbs all end with one of the two English “alveolar oral stop consonants”: that is, with a t or a d.  Other verbs that have the [i]-in-the-present-[ɛ]-in-the-past pattern may add a t or a d:

  • feel/felt
  • creep/crept
  • keep/kept
  • kneel/knelt
  • leave/left
  • mean/meant
  • leap/leapt (leaped is also possible)
  • cleave/cleft (cleaved and clove are also possible)
  • flee/fled
  • sleep/slept
  • sweep/swept
  • weep/wept
  • deal/dealt
  • dream/dreamt (dreamed is also possible)
  • plead/pled (pleaded is also possible, and I think common these days, at least in the US)

OK: practice time!  Here are some sentences that include past-tense verbs of the [i]-in-the-present-[ɛ]-in-the-past pattern.  Read them out loud, replacing the present-tense verb in parentheses with the past-tense form.  (In some of these examples, it’s actually a past participle that happens to have the same form as the past tense.)

All examples are from the New York Times story Trump and Putin have met five times.  What was said is a mystery, by Peter Baker, published January 15th, 2019.  I have edited some of them for clarity, e.g. by replacing they with Trump and Putin.

The first time that Trump and Putin (meet) was in Germany.

Each of the five times President Trump has (meet) with Mr. Putin since taking office, he has fueled suspicions about their relationship.

The unusually secretive way he has handled these meetings has (leave) many in his own administration guessing what happened and piqued the interest of investigators.

At the height of the campaign, his son, son-in-law and campaign chairman had (meet) at Trump Tower with Russians on the promise of obtaining dirt on Mrs. Clinton from the Russian government.

Their most famous meeting came on July 16, 2018, in Helsinki, where they talked for more than two hours accompanied only by interpreters.  The Kremlin later reported that the leaders reached important agreements, but American government officials were (leave) in the dark.  American intelligence agencies were (leave) to glean details about the meeting from surveillance of Russians who talked about it afterward.
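Want to check your answers? Here's a small Python sketch of an answer key. The dictionary just restates the verb lists above, and fill_in is a hypothetical helper name; where more than one past-tense form is possible (leapt/leaped, dreamt/dreamed, pled/pleaded, cleft/cleaved/clove), it uses the first one listed.

```python
import re

# Past-tense forms for the [i]-present / [ɛ]-past verbs listed above
# (first listed form only, for verbs with alternatives).
PAST_TENSE = {
    "feed": "fed", "lead": "led", "meet": "met", "read": "read",
    "feel": "felt", "creep": "crept", "keep": "kept", "kneel": "knelt",
    "leave": "left", "mean": "meant", "leap": "leapt", "cleave": "cleft",
    "flee": "fled", "sleep": "slept", "sweep": "swept", "weep": "wept",
    "deal": "dealt", "dream": "dreamt", "plead": "pled",
}


def fill_in(sentence: str) -> str:
    """Replace each '(verb)' in a practice sentence with its past-tense form.

    Verbs missing from PAST_TENSE are left as they are, parentheses and all.
    """
    return re.sub(r"\((\w+)\)",
                  lambda m: PAST_TENSE.get(m.group(1), m.group(0)),
                  sentence)


print(fill_in("The first time that Trump and Putin (meet) was in Germany."))
```

Verbs that aren't in the dictionary stay in their parentheses, so you can see what the key doesn't cover.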

The picture at the top of this post is an MRI of the vowel [i] being pronounced.  Source: I don’t remember, but if you care, I’ll look it up.  Enjoying the How to irritate a linguist series?  Here are the previous episodes.

Ducklings and goslings and inklings, oh my

That moment when the elves take your baby and leave one of theirs in its place.

I dragged myself out of bed at 8:30 AM today.  Under normal circumstances, if I’m still in bed at 5:45 AM, it means that I had a rough night–I am most definitely both a morning person, and an early riser.  Seulement voilà (“the thing is”):

  1. At this time of year, it doesn’t get light outside in Paris until about 8:30 in the morning.
  2. At 2 AM I got obsessed with the need to learn all of the words for baby animals in French.

Morphemes are the things that words are made of.  For example, the plural cats has two morphemes: cat, and the s that carries the meaning of plurality.  (This happens to be the example from which my child learned what a morpheme is–as a young child, and as we did the dishes together.  Must suck to be a linguist’s kid…)

English has an odd little morpheme, -ling, that refers to things that are small.  Like the s of cats, it is what is called a bound morpheme, meaning that it cannot be a word on its own–it has to be attached to something else.  (Contrast that with the cat in catnap (a short, light nap), catnip (a plant–it’s basically pot for cats), and cathouse (a brothel–archaic).)  Here are a couple of examples:

  • duckling: a baby duck.
  • inkling: a small hint, or a small piece of knowledge.  (I’ll give some examples of its use later.)

The -ling morpheme is also not productive: that means that you can’t really use it freely to make “new words.”  For example, it’s not clear that anyone would know what you meant if you casually threw the words waterling (parallel to inkling) or penling (parallel to duckling) into a conversation.  (Contrast that with -gate, which over the course of my lifetime has become applicable to practically anything, with the meaning of “a scandal related to:” Bridgegate, Pizzagate, etc.)  Because it’s not productive, one could list all of the words in English in which it occurs.  Limited only by my memory, of course.  My best shot at doing so:

  1. duckling: baby duck
  2. gosling: baby goose
  3. foundling: a child who has been found after having been abandoned
  4. changeling: when the elves take away your baby and leave one of their own in its place
  5. inkling: a small hint, idea, trace, piece of knowledge, clue

In the Foundling Hospital grounds, London, c1901: the London Foundling Hospital in 1901, from an article about a 1911 foundling lottery in Paris.

Now, I know what you’re thinking: Zipf, you’re a drooling idiot.  There are lots of words in English that end with -ling: for example, DROOLING.  Feeling, wheeling and dealing (French: mic-mac or micmac), healing… 

Well… I may be an idiot, but I’m not a drooling one.  Here’s the thing: a morpheme is defined by its sound (or spelling)–in our case, ling–and by its meaning.  Drooling and gosling (baby goose) contain the same sounds/letters, but not the same meaning of smallness, so it’s not the case that they share the same morpheme.  -ling is a pretty textbook (French: typique) example of a non-productive morpheme.
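The same point, as a few lines of Python: matching on spelling alone finds every word that ends in -ling, but telling the diminutive morpheme apart from look-alikes takes knowledge of meaning, which here is nothing fancier than the closed list of -ling words above.

```python
# The -ling diminutives listed above: a closed list, because -ling is not productive.
LING_DIMINUTIVES = {"duckling", "gosling", "foundling", "changeling", "inkling"}

words = ["duckling", "drooling", "gosling", "feeling", "foundling",
         "wheeling", "dealing", "changeling", "healing", "inkling"]

# Matching on letters alone catches every word that ends in -ling...
ends_in_ling = [w for w in words if w.endswith("ling")]

# ...but only some of them contain the morpheme that means "small".
false_positives = [w for w in ends_in_ling if w not in LING_DIMINUTIVES]
print(false_positives)
```

It prints ['drooling', 'feeling', 'wheeling', 'dealing', 'healing']: the words with the right letters but without the meaning of smallness.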

So, yeah: I don’t sleep much, and I’m trying to learn to speak French, so at 2 AM I got obsessed with learning the names of baby animals in French.  This web page got me started, and then I started searching for weird English-language baby animal names (say, gosling), and here you see the results.  (Yes, some occur more than once.) At 2 AM, I only knew chiot (puppy), chaton (kitten), and veau (calf)–how about you?  And, native speakers (Phil d’Ange, I’m lookin’ at you)–can you add some more?

Adult animals:

Juvenile animals:

English-language example sentences


  • The Steel Riders Saga is a sci-fi/fantasy novel about Free Wheeler, a foundling discovered by the legendary Steve Thompson during a deep terrain ATV ride. Thompson leads an ATV pack known as the “Steel Riders.” In their fantastical journeys Free Wheeler finds true love and home.  (Twitter, @quantum_tide)
  • Meanwhile, in Australia, there’s a National . I have never heard of anything so glorious! (Nobody in my family cares about gravy as much as I do. I… might be a foundling?)  (Twitter,
  • Can I just say…Baby Faced Finster. A foundling!! You Naughty Baby!! Hahaha! 😂❤️  (Twitter, @TheSuperAmanda)


  • I’ve mentioned this numerous times on the podcast but… I have an inkling that Nintendo will use Smash DLC to promote upcoming (inc third-party) Switch releases.  (Twitter, @pixelpar)
  • My new resolution is to not read the thread of comments of tweets where I know or have an inkling that it’s not going to be a good thing.  (Twitter, @valparkie)
  • You are a gem of a friend and you don’t have an inkling of how much i appreciate your ignorance of my vices.  (Twitter, @Shakti_Shetty)
  • I don’t have an inkling of what the future holds but I’m excited  (Twitter, @JaredTench)
  • Roommate, Camden *going to Waffle House in Dunn*: “If I get the smallest inkling of a crack-whore, I’m leaving!”  (Twitter, @dr_pattyguin)


What computational linguists actually do all day: The lexical frequency version

In practice, we spend most of our time trying to figure out where we went wrong in writing some computer program or another. 

Tell someone that you’re a computational linguist, and the next thing out of their mouth is likely to be either:

  1. How many languages do you speak?, or…
  2. What’s that?

In theory, computational linguists spend their time thinking about fun questions like:

  1. Is natural language Turing-complete?
  2. The relationship, if any, between what we know about words (say, the word dog can be a noun or a verb, and it occurs more often with the words bark and leash than with the word meow) and what we know about the world (say, a dog is a canine, and might like to chase balls, and will eat cat shit if not instructed otherwise).
  3. How Zipf’s Law, which describes the fact that a small number of words are extremely common, while a large number of words are extremely rare, but do occur, might or might not be related to the mathematical phenomenon of the fractal.
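For reference, Zipf’s Law is usually written as a relationship between a word’s frequency rank r (1 for the most common word, 2 for the next, and so on) and its frequency f(r), where C is a constant for a given corpus and the exponent s is close to 1 for word frequencies in natural language:

```latex
f(r) \approx \frac{C}{r^{s}}, \qquad s \approx 1
```

With s = 1, the second most common word shows up about half as often as the most common one, the third about a third as often, and so on down a very long tail of rare words.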

In practice, we spend most of our time trying to figure out where we went wrong in writing some computer program or another.  (OK: that, and writing grant proposals.)  Think that being a computational linguist sounds glamorous?  Here’s how I spent my morning.

All I gotta do: go through a bunch of documents and count how often each word in that bunch of documents occurs.  Easy-peasy–barely hard enough for a homework assignment in Computational Linguistics 101.
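For reference, here's roughly what the counting step looks like as a Python sketch. This is not the script in the screenshots below (that one appears to be Perl); count_words is a hypothetical name, and treating a "word" as a lowercased run of letters and apostrophes is a simplifying assumption, since real tokenization is messier.

```python
import re
from collections import Counter


def count_words(documents: list[str]) -> Counter:
    """Count how often each word occurs across a bunch of documents.

    Assumption: a "word" is a run of ASCII letters or apostrophes, lowercased.
    """
    counts = Counter()
    for text in documents:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


docs = ["The dog barked.", "The dog ate the bone, then the dog slept."]
for word, freq in count_words(docs).most_common():
    print(word, freq)
```

For the two sample documents, the is counted 4 times and dog 3 times.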

Seulement voilà…

Screen Shot 2018-12-03 at 12.54.58

Easy enough to fix–I just failed to give the complete name of the program, and… marde (Québécois French for “shit”).

Screen Shot 2018-12-03 at 12.57.21

OK, easy enough to fix–I had written

Screen Shot 2018-12-03 at 12.58.58

…when I shoulda written

Screen Shot 2018-12-03 at 12.58.44

Shoulda: the typical spoken form of should have. 

(Note the square bracket near the end of the middle line–I had left it out.)  Great–avançons, alors (“let’s move on, then”).  But, no, fuckashitpiss:

Screen Shot 2018-12-03 at 13.03.54

Easy enough to fix–turns out I wrote this:


Screen Shot 2018-12-03 at 13.05.19

…when I shoulda written this:

Screen Shot 2018-12-03 at 13.06.35

(Note the dollar sign before the rightmost instance of words now.)  And so, on we go, but…

Screen Shot 2018-12-03 at 13.07.57

…and it’s easy enough to fix–I had written this:

Screen Shot 2018-12-03 at 13.08.57

…when I shoulda written this:

Screen Shot 2018-12-03 at 13.10.10.png

(Note the double quote before $frequencies{$words[$i]}\n";) …and now I’m wondering:

  1. These errors were all on one single line–what other horrors have I hidden in this code, and will they be as easy to find as those were?
  2. What the hell was I thinking when I wrote that line?  Was I thinking about the upcoming dissertation defense at 2 PM?  Was I thinking about Trump giving my country to China?  Was I thinking about tomorrow’s colonoscopy? Who the hell knows, really–whatever it was, it apparently wasn’t this line of code…

Mais retournons (“but let’s get back to it”)… Ah, marde, but at least this one will be easy to fix…

Screen Shot 2018-12-03 at 13.14.14

…except that I verify the existence of the directory, and then get this:

Screen Shot 2018-12-03 at 13.16.03

…which is the exact same error that I got before.  So, I go back and look at my code, where I see this, and remember that my error message is supposed to print out the name of the directory that it couldn’t open, but it did no such thing:

Screen Shot 2018-12-03 at 13.22.00

…which is ’cause I never gave the program the name of the input directory.  So I take care of that, and also tell my program to print out the name of the directory that it couldn’t open if, in fact, it can’t open a directory–as we saw above, I had planned to do this, but of course left out that little detail:

Screen Shot 2018-12-03 at 13.25.31

…and now I experience a tiny little bit of success, because my program does not crash.  Seulement voilà, it doesn’t actually produce any output:

Screen Shot 2018-12-03 at 13.27.18

Note the lack of a bunch of lexical frequencies… So, I go back to my script, and I start looking around in the region of the program where I meant for the output to happen.  I don’t see anything obvious in that area, so: I go further up in the code, and start doing what I need to do to convince myself that the earlier parts of the program are working the way that I intended them to.  This means printing out the results at intermediate steps of the processing. The resulting code (leaving out a bunch of details) looks like this:

Screen Shot 2018-12-03 at 13.32.22

…which does nothing different than it was doing before, so I know that I need to go even further up in the program and, again, print stuff out as I go, resulting in this:

Screen Shot 2018-12-03 at 13.35.57

…which, when I run the script, produces this:

Screen Shot 2018-12-03 at 13.37.25

…which suggests to me that the directory exists, and that I’m opening it correctly, but that I am either (a) reading its contents incorrectly, or (b) making a mistake when I make a decision about whether or not to open each file.  A quick Google search finds the problem for me–I had written this:

Screen Shot 2018-12-03 at 13.41.16

…when I shoulda written this:

Screen Shot 2018-12-03 at 13.41.33

(Note that the text at the left end of the line was open, and now is opendir.)

Progress!  Now I get some output, but note the last line–I’m just getting a bunch of file names, and no word frequencies.  I can see the problem right away, though–I have the directory name right, and I have the file name right, but I need to combine them in order to be able to open the file.  Doing so gives me this code:

Screen Shot 2018-12-03 at 13.46.55

…which results in my script running successfully for a while, but then crashing, and I know exactly what causes said crash…

Screen Shot 2018-12-03 at 13.48.10.png

…and I know that it’s a bear to fix, and I’ve been working for two hours on this fucking task that’s barely difficult enough to make a good homework assignment, and now it’s time to go to the aforementioned dissertation defense, and… Soupire… (“sigh…”)
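Several of the bugs above are about getting from a directory to its files. Here's a minimal Python sketch of that step, again not the script in the screenshots; read_documents is a hypothetical name. It does the two things the story turned on: the error message names the directory it couldn't open, and it joins the directory name to each file name, because listing a directory gives you bare file names, not paths you can open.

```python
import os


def read_documents(directory: str) -> dict[str, str]:
    """Read every regular file in `directory` and return {file name: text}."""
    try:
        names = os.listdir(directory)  # bare file names, not full paths
    except OSError as err:
        # Say *which* directory couldn't be opened, so the error is useful.
        raise SystemExit(f"Couldn't open directory {directory!r}: {err}") from err

    documents = {}
    for name in sorted(names):
        path = os.path.join(directory, name)  # directory name + file name
        if os.path.isfile(path):  # skip subdirectories and the like
            with open(path, encoding="utf-8") as f:
                documents[name] = f.read()
    return documents
```

Skipping anything that isn't a regular file (subdirectories, say) is a design choice; a real script might recurse into them instead.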




Sometimes my mouth just stops moving

The hard part is not studying more than one language–the hard part is keeping them separate.

One of the more interesting books that I’ve read over the course of the past couple years was Michael Erard’s Babel No More: The Search for the World’s Most Extraordinary Language Learners.  It is a book about polyglots and polyglossia–people who speak a lot of languages (as opposed to linguists, who are people who study language in general).

Erard is an actual linguist, and knows what he’s talking about.  One of the points that he makes that I found interesting is that there’s no single recipe for learning a “second language”–in his travels amongst the polyglots, he found that people who are into this kind of thing figure out what works for them, and it’s not necessarily the same approach for everyone.

So: I’m going to show you how I prepare for my annual trip to Guatemala, where I volunteer with a wonderful group called Surgicorps.  (We provide free specialty surgeries for people for whom the almost-free national health care system is still too expensive.)  But, don’t feel like it’s a magic recipe (am I mixing metaphors here?) for success–just know that it has been working for me for the past few years, and there’s something that will work for you.  (Which might be this!)

For context: Spanish is a “second language” for me–one that I can function in for my daily life, and professionally.  But: because I spend at least half of my life in the French language and only speak Spanish when I go to Guatemala, it’s very difficult for me to not mix French into my Spanish incessantly.  (As I believe Erard also points out: the difficulty is not learning a bunch of languages–the difficulty is keeping them apart.)  Consequently, on July 1st of every year since I started spending As Much Time As Possible in France, I cut French out of my life completely.  En contrepartie (“in exchange”), on July 1st I start doing the same kinds of things in Spanish that I would normally do in French–listening to the news on the way to work, learning my daily vocabulary words, reading The Walking Dead comics, etc.

I also put together a schedule of everything that I need to work on between July 1st and July 30th.  If you’re unfortunate enough to have been reading my blog for the past couple years, you saw me do this for the month before I took my French C1 test.  The main difference is that for the CEFR exams, I need to include “written production” in the things that I work on–for my volunteer work in Guatemala, I don’t need that, because I almost never need to write anything in Spanish.  So, for Guatemala preparation, I have four main categories of things to focus on:

  1. Vocabulary: technical (medicosurgical)
  2. Vocabulary: general
  3. Grammar
  4. Oral production

Why do I have an entire “section” for general vocabulary?  Because as I’ve written about before, that’s the biggest challenge.  Medical vocabulary is finite–there are only so many body parts, surgical procedures, etc.  It’s the general vocabulary that gets you–remember that Zipf’s Law reflects the fact that languages are full of words that almost never occur, but, they do.  When the guy comes to the hand surgeon with two mangled fingers hanging there uselessly, the first question that the surgeon asks him is going to be what happened, and the answer to that could be anything.

  • A snake bit me
  • I got a cactus spine stuck in my palm
  • The fuel pump caught fire and exploded while I was in the passenger seat
  • Two guys tried to steal my car and they went after me with a machete

…all of which I have run into.

So, I expand out my vocabulary study into these categories:

  • Vocabulary: technical (medicosurgical)
    • Areas of the hospital
    • Surgical techniques and equipment
    • anesthesia
    • anatomy
      • the hand (because I mostly work with a hand surgeon)
      • gynecology (because I don’t interpret for the gynecologists very often, and therefore like to make sure that I give the terminology a once-over since I don’t have occasion to use it much)
      • the face and head (because we always have multiple plastic surgeons with us)
  • Vocabulary: general
    • the Guatemalan regional dialect (lots of fun loan words, mostly from one or another of the 20+ Mayan languages spoken in the country)
    • professions (see this post for why that gets a day of its own)
    • farm work and other kinds of manual labor (because most of our patient population consists of children or manual laborers–see this post)
    • animals and plants (see above about “anything can happen to your hands”)

I split grammar into three topics:

  1. Conjugation (because when in doubt, I’ll conjugate Spanish verbs as if they were French, and that does NOT work)
  2. Usted forms of verbs (they get a day of their own because it’s the form that I should be using with patients and their family members, but I almost never use it in my daily life)
  3. The subjunctive (much easier in Spanish than in French because it gets used far more often in Spanish, so you don’t have to think about it as much–my French problem is that I use the subjunctive too often)

Now, I know you’re wondering: why do I have oral production on my list, and why don’t I have oral comprehension?  Oral comprehension is the hardest part of learning any language for most people, and oral production is what most anglophones find the easiest part of learning Spanish.  The answer goes back to Michael Erard: the hard part is not learning more than one language–the hard part is keeping them separate.

This comes into play for me in two ways.  One way will be familiar to anyone who has two foreign languages running around in their heads: when you don’t have a word that you need in one language, it’s hard not to substitute it with the word from the other.

The other way that French interference in Spanish works out for me is more subtle, and it’s purely a question of oral production: it’s very difficult for me to say sequences of sounds in Spanish that would not be possible in French.

A problem context that comes up quite often is possessive pronouns followed by vowel-initial nouns.  For example (English followed by formal/informal French and then formal/informal Spanish):

  • your eye: votre œil / ton œil (French); su ojo / tu ojo (Spanish)
  • your artery: votre artère / ton artère (French); su arteria / tu arteria (Spanish)

Francophones will note that artère is feminine, but it takes the masculine form of the possessive pronoun: ton artère, not ta artère (and, likewise, mon artère, not ma artère).  No huge surprise to students of French–any vowel-initial noun takes the masculine, consonant-final form of words like possessive pronouns.  Where the problem comes up: when I have to say one of those words before a vowel-initial noun in Spanish, my tongue stops.  It’s like it runs into a wall–my mouth just stops moving.  What the fuck??

From a linguist’s point of view: I’ve developed my own little foreign-language phonology.  In languages other than my native one (American English), that little phonology really does not like sequences of vowels at the end of one word and the beginning of the next.  So, I need to say tu abuelita, your grandma, but my phonology really, really wants it to be tun abuelita, or something of that ilk, which does not exist in Spanish… and my vocal apparatus just comes to a halt.

Solution: oral production drills.  Focused drills, not just making myself speak–that will happen in Guatemala, where I’ll show up a week before the rest of the team to get those Spanish-language juices flowing.  I’ll put together exercises for myself that focus on the specific things that I know I have trouble getting out of my mouth, et voilà.  For example: ¿Le duele todavía su axila?  (Does your armpit still hurt?)  Ya hablamos con su abuela (we already spoke with your grandmother).  Both of those are short sentences that force me to say the vowel + vowel sequences that are so hard for me: in these cases, su axila (your armpit) and su abuela (your grandmother).
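Finding candidate phrases for drills like these is easy to automate. Here's a minimal Python sketch; vowel_vowel_pairs is a hypothetical name, and its rules are simplifying assumptions: a word-initial h counts as silent (so ya hablamos gets flagged), and y is never treated as a vowel, so a word like muy won't trigger a match.

```python
import re

VOWELS = set("aeiouáéíóúü")


def ends_with_vowel(word: str) -> bool:
    return bool(word) and word[-1].lower() in VOWELS


def starts_with_vowel_sound(word: str) -> bool:
    w = word.lower()
    if w.startswith("h"):  # Spanish h is silent: "hablamos" starts with a vowel sound
        w = w[1:]
    return bool(w) and w[0] in VOWELS


def vowel_vowel_pairs(sentence: str) -> list[tuple[str, str]]:
    """Adjacent word pairs where a vowel-final word meets a vowel-initial one."""
    words = re.findall(r"[^\W\d_]+", sentence)  # letters only; drops ¿ ? , etc.
    return [(a, b) for a, b in zip(words, words[1:])
            if ends_with_vowel(a) and starts_with_vowel_sound(b)]


print(vowel_vowel_pairs("¿Le duele todavía su axila?"))
print(vowel_vowel_pairs("Ya hablamos con su abuela"))
```

Run over a few pages of Spanish text, it turns up pairs like su axila, su abuela, and tu abuelita to drill.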

Screen Shot 2018-06-30 at 09.12.08

So, you take all of those individual things to work on and mix ’em up to give yourself a little variety in your daily study.  Prioritize things in a way that makes sense for what you plan to be doing with the language–I have a day in there for learning the vocabulary of food and beverages, but that’s more so that I can translate the menu for my fellow volunteers than for the actual volunteer work, so it wouldn’t make sense to be working on that first, and I don’t.  Mix in some review days–review is essential, and you don’t want to do it all at the end.  Boum, as the French kids say–a month’s worth of work.  I’ll start it on July 1st, and I’ll finish it sitting in the plane on the way to Guatemala on the 30th.  If I screw up and miss a day?  Not the end of the world–I’ll make it up.  If I just can’t stand anesthesia vocabulary on July 11th?  No problem–I’ll just switch a couple days around.  Is the list intimidating?  No–the opposite.  I know that if I prepare, everything will probably go fine, and I know that if I work my list, I’ll be prepared–so, it’s actually reassuring, not intimidating.
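If you'd like to generate a first draft of a schedule like this rather than lay it out by hand, here's a minimal Python sketch. The topic names come from the lists above, but their order and the rule of a review day every seventh day are assumptions for illustration, not the schedule in the screenshot; build_schedule is a hypothetical name.

```python
from itertools import cycle


def build_schedule(topics: list[str], days: int = 30, review_every: int = 7) -> list[str]:
    """One topic per day, cycling through `topics` in priority order,
    with a review day every `review_every` days."""
    upcoming = cycle(topics)
    # The conditional expression only calls next() on non-review days.
    return ["review" if day % review_every == 0 else next(upcoming)
            for day in range(1, days + 1)]


topics = ["conjugation", "usted forms", "the subjunctive", "anatomy: the hand",
          "anesthesia", "surgical techniques and equipment", "areas of the hospital",
          "Guatemalan regional dialect", "professions", "farm work and manual labor",
          "animals and plants", "oral production drills"]

for day, topic in enumerate(build_schedule(topics), start=1):
    print(f"July {day}: {topic}")
```

Moving a day you just can't face is then a matter of swapping two entries in the list.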

Why no days for working on oral comprehension?  Because that’s what listening to the news on the way to work, podcasts while I stretch, etc., are for.  That really has to be part of your daily life–you can’t partition that off into specific days.  Gotta work, work, work your oral comprehension.  On the good side: not one second of the time that you spend doing it will be wasted.

English notes

a couple versus a couple of: this is controversial amongst English speakers.  People who prefer a couple of are likely to complain about those of us who say a couple.  Je les emmerde (“screw ’em”).  How I used it in the post: If I just can’t stand anesthesia vocabulary on July 11th?  No problem–I’ll just switch a couple days around.

ilk: maybe acabit in French?  How I used it in the post: My phonology really, really wants it to be tun abuelita, or something of that ilk, which does not exist in Spanish… I think in French something of that ilk would be quelque chose du même acabit, or words to that effect.  Phil d’Ange?

The picture at the top of this post is from  I picked it because in the post I carped about sequences of sounds, and the meme is about sequences of sounds (one in particular–the sound of the ch in English chat, but more on that another time, perhaps).  You don’t get it?  No worries–that just means that you’re cool, not nerdy like some stupid linguist.

Giving back: Pronouncing English words that end with -ive

Paradoxically, the better your skill in a second language, the more your mistakes stick out.

I work with a couple of French folks whose English is so good that they are effectively native speakers, as far as I can tell.  It’s super-impressive–if my French were ever anywhere near as good as their English…

It’s their very skill that makes it obvious when they make a pronunciation error–it’s as if I were making one.  It is not at all the case that I don’t make pronunciation errors in my native language, and people most definitely do notice them–but I suspect that they’re all the more obvious precisely because (a) I’m a native speaker, and (b) I’m an “educated native speaker” (sounds hoity-toity, but it’s a technical term in linguistics).  I would guess that many of my “smaller” mistakes in French go unnoticed because they get lost in the thick fog of all of my other mistakes–in my native language, though, they all stand out.

hoity-toity: pretentious.

So, when my French-speaking-colleagues-who-are-essentially-native-speakers-of-English-too make pronunciation errors in English, it is, indeed, noticeable.  Happily, their English-language pronunciation errors often fall into a single category, and that’s what we’re going to go after today–my little attempt to repay more hours than I even want to think of that they’ve spent hammering on my pronunciation/lexicon/syntax/politeness/EVERYTHING in French.

You may have noticed that written vowels in English are pronounced differently than those vowels would be pronounced in essentially every other written language on the planet.  (That’s just a fraction of all languages, by the way–many, perhaps most, of the world’s languages have no established writing system.)

The reason behind all of this English-versus-the-world divergence in vowel pronunciation is something called the Great Vowel Shift.  It changed the pronunciation of many vowel sounds, and it was still under way while English spelling was becoming standardized.  The result was that English vowel sounds didn’t line up with their spelling as well as they used to.

The Great Vowel Shift, with approximate dates.  (And yes, with some training in phonetics, it does make perfect sense.)

One of the changes in pronunciation affected words that happen to be spelled with an e at the end.  It’s a silent e now, but it wasn’t always.  The preceding vowel sound changed, in a very systematic way that requires knowing a bit about what you do with your mouth to make sense of, and one of the consequences was that if that preceding vowel was i, it went from being pronounced the way it is in most languages (roughly like the ee in see) to being pronounced the way the word eye is pronounced today.

So, today, if you’re an Anglophone kid, you grow up being taught that when a word ends in -iCe, where C means any consonant, the i indicates the sound of the word eye.  There are plenty of examples of this:

  • five
  • drive
  • dive
  • thrive
  • alive
  • hive
  • archive
  • strive
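The schoolroom rule is mechanical enough to write down as a spelling test. What follows is a minimal Python sketch, not taken from any real phonics tool: the function name kids_rule_predicts_eye is hypothetical, and “consonant” is simplified to any letter other than a, e, i, o, u (so y counts as a consonant). It looks only at spelling, which is exactly its weakness.

```python
import re

# Spelling pattern for "-iCe": an i, then exactly one consonant letter,
# then a word-final e. "Consonant" is simplified to any letter other
# than a, e, i, o, u (so y is lumped in with the consonants).
ICE_PATTERN = re.compile(r"i[b-df-hj-np-tv-z]e$")


def kids_rule_predicts_eye(word: str) -> bool:
    """Hypothetical helper: True if the schoolroom rule says the i sounds like 'eye'."""
    return bool(ICE_PATTERN.search(word.lower()))


for w in ["five", "drive", "dive", "thrive", "alive", "hive", "archive", "strive"]:
    print(w, kids_rule_predicts_eye(w))
```

Every word in the list above passes the test, which is why the rule feels so reliable to a kid.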

But (and this is a big “but,” which is why I italicized and underlined it) the i in -iCe (an i followed by a consonant followed by an e at the end of the word) is not always pronounced that way.  There are plenty of times when it is not, and those tend to be longer words that educated people would use, and my French co-workers are super-educated, so they use these words.  For some of the native speakers of French that I know, mispronouncing these words is essentially the only mistake that I ever hear them make in English.  So: let’s work through some of these.

You’ll notice something about the words that are pronounced the way that Anglophone kids are told you always pronounce -iCe: they tend to be single-syllable.  Consider:

  • five
  • drive
  • dive
  • thrive
  • live (the adjective only, as in live bait)
  • alive
  • hive

But, not all single-syllable words of this type are pronounced that way.  Here’s the one counter-example that I can think of:

  • give

And, not all of the words in which -iCe is pronounced like “eye” are single-syllable words.  The counter-examples that I can think of:

  • archive
  • derive
  • arrive
  • survive
  • revive
  • deprive

I know what you’re thinking now: Zipf, this is simple: regardless of the number of syllables, the i is pronounced as in five if it’s in a STRESSED syllable.  And, yes, that almost works.  But consider archive: the stress falls on the first syllable, not on the -ive, and yet the i is still pronounced like five.

…and live is weird–when it’s a verb, it’s pronounced like give, but when it’s an adjective, it’s pronounced like five.  

OK, we’re more or less good with the words that end in iCe and get pronounced like five.  What about the words that don’t get pronounced like five?  Let’s take a look at some.  Now, I’m not going to select these randomly.  I went to this web page on the web site.  What it gave me is a list of words that end in -ive, sorted by how frequent they are.  Here’s what the output looks like.  You’ll notice that every word is followed by two numbers.  The first one is the length of the word in letters, while the second one is how many times the word occurs in every million words of text.  (What collection of texts did they do their counts in?  They don’t say.)  So, give is 4 letters long and occurs 1735 times per million words, executive is 9 letters long and occurs 171 times per million words, and so on.
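A per-million frequency is easy to turn into something more intuitive: how many words of text you would expect to go through between occurrences. Here is a minimal Python sketch, assuming each row of the list is laid out as word, length, frequency separated by whitespace (the real page’s layout may differ). The only data are the two rows quoted above; the Entry class and parse_line function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    word: str
    length: int          # length of the word in letters
    per_million: float   # occurrences per million words of text


def parse_line(line: str) -> Entry:
    # Assumed layout: word, then length, then frequency, whitespace-separated.
    word, length, freq = line.split()
    return Entry(word, int(length), float(freq))


# The two rows quoted in the text; the real list is much longer.
raw = """\
executive 9 171
give 4 1735"""

entries = [parse_line(line) for line in raw.splitlines()]
entries.sort(key=lambda e: e.per_million, reverse=True)  # most frequent first

for e in entries:
    assert e.length == len(e.word)  # sanity check on the length column
    words_between = 1_000_000 / e.per_million
    print(f"{e.word}: about once every {words_between:,.0f} words")
```

With these numbers, give comes out at about once every 576 words and executive at about once every 5,848 words.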

[Screenshot: the list of -ive words, each followed by its length in letters and its frequency per million words.]

With that list in my greedy little fingers, I’ll go through it and pull out some of the ones that are not pronounced like five.  That gives us this:

  • receive
  • executive
  • alternative
  • objective
  • representative
  • conservative
  • effective
  • initiative
  • positive
  • relative
  • olive
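Every word in this list ends in -ive, just like five, so the spelling pattern alone can’t be what separates them. A short Python sketch makes that concrete, assuming the same simplified definition of “consonant” as any letter other than a, e, i, o, u. The word groupings come from the lists in this post; live is left out because the verb and the adjective are pronounced differently.

```python
import re

# i + one consonant letter + final e (simplified definition of consonant).
ICE_PATTERN = re.compile(r"i[b-df-hj-np-tv-z]e$")

# Words from this post whose i is pronounced like "eye"...
eye_sound = ["five", "drive", "dive", "thrive", "alive", "hive",
             "archive", "strive", "derive", "arrive", "survive",
             "revive", "deprive"]
# ...and words from this post whose i is not.
other_sound = ["give", "receive", "executive", "alternative", "objective",
               "representative", "conservative", "effective",
               "initiative", "positive", "relative", "olive"]

# The schoolroom rule predicts "eye" for every word matching -iCe.
correct = sum(bool(ICE_PATTERN.search(w)) for w in eye_sound)
wrong = [w for w in other_sound if ICE_PATTERN.search(w)]

print(f"rule right on {correct} of {len(eye_sound)} 'eye' words")
print(f"rule wrong on {len(wrong)} of {len(other_sound)} other words: {wrong}")
```

The pattern fires on every word in both groups, so whatever decides the pronunciation, it isn’t visible in the spelling.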

…and there’s a little attempt to help with the already-almost-perfect English spoken by so many of my French colleagues.  Got a funny story related to mispronunciation?  Tell us about it in the comments…

Matching game IV: Zipf’s Law in French

Zipf’s Law is why if someone is looking for a web page and types “dogs in marseilles” into the query box, your search engine should pay no attention to the word “in,” some attention to “dogs,” and quite a bit of attention to “marseilles.” 
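The post doesn’t say exactly which arithmetic a search engine uses. One common textbook choice is inverse document frequency, idf(t) = log(N / df(t)), where N is the number of documents and df(t) is how many of them contain the term t. Here is a minimal Python sketch over a tiny invented collection of documents (the documents and the idf helper are illustrative assumptions, not real search-engine data or code):

```python
import math

# Toy "web": a handful of invented documents, whitespace-tokenized.
documents = [
    "dogs in the park",
    "cats in the house",
    "a walk in the rain",
    "dogs in marseilles",
    "training dogs in winter",
    "news in brief",
]
tokenized = [set(doc.split()) for doc in documents]
N = len(tokenized)


def idf(term: str) -> float:
    """Inverse document frequency: log(N / df), df = documents containing term.

    Terms that never occur get 0.0 here, a convenience choice for the sketch.
    """
    df = sum(term in doc for doc in tokenized)
    return math.log(N / df) if df else 0.0


for term in "dogs in marseilles".split():
    print(f"{term:12s} idf = {idf(term):.2f}")
```

Because in shows up in every document, its weight is exactly zero; dogs gets log 2 (about 0.69) and marseilles gets log 6 (about 1.79). Frequent little words like in occur nearly everywhere, which is the Zipfian fact the weighting exploits.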

Zipf’s Law describes the frequencies of words: there is a very, very small number of words that occur very, very often, and a very, very large number of words that occur very, very rarely–but, they do occur.  This blog is focused on one of the consequences of Zipf’s Law: it means that if you are seriously studying a second language, you are going to run into words that you don’t know every day for the rest of your life.
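The shape of that distribution is easy to see numerically. This minimal Python sketch assumes an idealized Zipf’s Law in which the frequency of the word at rank r is proportional to 1/r, and a hypothetical vocabulary of 50,000 word types; real corpora only approximate this.

```python
def zipf_shares(vocab_size: int, top_k: int) -> tuple[float, float]:
    """Under an idealized Zipf's Law (frequency of rank r proportional to 1/r),
    return the share of all tokens covered by the top_k most frequent words
    and the share covered by the rarer half of the vocabulary."""
    weights = [1 / r for r in range(1, vocab_size + 1)]
    total = sum(weights)
    top = sum(weights[:top_k]) / total
    bottom_half = sum(weights[vocab_size // 2:]) / total
    return top, bottom_half


top10, rare_half = zipf_shares(vocab_size=50_000, top_k=10)
print(f"top 10 words: {top10:.1%} of running text")
print(f"rarest 25,000 words: {rare_half:.1%} of running text")
```

Under these assumptions the ten most frequent words account for roughly a quarter of all running text, while the rarer half of the vocabulary accounts for only about 6%: each of those words is rare, but together they keep turning up.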

You know how the matching game works: we have words in English, words in French, and we match them.  Today’s words (and a tiny bit of grammar) are taken from the discussion of Zipf’s Law in the book Recherche d’information: Applications, modèles et algorithmes, by Massih-Reza Amini and Éric Gaussier, second edition.  Recherche d’information is information retrieval, the task of finding documents in response to an information need: what Google does for you every day.

One of the great embarrassments of linguistics is the fact that information retrieval is mostly about language, in the sense that what you’re usually looking for is web pages with stuff written on them, and you use words to find them; and yet, most information retrieval gets done without anything that looks very much like working with language.  At its heart, the technology of information retrieval is almost entirely counting and very simple arithmetic: nothing linguistic there.  You could think of that very simple arithmetic as taking advantage of Zipf’s Law.  It is used to figure out things like the fact that if someone is looking for a web page and types dogs in marseilles into the query box, your search engine should pay no attention to the word in, some attention to dogs, and quite a bit of attention to marseilles when it is deciding which web pages to put at the top of the search results.

Scroll down to find today’s vocabulary items, and click on the pictures of the relevant pages from Amini and Gaussier’s book if you’d like to see those words in context.  As for me: a second cup of coffee, go over these flashcards, and then off to the lab.  Today’s goal: explain why researchers calculated the ratio of vocabulary size to length of conversation of a bunch of soldiers, after chasing them through the woods, catching them, depriving them of food and sleep, and then interrogating them.
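The ratio of vocabulary size to text length is usually called the type-token ratio: the number of distinct word forms (types) divided by the total number of words (tokens). The post doesn’t say how those researchers computed theirs, so this is only a minimal Python sketch of the plain ratio, assuming a crude tokenizer that treats any run of letters and apostrophes as a word:

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by total words (tokens)."""
    # Crude tokenization (an assumption): lowercase runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "the dog saw the other dog and the other dog ran"
print(f"{type_token_ratio(sample):.2f}")  # 6 types / 11 tokens
```

One caution when comparing texts this way: the ratio tends to fall as a text gets longer, because the frequent words keep repeating while new words arrive more and more slowly, so texts of very different lengths aren’t directly comparable.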


I included La fréquence du second mot (“the frequency of the second word”) because I’ve been trying to understand when to use second and when to use deuxième.  If I understand the Académie’s Dire/Ne pas dire page correctly, the Académie would prefer that this be deuxième, but not even the Académie thinks that it’s mandatory to make the distinction:

One may, for the sake of precision and elegance, reserve the use of second for statements in which only two elements are being considered, and use deuxième only when the enumeration goes beyond two.  This distinction is not mandatory.

Care should nonetheless be taken to use the adjective second, which is older than deuxième, in a certain number of set phrases and expressions where it must be preferred: seconde main (“secondhand”), seconde nature (“second nature”), etc., and in uses as a noun: le second du navire (“the ship’s second-in-command”).

As the web site puts it: that’s why we talk about la Seconde Guerre mondiale (the Second World War): because we hope that there won’t be a third one!

Sexual dimorphism in elephant rumbles


I was just getting ready for my day of calculating the ratio of unique words to total words in a bunch of journal articles about spinal cord injury and regeneration when it struck me that there really aren’t enough nice pictures of elephants in our lives.  Not mine, anyway.  Please enjoy the following picture of Chikwenya (left) and Mike (right), two African elephants from Mana Pools National Park in Zimbabwe.  The wavy lines in the middle of the bottom part of the photograph are a spectrogram of an elephant “rumble.”  See the things labelled F1 and F2 in the panels to the left and right?  Those are the first formant (F1) and second formant (F2) of Chikwenya and Mike’s rumbles.  In a human language, it’s the height and spacing of the first and second formants that identify the various and sundry vowels.  Want to know more about African elephant rumbles?  See Anton Baotic and Angela Stoeger’s recent paper on the topic:

Baotic, Anton, and Angela S. Stoeger. “Sexual dimorphism in African elephant social rumbles.” PLoS ONE 12.5 (2017): e0177411.

Want to know more about formants and vowels?  Encourage me in the Comments section.

Off I go for breakfast (see below) and a nice day of calculating the ratio of unique words to total words in a bunch of scientific journal articles about spinal cord injury and regeneration…


Breakfast in Kashiwa, Japan: grilled mackerel and a bit of French grammar.  Are the “macs” in Jean Genet’s “Miracle de la rose” “maquereaux” (“mackerel”, but also “pimps”)?  I honestly don’t know.