morphophonology – Zipf's Law

I love a good monosyllable III: Fleam

Fleam: An instrument for opening veins in bloodletting.

One of the oddities of the lexicon–that is, the set of words that you know–is that you keep learning it for pretty much your entire life. This is quite different from anything else that you know about your native language: you know almost everything that you will ever know about the phonology and syntax of your language quite early in your childhood. In contrast, learning new words can continue to your dying day.

Of course, the rate of learning new words changes over the course of the lifespan. Young toddlers may learn fewer than 5 words a month; between the ages of 2 and 6 years, they’re probably learning more like 30 words a month. When children enter school, the range of semantic classes that they are learning shifts in the direction of abstract words, away from the concrete ones that formed most of their vocabulary acquisition up to that point. If you go to college (la fac in French), you will probably see another spurt in your learning of new words; by the time you finish it, you will typically know most of the vocabulary that you’re ever going to have. Certainly not all of it, though. If you were to graph the number of words that you know over time, it would look something like this–fast growth early in life, followed by slow growth later in life, but no end to the growth. (Note that the numbers for vocabulary size are not realistic. Total vocabulary size by age 22 will be much larger than I have indicated, probably on the order of 30,000 words.)

vocabulary.size — Figure source: me. I generated it using a programming language called R. NOTA BENE: the vocabulary size figures are completely unrealistically low–the total vocabulary size should be something like 30,000 words, assuming a college-educated native speaker.

So: I’m 58 years old, and I spent a really long time in college and graduate school, but I am still learning new words in my native language. Some recent ones:

morganatic: relating to marriage between an aristocrat and a non-aristocrat, such that the issue of the marriage do not inherit ranks, titles, and the like.
aramid: a group of synthetic materials used to make textiles and plastics.
mephitic: nasty-smelling.
irredentism: political policy of claiming territories occupied by members of your ethnic group (think Hitler in the Sudetenland), or that were historically part of your political group (think what Hungary would like to do with Transylvania).

What doesn’t happen very often, though, is learning a new monosyllable in my native language. Thus, when I run across one, it tickles me. So, when a recent trip to New Orleans found me in a pharmacy museum, I was delighted to come across this exhibit:

Where the fuck does this come from? Let’s go look. Merriam-Webster does not have an entry for fleam, although it does have one for fleam tooth:

A sawtooth shaped like an isosceles triangle

The Online Etymology Dictionary gives me this:

“sharp instrument for opening veins in bloodletting,” late Old English, from Old French flieme (Modern French flamme), from Medieval Latin fletoma, from Late Latin flebotomus, from Greek phlebotomos “a lancet”

So: I never would have guessed it, but it turns out to be historically related to our word phlebotomy, and in fact precedes it in English by centuries. And thus the pathetic life of a fat, bald old man is made happy by learning a new one-syllable word…

Geeky linguist notes

I’ve given 30,000 words as the size of a typical college-educated adult’s vocabulary. Take that with a grain of salt–counting the size of someone’s vocabulary is really hard, for a lot of reasons. You can find a good discussion of them in Elisabetta Jezek’s book The lexicon: An introduction.
I calculated cumulative vocabulary size to age 22 (i.e. approximately the completion of an American college education) using the rate of growth that I gave in the post for the 2:6 age range, because that was the only age range for which I could find numbers. This results in a drastic underestimate of total vocabulary size–by age 22, it gives just a bit over 7,600 words. With slow growth after leaving college, there is no fucking way to hit 30,000 in a human lifetime.
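A back-of-the-envelope version of that calculation, sketched in Python rather than the R used for the figure. The two monthly rates are the ones from the post; the age boundaries (a slow phase up to age 2, then the 2-to-6 rate held constant all the way to age 22) are assumptions made for illustration, so the total lands in the same ballpark as the 7,600 figure rather than reproducing it exactly:

```python
# Rough cumulative-vocabulary sketch. The rates come from the post; the
# age boundaries are illustrative assumptions, not measured values.
WORDS_PER_MONTH_TODDLER = 5   # "fewer than 5 words a month", used as an upper bound
WORDS_PER_MONTH_CHILD = 30    # the post's figure for ages 2 to 6


def cumulative_vocabulary(age_years: int) -> int:
    """Words known at a given age if the age-2-to-6 rate never changed."""
    toddler_years = min(age_years, 2)
    later_years = max(age_years - 2, 0)
    return 12 * (toddler_years * WORDS_PER_MONTH_TODDLER
                 + later_years * WORDS_PER_MONTH_CHILD)


if __name__ == "__main__":
    for age in (2, 6, 22):
        print(age, cumulative_vocabulary(age))
```

Under those assumptions the total at age 22 is 7,320 words, which is nowhere near 30,000.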

What you do on Saturday night if you have no life whatsoever

That’s a whole lotta accents…

If you have no life whatsoever, what you do on Saturday night is (a) study French verb conjugations, and (b) binge-watch the excellent Netflix series Criminal: France–and not necessarily in that order, either.

I’ve recently been working on the passé simple, a French tense that’s used in some genres of writing, but only very rarely in the spoken language. I love les chapeaux chinois (circumflex accents), and one of the nice things about the passé simple is that it uses them. Specifically, they appear in the nous and vous forms: nous aimâmes/finîmes/prîmes, vous aimâtes/finîtes/prîtes.

Find a verb with a circumflex accent in the stem, and it gets really fun. So, it’s Saturday night, and I’m sitting on the back porch smoking a cigarette and doing some exercises on the French Verb Forms iPhone app (no, I am not sponsored by Netflix, French Verb Forms, or Apple–I pay for that stuff just like everyone else), when I am presented with the verb apprêter “to prepare” to conjugate: Circumflex City!

How to irritate a linguist, Part 5: English irregular past-tense verb practice

You just stand there, silent–trying not to glare balefully, either at your new interlocutor or–more likely–at the back of your departing host. This is a mistake.

You’re at a party, sipping a dark beer and minding your own business, when the host introduces you to another cheerful attendee. Biggie, meet Zipf. He’s a linguist. …and disappears.

You just stand there, silent–trying not to glare balefully, either at your new interlocutor or–more likely–at the back of your departing host. This is a mistake.

Conditional probability is the likelihood of some event given some other event. For example: the probability of the word barf being said is, in the absence of any other information, equal to the frequency of the word barf being said divided by the frequency of any word whatsoever being said. For example: I went to the Sketch Engine web site, your home for fine linguistic corpora and the tools to search them with, and searched a collection of 15.7 billion words of English scraped from the Web in 2015 and found that the word barf occurred 0.12 times per million words. In other words: in the absence of any other information about what’s being said, you can expect that you will run into the word barf once every 8 million words or so.

If dogs are being talked about, the situation changes. If you look only in the vicinity of the word dog, then the frequency of barf is 2.41 times per million words. In other words, when dogs are under discussion, you will run into the word barf every 415,000 words or so. So: the probability of the word barf is about 0.12 in a million, and the conditional probability of the word “barf” given that you have seen the word “dog” is about 2.41 in a million, which is roughly 20 times higher.
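The arithmetic here is just division, but it is easy to slip between per-million frequencies and probabilities, so here is a minimal sketch in Python. The two frequencies are the Sketch Engine figures quoted above; the function names are made up for illustration:

```python
PER_MILLION = 1_000_000


def per_million_to_probability(freq_per_million: float) -> float:
    """Convert 'occurrences per million words' into a per-word probability."""
    return freq_per_million / PER_MILLION


def words_between_occurrences(freq_per_million: float) -> float:
    """How many words, on average, go by per occurrence of the word."""
    return PER_MILLION / freq_per_million


def ratio(conditional_per_million: float, baseline_per_million: float) -> float:
    """How much more frequent the word is in the conditioned context."""
    return conditional_per_million / baseline_per_million


if __name__ == "__main__":
    barf_overall = 0.12    # per million words, whole corpus
    barf_near_dog = 2.41   # per million words, in the vicinity of "dog"
    print(round(words_between_occurrences(barf_overall)))    # 8333333
    print(round(words_between_occurrences(barf_near_dog)))   # 414938
    print(round(ratio(barf_near_dog, barf_overall), 1))      # 20.1
```

The ratio of the two frequencies comes out at about 20: seeing dog makes barf roughly twenty times more likely.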

An aside: it isn’t necessarily the case that having seen some word tells you anything about the probability of seeing another word. For example, the probability of the word barf and the probability of the word barf given that you have seen the word the are probably equal. When the probability of some event (say, seeing some word) and the probability of that event given some other event (say, having seen some other word) are equal, we say that they are conditionally independent (statisticians usually just say independent). When the probability of some event is not the same as the probability of that event given some other event, we say that they are conditionally dependent.

So, you’re at a party, sipping a dark beer and minding your own business, when the host introduces you to another cheerful attendee. Biggie, meet Zipf. He’s a linguist. …and disappears. You just stand there, silent–trying not to glare balefully, either at your new interlocutor or–more likely–at the back of your departing host. This is a mistake, for the following reason: when you just stand there silently, you let the other person establish the grounds of the conversation. (Note that I’m assuming a party in the United States, where we find silence uncomfortable, and thus there will, indeed, be a conversation.)

The probability of someone saying Wow–irregular verbs, huh? Aren’t they weird? …is equal to the frequency of Wow–irregular verbs, huh? Aren’t they weird? …being said, divided by the frequency of anything whatsoever being said. In other words: vanishingly small. However, the probability of someone saying Wow–irregular verbs, huh? Aren’t they weird? …given that you have just been introduced to someone as a linguist is not vanishingly small–it is much, much larger than vanishingly small.

Just to be sure that we’re all paying attention here:

Suppose that the probability of the word barf is not equal to the probability of the word barf given that the word dog has been said. These two words are:

conditionally dependent

conditionally independent

Suppose that the probability of the word barf is equal to the probability of the word barf given that the word the has been said. These two words are:

conditionally dependent

conditionally independent

The probability of someone saying Wow–irregular verbs, huh? Aren’t they weird? is much higher if they have just been told that you are a linguist than the probability of someone saying Wow–irregular verbs, huh? Aren’t they weird? …given no additional information. Those two events are

conditionally dependent

conditionally independent

Answers: (1) conditionally dependent, (2) conditionally independent, (3) conditionally dependent.

What’s so irritating about this? The answers to that question are probably as numerous as the linguists in the world (which is to say: not an enormous number, but not zero, either), but here are my top explanations:

Your question is looking for a specific answer–yes, they sure are weird–but I do not, in fact, think that English irregular past-tense verbs are weird, so I feel pressured into lying, so fuck you.
Talking about what’s interesting about English irregular past-tense verbs (I said interesting, not weird) would require me finding a napkin and a pen with which to draw on it, and no one seems to carry pens anymore, so I would have to wander around the party like a bumbling idiot, breaking up innumerable conversations while I looked for one, plus I have facial hair, so I really need my napkin.
I suspect that if I engage with you on this question, then I’m going to find myself in an annoying conversation with you about linguistic complexity, and that would really ruin my evening, which, given that I was just sipping a dark beer and minding my own business, seems pretty unfair.

English is the language of my profession, and I know an enormous number of non-native speakers who can read and write it close to perfectly. But: drink a couple of beers at the Association for Computational Linguistics convention meet-and-greet, get into an animated conversation about the inability of Big Data to demonstrate causality, and anyone will start to trip over irregular forms. If you’re speaking English, that’s probably mostly going to involve irregular past-tense verbs. But: practice makes perfect–OK, better–so: let’s practice!

Today we’ll look at irregular past-tense verbs that follow a specific pattern. In this pattern, a verb with the vowel [i] (International Phonetic Alphabet) in the present tense has the vowel [ε] in the past tense. Examples:

feed/fed
lead/led
meet/met
read/read

Notice that I’m grouping these verbs by pronunciation, not by spelling–our goal here is to help you develop spoken habits. (Mécanisation–thanks, Phil d’Ange!) The astute reader (OK, a linguist) might also have noticed that those verbs all end with one of the two English “alveolar oral stop consonants:” that is, with a t or a d. Other verbs that have the [i]-in-the-present-[ε]-in-the-past pattern may add a t or a d:

feel/felt
creep/crept
keep/kept
kneel/knelt
leave/left
mean/meant
leap/leapt (leaped is also possible)
cleave/cleft (cleaved and clove are also possible)
flee/fled
sleep/slept
sweep/swept
weep/wept
deal/dealt
dream/dreamt (dreamed is also possible)
plead/pled (pleaded is also possible, and I think common these days, at least in the US)

OK: practice time! Here are some sentences that include past-tense verbs of the [i]-in-the-present-[ε]-in-the-past pattern. Read them out loud, replacing the present-tense verb in parentheses with the past-tense form. (In some of these examples, it’s actually a past participle that happens to have the same form as the past tense.)

All examples are from the New York Times story Trump and Putin have met five times. What was said is a mystery, by Peter Baker, published January 15th, 2019. I have edited some of them for clarity, e.g. by replacing they with Trump and Putin.

The first time that Trump and Putin (meet) was in Germany.

Each of the five times President Trump has (meet) with Mr. Putin since taking office, he has fueled suspicions about their relationship.

The unusually secretive way he has handled these meetings has (leave) many in his own administration guessing what happened and piqued the interest of investigators.

At the height of the campaign, his son, son-in-law and campaign chairman had (meet) at Trump Tower with Russians on the promise of obtaining dirt on Mrs. Clinton from the Russian government.

Their most famous meeting came on July 16, 2018, in Helsinki, where they talked for more than two hours accompanied only by interpreters. The Kremlin later reported that the leaders reached important agreements, but American government officials were (leave) in the dark. American intelligence agencies were (leave) to glean details about the meeting from surveillance of Russians who talked about it afterward.
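If you want to check your answers, the pattern is small enough to write down as a lookup table. This is only a sketch in Python: the verb pairs are the ones listed in this post, and fill_in is a hypothetical helper that swaps each parenthesized present-tense verb for its past form and leaves anything it doesn’t recognize alone:

```python
import re

# Present-tense form -> past-tense form, as listed in the post.
PAST_FORMS = {
    "feed": "fed", "lead": "led", "meet": "met", "read": "read",
    "feel": "felt", "creep": "crept", "keep": "kept", "kneel": "knelt",
    "leave": "left", "mean": "meant", "leap": "leapt", "cleave": "cleft",
    "flee": "fled", "sleep": "slept", "sweep": "swept", "weep": "wept",
    "deal": "dealt", "dream": "dreamt", "plead": "pled",
}


def fill_in(sentence: str) -> str:
    """Replace each '(verb)' with its past form; unknown verbs stay as-is."""
    return re.sub(
        r"\((\w+)\)",
        lambda match: PAST_FORMS.get(match.group(1), match.group(0)),
        sentence,
    )


if __name__ == "__main__":
    print(fill_in("The first time that Trump and Putin (meet) was in Germany."))
```

Note that the table maps spelling to spelling, so it tells you nothing about read/read, where only the pronunciation changes.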

The picture at the top of this post is an MRI of the vowel [i] being pronounced. Source: I don’t remember, but if you care, I’ll look it up. Enjoying the How to irritate a linguist series? Here are the previous episodes.

How to irritate a linguist, Part I: asking The Question That We Wish You Would Not Ask
How to irritate a linguist, Part 2: claiming that there are languages in which some ideas cannot be expressed
How to irritate a linguist, Part 3: going on and on about how, specifically, Language X is special when you know nothing about other languages and therefore don’t realize that those traits are totally common
How to irritate a linguist, Part 4: asking us which languages are the hardest/easiest to learn

Ducklings and goslings and inklings, oh my

That moment when the elves take your baby and leave one of theirs in its place.

I dragged myself out of bed at 8:30 AM today. Under normal circumstances, if I’m still in bed at 5:45 AM, it means that I had a rough night–I am most definitely both a morning person, and an early riser. Seulement voilà (“the thing is”):

At this time of year, it doesn’t get light outside in Paris until about 8:30 in the morning.
At 2 AM I got obsessed with the need to learn all of the words for baby animals in French.

Morphemes are the things that words are made of. For example, the plural cats has two morphemes: cat, and the s that carries the meaning of plurality. (This happens to be the example from which my child learned what a morpheme is–as a young child, and as we did the dishes together. Must suck to be a linguist’s kid…)

English has an odd little morpheme, -ling, that refers to things that are small. Like the s of cats, it is what is called a bound morpheme, meaning that it cannot be a word on its own–it has to be attached to something else. (Contrast that with the cat in catnap (a short, light nap), catnip (a plant–it’s basically pot for cats), and cathouse (a brothel–archaic)). Here are a couple of examples:

duckling: a baby duck.
inkling: a small hint, or a small piece of knowledge. (I’ll give some examples of its use later.)

Changeling-fée-irlande-légende-mythologie-3 — Source: http://www.vivre-en-irlande.fr/culture-irlandaise/changeling-fee-legende. See the site for helpful information about how to recognize a foundling, return a foundling, etc.

The -ling morpheme is also not productive: that means that you can’t really use it freely to make “new words.” For example, it’s not clear that anyone would know what you meant if you casually threw the words waterling (parallel to inkling) or penling (parallel to duckling) into a conversation. (Contrast that with -gate, which over the course of my lifetime has become applicable to practically anything, with the meaning of “a scandal related to:” Bridgegate, Pizzagate, etc.) Because it’s not productive, one could list all of the words in English in which it occurs. Limited only by my memory, of course. My best shot at doing so:

duckling: baby duck
gosling: baby goose
foundling: a child who has been found after having been abandoned
changeling: when the elves take away your baby and leave one of their own in its place
inkling: a small hint, idea, trace, piece of knowledge, clue

In the Foundling Hospital grounds, London, c1901 (1901) — The London Foundling Hospital in 1901, from an article about a 1911 foundling lottery in Paris at http://time.com/4433717/paris-baby-raffle-history/.

Now, I know what you’re thinking: Zipf, you’re a drooling idiot. There are lots of words in English that end with -ling: for example, DROOLING. Feeling, wheeling and dealing (French: mic-mac or micmac), healing…

Well… I may be an idiot, but I’m not a drooling one. Here’s the thing: a morpheme is defined by its sound (or spelling)–in our case, ling–and by its meaning. Drooling and gosling (baby goose) contain the same sounds/letters, but not the same meaning of smallness, so it’s not the case that they share the same morpheme. -ling is a pretty textbook (French: typique) example of a non-productive morpheme.
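The difference between “ends in the letters ling” and “contains the morpheme -ling” is easy to see if you try to find the morpheme by string matching. The following is a toy sketch in Python: the word list is the one from this post, while the list of verb stems and the analyze function are assumptions made up for the example, not a real morphological analyzer:

```python
# Words from the post that contain the diminutive morpheme -ling.
DIMINUTIVE_LING = {"duckling", "gosling", "foundling", "changeling", "inkling"}

# Toy list of verb stems that happen to end in "l" (an assumption for the demo).
VERB_STEMS = {"drool", "feel", "wheel", "deal", "heal"}


def ends_in_ling(word: str) -> bool:
    """Naive test: looks only at the letters, not at the meaning."""
    return word.endswith("ling")


def analyze(word: str) -> str:
    """Split the word the way a linguist would, using the two toy lists."""
    if word in DIMINUTIVE_LING:
        return "stem + -ling"
    if word.endswith("ing") and word[:-3] in VERB_STEMS:
        return "verb + -ing"
    return "unknown"


if __name__ == "__main__":
    for word in ("gosling", "drooling", "inkling", "healing"):
        print(word, ends_in_ling(word), analyze(word))
```

Every word in the demo passes the naive test, but only the ones in the first list get the stem + -ling analysis; drooling and healing come out as a verb ending in l plus -ing.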

So, yeah: I don’t sleep much, and I’m trying to learn to speak French, so at 2 AM I got obsessed with learning the names of baby animals in French. This web page got me started, and then I started searching WordReference.com for weird English-language baby animal names (say, gosling), and here you see the results. (Yes, some occur more than once.) At 2 AM, I only knew chiot (puppy), chaton (kitten), and veau (calf)–how about you? And, native speakers (Phil d’Ange, I’m lookin’ at you)–can you add some more?

Adult animals:

Juvenile animals:

English-language example sentences

Foundling:

The Steel Riders Saga is a sci-fi/fantasy novel about Free Wheeler, a foundling discovered by the legendary Steve Thompson during a deep terrain ATV ride. Thompson leads an ATV pack known as the “Steel Riders.” In their fantastical journeys Free Wheeler finds true love and home. (Twitter, @quantum_tide)
Meanwhile, in Australia, there’s a National #GravyDay. I have never heard of anything so glorious! (Nobody in my family cares about gravy as much as I do. I… might be a foundling?) (Twitter, @VG28263355)
@Decervelage Can I just say…Baby Faced Finster. A foundling!! You Naughty Baby!! Hahaha! (Twitter, @TheSuperAmanda)

Inkling:

I’ve mentioned this numerous times on the podcast but… I have an inkling that Nintendo will use Smash DLC to promote upcoming (inc third-party) Switch releases. (Twitter, @pixelpar)
My new resolution is to not read the thread of comments of tweets where I know or have an inkling that it’s not going to be a good thing. (Twitter, @valparkie)
You are a gem of a friend and you don’t have an inkling of how much i appreciate your ignorance of my vices. (Twitter, @Shakti_Shetty)
I don’t have an inkling of what the future holds but I’m excited (Twitter, @JaredTench)
Roommate, Camden *going to Waffle House in Dunn*: “If I get the smallest inkling of a crack-whore, I’m leaving!” (Twitter, @dr_pattyguin)

What computational linguists actually do all day: The lexical frequency version

In practice, we spend most of our time trying to figure out where we went wrong in writing some computer program or another.

Tell someone that you’re a computational linguist, and the next thing out of their mouth is likely to be either:

How many languages do you speak?, or…
What’s that?

In theory, computational linguists spend their time thinking about fun questions like:

Is natural language Turing-complete?
The relationship, if any, between what we know about words (say, the word dog can be a noun or a verb, and it occurs more often with the words bark and leash than with the word meow) and what we know about the world (say, a dog is a canine, and might like to chase balls, and will eat cat shit if not instructed otherwise).
How Zipf’s Law, which describes the fact that a small number of words are extremely common, while a large number of words are extremely rare, but do occur, might or might not be related to the mathematical phenomenon of the fractal.

In practice, we spend most of our time trying to figure out where we went wrong in writing some computer program or another. (OK: that, and writing grant proposals.) Think that being a computational linguist sounds glamorous? Here’s how I spent my morning.

All I gotta do: go through a bunch of documents and count how often each word in that bunch of documents occurs. Easy-peasy–barely hard enough for a homework in Computational Linguistics 101.
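For reference, the task itself can be sketched in a few lines. This is Python, not the script from this post (whose surviving fragments look like Perl), and the tokenizer is a deliberately crude assumption: lowercase everything and treat any run of letters and apostrophes as a word.

```python
import re
from collections import Counter
from pathlib import Path

# Crude tokenizer (an assumption for this sketch): runs of letters/apostrophes.
TOKEN = re.compile(r"[a-z']+")


def count_words(text: str) -> Counter:
    """Count how often each word occurs in one string."""
    return Counter(TOKEN.findall(text.lower()))


def count_words_in_directory(directory: str) -> Counter:
    """Add up the word counts of every regular file in a directory."""
    totals = Counter()
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            totals.update(count_words(path.read_text(encoding="utf-8", errors="replace")))
    return totals


if __name__ == "__main__":
    sample = "The dog saw the cat. The cat saw the dog's leash."
    for word, frequency in count_words(sample).most_common(3):
        print(word, frequency)
```

Even this small version has to make the same decisions that cause trouble in what follows: which directory to read, which entries in it are actually files, and how to get from a file name to something that can be opened.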

Seulement voilà…

Easy enough to fix–I just failed to give the complete name of the program, and…. marde.

OK, easy enough to fix–I had written

…when I shoulda written

Shoulda: the typical spoken form of should have.

(Note the square bracket near the end of the middle line–I had left it out.) Great–avançons, alors. But, no, fuckashitpiss:

Easy enough to fix–turns out I wrote this:

…when I shoulda written this:

(Note the dollar sign before the rightmost instance of words now.) And so, on we go, but…

…and it’s easy enough to fix–I had written this:

…when I shoulda written this:

(Note the double quote before $frequencies{$words[$i]}\n”;) …and now I’m wondering:

These errors were all on one single line–what other horrors have I hidden in this code, and will they be as easy to find as those were?
What the hell was I thinking when I wrote that line? Was I thinking about the upcoming dissertation defense at 2 PM? Was I thinking about Trump giving my country to China? Was I thinking about tomorrow’s colonoscopy? Who the hell knows, really–whatever it was, it apparently wasn’t this line of code…

Mais retournons… Ah marde, but at least this one will be easy to fix…

…except that I verify the existence of the directory, and then get this:

…which is the exact same error that I got before. So, I go back and look at my code, where I see this, and remember that my error message is supposed to print out the name of the directory that it couldn’t open, but it did no such thing:

…which is ’cause I never gave the program the name of the input directory. So I take care of that, and also tell my program to print out the name of the directory that it couldn’t open if, in fact, it can’t open a directory–as we saw above, I had planned to do this, but of course left out that little detail:

…and now I experience a tiny little bit of success, because my program does not crash. Seulement voilà, it doesn’t actually produce any output:

Note the lack of a bunch of lexical frequencies… So, I go back to my script, and I start looking around in the region of the program where I meant for the output to happen. I don’t see anything obvious in that area, so: I go further up in the code, and start doing what I need to do to convince myself that the earlier parts of the program are working the way that I intended them to. This means printing out the results at intermediate steps of the processing. The resulting code (leaving out a bunch of details) looks like this:

…which does nothing different than it was doing before, so I know that I need to go even further up in the program and, again, print stuff out as I go, resulting in this:

…which, when I run the script, produces this:

…which suggests to me that the directory exists, and that I’m opening it correctly, but that I am either (a) reading its contents incorrectly, or (b) making a mistake when I make a decision about whether or not to open each file. A quick Google search finds the problem for me–I had written this:

…when I shoulda written this:

(Note that the text at the left end of the line was open, and now is opendir.)

Progress! Now I get some output, but note the last line–I’m just getting a bunch of file names, and no word frequencies. I can see the problem right away, though–I have the directory name right, and I have the file name right, but I need to combine them in order to be able to open the file. Doing so gives me this code:

…which results in my script running successfully for a while, but then crashing, and I know exactly what causes said crash…

…and I know that it’s a bear to fix, and I’ve been working on this fucking task that’s barely difficult enough to make a good homework assignment for two hours, and now it’s time to go to the aforementioned dissertation defense, and… Soupire…
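The directory-plus-file-name fix described above, the one that finally produced some word frequencies, is not specific to any one language: a directory listing hands back bare file names, and they have to be joined to the directory before they can be opened. Here is a hedged sketch of the same pattern in Python (the script in this post appears to be Perl, judging from fragments like $frequencies{$words[$i]}); read_files is a hypothetical helper, not code from the post:

```python
import os
import tempfile


def read_files(directory):
    """Yield (file name, contents) for each regular file in the directory."""
    for name in sorted(os.listdir(directory)):   # bare names, no directory part
        path = os.path.join(directory, name)     # directory + name = openable path
        if os.path.isfile(path):                 # skip subdirectories
            with open(path, encoding="utf-8") as handle:
                yield name, handle.read()


if __name__ == "__main__":
    # Build a throwaway one-file corpus so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as corpus:
        with open(os.path.join(corpus, "a.txt"), "w", encoding="utf-8") as out:
            out.write("dog barf dog")
        print(list(read_files(corpus)))
```

Opening name instead of path only works if the program happens to be running inside the corpus directory, which is exactly the kind of bug that shows up as a list of file names and no counts.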

Meme source: https://imgur.com/gallery/fzbkRI8

Sometimes my mouth just stops moving

The hard part is not studying more than one language–the hard part is keeping them separate.

One of the more interesting books that I’ve read over the course of the past couple years was Michael Erard’s Babel no more: The search for the world’s most extraordinary language learners. It is a book about polyglots and polyglossia–people who speak a lot of languages (as opposed to linguists, who are people who study language in general).

Erard is an actual linguist, and knows what he’s talking about. One of the points that he makes that I found interesting is that there’s no single recipe for learning a “second language”–in his travels amongst the polyglots, he found that people who are into this kind of thing figure out what works for them, and it’s not necessarily the same approach for everyone.

So: I’m going to show you how I prepare for my annual trip to Guatemala, where I volunteer with a wonderful group called Surgicorps. (We provide free specialty surgeries for people for whom the almost-free national health care system is still too expensive.) But, don’t feel like it’s a magic recipe (am I mixing metaphors here?) for success–just know that it has been working for me for the past few years, and there’s something that will work for you. (Which might be this!)

For context: Spanish is a “second language” for me–one that I can function in for my daily life, and professionally. But: because I spend at least half of my life in the French language and only speak Spanish when I go to Guatemala, it’s very difficult for me to not mix French into my Spanish incessantly. (As I believe Erard also points out: the difficulty is not learning a bunch of languages–the difficulty is keeping them apart.) Consequently, on July 1st of every year since I started spending As Much Time As Possible in France, I cut French out of my life completely. En contrepartie, on July 1st I start doing the same kinds of things in Spanish that I would normally do in French–listening to the news on the way to work, learning my daily vocabulary words, reading The Walking Dead comics, etc.

I also put together a schedule of everything that I need to work on between July 1st and July 30th. If you’re unfortunate enough to have been reading my blog for the past couple years, you saw me do this for the month before I took my French C1 test. The main difference is that for the CEFR exams, I need to include “written production” in the things that I work on–for my volunteer work in Guatemala, I don’t need that, because I almost never need to write anything in Spanish. So, for Guatemala preparation, I have four main categories of things to focus on:

Vocabulary: technical (medicosurgical)
Vocabulary: general
Grammar
Oral production

Why do I have an entire “section” for general vocabulary? Because as I’ve written about before, that’s the biggest challenge. Medical vocabulary is finite–there are only so many body parts, surgical procedures, etc. It’s the general vocabulary that gets you–remember that Zipf’s Law reflects the fact that languages are full of words that almost never occur, but, they do. When the guy comes to the hand surgeon with two mangled fingers hanging there uselessly, the first question that the surgeon asks him is going to be what happened, and the answer to that could be anything.

A snake bit me
I got a cactus spine stuck in my palm
The fuel pump caught fire and exploded while I was in the passenger seat
Two guys tried to steal my car and they went after me with a machete

…all of which I have run into.

So, I expand out my vocabulary study into these categories:

Vocabulary: technical (medicosurgical)
- Areas of the hospital
- Surgical techniques and equipment
- anesthesia
- anatomy
  - the hand (because I mostly work with a hand surgeon)
  - gynecology (because I don’t interpret for the gynecologists very often, and therefore like to make sure that I give the terminology a once-over since I don’t have occasion to use it much)
  - the face and head (because we always have multiple plastic surgeons with us)
Vocabulary: general
- the Guatemalan regional dialect (lots of fun loan words, mostly from one or another of the 20+ Mayan languages spoken in the country)
- professions (see this post for why that gets a day of its own)
- farm work and other kinds of manual labor (because most of our patient population consists of children or manual laborers–see this post)
- animals and plants (see above about “anything can happen to your hands”)

I split grammar into three topics:

Conjugation (because when in doubt, I’ll conjugate Spanish verbs as if they were French, and that does NOT work)
Usted forms of verbs (they get a day of their own because it’s the form that I should be using with patients and their family members, but I almost never use it in my daily life)
The subjunctive (much easier in Spanish than in French because it gets used far more often in Spanish, so you don’t have to think about it as much–my French problem is that I use the subjunctive too often)

Now, I know you’re wondering: why do I have oral production on my list, and why don’t I have oral comprehension? Oral comprehension is the hardest part of learning any language for most people, and oral production is what most anglophones find the easiest part of learning Spanish. The answer goes back to Michael Erard: the hard part is not learning more than one language–the hard part is keeping them separate.

This comes into play for me in two ways. One way will be familiar to anyone who has two foreign languages running around in their heads: when you don’t have a word that you need in one language, it’s hard not to substitute it with the word from the other.

The other way that French interference in Spanish works out for me is more subtle, and it’s purely a question of oral production: it’s very difficult for me to say sequences of sounds in Spanish that would not be possible in French.

A problem context that comes up quite often is possessive pronouns followed by vowel-initial nouns. For example (English followed by formal/informal French and then formal/informal Spanish):

English	French (formal)	French (informal)	Spanish (formal)	Spanish (informal)
your eye	votre œil	ton œil	su ojo	tu ojo
your artery	votre artère	ton artère	su arteria	tu arteria

Francophones will note that artère is feminine, but it has the masculine form of the possessive pronoun–ton. No huge surprise to students of French–any vowel-initial noun takes the masculine, consonant-final, form of words like possessive pronouns. Where the problem comes up: when I have to say one of those words before a vowel-initial noun in Spanish, my tongue stops. It’s like it runs into a wall–my mouth just stops moving. What the fuck??

From a linguist’s point of view: I’ve developed my own little foreign-language phonology. In languages other than my native one (American English), that little phonology really does not like sequences of vowels at the end of one word and the beginning of the next. So, I need to say tu abuelita, your grandma, but my phonology really, really wants it to be tun abuelita, or something of that ilk, which does not exist in Spanish… and my vocal apparatus just comes to a halt.

Solution: oral production drills. Focussed drills, not just making myself speak–that will happen in Guatemala, where I’ll show up a week before the rest of the team to get those Spanish-language juices flowing. I’ll put together exercises for myself that focus on the specific things that I know I have trouble getting out of my mouth, et voilà. For example: ¿le duele todavía su axila? (Does your armpit still hurt?) Ya hablamos con su abuela (we already spoke with your grandmother). Both of those are short sentences that force me into saying the vowel + vowel sequences–in these cases, su axila (your armpit) and su abuela (your grandmother) that are so hard for me.

So, you take all of those individual things to work on, mix ’em up to give yourself a little variety in your daily study. Prioritize things in a way that makes sense for what you plan to be doing with the language–I have a day in there for learning the vocabulary of food and beverages, but that’s more so that I can translate the menu for my fellow volunteers than for the actual volunteer work, so it wouldn’t make sense to be working on that first, and I don’t. Mix in some review days–review is essential, and you don’t want to do it all at the end. Boum, as the French kids say–a month’s-worth of work. I’ll start it on July 1st, and I’ll finish it sitting in the plane on the way to Guatemala on the 30th. If I screw up and miss a day? Not the end of the world–I’ll make it up. If I just can’t stand anesthesia vocabulary on July 11th? No problem–I’ll just switch a couple days around. Is the list intimidating? No–the opposite. I know that if I prepare, everything will probably go fine, and I know that if I work my list, I’ll be prepared–so, it’s actually reassuring, not intimidating.

Why no days for working on oral comprehension? Because that’s what listening to the news on the way to work, podcasts while I stretch, etc., are for. That really has to be part of your daily life–you can’t partition that off into specific days. Gotta work, work, work your oral comprehension. On the good side: not one second of the time that you spend doing it will be wasted.

English notes

a couple versus a couple of: this is controversial amongst English speakers. People who prefer a couple of are likely to complain about those of us who say a couple. Je les emmerde. How I used it in the post: If I just can’t stand anesthesia vocabulary on July 11th? No problem–I’ll just switch a couple days around.

ilk: maybe acabit in French? How I used it in the post: My phonology really, really wants it to be tun abuelita, or something of that ilk, which does not exist in Spanish… I think in French something of that ilk would be quelque chose du même acabit, or words to that effect. Phil d’Ange?

The picture at the top of this post is from lolphonology.tumblr.com. I picked it because in the post I carped about sequences of sounds, and the meme is about sequences of sounds (one in particular–the sound of the ch in English chat, but more on that another time, perhaps). You don’t get it? No worries–that just means that you’re cool, not nerdy like some stupid linguist.

Giving back: Pronouncing English words that end with -ive

Paradoxically, the better your skill in a second language, the more your mistakes stick out.

I work with a couple of French folks whose English is so good that they are effectively native speakers, as far as I can tell. It’s super-impressive—if my French were ever anywhere near as good as their English…

It’s their very skills themselves that make it obvious when they make a pronunciation error–it’s as if I were making a pronunciation error. It is not at all the case that I don’t make pronunciation errors in my native language, and people most definitely do notice them–but, I suspect that they’re all the more obvious precisely because (a) I’m a native speaker, and (b) I’m an “educated native speaker” (sounds hoity-toity, but it’s a technical term in linguistics). I would guess that many of my “smaller” mistakes in French go unnoticed because they get lost in the thick fog of all of my other mistakes–in my native language, though, they all stand out.

hoity-toity: pretentious.

So, when my French-speaking-colleagues-who-are-essentially-native-speakers-of-English-too make pronunciation errors in English, it is, indeed, noticeable. Happily, their English-language pronunciation errors often fall into a single category, and that’s what we’re going to go after today–my little attempt to repay more hours than I even want to think of that they’ve spent hammering on my pronunciation/lexicon/syntax/politeness/EVERYTHING in French.

You may have noticed that written vowels in English are pronounced differently than those vowels would be pronounced in essentially every other written language on the planet. (That’s just a fraction of all languages, by the way–the vast majority of languages have no writing system.)

The reason behind all of this English-versus-the-world divergence in vowel sound pronunciation is something called the Great Vowel Shift. It changed the pronunciation of many vowel sounds, and it happened after English spelling was mostly established. The result was that English vowel sounds didn’t line up with their spelling as well as they used to.

greatVowelShift-time — The Great Vowel Shift, with approximate dates–and yes, with some training in phonetics, it does make perfect sense. Picture source: http://sites.millersville.edu/bduncan/221/history/4.html

One of the changes in pronunciation affected words that happen to be spelled with an e at the end. It’s a silent e now, but it wasn’t always. The preceding vowel sound changed–in a very systematic way that requires knowing a bit about what you do with your mouth to make sense of–and one of the consequences was that if that preceding vowel was i, it went from being pronounced like i in most languages to being pronounced like the word eye is pronounced today.

So, today, if you’re an Anglophone kid, you grow up being taught that when a word ends in -iCe, where C means any consonant, the i indicates the sound of the word eye. There are plenty of examples of this:

five
drive
dive
thrive
alive
hive
archive
strive

But–and this is a big “but” (which is why I italicized and underlined it)–iCe (i followed by a consonant followed by an e at the end of the word) is not always pronounced that way. There are plenty of times when it is not, and those tend to be longer words that educated people would use, and my French co-workers are super-educated, so they use these words. For some of the native speakers of French that I know, mis-pronouncing these words is essentially the only mistake that I ever hear them make in English. So: let’s work through some of these.

You’ll notice something about the words that are pronounced the way that Anglophone kids are told you always pronounce -iCe: they tend to be single-syllable. Consider:

five
drive
dive
thrive
live (the adjective only, as in live bait)
alive
hive

But, not all single-syllable words of this type are pronounced that way. Here’s the one counter-example that I can think of:

give

And, not all of the words in which -iCe is pronounced like “eye” are single-syllable words. The counter-examples that I can think of:

archive
derive
arrive
survive
revive
deprive

I know what you’re thinking now: Zipf, this is simple–regardless of the number of syllables, the i is pronounced as in five if it’s in a STRESSED syllable. And, yes, that almost works–but, consider archive, which is stressed on the first syllable, but is still pronounced like five.

…and live is weird–when it’s a verb, it’s pronounced like give, but when it’s an adjective, it’s pronounced like five.

OK, we’re more or less good with the words that end in iCe and get pronounced like five. What about the words that don’t get pronounced like five? Let’s take a look at some. Now, I’m not going to select these randomly. I went to this web page on the Morewords.com web site. What it gave me is a list of words that end in -ive, sorted by how frequent they are. Here’s what the output looks like. You’ll notice that every word is followed by two numbers. The first one is the length of the word in letters, while the second one is how many times the word occurs in every million words of text. (What collection of texts did they do their counts in? They don’t say.) So, give is 4 letters long and occurs 1735 times per million words, executive is 9 letters long and occurs 171 times per million words, and so on.

Screen Shot 2018-01-26 at 16.40.33 — Source: MoreWords.com

With that list in my greedy little fingers, I’ll go through it and pull out some of the ones that are not pronounced like five. That gives us this:

receive
executive
alternative
objective
representative
conservative
effective
initiative
positive
relative
olive
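The stressed-syllable rule of thumb from earlier in the post can be checked against a few of the words listed here. This is only a sketch: the stress marks are my own hand annotations (following ordinary dictionary stress), the labels "five" and "give" just mean "i as in five" and "i as in give," and the word sample is small.

```python
# (word, is the stress on the final syllable?, how the i is actually pronounced)
# Stress values are hand-annotated assumptions, not taken from any word list.
WORDS = [
    ("five", True, "five"),
    ("alive", True, "five"),
    ("arrive", True, "five"),
    ("survive", True, "five"),
    ("archive", False, "five"),
    ("give", True, "give"),
    ("live (verb)", True, "give"),
    ("executive", False, "give"),
    ("positive", False, "give"),
    ("olive", False, "give"),
]


def predict(final_syllable_stressed):
    """Rule of thumb: the i sounds like 'five' only when its syllable is stressed."""
    return "five" if final_syllable_stressed else "give"


# Words where the rule of thumb gives the wrong answer.
exceptions = [word for word, stressed, actual in WORDS if predict(stressed) != actual]
print(exceptions)
```

On this sample the rule misses exactly the words the post flags as odd: archive, give, and the verb live. Everything else, including the long -ive words, comes out right.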

…and there’s a little attempt to help with the already-almost-perfect English spoken by so many of my French colleagues. Got a funny story related to mispronunciation? Tell us about it in the comments…

Matching game IV: Zipf’s Law in French

Zipf’s Law is why if someone is looking for a web page and types “dogs in marseilles” into the query box, your search engine should pay no attention to the word “in,” some attention to “dogs,” and quite a bit of attention to “marseilles.”

Zipf’s Law describes the frequencies of words: there is a very, very small number of words that occur very, very often, and a very, very large number of words that occur very, very rarely–but, they do occur. This blog is focused on one of the consequences of Zipf’s Law: it means that if you are seriously studying a second language, you are going to run into words that you don’t know every day for the rest of your life.

You know how the matching game works: we have words in English, words in French, and we match them. Today’s words (and a tiny bit of grammar) are taken from the discussion of Zipf’s Law in the book Recherche d’information: Applications, modèles et algorithmes, by Massih-Reza Amini and Éric Gaussier, second edition. Recherche d’information is information retrieval, the task of finding documents in response to an information need: what Google does for you every day. One of the great embarrassments of linguistics is the fact that information retrieval is mostly about language, in the sense that mostly what you’re looking for is web pages with stuff written for them and you use words to find them–and yet, most of the work of information retrieval is done without actually doing anything that looks very much like doing anything with language. At its heart, the technology of information retrieval is almost entirely done with counting and very simple arithmetic–nothing linguistic there. You could think of that very simple arithmetic as taking advantage of Zipf’s Law–the very simple arithmetic is used to figure out things like the fact that if someone is looking for a web page and types dogs in marseilles into the query box, your search engine should pay no attention to the word in, some attention to dogs, and quite a bit of attention to marseilles when it is making the decision about which web pages to put at the top of the search results. Scroll down to find today’s vocabulary items, and click on the pictures of the relevant pages from Amini and Gaussier’s book if you’d like to see those words in context. As for me: a second cup of coffee, go over these flashcards, and then off to the lab. Today’s goal: explain why researchers calculated the ratio of vocabulary size to length of conversation of a bunch of soldiers–after chasing them through the woods, catching them, depriving them of food and sleep, and then interrogating them.
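To make the "counting and very simple arithmetic" concrete, here is a minimal sketch of inverse document frequency (IDF), one common way of weighting query words. I'm not claiming it is the exact formula in Amini and Gaussier's book, and the five "web pages" below are invented purely for illustration.

```python
import math

# Toy collection of "web pages", made up for this example.
documents = [
    "dogs in the park in paris",
    "cats in the garden",
    "a week in marseilles",
    "dogs in cafes",
    "life in lyon",
]


def idf(term, docs):
    """Inverse document frequency: log(number of docs / docs containing the term)."""
    containing = sum(1 for doc in docs if term in doc.split())
    if containing == 0:
        return 0.0  # assumption: a term seen nowhere simply gets no weight here
    return math.log(len(docs) / containing)


# Weight each word of the query by how rare it is across the collection.
weights = {term: idf(term, documents) for term in "dogs in marseilles".split()}
for term, weight in weights.items():
    print(f"{term}: {weight:.2f}")
```

Because "in" shows up in every page, its weight is zero; "dogs" shows up in two of five, so it gets some weight; "marseilles" shows up in only one, so it gets the most. No grammar, no meaning, just counts.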

I included La fréquence du second mot because I’ve been trying to understand when to use second and when to use deuxième. If I understand the Académie’s Dire/Ne pas dire page correctly, the Academy would prefer that this be deuxième, but not even the Académie thinks that it’s mandatory to make the distinction:

On peut, par souci de précision et d’élégance, réserver l’emploi de second aux énoncés où l’on ne considère que deux éléments, et n’employer deuxième que lorsque l’énumération va au-delà de deux. Cette distinction n’est pas obligatoire.

On veillera toutefois à employer l’adjectif second, plus ancien que deuxième, dans un certain nombre de locutions et d’expressions où il doit être préféré : seconde main, seconde nature, etc., et dans des emplois substantivés : le second du navire.

academie-francaise.fr/second-deuxieme

As the CarriereOnline.com web site puts it: C’est pour cela qu’on parle de la Seconde Guerre mondiale parce qu’on espère qu’il n’y en aura pas de troisième !

Sexual dimorphism in elephant rumbles

I was just getting ready for my day of calculating the ratio of unique words to total words in a bunch of journal articles about spinal cord injury and regeneration when it struck me that there really aren’t enough nice pictures of elephants in our lives. Not mine, anyway. Please enjoy the following picture of Chikwenya (left) and Mike (right), two African elephants from Mana Pools National Park in Zimbabwe. The wavy lines in the middle of the bottom part of the photograph are a spectrogram of an elephant “rumble.” See the things labelled F1 and F2 in the panels to the left and right? Those are the first formant (F1) and second formant (F2) of Chikwenya and Mike’s rumbles. In a human language, it’s the height and spacing of the first and second formants that identify the various and sundry vowels. Want to know more about African elephant rumbles? See Anton Baotic and Angela Stoeger’s recent paper on the topic:

Baotic, Anton, and Angela S. Stoeger. Sexual dimorphism in African elephant social rumbles. PloS one 12.5 (2017): e0177411.

Want to know more about formants and vowels? Encourage me in the Comments section.

Off I go for breakfast (see below) and a nice day of calculating the ratio of unique words to total words in a bunch of scientific journal articles about spinal cord injury and regeneration…
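For the curious, the ratio of unique words to total words (often called the type/token ratio) takes only a few lines to compute. This is a rough sketch, assuming that a "word" is any run of letters and that case doesn't matter; real tokenization choices will change the numbers. The sample sentence is made up.

```python
import re


def type_token_ratio(text):
    """Number of distinct words divided by the total number of words."""
    # Assumption: a word is any run of letters; digits and punctuation are dropped.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "the spinal cord and the brain and the nerves"
print(type_token_ratio(sample))
```

The sample has nine words, six of them distinct, so the ratio is 6/9. Thanks to Zipf's Law, the ratio tends to drop as a text gets longer, since the frequent words keep repeating while new words arrive more and more slowly.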

Breakfast in Kashiwa, Japan: grilled mackerel and a bit of French grammar. Are the “macs” in Jean Genet’s “Miracle de la rose” “maquereaux” (“mackerel”, but also “pimps”)? I honestly don’t know.

PITA ferret: the informal imperative

What’s a Jewish mother’s favorite metro station? Read to the end of the post and you’ll get the answer, plus a video of a ferret.

042411_2018_limpratif1 — Picture source: the Le Coin du français blog. https://goo.gl/jN0fh8

I never stop being amazed at how basic some of the mistakes that I still make are, even after three and a half years of intensive study of la langue de Molière. Case in point: the spelling of the tu form of the imperative. The thing that you have to remember is that it doesn’t have an s at the end–except when it does.

The wonderful Lawless French web site gives this explanation of the general rule (keep going for some exceptions):

The imperative tu conjugation for –er, –frir, and –vrir verbs is the present tense minus the final s.

Here are some examples from the Nouvel Obs’s (the form of this genitive explained below in the English notes) description of the informal imperative:

Rentre immédiatement !
Ne discute pas !
Va voir tes grands parents !

OK, an exception: when the verb is followed immediately by y or en, you have an s at the end. Here’s the explanation from the Français Facile web site:

Cependant, devant « en » et « y » qui suivent immédiatement le verbe, on ajoute un « s » au verbe en « er » à l’impératif singulier, et on le joint par un trait d’union comme tous les pronoms qui suivent un impératif.

Ex. Amènes-y ta soeur.

Cette règle s’applique aussi au verbe « aller »

Ex. Vas-y.

Fiez-vous à votre oreille. Si vous prononcez le verbe et que le son vous paraît étrange, il peut y avoir un problème.

Mange-en, sans « s » sonnerait d’une façon étrange à l’oreille.

EX :

À Londres, vas-y si tu veux, mais amènes-y ta soeur et rapporte-moi un cadeau.

OK: that’s the “first group” verbs (-er)–we’ll return to the –frir and -vrir verbs that Laura mentions in a bit. For -ir and -re verbs, the s is always present.

Finis ta soupe. (Je Révise web site)
Choisis une date qui te convient. (FluentU web site)
Prends ton stylo. (Je Révise web site)
Tais-toi ! (self-evident)
Descends tout de suite ! (FluentU web site)

Now: some exceptions. First, as we’ve seen before, verbs that end in -frir or –vrir sometimes have odd behaviors. (See this post if you want some insights into what they have in common, and how they differ phonologically from other –ir verbs.) These verbs do not have an s in the informal imperative…

Couvre ta bouche quand tu tousses, dégueu !

…except when they do, which is the same as when the first-group (-er) verbs do, i.e. when followed by en or y.

Couvres-en un peu avant d’attraper une pneumonie. (Reverso)

(Native speakers: do you have dissenting opinions about this? I had to ask around a bit…)

Almost at the end! Just four verbs that are totally irregular in this respect:

Aller: Va te faire voir, but vas-y !
Être: always s-final: Sois beau et tais-toi.
Avoir: N’en aie pas marre, c’est bon pour les pépitos ! …but Aies-en de meilleures (notes), tes profs te féliciteront
Savoir: Sache qu’elle a vomi ce matin, alors que le thon était frais, but saches-en plus pour réussir ton examen.
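Putting the rules from this post together, here is a small sketch of the spelling logic. It assumes you already know the present-tense tu form, it covers only what is described above (the four irregular verbs, the -er/-frir/-vrir group, and the s that comes back before y or en), and the function name and arguments are my own invention.

```python
# The four irregular tu imperatives listed above.
IRREGULAR = {"aller": "va", "être": "sois", "avoir": "aie", "savoir": "sache"}


def tu_imperative(infinitive, present_tu, following=None):
    """Spell the informal imperative, optionally followed by 'y' or 'en'."""
    if infinitive in IRREGULAR:
        form = IRREGULAR[infinitive]
    elif infinitive.endswith(("er", "frir", "vrir")):
        # Present tense minus the final s: tu manges -> mange
        form = present_tu[:-1] if present_tu.endswith("s") else present_tu
    else:
        # Other -ir and -re verbs keep the s: tu finis -> finis
        form = present_tu
    if following in ("y", "en"):
        # The s comes back (if it was dropped) and a hyphen joins the pronoun.
        if not form.endswith("s"):
            form += "s"
        return f"{form}-{following}"
    return form


print(tu_imperative("manger", "manges"))         # mange
print(tu_imperative("aller", "vas", "y"))        # vas-y
print(tu_imperative("couvrir", "couvres", "en")) # couvres-en
print(tu_imperative("finir", "finis"))           # finis
```

The irregular verbs are checked first on purpose: aller ends in -er, so it would otherwise fall into the regular branch and come out wrong.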

So, the Jewish mother: here’s the first joke I ever understood in French. I was minding my own business in the basement of a bar near the St-Sebastien Froissart metro station (none of your business why I was in the basement of a bar near the St-Sebastien Froissart metro station, or why I’m ever in the basement of any bar anywhere, for that matter) when I heard the following from the table behind me: La station de métro d’une mère juive, c’est laquelle ? Monge, parce qu’elle dit “mange, mange, mon fils.” In English: what’s a Jewish mother’s favorite metro station? Monge, because she says “eat, eat (in French, mange, mange), my son.” Now, this is interesting on a number of levels; the one that I’d like to point out is that it might only make sense to someone who does not speak hexagonal French, and that might be the only reason that I got it. As a monolingual native speaker of English, I can’t hear the difference between the vowels of mange and Monge–we don’t have contrasting nasalized vowels in English, and those two in particular are impossible for me to hear, and pretty tough to pronounce, too, leading me to say things like marde, je t’ai trempée (“shit, I got you wet”–marde is a Canadianism that I can’t seem to get past) and to get responses like “but we’re not going out together!”…which suggests that I pronounce it as je t’ai trompée, “I cheated on you.” I’ll throw in to the mix the fact that I’m told that pieds-noirs (the pieds-noirs, “black feet,” are the French who returned to France after France lost Algeria as a colony in 1962–maybe 800,000 people) don’t differentiate between the nasalized vowels an and on, either. Not surprising–differences in the nasalized vowel inventory are a common feature of francophone dialect differentiation, including in France. What does this joke have to do with the subject of this post? It only works with the informal imperative, i.e. mange, mange (“eat, eat”)–with the formal or plural imperative (mangez, mangez), “eat” doesn’t sound anything at all like the name of the metro station (Monge), and you have no joke.

Here’s a video that has approximately a bazillion examples of the informal imperative. There’s a bit of vocabulary that might help you out here, if you’re not a native speaker of French:

le furet : ferret.
relou : here’s the best I can do for a definition of this word, which I haven’t found in a French-English dictionary as of yet: “Relou” est un mot verlan (langage des rues semblable à un ver lent grignotant doucement… ) signifiant “lourd”. Dans un contexte particulier, désigne une action/personne qui a fait/dit une chose qui a déplu à l’émetteur de ce mot. Source: lachal.neamar.fr. The source gives these synonyms: casse-couille (familier), chiant (familier), casse-pied, and lourd. So: maybe irritating, or “pain in the ass?”

English notes

PITA: a less-shocking way of saying “pain in the ass.” This is something somewhat more than annoying. Assembling the appropriate forms in order to be able to fill out the forms that you need in order to get permission to ask for (more) permission from the Dean’s office before doing any international travel is a PITA. (I’m talking about America here–everything you’ve ever heard about French bureaucracy being worse than American bureaucracy is bullshit, period.) My old neighbor was a PITA–always complaining if anyone parked in front of her house, although she didn’t have a car. The constant flood of papers that you have to review when you’re on Christmas vacation is a PITA. The ferret in the video is being a PITA to the cats–hence relou.

An excellent example, both using and defining the abbreviation:

Steven is saved in my phone as Bae…Biggest Asshole Ever. I’m in his as Pita. Pain In The Ass.

— Jesci (@JustJesci) 24 avril 2017

Of course, if we can have an example with a cat, all the better, seeing as how we’re on the Interwebs and all…

@sarahburchett81: I’d like to nominate my cat as asshole of the forever. He’s not named PitA for nothing. pic.twitter.com/332zHUlyOd

— JudiBootie (@flwr6pwr9_) 18 février 2017

A geeky example, but a very good one–you could hear this around my lab in the US any day of the week:

#Android notifications API is such a PITA compared to #iOS

— Jeff Jones (@JeffJonesInMT) 8 juin 2017

… and now you have to know what this means:

Any day of the week: (at) any time.

@LukeBryanOnline can play the piano and sing to me any day of the week! #CMTawards

— Beth Schwartz (@BethSchwartzND) 8 juin 2017

An idiot:

I’ll believe Trump over any politically connected elites or CNN, ANY DAY of the week.

— Mom (@sunnynodak) 7 juin 2017

Gratuitous picture of a guy with no shirt on:

I’d throw myself in the sea any day of the week, to get rescued by this big hunkkkk!! #BAYWATCHMOVIE pic.twitter.com/nw23zDGq2x

— HollyMcInnes (@holly_mcinnes) 7 juin 2017
