300th post on the Zipf’s Law blog

Dear readers,

This is the 300th post on the Zipf’s Law blog!

In the two years since I started writing miscellaneous and sundry stuff here, the blog has had 13,913 page views and 7,634 visits.  It’s hard for me to believe that what started out as a venue for sharing information about French judo clubs has turned into a blog that gets read by about 40 people per day from every continent on the planet—sometimes hundreds of people in a day. As time passed, this went from being a judo thing, to a place for me to tell my friends and family about my adventures in France in a longer format than Facebook is really made for, and then a place for me to record the French words that I learn every day (whether I’m in France, or in the US). At some point I realized that I just enjoy writing, and this blog became a place for me to scribble about whatever happens to be absorbing me at the moment, usually with some connection to the French language or to the contrasts between American and French society (to the extent that I understand either of them, which isn’t necessarily much). Sometimes I feel free to geek out on topics in computational linguistics, or the anatomy of speech, or the oddities of human language. At this point, about half of my readers come from outside of the US, and so I’ve started occasionally discussing odd corners of the English language–particularly slang and the grammar of casual conversation–along with my misadventures in learning the French language: basically, whatever garbage happens to be rolling around my head at the moment, but almost always with some connection to language, and particularly to French.

Amongst the 40 or so people who visit this blog every day, I’m especially happy about–and grateful to–the people who make my posts better by commenting on them, pointing out mistakes, adding native speaker perspectives, contributing pre-publication material and personal insights, sharing their experiences of the high points and the low points of the “Anglo-Saxon” expat experience in France, and just generally making what I write into less of a solipsistic meander and more of a conversation. I happened to look today at the number of comments that people have left on the blog–for your amusement, here’s the data:

Screenshot 2016-06-27 16.52.27

Thank you to all of you for your support, and a special thanks to Ellen Rosenblum for suggesting this in the first place!

How we’re sounding stupid today: noun phrases

Screenshot 2016-06-27 19.12.23
Picture source: screen shot of zombilingo.org.

Like I always say: it’s the little things that get you.  One of the things that I love about France is that people feel totally free to correct each other’s language, and they certainly feel free to correct mine.  (Truly, I love this–it’s such a help in trying to learn the language.)  I gave a talk in French the other day.  Descriptivism versus prescriptivism, duality of patterning, how even very small choices in building computer programs for processing human languages can imply stances on very contentious issues in linguistics–all that kind of good stuff.  I had memorized the relevant French vocabulary–la référentialité (referentiality), l’épistémologie (epistemology), inné (innate).  I was about as ready as I could be.

Not ready enough, it turns out.  One of the folks in the audience came up to me afterwards to explain a not-very-subtle word choice error that I had made.  My mistake: I said “phrase” wrong. I was talking about groups of words smaller than a sentence, and used the French word la phrase.  Not okay!  La phrase means “sentence.”  If you want to talk about phrases, you need another word.  What that word is–that’s not so clear.

Why would one want to talk about phrases, anyway?  One of Chomsky’s contributions to linguistics that didn’t suck was demonstrating that syntax isn’t about relationships between words–rather, it’s about relationships between groups of words.  Matt Willsey gives a nice example that illustrates how this works.  In English, one could say:

  • If x, then y. 
  • Either x, or y.

You can embed these:

  • If (either x or y), then (either x or y).

You can embed things in those, too:

  • If either (a or b or c or d), then either (e and f or g and h) or (i and j but k and l).

The point: you get nowhere trying to explain this kind of hierarchical structure by means of the behavior of words.  On the other hand, you can get very far by discussing this kind of hierarchical structure in terms of groups of words.
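To make the embedding concrete, here is a minimal sketch in Python. It assumes a made-up representation in which a phrase is either a plain string or a tuple of (kind, left, right); the names render and depth are mine, purely for illustration. The point is only that one small recursive rule covers every level of nesting, which a word-by-word description cannot do.

```python
def render(node):
    """Turn a nested phrase into a string; a plain string is a leaf."""
    if isinstance(node, str):
        return node
    kind, left, right = node
    if kind == "if":
        return f"if {render(left)}, then {render(right)}"
    if kind == "either":
        return f"either {render(left)} or {render(right)}"
    raise ValueError(f"unknown phrase kind: {kind}")


def depth(node):
    """How many levels of phrases are stacked inside this one."""
    if isinstance(node, str):
        return 0
    _, left, right = node
    return 1 + max(depth(left), depth(right))


simple = ("if", "x", "y")
nested = ("if", ("either", "x", "y"), ("either", "x", "y"))

print(render(simple))   # if x, then y
print(render(nested))   # if either x or y, then either x or y
print(depth(nested))    # 2
```

Nothing stops you from nesting a third or fourth level: the same two rules keep applying to groups, never to individual words.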

In linguistics, we tend to refer to these groups of words as phrases.  English has noun phrases, verb phrases, and prepositional phrases–maybe more, but at least these.  (At some level, a sentence is just another kind of phrase, but we do tend to maintain some notion of “sentence.”)

Phrases are typically thought of as having something called a head.  From a syntactic point of view, you could think of the head of the phrase as the thing that determines whether the phrase behaves as a noun, a verb, or whatever.  In the following phrases, I’ve given the head in parentheses:

  • those bananas from the corner store (head: bananas)
  • this banana that I got from my cousins (head: banana)

To see why I say that the head determines how the phrase behaves, consider these sentences:

  • Those bananas from the corner store are almost rotten.
  • This banana that I got from my neighbors is just about ready for the trash can.

Prior to Chomsky, the most fully elaborated theory of how syntax works was that it was about connections between sequences of words.  What you can’t explain with that kind of model is how you can have sequences like the corner store are or my neighbors is.  To account for sequences like that, you have to have some notion of structure that can let you represent the fact that it’s the head of a group of words that controls whether the verb is singular (is) or plural (are). 
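Here is a rough sketch of that contrast in Python. Everything in it is a toy assumption: the tiny PLURAL_NOUNS lexicon, the hand-supplied head positions, and the two function names are invented for illustration, and no real parser works this simply. It compares a "look at the neighboring word" rule with a "look at the head" rule on the two banana sentences.

```python
# Toy lexicon: the only plural nouns this sketch knows about (an assumption).
PLURAL_NOUNS = {"bananas", "neighbors"}


def verb_by_adjacency(subject_words):
    """Word-sequence model: agree with the word right before the verb."""
    return "are" if subject_words[-1] in PLURAL_NOUNS else "is"


def verb_by_head(subject_words, head_index):
    """Phrase model: agree with the head of the subject noun phrase."""
    return "are" if subject_words[head_index] in PLURAL_NOUNS else "is"


np1 = "those bananas from the corner store".split()
np2 = "this banana that I got from my neighbors".split()

# The head positions are annotated by hand here; finding them automatically
# is exactly the hard part.
print(verb_by_adjacency(np1), verb_by_head(np1, 1))  # is are
print(verb_by_adjacency(np2), verb_by_head(np2, 1))  # are is
```

The adjacency rule gets both sentences wrong, and the head rule gets both right, but only because the head positions were handed to it.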

So, how do you talk about “phrases” in French?  That’s where my problem came up, and how I ended up sounding stupid.  One of my ways of trying to find acceptable technical terminology is to look things up on Wikipedia in English, and then follow the link to the corresponding French-language page.  No love: there’s an English-language page for noun phrase, but no corresponding French page.  Around the lab, some of the students call them phrases–phrase nominale, phrase verbale, etc.  The issue: la phrase is typically used to refer to a sentence.  When I gave my talk, I used the word la phrase to mean “phrase,” as some folks do around the lab.  It didn’t go over well.

So, what do you call a phrase in French?  Here are some options that I’ve found.  The one that has the most support in terms of the number of places where I found it used is one that I have never actually heard!

  • le groupe nominal/les groupes nominaux (Linguee.fr)
  • la locution nominale (Linguee.fr)
  • le syntagme nominal (Linguee.fr; Denis Roycourt’s Noam Chomsky: une théorie générative du langage, in Le langage: nature, histoire et usage, edited by Jean-François Dortier; Maurice Pergnier’s Le mot)

I even came across this, in Maurice Pergnier’s Le mot:

C’est également avec ce sens qu’on rencontre le terme [syntagme] dans les traductions françaises des ouvrages de Chomsky, pour traduire le mot anglais “phrase” (Noun-Phrase; Verb-Phrase = syntagme nominal; syntagme verbal). 

Pergnier goes on to add: Il faut noter cependant que, pour cette…école, le syntagme (angl. “phrase”) ne se définit pas seulement comme ensemble d’unités minimales, il se définit surtout comme partie de phrase, puisqu’il est dégagé par découpage de la phrase (“sentence”) selon la structure arborescente. 

So, we have a very explicit contrast between le syntagme (English “phrase”) and la phrase (English “sentence”).

Now that we know how to talk about phrases, in French and otherwise: getting a computer to find the heads of phrases can be a lot harder than it is for humans to do it.  There’s a very cool web site that lets people play a game that’s designed to create data to be used to help computers learn for themselves how to find the heads of phrases in French.  It’s called Zombi Lingo: zombie, ’cause you have to find heads, and zombies like to eat brains.  (Clearly this is a pre-Walking-Dead conception of what it means to be a zombie.)  Check it out at this link–it’s quite fun.

So, yeah–I gave a talk in which I explained duality of patterning, but screwed up the word for “phrase.”  Oh, well–as Jigoro Kano, the founder of judo, would have put it: I got valuable insight into what I need to work on.

Incidentally, here are some details on some of the 85 gun deaths in the United States in the past 72 hours:

  • 3 people in one incident, Marion County, Oregon (source here)
  • 1 church deacon in Shelby County, Tennessee (source here)
  • 1 person in Houston, Texas (source here)
  • 1 person in San Antonio, Texas (source here)

I really don’t have the stomach to go through all 85 of them–sigh…  72 hours, 85 deaths…


How we’re sounding stupid today: synonyms

Synonyms are way more complicated than you might think.

For several years, my judo club in the States had a number of highly-ranked players on the junior national level.  The coaches decided to take them to Mexico to train at one of the national Olympic training centers, and they brought me along to interpret.  We spent a week at the training center in Guadalajara, and I interpreted for everything from practice sessions to our head coach explaining his philosophy of judo.

At the end of the week, we all piled into a bus and headed to Mexico City for the annual national tournament–a few of us grown-ups, our kids, and a lot of young Mexican children.

The bus ride was long, and going through the mountains, it was coooold.  As kids got cranky and the ride got miserable, I decided to kill two birds with one stone: distract the kids for a while, and take advantage of an opportunity to improve my Spanish.  I asked the busfull of kids what I sounded like when I spoke Spanish.  Could they imitate me?

I thought that I would learn something that I already knew–hilarious imitations of my aspirated voiceless stops, ludicrously elongated syllable nuclei, vocalic offglides, and the like. I figured that the kids would get a laugh out of it.  In fact, when you learn to do linguistic fieldwork on under-studied languages, you’re encouraged to go to adolescents for feedback–the idea is that teenagers being what they are, they might be less deferential than adults, and more willing to tell you the truth about how bad you sound.  No one was biting, though.

Finally, I managed to convince one of the older guys to speak up.  For once, the bus got quiet.  Well…you always say estoy contento (“I’m happy”) instead of  estoy feliz (also “I’m happy”). 

The kids roared–apparently, they had noticed.  They’re…you know…synonyms, he added apologetically.  Sometimes you use synonyms wrong.

Now other kids jumped in with poor synonym choices that I apparently made quite regularly.  Who knew??  It seemed to be the case that I made a lot of poor synonym choices, because this activity kept the kids in stitches for quite a while.  A bus-wide global meltdown was averted, and we reached Mexico City without any major traumas.

This story came back to me today while reading the comments on a blog post that I wrote the other day about faces.  A question came up: I gave the French words figure and visage for the English word “face,” but what about the French word face?

Simple answer: I didn’t know that the French word face meant “face.”  To my knowledge, I’ve only ever heard it in the expressions face à and faire face à.  My old nemesis: synonyms.

What is a synonym, though?  Here’s the definition from Merriam-Webster:

Screenshot 2016-06-17 06.50.30
Picture source: http://www.merriam-webster.com/dictionary/synonym

Linguists don’t typically like that definition of “synonym,” though.  Meaning is really, really hard to pin down (we’ve had a couple of posts on the difficulty of describing word meanings, looking at a number of options for doing so, none of which works out perfectly–see here for representing meanings with necessary and sufficient conditions, and here for representing meanings with prototypes).  We tend to use a definition more like this: two (or more) words are “synonyms” if they can freely replace each other in all contexts.  The idea would be that if you can say pail every place that you can say bucket, then they’re synonyms.  If you can’t, then they’re not.

The thing is this: on this “distributional” definition of the term “synonym,” there are almost no synonyms.  In American English, I can think of two pairs of synonyms:

  • pail/bucket
  • stone/pit (in the sense of the seed of a succulent fruit–a peach, or a plum, or an apricot)

Bullshit, you’re thinking–English is full of synonyms.  Good, virtuous, righteous, moral.  Bad, wicked, sinful, immoral.  If you look at data, though, you’ll soon see that there are almost no words in English that have this characteristic of being freely replaceable.  Rather, words that we think of as synonymous usually have subtle differences in how they’re used in the language.  In technical terms, they have different “distributions.”

Let’s take two words that I imagine every native speaker of American English would think of as synonymous: big, and large. 

All of the data on big and large in this post comes from Douglas Biber, Susan Conrad, and Randi Reppen’s 1998 book Investigating language structure and use, published by Cambridge University Press.  The graphics are from my lecture notes and are based on Biber, Conrad, and Reppen’s data.

There’s a nice collection of naturally-occurring English texts called the Longman-Lancaster Corpus.  It contains 5.7 million words from fiction and from academic prose.  If you count the number of occurrences of big, the number of occurrences of large, and then convert those counts to frequencies per million words, you get this:

big versus large
Picture source: me, based on Biber, Conrad, and Reppen’s data.

What are we seeing here?  If we look at the combined texts, we see that large occurs more frequently than big, and that’s about it–not much of interest.

If we break out the two categories of texts, though–academic prose, and fiction–something jumps out at us.  The two words have very different distributions in academic prose and in fiction.  In academic prose, large is far more common than big.  On the other hand, in fiction, big is far more common than large.  What the hell?
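If you want to try this kind of count yourself, here is a minimal sketch in Python. The two snippets of text are made-up stand-ins, not the Longman-Lancaster Corpus, and the crude tokenizer and the function names are my own assumptions; the only thing it is meant to show is the arithmetic of turning a raw count into a frequency per million words, separately for each kind of text.

```python
import re
from collections import Counter


def tokenize(text):
    """Very crude tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def per_million(tokens, word):
    """Occurrences of `word` per million tokens of text."""
    if not tokens:
        raise ValueError("cannot normalize over an empty text")
    return Counter(tokens)[word] * 1_000_000 / len(tokens)


# Made-up stand-ins for the two categories of text (not real corpus data).
registers = {
    "fiction": "He had big black eyes and a big dog. The house was large.",
    "academic prose": "A large number of samples and a large proportion "
                      "of cases; one big problem remains.",
}

for name, text in registers.items():
    tokens = tokenize(text)
    for word in ("big", "large"):
        print(name, word, round(per_million(tokens, word)))
```

Normalizing per million words is what makes the two categories comparable even when one collection is much longer than the other.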

Let’s look at the contexts that the words show up in.  We’ll separate out academic prose and fiction, and within those categories, we’ll separate out big and large.  For each one, we’ll show the most common words that appear to the right of the word in question.

Screenshot 2016-06-17 07.15.10
Picture source: me.

We’ll only show words that show up to the right of these words at least once per million words.  In the academic prose, that only leaves two–remember how big the bar for large was compared to the tiny bar for big in the academic prose part of the graph above.  In fiction, we see both, although you’ll notice that the numbers for the words to the right of large in the fictional texts are much smaller than the numbers for the words to the right of large in the academic prose–large just doesn’t show up as often in the fictional texts.
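For the curious, counting the words that appear immediately to the right of a word takes only a few lines. This is a sketch under stated assumptions: the sample sentence is invented, the function name right_collocates is mine, and a real analysis would run over millions of words and apply a frequency cutoff.

```python
from collections import Counter


def right_collocates(tokens, target):
    """Count the words that appear immediately to the right of `target`."""
    return Counter(
        following
        for current, following in zip(tokens, tokens[1:])
        if current == target
    )


# Invented sample text, just to exercise the function.
tokens = ("a large number of big eyes and a large number of "
          "big hands and a large amount").split()

print(right_collocates(tokens, "large").most_common())
# [('number', 2), ('amount', 1)]
print(right_collocates(tokens, "big").most_common())
# [('eyes', 1), ('hands', 1)]
```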

Think about the two sets of words–the ones that show up after big, and the ones that show up after large–and you might notice something:

  • big tends to appear before physical objects.
  • large tends to appear before amounts and quantities.

How does that relate to the differences in the distributions of the two words across academic prose, and fiction?

  • Fiction contains lots of physical descriptions, which can refer to size (and therefore uses big)
  • Academic prose is more likely to use measurements to describe size (and therefore is less likely to use big)
  • Academic prose deals more with amounts and quantities (and therefore uses large)

I’ll try not to drone on and on with details, but the effect is quite robust.  It shows up at longer distances, such as when the words are separated by an adjective: big black eyes, big black saucepan, big black mongrel dog.  It shows up when the words follow the words that they modify: The cart was not really big enough…. The revolver, which looked big enough to…. The ratio is large enough, however…. …a finite number of steps (which may be large enough to…

The moral of the story: could you substitute big and large for each other?  You could–it’s not like it’s not interpretable if you say large revolver or big quantity.  You probably do produce things like that–I’m sure I do, too.  This stuff is probabilistic–it’s about frequencies, about what you do more often or less often, not about always or never.  But: if you sound like a native speaker, you mostly don’t just swap these two words in and out randomly.  The distributions are different: if you’re a native speaker, you don’t just substitute big and large for each other freely.  You use them differently, in ways that are so subtle that you’re almost certainly not aware of it.  (I sure as hell wasn’t before I read the book.  I’ll point out that I’ve given linguistics graduate students the homework assignment of finding differences in the use of big and large for maybe ten years, and in all of that time, exactly one student has come up with this.)

So: back to the three French words visage, figure, and face, all of which correspond to the English word “face.”  How the hell could I not know that face meant “face”?  Why have I only ever heard it in face à and faire face à?  And why can’t I figure out the difference between visage and figure?  Let’s look at some data.

I went to the Sketch Engine web site.  This gives me access to a bunch of big collections of texts in an astounding variety of languages, and a tool for searching those collections.  The tool will also do analyses of statistical data–what other words a word tends to occur with in those text collections, what verbs it tends to be the subject and the object of (if it’s a noun), what nouns it tends to have as its subjects and objects (if it’s a verb), and so on.

I picked a corpus (collection of linguistic data) called frTenTen, just because it’s big–9.9 billion words.  For each word–visage, figure, and face–I got an analysis of the words that it tends to occur with, and the structures that it tends to occur in–what verbs it tends to be the subject and object of, which prepositions it tends to modify and to be modified by, and so on.  You can see screen shots of the three analyses below.

The first thing that we see is that the frequencies of the three words are different, and face is actually the most common.  In 9.9 billion words of French text, this is how often they show up:

  • visage: 115 times per million words
  • figure: 48 times per million words
  • face: 258 times per million words

Seriously?  How did I miss face, when it shows up more than twice as often as visage, which shows up more than twice as often as figure?  If we look closely at how these words tend to combine with other words and structures, it starts to make sense.  In what follows, I’m going to focus on two things: (1) the kinds of words that modify the word that we’re talking about, and (2) the kind of words that it gets coordinated with–in other words, what kinds of words show up on the other side of the word “and” or the word “or” with the word in question.

We’ll start with le visage.  To begin with, let’s look at the words that modify it.  Visage is a noun, so these are probably going to be adjectives.  Why do I care about the words that modify it?  Because different kinds of things tend to get modified with different kinds of words.  Kittens are cuddly, warm, and cute.  Sharks are hungry, vicious, and deadly.  Knowing something about the kinds of words that modify something tells you something about how the people who speak a language think about that thing.

So, the words that modify visage: look at the box to the left in the figure below, labeled modifier.  Here are the words that we see most frequently modifying visage in that 9.9-billion-word sample:

Screenshot 2016-06-17 02.55.46
“Word sketch” of the French noun “visage.” Picture source: me.

Definitions from WordReference.com:

  • pâle: pale
  • impassible: impassive, calm, emotionless, and many related words
  • angélique: angelic
  • familier: familiar
  • souriant: smiling, cheerful, happy
  • ovale: oval
  • fin: in this context: small or thin, according to what I found on Linguee.fr.

The generalization that I would suggest here is that these are all words that you would not be surprised to see being used to describe a human face.

Now let’s look at the words that most frequently show up with visage on the other side of the words “and” or “or.”  I care about this because words are often combined by and or or with similar categories of words.  For example, nouns tend to get joined with other nouns, verbs with other verbs, etc.  This time we’ll look at the fourth box from the left, labelled et_ou.  Let’s see if that suggests anything to us about how to understand visage:

  • cou: neck
  • corps: body
  • cheveu: hair (this probably shows up as cheveu rather than cheveux because Sketch Engine often does something called “lemmatization:” converting all forms of a word into what you might think of as their “basic” form–in the case of nouns, the singular form)
  • silhouette: profile, shape, contour
  • oeil: eye
  • lèvre: lip
  • sourire: smile

The generalization that I would suggest here is that these are mostly body parts.  Not surprising, if visage is a body part.
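To give a feel for what is going on under the hood, here is a toy sketch in Python of lemmatizing and then counting what gets coordinated with a word. It is emphatically not how Sketch Engine does it: the little LEMMAS table, the list of determiners to skip, the sample French sentence, and the function names are all assumptions that I made up for the example.

```python
from collections import Counter

# Hypothetical toy lemma table: inflected form -> "basic" form.
LEMMAS = {"cheveux": "cheveu", "yeux": "oeil", "lèvres": "lèvre"}
DETERMINERS = {"le", "la", "les", "un", "une", "des", "son", "sa", "ses"}
COORDINATORS = {"et", "ou"}


def lemmatize(token):
    """Look the token up in the toy table; unknown tokens stay as they are."""
    return LEMMAS.get(token, token)


def coordinated_with(tokens, target):
    """Count lemmas joined to `target` by et/ou, ignoring determiners."""
    content = [lemmatize(t) for t in tokens if t not in DETERMINERS]
    found = Counter()
    for i in range(1, len(content) - 1):
        if content[i] in COORDINATORS:
            left, right = content[i - 1], content[i + 1]
            if left == target:
                found[right] += 1
            elif right == target:
                found[left] += 1
    return found


# Invented sample text, already split into tokens.
sample = ("le visage et les cheveux , le cou ou le visage , "
          "le visage et les yeux").split()

print(coordinated_with(sample, "visage"))
# Counter({'cheveu': 1, 'cou': 1, 'oeil': 1})
```

Notice that cheveux and yeux come out as cheveu and oeil, which is why the list above shows the singular forms.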

Now let’s look at the word that I’m struggling with–la face.  Here are the statistics:

Screenshot 2016-06-17 02.53.23
“Word sketch” of the noun “face” in French. Picture source: me.

Once again, let’s look at the most frequent modifiers.  Here’s what we get:

  • nord: north
  • arrière: rear
  • visible: visible
  • postérieur: back, posterior
  • ventral: ventral (this word refers to the side that your stomach is on.  To see why this is a useful word from an anatomical point of view, think about a person, and a fish.  On a person, the belly is to the front, while on a fish, the belly is on the bottom.  Using the word ventral lets you refer to the side that the stomach is on, regardless of the orientation of that side (forward, or down).)
  • sud: south
  • latéral: lateral (side)

A totally different set of modifiers from visage!  These sound a lot more like words that would describe one of the several faces of a mountain, or of a building.  When we look for the words that face occurs with in coordinations with et or ou, we find:

  • pile: In pile ou face, it’s “heads or tails.”
  • profil: profile.
  • dos: back
  • cou: neck
  • arête: bridge (of nose)
  • soir: evening
  • samedi: Saturday
  • finale: final

Some of those are consistent with the interpretation of face as a body part–profile, back, neck, bridge of the nose.  The others aren’t.

When we look at the “word sketch” for figure, there’s very little that suggests that the word is used as a body part–at any rate, not as often as it’s used for other meanings:

Screenshot 2016-06-17 02.57.10
“Word sketch” of the French noun “figure.” Picture source: me.

So, what insight does this give us?  For one thing: it’s not surprising that I haven’t come across face with the meaning “face (of a person).”  Rather, it seems to be used more often for the “faces” of objects–buildings, mountains, computers, etc.  For another thing: it’s surprising that I’ve come across figure with the meaning “face” at all, since it doesn’t seem to be used for that as often as it’s used with other meanings.  Finally, the major point: it’s hard to see any of these as synonyms for the others, as the patterns of usage are quite different.  On the definition of “synonym” as “word that is freely replaceable for another word,” these aren’t.

Having said all of this: I don’t mean to imply that synonymy is not a useful concept.  In fact, there’s an enormously useful resource called WordNet that is organized completely around the notion of synonymy.  WordNet encodes relationships between words.  But, what’s the definition of word?  For WordNet, it’s what they call a “synset:” not a single word, but the full set of synonyms for that word.  Synsets are the basic unit of WordNet–this whole (very useful, as I said) resource is organized as relationships between them.
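Here is a minimal sketch of that idea in Python. To be clear about the assumptions: this is a hand-built toy, not the real WordNet data or any real WordNet API, and the synset identifiers (S1, S2, S3), the single "broader than" relation, and the function names are all invented for illustration.

```python
# Toy synsets: made-up identifier -> the set of words that share a meaning.
SYNSETS = {
    "S1": {"pail", "bucket"},
    "S2": {"stone", "pit"},
    "S3": {"container"},
}

# Relationships hold between synsets, not between individual words.
BROADER = {"S1": "S3"}  # the pail/bucket synset is a kind of container


def synsets_of(word):
    """All synset identifiers that contain the word."""
    return sorted(sid for sid, words in SYNSETS.items() if word in words)


def synonyms(word):
    """Every other word that shares at least one synset with `word`."""
    result = set()
    for sid in synsets_of(word):
        result |= SYNSETS[sid]
    result.discard(word)
    return result


def broader_words(word):
    """Words in the synsets that sit one step above the word's synsets."""
    result = set()
    for sid in synsets_of(word):
        if sid in BROADER:
            result |= SYNSETS[BROADER[sid]]
    return result


print(synonyms("pail"))         # {'bucket'}
print(broader_words("bucket"))  # {'container'}
```

Because the relation is stored once, on the synset, pail and bucket both inherit it without anyone having to say so twice.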

The kids did great at the tournament.  As Jigoro Kano, the founder of judo, would have put it: the ones who won got positive feedback on their training, and the ones who lost got valuable insight into the things that they needed to work on.  I got off the plane in the US a couple days later with my boots coated with dust from an Aztec temple, and thought a lot about how small the world is these days.

Paris is not all avant-garde theater and haute couture, but it charms me nonetheless

Music, the junkie across the street, and the Cratylus.

“The truth about heroin.” Let me just point out (a) how cool it is that “heroïne” is spelt with a tréma (umlaut), and (b) that if you search Google Images for accro paris (junkie Paris), you would not believe how many pictures of Paris Hilton you find. Picture source: http://fr.drugfreeworld.org/drugfacts.html

It’s on evenings like this that I especially appreciate summertime in Paris.  The Euro 2016 is in full swing, and the streets of my neighborhood are full of little crowds of soccer fans wearing the jerseys of their team and chanting and singing in the language of whatever country they happen to be from.  Weaving in and out of them are women in cute dresses and impressive heels–not unusual here, but especially salient to me today due to their contrast with the junkie nodding off in a doorway across the street, who I would guess couldn’t tell you what she’s wearing or, indeed, what feet are for.

Fête de la musique. Photograph by Nicolas Vigier, en domaine public sur Flickr.

It’s also June 21st, the summer solstice and the day of the Fête de la musique, the annual festival that is marked by musical happenings large and small all over France.  Standing on my balcony (don’t get excited, it’s about the size of a phone booth and occasionally splattered with bird shit), I can hear a guy playing the guitar and singing in front of a set of speakers that are much, much bigger than his limited skills merit.  In other neighborhoods you might hear a local choir singing on a street corner, or a full brass section in a park, or whatever.  It’s totally cool.

Screenshot 2016-06-21 14.06.56
Russian soccer hooligan association president Alexander Shprygin gets thrown out of France, two days later tweets a picture of himself at a match in Toulouse, and lands in jail this time. Picture source: Twitter.

Of course, with crowds come assholes.  I took a break from memorizing vocabulary about semantics and knowledge representation to take a walk by the Eiffel Tower just now, and saw seven guys running the ball-and-cup scam (the norm would be zero to one), including one guy who was speaking Russian (there are tons of them in town–the president of the Russian soccer hooligan association was escorted onto a plane and out of the country by the French police a couple days ago; today he tweeted a picture of himself in a stadium in Toulouse, having taken advantage of the Schengen Agreement to get back into France, and is now sitting in a jail cell) and one guy who, by his accent, his enthusiasm, and his backwards ball cap, seemed pretty clearly to be an American.  It is, after all, entre chien et loup at the moment, I guess–dusk, when dogs go home and the wolves come out.  Back to my apartment to memorize vocabulary and feel grateful that if Europe has to end, I had the good fortune to see a bit of what the glory was like first.

  • le shit: hash, pot. Probably not what the junkie across the street has been doing today.
  • l’essentialisme: essentialism.  Easy enough to spell, but I have no clue how to pronounce it–seems like there oughta be some accents in there somewhere.  This is the idea that language is the way it is because it reflects something real about the world–Cratylus’s position in Plato’s dialogue Cratylus.
  • l’arbitraire (n.m.): arbitrariness.  This is the idea that language is the way it is purely as a matter of social convention and chance–Hermogenes’s position in the same.
  • le normativisme: the attitude that language is something to be regulated by fiat.


How to get a haircut in any country but France

A friendly smile will get you a long way in most places. Saying that kids and dogs are cute helps, too.

yul brynner
Yul Brynner–much more handsome than me, but every bit as bald. Picture source: https://www.youtube.com/watch?v=ObdUuw5JETo

I’ve been getting my hair cut every Friday by the same person whenever I’ve been in France over the course of the past two years, but I still can’t get her to cut it as short as I want.  I’m an old man and am mostly bald, and my preference is to have every bit of hair removed.  However, Nadine always sends me out the door with some very short hair left.  At first, I thought it was because my French isn’t good enough to explain exactly what I want.  However, one day I was sitting in the chair.  Nadine cut my hair as we chatted about tout et rien–nothing in particular.  Suddenly I realized that she understands what I want just fine–she just doesn’t agree that I should wear what little I have left that short.  Cutting hair is her métier, and she knows what looks best, right?

In general, you can get what you need in life most places, even if you don’t speak the language–pointing at things and a friendly smile will get you a long way.  Since I like to be totally bald, I need to get my hair cut frequently, and since I’m on the road a lot, that means that I have gotten my hair cut all over the world.  I’ve come across a way to explain what I want without words: I carry a picture of Yul Brynner on my phone, and I just point at it, point at my head, and of course smile politely.  At the moment I’m in a little town on the west coast of Japan, and I was feeling kinda shaggy, so I snuck out at lunch today and walked to a little barbershop down the street.  The interaction went like this–if it’s in italics, it happened in Japanese, which you’ll soon see that I don’t speak at all:

Me, entering empty little shop: Excuse me?

Little old lady comes out, sees me, immediately looks worried, and disappears.  Young lady comes out.

Her: Welcome!

Me: English-language-do know-do question?

Her: aaaah, I’m sorry, no.

Me: pull out phone, point at picture of Yul Brynner, smile.

Her: ah, OK.  あなたが散髪をしたいですか?

Me: Excuse me, Japanese-language-do know-not.

What followed was a wonderful experience.  Getting your hair cut in Japan can be really nice, and I highly recommend it as a non-touristy sort of experience.  The routine can vary, but my haircut today involved warm shaving cream, hot towels, a razor, and a backrub–not an unusual sequence of events in a Japanese barbershop.

Her: Do you like it?

Me: This-do, enjoy-is! 

Her: You speak Japanese so well!

Me: No—skill yet is-not!

(In many, many, many cultures, the right answer to You speak X well!  is always, always, always some version of no, I don’t!  Thanking the person for the compliment just reveals a lack of understanding of the culture.  Japan is one of those places.)

Smiles all around.  I paid, her toddler came out and we waved bye-bye a lot–cute is the case, isn’t it so?–pretty much the most useful Japanese expression I know–and back to work I went.

It amazes me that despite the fact that I’ve spent a grand total of 11 weeks of my life in Japan over the course of the past 15 years or so, I’m able to pretty much get all of my needs met here despite not speaking the language at all, while in France, I remain befuddled by the most basic tasks–getting phone service, keeping phone service, buying a nail file, figuring out how to get to work when there’s a train strike.  I have to say, though: I love the challenges.  Accomplishing the tiniest things in France can give me enormous satisfaction, and really adds to my enjoyment of the expat experience.  Check it out–there can be something liberating about being completely dans le brouillard–“at sea.”

Useful Japanese vocabulary:

  • kawaii, ne!  Cute, huh!  Use it for kids and dogs–great conversation-starter.  Yes, the ii has to be long.

The modern human face

Anatomically modern humans have a facial structure that is different from our evolutionary predecessors and close relatives. Here’s how to recognize it.

I’ve occasionally read that Neanderthals were so similar to modern humans that you wouldn’t notice one if you walked by them on the street.  This is probably not true.  Leaving aside the question of whether people who write things like that know anything about what I, personally, am and am not likely to notice, anatomically modern humans have quite different facial structures from anything else out there today, and also from any of our hominid relatives.  That includes Neanderthals.

In recent posts, we’ve talked about three unique features of the anatomically modern human face:

  • the forehead
  • the chin
  • the face being located under the eyes

If you’re not sure about what any of those mean and you want to know, follow the links, which will take you to illustrated posts on each of those individual features.

The tendency to notice faces, and the ability to read facial expressions, seem to be very important in humans, based on things like the sophistication of the musculature that we have for controlling facial expressions, the amount of the motor nervous system that is devoted to controlling those muscles, and the skill that most humans have in recognizing facial expressions of emotion.  For an accessible discussion of the psychology and biology of all of this, see this Wikipedia page.  Chimpanzees are lousy at recognizing human facial expressions–dogs can be pretty good at it, though.  (There’s a lot of variation here, so don’t get pissed at your dog if he doesn’t seem to be up to the task.  He undoubtedly has other charms.  Another cool thing that dogs can learn to do, but chimpanzees can’t: understand that when you point in a direction, they should look that way.)

If you’ve read the preceding posts, and you can remember these three features–forehead, chin, and being located under the eyes–then I’m guessing that you can impress your kids the next time you go to the zoo/museum/catacombs by explaining what to notice about the faces of the skulls of the various and sundry critters that they’ll see.  Want to test yourself?  Here are some skulls to check out.  See if you can tell which are anatomically modern humans and which aren’t.  Answers at the bottom of the post, along with some French vocabulary for talking about faces.



[Skull pictures 1–7]


Answers:

  1. Modern human.  This is the ballot box from Yale’s Skull and Bones society.  Picture source: http://www.thehistoryblog.com/archives/4503
  2. Modern human in the front, Neanderthal behind it.  Picture source: http://www.rawstory.com/2015/12/dramatic-wall-of-skulls-to-be-built-at-london-museum-to-illustrate-human-evolution/
  3. Modern human infant.  (Trick question.)  This is a reproduction of the skull of a deceased 4-month-old child.  Human infant skulls are similar to chimpanzee infant skulls in that they both have foreheads (which the chimp will lose as it ages), but note that the human infant’s face is located beneath the eyes.  Picture source: https://boneclones.com/category/child-skulls/human-anatomy#view=grid&category=76&page=1&pageSize=30
  4. Australopithecus.  No lower jaw, so you can’t look for a chin, but notice the lack of a forehead and the forward-protruding muzzle (i.e., the face is not located under the eyes).  Picture source: http://www.wikiwand.com/en/Australopithecus
  5. Chimpanzee–underside of skull.  You don’t have to look for a forehead or a chin to know that this wasn’t an anatomically modern human–the muzzle protrudes way out in front.  Picture source: http://www.dlt.ncssm.edu/tiger/360views/Hominid_Skull-Chimpanzee_1200x900/bottom.htm
  6. Bornean orangutan.  Picture source: http://www.skullsunlimited.com/record_species.php?id=1767
  7. Modern human.  (Replica.)  If you got this one wrong: maybe the discoloration threw you off?  It’s totally typical, though–forehead, face below the eyes, and a chin.  Picture source: http://www.boneroom.com/store/c115/Museum_Quality_Human_Skull_%26_Skeleton_Casts.html

French vocabulary for talking about faces:

  • la figure: face.
  • le visage: face. I think this might be a higher register of language than la figure–perhaps more literary?  Not sure.  Here’s a link to the Noir Désir song Des visages des figures, just for fun: https://www.youtube.com/watch?v=wW533pwLMv0.
  • le sourcil: eyebrow.  The l at the end is silent, unlike most word-final l’s.
  • le cil: eyelash.  This l is pronounced.
  • la joue: cheek.
  • un œil: eye.  Pronunciation: [œj].  That is: the L is silent.  Follow the link to the Lawless French web site if you want to hear a recording of the proper pronunciation.  (I threw this one in despite the fact that we all probably know it because I was recently in a French theater class, and I noticed that NONE of the students (including me) was sure how to pronounce it when it would come up in a play, despite the fact that we all had good enough handles on French to talk about Molière in it.  Like I always say–it’s the little things that get you…)
  • les pattes: sideburns.
  • la barbe: beard.
  • barbe-à-papa: cotton candy.
  • la gueule: mouth.  Ta gueule!  Shut up!
  • être très physionomiste: to have a good memory for faces
  • le/la physionomiste: bouncer






Compound nouns: why my kid said friendgirl instead of girlfriend

The errors of a child learning their native language can be tremendously interesting.

[Picture: French knife vocabulary]

When my kid was about four years old, he went through a period where he switched the orders of certain kinds of words.  It wasn’t random–this happened only with a particular kind of word formed by putting two nouns together.  For example, he would say:

  • light kitchen instead of “kitchen light”
  • friendgirl instead of “girlfriend”

On the other hand, if there were a noun preceded by an adjective, he got the order right:

  • big kitchen
  • mean girl

The phenomenon has some implications for theories of how children learn language.  In particular, it’s difficult to give a simple behaviorist explanation for this phenomenon, where the kid gets exposed to stimuli, repeats them, and gets reinforced for producing them correctly: to my knowledge, the kid was never exposed to things like friendgirl.  There are also interesting things about his pronunciation of these things on a smaller scale, though, and in particular, how we make compounds–read on, if you want to know more.

One of the most difficult problems in getting a computer to understand language is understanding compound nouns.  These are nouns that are made up of two or more words in a sequence.  The toughest ones can be compounds where the words that make up the compound are both nouns. For example, in English:

  • school bus
  • kitchen cupboard
  • fire engine

I’ve given you examples where the two nouns are written with a space between them, but they might also be spelt with a hyphen, or without a space.  For example:

  • gunboat (no space)
  • timesheet (no space)
  • rainbow (no space)
  • gun-carriage (hyphen)
  • train-spotting (hyphen, and yes, you are allowed to argue about whether or not spotting is a noun)

From a theoretical perspective, there isn’t a distinction between these–they’re all compound nouns.  From the point of view of writing a computer program that deals with language, we would tend to treat the ones that are written with a hyphen or with no space as single words that don’t necessarily get analyzed further, but the ones written with a space usually need special treatment.  (In fact, amongst people who do natural language processing, there’s a whole field of research concerning what are called multi-word expressions.)
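To make the "special treatment" a bit more concrete, here is a minimal Python sketch of one simple way a program might handle it. It assumes a tiny hand-made lexicon of space-separated compounds (the name MULTIWORD_COMPOUNDS and the function tokenize are made up for this illustration); real multi-word expression systems are far more involved than a lookup like this.

```python
# Hypothetical mini-lexicon of compounds written with a space
# (the examples are the ones from this post).
MULTIWORD_COMPOUNDS = {
    ("school", "bus"),
    ("kitchen", "cupboard"),
    ("fire", "engine"),
}


def tokenize(text):
    """Split on whitespace, then merge adjacent words that form a known compound."""
    words = text.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in MULTIWORD_COMPOUNDS:
            # Two words separated by a space, but treated as one unit.
            tokens.append(" ".join(pair))
            i += 2
        else:
            # Solid and hyphenated compounds (gunboat, gun-carriage)
            # already arrive here as single tokens.
            tokens.append(words[i])
            i += 1
    return tokens


print(tokenize("the gunboat passed a fire engine and a gun-carriage"))
```

Notice that gunboat and gun-carriage need no extra work, while fire engine only survives as a unit because it happens to be in the lexicon.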

From both a theoretical and a practical perspective, the big question about compound nouns is: how can you describe, understand, and get a computer to deal with the different kinds of relationships that can exist between the nouns?  It’s not a random thing–languages tend to exploit particular kinds of relationships in compounds.  Even describing these things from the perspective of theoretical linguistics is tough, though, separately from the practical problem of getting a computer program to process them.  A classic English example (due, I believe, to the recently departed linguist Chuck Fillmore) is the names for different kinds of knives in English.

  • bread knife: a knife for cutting bread
  • butter knife: a knife for spreading butter
  • pocket knife: a knife that is carried in a pocket
  • butcher knife: a knife that is used by a butcher
  • palette knife: a knife that is shaped like a palette
  • utility knife: a knife that is used in food preparation
  • paring knife: a knife that is used for paring
  • steak knife: a knife that is used for cutting steak
  • boning knife: a knife that is used to trim meat from a bone
  • boot knife: a knife that’s meant to be carried on or in a boot

Just with this partial list, we can see some patterns of semantic relationships between the nouns in the compound:

  • intended material: bread knife, butter knife, steak knife
  • used by: butcher knife
  • used for: paring knife, boning knife
  • carried in: pocket knife, boot knife
  • shaped as: palette knife

Dog bones at a Hungarian butcher shop in Cleveland, Ohio. Picture source: me.

How should we classify utility knife?  Or dog bone?  I don’t know.  As I said, this is difficult–it’s not like this is something that they teach you in linguistics grad school.  And, do you get to just make these kinds of relationships up on an ad hoc basis?  If so, you’ve got descriptions that couldn’t possibly be shown to be wrong, and from a scientific point of view, that’s bad–your theories need to be testable, and falsifiable.  (Generally we assume that we can’t prove anything, but we do try to construct theories in such a way that if they’re wrong, in principle we should be able to demonstrate that.)  Some people have proposed limited sets of relationships that they hope can capture all such compound nouns–for example, the Generative Lexicon theory of James Pustejovsky.  It’s not clear that all of the issues that are involved in this are resolved, though.
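For the programmers in the audience, the list of relationships above can be written down as a plain lookup table. This is only a sketch: the labels are the ad hoc ones from this post, not an established inventory, and the names KNIFE_RELATIONS and group_by_relation are invented for the example. The compound that I couldn't classify is marked with None.

```python
from collections import defaultdict

# Relation labels copied from the list in this post.
# None marks a compound that the post leaves unclassified.
KNIFE_RELATIONS = {
    "bread knife": "intended material",
    "butter knife": "intended material",
    "steak knife": "intended material",
    "butcher knife": "used by",
    "paring knife": "used for",
    "boning knife": "used for",
    "pocket knife": "carried in",
    "boot knife": "carried in",
    "palette knife": "shaped as",
    "utility knife": None,
}


def group_by_relation(relations):
    """Group compounds under their relation label."""
    groups = defaultdict(list)
    for compound, relation in relations.items():
        groups[relation or "unclassified"].append(compound)
    return dict(groups)


for relation, compounds in group_by_relation(KNIFE_RELATIONS).items():
    print(f"{relation}: {', '.join(compounds)}")
```

The table makes the problem easy to see: nothing stops you from adding a new label every time a compound doesn't fit, which is exactly the untestable, ad hoc situation described above.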

Rather than this kind of noun-noun compound, French generally has nouns modified by prepositional phrases.  That is, you have the noun, then a preposition, and then another noun.  For example, compare these English and French nouns:

  • railroad (rail + road): chemin de fer
  • windmill: moulin à vent
  • wine glass: verre à vin
  • goods transport: transport de marchandises

For more examples, see the picture in this post, which shows the vocabulary for a variety of kinds of knives in French.

It’s not the case that all French nouns of this sort follow the prepositional phrase pattern–for example, we have homme grenouille, “frogman.”  But, the pattern with the prepositional phrase is much more common. Having said that: one of the biggest mysteries of French for me is how you know when the preposition will be de versus à.  Is there some principle that would let me know that it’s a boîte à gants (glovebox) and a cuillère à café (coffee spoon), but an animal de compagnie (pet) and a crème de cacao?  A boîte à bijoux (jewelry box), but a boîte d’allumettes (matchbox)?  A boîte à chaussures (shoebox), but a boîte de nuit (nightclub)?  I have no clue.
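The noun + preposition + noun pattern is regular enough that a program can at least pull the pieces apart, even if it can't explain the choice of preposition. Here is a rough Python sketch using examples from this post; the regular expression and the function name split_compound are my own assumptions for illustration, and they only cover de, à, and the elided d’.

```python
import re
from collections import defaultdict

# Head noun, then "de", "à", or elided "d'" (straight or curly apostrophe),
# then the modifier.
PATTERN = re.compile(r"^(?P<head>\S+) (?P<prep>de |à |d['’])(?P<modifier>.+)$")

EXAMPLES = [
    "chemin de fer", "moulin à vent", "verre à vin", "boîte à gants",
    "boîte d’allumettes", "boîte de nuit", "homme grenouille",
]


def split_compound(compound):
    """Return (head, preposition, modifier), or None if there is no preposition."""
    match = PATTERN.match(compound)
    if match is None:
        return None
    prep = match.group("prep").strip()
    if prep != "à":
        prep = "de"  # treat the elided d' as de
    return match.group("head"), prep, match.group("modifier")


by_preposition = defaultdict(list)
for compound in EXAMPLES:
    parts = split_compound(compound)
    key = parts[1] if parts else "no preposition"
    by_preposition[key].append(compound)

for prep, compounds in by_preposition.items():
    print(f"{prep}: {', '.join(compounds)}")
```

Sorting the examples this way doesn't solve the mystery, but it does give you the raw material for looking for a pattern.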

Some details of compound nouns in English: the pronunciation of these things is different from phrases with adjectives.  In general, in a compound noun, you’ll have the stress on the first noun, e.g.:

  • chef’s knife is pronounced CHEF’S knife, while David’s knife would usually be pronounced with equal stress on both words.
  • coffee spoon is pronounced COFFEE spoon, while yellow spoon would be pronounced with stress on both words.
  • beat box is pronounced BEAT box, while big box would be pronounced with stress on both words.

Some details of compound nouns in French: I have no clue how to pluralize these things, and I’m not sure that all French people do, either.  Here’s what the Wikipedia page on French compound nouns has to say on the topic.  It breaks the compounds down to what they’re made up of: a noun plus a noun, a verb plus a noun, a noun plus a verb, etc.:

  • noun + noun: pluralize both.  Example: oiseau-mouche, oiseaux-mouches (hummingbird).  Exception: I don’t understand the Wikipedia explanation for this, but sometimes you only pluralize the first noun: des chefs-d’œuvre (masterpiece), des arcs-en-ciel (rainbow).
  • verb + noun: plural only at the end.  Example: cure-dent, cure-dents.  Exception: I don’t understand the Wikipedia explanation for this, either, but sometimes you don’t mark the plural at all: des chasse-neige (snowplow) (= chasser la neige, devenu variable dans l’orthographe de 1990), des trompe-l’œil… (direct quote from Wikipedia)
  • adjective + noun: pluralize both.  Example: la basse-cour, des basses-cours (farmyard; chickens and rabbits; outer courtyard).
  • verb + verb: don’t mark the plural at all.  Example: des garde-manger (pantry).
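
The four rules above, minus the exceptions, are simple enough to write down as a little Python sketch. Everything here is a simplification for illustration: pluralize_word only knows the regular endings needed for these examples, the pattern labels are just strings that I made up, and exceptions like chefs-d’œuvre, arcs-en-ciel, and chasse-neige are not handled at all.

```python
def pluralize_word(word):
    """Very rough French plural: only the regular cases needed here."""
    if word.endswith(("s", "x", "z")):
        return word  # already looks plural, leave it alone
    if word.endswith(("au", "eu")):
        return word + "x"  # oiseau -> oiseaux
    return word + "s"


# Which of the two parts takes the plural, for each pattern in the list above.
PLURAL_PARTS = {
    "noun+noun": (True, True),
    "verb+noun": (False, True),
    "adjective+noun": (True, True),
    "verb+verb": (False, False),
}


def pluralize_compound(compound, pattern):
    """Pluralize a two-part hyphenated compound according to its pattern."""
    first, second = compound.split("-")
    mark_first, mark_second = PLURAL_PARTS[pattern]
    if mark_first:
        first = pluralize_word(first)
    if mark_second:
        second = pluralize_word(second)
    return f"{first}-{second}"


print(pluralize_compound("oiseau-mouche", "noun+noun"))
print(pluralize_compound("cure-dent", "verb+noun"))
print(pluralize_compound("basse-cour", "adjective+noun"))
print(pluralize_compound("garde-manger", "verb+verb"))
```

Note that the function has to be told which pattern a compound belongs to, which is of course the hard part for a learner, too.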

If you’d like to know more about the Generative Lexicon theory and how it accounts for these kinds of relationships between nouns, but don’t feel like you want to tackle the primary sources (I have a PhD in linguistics and I’ve never been able to finish working my way through the last chapter), there’s a book called Generative Lexicon theory: A guide, by James Pustejovsky and Elisabetta Jezek, coming out. For a detailed discussion of relationships in this kind of noun in French and Italian, see this paper by Pierrette Bouillon, Elisabetta Jezek, Chiara Melloni, and Aurélie Picton. (I got some of the examples in this post from there.)

So, back to my poor kid: why friendgirl and light kitchen, but mean girl and big kitchen?  He seems to have come up with some conception of there being a difference between the compound nouns and a sequence of an adjective and a noun.  Remember that he was maybe 4 years old, so no one taught him this.  As is characteristic of kids learning their native language(s), he came up with a hypothesis about how to produce the difference between these things, and what he came up with was an ordering difference for the compound nouns.  So: don’t freak out if your kid comes up with some weird things in the language department, and be aware that it’s mostly not worth trying to correct them–it’s not like they’re consciously aware of these “rules,” and nothing that you can say to them is going to change them.  However: they’ll figure it out.  Keep Calm And Keep Talking.

Some French vocabulary on the topic:

  • le mot composé: compound word