For several years, my judo club in the States had a number of highly ranked players at the junior national level. The coaches decided to take them to Mexico to train at one of the national Olympic training centers, and they brought me along to interpret. We spent a week at the training center in Guadalajara, and I interpreted for everything from practice sessions to our head coach explaining his philosophy of judo.
At the end of the week, we all piled into a bus and headed to Mexico City for the annual national tournament–a few of us grown-ups, our kids, and a lot of young Mexican children.
The bus ride was long, and going through the mountains, it was coooold. As kids got cranky and the ride got miserable, I decided to kill two birds with one stone: distract the kids for a while, and take advantage of an opportunity to improve my Spanish. I asked the busful of kids what I sounded like when I spoke Spanish. Could they imitate me?
I thought that I would learn something that I already knew–hilarious imitations of my aspirated voiceless stops, ludicrously elongated syllable nuclei, vocalic offglides, and the like. I figured that the kids would get a laugh out of it. In fact, when you learn to do linguistic fieldwork on under-studied languages, you’re encouraged to go to adolescents for feedback–the idea is that teenagers being what they are, they might be less deferential than adults, and more willing to tell you the truth about how bad you sound. No one was biting, though.
Finally, I managed to convince one of the older guys to speak up. For once, the bus got quiet. “Well…you always say estoy contento (‘I’m happy’) instead of estoy feliz (also ‘I’m happy’).”
The kids roared–apparently, they had noticed. “They’re…you know…synonyms,” he added apologetically. “Sometimes you use synonyms wrong.”
Now other kids jumped in with poor synonym choices that I apparently made quite regularly. Who knew?? I must have made a lot of them, because this activity kept the kids in stitches for quite a while. A bus-wide global meltdown was averted, and we reached Mexico City without any major traumas.
This story came back to me today while reading the comments on a blog post that I wrote the other day about faces. A question came up: I gave the French words figure and visage for the English word “face,” but what about the French word face?
Simple answer: I didn’t know that the French word face meant “face.” To my knowledge, I’ve only ever heard it in the expressions face à and faire face à. My old nemesis: synonyms.
What is a synonym, though? Here’s the definition from Merriam-Webster:
Linguists don’t typically like that definition of “synonym,” though. Meaning is really, really hard to pin down (we’ve had a couple of posts on the difficulty of describing word meanings, looking at a number of options for doing so, none of which works out perfectly–see here for representing meanings with necessary and sufficient conditions, and here for representing meanings with prototypes). We tend to use a definition more like this: two (or more) words are “synonyms” if they can freely replace each other in all contexts. The idea would be that if you can say pail every place that you can say bucket, then they’re synonyms. If you can’t, then they’re not.
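Here is a minimal sketch of that replacement test in Python. It assumes that "every place you can say a word" can be approximated by the sentence frames in which the word is actually observed, which is a simplification, and the three sentences are invented for illustration. The function names are made up for this sketch.

```python
def contexts(word, sentences):
    """Return the set of sentence frames in which `word` occurs, with the word blanked out."""
    frames = set()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == word:
                frames.add(" ".join(tokens[:i] + ["___"] + tokens[i + 1:]))
    return frames


def freely_replaceable(a, b, sentences):
    """True only if the two words were observed in exactly the same frames."""
    return contexts(a, sentences) == contexts(b, sentences)


# Invented toy data, not taken from any corpus.
SENTENCES = [
    "she filled the pail",
    "she filled the bucket",
    "he kicked the bucket",
]

print(freely_replaceable("pail", "bucket", SENTENCES))
```

On this toy data the test fails as soon as one word shows up in a frame that the other does not, which is exactly why so few pairs survive it.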
The thing is this: on this “distributional” definition of the term “synonym,” there are almost no synonyms. In American English, I can think of two pairs of synonyms:
- stone/pit (in the sense of the seed of a succulent fruit–a peach, or a plum, or an apricot)
Bullshit, you’re thinking–English is full of synonyms. Good, virtuous, righteous, moral. Bad, wicked, sinful, immoral. If you look at data, though, you’ll soon see that there are almost no words in English that have this characteristic of being freely replaceable. Rather, words that we think of as synonymous usually have subtle differences in how they’re used in the language. In technical terms, they have different “distributions.”
Let’s take two words that I imagine every native speaker of American English would think of as synonymous: big, and large.
All of the data on big and large in this post comes from Douglas Biber, Susan Conrad, and Randi Reppen’s 1998 book Investigating language structure and use, published by Cambridge University Press. The graphics are from my lecture notes and are based on Biber, Conrad, and Reppen’s data.
There’s a nice collection of naturally-occurring English texts called the Longman-Lancaster Corpus. It contains 5.7 million words from fiction and from academic prose. If you count the number of occurrences of big, the number of occurrences of large, and then convert those counts to frequencies per million words, you get this:
What are we seeing here? If we look at the combined texts, we see that large occurs more frequently than big, and that’s about it–not much of interest.
If we break out the two categories of texts, though–academic prose, and fiction–something jumps out at us. The two words have very different distributions in academic prose and in fiction. In academic prose, large is far more common than big. On the other hand, in fiction, big is far more common than large. What the hell?
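The counting behind a comparison like this is simple enough to sketch. The snippet below is only an illustration: the two tiny "registers" are invented sentences, not material from the Longman-Lancaster Corpus, and the function names are made up here.

```python
from collections import Counter


def per_million(count, corpus_size):
    """Convert a raw count into a frequency per million words."""
    return count * 1_000_000 / corpus_size


def rates_by_register(texts, targets):
    """texts maps a register name to its list of tokens; returns per-million rates per register."""
    rates = {}
    for register, tokens in texts.items():
        counts = Counter(tokens)
        rates[register] = {w: per_million(counts[w], len(tokens)) for w in targets}
    return rates


# Invented toy data standing in for the two registers.
TOY_TEXTS = {
    "academic": "a large number of samples showed a large effect and one big outlier".split(),
    "fiction": "the big dog chased the big cat past a large house".split(),
}

for register, rates in rates_by_register(TOY_TEXTS, ["big", "large"]).items():
    print(register, rates)
```

Normalizing to a per-million rate is what makes the two registers comparable even when they contain different numbers of words.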
Let’s look at the contexts that the words show up in. We’ll separate out academic prose and fiction, and within those categories, we’ll separate out big and large. For each one, we’ll show the most common words that appear to the right of the word in question.
We’ll only show words that show up to the right of these words at least once per million words. In the academic prose, that only leaves two–remember how big the bar for large was compared to the tiny bar for big in the academic prose part of the graph above. In fiction, we see both, although you’ll notice that the numbers for the words to the right of large in the fictional texts are much smaller than the numbers for the words to the right of large in the academic prose–large just doesn’t show up as often in the fictional texts.
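A rough sketch of how right-hand neighbors can be counted and filtered by a frequency cutoff follows. The token list is invented, and the cutoff is left as a parameter (`min_per_million` is a name made up for this sketch, not something from Biber, Conrad, and Reppen).

```python
from collections import Counter


def right_collocates(tokens, target, min_per_million=1.0):
    """Return {word: rate per million} for words immediately to the right of `target`."""
    if not tokens:
        return {}
    # Count the token that follows each occurrence of the target word.
    following = Counter(
        nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == target
    )
    scale = 1_000_000 / len(tokens)
    return {
        word: count * scale
        for word, count in following.items()
        if count * scale >= min_per_million
    }


# Invented toy data, far too small for the cutoff to matter.
TOKENS = "the big house and the big dog and a large number".split()

print(sorted(right_collocates(TOKENS, "big")))
print(sorted(right_collocates(TOKENS, "large")))
```

In a corpus of millions of words the cutoff does real work: it drops the long tail of neighbors that follow the target only once or twice.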
Think about the two sets of words–the ones that show up after big, and the ones that show up after large–and you might notice something:
- big tends to appear before physical objects.
- large tends to appear before amounts and quantities.
How does that relate to the differences in the distributions of the two words across academic prose, and fiction?
- Fiction contains lots of physical descriptions, which can refer to size (and therefore uses big)
- Academic prose is more likely to use measurements to describe size (and therefore is less likely to use big)
- Academic prose deals more with amounts and quantities (and therefore uses large)
I’ll try not to drone on and on with details, but the effect is quite robust. It shows up at longer distances, such as when the words are separated by an adjective: big black eyes, big black saucepan, big black mongrel dog. It shows up when the words follow the words that they modify: The cart was not really big enough…. The revolver, which looked big enough to…. The ratio is large enough, however…. …a finite number of steps (which may be large enough to…
The moral of the story: could you substitute big and large for each other? You could–it’s not like it’s not interpretable if you say large revolver or big quantity. You probably do produce things like that–I’m sure I do, too. This stuff is probabilistic–it’s about frequencies, about what you do more often or less often, not about always or never. But if you sound like a native speaker, you mostly don’t swap these two words in and out at random. The distributions are different: you use the two words differently, in ways that are so subtle that you’re almost certainly not aware of it. (I sure as hell wasn’t before I read the book. I’ll point out that I’ve given linguistics graduate students the homework assignment of finding differences in the use of big and large for maybe ten years, and in all of that time, exactly one student has come up with this.)
So: back to the three French words visage, figure, and face, all of which correspond to the English word “face.” How the hell could I not know that face meant “face”? Why have I only ever heard it in face à and faire face à? And why can’t I figure out the difference between visage and figure? Let’s look at some data.
I went to the Sketch Engine web site. This gives me access to a bunch of big collections of texts in an astounding variety of languages, and a tool for searching those collections. The tool will also do analyses of statistical data–what other words a word tends to occur with in those text collections, what verbs it tends to be the subject and the object of (if it’s a noun), what nouns it tends to have as its subjects and objects (if it’s a verb), and so on.
I picked a corpus (collection of linguistic data) called frTenTen, just because it’s big–9.9 billion words. For each word–visage, figure, and face–I got an analysis of the words that it tends to occur with, and the structures that it tends to occur in–what verbs it tends to be the subject and object of, which prepositions it tends to modify and to be modified by, and so on. You can see screen shots of the three analyses below.
The first thing that we see is that the frequencies of the three words are different, and face is actually the most common. In 9.9 billion words of French text, this is how often they show up:
- visage: 115 times per million words
- figure: 48 times per million words
- face: 258 times per million words
Seriously? How did I miss face, when it shows up more than twice as often as visage, which shows up more than twice as often as figure? If we look closely at how these words tend to combine with other words and structures, it starts to make sense. In what follows, I’m going to focus on two things: (1) the kinds of words that modify the word that we’re talking about, and (2) the kind of words that it gets coordinated with–in other words, what kinds of words show up on the other side of the word “and” or the word “or” with the word in question.
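The "more than twice as often" comparison can be checked with a little arithmetic. This sketch takes the per-million rates as 115, 48, and 258 and the corpus size as 9.9 billion words; both are rounded figures, so the estimated raw counts are approximations, not numbers reported by Sketch Engine.

```python
RATES_PER_MILLION = {"visage": 115, "figure": 48, "face": 258}
CORPUS_WORDS = 9_900_000_000  # rounded corpus size


def estimated_count(rate_per_million, corpus_words):
    """Turn a per-million rate back into an approximate raw count."""
    return rate_per_million * corpus_words // 1_000_000


for word, rate in RATES_PER_MILLION.items():
    print(word, estimated_count(rate, CORPUS_WORDS))

# Ratios behind "more than twice as often".
print(RATES_PER_MILLION["face"] / RATES_PER_MILLION["visage"])
print(RATES_PER_MILLION["visage"] / RATES_PER_MILLION["figure"])
```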
We’ll start with le visage. To begin with, let’s look at the words that modify it. Visage is a noun, so these are probably going to be adjectives. Why do I care about the words that modify it? Because different kinds of things tend to get modified with different kinds of words. Kittens are cuddly, warm, and cute. Sharks are hungry, vicious, and deadly. Knowing something about the kinds of words that modify something tells you something about how the people who speak a language think about that thing.
So, the words that modify visage: look at the box to the left in the figure below, labeled modifier. Here are the words that we see most frequently modifying visage in that 9.9-billion-word sample:
Definitions from WordReference.com:
- impassible: impassive, calm, emotionless, and many related words
- angélique: angelic
- familier: familiar
- souriant: smiling, cheerful, happy
- ovale: oval
- fin: in this context, small or thin, according to what I found on Linguee.fr
The generalization that I would suggest here is that these are all words that you would not be surprised to see being used to describe a human face.
Now let’s look at the words that most frequently show up with visage on the other side of the words “and” or “or.” I care about this because words are often combined by and or or with similar categories of words. For example, nouns tend to get joined with other nouns, verbs with other verbs, etc. This time we’ll look at the fourth box from the left, labelled et_ou. Let’s see if that suggests anything to us about how to understand visage:
- cou: neck
- corps: body
- cheveu: hair (this probably shows up as cheveu rather than cheveux because Sketch Engine often does something called “lemmatization”: converting all forms of a word into what you might think of as their “basic” form–in the case of nouns, the singular form)
- silhouette: profile, shape, contour
- oeil: eye
- lèvre: lip
- sourire: smile
The generalization that I would suggest here is that these are mostly body parts. Not surprising, if visage is a body part.
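To make the idea of an et/ou list concrete, here is a crude sketch of lemmatizing tokens and then collecting whatever sits on the other side of et or ou from a target word. It is not how Sketch Engine works internally (I am assuming nothing about that); the lemma table, the determiner list, and the example tokens are all hand-written for illustration, and elided forms like l’ are not handled.

```python
from collections import Counter

# Hypothetical, hand-written lemma table; a real lemmatizer covers the whole vocabulary.
TOY_LEMMAS = {"cheveux": "cheveu", "yeux": "oeil", "lèvres": "lèvre"}
COORDINATORS = {"et", "ou"}
DETERMINERS = {"le", "la", "les", "un", "une", "des", "son", "sa", "ses"}


def lemma(token):
    """Map a token to its toy 'basic' form, defaulting to the token itself."""
    return TOY_LEMMAS.get(token, token)


def coordinated_with(tokens, target):
    """Count lemmas joined to `target` by et/ou, ignoring determiners."""
    content = [lemma(t) for t in tokens if t not in DETERMINERS]
    partners = Counter()
    for left, mid, right in zip(content, content[1:], content[2:]):
        if mid in COORDINATORS:
            if left == target:
                partners[right] += 1
            elif right == target:
                partners[left] += 1
    return partners


# Invented example tokens.
TOKENS = "le visage et les cheveux , les yeux ou le visage , le visage et le cou".split()

print(coordinated_with(TOKENS, "visage"))
```

Because the tokens are lemmatized first, cheveux is counted under cheveu and yeux under oeil, which is the same effect described for cheveu in the list above.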
Now let’s look at the word that I’m struggling with–la face. Here are the statistics:
Once again, let’s look at the most frequent modifiers. Here’s what we get:
- nord: north
- arrière: rear
- visible: visible
- postérieur: back, posterior
- ventral: ventral (this word refers to the side that your stomach is on. To see why this is a useful word from an anatomical point of view, think about a person, and a fish. On a person, the belly is to the front, while on a fish, the belly is on the bottom. Using the word ventral lets you refer to the side that the stomach is on, regardless of the orientation of that side: forward, or down.)
- sud: south
- latéral: lateral (side)
Here are some uses of ventral (and its opposite, dorsal)–scroll down past them to continue reading:
A totally different set of modifiers from visage! These sound a lot more like words that would describe one of the several faces of a mountain, or of a building. When we look for the words that face occurs with in coordinations with et or ou, we find:
- pile: In pile ou face, it’s “heads or tails.”
- profil: profile.
- dos: back
- cou: neck
- arête: bridge (of nose)
- soir: evening
- samedi: Saturday
- finale: final
Some of those are consistent with the interpretation of face as a body part–profile, back, neck, bridge of the nose. The others aren’t.
When we look at the “word sketch” for figure, there’s very little that suggests that the word is used as a body part–at any rate, not as often as it’s used for other meanings:
So, what insight does this give us? For one thing: it’s not surprising that I haven’t come across face with the meaning “face (of a person).” Rather, it seems to be used more often for the “faces” of objects–buildings, mountains, computers, etc. For another thing: it’s surprising that I’ve come across figure with the meaning “face” at all, since it doesn’t seem to be used for that as often as it’s used with other meanings. Finally, the major point: it’s hard to see any of these as synonyms for the others, as the patterns of usage are quite different. On the definition of “synonym” as “word that is freely replaceable for another word,” these aren’t.
Having said all of this: I don’t mean to imply that synonymy is not a useful concept. In fact, there’s an enormously useful resource called WordNet that is organized completely around the notion of synonymy. WordNet encodes relationships between words. But, what’s the definition of word? For WordNet, it’s what they call a “synset:” not a single word, but the full set of synonyms for that word. Synsets are the basic unit of WordNet–this whole (very useful, as I said) resource is organized as relationships between them.
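A toy model of that organization might look like the following. It is a sketch of the idea only, not WordNet's actual data format or API: the class, the glosses, and the two entries are written by hand for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """A set of words treated as one unit of meaning (toy version)."""
    lemmas: frozenset
    gloss: str


# Hand-written illustrative entries; the glosses are paraphrases made up for this sketch.
BUCKET = Synset(frozenset({"bucket", "pail"}), "a roughly cylindrical open container")
VESSEL = Synset(frozenset({"vessel"}), "an object used as a container")
ALL_SYNSETS = [BUCKET, VESSEL]

# The relation holds between synsets, not between individual words.
HYPERNYM = {BUCKET: VESSEL}


def synsets_for(word, synsets):
    """Return every synset that contains `word`."""
    return [s for s in synsets if word in s.lemmas]


def share_synset(a, b, synsets):
    """True if some synset contains both words."""
    return any(a in s.lemmas and b in s.lemmas for s in synsets)


print(share_synset("pail", "bucket", ALL_SYNSETS))
print(HYPERNYM[synsets_for("pail", ALL_SYNSETS)[0]].lemmas)
```

The point of the design is that looking up pail or bucket lands on the same node, so anything recorded about that node applies to both words at once.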
The kids did great at the tournament. As Jigoro Kano, the founder of judo, would have put it: the ones who won got positive feedback on their training, and the ones who lost got valuable insight into the things that they needed to work on. I got off the plane in the US a couple days later with my boots coated with dust from an Aztec temple, and thought a lot about how small the world is these days.