When my kid was about four years old, he went through a period where he switched the order of certain kinds of words. It wasn’t random: this happened only with a particular kind of word formed by putting two nouns together. For example, he would say:
- light kitchen instead of “kitchen light”
- friendgirl instead of “girlfriend”
On the other hand, if there were a noun preceded by an adjective, he got the order right:
- big kitchen
- mean girl
The phenomenon has some implications for theories of how children learn language. In particular, it’s difficult to give a simple behaviorist explanation for it, one where the kid gets exposed to stimuli, repeats them, and gets reinforced for producing them correctly: to my knowledge, the kid was never exposed to things like friendgirl. There are also some interesting smaller-scale details about how we form and pronounce compounds, though. Read on if you want to know more.
One of the most difficult problems in getting a computer to understand language is understanding compound nouns. These are nouns made up of two or more words in a sequence. The toughest ones tend to be compounds in which both of the words are nouns. For example, in English:
- school bus
- kitchen cupboard
- fire engine
I’ve given you examples where the two nouns are written with a space between them, but they might also be spelt with a hyphen, or without a space. For example:
- gunboat (no space)
- timesheet (no space)
- rainbow (no space)
- gun-carriage (hyphen)
- train-spotting (hyphen, and yes, you are allowed to argue about whether or not spotting is a noun)
From a theoretical perspective, there isn’t a distinction between these: they’re all compound nouns. From the point of view of writing a computer program that deals with language, we would tend to treat the ones written with a hyphen or with no space as single words that don’t necessarily get analyzed further, but the ones written with a space usually need special treatment. (In fact, amongst people who do natural language processing, there’s a whole field of research concerning what are called multi-word expressions.)
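To make that practical difference concrete, here is a minimal Python sketch, assuming a toy tokenizer and a small hand-made list of open compounds. The names `MWE_LEXICON` and `tokenize` are hypothetical, not any real library’s API. Hyphenated and closed compounds come out as single tokens for free; the space-separated ones only get glued together because somebody listed them.

```python
import re

# Hypothetical, hand-made lexicon of open (space-separated) compounds.
# A real system would need a far larger, curated or learned list.
MWE_LEXICON = {("school", "bus"), ("kitchen", "cupboard"), ("fire", "engine")}


def tokenize(text: str) -> list[str]:
    """Lowercase and split text into word tokens, then merge known
    two-word compounds into single tokens."""
    # \w+(?:-\w+)* keeps hyphenated compounds like "gun-carriage" whole;
    # closed compounds like "gunboat" are already a single word.
    words = re.findall(r"\w+(?:-\w+)*", text.lower())
    tokens = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in MWE_LEXICON:
            # Known open compound: treat the two words as one unit.
            tokens.append(" ".join(pair))
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens
```

For example, `tokenize("The fire engine passed a gunboat")` gives `['the', 'fire', ...]` only if fire engine is missing from the list; with the list above it gives `['the', 'fire engine', 'passed', 'a', 'gunboat']`.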
From both a theoretical and a practical perspective, the big question about compound nouns is: how can you describe, understand, and get a computer to deal with the different kinds of relationships that can exist between the nouns? It’s not a random thing: languages tend to exploit particular kinds of relationships in compounds. Even describing these things from the perspective of theoretical linguistics is tough, though, separately from the practical problem of getting a computer program to process them. A classic English example (due, I believe, to the recently departed linguist Chuck Fillmore) is the set of names for different kinds of knives:
- bread knife: a knife for cutting bread
- butter knife: a knife for spreading butter
- pocket knife: a knife that is carried in a pocket
- butcher knife: a knife that is used by a butcher
- palette knife: a knife that is shaped like a palette
- utility knife: a knife that is used in food preparation
- paring knife: a knife that is used for paring
- steak knife: a knife that is used for cutting steak
- boning knife: a knife that is used to trim meat from a bone
- boot knife: a knife that’s meant to be carried on or in a boot
Just with this partial list, we can see some patterns of semantic relationships between the nouns in the compound:
| Relationship | Examples |
|---|---|
| intended material | bread knife, butter knife, steak knife |
| used by | butcher knife |
| used for | paring knife, boning knife |
| carried in | pocket knife, boot knife |
| shaped as | palette knife |
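If you wanted a program to use relationships like these, the simplest thing would be to write the table down as data. Here is a hypothetical Python sketch: the relation labels come from the table, but the paraphrase templates and the names `KNIFE_RELATIONS` and `paraphrase` are my own illustrative assumptions. Notice that every compound needs a hand-assigned label, and anything nobody labeled simply falls through.

```python
from typing import Optional

# Hand-assigned relation labels, copied from the table of knife compounds.
KNIFE_RELATIONS = {
    "bread knife": "intended material",
    "butter knife": "intended material",
    "steak knife": "intended material",
    "butcher knife": "used by",
    "paring knife": "used for",
    "boning knife": "used for",
    "pocket knife": "carried in",
    "boot knife": "carried in",
    "palette knife": "shaped as",
}

# Illustrative paraphrase templates (an assumption, not a standard inventory).
TEMPLATES = {
    "intended material": "a {head} for use on {modifier}",
    "used by": "a {head} used by a {modifier}",
    "used for": "a {head} used for {modifier}",
    "carried in": "a {head} carried in a {modifier}",
    "shaped as": "a {head} shaped like a {modifier}",
}


def paraphrase(compound: str) -> Optional[str]:
    """Spell out the relation between modifier and head, if one is labeled."""
    modifier, head = compound.split(" ")
    relation = KNIFE_RELATIONS.get(compound)
    if relation is None:
        return None  # no label assigned, e.g. "utility knife"
    return TEMPLATES[relation].format(head=head, modifier=modifier)
```

So `paraphrase("pocket knife")` returns `"a knife carried in a pocket"`, while `paraphrase("utility knife")` returns `None`.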
How should we classify utility knife? Or dog bone? I don’t know. As I said, this is difficult–it’s not like this is something that they teach you in linguistics grad school. And, do you get to just make these kinds of relationships up on an ad hoc basis? If so, you’ve got descriptions that couldn’t possibly be shown to be wrong, and from a scientific point of view, that’s bad–your theories need to be testable, and falsifiable. (Generally we assume that we can’t prove anything, but we do try to construct theories in such a way that if they’re wrong, in principle we should be able to demonstrate that.) Some people have proposed limited sets of relationships that they hope can capture all such compound nouns–for example, the Generative Lexicon theory of James Pustejovsky. It’s not clear that all of the issues that are involved in this are resolved, though.
Rather than this kind of noun-noun compound, French generally has nouns modified by prepositional phrases. That is, you have the noun, then a preposition, and then another noun. For example, compare these English and French nouns:
| English | French |
|---|---|
| railroad (rail + road) | chemin de fer |
| windmill | moulin à vent |
| wine glass | verre à vin |
| goods transport | transport de marchandises |
For more examples, see the picture in this post, which shows the vocabulary for a variety of kinds of knives in French.
It’s not the case that all French nouns of this sort follow the prepositional phrase pattern: for example, we have homme grenouille, “frogman.” But the pattern with the prepositional phrase is much more common. Having said that, one of the biggest mysteries of French for me is how you know when the preposition will be de versus à. Is there some principle that would let me know that it’s a boîte à gants (glovebox) and a cuillère à café (coffee spoon), but an animal de compagnie (pet) and a crème de cacao? A boîte à bijoux (jewelry box), but a boîte d’allumettes (matchbox)? A boîte à chaussures (shoebox), but a boîte de nuit (nightclub)? I have no clue.
Some details of compound nouns in English: they are pronounced differently from phrases made of an adjective and a noun. In general, in a compound noun, you’ll have the stress on the first noun, e.g.:
- chef’s knife is pronounced CHEF’S knife, while David’s knife would usually be pronounced with equal stress on both words.
- coffee spoon is pronounced COFFEE spoon, while yellow spoon would be pronounced with stress on both words.
- beat box is pronounced BEAT box, while big box would be pronounced with stress on both words.
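Here’s a rough Python sketch of that rule of thumb, with capital letters marking stress. The compound list `COMPOUNDS` and the function `mark_stress` are hypothetical, and the rule itself is only the general tendency described above. The point of the sketch is that chef’s knife and David’s knife look alike on the page, so the program can only tell them apart by looking them up.

```python
# Hypothetical mini-lexicon of established compounds. Nothing in the
# spelling distinguishes "chef's knife" (compound) from "David's knife".
COMPOUNDS = {"chef's knife", "coffee spoon", "beat box"}


def mark_stress(phrase: str) -> str:
    """Capitalize the stressed word(s): compounds stress the first word,
    other two-word phrases get (roughly) equal stress on both words."""
    first, second = phrase.split(" ", 1)
    if phrase in COMPOUNDS:
        return f"{first.upper()} {second}"
    return f"{first.upper()} {second.upper()}"
```

With this, `mark_stress("coffee spoon")` gives `"COFFEE spoon"` and `mark_stress("yellow spoon")` gives `"YELLOW SPOON"`.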
Some details of compound nouns in French: I have no clue how to pluralize these things, and I’m not sure that all French people do, either. Here’s what the Wikipedia page on French compound nouns has to say on the topic. It breaks the compounds down by what they’re made of: a noun plus a noun, a verb plus a noun, a noun plus a verb, etc.:
- noun + noun: pluralize both. Example: oiseau-mouche, oiseaux-mouches (hummingbird). Exception: I don’t understand the Wikipedia explanation for this, but sometimes you only pluralize the first noun: des chefs-d’œuvre (masterpiece), des arcs-en-ciel (rainbow).
- verb + noun: plural only at the end. Example: cure-dent, cure-dents (toothpick). Exception: I don’t understand the Wikipedia explanation for this, either, but sometimes you don’t mark the plural at all: des chasse-neige (snowplow) (= chasser la neige, devenu variable dans l’orthographe de 1990), des trompe-l’œil… (direct quote from Wikipedia; in English: “= to clear away the snow, made variable, i.e. able to take a plural, in the 1990 spelling”)
- adjective + noun: pluralize both. Example: la basse-cour, des basses-cours (farmyard; chickens and rabbits; outer courtyard).
- verb + verb: don’t mark the plural at all. Example: des garde-manger (pantry).
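For what it’s worth, the default rules above are mechanical enough to sketch in Python. This is a hypothetical, deliberately naive version: `pluralize_word` only knows that most words add -s, that words ending in -eau, -au, or -eu add -x, and that words already ending in s, x, or z don’t change (real French has exceptions to each of these, e.g. pneu, pneus). `pluralize_compound` also ignores every exception listed above; it would turn chef-d’œuvre into the wrong chefs-d’œuvres, for instance.

```python
def pluralize_word(word: str) -> str:
    """Naive plural for a single French word (many exceptions ignored)."""
    if word.endswith(("s", "x", "z")):
        return word  # already looks plural: no change
    if word.endswith(("eau", "au", "eu")):
        return word + "x"  # oiseau -> oiseaux
    return word + "s"  # default: add -s


def pluralize_compound(compound: str, pattern: str) -> str:
    """Apply the default rules for a two-part hyphenated compound.

    pattern is one of "noun+noun", "adjective+noun", "verb+noun", "verb+verb".
    """
    first, second = compound.split("-", 1)
    if pattern in ("noun+noun", "adjective+noun"):
        # Both parts are variable: pluralize both.
        return f"{pluralize_word(first)}-{pluralize_word(second)}"
    if pattern == "verb+noun":
        # The verb stays as-is; only the noun at the end is pluralized.
        return f"{first}-{pluralize_word(second)}"
    if pattern == "verb+verb":
        return compound  # no plural marking at all
    raise ValueError(f"unknown pattern: {pattern}")
```

For example, `pluralize_compound("oiseau-mouche", "noun+noun")` gives `"oiseaux-mouches"`, and `pluralize_compound("garde-manger", "verb+verb")` leaves it as `"garde-manger"`.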
If you’d like to know more about the Generative Lexicon theory and how it accounts for these kinds of relationships between nouns, but don’t feel like you want to tackle the primary sources (I have a PhD in linguistics and I’ve never been able to finish working my way through the last chapter), there’s a book called Generative Lexicon theory: A guide, by James Pustejovsky and Elisabetta Jezek, coming out. For a detailed discussion of relationships in this kind of noun in French and Italian, see this paper by Pierrette Bouillon, Elisabetta Jezek, Chiara Melloni, and Aurélie Picton. (I got some of the examples in this post from there.)
So, back to my poor kid: why friendgirl and light kitchen, but mean girl and big kitchen? He seems to have come up with some conception of there being a difference between compound nouns and a sequence of an adjective and a noun. Remember that he was maybe 4 years old, so no one taught him this. As is characteristic of kids learning their native language(s), he came up with a hypothesis about how to mark the difference between these things, and what he came up with was a different word order for the compound nouns. So: don’t freak out if your kid comes up with some weird things in the language department, and be aware that it’s mostly not worth trying to correct them. It’s not like they’re consciously aware of these “rules,” and nothing that you can say to them is going to change them. However: they’ll figure it out. Keep Calm And Keep Talking.
Some French vocabulary on the topic:
- le mot composé: compound word