Zipf’s Law is why if someone is looking for a web page and types “dogs in marseilles” into the query box, your search engine should pay no attention to the word “in,” some attention to “dogs,” and quite a bit of attention to “marseilles.”
Zipf’s Law describes the frequencies of words: there is a very, very small number of words that occur very, very often, and a very, very large number of words that occur very, very rarely–but, they do occur. This blog is focused on one of the consequences of Zipf’s Law: it means that if you are seriously studying a second language, you are going to run into words that you don’t know every day for the rest of your life.
You know how the matching game works: we have words in English, words in French, and we match them. Today’s words (and a tiny bit of grammar) are taken from the discussion of Zipf’s Law in the book Recherche d’information: Applications, modèles et algorithmes, by Massih-Reza Amini and Éric Gaussier, second edition. Recherche d’information is information retrieval, the task of finding documents in response to an information need: what Google does for you every day.
One of the great embarrassments of linguistics is the fact that information retrieval is mostly about language, in the sense that mostly what you’re looking for is web pages with stuff written on them and you use words to find them–and yet, most of the work of information retrieval is done without actually doing anything that looks very much like doing anything with language. At its heart, the technology of information retrieval is almost entirely done with counting and very simple arithmetic–nothing linguistic there. You could think of that very simple arithmetic as taking advantage of Zipf’s Law–the very simple arithmetic is used to figure out things like the fact that if someone is looking for a web page and types dogs in marseilles into the query box, your search engine should pay no attention to the word in, some attention to dogs, and quite a bit of attention to marseilles when it is making the decision about which web pages to put at the top of the search results.
Scroll down to find today’s vocabulary items, and click on the pictures of the relevant pages from Amini and Gaussier’s book if you’d like to see those words in context. As for me: a second cup of coffee, go over these flashcards, and then off to the lab. Today’s goal: explain why researchers calculated the ratio of vocabulary size to length of conversation of a bunch of soldiers–after chasing them through the woods, catching them, depriving them of food and sleep, and then interrogating them.
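If you’d like to see what that “counting and very simple arithmetic” might look like, here’s a minimal sketch. It uses inverse document frequency (IDF), which is one common way of weighting query words; I’m assuming it here for illustration, not claiming that it’s the exact formula in Amini and Gaussier’s book or in any particular search engine. The four-document collection and the function names are made up.

```python
import math

# Toy collection, made up for illustration; each string stands in for a web page.
DOCUMENTS = [
    "dogs in marseilles love the old port",
    "cats in paris sleep in the sun",
    "dogs in the park",
    "a bakery in lyon",
]


def document_frequency(term, documents):
    """Count how many documents contain the term at least once."""
    return sum(1 for doc in documents if term in doc.split())


def idf(term, documents):
    """Inverse document frequency: log(N / df).

    A term found in no document gets 0.0; that is a choice made for this sketch.
    """
    df = document_frequency(term, documents)
    if df == 0:
        return 0.0
    return math.log(len(documents) / df)


def query_weights(query, documents):
    """Map each word of the query to its IDF weight."""
    return {term: idf(term, documents) for term in query.split()}


weights = query_weights("dogs in marseilles", DOCUMENTS)
for term, weight in weights.items():
    print(f"{term}: {weight:.2f}")
```

In this toy collection, in shows up in every document, so its weight is log(4/4) = 0: no attention at all. Dogs is in half of the documents and gets some weight; marseilles is in only one and gets the most.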
I included La fréquence du second mot because I’ve been trying to understand when to use second and when to use deuxième. If I understand the Académie’s Dire/Ne pas dire page correctly, the Academy would prefer that this be deuxième, but not even the Académie thinks that it’s mandatory to make the distinction:
On peut, par souci de précision et d’élégance, réserver l’emploi de second aux énoncés où l’on ne considère que deux éléments, et n’employer deuxième que lorsque l’énumération va au-delà de deux. Cette distinction n’est pas obligatoire.
On veillera toutefois à employer l’adjectif second, plus ancien que deuxième, dans un certain nombre de locutions et d’expressions où il doit être préféré : seconde main, seconde nature, etc., et dans des emplois substantivés : le second du navire.
I was just getting ready for my day of calculating the ratio of unique words to total words in a bunch of journal articles about spinal cord injury and regeneration when it struck me that there really aren’t enough nice pictures of elephants in our lives. Not mine, anyway. Please enjoy the following picture of Chikwenya (left) and Mike (right), two African elephants from Mana Pools National Park in Zimbabwe. The wavy lines in the middle of the bottom part of the photograph are a spectrogram of an elephant “rumble.” See the things labelled F1 and F2 in the panels to the left and right? Those are the first formant (F1) and second formant (F2) of Chikwenya and Mike’s rumbles. In a human language, it’s the height and spacing of the first and second formants that identify the various and sundry vowels. Want to know more about African elephant rumbles? See Anton Baotic and Angela Stoeger’s recent paper on the topic:
What’s a Jewish mother’s favorite metro station? Read to the end of the post and you’ll get the answer, plus a video of a ferret.
I never stop being amazed at how basic some of the mistakes that I still make are, even after three and a half years of intensive study of la langue de Molière. Case in point: the spelling of the tu form of the imperative. The thing that you have to remember is that it doesn’t have an s at the end–except when it does.
The wonderful Lawless French web site gives this explanation of the general rule (keep going for some exceptions):
The imperative tu conjugation for –er, –frir, and –vrir verbs is the present tense minus the final s.
Here are some examples from the Nouvel Obs’s (the form of this genitive explained below in the English notes) description of the informal imperative:
Rentre immédiatement !
Ne discute pas !
Va voir tes grands-parents !
OK, an exception: when the verb is followed immediately by y or en, you have an s at the end. Here’s the explanation from the Français Facile web site:
Cependant, devant « en » et « y » qui suivent immédiatement le verbe, on ajoute un « s » au verbe en « er » à l’impératif singulier, et on le joint par un trait d’union comme tous les pronoms qui suivent un impératif.
Ex. Amènes-y ta soeur.
Cette règle s’applique aussi au verbe « aller »
Fiez-vous à votre oreille. Si vous prononcez le verbe et que le son vous paraît étrange, il peut y avoir un problème.
Mange-en, sans « s » sonnerait d’une façon étrange à l’oreille.
À Londres, vas-y si tu veux, mais amènes-y ta soeur et rapporte-moi un cadeau.
OK: that’s the “first group” verbs (-er)–we’ll return to the –frir and -vrir verbs that Laura mentions in a bit. For -ir and -re verbs, the s is always present.
Now: some exceptions. First, as we’ve seen before, verbs that end in -frir or –vrir sometimes have odd behaviors. (See this post if you want some insights into what they have in common, and how they differ phonologically from other –ir verbs.) These verbs do not have an s in the informal imperative…
Couvre ta bouche quand tu tousses, dégueu !
…except when they do, which is the same as when the first-group (-er) verbs do, i.e. when followed by en or y.
Couvres-en un peu avant d’attraper une pneumonie. (Reverso)
(Native speakers: do you have dissenting opinions about this? I had to ask around a bit…)
Almost at the end! Just four verbs that are totally irregular in this respect:
Aller: Va te faire voir, but vas-y !
Être: always s-final: Sois beau et tais-toi.
Avoir: N’en aie pas marre, c’est bon pour les pépitos ! …but Aies-en de meilleures (notes), tes profs te féliciteront
Savoir: Sache qu’elle a vomi ce matin, alors que le thon était frais, but saches-en plus pour réussir ton examen.
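For the programmers in the audience, here’s how I’d sketch the spelling rules above in code. This is just my summary of the rules as described in this post, not an authoritative conjugator: the function name is made up, and it assumes that you already know the verb’s present-tense tu form (it doesn’t conjugate anything for you).

```python
# The four irregular verbs listed above, with their bare tu imperatives.
IRREGULAR = {"aller": "va", "être": "sois", "avoir": "aie", "savoir": "sache"}


def informal_imperative(infinitive, present_tu, next_word=None):
    """Spell the tu imperative of a verb.

    infinitive -- e.g. "manger"
    present_tu -- the present-tense tu form, e.g. "manges"
    next_word  -- the word immediately after the verb, if any (e.g. "y", "en")
    """
    if infinitive in IRREGULAR:
        form = IRREGULAR[infinitive]
    elif infinitive.endswith(("er", "frir", "vrir")) and present_tu.endswith("s"):
        # -er, -frir, and -vrir verbs: present tense minus the final s.
        form = present_tu[:-1]
    else:
        # Other -ir and -re verbs keep their s.
        form = present_tu

    if next_word in ("y", "en"):
        # Before y or en, the s comes back (if it ever left), plus a hyphen.
        if not form.endswith("s"):
            form += "s"
        return f"{form}-{next_word}"
    return form


print(informal_imperative("manger", "manges"))          # mange
print(informal_imperative("manger", "manges", "en"))    # manges-en
print(informal_imperative("aller", "vas", "y"))         # vas-y
print(informal_imperative("couvrir", "couvres", "en"))  # couvres-en
```

Note that the irregular verbs get checked first: aller ends in -er, but its imperative is va, not the present tense minus the s.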
So, the Jewish mother: here’s the first joke I ever understood in French. I was minding my own business in the basement of a bar near the St-Sebastien Froissart metro station (none of your business why I was in the basement of a bar near the St-Sebastien Froissart metro station, or why I’m ever in the basement of any bar anywhere, for that matter) when I heard the following from the table behind me: La station de métro d’une mère juive, c’est laquelle ? Monge, parce qu’elle dit “mange, mange, mon fils.” In English: what’s a Jewish mother’s favorite metro station? Monge, because she says “eat, eat (in French, mange, mange), my son.”
Now, this is interesting on a number of levels; the one that I’d like to point out is that it might only make sense to someone who does not speak hexagonal French, and that might be the only reason that I got it. As a monolingual native speaker of English, I can’t hear the difference between the vowels of mange and Monge–we don’t have contrasting nasalized vowels in English, and those two in particular are impossible for me to hear, and pretty tough to pronounce, too, leading me to say things like marde, je t’ai trempée (“shit, I got you wet”–marde is a Canadianism that I can’t seem to get past) and getting responses like “but we’re not going out together!”…which suggests that I pronounce it as je t’ai trompée, “I cheated on you.” I’ll throw into the mix the fact that I’m told that pieds-noirs (the pieds-noirs, “black feet,” are the French who returned to France after France lost Algeria as a colony in 1962–maybe 800,000 people) don’t differentiate between the nasalized vowels an and on, either. Not surprising–differences in the nasalized vowel inventory are a common feature of francophone dialect differentiation, including in France.
What does this joke have to do with the subject of this post? It only works with the informal imperative, i.e. mange, mange (“eat, eat”)–with the formal or plural imperative (mangez, mangez), “eat” doesn’t sound anything at all like the name of the metro station (Monge), and you have no joke.
Here’s a video that has approximately a bazillion examples of the informal imperative. There’s a bit of vocabulary that might help you out here, if you’re not a native speaker of French:
le furet : ferret.
relou : here’s the best I can do for a definition of this word, which I haven’t found in a French-English dictionary as of yet: “Relou” est un mot verlan (langage des rues semblable à un ver lent grignotant doucement… ) signifiant “lourd”. Dans un contexte particulier, désigne une action/personne qui a fait/dit une chose qui a déplu à l’émetteur de ce mot. Source: lachal.neamar.fr. The source gives these synonyms: casse-couille (familier), chiant (familier), casse-pied, and lourd. So: maybe irritating, or “pain in the ass?”
PITA: a less-shocking way of saying “pain in the ass.” This is something somewhat more than annoying. Assembling the appropriate forms in order to be able to fill out the forms that you need in order to get permission to ask for (more) permission from the Dean’s office before doing any international travel is a PITA. (I’m talking about America here–everything you’ve ever heard about French bureaucracy being worse than American bureaucracy is bullshit, period.) My old neighbor was a PITA–always complaining if anyone parked in front of her house, although she didn’t have a car. The constant flood of papers that you have to review when you’re on Christmas vacation is a PITA. The ferret in the video is being a PITA to the cats–hence relou.
An excellent example, both using and defining the abbreviation:
Steven is saved in my phone as Bae…Biggest Asshole Ever. I’m in his as Pita. Pain In The Ass.
Sordid tryst followed sordid tryst. Then there was some phonology. Want to understand what “gonna” means? Read on.
Some years ago, a beautiful summer afternoon found a much younger and cuter me at a picnic, chatting with a new acquaintance. We quickly switched from English (my native language) to Spanish (not my native language), at which point he began telling me, in great detail, about what a slut his wife was. Story of sordid tryst followed story of sordid tryst–she even fucked one of the pool boys on our honeymoon…
Linguists often split words into two categories: content words and function words. Content words are words that you could think of as having a fixed meaning–nouns, verbs, and adjectives, for the most part. In contrast, function words tell you things about grammatical and semantic connections–the, to, not–and include words without fixed meanings. That means pronouns–I, we, she…
In languages that have stress, function words are often unstressed. That makes them more likely to be misunderstood, or not to be understood at all. It’s sometimes a problem even for native speakers of such languages, and it can be a really big problem for non-native speakers. This lesson was brought home to me in a big way when I realized that I’d been confusing the pronouns that my interlocutor was using in our Spanish-language conversation. He wasn’t telling me what a slut his wife was–he was telling me what a slut he was. I even fucked one of the pool boys on our honeymoon…
This loss of distinctiveness of pronouns (and other function words) is an example of a process called reduction, which leads to things having a range of ways that they could be pronounced, some of which are less distinct than others. Reduction processes are rampant in spoken American English, and they can make the language pretty difficult to understand if you’re not a native speaker. I’m trying my hand at putting some videos together that aim to help people learn to understand these reductions. You can find the second one, on the topic of the reduction of going to to gonna, at the link below. If you’re as mystified by spoken American English as I am by spoken French, check it out–I’d love to have feedback on what does and doesn’t work, whether that be here on this blog, or in the Comments section on YouTube. Unfortunately, I haven’t figured out the whole subtitle thing, and I’d like to know to what extent that does or doesn’t interfere with the effectiveness (or lack thereof) of the video. Any input at all would be appreciated, though!
Want to know more about reduction in American English? Check out my video on the pronunciation of “let me” as “lemme:”
In which a cook thinks I’m an idiot because of some vowels.
French and English have pretty different sets of vowels. (Vowel inventories is the technical term in linguistics.) One of the basic facts of humans and languages is that we can be unable to hear differences between sounds that we don’t have in our native tongue, and each of the two languages has lots of vowels that the other doesn’t have. When I say that we can’t hear differences between sounds, that implies that there are sounds with which we confuse them, and which sounds those are is not random at all: people categorize the sounds of their language in pretty structured, principled ways, and when they fail to distinguish the sounds in other languages, that “failure to distinguish” manifests itself as (se traduit par, I think, in French) putting sounds from the other guy’s language into the same category as some sound in your language.
The principles by which this kind of thing gets structured can be described in terms of the articulatory characteristics of the sounds (what you do with your mouth parts to make them), the acoustic characteristics of the sounds (what the waveform would look like if you graphed it), and the auditory perception system (how your brain and your peripheral nervous system interpret incoming sounds). I mention this not because I think that you’ll be fascinated by the details of the effects of, say, Helmholtz resonators versus two-tube models (see the picture) of vowels, but so that you know that there’s a reason that you (if you’re a native speaker of English), me, and all of our fellow “Anglo-Saxons” (a term which seems to be falling out of use in France today, but which I still find amusing, since if there’s anything that I’m not, it’s an Anglo-Saxon) are confusing the same vowels.
For English speakers (Americans, anyway–I don’t know very many of our friends from the Commonwealth and wouldn’t presume to speak for them), one problem pair in French is the vowels that are spelt ou and u. Technically, those are both what are called high tense rounded vowels (here’s a post with a link to a nice video about them from the Comme une française YouTube series). In English, we only have the vowel that’s written ou, which is more or less the same vowel that we have in the words who’d and boot. We tend to hear French words with the vowel spelt u as the vowel spelt ou. Both of them are super-common in French; here are some examples, from the amazing site MinimalPairs.net (y is the International Phonetic Alphabet symbol for the French vowel spelt u):
Most of the time, even us Anglo-Saxons (see the disclaimer above) can get by on context: there just aren’t that many times when the situation doesn’t let you figure out whether your waiter is asking you about joue (cheek) versus jus (juice), or when the rest of the sentence won’t give you a pretty good guess as to whether your interlocutor just said coup (a blow, roughly) or q (the letter of the alphabet).
However: there’s one French “minimal pair”–set of two words that only differ by a single sound–that can pretty much always show up in the same context: au-dessus and au-dessous. What those mean: roughly, over and under. The only difference in the sounds of those is the ou (which we have in English) of under, and the u of over. Have you seen my cigarettes? Yeah, they’re (on top of/underneath) your sweater. Would you do me a favor and put this (on/under) that box? It happens all the time.
To wit: I was feeling badly in need of an actual meal the other day, but too tired to cook after work. Not a problem, as there’s a little Breton place right across the street from the metro station that’s popular for take-out. I popped in on my way home and ordered a couple of galettes de sarrasin (buckwheat crêpes)–one a complet (“with everything”), and one with zucchini and cheese. The nice lady brought them out to me in the bag that you see in the picture, and explained: The complet is on the (top/bottom), and the gratinée is on the (top/bottom).
Fuck: my old nemesis, au-dessus and au-dessous. I gave her a baffled look. She gave me a baffled look right back: what could I possibly not be understanding?? We’d just had an involved conversation on the topic of why I should really be topping off my dinner with her home-made apple crumble (her position on the topic) and why my general fatness suggested that I should not, in fact, be doing so (my position), so why would I suddenly be confused by something that any French toddler would understand? She looked at me for a bit, with that look on her face that means Is this bizarre foreigner jerking me around, or what?, and then finally tried again: en haut–gratinée. En bas–complet. No verbs, no pronouns, none of that fancy stuff–two prepositions, two nouns.
Message received. I left a good tip in hopes of maintaining some semblance of normalcy in the relationship, ’cause I am, in fact, de souche Bretonne (half, anyway), and I do love my cider and chicken gizzards, and that restaurant is the best place in the neighborhood to get them. It’s not like there aren’t other good Breton restaurants in Paris, but this one’s mine, damn it.
Spoken American English can be very difficult to understand. Here’s a video to help you cope with one of the problems therewith.
Walking out of the exam on oral comprehension during the testing for the Diplôme approfondi de langue française a couple months ago, I found a very unhappy-looking young man waiting for the elevator. Are you OK? He shook his head glumly: I flunked again, I know it. I made sympathetic noises. Was this your first time taking the test? I responded in the affirmative. He gave me a look of pity–clearly the expectation was that I was going to find the experience as brutal as he had. Repeatedly, apparently.
Indeed, the oral comprehension exam got me my worst score out of the whole test. Spoken French and spoken English can both be brutally difficult to understand if they’re not your native language, and for many of the same reasons. One of those is their sets of vowels–both languages have vowel “inventories” (the technical term) that are shared by relatively few languages. Another is a process called reduction, which leads to things having a range of ways that they could be pronounced, some of which are less distinct than others. For example, in French, some unstressed vowels are optional in casual spoken language, so that cheveux is often pronounced chveux, matelot can be pronounced matlot, and so on. Furthermore, the sounds that are “left behind” can be changed as a result, so that, for example, the j in je becomes pronounced as ch when je suis is “reduced” to chuis. So, when I describe this as becoming “less distinct,” think about this. In French, there are these two words, and the difference between them is the sound of j versus the sound of ch:
le jar: secret language, argot
le char: chariot; in Canada, car.
When j becomes ch, as in chuis, the difference between the two sounds goes away, and in that sense, a “reduced” word is less distinct from other words than it might have been.
Reduction processes are rampant in spoken American English, and they can make the language pretty difficult to understand if you’re not a native speaker. I’m trying my hand at putting some videos together that aim to help people learn to understand these reductions. You can find the first one, on the topic of the reduction of let me to lemme, at the link below. If you’re as mystified by spoken American English as I am by spoken French, check it out–I’d love to have feedback on what does and doesn’t work, whether that be here on this blog, or in the Comments section on YouTube. Unfortunately, I haven’t figured out the whole subtitle thing, and I’d like to know to what extent that does or doesn’t interfere with the effectiveness (or lack thereof) of the video. Any input at all would be appreciated, though!
I was feeling good the other day. Chatting up a pretty French girl, showing off my (lack of) familiarity with 20th-century French philosophers, and using lots of great abstract nouns–transcendence, immanence, agency, objectivity. Feeling smart, feeling charming, feeling sparkling. Pride comes before a fall, and my fall came hard: Oh, Kevin–your neologisms are so cute. The way that you make up new words!
Shit–not what I was going for. My mistake: throwing around deadjectival nouns too freely. Overgeneralizing from limited data. The Fallacy of Small N. Bref, as we say in French: I was making nouns from adjectives–transcendence from transcendent, objectivity from objective–but I was using the wrong word endings to do it, trying to generalize from too few examples (overgeneralizing from limited data), extracting a pattern that seemed to have held a few times (the Fallacy of Small N).
You can see from my very small set of examples that English has pretty good facilities for making nouns from adjectives. We call these deadjectival nouns. Start with the adjective objective, add -ity (dropping the final e), and you’ve got a noun. Start with the adjective transcendent, swap the final -t for -ce, and you’ve got transcendence. You can see something else from my example, too: you don’t get to add just any ol’ word ending to the adjective. Transcendity? Not OK. Objectiveness? It’s OK, but it means something different from objectivity. French also has pretty good facilities for making nouns from adjectives. And, in French, as in English, you don’t get to add just any ol’ word ending to the adjective–you have to know, for any given adjective-noun pair, what the right word ending is. Let’s look at some examples, including of course some that are more or less randomly chosen from recent things I’ve said that have made people snicker, plus an encounter with the always-hilarious old lady who owns a bookstore near my apartment, and then some more thrown in just to show the diversity of possibilities. I’ve relied heavily on WordReference.com to make this table; if I indicate an English word as NFE, I mean to communicate that there is No French Equivalent, at least according to WordReference.
Impossible or other nouns
la radinerie: stinginess
câlin: cuddly, affectionate
les câlineries: cuddles (plural only, as far as I can tell)
la complicité: complicity (which I’ve only ever heard in a positive sense, meaning something like closeness, bondedness)
banal: common, banal, mundane
la banalité: banality
l’abrasivité: abrasiveness (I think–can a native speaker help?)
la lassitude: weariness
ingrat: ungrateful; unattractive; unrewarding
l’ingratitude: ingratitude, ungracefulness
transcendant, transcendantal: transcendent
la transcendance: transcendence
Things to note:
There is a pretty tight requirement for specific endings to be added to specific adjectives.
There is quite a bit of phonology (technically, morphophonology) going on with some of these endings–for example, abrasif (with an f) versus abrasivité (with a v); impécunieux (with no consonant pronounced at the end of the adjective) versus impécuniosité (with an s, which of course is pronounced as a z).
As far as I can tell, there’s no simple mapping between the ending in English and the ending in French (or vice versa). English -ity might match to French -ité (e.g. English banality, French banalité), but then again, so might -ness (e.g. English impecuniousness, French impécuniosité). Of course, English -ness might map to something else, too (e.g. English stinginess, French radinerie). And, forget remembering how to spell any of this–I can’t spell either language anymore… Make me read and write in Spanish for a week, and I won’t be able to do any of the three…
Is there an easy way to predict, or at least to group together for memorization, any of this? I haven’t found one yet–suggestions appreciated…
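Here’s a quick sketch of the “no simple mapping” point, for those who like to see such things counted. It takes the English-French noun pairs mentioned in this post, strips each word down to its ending, and records which French endings each English ending was paired with. The suffix lists and function names are my own, chosen just to cover these few examples–this is an illustration, not a morphological analyzer.

```python
from collections import defaultdict

# (English noun, French noun) pairs taken from the examples in this post.
PAIRS = [
    ("banality", "banalité"),
    ("complicity", "complicité"),
    ("impecuniousness", "impécuniosité"),
    ("abrasiveness", "abrasivité"),
    ("stinginess", "radinerie"),
    ("weariness", "lassitude"),
    ("ingratitude", "ingratitude"),
    ("transcendence", "transcendance"),
]

# Hand-picked endings, just enough to cover the pairs above.
ENGLISH_SUFFIXES = ("ity", "ness", "ence", "itude")
FRENCH_SUFFIXES = ("ité", "erie", "itude", "ance")


def suffix_of(word, suffixes):
    """Return the first listed suffix that the word ends with, or None."""
    for suffix in suffixes:
        if word.endswith(suffix):
            return suffix
    return None


def english_to_french_endings(pairs):
    """Map each English ending to the set of French endings it was paired with."""
    mapping = defaultdict(set)
    for english, french in pairs:
        en_suffix = suffix_of(english, ENGLISH_SUFFIXES)
        fr_suffix = suffix_of(french, FRENCH_SUFFIXES)
        mapping[en_suffix].add(fr_suffix)
    return dict(mapping)


endings = english_to_french_endings(PAIRS)
for en_suffix, fr_suffixes in endings.items():
    print(f"-{en_suffix} -> {sorted(fr_suffixes)}")
```

Even with only eight pairs, English -ness turns up with three different French endings (-ité, -erie, and -itude), which is exactly why I can’t guess my way through this.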
To put all of this in a bigger picture: what we’re looking at here–things that can change the part of speech of other words–are what are known as derivational morphemes. The whole phenomenon of derivational morphology has some pretty interesting implications for the nature of human language. You can read more about derivational morphology, and those implications, at this blog post. In fact, we’ve recently been talking about a very particular kind of morphological derivation–zero derivation, or changing part of speech without adding an affix, as in this recent post that discusses why that particular phenomenon is interesting, and then this recent post that explores in some depth the range of zero-derived verbs that come from nouns that refer to parts of the mouth and that refer to some form of communication.
Not everyone would agree that some of the English nouns that I have in the third column are OK–particularly, affectionateness and complicitness.