Before you hit your dog, remember that he can bite your hand hard enough to break it–but, he chooses not to.
In America, we do love our dogs. A culturally common way for us to show our dogs affection is this: we pet them, while saying Who’s a good boy? (or Who’s a good girl?, depending on gender). In my family, we do it a little differently: we pet the dog while saying Who’s got a sagittal crest? Dogs don’t look at you with any more or less puzzlement regardless of which one you pick, so: feel free to go crazy with this one.
What’s a sagittal crest? The next time you run into a dog, run your hand along the center of the top of his skull. That ridge that you feel is his sagittal crest. Sagittal means along a plane that runs from the front to the back of the body. A sagittal crest runs along that plane. This sense of crest means something sticking out of the top of the head–think the plume on top of a knight’s helmet. Many animals have a sagittal crest, but not us modern humans. You see them in species that have really strong jaw muscles. A sagittal crest serves as one of the points of the attachment of the temporalis muscle, which is one of the main muscles used for chewing. If you have a sagittal crest, you can have a bigger temporalis muscle, which means that you can bite/chew harder.
If you look at relatively close relatives to humans, you see sagittal crests on some of them. To the left, you see a gorilla. You wouldn’t want to get bitten by this guy. (Note that some gorilla species, especially their males, have really enormous sagittal crests–this is actually a pretty modest one, for a gorilla.)
Here’s (an excellent replica of) a Pan troglodytes (common chimpanzee) skull. This guy (I think it was a guy) had more of a sagittal crest than you (you don’t have any), but he didn’t have much, compared to that gorilla. Other chimps vary. Monkey species vary pretty widely regarding the presence or absence of a sagittal crest.
Some hominids that were ancestral to us had sagittal crests, but they disappeared pretty early in the course of our evolution. Here is a picture of the “Black Skull,” about 2.5 million years old. It’s a robust australopithecine, usually classified as Paranthropus aethiopicus (or Australopithecus aethiopicus). By the time Homo erectus comes along (starting about 1.9 million years ago and lasting until about 70,000 years ago), the sagittal crest is gone. Picture below.
So: feel free to express your affection for your dog any way you want–you can’t possibly be any geekier than my son and me. Scroll down past the picture for French vocabulary.
Relevant French vocabulary (see the Comments section for more):
la crête sagittale: sagittal crest
le muscle masticatoire: chewing muscle (note: the “c” in muscle is pronounced in French)
Parsing, data mining, and encryption are not going to get you. That pistol in your nightstand might, though.
Every once in a while an innocuous technical term suddenly enters public discourse with a bizarrely negative connotation. I first noticed the phenomenon some years ago, when I saw a Republican politician accusing Hillary Clinton of “parsing.” From the disgust with which he said it, he clearly seemed to feel that parsing was morally equivalent to puppy-drowning. It seemed quite odd to me, since I’d only ever heard the word “parse” used to refer to the computer analysis of sentence structures. The most recent word to suddenly find itself stigmatized by Republicans (yes, it does somehow always seem to be Republican politicians who are involved in this particular kind of linguistic bullshittery) is “encryption.” Apparently encryption is now right up there with dirty bombs in terms of things that terrorists are about to use to kill us all. (“All” might be an exaggeration. I find it interesting that the United States had 33,169 firearm deaths in 2013–roughly 11 times as many deaths as on 9/11–and yet, Republicans seem to think that it’s important that we make firearms as widely available as possible. I guess they just don’t like people very much.) As a moderately technical person, this strikes me as odd, since I’ve always thought of encryption as that nifty mathematical technique (I was about to say “algorithm,” but I think the Republicans are down on that one now, too) that keeps you from intercepting my text messages, me from reading your Ashley Madison profile, and so on.
In between the Republican outrage over parsing and the current panic over encryption, we had the sudden appearance in the public consciousness of data mining. As far as I knew up to that point, data mining was a bunch of statistical techniques for finding relationships between things. Suddenly it was showing up in scary news stories–Google the phrase “data mining is evil” (you have to put the quotes around it to search for the phrase, as opposed to the individual words) and you will get 1,400 hits as of the time of writing (May 2016).
Besides being bemused by this intrusion of American know-nothingness into public discourse, I have a personal stake in the issue, because people often refer to what I do for a living as text data mining. This is a misnomer–by its nature, data mining is not something that you can do with texts. Bear with me and I’ll explain why, and then we’ll look at some French vocabulary for talking about all of this.
Data mining is basically about databases. In a database, the statistical techniques of data mining can help you do things like discover that Republicans with HBO subscriptions are more likely to consider voting for Romney in a primary than Republicans who don’t have HBO subscriptions. (Real one, if I remember the facts correctly.) You can do that because you have a table in the database that tells who’s a Republican, a table that tells who has HBO subscriptions, and a table that tells you which members of a random sample told the interviewer that they would/wouldn’t consider voting for Romney in a primary. Data mining is the science/art of figuring out what things are related (HBO subscription/willingness to vote for Romney) and what things aren’t related (making one up here: having bought an Escalade and being willing/unwilling to vote for Romney in a primary)–this among probably thousands and thousands of variables. Doing data mining research requires things like knowing particular kinds of math, understanding how to sample a population, getting computers to do complicated calculations in a way that is time-efficient—stuff like that.
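To make the "related versus not related" idea concrete, here is a minimal Python sketch of one standard way to test it: a Pearson chi-square statistic on a 2x2 table of counts. The counts are made up purely for illustration (they are not from any real poll), and the function name chi_square_2x2 is hypothetical. Real data mining would run something like this over thousands of variable pairs and would have to correct for all of those comparisons.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Counts we would expect in each cell if rows and columns were unrelated
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Invented counts. Rows: has the trait / doesn't have it.
# Columns: would / wouldn't consider voting for the candidate.
hbo_vs_romney = chi_square_2x2(60, 40, 40, 60)
escalade_vs_romney = chi_square_2x2(50, 50, 50, 50)

# Chi-square critical value for 1 degree of freedom at the 5% level
CRITICAL_5_PERCENT = 3.841

print(hbo_vs_romney > CRITICAL_5_PERCENT)       # related, in this toy table
print(escalade_vs_romney > CRITICAL_5_PERCENT)  # not related, in this toy table
```

The statistic only measures how far the observed counts are from what independence would predict; it says nothing about why the two things go together.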
With data mining, you have that database, and you know what everything is. With “text mining,” or “text data mining,” as some people call it, you have texts, and you don’t know what anything is. (By “you,” I mean a computer program.) This is usually talked about as a difference between “structured” data (i.e., the database)–you know what everything “is”–what it “means”–in some sense, its semantics. Whoops–that sentence got a little out of control. “Unstructured” data: that’s typically how we would describe text. With text, you know what nothing is–you don’t know what anything means–in a very literal sense, you don’t know its semantics.
“Text mining” could be thought of as turning unstructured data into structured data. You’ve got a bunch of texts, and you want to use them to populate a database, perhaps. Maybe you have 23 million journal articles in the National Library of Medicine, and you want to find every statement that those 23 million articles make about which genes are affected by which drugs. Maybe you have a huge collection of French fairy tales, and you want (the computer) to find every time that a stepmother is mentioned and whether the portrayal of the stepmother is positive or negative. You could think of both of those as turning unstructured data into structured data–you’re taking that unstructured data and using it to build a database about genes and drugs, or a database about stepmothers. You can see now why we tend to prefer the term “text mining” to “text data mining”–to the extent that “data mining” is about structured data, it doesn’t really make sense to talk about “data mining” with respect to language. Where the data mining person basically just needs to know math, the text mining person needs to know something about how people write about whatever it is that you’re interested in. I do a bit of text mining. People will have really specific requests–tell me whether or not the genes from some experiment show up in the cancer literature, say; tell me if this is a suicide note or not; read this doctor’s note and tell me if this kid is a candidate for epilepsy surgery; stuff like that. It’s not really linguistics, but it pays the bills, and it suits my need to do something that might actually make the world a better place.
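As a rough illustration of "unstructured in, structured out," here is a toy Python sketch of the stepmother example. Everything in it is an assumption made for the sake of the example: the two word lists are a hypothetical stand-in for real sentiment analysis, the function name stepmother_rows is invented, and English sentences stand in for the French tales. Real text mining systems are nowhere near this simple.

```python
import re

# Hypothetical mini-lexicons; real sentiment analysis needs far more than this.
POSITIVE = {"kind", "gentle", "loving"}
NEGATIVE = {"cruel", "wicked", "jealous"}

def stepmother_rows(texts):
    """Turn raw text into (doc_id, sentence, polarity) rows for a database."""
    rows = []
    for doc_id, text in enumerate(texts):
        # Naive sentence splitter: break after ., ! or ? followed by whitespace
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            if "stepmother" not in words:
                continue
            score = len(words & POSITIVE) - len(words & NEGATIVE)
            if score > 0:
                polarity = "positive"
            elif score < 0:
                polarity = "negative"
            else:
                polarity = "unknown"
            rows.append((doc_id, sentence, polarity))
    return rows

tales = [
    "The wicked stepmother locked the door. The girl wept.",
    "Her stepmother was kind and gentle.",
]
rows = stepmother_rows(tales)
for row in rows:
    print(row)
```

Each row is something you could insert straight into a database table, which is the point: once the text has been turned into rows, ordinary data mining can take over.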
A related field is natural language processing. Natural language means human language, as opposed to computer languages. Natural language processing is about building tools to handle specific linguistic tasks–parse a sentence, figure out parts of speech, stuff like that. You might use a combination of different language processing programs to do a text mining task. I find this more interesting, since the questions are less about some set of facts than they are about the language itself. Where the data mining person needs to know math and the text mining person needs to know how people write about genes and drugs, or stepmothers, or whatever, the natural language processing person needs to know something about language itself–what kinds of structures sentences can have, how word frequencies are distributed, how to build linguistic resources for letting a computer process things that can’t be directly observed (e.g. semantics). I do a lot of this kind of stuff. Recently I’ve been working on coreference resolution–how to get a computer to recognize that Obama, President Obama, and Barack Obama are all referring to the same thing in the world, while Mrs. Obama and Michelle Obama are referring to something else in the world. (Recognizing that those “things” in the world are people, as opposed to, say, locations, or the names of companies, is a whole different story.)
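To give a feel for the coreference problem, here is a deliberately naive Python sketch. It is not how real coreference systems work: the title and first-name tables are hand-made, hypothetical knowledge, and the greedy clustering simply attaches each mention to the first earlier cluster it doesn't contradict.

```python
# Hand-made, hypothetical knowledge tables (None = title says nothing about gender)
TITLES = {"President": None, "Mr.": "m", "Mrs.": "f", "Ms.": "f"}
FIRST_NAMES = {"Barack": "m", "Michelle": "f"}

def describe(mention):
    """Extract (surname, first_name, gender) from a mention; None = unknown."""
    tokens = mention.split()
    surname = tokens[-1]
    first, gender = None, None
    for tok in tokens[:-1]:
        if tok in TITLES:
            gender = gender or TITLES[tok]
        else:
            first = tok
            gender = gender or FIRST_NAMES.get(tok)
    return surname, first, gender

def compatible(a, b):
    """Two descriptions may corefer if no known field conflicts."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))

def cluster(mentions):
    """Greedily group mentions; each cluster is [merged_description, members]."""
    clusters = []
    for m in mentions:
        d = describe(m)
        for c in clusters:
            if compatible(c[0], d):
                # Fill in fields the cluster didn't know yet
                c[0] = tuple(x if x is not None else y for x, y in zip(c[0], d))
                c[1].append(m)
                break
        else:
            clusters.append([d, [m]])
    return [c[1] for c in clusters]

mentions = ["Obama", "President Obama", "Barack Obama",
            "Mrs. Obama", "Michelle Obama"]
groups = cluster(mentions)
print(groups)
```

Note how fragile this is: a bare Obama is compatible with both people, so the answer depends on the order in which the mentions arrive. That ambiguity is a big part of what makes the real problem hard.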
Yet another field is computational linguistics. This is about using computational models to test theories about language. This is my favorite, but it’s the hardest to pay the bills with. I do some of this, too. Nowadays a lot of my time goes into large-scale attempts to model the semantics of biomedical language. I’m trying to investigate differences in the semantic primitives of biomedical language versus “general” English by building a large set of data-driven semantic representations of predicates found in journal articles; I’ll then compare that resource to a similar resource built for general English and look for things like whether or not the semantic primitives seem to come from the same set, whether or not given verbs have different representations in the two types of language, etc. My hope is to get a sense of the range of types of semantic variability from this particular project. You could imagine using computational linguistics work to build natural language processing tools, and then using those to carry out practical text mining tasks. You could use the text “data” mining results to do actual data mining.
As you can tell from my examples, I’m very much in the world of biomedical language. There’s also a lot that you can do in the humanities with this kind of stuff. A hot topic in the future might be using mathematical representations of semantics to study things that are/are not thought of as binaries–gender, sexuality, race, political economy, whatever. However, I would not claim to do ANY of that–I can just barely explain it. For more on that kind of stuff, see this excellent post by Ben Schmidt.
In practice, even people in the field don’t always differentiate between these terms, or at least don’t draw sharp boundaries between them. My business card says that I’m the director of a text mining group, but I identify most strongly as a computational linguist. We figured that “text mining” makes more sense as a practical field of inquiry to have within a medical school (which is where I work), so that’s what we called the group when we formed it. If you go to the annual conference of the Association for Computational Linguistics, you will see almost no computational linguistics, but rather a ton of natural language processing. If you go to the annual Biomedical Natural Language Processing meeting, you’ll see a mix of text mining, natural language processing, and a bit of computational linguistics. Sometimes the distinctions really matter, though. This post started its life as a response to someone who asked me to be on a panel about data mining, to talk specifically about text data mining. When I responded that I don’t do data mining, they asked what the difference is–this blog post started out as my response.
As far as I can tell, the relevant community in France doesn’t make these distinctions in any kind of rigid fashion, either, despite the much-vaunted French penchant for categorization (see Nadeau and Barlow’s excellent book for a discussion of where it comes from). However, French does have technical vocabulary for all of these fields. Here it is:
fouiller: to excavate; to rummage through, to search (see also here)
la fouille de données: data mining
la fouille de texte(s): text mining
le traitement automatique des langues naturelles: natural language processing
la linguistique informatique: computational linguistics
There are historical reasons for the large number of beggars in Paris.
The typical stereotype of Paris is as a beautiful, majestically historical city that just oozes romance, and indeed, Paris is all that. But, visitors are often surprised to find that it is also a city with a sometimes astounding number of beggars on the street. The reasons behind this are many, and varied, and, I think, interesting.
In the pre-modern period, the vast majority of the French (like the vast majority of everyone else in the world) were farmers. Most children didn’t live to adulthood, and you needed a lot of hands to work the farm, so people had big families.
In the 1500s, the French death rate took a relatively sudden drop. People were still having those big families, so there were a relatively large number of people making it to adulthood. The inheritance laws of the time included primogeniture, i.e. inheritance of everything by the oldest son, so lots of those people wouldn’t have a farm of their own to work. Options were limited, and if they couldn’t find other employment, a lot of people hit the road. (There’s an excellent description of the mechanics of this phenomenon in Robert Darnton’s The Great Cat Massacre and other episodes in French cultural history.)
If you hit the road in France, you’re eventually going to end up in Paris, if for no other reason than that it’s the hub of the road system (and today, the rail system). If you can’t find other employment, your options come down to begging or stealing, and most people aren’t thieves. So: begging.
Begging actually has a very long and somewhat respectable history in Europe. As Robert Cole puts it: “In the middle ages, ‘Christian charity’ perceived the poor as God’s special children and therefore deserving of alms.” Begging can be a profession, really. (Old Eastern European Jewish joke: beggar hits a guy up for money. Guy gives him some helpful hints on improving his approach. Beggar responds: YOU’RE telling ME how to beg? This would make total sense in a French context: a métier (profession) is a métier, whether you’re a doctor, an engineer, or an elevator operator.)
If you’re gonna be a beggar, though, it helps to have a schtick. Physical lack of ability to work was a good one, and Parisian beggars were known for faking such a disability, leading to their squatting areas being known as Cours des miracles (“Courts of miracles”) for their recovery at the end of the working day. (There was one just to the north of what is now the Place des Vosges, I believe.) By the 1500s, begging wasn’t viewed quite as kindly. Robert Cole again:
In sixteenth-century Paris the poor were viewed as merely layabouts who preferred to live off public welfare. Meanwhile bad harvests, plagues, inflation and religious war increased their number dramatically. Public begging was outlawed in 1536, and in 1551 laws were enacted which limited eligibility for public assistance and forbad women to have their children in tow when selling candles outside churches. To do so, went the rationale, evoked sympathy from prospective customer, which proved that such women were really only begging. A traveller’s history of Paris.
So: there have been a lot of beggars in Paris for centuries. In 2007, the European Union was enlarged to include a couple countries with large Roma populations. There have always been Roma in France, but now a lot more came (the Roma rights group FNASAT says 12,000 currently, and that’s after 10,000 being expelled in 2009 and another 8,000 in 2011; other estimates range from 20,000 to 400,000), and they are a prominent part of the Parisian begging ecosystem. (There is, indeed, a Parisian begging ecosystem, and there are actually a number of distinct genres of begging in Paris–a subject in and of itself.)
Buddhism views charity as an act to reduce personal greed which is an unwholesome mental state which hinders spiritual progress. What Buddhists believe, Venerable K. Sri Dhammananda Maha Thera.
Judo’s view of the best human relationships is mutual welfare–we’re taught that human interactions should be mutually beneficial. So, if it’s the case that charity benefits both the giver and the receiver, then it’s very judo. Seriously, give charity–if for no other reason than that you’ll feel better about humanity if you take part in it being more humane.
le mendiant: beggar.
le gueux/la gueuse: beggar (literary). A number of other, more pejorative meanings–highwayman for men, whore for women, etc. Probably obsolete, but keep it in mind for when you read Tartuffe.
le clochard: beggar; also bum. (Slang.)
le/la clodo: beggar; also homeless person, tramp, hobo.
I was in the Navy with an Armenian woman. (No, you don’t have to be a citizen to serve in the American military, and that’s probably true in most countries. In France, you can get citizenship by serving in the military–you are français par le sang versé, “French by spilt blood.” This isn’t the case in the United States–you can apply for citizenship as a member of our military, but there actually isn’t any guarantee that you’ll get it.) We’ll call her Nairi (not her real name). Like many members of the Armenian diaspora, Nairi was massively multilingual–she spoke Armenian, Arabic, and Spanish natively, and French and English as very strong second languages. (I once saw her mother test her to make sure that she wasn’t forgetting any of them.) One day Nairi came back from leave (what we call vacation in the military) with a seven-language dictionary. I admired it, and she insisted that I take it. I refused, she insisted, I refused, she insisted, I refused, she insisted, and finally, I took it. What I didn’t realize was that in Armenian culture, if someone admires something of yours, you must insist that they take it. Armenians know that they most certainly should not take it–I didn’t. Now I do. Stupid me–every time I see that dictionary on my bookshelf, I feel like a total jerk.
In a recent post, we talked about monolingual dictionaries–that is, dictionaries that list words in some language and give definitions of them in that same language. Today, let’s talk about bilingual dictionaries–that is, dictionaries that list words in some language and give corresponding words in another language. Of course, anything that we might say about bilingual dictionaries applies equally to dictionaries with even more languages, like the one that I stupidly took from poor Nairi.
I carefully said “corresponding” words just above–I carefully didn’t say “equivalent” or “the same” words. This is because it’s often the case that there isn’t a single translation from one word in one language to one word in another language. Even when there is one, it doesn’t necessarily “mean” the same thing, in some sense of the word “meaning.” To give you an example from my college French 101 textbook: a fenêtre in French is a window in English–fine so far. But, say window in English, and the referent is most likely a sash window, specifically–one that slides up and down. Say fenêtre in French, and the referent is most likely a window that opens in the middle–horizontally. (We would call this a French window in English. See this post for a list of things that we call French something-or-other in English that aren’t called anything of the sort in French.) And, as I said, there often isn’t just one. A language that I worked on in grad school has the word invert. But: invert what? If you’re inverting a hollow object, that’s one verb–if you’re inverting a solid object, it’s another verb. French has maybe two words for snow–la neige, and la poudreuse (powder snow). Depending on how you count, English has 13 or 55 or 120 (scroll down past the Inuit words) or 182 words for snow. So: not a 1-to-1 correspondence.
Having at least mentioned some of the theoretical issues, let’s look at the practical points of buying and using a bilingual dictionary. In these days of Amazon, you can use reader reviews in a way that we never could before–it’s really a nice advantage over the old pre-Internet days. However, there are also some specific things to look for.
Example sentences: you want a dictionary with example sentences, at least in the language that’s foreign to you.
Verb + preposition combinations: a good dictionary should tell you which prepositions, if any, go with which verbs. You need to know, for instance, that in English you shoot at something, you lean toward (have a preference for) something, and you stop doing something, with no preposition. Likewise, in French you need to know that you tirer sur or tirer contre (shoot “on” or shoot “against”) something, you pencher pour (lean “for”) something, and you arrêter de (stop “from”) doing something.
If you are working with language(s) that have gender, you want the gender to show up both in the Language1 -> Language2 section and in the Language2 -> Language1 section. If you look up kitchen towel and find that the translation to French is torchon, you don’t want to then have to go to the French -> English section to see whether it’s le torchon (it is) or la torchon (it isn’t).
This might seem obvious, but make sure that the pronunciation is given for the words in any language whose pronunciation isn’t obvious from the spelling–and, yes, that includes both English and French.
This takes a while, but: when you find the word that you’re looking for in the other language, you might want to look it up in the other direction. For example: suppose that you look up the English word towel in a crappy bilingual English/French dictionary. In a crappy dictionary, you might find the following: serviette, torchon. Both of those can, indeed, be used to translate towel from English to French–but, they’re not equivalent. Serviette is for a bath or beach towel, while torchon is for a kitchen towel. You want a dictionary that will distinguish between the various possible translations. It’s often useful to look the French words up in turn (or the English words, if you’re going from French to English). If you do that, you’ll find that a serviette can be a towel, but also a napkin, or a briefcase. A torchon, you’ll find, can also be a messy document, or a rag. It’s good to be on top of this kind of thing when you’re trying to choose between supposed synonyms.
Labelling of registers, or levels of appropriateness: you most definitely want a dictionary that includes slang, obscenities, informal words, etc., or you’re not going to get very far in real life. However, you also want a dictionary that labels words that are non-standard–offensive words, etc. This kind of thing can be really, really hard to catch when you’re learning a language from movies, your neighbors, etc.
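The "look it up in the other direction" tip from the list above can be sketched in a few lines of Python. The two tiny dictionaries contain only the towel/serviette/torchon senses mentioned in this post, and the function name round_trip is hypothetical; a real bilingual dictionary obviously has a lot more structure than a pair of lookup tables.

```python
# Toy entries taken from the towel example; not a real dictionary.
EN_FR = {"towel": ["serviette", "torchon"]}
FR_EN = {
    "serviette": ["towel", "napkin", "briefcase"],
    "torchon": ["kitchen towel", "rag", "messy document"],
}

def round_trip(word, forward, backward):
    """For each translation of `word`, list what it translates back to."""
    return {t: backward.get(t, []) for t in forward.get(word, [])}

senses = round_trip("towel", EN_FR, FR_EN)
for french, english in senses.items():
    print(french, "->", ", ".join(english))
```

Seeing napkin and briefcase next to serviette, and rag next to torchon, is exactly the hint that the two supposed synonyms are not interchangeable.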
The always-awesome Lawless French web site has a good page on the subject of how to use a bilingual dictionary, and it has much better examples than I do. You can find it here.
So, what are some good bilingual English/French dictionaries? Here are some options.
The best thing out there these days is almost certainly WordReference.com. It has lots of language pairs, example sentences, colloquial expressions, pronunciations, male and female forms of adjectives, plurals, a verb conjugator, and a reverse look-up feature that does exactly what I suggest you do in the last bulleted item above. The auto-complete feature in the search box saves me enormous amounts of time (and guessing about spellings). There’s an excellent WordReference iPhone app. Be aware, though, that the iPhone app will not generally let you look up obscenities–you have to go to the web site for that.
For the Kindle or for the Kindle app on your phone, the Collins English-French and French-English dictionaries are quite good. They’re quite highly rated on Amazon.com. I have the Collins dictionaries on my phone, and use them whenever I don’t have Internet access and therefore can’t get to WordReference.com. The Collins dictionaries also have an advantage over WordReference: they don’t give as many super-subtle translations. The only bad thing about WordReference is that it can sometimes give an overwhelming number of other-language translations. That’s great when you want it, but when you don’t, you might prefer the Collins dictionary. As it happens, there is a Collins dictionary tab on the WordReference site, and it’s easy to click on that.
Linguee.fr is fantastic for seeing things in context. You will generally get lots of example sentences. There’s an iPhone app for that, too.
Reverso.net is another good one for seeing things in context. It sometimes has better coverage of colloquial, slang, and obscene language than Linguee does. Again, there’s an iPhone app.
I found Nairi on Facebook recently. I sent her a friend request–no response. Is it because she doesn’t remember who the hell I am? Is it because she hates me for taking her dictionary? I have no idea. Nairi, if you’re reading this: I’m sorry!
Refugees and migrants are dying in shocking numbers in the Mediterranean. Here is some vocabulary that you’ll need to know to talk about the tragedy in French.
One of the ways that the world is sucking right now is the migrant crisis in Europe. As I write this (in April 2016), there are tens of thousands of refugees and migrants stranded in Greece. Many of these people cross from Turkey to Greece by boat, and many go from North Africa to Italy by ship. Tragically high numbers of these sink; in April of last year, five vessels sank, with a death toll of about 1,200 people.
The other day I was listening to the news on the radio. It was yet another story about the refugee crisis. The word aufrage kept coming up, but I couldn’t find it in my dictionary. Un aufrage, I kept hearing. Looking up similar stories online solved the mystery: it was not un aufrage, but un naufrage–a capsizing or shipwreck. I had “segmented” (as linguists say) the n of naufrage as part of a separate word, coming up with un aufrage.
This isn’t an uncommon phenomenon. One of the surprises for students in introductory linguistics classes is that in speech, there are no breaks between words–if I showed you a spectrogram (a sort of recording of a sound wave) of a sentence, you would see a continuous sound. “Segmenting” that stream of speech into smaller units is something that humans do–it’s not something that’s there in the acoustics.
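Here is a small Python sketch of why segmentation is a real problem: given a continuous stream of sounds and a listener's mental lexicon, more than one split can be consistent with the same stream. The sound strings are a rough, made-up ASCII stand-in for a phonetic transcription (not real IPA), and the lexicon deliberately includes the non-word aufrage, on the assumption that a learner might posit it.

```python
def segmentations(stream, lexicon):
    """Return every way to split `stream` into consecutive lexicon entries."""
    if not stream:
        return [[]]  # one way to segment nothing: no words at all
    results = []
    for end in range(1, len(stream) + 1):
        chunk = stream[:end]
        if chunk in lexicon:
            for rest in segmentations(stream[end:], lexicon):
                results.append([lexicon[chunk]] + rest)
    return results

# Keys: rough sound strings (hypothetical notation). Values: spellings.
LEXICON = {
    "U": "un",             # the nasal vowel on its own
    "Un": "un",            # the same word with its n pronounced (liaison)
    "nofraZ": "naufrage",  # shipwreck
    "ofraZ": "aufrage",    # a learner's guess; not a real French word
}

parses = segmentations("UnofraZ", LEXICON)
for parse in parses:
    print(" ".join(parse))
```

Both splits cover the whole stream, so nothing in the sound alone decides between them; it takes knowing that naufrage is a word and aufrage isn't.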
Occasionally speakers of a language will, over time and as a community, “reanalyze” words in a way that changes the segmentation, and eventually the pronunciation. The word uncle has undergone this process. A variant of the word in English is nuncle. Oxford describes it as archaic or dialectal, but it’s there. You can see it in Shakespeare:
Can you make no use of nothing, nuncle?
–King Lear, Act 1, Scene 4
The word is thought to have come from a segmentation of phrases like mine uncle as my nuncle, thine uncle as thy nuncle, etc.
The same thing can happen in other languages, too–any time people speak, there’s an opportunity for segmentation errors. Children who are learning their mother tongue often try out different segmentations. For example: in a past post, we looked at some bear-related vocabulary in French and English. Here are various and sundry relevant phrases:
un ours: a male bear.
une ourse: a female bear.
un ourson: a baby bear; a teddy bear.
un nounours: a teddy bear.
I once read a great blog post in which a French guy wrote about his toddler producing three different pronunciations of the word ours (male bear) in one day: ours, nours, and I believe lours (the last one would be a reanalysis of l’ours, “the bear”). (Sorry I’m guessing about that last one–I can’t find the guy’s post.)
Linguistics geekery, which you should feel free to skip: one of my homeworks in Phonetics 101 was to look at spectrograms and find indications of syllabic association, which can correspond to word segmentation, on occasion. It’s possible to do so–sometimes. For nasals in French, as far as I know, it would be restricted to some variability in when a vowel is nasalized before a nasal consonant, versus when it’s produced as a sequence of an unnasalized vowel before a nasal consonant. American English speakers, who have no contrast in nasalization versus lack of nasalization before a vowel, are unlikely to be able to perceive it, and I don’t know at what age a French kid would be likely to acquire it.
I have no clue how the current situation will or should be resolved. Obviously, if your town is being destroyed by the Syrian government, or ISIS, or whatever other assholes are causing death and misery in the Middle East these days, it makes sense that you would take your family and go elsewhere, and it’s simple human decency to shelter people in that situation. However, the situation is not clear in other ways–even the fact that the Wikipedia article on the subject is titled European migrant crisis and not European refugee crisis is a loaded choice, and one that has implications about how the people who are affected should be treated. The situation continues to evolve, with European and world sympathies tilting now one way and now the other–in favor of sheltering the affected people after a tragedy like the widely-publicized drowning of a Syrian toddler, and in opposition to it after the despicable assaults on women by crowds of migrant men last New Year’s Eve in Germany. Certainly the situation will have long-range effects on Europe. I began this post by talking about one of the ways in which the world sucks right now–the existence of this crisis. One of the ways in which the world doesn’t suck right now is that many people in many countries have been very active in welcoming refugees, providing real support services for them, and generally acting like decent human beings. This will get worked out.
It’s raining, it’s pouring, the old man is snoring,
He went to bed and he bumped his head and he didn’t get up ’til the morning.
Adam Gopnik once described Paris as “a scowling gray universe, relieved by pastry.” The “gray” part comes from the observation that it’s very often cloudy here. Actually, one of the things that I love about Paris is that it rains here. In the US, I live in a very sunny, dry part of the country–300 days of sunshine a year. However, I grew up in a very, very wet part of the country, and I miss that. So, coming to Paris in March and seeing flowers bursting from wet earth on my walk to work through the forest is a real treat.
Being from a very wet place, I have a large vocabulary for talking about rain in English. Here are some examples of relevant verbs. These are all impersonal verbs, used with what linguists call a pleonastic pronoun, i.e. the it of it’s raining:
to rain: the default verb.
to pour: to rain hard–see the children’s song above.
to rain cats and dogs: to rain hard.
to rain/pour buckets: to rain hard.
to mist: to rain very lightly.
to drizzle: to rain, especially if it’s cold. (I’ve seen a couple definitions of this as “to rain lightly.”)
to sprinkle: to rain, especially for a short period of time.
to storm: to rain very hard, often with thunder and lightning.
In which an encounter with a crazy guy on the subway leads to a statistical analysis of French adverbs.
One evening I was riding the metro home when a guy got into the car with some used books to sell. A man sitting across the aisle from me asked to see them. He flipped through one of them, then took a pen out of his jacket pocket and began circling words–in this book that the other guy was trying to sell. Are you going to buy that?, the would-be bookseller asked the guy with the pen. They exchanged words–the bookseller was not happy about having his books marked up. The bookseller said something that Mr. Pen apparently thought was obvious or stupid. Il est fort, lui, he snorted–he’s a sharp one.
The central meaning of fort/forte is “strong,” but it can also be used adverbially. You hear it a lot that way, and I’ve been trying to figure out exactly when you can use it in that way–it’s often the case that there are word combinations that are possible in a language, but that don’t sound right. Rather, there are particular words that are conventionally used in very specific combinations. Violeta Seretan of the University of Geneva gives some examples of English words that are used to describe the magnitude of various nouns. The semantics of each of these is the same, but the words that are typically used are quite different. We talk about big problems, heavy rain… How about injury? (Answer below.) It would certainly be possible to say large problem, but it’s nowhere near as likely, and to me as a native speaker, it sounds odd. I wanted to be able to demonstrate that this corresponds to some actual statistical tendency, not just my intuitions, so I searched the enTenTen corpus, a collection of almost 20 billion words of written English, looking for big problem and large problem. Here are the frequencies:
big problem: occurs 6 times per million words.
large problem: occurs 0.5 times per million words.
Big problem occurs twelve times more often than large problem–the latter is possible, but it’s not really what you would expect to hear from a native speaker. We call things like big problem “collocations”–combinations of words that occur statistically more often than you would expect by chance.
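If you want to check that arithmetic yourself, here’s a minimal Python sketch. The raw counts are hypothetical: they’re back-calculated from the per-million figures above on the assumption of a corpus of exactly 20 billion words, so they are not the actual enTenTen counts, and the function name is made up for illustration.

```python
def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size

# Assumption: "almost 20 billion words" rounded to exactly 20 billion.
CORPUS_SIZE = 20_000_000_000

# Hypothetical raw counts, chosen only to reproduce the figures quoted above.
big_problem_count = 120_000
large_problem_count = 10_000

big_pm = per_million(big_problem_count, CORPUS_SIZE)
large_pm = per_million(large_problem_count, CORPUS_SIZE)
ratio = big_pm / large_pm

print(big_pm, large_pm, ratio)  # 6.0 0.5 12.0
```

Normalizing to a per-million rate is what lets you compare numbers across corpora of different sizes, which matters later when the corpus is 59 million words instead of 20 billion.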
You can find collocation dictionaries for English, and they’re quite useful for second-language learners. I don’t know of any for French, though, or at least not where to find them in the US, which is where I am at the moment. (I’ve seen similar things in Canada.) I additionally want to know how these adverbial uses of fort should be translated into English, so I need a way to figure this kind of thing out for myself.
First step: find a whole lot of French text in some easily searchable form. I started with the French section of EUROPARL–a collection of documents from the European Parliament, translated to/from a wide variety of languages. The French section of EUROPARL contains about 59 million words–so, a whole lot–and you can access it through the Sketch Engine web site–so, easily searchable. A quick search showed me that fort is quite common in that data set:
Once I know that, I know that there will be enough data to calculate the collocations–recall that this is a statistical thing, so you need plenty of data. The Sketch Engine interface gives me a number of options for how to do the calculations (scroll down to get past the screen shot):
…which I show you just so that you’ll see that there are a lot of approaches to doing this. I just went with the defaults.
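One of those approaches is the log of the Dice coefficient, which comes up again in a moment. Here’s a rough Python sketch of it, assuming the usual logDice definition (14 plus the base-2 log of the Dice coefficient). I can’t confirm from the post that this is exactly what the tool computes, and the counts in the example are invented toy numbers, not EUROPARL figures.

```python
import math

def log_dice(f_xy, f_x, f_y):
    """logDice score for a word pair.

    f_xy: how often the two words occur together
    f_x, f_y: how often each word occurs on its own
    Assumes the common definition: 14 + log2(2 * f_xy / (f_x + f_y)).
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

# Invented toy counts, for illustration only.
rare_but_faithful = log_dice(f_xy=50, f_x=100, f_y=100)
common_but_loose = log_dice(f_xy=500, f_x=10_000, f_y=1_000_000)

print(rare_but_faithful, common_but_loose)
```

The point of a score like this is that a pair which co-occurs often only because one of its members is extremely common gets a lower score than a rarer pair whose members almost always show up together.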
The calculations yielded quite a few possibilities. Here are some of them:
If you’re a stickler for data, you might have noticed that the collocations are ordered by the log of the Dice coefficient, which you could think of as a measure of the statistical effect, I guess. I am really looking for the most common collocations involving fort, though, so I’ll reorder by the cooccurrence count, i.e. the raw count of how often the collocations occurred:
Crap–that basically tells me nothing. Why not? Zipf’s Law. Remember that Zipf’s Law tells us not only that most words are pretty rare, but also that some words are really, really common, and in French, that certainly includes de (“of”), et (“and”), une (“a”), and the rest of what we’re seeing here. (Moral of the story: don’t expect the most frequent things in a language to necessarily be the most revealing things in a language.) If I scroll down a bit, though, I see bien on the list. 683 examples of this–a frequency of 10.25 per million words. Bien is often an adjective, which would presumably make fort adverbial in these cases, so we’re on to something now. Let’s check out some of those examples:
So, now I have some cases where it would make sense to use fort, but I want to know how they would correspond to English, too. This requires that I have access to the corresponding English text. No problem–recall that the EUROPARL corpus is multilingual. In particular, it is what is known as a parallel corpus, which means that it contains the same contents in multiple languages, not just similar contents (although that kind of corpus can be useful, too). I searched for the phrase fort bien. Here’s an example of the output:
So, now I have some French/English equivalents for fort bien:
Étant donné les prévisions de la politique structurelle que je connais fort bien… With these forecasts of the structural policy – which I know very well…
…ce que Jean-Pierre Chevènement a fort bien nommé récemment… …referred to recently, and very aptly, by Jean-Pierre Chevènement…
C’est pourquoi, comme l’a déjà fort bien expliqué M. Kalas… Hence, as Mr Karas has stated to his credit…
…je comprends fort bien la préoccupation… … I have a great deal of sympathy for the unease…
Vous savez fort bien que… You know very well that…
…non seulement parce que le président le connaît fort bien… …not only because the President is very familiar with it…
Il est fort bien d’organiser des réunions, mais ce sont les résultats qui comptent. Meetings are all very well, but it is the result that counts.
…ils se tirent fort bien d’affaire. …they are managing really rather well.
…et je les comprends fort bien. …which I fully understand.
Ils les connaissent fort bien et un par un. They recognise each and every one of them very well.
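For the curious, here’s a minimal Python sketch of what searching a parallel corpus amounts to: a list of aligned French/English sentence pairs, filtered on the French side. The three pairs are copied from examples in this post; the function name and the tiny in-memory “corpus” are stand-ins for illustration, and this is certainly not how Sketch Engine does it internally.

```python
import re

# A tiny stand-in for a sentence-aligned parallel corpus: (French, English).
pairs = [
    ("Vous savez fort bien que…", "You know very well that…"),
    ("…et je les comprends fort bien.", "…which I fully understand."),
    ("Nous estimons fort positif que…", "We see it as a very positive sign that…"),
]

def find_aligned(pairs, phrase):
    """Return the (French, English) pairs whose French side contains phrase."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [(fr, en) for fr, en in pairs if pattern.search(fr)]

hits = find_aligned(pairs, "fort bien")
for fr, en in hits:
    print(fr, "=>", en)
```

Because the two sides are aligned sentence by sentence, finding the French phrase hands you the English rendering for free, which is exactly what makes a parallel corpus useful here.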
I’m feeling good about how to use fort bien now, but I want to know about other ways that fort could be used with an adjective. So, I’ll do another search of the parallel corpus (i.e. the matched French and English texts), but this time I’ll just search for fort, and I’ll specify that I want it to be an adverb. Here are some of the results:
Now I have some general examples of how to use fort:
Nous estimons fort positif que… We see it as a very positive sign that…
Le rapporteur constate également fort justement que… The rapporteur has also quite rightly stated that…
Ce que nous faisons maintenant est probablement fort important… What is being done may well be very important…
…l’Union européenne a fort justement octroyé… …the European Union was right to support…
…nous entretenons des relations bilatérales fort satisfaisantes avec… …We have very satisfactory bilateral relations with…
I don’t know every adjective with which it would be OK to use fort, but I know one more than I did when I got out of bed this morning, and I’m cool with that–one less time when I’ll have to use très, which is all that they teach us in school.
A colleague had some observations on this:
On top of being used in collocations, it also marks a style / genre which is somewhat formal or elevated (“soutenu”). This might explain why it remains frequent mostly in collocations and is less frequent (or more marked) in freer combinations. This gives the expression a literary turn or a pretense to a higher register. Both in speech and in writing, it is “soutenu.”
Another native speaker had this to say about it:
“Fort” is used as a synonym of “très”, before adjectives or adverbs. You can use it in about any case, it’s just more elegant than “très”, but not really literary.
The Mr. Pen guy on the subway turned out to be pretty crazy, as far as I could tell. At one point he snapped at my adorable cousin, who happened to be visiting, and I told him to cut it out. This was followed by an initially amusing conversation between him and me that at some point degenerated into a loud tirade on his part. I kept telling him that my French wasn’t that good and I couldn’t understand him, but he just kept going and going. Eventually French people around us began telling him to stop being an asshole and words to that effect, so I assume that it wasn’t very nice, but honestly, I couldn’t tell you. At some point a large and very drunk French guy got on the subway car, and started seriously getting in Mr. Pen’s face–it was clear that this was going to turn violent. Mr. Pen was a very diminutive Haitian man, and I wasn’t going to watch him get the shit beaten out of him no matter how bizarre he was being, so I got involved. The train stopped, Mr. Pen jumped out, and Mr. Drunk Guy launched into an animated discussion with me about American heavy metal, punctuated by snatches of Metallica songs. All in all, an unusual evening on the metro, but not an unpleasant one by any means–just part of life in The Big City, as we say in English.
In the US, politics and judo have some things in common. Here’s some English vocabulary for talking about them.
France is the #2 judo country in the world, after Japan. The population of France is about 66 million people, and about 550,000 of them do judo. (For comparison: the population of the US is about 330 million people, and about 20,000 of them do judo.) The first person I met in France was a diminutive, beautiful woman in her 50s or so who I ran into at a judo practice. She’s nowhere near my size, but can arm-bar me every 7 minutes or so, on average. She’s a great example of French judo: she beats me (over and over) not with strength, but with a subtle, contemplative approach to the sport that relies on imagination and on a deep understanding of how to move in three dimensions and apply basic principles of leverage and physics efficiently–and gently. (Sorta like the famous French diplomacy, I guess.) In judo, we would say that she has a great ground game–the ability to fight on the mat, off your feet, where we use not the throws of standing judo, but arm-bars, chokes, and pins.
The phrase ground game has been in the news quite a bit lately. We often hear about what a great ground game Bernie Sanders has, or about how Trump keeps winning state primaries despite not having a good ground game. In the context of politics, your ground game is how good your campaign is at the very local tasks that require actual personal involvement–particularly, getting your supporters to the polls. A good ground game requires two things.
You have to know who your supporters are.
You have to have engaged, committed volunteers everywhere.
Regarding the first: today, this is mostly a matter of data science. Sasha Issenberg’s book The Victory Lab does a very good job of telling the story of the development of today’s personalized, data-driven politics. Once, politicians and political parties put a lot of effort into trying to convince people to get behind their ideas. Today, it’s generally thought that trying to change people’s minds is expensive and inefficient; on the other hand, getting the people who already support you to actually go to their polling place and vote is relatively inexpensive, and it’s quite effective. In 2008, the Obama campaign was able to develop pretty good guesses about who was going to vote for their candidate (how they did it is really interesting, but somewhat sobering–see the above-mentioned book), and they focussed their get-out-the-vote effort on those people.
Regarding the second: this is the essence of the ground game. Cruz’s win in the Iowa caucuses this nominating cycle was widely attributed to his strong ground game. One of the many, many mysteries of the Republican race for the nomination has been that Trump has done quite well despite not having much of a ground game anywhere.
Many languages have a phenomenon such that nouns belong to groups that affect things about the words with which they occur. French is such a language. You can more or less put French nouns into two groups, as follows:
For one group, the singular definite article (“the”) is le, the singular indefinite article (“a”) is un, the adjective “big” is grand, and the adjective “boring” is ennuyeux.
For the other group, the singular definite article (“the”) is la, the singular indefinite article (“a”) is une, the adjective “big” is grande, and the adjective “boring” is ennuyeuse.
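If it helps to see that as a lookup table, here’s a small Python sketch. The word forms are the ones listed above; the two example nouns (livre, “book,” and table, “table”) and the function name are illustrative additions, and real French agreement has plenty of wrinkles (elision, adjective placement) that this ignores.

```python
# Forms of a few words, keyed by the class of the noun they go with.
AGREEMENT = {
    "masculine": {"the": "le", "a": "un", "big": "grand", "boring": "ennuyeux"},
    "feminine": {"the": "la", "a": "une", "big": "grande", "boring": "ennuyeuse"},
}

# Illustrative nouns (not from the post): the class is a property of the noun.
NOUN_CLASS = {"livre": "masculine", "table": "feminine"}

def agree(word, noun):
    """Pick the form of `word` that matches the class of `noun`."""
    return AGREEMENT[NOUN_CLASS[noun]][word]

def big_thing(noun):
    """Build 'a big <noun>' with the article and adjective agreeing."""
    return f"{agree('a', noun)} {agree('big', noun)} {noun}"

print(big_thing("livre"))
print(big_thing("table"))
```

Notice that nothing about the noun itself changes; its class only shows up in the forms of the words around it.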
When a language has two or three of these classes, the language is typically said to have a gender system. So, French has two of these classes, and we call the nouns in these classes masculine and feminine nouns. German has three of these classes, and we call them masculine, feminine, and neuter nouns. Lithuanian Yiddish has three of these classes, but most other dialects of Yiddish have two. English has basically no such classes–we have words that are sort of intrinsically masculine, like father, and words that are sort of intrinsically feminine, like mother, but since they don’t affect the forms of the words with which they appear (you say the mother and the father, with no differences in the word the), linguists wouldn’t call it a gender system. On the other hand, Old English (spoken from around 450 to around 1150) had three noun classes. (Look at the different forms of the word the in these three Old English nouns, taken from Wikipedia: sēo sunne (“the sun”), se mōna (“the moon”), þæt wīf (“the woman/wife”).) A language on which I did research in graduate school only has two such classes, but referring to anything by the wrong class is a way to insult it. It doesn’t matter which of the two classes it belongs to–if you use the wrong modifiers, it’s an insult. I was terrified to ever open my mouth, and don’t speak it at all. (My son often played in the corner of the office while I collected data. It’s quite amazing to hear dô páráná come–correctly–out of the mouth of that blond-haired, blue-eyed, video game addict today.)
There’s nothing magic about the numbers two and three–languages can have more or less arbitrary numbers of these classes. We tend to refer to them as genders when there are just two or three, and to refer to them as noun classes when there are more than that, but there is no difference between what we call the gender system in French, with two noun classes, and what we call the noun class system in Shona, which has twenty noun classes. It’s a difference of numbers, not of kind–in both cases, you have this more-or-less arbitrary slicing up of the nominal lexicon (noun vocabulary) of the language into groups of nouns that affect the forms of articles, adjectives, etc. in various and sundry ways.
I say “various and sundry” because gender/noun class systems can work out in lots of different ways. In Semitic languages, verbs agree with the gender of their subjects. For example, he studied is lamad, while she studied is lamda. In the first case, it’s the pattern of having the two a-a vowels that makes it the masculine form of the verb, and in the second case, it’s the a in the middle, the md coming together (versus mad in the masculine form), and the a at the end that make it feminine. Different verbs, tenses, and numbers (that is, singular versus plural) have different forms, so don’t get excited about the fact that there’s an a at the end of the third person singular past tense feminine form of the verb–it’s not that way all the time. For example, he goes is holekh, while she goes is holekhet.
Does having classes of nouns in your language–or not having them–make a culture more or less sexist? I only have anecdotes here, and–counter to what you might hear–anecdote is not the singular of data. For what it’s worth: my undergraduate advisor always used to point out that Hebrew is about as gendered of a language as you can get (see above–even verbs have to have gender in Hebrew), and probably close to everyone in Israel speaks either Hebrew or Arabic (which has the identical system), but Israel was the fourth country in the world to elect a woman as the head of state. In contrast, Finnish has no gender whatsoever, but has never had a female head of state, as far as I know. (This is not to imply anything bad about Finland–there are a bazillion countries with genderless languages that have never had a female head of state. I don’t know why my professor picked on the Finns.)
English note concerning the title of this post: using the word got (or gots) as the present tense of the verb to have is a social marker of class–that’s “class” in the sense of couche sociale. Lower class, specifically. Other speakers might use it for humorous effect. “To have class” means something like to have elegance of style or manners. So, if you say you got no class, man, part of the flavor of the expression comes from the fact that you’re using a “low (social) class” verb form to talk about “class” in the sense of elegance.
English has a number of words that are made of numbers. Here are some of them.
No French in this post–this is all about obscure English vocabulary that you can bet Zipf’s Law will bring into your life sooner or later.
I recently wrote a post about what we call in the US 3x5s (pronounced “three by fives”), and that got me thinking about words in English that are formed in similar ways. There are a number of them, and if you can use them, it will definitely add an American flavor to your English.
2×4 (pronounced “two by four”): a kind of wooden board that measures about two inches by four inches, and about six feet in length. In America, 2x4s are commonly used in the construction of homes and the like.
4×4 (pronounced “four by four”): a kind of truck or similar vehicle that can provide power to all four wheels simultaneously. (More traditionally, cars would only power the front axle or the back axle.)
8×10 (pronounced “eight by ten”): a particular size of photographic print, measuring eight inches by ten inches.
24/7 (pronounced “twenty-four seven”): absolutely constantly. 24 hours in a day, and seven days in a week, so 24/7 is all the time.
7-11 (pronounced “seven eleven”): originally the name of a convenience store that was once open from 7 AM to 11 PM. Today you can use it to refer to pretty much any 24-hour convenience store, I think.
69 (pronounced “sixty-nine”): a verb referring to a specific sexual act. Consider the relationship between the numbers 6 and 9 and you can probably figure it out for yourself, which will save me from feeling like I have to put a trigger warning on a blog post about numbers.
soixante-neuf: same thing. Yes, we can use it in English, and if you’re sleeping with people who are educated enough to know what it means, then you probably already know what it means yourself.
10-4 (pronounced “ten four”): I heard you, I got your message; there’s also some implication that you agree. Often heard in the contexts 10-4, good buddy, or that’s a big 10-4, or, if you wear a cap with the name of a feed company on the front, or just listened to a lot of AM radio in the 1970s, that’s a big 10-4, good buddy. That’s how we said it when I was a little tyke, at any rate. OK: a teenager.
.45 (pronounced “forty-five”): a kind of pistol, known for its “stopping power”–that is, if someone is charging you, a shot from one of these things will keep them from moving forward if it hits them. The projectiles are short, but very big around, and heavy. It’s not very accurate at a long distance, but at a short distance, it’s very effective at what it’s intended for.
.32 (pronounced “thirty-two”): another kind of pistol. They’re also not very accurate, as they usually have a pretty short barrel, and they really don’t have any use other than killing people at close range, as far as I know.
.36 (pronounced “thirty-six”): another kind of pistol. They sometimes have longer barrels, in which case you can use them to kill people at close range, and also a bit further away.
.25 (pronounced “twenty-five”): another kind of pistol. I don’t recall ever actually seeing one.
.22 (pronounced “twenty-two”): another kind of pistol, and also a small-caliber rifle. The bullet is quite small, and unless you get shot in some place really vital–head, heart, an artery–it may not do that much damage. On the other hand, I did once see a young guy who shot himself in the head with one of these–he didn’t die immediately, but he sure as hell died eventually. Again, there is not much that’s actually useful about these things…
You can use the number-by-number construction productively (in the linguistic sense of the word productive, which means that the construction can be used to produce new things) to talk about the sizes of pieces of wood in general. However, if you’re not talking about 2x4s specifically, the context needs to be clear if you want to be understood. The assumption is that the pieces of wood in question will be 6 feet long unless otherwise specified, so if you ask for a 3×6 (pronounced “three by six”) in a lumber yard, people will know what size board to give you. For a version of this kind of construction that is also productive, although quite obscure, see this entry from the Urban Dictionary for a description of how it is used to refer to pairs of male characters in the Gundam Wing anime series.
Having gotten the basics out of the way, here’s a useful expression: to get/be hit with a 2×4. You know what a 2×4 is by now–a solid wooden board. If someone smacks you upside the head with it, you will have been smacked really, really hard. To get/be hit with a 2×4 means to be stunned by something that you’ve learnt.
Here are some real-life examples. This woman wrote on her blog about needing to be forced to face the facts that she was (a) eating too much, and (b) not exercising enough:
This blog post suggests that if you “get hit with a 2×4,” the answer is to just surrender to whatever God’s wishes for you might happen to be:
Here’s a story from the Washington Post (a reputable and very famous American newspaper known for its coverage of national politics) about Chris Christie, describing the experience of the explosion of the Bridgegate scandal during the period before his unsuccessful run for the Republican presidential nomination:
Here, a bigwig of the investment world talks about the effects of getting two pieces of bad news about the financial world, one right after the other:
So, now you see why it would, in fact, probably be better to get shot in the leg with one of those tiny little .22s than to get hit in the head with a 2×4. Native speakers of English (or non-native speakers who just like to collect funny words), do you have any other all-number words to add to the list?