How we’re sounding stupid today: noun phrases

Screenshot 2016-06-27 19.12.23
Picture source: screen shot of zombilingo.org.

Like I always say: it’s the little things that get you.  One of the things that I love about France is that people feel totally free to correct each other’s language, and they certainly feel free to correct mine.  (Truly, I love this–it’s such a help in trying to learn the language.)  I gave a talk in French the other day.  Descriptivism versus prescriptivism, duality of patterning, how even very small choices in building computer programs for processing human languages can imply stances on very contentious issues in linguistics–all that kind of good stuff.  I had memorized the relevant French vocabulary–la référentialité (referentiality), l’épistémologie (epistemology), inné (innate).  I was about as ready as I could be.

Not ready enough, it turns out.  One of the folks in the audience came up to me afterwards to explain a not-very-subtle word choice error that I had made.  My mistake: I said “phrase” wrong.  I was talking about groups of words smaller than a sentence, and used the French word la phrase.  Not okay!  La phrase means “sentence.”  If you want to talk about phrases, you need another word.  What that word is–that’s not so clear.

Why would one want to talk about phrases, anyway?  One of Chomsky’s contributions to linguistics that didn’t suck was demonstrating that syntax isn’t about relationships between words–rather, it’s about relationships between groups of words.  Matt Willsey gives a nice example that illustrates how this works.  In English, one could say:

  • If x, then y. 
  • Either x, or y.

You can embed these:

  • If (either x or y), then (either x or y).

You can embed things in those, too:

  • If either (a or b or c or d), then either (e and f or g and h) or (i and j but k and l).

The point: you get nowhere trying to explain this kind of hierarchical structure by means of the behavior of words.  On the other hand, you can get very far by discussing this kind of hierarchical structure in terms of groups of words.
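For the programmers in the audience, here is a minimal sketch of why this takes structure rather than word-to-word links.  It is a toy, not a parser: the function name well_nested and the tiny vocabulary are made up for illustration.  It pairs each then with its if and each or with its either, and to do that it has to remember, on a stack, which group of words is still open.

```python
# Toy illustration only: the names and the four-word vocabulary are assumptions.
CLOSERS = {"then": "if", "or": "either"}  # closing word -> the opener it pairs with


def well_nested(tokens):
    """Return True if every 'if' is closed by 'then' and every 'either' by 'or',
    with the pairs properly nested inside one another."""
    open_groups = []  # stack of openers that are still waiting for their closer
    for token in tokens:
        if token in ("if", "either"):
            open_groups.append(token)
        elif token in CLOSERS:
            # The closer must match the most recently opened group.
            if not open_groups or open_groups[-1] != CLOSERS[token]:
                return False
            open_groups.pop()
    return not open_groups


embedded = "if either x or y then either x or y".split()
scrambled = "if either x then y or z".split()
```

The stack is the interesting part: which word a then answers to depends on what has been opened and closed in between, not on the word sitting next to it.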

In linguistics, we tend to refer to these groups of words as phrases.  English has noun phrases, verb phrases, and prepositional phrases–maybe more, but at least these.  (At some level, a sentence is just another kind of phrase, but we do tend to maintain some notion of “sentence.”)

Phrases are typically thought of as having something called a head.  From a syntactic point of view, you could think of the head of the phrase as the thing that determines whether the phrase behaves as a noun, a verb, or whatever.  In the following phrases, I’ve marked the head in capitals:

  • those BANANAS from the corner store
  • this BANANA that I got from my neighbors

To see why I say that the head determines how the phrase behaves, consider these sentences:

  • Those bananas from the corner store are almost rotten.
  • This banana that I got from my neighbors is just about ready for the trash can.

Prior to Chomsky, the most fully elaborated theory of how syntax works was that it was about connections between sequences of words.  What you can’t explain with that kind of model is how you can have sequences like the corner store are or my neighbors is.  To account for sequences like that, you have to have some notion of structure that can let you represent the fact that it’s the head of a group of words that controls whether the verb is singular (is) or plural (are).
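A rough sketch of the difference, in code.  Everything here is an assumption made for the example: the two small lexicons, the function names, and the idea that somebody has already told us which word is the head (which is exactly the hard part).  It compares two ways of choosing between is and are: look at the word right before the verb, or look at the head of the phrase.

```python
# Toy lexicons, invented for this example; not a real linguistic resource.
NOUN_NUMBER = {"bananas": "plural", "banana": "singular",
               "store": "singular", "neighbors": "plural"}
VERB_NUMBER = {"is": "singular", "are": "plural"}


def agrees_with_adjacent_word(phrase, verb):
    """Word-sequence model: the verb agrees with the word right before it."""
    return NOUN_NUMBER[phrase[-1]] == VERB_NUMBER[verb]


def agrees_with_head(phrase, head_index, verb):
    """Phrase-structure model: the verb agrees with the head of the phrase."""
    return NOUN_NUMBER[phrase[head_index]] == VERB_NUMBER[verb]


corner_store = "those bananas from the corner store".split()
neighbors = "this banana that I got from my neighbors".split()
HEAD = 1  # in both phrases the head noun is the second word
```

With the head marked, both example sentences come out as grammatical; with the adjacent-word model, both come out as errors, which is the wrong answer.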

So, how do you talk about “phrases” in French?  That’s where my problem came up, and how I ended up sounding stupid.  One of my ways of trying to find acceptable technical terminology is to look things up on Wikipedia in English, and then follow the link to the corresponding French-language page.  No love: there’s an English-language page for noun phrase, but no corresponding French page.  Around the lab, some of the students call them phrases–phrase nominale, phrase verbale, etc.  The issue: la phrase is typically used to refer to a sentence.  When I gave my talk, I used the word la phrase to mean “phrase,” as some folks do around the lab.  It didn’t go over well.

So, what do you call a phrase in French?  Here are some options that I’ve found.  The one that has the most support in terms of the number of places where I found it used is one that I have never actually heard!

  • le groupe nominal/les groupes nominaux (Linguee.fr)
  • la locution nominale (Linguee.fr)
  • le syntagme nominal (Linguee.fr; Denis Roycourt’s Noam Chomsky: une théorie générative du langage, in Le langage: nature, histoire et usage, edited by Jean-François Dortier; Maurice Pergnier’s Le mot)

I even came across this, in Maurice Pergnier’s Le mot:

C’est également avec ce sens qu’on rencontre le terme [syntagme] dans les traductions françaises des ouvrages de Chomsky, pour traduire le mot anglais “phrase” (Noun-Phrase; Verb-Phrase = syntagme nominal; syntagme verbal). 

Pergnier goes on to add: Il faut noter cependant que, pour cette…école, le syntagme (angl. “phrase”) ne se définit pas seulement comme ensemble d’unités minimales, il se définit surtout comme partie de phrase, puisqu’il est dégagé par découpage de la phrase (“sentence”) selon la structure arborescente.

So, we have a very explicit contrast between le syntagme (English “phrase”) and la phrase (English “sentence”).

Now that we know how to talk about phrases, in French and otherwise: getting a computer to find the heads of phrases can be a lot harder than it is for humans to do it.  There’s a very cool web site that lets people play a game that’s designed to create data to be used to help computers learn for themselves how to find the heads of phrases in French.  It’s called Zombi Lingo: zombie, ’cause you have to find heads, and zombies like to eat brains.  (Clearly this is a pre-Walking-Dead conception of what it means to be a zombie.)  Check it out at this link–it’s quite fun.

So, yeah–I gave a talk in which I explained duality of patterning, but screwed up the word for “phrase.”  Oh, well–as Jigoro Kano, the founder of judo, would have put it: I got valuable insight into what I need to work on.

Incidentally, here are some details on some of the 85 gun deaths in the United States in the past 72 hours:

  • 3 people in one incident, Marion County, Oregon (source here)
  • 1 church deacon in Shelby County, Tennessee (source here)
  • 1 person in Houston, Texas (source here)
  • 1 person in San Antonio, Texas (source here)

I really don’t have the stomach to go through all 85 of them–sigh…  72 hours, 85 deaths…

 

Compound nouns: why my kid said friendgirl instead of girlfriend

The errors of a child learning their native language can be tremendously interesting.

French knife vocabulary (picture).

When my kid was about four years old, he went through a period where he switched the orders of certain kinds of words.  It wasn’t random–this happened only with a particular kind of word formed by putting two nouns together.  For example, he would say:

  • light kitchen instead of “kitchen light”
  • friendgirl instead of “girlfriend”

On the other hand, if there were a noun preceded by an adjective, he got the order right:

  • big kitchen
  • mean girl

The phenomenon has some implications for theories of how children learn language.  In particular, it’s difficult to give a simple behaviorist explanation for this phenomenon, where the kid gets exposed to stimuli, repeats them, and gets reinforced for producing them correctly: to my knowledge, the kid was never exposed to things like friendgirl.  There are also interesting things about his pronunciation of these things on a smaller scale, though, and in particular, how we make compounds–read on, if you want to know more.

One of the most difficult problems in getting a computer to understand language is understanding compound nouns.  These are nouns that are made up of two or more words in a sequence.  The toughest ones can be compounds where the words that make up the compound are both nouns. For example, in English:

  • school bus
  • kitchen cupboard
  • fire engine

I’ve given you examples where the two nouns are written with a space between them, but they might also be spelt with a hyphen, or without a space.  For example:

  • gunboat (no space)
  • timesheet (no space)
  • rainbow (no space)
  • gun-carriage (hyphen)
  • train-spotting (hyphen, and yes, you are allowed to argue about whether or not spotting is a noun)

From a theoretical perspective, there isn’t a distinction between these–they’re all compound nouns.  From the point of view of writing a computer program that deals with language, we would tend to treat the ones that are written with a hyphen or with no space as single words that don’t necessarily get analyzed further, but the ones written with a space usually need special treatment.  (In fact, amongst people who do natural language processing, there’s a whole field of research concerning what are called multi-word expressions.)

From both a theoretical and a practical perspective, the big question about compound nouns is: how can you describe, understand, and get a computer to deal with the different kinds of relationships that can exist between the nouns?  It’s not a random thing–languages tend to exploit particular kinds of relationships in compounds.  Even describing these things from the perspective of theoretical linguistics is tough, though, separately from the practical problem of getting a computer program to process them.  A classic English example (due, I believe, to the recently departed linguist Chuck Fillmore) is the names for different kinds of knives in English.

  • bread knife: a knife for cutting bread
  • butter knife: a knife for spreading butter
  • pocket knife: a knife that is carried in a pocket
  • butcher knife: a knife that is used by a butcher
  • palette knife: a knife that is shaped like a palette
  • utility knife: a knife that is used in food preparation
  • paring knife: a knife that is used for paring
  • steak knife: a knife that is used for cutting steak
  • boning knife: a knife that is used to trim meat from a bone
  • boot knife: a knife that’s meant to be carried on or in a boot

Just with this partial list, we can see some patterns of semantic relationships between the nouns in the compound:

  • intended material: bread knife, butter knife, steak knife
  • used by: butcher knife
  • used for: paring knife, boning knife
  • carried in: pocket knife, boot knife
  • shaped as: palette knife
dog bones 1003118_10201602413728925_39172732_n
Dog bones at a Hungarian butcher shop in Cleveland, Ohio. Picture source: me.

How should we classify utility knife?  Or dog bone?  I don’t know.  As I said, this is difficult–it’s not like this is something that they teach you in linguistics grad school.  And, do you get to just make these kinds of relationships up on an ad hoc basis?  If so, you’ve got descriptions that couldn’t possibly be shown to be wrong, and from a scientific point of view, that’s bad–your theories need to be testable, and falsifiable.  (Generally we assume that we can’t prove anything, but we do try to construct theories in such a way that if they’re wrong, in principle we should be able to demonstrate that.)  Some people have proposed limited sets of relationships that they hope can capture all such compound nouns–for example, the Generative Lexicon theory of James Pustejovsky.  It’s not clear that all of the issues that are involved in this are resolved, though.
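To make the ad hoc problem concrete, here is what the little classification of knives looks like as a lookup table.  This is only a sketch: the dictionary copies the partial list from this post, and the names RELATIONS and relation_of are invented for the example.  It has nothing to do with how Generative Lexicon theory actually represents these things.

```python
# The hand-made relation inventory from this post; not a theory-backed one.
RELATIONS = {
    "intended material": ["bread knife", "butter knife", "steak knife"],
    "used by": ["butcher knife"],
    "used for": ["paring knife", "boning knife"],
    "carried in": ["pocket knife", "boot knife"],
    "shaped as": ["palette knife"],
}


def relation_of(compound):
    """Return the relation listed for a compound, or None if it is unclassified."""
    for relation, compounds in RELATIONS.items():
        if compound in compounds:
            return relation
    return None
```

Anything that isn't already in the table, like utility knife or dog bone, comes back as None, and the only way to fix that is to add another entry or invent another relation by hand.  That is the unfalsifiability worry in miniature.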

Rather than this kind of noun-noun compound, French generally has nouns modified by prepositional phrases.  That is, you have the noun, then a preposition, and then another noun.  For example, compare these English and French nouns:

  • railroad (rail + road): chemin de fer
  • windmill: moulin à vent
  • wine glass: verre à vin
  • goods transport: transport de marchandises

For more examples, see the picture in this post, which shows the vocabulary for a variety of kinds of knives in French.

It’s not the case that all French nouns of this sort follow the prepositional phrase pattern–for example, we have homme grenouille, “frogman.”  But, the pattern with the prepositional phrase is much more common.  Having said that: one of the biggest mysteries of French for me is how you know when the preposition will be de versus à.  Is there some principle that would let me know that it’s a boîte à gants (glovebox) and a cuillère à café (coffee spoon), but an animal de compagnie (pet) and a crème de cacao?  A boîte à bijoux (jewelry box), but a boîte d’allumettes (matchbox)?  A boîte à chaussures (shoebox), but a boîte de nuit (nightclub)?  I have no clue.

Some details of compound nouns in English: the pronunciation of these things is different from phrases with adjectives.  In general, in a compound noun, you’ll have the stress on the first noun, e.g.:

  • chef’s knife is pronounced CHEF’S knife, while David’s knife would usually be pronounced with equal stress on both words.
  • coffee spoon is pronounced COFFEE spoon, while yellow spoon would be pronounced with stress on both words.
  • beat box is pronounced BEAT box, while big box would be pronounced with stress on both words.

Some details of compound nouns in French: I have no clue how to pluralize these things, and I’m not sure that all French people do, either.  Here’s what the Wikipedia page on French compound nouns has to say on the topic.  It breaks the compounds down to what they’re made up of: a noun plus a  noun, a verb plus a noun, a noun plus a verb, etc.:

  • noun + noun: pluralize both.  Example: oiseau-mouche, oiseaux-mouches (hummingbird).  Exception: I don’t understand the Wikipedia explanation for this, but sometimes you only pluralize the first noun: des chefs-d’œuvre (masterpiece), des arcs-en-ciel (rainbow).
  • verb + noun: plural only at the end.  Example: cure-dent, cure-dents.  Exception: I don’t understand the Wikipedia explanation for this, either, but sometimes you don’t mark the plural at all: des chasse-neige (snowplow) (= chasser la neige, devenu variable dans l’orthographe de 1990), des trompe-l’œil… (direct quote from Wikipedia)
  • adjective + noun: pluralize both.  Example: la basse-cour, des basses-cours (farmyard; chickens and rabbits; outer courtyard).
  • verb + verb: don’t mark the plural at all.  Example: des garde-manger (pantry).
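The four default rules above can be written down as a small table.  This is a sketch under loud assumptions: naive_plural is a deliberately crude stand-in for real French plural formation (it only knows “add s,” “add x after eau,” and “leave s/x/z alone”), the pattern labels are mine, and none of the exceptions mentioned above (chefs-d’œuvre, chasse-neige, and so on) are handled.

```python
def naive_plural(word):
    """Crude plural: an assumption for this example, not real French morphology."""
    if word.endswith(("s", "x", "z")):
        return word            # already looks plural; leave it unchanged
    if word.endswith("eau"):
        return word + "x"      # oiseau -> oiseaux
    return word + "s"


# For each pattern: should the first part be pluralized?  The second part?
PLURALIZE_PARTS = {
    "noun+noun": (True, True),
    "verb+noun": (False, True),
    "adjective+noun": (True, True),
    "verb+verb": (False, False),
}


def pluralize_compound(compound, pattern):
    """Apply the default rule for a two-part hyphenated compound."""
    first, second = compound.split("-")
    mark_first, mark_second = PLURALIZE_PARTS[pattern]
    if mark_first:
        first = naive_plural(first)
    if mark_second:
        second = naive_plural(second)
    return f"{first}-{second}"
```

Note that the function has to be told the pattern; working out that cure is a verb and basse is an adjective is the part that actually requires knowing French.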

If you’d like to know more about the Generative Lexicon theory and how it accounts for these kinds of relationships between nouns, but don’t feel like you want to tackle the primary sources (I have a PhD in linguistics and I’ve never been able to finish working my way through the last chapter), there’s a book called Generative Lexicon theory: A guide, by James Pustejovsky and Elisabetta Jezek, coming out. For a detailed discussion of relationships in this kind of noun in French and Italian, see this paper by Pierrette Bouillon, Elisabetta Jezek, Chiara Melloni, and Aurélie Picton. (I got some of the examples in this post from there.)

So, back to my poor kid: why friendgirl and light kitchen, but mean girl and big kitchen?  He seems to have come up with some conception of there being a difference between the compound nouns and a sequence of an adjective and a noun.  Remember that he was maybe 4 years old, so no one taught him this.  As is characteristic of kids learning their native language(s), he came up with a hypothesis about how to produce the difference between these things, and what he came up with was an ordering difference for the compound nouns.  So: don’t freak out if your kid comes up with some weird things in the language department, and be aware that it’s mostly not worth trying to correct them–it’s not like they’re consciously aware of these “rules,” and nothing that you can say to them is going to change them.  However: they’ll figure it out.  Keep Calm And Keep Talking.

Some French vocabulary on the topic:

  • le mot composé: compound word

It’s raining, it’s pouring, the old man is snoring: how to talk about rain in English and French

How to talk about rain in English and French.

It’s raining, it’s pouring, the old man is snoring,

He went to bed and he bumped his head and he didn’t get up ’til the morning.

–Children’s song

Adam Gopnik once described Paris as “a scowling gray universe, relieved by pastry.”  The “gray” part comes from the observation that it’s very often cloudy here.  Actually, one of the things that I love about Paris is that it rains here.  In the US, I live in a very sunny, dry part of the country–300 days of sunshine a year.  However, I grew up in a very, very wet part of the country, and I miss that.  So, coming to Paris in March and seeing flowers bursting from wet earth on my walk to work through the forest is a real treat.

Being from a very wet place, I have a large vocabulary for talking about rain in English.  Here are some examples of relevant verbs.  These are all impersonal verbs, using what linguists call a pleonastic pronoun, i.e. it:

  • to rain: the default verb.
  • to pour: to rain hard–see the children’s song above.
  • to rain cats and dogs: to rain hard.
  • to rain/pour buckets: to rain hard.
  • to mist: to rain very lightly.
  • to drizzle: to rain, especially if it’s cold.  (I’ve seen a couple definitions of this as “to rain lightly.”)
  • to sprinkle: to rain, especially for a short period of time.
  • to storm: to rain very hard, often with thunder and lightning.

Usage examples:

  • pleuvoir: to rain.  Il pleut: it’s raining.  (I always seem to confuse this with il pleure, “he’s crying.”)
  • Il pleut à verse: it’s pouring.  (Native speakers: can we do the liaison here?, i.e. il pleu tà verse?)
  • Il pleut des cordes: it’s raining cats and dogs, it’s pouring rain.
  • Il tombe des cordes: same thing.
  • Il bruine: it’s misting.
  • Il crachine: it’s sprinkling.
  • y avoir de l’orage: to storm.
  • faire de l’orage: to storm.

I’ve focussed entirely on verbs here.  For lots of nouns and adjectives related to rain in English, see this great post from the EngVid.com web site.

 

 

Parallel corpora, collocations, and crazy people on the Métro

In which an encounter with a crazy guy on the subway leads to a statistical analysis of French adverbs.

One evening I was riding the metro home when a guy got into the car with some used books to sell.  A man sitting across the aisle from me asked to see them.  He flipped through one of them, then took a pen out of his jacket pocket and began circling words–in this book that the other guy was trying to sell.  Are you going to buy that?, the would-be bookseller asked the guy with the pen.  They exchanged words–the bookseller was not happy about having his books marked up.  The bookseller said something that Mr. Pen apparently thought was obvious or stupid.  Il est fort, lui, he snorted–he’s a sharp one. 

The central meaning of fort/forte is “strong,” but it can also be used adverbially.  You hear it a lot that way, and I’ve been trying to figure out exactly when you can use it in that way–it’s often the case that there are word combinations that are possible in a language, but that don’t sound right.  Rather, there are particular words that are conventionally used in very specific combinations.  Violeta Seretan of the University of Geneva gives some examples of English words that are used to describe the magnitude of various nouns.  The semantics of each of these is the same, but the words that are typically used are quite different.  We talk about big problems, heavy rain…  How about injury?  (Answer below.)  It would certainly be possible to say large problem, but it’s nowhere near as likely, and it sounds odd to a native speaker.  I wanted to be able to demonstrate that this corresponds to some actual statistical tendency, not just my intuitions, so I searched the enTenTen corpus, a collection of almost 20 billion words of written English, looking for big problem and large problem.  Here are the frequencies:

  • big problem: occurs 6 times per million words.
  • large problem: occurs 0.5 times per million words.

Big problem occurs twelve times more often than large problem–the latter is possible, but it’s not really what you would expect to hear from a native speaker.  We call things like big problem “collocations”–combinations of words that occur statistically more often than you would expect by chance.
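The arithmetic behind “twelve times more often” is simple enough to sketch.  The two per-million figures are the ones reported above; the function names are mine, and the raw count used in the comment is a made-up number, only there to show how a per-million frequency is normalized.

```python
def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def frequency_ratio(freq_a, freq_b):
    """How many times more frequent the first item is than the second."""
    return freq_a / freq_b


# Frequencies per million words, as reported above for enTenTen.
BIG_PROBLEM = 6
LARGE_PROBLEM = 0.5

ratio = frequency_ratio(BIG_PROBLEM, LARGE_PROBLEM)

# Hypothetical raw count, for illustration only: 120 hits in a
# 20-million-word corpus would also be 6 per million.
example = per_million(120, 20_000_000)
```

Normalizing to a per-million rate is what lets you compare counts across corpora of very different sizes.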

You can find collocation dictionaries for English, and they’re quite useful for second-language learners.  I don’t know of any for French, though, or at least not where to find them in the US, which is where I am at the moment.  (I’ve seen similar things in Canada.)  I additionally want to know how these adverbial uses of fort should be translated into English, so I need a way to figure this kind of thing out for myself.

First step: find a whole lot of French text in some easily searchable form.  I started with the French section of EUROPARL–a collection of documents from the European Parliament, translated to/from a wide variety of languages.  The French section of EUROPARL contains about 59 million words–so, a whole lot–and you can access it through the Sketch Engine web site–so, easily searchable.  A quick search showed me that fort is quite common in that data set:

Screenshot 2016-04-10 13.23.54
Fort shows up 17,130 times in the French section of the EUROPARL corpus–257 times per million words.  That’s pretty frequent.

Once I know that, I know that there will be enough data to calculate the collocations–recall that this is a statistical thing, so you need plenty of data.  The Sketch Engine interface gives me a number of options for how to do the calculations (scroll down to get past the screen shot):

Screenshot 2016-04-10 13.26.44

…which I show you just so that you’ll see that there are a lot of approaches to doing this. I just went with the defaults.

The calculations yielded quite a few possibilities.  Here are some of them:

Screenshot 2016-04-10 13.30.59

If you’re a stickler for data, you might have noticed that the collocations are ordered by the log of the Dice coefficient, which you could think of as a measure of the statistical effect, I guess.  I am really looking for the most common collocations involving fort, though, so I’ll reorder by the cooccurrence count, i.e. the raw count of how often the collocations occurred:
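For the curious, here is a minimal sketch of that score.  As I understand it, the version used by Sketch Engine is logDice, defined as 14 plus the base-2 log of the Dice coefficient; the function names and any example frequencies you plug in are your own, not anything read off the screen shots.

```python
import math


def dice(f_xy, f_x, f_y):
    """Dice coefficient: how much of the two words' total frequency
    is accounted for by the two of them occurring together."""
    return 2 * f_xy / (f_x + f_y)


def log_dice(f_xy, f_x, f_y):
    """logDice: 14 + log2(Dice).  The maximum, 14, is reached only when
    the two words never occur without each other."""
    return 14 + math.log2(dice(f_xy, f_x, f_y))
```

The thing to notice is that the score depends on the frequencies of both words, not just on how often the pair occurs.  A very common collocate gets pulled down, which is why ordering by this score and ordering by raw count give such different lists.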

Screenshot 2016-04-10 13.53.36

Crap–that basically tells me nothing.  Why not?  Zipf’s Law.  Remember that Zipf’s Law tells us not only that most words are pretty rare, but also that some words are really, really common, and in French, that certainly includes de (“of”), et (“and”), une (“a”), and the rest of what we’re seeing here.  (Moral of the story: don’t expect the most frequent things in a language to necessarily be the most revealing things in a language.)  If I scroll down a bit, though, I see bien on the list.  683 examples of this–a frequency of 10.25 per million words.  Bien is often an adjective, which would presumably make fort adverbial in these cases, so we’re on to something now.  Let’s check out some of those examples:
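One common workaround, sketched here in code, is to skip the function words when ranking by raw count.  This is not what I did in Sketch Engine (I just scrolled), and apart from the 683 for bien, every count below is invented for the example; the stop list is a tiny hypothetical one.

```python
from collections import Counter

# A tiny, hypothetical stop list of very frequent French function words.
FUNCTION_WORDS = {"de", "et", "une", "la", "le", "les", "à"}


def content_collocates(counts, stopwords=FUNCTION_WORDS):
    """Rank collocates by raw count, skipping the function words."""
    ranked = Counter(counts).most_common()
    return [(word, count) for word, count in ranked if word not in stopwords]


# Made-up counts, except for bien (683, as reported above).
cooccurrences = {"de": 5000, "et": 3000, "une": 2500, "bien": 683, "peu": 120}
top = content_collocates(cooccurrences)
```

With the function words out of the way, bien floats to the top of the list instead of being buried under de, et, and une.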

Screenshot 2016-04-10 13.58.14.png

So, now I have some cases where it would make sense to use fort, but I want to know how they would correspond to English, too.  This requires that I have access to the corresponding English text.  No problem–recall that the EUROPARL corpus is multilingual.  In particular, it is what is known as a parallel corpus, which means that it contains the same contents in multiple languages, not just similar contents (although that kind of corpus can be useful, too).  I searched for the phrase fort bien.  Here’s an example of the output:

Screenshot 2016-04-10 14.12.24

So, now I have some French/English equivalents for fort bien:

  • Étant donné les prévisions de la politique structurelle – que je connais fort bien: “With these forecasts of the structural policy – which I know very well”
  • ce que Jean-Pierre Chevènement a fort bien nommé récemment…: “referred to recently, and very aptly, by Jean-Pierre Chevènement”
  • C’est pourquoi, comme l’a déjà fort bien expliqué M. Kalas: “Hence, as Mr Karas has stated to his credit”
  • je comprends fort bien la préoccupation…: “I have a great deal of sympathy for the unease”
  • Vous savez fort bien que…: “You know very well that”
  • non seulement parce que le président le connaît fort bien…: “…not only because the President is very familiar with it…”
  • Il est fort bien d’organiser des réunions, mais ce sont les résultats qui comptent.: “Meetings are all very well, but it is the result that counts.”
  • ils se tirent fort bien d’affaire.: “…they are managing really rather well.”
  • et je les comprends fort bien.: “…which I fully understand.”
  • Ils les connaissent fort bien et un par un.: “They recognise each and every one of them very well.”
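If you have a parallel corpus on your own machine rather than behind a web interface, the search itself is not complicated.  Here is a minimal sketch, assuming the corpus has already been aligned into (French, English) sentence pairs; the three pairs are taken from the results in this post, and the names ALIGNED and search_parallel are made up for the example.  This is not how Sketch Engine does it internally.

```python
import re

# A few aligned (French, English) pairs copied from the results in this post.
ALIGNED = [
    ("Vous savez fort bien que", "You know very well that"),
    ("et je les comprends fort bien.", "which I fully understand."),
    ("Nous estimons fort positif que", "We see it as a very positive sign that"),
]


def search_parallel(pairs, french_pattern):
    """Return the aligned pairs whose French side matches the pattern."""
    regex = re.compile(french_pattern, re.IGNORECASE)
    return [(fr, en) for fr, en in pairs if regex.search(fr)]


fort_bien_hits = search_parallel(ALIGNED, r"\bfort bien\b")
fort_hits = search_parallel(ALIGNED, r"\bfort\b")
```

Because the pairs travel together, every French hit comes back with its English counterpart attached, which is the whole point of a parallel corpus.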

I’m feeling good about how to use fort bien now, but I want to know about other ways that fort could be used with an adjective.  So, I’ll do another search of the parallel corpus (i.e. the matched French and English texts), but this time I’ll just search for fort, and I’ll specify that I want it to be an adverb.  Here are some of the results:

Screenshot 2016-04-10 13.39.56

Now I have some general examples of how to use fort:

  • Nous estimons fort positif que: “We see it as a very positive sign that”
  • Le rapporteur constate également fort justement que: “The rapporteur has also quite rightly stated that”
  • Ce que nous faisons maintenant est probablement fort important…: “What is being done may well be very important”
  • …l’Union européenne a fort justement octroyé: “…the European Union was right to support…”
  • nous entretenons des relations bilatérales fort satisfaisantes avec: “…We have very satisfactory bilateral relations with”

I don’t know every adjective with which it would be OK to use fort, but I know one more than I did when I got out of bed this morning, and I’m cool with that–one less time when I’ll have to use très, which is all that they teach us in school.

A colleague had some observations on this:

On top of being used in collocations, it also marks a style / genre which is somewhat formal or elevated (“soutenu”). This might explain why it remains frequent mostly in collocations and is less frequent (or more marked) in freer combinations. This gives the expression a literary turn or a pretense to a higher register.  Both in speech and in writing, it is “soutenu.”

Another native speaker had this to say about it:

“Fort” is used as a synonym of “très”, before adjectives or adverbs . You can use it in about any case, it’s just more elegant than “très”, but not really literary .

The Mr. Pen guy on the subway turned out to be pretty crazy, as far as I could tell.  At one point he snapped at my adorable cousin, who happened to be visiting, and I told him to cut it out.  This was followed by an initially amusing conversation between him and me that at some point degenerated into a loud tirade on his part.  I kept telling him that my French wasn’t that good and I couldn’t understand him, but he just kept going and going.  Eventually French people around us began telling him to stop being an asshole and words to that effect, so I assume that it wasn’t very nice, but honestly, I couldn’t tell you.  At some point a large and very drunk French guy got on the subway car, and started seriously getting in Mr. Pen’s face–it was clear that this was going to turn violent.  Mr. Pen was a very diminutive Haitian man, and I wasn’t going to watch him get the shit beaten out of him no matter how bizarre he was being, so I got involved.  The train stopped, Mr. Pen jumped out, and Mr. Drunk Guy launched into an animated discussion with me about American heavy metal, punctuated by snatches of Metallica songs.  All in all, an unusual evening on the metro, but not an unpleasant one by any means–just part of life in The Big City, as we say in English.

Oh: it’s serious injury.

 

 

The confusion of thinking about the subjunctive

I got an email today with this question:

Screenshot from 2016-03-16 15:41:59
“Do you think that I should bring it to the English-speaking depositor’s attention that their article is a little short?” Picture source: screen shot of my email.

It’s a nice data point regarding something that’s difficult for us English speakers to remember: penser que (“to think that”) takes the indicative in the present affirmative (that is, when you’re making a statement).  However, it takes the subjunctive when it’s used in a question, and when it’s used in a negative.

The Lawless French web site has a succinct description of how it works at this web page.  Using the example of devoir that showed up in the email, we would have this (hopefully one of you native speakers out there will double-check me):

  • Je pense que tu dois…  I think that you should…  (present affirmative, takes indicative dois)
  • Je ne pense pas que tu doives…  I don’t think that you should… (present negative, takes subjunctive doives)…
  • Penses-tu que je doive…  Do you think that I should…  (present interrogative, takes subjunctive doive)
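The rule as stated above fits in a couple of lines of code, which is one way of checking that I’ve understood it.  This is only a sketch of the simple version of the rule for present-tense penser que; the function names and the little table of devoir forms are mine, and it says nothing about the future-tense question.

```python
def mood_after_penser_que(negative=False, question=False):
    """Indicative for a plain statement; subjunctive for a negative or a question."""
    return "subjunctive" if (negative or question) else "indicative"


# The forms of devoir used in the three examples above.
DEVOIR = {
    "indicative": {"je": "dois", "tu": "dois"},
    "subjunctive": {"je": "doive", "tu": "doives"},
}


def devoir_form(subject, negative=False, question=False):
    """Pick the form of devoir for the clause that follows penser que."""
    return DEVOIR[mood_after_penser_que(negative, question)][subject]
```

So devoir_form("tu") gives the dois of Je pense que tu dois, while the negative and interrogative versions give doives and doive.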

I hate it when Anglophones complain about the subjunctive–I think it’s charming.  I bring this up only because it’s a corner of the grammar that puzzled the heck out of me today.  How does this work in the future tense?  I have no clue.  I’d love to be able to say “I don’t think that Trump will win the election”–present tense?  Subjunctive?  No clue.  Native speakers?

How we’re sounding stupid today III

There’s an infinite number of ways to sound stupid in French, but only one right way to say a date in French.

A friend recently wrote to ask if I were in Paris.  I answered:

Screenshot 2016-02-26 07.00.05

She answered thusly:

Screenshot 2016-02-26 07.00.15
“The 8th of March (you can’t not say it).”

We learn two things from this datum:

  • How to say dates
  • How to negate an infinitive with a direct object

Regarding dates: the definite article le always has to be there, as my interlocutor said.  Be careful: you say a date with the masculine definite article le, e.g. le 8 mars, “March 8th”–but, the word “date” itself is feminine–quelle est la date?  “What’s the date?”  For more on how to talk about dates in French, see this page on the Lawless French web site.
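If you ever need a program to write out a date, the same lesson applies: the le is not optional.  A minimal sketch follows, with a made-up function name.  One detail that goes beyond what my friend told me: the first of the month is written 1er (premier) rather than with a plain number, and the sketch handles that too.

```python
MONTHS = ["janvier", "février", "mars", "avril", "mai", "juin", "juillet",
          "août", "septembre", "octobre", "novembre", "décembre"]


def french_date(day, month):
    """Format a day and a month number (1-12) as a French date, article included."""
    # The first of the month uses the ordinal "1er"; every other day is a plain number.
    day_text = "1er" if day == 1 else str(day)
    return f"le {day_text} {MONTHS[month - 1]}"
```

Calling french_date(8, 3) gives the le 8 mars that I should have written in the first place.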

Regarding negating infinitives: the first thing to note is that ne pas goes in front of the infinitive, so you would say ne pas manger “not to eat,” NOT ne manger pas.  Throw in a direct object pronoun and it goes in front of the infinitive, too: ne pas le dire, “not to say it.”

What happens if you have an indirect object pronoun?  A direct object pronoun and an indirect object pronoun?  A direct object pronoun, an indirect object pronoun, and a reflexive verb?  Here are some examples of those, from blogger and native speaker Bea dM:

  • Direct and indirect object pronouns: ne pas le lui donner, “not to give it to him.”  Moral of the story: ne pas precedes all of the object pronouns.
  • Reflexive pronoun and direct object pronoun: ne pas se le répéter, “not to repeat it to himself.”  Moral of the story: ne pas precedes the reflexive pronoun, as well.
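The word order in these examples boils down to a single rule: ne pas first, then whatever pronouns there are, then the infinitive.  Here is that rule as a sketch; the function name is invented, the caller is responsible for giving the pronouns in the right order, and elision (l’ before a vowel) is not handled.

```python
def negate_infinitive(infinitive, pronouns=()):
    """Put 'ne pas' in front of the pronouns, and the pronouns in front of the infinitive."""
    return " ".join(["ne", "pas", *pronouns, infinitive])
```

For example, negate_infinitive("donner", ["le", "lui"]) rebuilds ne pas le lui donner, and negate_infinitive("répéter", ["se", "le"]) rebuilds ne pas se le répéter.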

What’s making us sound stupid today II

linked-data-and-time-modeling-researcher-life-lines-by-events-26-638
Objects and events. Picture source: http://www.slideshare.net/c_kessler/slides-26004724, by Johannes Trame, Carsten Keßler, and Werner Kuhn.

Is an event a thing?  In traditional grammar, they are, at least on the level at which we’re taught traditional grammar in the Anglophone education system.  Events are nouns, and specifically common nouns, as far as I know.  So, we see a similarity between many dogs and many breakdowns, and a difference between many storms and a lot of juice.  Dogs and breakdowns are easily pluralizable and take many, while juice is not pluralizable (it certainly is, but with different meanings) and takes a lot of.

So: in English, events are things.  However, today I ran across some evidence that in French, they are not.  Here’s how it went, and how I sounded stupid.

I’d been trying to work out the details of some flights for the past couple days.  My host in France was the go-between between me and the person booking the travel.  Eventually the person booking the travel sent me some flights, and I wanted to write back to say that they were fine–“that works,” as you might say in English:

Screenshot 2016-02-18 13.41.50
My email.

One of the things that I really, really appreciate about France is that many French people (as you will have read in innumerable books about France) are willing to point out your errors in French.  This is how we improve, and I love it!  Here’s what I got back:

Screenshot 2016-02-18 13.43.38
(Part of) the response.

What’s going on here?  It’s as my interlocutor described it: marcher is something that can refer to a thing, but not to an event.  From a linguist’s perspective, this is fascinating, because it sheds some light on a very fundamental question in the semantics of a language: what are the kinds of distinctions that the language makes?  Or, from a more poetic standpoint: from the point of view of this language, how is the world constructed?  This is a question of ontology, the subject of this post from a couple days ago.  Questions about language can be framed as very concrete questions about statistics, and they can be framed as very abstract questions about philosophy, and both approaches have their uses.  Either way, the answer to the question should come from actual data.

Anyways: that’s how I sounded stupid today.  Or, at least, that’s one way that I sounded stupid!  Oh, and one more thing: the French word for “event” is one of the words affected by the big spelling reform coming up this fall.  It’s going from événement to évènement.  You know what this means: one more word that I’ve been pronouncing incorrectly for the past two years!

Update, March 26th, 2016

I showed this post to my interlocutor.  Here’s his response–an alternative analysis.

Screenshot 2016-02-26 15.00.37

Brigitte gets her hair cut, I say something stupid, and we explore causation in French

One day Brigitte walked into the office looking even more fetching than usual.  T’as coupé les cheveux? I asked–did you cut your hair?  Je me suis fait couper les cheveux, she corrected me–I had my hair cut.  In English, you could say either (as well as some other stuff, like I got my hair cut or I got a haircut or (for a woman, but not a man) I had my hair done, although that’s a bit different, as it could involve things like curling without actually cutting), but in French, if it’s a “caused action,” you have to use the faire construction.

This can actually be a fairly complex construction, in French as well as in English.  Laura Lawless’s page on About.com breaks it down into four possibilities:

  1. The thing that is being acted on is expressed, but not the thing that is doing the action.
  2. The thing that is doing the action is being expressed, but not the thing that is being acted on.
  3. The thing that is doing the action and the thing that is being acted on are both expressed.
  4. The one exceptional expression faire voir, “to let someone see something” or to “show someone something.”

Let’s work through these.  They all have one thing in common: the verb faire will be followed by an infinitive.  So: Je me suis fait couper les cheveux.  If you’re doing Laura Lawless’s first option–only mentioning the thing to which the action will be done–you have this formula: faire + infinitive + object.  For example:

A l’international, Interflora vous permet de faire livrer des fleurs dans plus de 140 pays grâce à un réseau mondial qui regroupe 45 000 artisans fleuristes.

“Internationally, Interflora lets you have flowers delivered in more than 140 countries, thanks to a world-wide network that brings together 45,000 florist artists.”

–Source: http://www.interflora.fr/

Let’s suppose that you’re only going to mention the person (or whatever) you’re going to cause to do the action.  It’s actually the same formula: faire + infinitive + actor.  Google gave me this autocomplete:

Screenshot 2016-01-26 04.26.10
“How to make a teenager study.” Picture source: Google autocomplete screen shot.

What if we want to express both the person (or thing) who we’re going to make do the action, and also the thing to which the action will be done?  Now the formula gets interesting.  (OK: I freely admit that my definition of “interesting” might be a bit different from yours.)  Now we have faire + infinitive + object + à/par + actor.  What that means: we’ll have faire + infinitive, as always.  Then we have the person or thing that is being acted on.  Then we have à or par, followed by the actor.  Let’s see some examples.  I’m going to borrow/steal them from Laura Lawless’s page, because searching for these things is beastly and I prefer not to make up examples myself:

Je fais laver la voiture par/à David.
I’m having David wash the car.

Il fait réparer la machine par/à sa sœur.
He’s having his sister fix the machine.

How about if we have pronouns?  Negation?  Reflexives?  (Advice from a linguist: when you’re learning a new verbal construction, learn the negated, pronominal, and reflexive forms sooner rather than later.)

Here’s a good example of a negation.  Moral of this story: the negative goes on the verb avoir. 

En deux ans et demi mon travaille est irréprochable mais j’ai eu quelques rares absenses dut à une maladie, que je n’ai pas fait attester par un médicin.

For two and a half years my work has been flawless but I have had some occasional absences due to an illness that I didn’t have a doctor vouch for.  Note: absenses should be absences, travaille should be travail, médicin should be médecin, and I think dut should be dû.

Source: http://www.lesocial.fr/forums/19-2147-5-contrat-vacataire-en-mairie

Here’s a good example of a pronominal actor, from this web page on how to apologize via text message.  Moral of the story: the pronoun goes on faire. 

Je ne voulais pas te faire souffrir. S’il te plaît pardonne moi. Je ne sais pas comment te dire que je suis vraiment désolé.

I didn’t want to make you suffer.  Please forgive me.  I don’t know how to tell you that I’m really sorry.

Here’s a good example of a pronominal acted-on.  Moral of the story: it’s a direct object pronoun–in this case, la.

Screenshot 2016-01-26 16.26.11
“Can you send me your corrected composition?  I’d like to have my students read it.” Picture source: Screen shot of an email from my French tutor.

Here’s an example of a situation where you’re going to have something done to or for yourself.  Moral of the story: the reflexive particle (in this case, me) goes on faire.

Arrête-moi, je vais me faire tatouer
Stop me, I’m going to have myself tattooed

Source: http://bescherelletamere.fr/arrete-moi-je-vais-me-faire-tatouer/

The final faire causatif construction that Laura Lawless tells us about is faire voir: let (me) see.  Faites voir!  is the formal form, and fais voir! is the informal form.  They both mean “let me see!”

Can these constructions be ambiguous?  I’ll bet they can.  Consider this example from ibo.org, which I found thanks to the marvelous linguee.fr web site:

Puis-je faire envoyer mon relevé de notes à mon adresse personnelle ?

Can I have my transcript sent to my personal address?

Could that possibly be interpreted as “Can I have my transcript sent by my personal address?”  I’ll have to ask native speakers to jump in here, but it almost certainly can.  A human wouldn’t make this mistake, but a computer has no way to avoid it without some knowledge of the kinds of things that can send things, and the kinds of things that can be sent.  We talked about this kind of issue for computer interpretation of language here.  (I saw a great example of this with Hollande, the current president of France, but can’t find it now.)

There’s plenty more to know about the faire causatif construction–what if you’re having something done to or for yourself?  What if there are two pronouns (“I’m making her do it”)?  How about passives?  If you want to know more about how all of these things work in the faire causatif, do check out Laura Lawless’s page on About.com–it’s really very clear.

 

Zipf’s Law, the Poisson Distribution, reflexive verbs, and terrorism in the age of social media

Screenshot 2016-01-25 23.09.23
“Islamic State dramatizes the macabre will and testament of the terrorists of the Paris attacks.” This is the mettre en scène (non-reflexive) form of the expression.  Picture source: screen shot of http://www.bfmtv.com/societe/dans-une-nouvelle-video-daesh-met-en-scene-les-auteurs-des-attentats-de-paris-946083.html.

By now, we know what goes hand-in-hand with Zipf’s Law: the Poisson Distribution.  Zipf’s Law explains why we run into words that we don’t know in a foreign language every stinking day, and the Poisson Distribution shows how even rare events can come in clusters.  Three rock stars die in one month, and the like.  This morning I ran into two occurrences of an expression that I’d never seen before at all.  There was an interesting twist, in that it’s actually two expressions, one with a regular verb, and one with the reflexive form of the same verb.  Reflexive verbs in French can refer to performing an action on oneself–je me mouche “I blow my nose,” je mouche le bébé “I blow the baby’s nose” (no, I didn’t make that up–look here).  In this case, the meaning of moucher is the same–it’s just a question of whose nose is getting blown.  However, non-reflexive and reflexive verbs can also have different meanings, and that’s the case with the expression that had me going to the dictionary before I even had breakfast this morning.

Mettre is one of those common and rather irregular verbs that shows up in a bazillion expressions.  This one has two forms.  The non-reflexive, mettre en scène, means to stage or to dramatize (definitions from WordReference.com).  The reflexive form, se mettre en scène, is to put on a performance.  I saw it on Twitter today: L’atroce vidéo de l’Etat Islamique montre que nous avons changé d’époque.  L’ultraviolence la plus sordide se met en scène façon Hollywood.  “The atrocious Islamic State video shows that we have changed eras.  The most sordid ultraviolence puts on a show Hollywood-style.”

I wish that I had some clever way to wrap up this discussion, but learning yet more vocabulary by way of terrorism just depresses me.  Yuck.  If you’re interested in theorizing about terrorism and the media in general and social media in particular, try this Wikipedia page for starters.  Sigh…

  • mettre en scène: to stage, to dramatize.
  • se mettre en scène: to put on a performance.
  • la mise en scène: staging, directing; dramatization; stageplay, stage direction.

Screenshot 2016-01-25 23.15.07

Doing computational lexical semantics with your web browser: An approach to using data to build semantic representations

Here’s how you can do computational lexical semantics in the comfort of your own home–and how to talk about it in French.

A lot of my work involves something called lexical semantics.  Lexical semantics is the study of how words mean things.  (That means that there’s some interaction with the question of how sentences mean things, since part of the meaning of a sentence comes from the words that it contains, but in lexical semantics, the focus is on the words and how they contribute to and interact with the semantics and the syntax (the phrasal relationships) of a sentence.)  In particular, I do something called computational lexical semantics.  That means that I use large bodies of data as a crucial part of my work, and I evaluate my work in part by trying to use it as the basis of computer programs.  If that doesn’t work, then I figure that what I’ve done needs to be improved.

My advisor is one of the world experts on computational lexical semantics.  (I won’t name her, since I try to keep this blog anonymous.)  As far as I know, she was the first person to demonstrate that large bodies of naturally-occurring data can, in fact, be used to test theories of lexical semantics.  This was important because semantic theories often haven’t really been tested in any way that would count as a “test” in science, and as we’ve seen in other posts, linguistics is the scientific study of language.  She often says that semantics is not a suitable subject of study for linguistics, since it’s so subjective.  I’m never sure whether or not she’s kidding; regardless of whether she is or not, one of my professional ambitions is to take the subjectivity out of computational lexical semantics.

Part of my approach to that has been to try to develop a systematic methodology for developing semantic representations of words.  In particular, I work with verbs, and with nouns that are derived from those verbs–for example, the verb phosphorylate (I specialize in biomedical language) and the related noun phosphorylation, or the verb receive and the related noun receptor.  (You’ll notice that there are different relationships between the verb and the noun in the two examples–phosphorylation is a noun that refers to the action of the verb, while receptor is a noun that refers to the thing that does the receiving.)

One of the bedrocks of my approach is that I try to base my representations of the meanings of words on data that I didn’t come up with myself.  (Note that I didn’t invent this idea, or any of the other aspects of the approach that I describe here—this is just my recipe for putting them all together.)  I mostly work with scientific journal articles.  There are two parts to what I do:

  1. Coming up with the representation of the meaning of the verb (or noun).
  2. Coming up with examples that let me test the representation, both by providing examples of the different effects that I think that the meaning of the verb has on how it behaves in sentences, and also doing a quick check to make sure that I don’t see any examples that argue against my representation of the meaning of the verb.

This is a pretty iterative, complementary process–I typically start out by looking at a bunch of examples of the verb to get a general sense of how it works, then write up a quick representation of the semantics, and then look for examples more systematically to see if my representation works.  Some of the goals that I keep in mind when I’m searching for these examples are:

  1. I want to know whether or not humans can be the agent of this verb.
  2. I want to be sure to get the full range of prepositional complements, as these can mark a variety of semantic relations.
  3. I want to get a variety of semantic classes as the subject and the object of the verb.
  4. If there is a deverbal nominalization, I want to get that, too.

Knowing whether or not humans can be the agent of the verb is important to me for a number of reasons.  People often question whether or not humans can perform the actions of particular verbs in the biomedical domain.  For example, Wikipedia describes the action of the verb phosphorylate as “the addition of a phosphoryl group (PO₃²⁻) to a molecule.”  That doesn’t sound like something that a human could do, right?  But, you can find sentences like this:

In order to determine the number of phosphorylated sites in human cardiac MyBP-C samples, we phosphorylated the recombinant MyBP-C fragment, C0-C2 (1-453) with PKA using (gamma32)P-ATP up to 3.5 mol Pi/mol C0-C2.

–Source: http://www.ncbi.nlm.nih.gov/pubmed/18573260

I’m trying to represent the semantics of the language, not the semantics of phosphorylation, so I need to take into account all of the data about the language, and that includes this kind of counter-intuitive use of the verb.  Why do we care about humans so much, though?  It’s because humans are the prototypical example of things that act with volition, or as a result of their will–what we call agents in the English-language terminology of linguistics–and agents get represented in a special way in lexical semantics.  So, we need to know if there can be a human agent for these biomedical verbs so that we can know if they can have agents at all, essentially.

Getting examples of the full range of prepositional complements (e.g. phosphorylate at, phosphorylate to) is important to me because different prepositions sometimes mark different aspects of the semantics of the verb.  For example, when we investigate phosphorylate at, we see that the semantics of phosphorylation involve a specific location on a molecule, and when we investigate phosphorylate to, we see that the semantics of phosphorylation involve something becoming something else–the to marks not a location at which the molecule ends up, but what the molecule becomes–like I had it converted to a round-trip ticket, in “normal” English.
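For the programmers in the audience, here is a rough sketch of the basic idea behind collecting prepositions for a verb.  It is only an illustration under some loud assumptions: the example sentences are made up by me (they are not corpus data), the preposition list is a small hand-picked one, and the function name preposition_counts is my own invention.  It counts any listed preposition that shows up after a form of the verb, without checking that the preposition really is a complement of that verb, so a real corpus tool will do much better.

```python
import re
from collections import Counter

# Hypothetical example sentences, invented for illustration (not corpus data).
SENTENCES = [
    "The kinase phosphorylated the protein at serine 10.",
    "We phosphorylated the fragment with PKA.",
    "The enzyme phosphorylates glucose to glucose-6-phosphate.",
    "The receptor is phosphorylated at two sites by the kinase.",
]

# A small, hand-picked preposition list (an assumption, not a complete inventory).
PREPOSITIONS = {"at", "to", "with", "by", "in", "on"}


def preposition_counts(sentences, verb_stem):
    """Count listed prepositions that occur after a form of the verb.

    This is a crude proxy: it does not check that the preposition
    is actually attached to the verb.
    """
    counts = Counter()
    verb = re.compile(r"\b" + re.escape(verb_stem) + r"\w*", re.IGNORECASE)
    for sentence in sentences:
        match = verb.search(sentence)
        if match is None:
            continue  # no form of the verb in this sentence
        following = re.findall(r"[a-z]+", sentence[match.end():].lower())
        counts.update(word for word in following if word in PREPOSITIONS)
    return counts


counts = preposition_counts(SENTENCES, "phosphorylat")
for preposition, n in counts.most_common():
    print(preposition, n)
```

Each preposition that turns up is then a prompt to go and read the actual examples, which is where you find out what semantic relation it marks.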

Getting examples of a variety of semantic classes as the subject and the object of the verb is important to me for two reasons.  One reason is that I’m doing computational lexical semantics, specifically, which, as you might recall, means that I test my semantic representations by trying to use them as the basis of a computer program.  I know that it can be important to know what kinds of things are taking part in the action of a verb in order to know how to interpret both the verb itself, and the sentence that it occurs in.  (Imagine these situations: the author finished the book, the student finished the book, and the goat finished the book.  In the first, this means that the author completed the writing of the book.  In the second, this means that the student completed the reading of the book.  In the third, this means that the goat finished the eating of the book.  Can there be other interpretations of these sentences?  Of course–authors also read, students also write, and in a work of fiction, you could certainly imagine a goat reading a book.  But, none of these are the intuitively obvious interpretations of those sentences, and the reason for that is the expectations that the different subjects–author, student, goat–lend to our interpretations of the sentences.)

The other reason that I want to get a decent range of the types of semantic classes that can be the subjects and the objects of a verb is that I work with ontologists quite a bit.  I find that their models of the domain often don’t objectively seem to have taken full advantage of what the literature of the domain has to say about how those models would need to look if they’re going to be adequate, and collecting examples of lots of different semantic classes taking part in an action is my stab at being helpful.
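To make the author/student/goat point concrete, here is a toy sketch of how a program might use the semantic class of the subject to pick the default reading of finished the book.  Everything in it is hypothetical: the lookup table DEFAULT_ACTIVITY and the function interpret_finished are names that I made up for illustration, and a real system would need far more than a three-entry table.

```python
# Hypothetical lookup table: the activity that each kind of subject
# is expected, by default, to perform on a book.
DEFAULT_ACTIVITY = {
    "author": "writing",
    "student": "reading",
    "goat": "eating",
}


def interpret_finished(subject, obj="book"):
    """Return the default reading of 'the <subject> finished the <obj>'."""
    activity = DEFAULT_ACTIVITY.get(subject)
    if activity is None:
        # Without knowledge about the subject, the program cannot choose.
        return f"the {subject} finished doing something to the {obj}"
    return f"the {subject} finished {activity} the {obj}"


for subject in ("author", "student", "goat", "linguist"):
    print(interpret_finished(subject))
```

The last case is the interesting one: with no knowledge about what linguists typically do to books, the program has nothing to go on, which is exactly why I want to know what kinds of things show up as subjects and objects.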

So, how does one go about doing this with a minimum of subjectivity and a maximum of data-centeredness?  I follow roughly the following steps, pretty much in this order, allowing for some going back and forth between them as I fine-tune things:

  1. Look at what other people have done.  I didn’t always do this, as I wanted to see how different what I came up with was from what other people had come up with, but by now I have a decent feel for what kinds of differences there are likely to be (they’re related both to the different content matter and to the different writing styles that my work and previous work are based on), and I usually start by looking at the representations in the Unified Verb Index.  (Search for the verb of your choice.)
  2. Look at some random examples of the verb in use to get a general sense for how well the representation in the Unified Verb Index matches up with biomedical data.  I use the Sketch Engine interface to do my search for random examples, but you can use Google, specialized textual search tools, or whatever is easy for you.
  3. Look for examples of human agents.  I usually go to Google for this one, as the data that I have uploaded to Sketch Engine doesn’t have very many humans, in general.  My two tricks:
    1. I use Google’s site: operator to search just within the National Library of Medicine’s web site.  That way I can be almost positive that I’ll get examples of how the word is used in the biomedical domain.
    2. The first thing that I try is a Google exact phrase search with we plus the past tense of the verb.  You mark a phrasal search by putting the exact phrase that you’re looking for in double quotes.  So, my search for we phosphorylated looked like this: site:http://www.ncbi.nlm.nih.gov/pubmed/ "we phosphorylated"
  4. Look for the full range of prepositional complements.  I do this with Sketch Engine’s word sketch function.
  5. Look for a variety of semantic classes as the subject and the object of the verb.  Again, I use Sketch Engine’s word sketch function for this.
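If you want to build the human-agent search from step 3 for lots of verbs without typing each one, the query string can be put together in a few lines of Python.  This is a minimal sketch: the function names build_query and google_search_url are my own, you supply the past tense form yourself, and I am assuming the usual https://www.google.com/search?q=... address for a Google search.

```python
from urllib.parse import urlencode

# The site restriction used in the search described above.
PUBMED_SITE = "http://www.ncbi.nlm.nih.gov/pubmed/"


def build_query(phrase, site=PUBMED_SITE):
    """Combine the site: operator with an exact-phrase search in double quotes."""
    return f'site:{site} "{phrase}"'


def google_search_url(query):
    """URL-encode the query so that it can be pasted into a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query("we phosphorylated")
print(query)
print(google_search_url(query))
```

The first line printed is the same search that I typed by hand above; swap in we plus the past tense of any other verb to look for human agents of that verb.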

Then it’s time to see if the semantic representation actually covers everything that I’ve found using the strategy above.  If it does, then we’ll do a larger-scale project of marking up all of the examples of the verb in some large body of data, followed by trying to write a computer program that can make use of the representations and the examples to learn how to identify the semantics of the verb when shown new examples.

Here is some of the vocabulary that you will need if you’re going to talk about this kind of stuff in French.  Here is some data from the French Wikipedia page about semantics.  This will give us some of the vocabulary of semantics in general–then we’ll move on to lexical semantics.

La sémantique est une branche de la linguistique qui étudie les signifiés, ce dont on parle, ce que l’on veut énoncer. Sa branche symétrique, la syntaxe, concerne pour sa part le signifiant, sa forme, sa langue, sa graphie, sa grammaire, etc ; c’est la forme de l’énoncé.

  • la sémantique: semantics
  • le signifié: the “signified,” the concept or mental representation that is the locus of meaning.  (I should point out that it is unfortunately rare for English-speaking linguists to use this old Saussurean terminology, at least in my corner of linguistics.)
  • énoncer: to formulate, state, or pronounce (definition from Wikipedia.org).
  • la syntaxe: syntax.
  • le signifiant: the “signifier,” the spoken (or, in my field, written) form that corresponds to the signifié or “signified.”  (See above about unfortunate tendencies to not use Saussurean terminology.)
  • la graphie: written form (definition from WordReference.com).
  • un énoncé: in linguistics, this usually corresponds to the technical term “utterance,” but since we’re talking specifically about syntax here, it may be better translated as “wording” (see WordReference.com).

Now let’s move on to some vocabulary that’s more specific to lexical semantics.  We’ll take this material from the book Introduction au TALN et l’ingénierie linguistique, by Isabelle Tellier.

La sémantique lexicale est l’étude du sens des “mots” -ou plutôt des morphèmes- d’une langue. Cette définition est en réalité assez problématique, puisque la notion même de “sens” n’a rien d’évidente. Le problème tient précisément à ce que, pour définir le “sens” d’un mot, on recourt en général à d’autres mots. Pourtant, la consultation d’un dictionnaire d’une langue donnée est de bien peu d’utilité si on n’a pas déjà d’un minimum de connaissance de cette langue. Comment échapper à cette “circularité du sens” ? Nous envisageons dans ce chapitre (et le suivant) diverses tentatives qui peuvent être regroupées en trois familles…

  • la sémantique lexicale: lexical semantics.
  • le morphème: morpheme.
  • avoir rien d’évidente: I don’t know!  Can someone help out with this?
  • tenir à qqch:  to come from, stem from, arise from.  (Note: tenir à has a bazillion other meanings–see WordReference.com for this definition and many others.)
  • recourir à: to resort to, appeal to.  (Definition from WordReference.com.)

Want to learn more about the kind of approach to (computational) lexical semantics that I’m talking about here?  Check out my advisor’s book on the subject–Martha Palmer, Daniel Gildea, and Nianwen “Bert” Xue’s Semantic role labeling.  (I’m not telling you which of these people was my advisor–still anonymous!)
