Cautiously optimistic

Graffiti that I saw on my way into a metro station this morning: “Neither Macron nor Le Pen means Le Pen.” Picture source: me.

A very good thing about France: the French don’t really do protest votes.  That’s not to say that we don’t have the ni-nis–those who say that they won’t vote ni for Macron, ni for Le Pen.  A ni-ni might abstain, or voter blanc–submit a blank ballot.  But it’s not exactly a common thing in France.  France has a two-round election process, with multiple candidates in the first round, and only the top two finishers in the second (except in the unlikely event that someone takes more than 50% in the first round).  People sometimes say that you vote in the first round with your heart, and in the second round with your head.

I would say that Americans vote 80% on emotion, and 20% on the basis of their takes on the candidates’ actual policies.  (I don’t except myself from that; “one’s take on something” explained in the English notes below.)  In contrast, I would guess that the French tend to vote 20% on emotion, and 80% on the basis of their takes on the candidates’ actual policies.  I know plenty of people who aren’t at all crazy about Macron’s proposals for the economy, but given a choice between someone about whom they’re not crazy and some Nazi sociopath, of course they’re going to vote for the guy about whom they’re not crazy.  The photo above–some graffiti that I saw as I walked into a metro station this morning–is representative of the opinion of everyone with whom I’ve talked: deciding not to vote for either of them is to vote for Le Pen.

The best lack all conviction, while the worst
Are full of passionate intensity.

–W. B. Yeats, The Second Coming, 1919

The worry of most of the people I know is that Macron is so far ahead of Le Pen in the polls that everyone will assume that he’s going to win, and too many people will decide that they don’t need to vote, and then the Le Pen voters all show up, and boum–Le Pen wins.

I was never really all that struck by Yeats’ poem The Second Coming when I was an English major in college.  We mostly contented ourselves with showing off our knowledge of what a gyre is, and moved on to Beowulf, or Salman Rushdie.  But, ever since Obama got elected and the Republican Party went insane over the sight of a black man in the Oval Office, The Second Coming has become more and more meaningful to me.  With Trump in office, it has gone past “meaningful” towards “frightening”–at the very least, foreboding.

The polls in France close in four hours.  We’ll see what happens.

The Second Coming

W. B. Yeats, 1919

Turning and turning in the widening gyre
The falcon cannot hear the falconer;
Things fall apart; the centre cannot hold;
Mere anarchy is loosed upon the world,
The blood-dimmed tide is loosed, and everywhere
The ceremony of innocence is drowned;
The best lack all conviction, while the worst
Are full of passionate intensity.

Surely some revelation is at hand;
Surely the Second Coming is at hand.
The Second Coming! Hardly are those words out
When a vast image out of Spiritus Mundi
Troubles my sight: somewhere in sands of the desert
A shape with lion body and the head of a man,
A gaze blank and pitiless as the sun,
Is moving its slow thighs, while all about it
Reel shadows of the indignant desert birds.
The darkness drops again; but now I know
That twenty centuries of stony sleep
Were vexed to nightmare by a rocking cradle,
And what rough beast, its hour come round at last,
Slouches towards Bethlehem to be born?


English notes

One’s “take on” something is one’s opinion or analysis of it.  Note that this is entirely different from the verbal idiom to take something/someone on.

  • I want to comment on Trump’s take on the Civil War and Andrew Jackson… but, seriously, it hurts me. READ A BOOK!  (Twitter) (Context: Trump recently said something about a former populist president, Andrew Jackson, that is consistent with either (a) Trump being an uneducated idiot who, in particular, doesn’t know anything about American history, or (b) Trump being a very bad person.)
  • Gr8, some sources just hav a screwed up set of priorities. Who cares about Trump’s take on med marijuana when the health care plan sucks?! (Twitter) (Context: the Republican-controlled House of Representatives just voted to repeal Obamacare and replace it with a disaster.)
  • Trump’s take on Andrew Jackson isn’t astonishing; what is astonishing is that this country elected an ignorant pussy-grabbing Richie Rich.  (Twitter)
  • My take on Trump is that he just wants to be liked by whoever is in front of him, which makes him inconsistent and unreliable.  (Twitter)
  • My take on Trump’s worse-than-worthless briefing to every senator on the North Korean problem.  (Twitter) (Context: here’s a link to the Tweeter’s article on Trump’s attempt to swing the Senate in his favor with respect to whatever crap he’s brewing concerning North Korea.)

How I used it in the post: I would say that Americans vote 80% on emotion, and 20% on the basis of their takes on the candidates’ actual policies.  

Downside/upside

On the downside: I’m in Paris, it’s a quarter to 5 in the morning, I just ate a bag of low-fat popcorn and a stroopwafel that I squirreled away on the San Francisco to Roissy flight because there is no other food in my apartment, and I still can’t go to sleep because I still haven’t even started to write something for my Friday morning French lesson.

On the upside: I’m in Paris, it’s a quarter to 5 in the morning, I just ate a bag of low-fat popcorn and a stroopwafel that I squirreled away on the San Francisco to Roissy flight because there is no other food in my apartment, and I still can’t go to sleep because I still haven’t even started to write something for my Friday morning French lesson.  This is something that no human has ever experienced before.  I am LIVING, damn it!  

(Downside and upside explained in the English notes below.) What I have done since midnight instead of writing something for my Friday morning French lesson:

  1. Investigated the verbs of which the French noun le chapitre can be the subject
  2. Investigated the polysemous nature of the French noun le chapitre–did you know that it can mean something like “one’s say”?  I certainly didn’t…
  3. Learned some new vocabulary items–we all know that the consequence of Zipf’s Law, from the perspective of a second language learner, is that you will be running into new words every single day for the rest of your life… (barbu, biniou, goujon, sentinelle, avitaillé, dessaler, cheville (petite tige qui sert à fixer), pointu)
  4. Started a blog post about the verbs of which le chapitre can be the subject–didn’t finish it
  5. Watched a video of a Patachou song about “la chose” (http://m.ina.fr/video/I07072974), felt guilty for laughing about something so juvenile, laughed anyway
  6. Made a video about what the American English word gonna means (check it out–feedback appreciated)
  7. Started a blog post about reviewing the Methods section of a research paper–didn’t finish it
  8. Watched a lot of French-language cat videos
  9. Wrote a blog post about frame semantics–finished it
  10. Watched way too many Têtes à claques videos (thanks, Courtney of Learn French Avec Moi–those little guys have brought so much happiness into my life!)
  11. Learned some more vocabulary (chronophage, de suite, détrompez-vous, mauvaises langues, encercler, ressasser, en étais resté là, partiels, relâchement, se ressaisir, se la péter, ça tombe bien, invocatrice, quête, case)

5:23 AM… Time to write something for my French lesson…  I wonder what verbs le chapitre can be the subject of…


On the downside/on the upside… The negative aspects of something (Merriam-Webster)/the positive aspects of something

  • On the downside I lost my Healthcare. On the upside, if I get sick, AR14s are still legal and I now have a roll call of where to use them (Twitter–for context, the Republicans in the House of Representatives just voted to do away with Obamacare today)
  • On the downside, i didn’t eat today. On the upside, i made a list of everything i need to do to get myself out of this fucked up situation.  (Twitter)
  • on the upside all my finals are on monday and then i’m done…on the downside ALL MY FINALS ARE ON MONDAY WTF (Twitter)
  • Downside: just voted for . Upside: We know whose political bones to grind to powder in #2018  (Twitter)
  • Downside to being an insomniac: No sleep Upside: I get to see a beautiful sunrise from my roof as the birds sing (Twitter)

Buying or selling, all money leads to Trump: frame-based semantics

How can “buy” and “sell” have similar meanings?

Hi Kevin Zipf,

I was going through Elisabetta’s book (the one I was supposed to return you on Friday and I forgot, sorry!), there is a sentence “Typical lexical structures are, for example: morphological word families, such as book, booking, booklet, bookstore, based on presence of the word book; semantic network such as buy, acquire, purchase, sell, negotiate, pay, own, based on meaning associations; and groups of words with similar syntactic behavior, for example nouns, verbs, or adjectives”.   I was wondering how “buy, acquire, purchase, sell, negotiate, pay, own” can be combined together in a single semantic network?  Semantic network consists of words with similar meanings, right? How can “buy” and “sell” have similar meanings?

Yours,

P


Hi, P,

I LOVE it when you ask me questions like this!

In addition to their use in describing language, frames are useful in the broader context of cognitive science.  For more on how that works, see this post on the subject of linguist and cognitive scientist George Lakoff’s framing-based explanation for Trump’s takeover of the Republican Party in 2016.

Regarding the specific example: this is what is called a “frame.”  The idea is that there are some things:
– two people
– an object
– a quantity of money
You can talk about the relationships between those from different perspectives:
John sold Mary a car for $5,000.
Mary bought a car from John for $5,000.
Mary paid John $5,000 for the car.
Picture source: Agarwal, Apoorv, Daniel Bauer, and Owen Rambow. “Using Frame Semantics in Natural Language Processing.” Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore. Vol. 1929. 2014.

If you think about “semantics” as being a mapping between language and a model of the world, then the model of the world is the same in the case of all three sentences, so in some sense, the meaning is the same in all three cases.  What’s different is whether we talk about it from the perspective of John (sell), Mary (buy), or the quantity of money (pay).  You could argue about whose perspectives these are, and perspective isn’t necessarily even the best word for this, but that’s the sense in which those are related.  To get the others in there, consider, for example, that selling is about a change in ownership; selling involves a previous negotiation between the same two people (John and Mary) concerning the price that will be paid for the car; etc.

Have a Happy Friday,
Kevin Zipf
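
For readers who think better in code, here is a minimal sketch of the buy/sell/pay example from the reply above. It is only an illustration of the idea that one event in the world can be described from several angles: the class name, the role names, and the sentence templates are my own inventions for this post, not part of FrameNet or any other real resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommercialTransaction:
    """One event in the world; the roles stay the same whichever verb describes it."""
    seller: str
    buyer: str
    goods: str
    price: str


# Each verb describes the same event from a different angle.
# (Hypothetical templates, written just for this illustration.)
TEMPLATES = {
    "sell": "{seller} sold {buyer} a {goods} for {price}.",
    "buy": "{buyer} bought a {goods} from {seller} for {price}.",
    "pay": "{buyer} paid {seller} {price} for the {goods}.",
}


def describe(event: CommercialTransaction, verb: str) -> str:
    """Render the same event from the angle that the verb imposes."""
    return TEMPLATES[verb].format(
        seller=event.seller,
        buyer=event.buyer,
        goods=event.goods,
        price=event.price,
    )


event = CommercialTransaction(seller="John", buyer="Mary", goods="car", price="$5,000")
for verb in TEMPLATES:
    print(describe(event, verb))
```

The three printed sentences are the three from the email: the event object never changes, only the verb that is used to talk about it.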

On reviewing: The summary

It takes a while to learn to review a paper. Here’s one approach, starting with…how to start your review.

You’d think that when people in my line of work–research–sit around the hotel bar at a conference swapping war stories, we’d mostly be complaining about the crappy state of research funding, pesky deans, and flying economy class–and we do.  But, what we complain to each other about the most is how much reviewing we have to do.  Peer review–the evaluation of articles by your fellow academics for suitability for publication–is a big part of being an academic.  One of the things that makes science an exciting thing to be doing right now is that it’s booming–the amount of productivity in the world of research right now is enormous.  (Booming explained in the English notes below.) In my field alone–biomedical language processing–the number of conferences has grown enormously since I got started in the field, and it shows no signs of slowing down.  The thing is: lots of research activity means lots of papers being produced, and lots of papers being produced means lots of papers to review.  Lots of papers.  Most academic conferences in any field take place either during the summer or in early January–the easiest times to travel without having to miss the classes that you’re usually teaching.  Consequently, there are a couple of periods during the year when you get slammed with a lot of reviewing requests all at once.  This is on top of the constant flow of journal articles, which can get submitted at any time, plus grant reviews, which come in thrice-a-year waves themselves.  It can get pretty overwhelming.

Reviewing is a big responsibility–a reviewer’s comments and recommendations about acceptance affect the progress of science, and the progress of people’s careers, too.  That makes it an opportunity to make a real contribution to your community.  There are some good things about the fact that you’re being asked to do it.  If you’re getting invited to review, it’s a sign that your peers hold your expertise in high enough esteem that they think it’s OK to entrust you with a job that is of some importance.  Reviewing is also part of how you stay on top of what’s hot and exciting in your field.  If you can keep that in mind as you stare at a pile of papers on a beautiful Sunday afternoon when you’d rather be sitting on the back porch with a beer and a trashy novel, it certainly helps.


Picture source: phdcomics.com, https://goo.gl/2oGXfI

There are a lot of approaches to writing a review.  I don’t claim to have the perfect one, and the specifics of how I structure a review have certainly changed over the years.  However, there are a few such structures that clearly make sense, and that you can apply secure in the knowledge that they won’t leave the authors angry and frustrated or the editors that have to pass those reviews along to the authors feeling embarrassed, or worse.  Here’s one structure to think about.  It starts with an overview of the paper that you’re reviewing.

All quotes are from reviews of my own papers.  I was either the first author or the “senior author” (in my field, that means the person who directed the research, typically coming up with the idea and then supervising the design of the experiments and the writing of the article) of the work.

The amount of stuff available on the Internet about the pain of poorly-done reviews is not a bad indicator of … Picture source: https://goo.gl/VaUtks

A little overview of the paper at the beginning of your review serves a couple of purposes.  One is to reassure the author that you read the paper with attention.  This may sound obvious, but unfortunately, it’s not that uncommon to get a review back and wonder whether the reviewer really read it.  A research paper typically represents around a year’s work.

Here’s a beautiful example of a summary of the paper at the beginning of a review:

This work presents a novel study of inter-annotator agreement when labelling semantic relations in compound nouns. The authors asked two annotators to annotate such relations in a subset of 101 Gene Ontology concepts according to two commonly used relation sets, namely the Generative Lexicon and the Rosario and Hearst sets, respectively with five and 38 relations. Cohen’s Kappa factor and F1-score are reported for both tasks, with a maximum of k = 0.774 and F1 = 0.90 in a relaxed evaluation of the Rosario and Hearst relation set.

What’s so nice about it?  Everything.  It summarizes:

  • What the paper is about (This work presents a novel study of inter-annotator agreement when labelling semantic relations in compound nouns),
  • …what was done, and with what data (The authors asked two annotators to annotate such relations in a subset of 101 Gene Ontology concepts according to two commonly used relation sets, namely the Generative Lexicon and the Rosario and Hearst sets, respectively with five and 38 relations),
  • …and what the authors found (Cohen’s Kappa factor and F1-score are reported for both tasks, with a maximum of k = 0.774 and F1 = 0.90 in a relaxed evaluation of the Rosario and Hearst relation set).

Here’s another one that was really nicely done.  The reviewer covered pretty much the same things:

The manuscript studied the ability of humans to label the semantic relations between the elements of noun compounds. Two annotators, one with a BS and the other one as a cardiovascular technologist did the annotations. The sample annotation terms were defined based on the GO. The test relations are the Generative Lexicon relations and the Rosario and Hearst relations. The F-measure and the Cohen’s Kappa value are used to measure the inter-annotator agreements. The results showed fairly high agreement even with very minimal guidelines and no real-training.

…which is to say:

  • what the paper is about (The manuscript studied the ability of humans to label the semantic relations between the elements of noun compounds),
  • …what was done, and with what data (Two annotators, one with a BS and the other one as a cardiovascular technologist did the annotations. The sample annotation terms were defined based on the GO. The test relations are the Generative Lexicon relations and the Rosario and Hearst relations. The F-measure and the Cohen’s Kappa value are used to measure the inter-annotator agreements),
  • …and what the authors found (The results showed fairly high agreement even with very minimal guidelines and no real-training).

This paper investigates on the assumption that inter-annotator agreement (IAA) can be used as an upper bound for NLP systems performance. The authors make a review of the literature to extract papers that support this assumptions and papers that instead have found opposite results, concluding that there are several works where NLP systems have demonstrated to outperform inter-annotator agreement. The authors also correlate IAA with the performance of the systems as reported on the papers, finding that in general there is a positive correlation among the two.

This very nice summary doesn’t talk about what was done, or to what data, but it goes much more than the preceding ones into what the authors found, and the reviewer’s assessment of whether or not, and why, that matters.

The manuscript titled “Translational morphosyntax: Distribution of negation in clinical records and biomedical journal articles” discusses differences in the use of negation between journal articles and clinical notes. Clinical notes are found to be much more explicit in their use of negation than journal articles, while journal articles use morphological negation significantly more often than clinical notes. The results have significant impact on mining clinical notes and combining information in clinical notes with background information found in literature.

This one takes the approach of the first summaries that we read–what the paper is about, what was done and with what data, and what was found:

 

The authors present a study on the distribution of negation (explicit at the syntactic/lexical level and morphological at the sub-word level) in two document types (clinical text and scientific journal articles). They investigate whether there are significant differences in the distribution of these two levels of negation between the two types of texts. Distributions are calculated from clinical progress notes from the MIMIC II corpus and the CRAFT corpus. The main findings are that explicit negations are more prevalent in clinical text, while morphological negation is more prevalent in scientific text.

Now, I must say: the preceding introductions are exceptionally well done.  The following is more typical for an introduction to a review–if it has one at all:

The authors compare incidence of two types of negations. They use notes on the status of patients in the Intensive Care Unit and compare these with scientific journal articles on mouse genomics.

Here’s the thing: the one that you just read is enough to make it clear that you read the article and bothered to figure out what it’s about.  Sounds pretty goddamn basic–but, unfortunately, it’s not.  Not having a summary at the beginning of a review that you’re writing really isn’t a problem if you write a well-justified review–but, if you do a shoddy job that leaves the authors wondering whether or not you read the paper with the appropriate level of care, it’s going to piss them off; if they complain to the editor, it’s going to piss off the editor, too, as well as embarrassing them for not having caught your crappy work; and you should feel guilty.  Putting a summary of the paper at the beginning of your review doesn’t just reassure the authors–it’s a good way for you to verify to yourself that you actually do have a good grasp of what’s going on in the paper.  One final note on this: if the paper is so badly written that you can’t actually tell what’s going on in it, it’s totally appropriate to say so, explicitly, and this is the point in the review where you should say it–in the introduction to your review.  Summarize what you can, and be explicit about what parts of the paper weren’t intelligible enough to summarize.

Since I started this piece with a description of complaining, I’ll close with an attempt at attitudinal adjustment.  Ashley ML Brown on her blog:

Reviewing the work of your peers should be pleasurable. Don’t laugh. I am serious. It should be a chance to see what others in your field are doing, a chance to read cutting edge research, and a chance to share your expertise (what good is knowledge if you don’t use it?)


English notes

booming: this word has at least two senses (meanings).  In the blog post, it shows up with Merriam-Webster’s sense number 2: growing or expanding very quickly.  Here’s how I used it: One of the things that makes science an exciting thing to be doing right now is that it’s booming–the amount of productivity in the world of research right now is enormous.

There’s another common sense of this word, which Merriam-Webster gives as making a loud deep sound.  Their example his booming voice is totally natural. 

French notes

l’évaluation par les pairs: peer review.

On destiny

Of paper towels and 16th-century philosophers.

One of the things that makes French so fun to speak for anglophones is that many of the words that we’ve taken from French belong to a high register in English, but are everyday words in French. Case in point: the verb destiner.  In English, this is a high-register word that you probably wouldn’t use very often, meaning something like predetermined.  (Register is a technical term in linguistics that refers to something like the level of formality of usage.  In English, we basically have normal words, formal or academic words, and slang.  In francophone culture, it’s much more complicated–but, that’s a subject for another time.)  Here are the frequencies of destined (the only form that I know of for the word) and a few other words for comparison:

  • destined: 1.25 per million words
  • dog: 69 per million words
  • jump: 30 per million words

(This data is from the written section of the Open American National Corpus, a collection of 11 million words of written American English created by my colleague Nancy Ide at Vassar.  You can download it free here, if you’d like to see what a linguistic corpus looks like.)  Here are some pretty typical examples of how it’s used:

  • What more natural than that the White perception of a bird destined to become a plaything of the western world–as evidenced by another of its names, the lovebird – – should become paramount.
  • The French press gave prominence to President Jacques Chirac’s efforts to get the Russians to bring Milosevic back [to] the negotiating table, and an editorial in Monday’s Libération suggested this should be done by greatly reducing the area of Kosovo destined to become autonomous under the Rambouillet proposals.
  • The iris was more differentiated as evidenced by the fact that some of the cells destined to form the stroma had started to synthesize pigment and were, therefore, distinguishable from those of the future TM.

In contrast, in French the verb destiner means something like intended for or designed to be used as, and as far as I can tell, it’s a pretty everyday word.  Here are the frequencies of the French equivalents of the same English words that we looked at above:

  • destiner: 76 per million words (versus 1.25 per million words for the English word destined)
  • chien: 79 per million words (versus 69 per million words for the English word dog)
  • sauter: 43 per million words (30 per million words)

1.25 versus 76–that’s a pretty big difference.  It’s far more common in French, reflecting the fact that it’s a high-register word in English, but not in French.  (I got these frequencies from the Frantext corpus, a collection of 18th-20th-century French literature, which I picked because like the written section of the Open American National Corpus, it’s written language, and at 15.6 million words, it was the closest in size to it that I could find.  I searched both the Frantext corpus and the Open American National Corpus through the Sketch Engine web site, purveyor of fine linguistic data in many languages, and the tools for searching it.)
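
If you’d like to see the arithmetic behind “per million words,” here is a small sketch. The corpus sizes are the ones quoted above; the raw counts that it prints are my own back-calculation from the rounded frequencies, not numbers read off the Sketch Engine, so treat them as approximations.

```python
def per_million(count: float, corpus_size: int) -> float:
    """Normalize a raw count to occurrences per million words."""
    return count / corpus_size * 1_000_000


def implied_count(rate_per_million: float, corpus_size: int) -> float:
    """Invert the normalization: roughly how many raw hits a rate implies."""
    return rate_per_million * corpus_size / 1_000_000


OANC_WORDS = 11_000_000      # written section of the OANC, as stated in the post
FRANTEXT_WORDS = 15_600_000  # Frantext, as stated in the post

# Roughly how many times each word would have to occur to give the quoted rates.
print("destined:", implied_count(1.25, OANC_WORDS))
print("destiner:", implied_count(76, FRANTEXT_WORDS))

# How many times more frequent the French verb is, per million words.
print("ratio:", 76 / 1.25)
```

The point of normalizing is that the two corpora aren’t the same size: comparing raw counts would flatter the bigger corpus, while comparing rates per million words doesn’t. On these figures, destiner comes out around sixty times more frequent than destined.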

So: with destined being a high-register word in English, the sign that you see at the top of this post sounds pretty damn funny.  I ran into it in a bathroom the other day; it translates as something like the toilets are routinely stuffed up by paper towels.  Please toss them in the trashcan that’s intended for them.  Americans are often attracted to the French language by way of Molière, or Rousseau, or Voltaire–but, ultimately, it’s just a hell of a lot of fun.

The title of this post is meant to be reminiscent of Michel de Montaigne, the 16th-century French essayist who is considered to be the father of all magazine writers.  Many of his essays have titles like On experience, On idleness (he was a fan), Of the arms of the Parthians, and the like.

Zipf’s Law and my walk to the lab

You know one of the consequences of Zipf’s Law, which describes one aspect of the statistical distribution of the lexicon of a language, namely that it’s a power law (a few words are very common, but most words occur only very rarely): if you’re learning a second language, it’s likely that there will never be a day of your life when you don’t come across words that you don’t know.  I took a different route up the hill to the lab today, which meant that I passed by a lot of houses, rather than walking through the woods.  With the winter at an end, there’s lots of work starting, leading me to run into a lot of large and small construction projects–and all of these words that were new to me.
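
To make the power law a bit more concrete, here is a rough sketch of an idealized Zipfian vocabulary, in which the frequency of a word is proportional to 1 divided by its rank. Both the exponent of exactly 1 and the vocabulary size of 50,000 are assumptions that I picked for illustration; real languages only approximate this.

```python
def harmonic(n: int) -> float:
    """Sum of 1/r for r = 1..n."""
    return sum(1.0 / r for r in range(1, n + 1))


def zipf_share(rank: int, vocab_size: int) -> float:
    """Share of all running words taken by the word at `rank`,
    assuming frequency is proportional to 1/rank."""
    return (1.0 / rank) / harmonic(vocab_size)


def coverage(top_k: int, vocab_size: int) -> float:
    """Share of running text covered by the top_k most frequent words."""
    return harmonic(top_k) / harmonic(vocab_size)


VOCAB = 50_000  # hypothetical vocabulary size, chosen only for illustration

for k in (10, 100, 1_000, 10_000):
    print(f"top {k:>6} words cover {coverage(k, VOCAB):.0%} of the text")
```

Under these assumptions, a small number of words accounts for a big share of what you read, while the long tail of rare words never quite runs out–which is why there’s always something new on the walk to the lab.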


 

 

Metro sight of the day


Just another beautiful spring day in Paris.  Metro sight of the day: a one-eyed min-pin (miniature Doberman pinscher) being carried by a young guy–in one hand.  In the other: a 6-pack of beer–with one missing.

le Pinscher nain: miniature Doberman pinscher.  How you pronounce pinscher in French: I haven’t a clew.  (I know how to spell, at least in English–that’s British.)