Just one little linguistics conference

Trump’s un-American attempts to block immigration on the basis of religion have far more effects than might be obvious. Here’s one of them.

walks-avm
HPSG representation of the English verb “walks”, as in “she walks”. Picture source: https://goo.gl/YRKyzJ

In the constant buzz of news about Trump’s various and sundry evils, with their implications for the entire world–Syria, Korea, China, Russia–it’s easy to lose track of the fact that all of this crap has implications for the daily lives of millions of individuals, in ways both large and small.  I came across the email that you’ll see below today while cleaning out my email inbox.  Sent in February of this year, it’s a letter from the organizers of a scientific meeting that will take place in the US this summer.  HPSG is Head-Driven Phrase Structure Grammar, an approach to modelling syntax that’s popular amongst computational linguists.  In one of those small-world things, I took my second semester of syntax from Carl Pollard, one of its creators.  He gave me an A-, which was much higher than I deserved.  Along with my grade, I got a little note, which read as follows: You were willing to ask the questions that everyone else was afraid to ask.  (Don’t get excited–my questions were primarily along the lines of What does X mean?)      

Not being sure whether or not you’ll be allowed into the country to attend an academic conference is not nearly as bad as, say, the situation of a green-card-holding Persian friend of mine who went to the airport in Tehran the day that the first executive order was signed–and wasn’t allowed to come home to the United States.  But, you heard stories like his on the news.  Here’s a view of the crappy situation that you might not have run into.  This is just one little linguistics conference–multiply it by…multiply it by a lot, and you get a view of just one of the effects of Trump’s un-American immigration-related crap.  Obscure vocabulary items explained in the English and French notes below.

Dear HPSG Colleagues,

As you are probably aware, towards the end of January the US President, Donald Trump, issued an executive order which sought to suspend entry to the United States for all refugees for 120 days; to bar Syrian refugees entirely; and to block entry to the United States to citizens of Iran, Iraq, Libya, Somalia, Sudan, Syria and Yemen for a period of 90 days. This order has been widely condemned by academic and scientific bodies, including the Linguistic Society of America (LSA, see below), and the constitutional legitimacy of this order has been challenged in the courts. It is not clear what the outcome of this challenge will be.

This is directly relevant to this community because the 24th annual HPSG conference is scheduled to take place at the University of Kentucky in Lexington, in the USA (between 7-9 of July).

The main implications for this community are: first that scholars and researchers from the affected countries may not be able to attend the conference; and second that scholars and researchers from countries not directly affected by the executive order may withdraw from the conference as a form of protest against it, and in solidarity with those who will be affected by it (e.g. by not submitting papers, not attending the conference, refusing to serve on the Programme Committee, declining invitations to be guest speakers).

Given this, the Standing Committee (whose primary task “is to see that the yearly conference is organized, preferably at accessible places”) have been discussing whether the location of the conference should be changed to somewhere outside the USA.

After considerable thought, and consultation with the Local Organizer, we have decided that it should not be changed. One reason is that simply moving the conference would not avoid the problems that the executive order raises (e.g. it will prevent citizens of the affected countries who are US residents from leaving the US, because they will not be certain of their right to re-enter the US).

We appreciate that this decision will be contentious, as would the decision to move the conference, because it raises the issue of how one should respond as an individual or an academic to the actions of governments that can be seen as violations of human rights and which have the effect of inhibiting open international dialog and which are therefore damaging to the whole scientific enterprise. Questions about whether one’s opposition should take the form of continuing to engage constructively, or of boycotting? Which is more likely to be effective in a particular case such as this?

These are questions for individuals, and we think it is unlikely that a consensus will be possible, even in a community such as this. We absolutely respect the right of individuals to respond to this situation as they see fit (e.g. by refusing to serve on the Program Committee, refusing to attend, etc.).

From a practical point of view, we will be exploring the possibility of providing remote access for any potential attendees who are unable to attend in person because of this ban.

Best wishes,

Nurit Melnik
Chair, HPSG Standing Committee



English notes

contentious: “causing or likely to cause an argument; controversial” (the definition returned by Google–I don’t know the source)

How it appeared in the post: We appreciate that this decision will be contentious, as would the decision to move the conference, because it raises the issue of how one should respond as an individual or an academic to the actions of governments that can be seen as violations of human rights and which have the effect of inhibiting open international dialog and which are therefore damaging to the whole scientific enterprise.


French notes

contesté ou controversé: potential translations for “contentious” (from WordReference.com)

Microchips and lexical semantics

When’s the last time you saw a dog shoot a bunch of kids at a grade school, or post a video of someone beheading someone else on Twitter, or vote for Trump?

I’m not necessarily that crazy about people, but I like animals.  (Except for man-eating rabbits–I hate man-eating rabbits.)  Seriously, when was the last time you saw a dog or a cat sell a teen-ager drugs, or kill a bunch of kids at a grade school (yes, this happened in the US), or vote for Trump (that happens in the US, too)?   Yes, my dog bit a couple people on the croupion when they walked into the house uninvited.  Yes, my cat once pooped in my favorite sandals.  But, rip off a tourist visiting from a foreign land?  Sell someone a counterfeit Beanie Baby on eBay?  Video someone beheading another living person in the name of God, and distribute it on Twitter? Only a human would do that.

Consequently, when I’m in the US, I carry a leash and a can of cat food in my car.  Dogs love cat food, and when I see an obvious runaway/lost dog trotting down the street, I pull over and offer him a whiff.  I can usually catch them, and I’ve gotten maybe 12 or 15 dogs back to their happy homes in the 20 years (almost) that I’ve been in my current town.

how_microchip_works
Picture source: https://goo.gl/EvXwQz

Something that makes this a hell of a lot easier is if people have had their animal microchipped.  In this context, a “microchip” is a little thing about the size of a grain of long-grain rice that a veterinarian injects under a dog or cat’s skin.  They don’t notice it in the least, as far as I can tell.  A veterinarian can wave a sort of wand over it, and it will send off a signal with an identifying number.  The vet sends the number to a company, the company sends back contact information for the owner, et voilà: Spot is home in time for dinner.  It’s quite wonderful, really.


This sign’s been around for a while.  I walk by it on my way to the train station after work.  The effort to get him back to his happy home will definitely be a lot easier than it would have been otherwise: Hector has been chipped.  Check out the poster, then scroll down, and let’s talk about how it’s interesting from a linguistic point of view.

img_0442

The linguistically cool thing is at the bottom: Hector est Pucé.  What that means: Hector has been chipped.  Now, we know that that’s going to increase the chances of Hector making his way home, but it’s cool from a linguistic point of view, too.  Recall from this blog post that French has a class of verbs that relate to undoing some noxious state of infestation–dératiser (to exterminate the rats in something), dénicotiniser (to remove the nicotine from something), and the like.  The interesting thing that we noted about these verbs is that they share an odd set of characteristics:

  1. They all have an -is- added on to the end.
  2. They all describe the reversal of a state of affairs that a human could create, but wouldn’t be expected to.
  3. None of them has a corresponding verb for creating that state of affairs.  That is, there is no ratiser, nicotiniser, etc. (or that is the claim, at any rate–read the other blog post if you don’t agree).

Now, puce, the word that is being used for a microchip here (it’s also the word for the chip on your credit card), comes from puce, a flea.  There is a verb épucer, to deflea, which clearly doesn’t fit the pattern of the verbs about which we just talked.  And, here’s an example of pucer!  Certainly the meaning here is to microchip, not to infest with fleas–but, it’s worth a second look and a quick blog post anyway, right?

I hope these folks have found their rouquin, their ginger (in the sense of red-haired).   I’d like to think that he’s found his way home.  If not: I hope he’s happily shacked up with some girl cat somewhere.  It would have to be a purely platonic relationship–in addition to being pucé, he’s also been neutered–but, a lifelong flirtation can be pretty exciting in and of itself.  The French are pretty damn good at that, too.

Want to be amused/horrified by the stupidity of the world?  Go to Google Images, do a search for microchips, and check out some of the “mark of the Beast” stuff that comes up.


Cautiously optimistic

img_0462
Graffiti that I saw on my way into a metro station this morning: “Neither Macron nor Le Pen means Le Pen.” Picture source: me.

A very good thing about France: the French don’t really do protest votes.  That’s not to say that we don’t have the ni-nis–those who say that they won’t vote ni for Macron, ni for Le Pen.  A ni-ni might abstain, or voter blanc–submit a blank ballot.  But, it’s not exactly a common thing in France.  France has a two-round election process, with multiple candidates in the first round, and only the top two finishers in the second (except in the unlikely event that someone takes more than 50% in the first round).  People sometimes say that you vote the first round with your heart, and the second round with your head.

I would say that Americans vote 80% on emotion, and 20% on the basis of their takes on the candidates’ actual policies.  (I don’t except myself from that; “one’s take on something” explained in the English notes below.)  In contrast, I would guess that the French tend to vote 20% on emotion, and 80% on the basis of their takes on the candidates’ actual policies.  I know plenty of people who aren’t at all crazy about Macron’s proposals for the economy, but given a choice between someone about whom they’re not crazy and some Nazi sociopath, of course they’re going to vote for the guy about whom they’re not crazy.  The photo above–some graffiti that I saw as I walked into a metro station this morning–is representative of the opinion of everyone with whom I’ve talked: deciding not to vote for either of them is to vote for Le Pen.

The best lack all conviction, while the worst
Are full of passionate intensity.

–W. B. Yeats, The Second Coming, 1919

The worry of most of the people I know is that Macron is so far ahead of Le Pen in the polls that everyone will assume that he’s going to win, and too many people will decide that they don’t need to vote, and then the Le Pen voters all show up, and boum–Le Pen wins.

I was never really all that struck by Yeats’ poem The Second Coming when I was an English major in college.  We mostly contented ourselves with showing off our knowledge of what a gyre is, and moved on to Beowulf, or Salman Rushdie.  But, ever since Obama got elected and the Republican Party went insane over the sight of a black man in the Oval Office, The Second Coming has become more and more meaningful to me.  With Trump in office, it has gone past “meaningful” towards “frightening”–at the very least, foreboding.

The polls in France close in four hours.  We’ll see what happens.

The Second Coming

W. B. Yeats, 1919

Turning and turning in the widening gyre
The falcon cannot hear the falconer;
Things fall apart; the centre cannot hold;
Mere anarchy is loosed upon the world,
The blood-dimmed tide is loosed, and everywhere
The ceremony of innocence is drowned;
The best lack all conviction, while the worst
Are full of passionate intensity.

Surely some revelation is at hand;
Surely the Second Coming is at hand.
The Second Coming! Hardly are those words out
When a vast image out of Spiritus Mundi
Troubles my sight: somewhere in sands of the desert
A shape with lion body and the head of a man,
A gaze blank and pitiless as the sun,
Is moving its slow thighs, while all about it
Reel shadows of the indignant desert birds.
The darkness drops again; but now I know
That twenty centuries of stony sleep
Were vexed to nightmare by a rocking cradle,
And what rough beast, its hour come round at last,
Slouches towards Bethlehem to be born?


English notes

One’s “take on” something is one’s opinion about it, or analysis of it.  Note that this is entirely different from the verbal idiom to take something/someone on.

  • I want to comment on Trump’s take on the Civil War and Andrew Jackson… but, seriously, it hurts me. READ A BOOK!  (Twitter) (Context: Trump recently said something about a former populist president, Andrew Jackson, that is consistent with either (a) Trump being an uneducated idiot who, in particular, doesn’t know anything about American history, or (b) Trump being a very bad person.)
  • Gr8, some sources just hav a screwed up set of priorities. Who cares about Trump’s take on med marijuana when the health care plan sucks?! (Twitter) (Context: the Republican-controlled House of Representatives just voted to repeal Obamacare and replace it with a disaster.)
  • Trump’s take on Andrew Jackson isn’t astonishing; what is astonishing is that this country elected an ignorant pussy-grabbing Richie Rich.  (Twitter)
  • My take on Trump is that he just wants to be liked by whoever is in front of him, which makes him inconsistent and unreliable.  (Twitter)
  • My take on Trump’s worse-than-worthless briefing to every senator on the North Korean problem.  (Twitter) (Context: here’s a link to the Tweeter’s article on Trump’s attempt to swing the Senate in his favor with respect to whatever crap he’s brewing concerning North Korea.)

How I used it in the post: I would say that Americans vote 80% on emotion, and 20% on the basis of their takes on the candidates’ actual policies.  

On reviewing: The summary

It takes a while to learn to review a paper. Here’s one approach, starting with…how to start your review.

You’d think that when people in my line of work–research–sit around the hotel bar at a conference swapping war stories, we’d mostly be complaining about the crappy state of research funding, pesky deans, and flying economy class–and we do.  But, what we complain to each other about the most is how much reviewing we have to do.  Peer review–the evaluation of articles by your fellow academics for suitability for publication–is a big part of being an academic.  One of the things that makes science an exciting thing to be doing right now is that it’s booming–the amount of productivity in the world of research right now is enormous.  (Booming explained in the English notes below.) In my field alone–biomedical language processing–the number of conferences has grown enormously since I got started in the field, and it shows no signs of slowing down.  The thing is: lots of research activity means lots of papers being produced, and lots of papers being produced means lots of papers to review.  Lots of papers.  Most academic conferences in any field take place either during the summer or in early January–the easiest times to travel without having to miss the classes that you’re usually teaching.  Consequently, there are a couple of periods during the year when you get slammed with a lot of reviewing requests all at once.  This is on top of the constant flow of journal articles, which can get submitted at any time, plus grant reviews, which come in thrice-a-year waves themselves.  It can get pretty overwhelming.

Reviewing is a big responsibility–a reviewer’s comments and recommendations about acceptance affect the progress of science, and the progress of people’s careers, too.  That makes it an opportunity to make a real contribution to your community.  There are some good things about the fact that you’re being asked to do it.  If you’re getting invited to review, it’s a sign that your peers hold your expertise in high enough esteem that they think it’s OK to entrust you with a job that is of some importance.  Reviewing is also part of how you stay on top of what’s hot and exciting in your field.  If you can keep that in mind as you stare at a pile of papers on a beautiful Sunday afternoon when you’d rather be sitting on the back porch with a beer and a trashy novel, it certainly helps.


e0a47bc09472f957ea2813b2acad1512
Picture source: phdcomics.com, https://goo.gl/2oGXfI

There are a lot of approaches to writing a review.  I don’t claim to have the perfect one, and the specifics of how I structure a review have certainly changed over the years.  However, there are a few such structures that clearly make sense, and that you can apply secure in the knowledge that they won’t leave the authors angry and frustrated or the editors that have to pass those reviews along to the authors feeling embarrassed, or worse.  Here’s one structure to think about.  It starts with an overview of the paper that you’re reviewing.

All quotes are from reviews of my own papers.  I was either the first author or the “senior author” (in my field, that means the person who directed the research, typically coming up with the idea and then supervising the design of the experiments and the writing of the article) of the work.

16198_10155276140165640_8482032632772318289_n
The amount of stuff available on the Internet about the pain of poorly-done reviews is not a bad indicator of … Picture source: https://goo.gl/VaUtks

A little overview of the paper at the beginning of your review serves a couple purposes.  One is to reassure the author that you read the paper with attention.  This may sound obvious, but unfortunately, it’s not that uncommon to get a review back and wonder whether the reviewer really read it.  A research paper typically represents around a year’s work.  Here’s a beautiful example of a summary of the paper at the beginning of a review:

This work presents a novel study of inter-annotator agreement when labelling semantic relations in compound nouns. The authors asked two annotators to annotate such relations in a subset of 101 Gene Ontology concepts according to two commonly used relation sets, namely the Generative Lexicon and the Rosario and Hearst sets, respectively with five and 38 relations. Cohen’s Kappa factor and F1-score are reported for both tasks, with a maximum of k = 0.774 and F1 = 0.90 in a relaxed evaluation of the Rosario and Hearst relation set.

What’s so nice about it?  Everything.  It summarizes:

  • What the paper is about (This work presents a novel study of inter-annotator agreement when labelling semantic relations in compound nouns),
  • …what was done, and with what data (The authors asked two annotators to annotate such relations in a subset of 101 Gene Ontology concepts according to two commonly used relation sets, namely the Generative Lexicon and the Rosario and Hearst sets, respectively with five and 38 relations),
  • …and what the authors found (Cohen’s Kappa factor and F1-score are reported for both tasks, with a maximum of k = 0.774 and F1 = 0.90 in a relaxed evaluation of the Rosario and Hearst relation set).

Here’s another one that was really nicely done.  The reviewer covered pretty much the same things:

The manuscript studied the ability of humans to label the semantic relations between the elements of noun compounds. Two annotators, one with a BS and the other one as a cardiovascular technologist did the annotations. The sample annotation terms were defined based on the GO. The test relations are the Generative Lexicon relations and the Rosario and Hearst relations. The F-measure and the Cohen’s Kappa value are used to measure the inter-annotator agreements. The results showed fairly high agreement even with very minimal guidelines and no real-training.

…which is to say:

  • what the paper is about (The manuscript studied the ability of humans to label the semantic relations between the elements of noun compounds),
  • …what was done, and with what data (Two annotators, one with a BS and the other one as a cardiovascular technologist did the annotations. The sample annotation terms were defined based on the GO. The test relations are the Generative Lexicon relations and the Rosario and Hearst relations. The F-measure and the Cohen’s Kappa value are used to measure the inter-annotator agreements),
  • …and what the authors found (The results showed fairly high agreement even with very minimal guidelines and no real-training).

This paper investigates on the assumption that inter-annotator agreement (IAA) can be used as an upper bound for NLP systems performance. The authors make a review of the literature to extract papers that support this assumptions and papers that instead have found opposite results, concluding that there are several works where NLP systems have demonstrated to outperform inter-annotator agreement. The authors also correlate IAA with the performance of the systems as reported on the papers, finding that in general there is a positive correlation among the two.

This very nice summary doesn’t talk about what was done, or to what data, but it goes much more than the preceding ones into what the authors found, and the reviewer’s assessment of whether or not, and why, that matters.

The manuscript titled “Translational morphosyntax: Distribution of negation in clinical records and biomedical journal articles” discusses differences in the use of negation between journal articles and clinical notes. Clinical notes are found to be much more explicit in their use of negation than journal articles, while journal articles use morphological negation significantly more often than clinical notes. The results have significant impact on mining clinical notes and combining information in clinical notes with background information found in literature.

This one takes the approach of the first summaries that we read–what the paper is about, what was done and with what data, and what was found:

 

The authors present a study on the distribution of negation (explicit at the syntactic/lexical level and morphological at the sub-word level) in two document types (clinical text and scientific journal articles). They investigate whether there are significant differences in the distribution of these two levels of negation between the two types of texts. Distributions are calculated from clinical progress notes from the MIMIC II corpus and the CRAFT corpus. The main findings are that explicit negations are more prevalent in clinical text, while morphological negation is more prevalent in scientific text.

Now, I must say: the preceding introductions are exceptionally well done.  The following is more typical for an introduction to a review–if it has one at all:

The authors compare incidence of two types of negations. They use notes on the status of patients in the Intensive Care Unit and compare these with scientific journal articles on mouse genomics.

Here’s the thing: the one that you just read is enough to make it clear that you read the article and bothered to figure out what it’s about.  Sounds pretty goddamn basic–but, unfortunately, it’s not.  Not having a summary at the beginning of a review that you’re writing really isn’t a problem if you write a well-justified review–but, if you do a shoddy job that leaves the authors wondering whether or not you read the paper with the appropriate level of care, it’s going to piss them off; if they complain to the editor, it’s going to piss off the editor, too, as well as embarrassing them for not having caught your crappy work; and you should feel guilty.  Putting a summary of the paper at the beginning of your review doesn’t just reassure the authors–it’s a good way for you to verify to yourself that you actually do have a good grasp of what’s going on in the paper.  One final note on this: if the paper is so badly written that you can’t actually tell what’s going on in it, it’s totally appropriate to say so, explicitly, and this is the point in the review where you should say it–in the introduction to your review.  Summarize what you can, and be explicit about what parts of the paper weren’t intelligible enough to summarize.

Since I started this piece with a description of complaining, I’ll close with an attempt at attitudinal adjustment.  Ashley ML Brown on her blog:

Reviewing the work of your peers should be pleasurable. Don’t laugh. I am serious. It should be a chance to see what others in your field are doing, a chance to read cutting edge research, and a chance to share your expertise (what good is knowledge if you don’t use it?)


English notes

booming: this word has at least two senses (meanings).  In the blog post, it shows up with Merriam-Webster‘s sense number 2: growing or expanding very quickly.  Here’s how I used it: One of the things that makes science an exciting thing to be doing right now is that it’s booming–the amount of productivity in the world of research right now is enormous.

There’s another common sense of this word, which Merriam-Webster gives as making a loud deep sound.  Their example his booming voice is totally natural. 

French notes

l’évaluation par les pairs: peer review.

On destiny

Of paper towels and 16th-century philosophers.

One of the things that makes French so fun to speak for anglophones is that many of the words that we’ve taken from French belong to a high register in English, but are everyday words in French. Case in point: the verb destiner.  In English, this is a high-register word that you probably wouldn’t use very often, meaning something like predetermined.  (Register is a technical term in linguistics that refers to something like the level of formality of usage.  In English, we basically have normal words, formal or academic words, and slang.  In francophone culture, it’s much more complicated–but, that’s a subject for another time.)  Here are the frequencies of destined (the only form that I know of for the word) and a few other words for comparison:

  • destined: 1.25 per million words
  • dog: 69 per million words
  • jump: 30 per million words

(This data is from the written section of the Open American National Corpus, a collection of 11 million words of written American English created by my colleague Nancy Ide at Vassar.  You can download it free here, if you’d like to see what a linguistic corpus looks like.)  Here are some pretty typical examples of how it’s used:

  • What more natural than that the White perception of a bird destined to become a plaything of the western world–as evidenced by another of its names, the lovebird – – should become paramount.
  • The French press gave prominence to President Jacques Chirac’s efforts to get the Russians to bring Milosevic back [to] the negotiating table, and an editorial in Monday’s Libération suggested this should be done by greatly reducing the area of Kosovo destined to become autonomous under the Rambouillet proposals.
  • The iris was more differentiated as evidenced by the fact that some of the cells destined to form the stroma had started to synthesize pigment and were, therefore, distinguishable from those of the future TM.

In contrast, in French the verb destiner means something like intended for or designed to be used as, and as far as I can tell, it’s a pretty everyday word.  Here are the frequencies of the French equivalents of the same English words that we looked at above:

  • destiner: 76 per million words (versus 1.25 per million words for the English word destined)
  • chien: 79 per million words (versus 69 per million words for the English word dog)
  • sauter: 43 per million words (versus 30 per million words for the English word jump)

1.25 versus 76–that’s a pretty big difference.  It’s far more common in French, reflecting the fact that it’s a high-register word in English, but not in French.  (I got these frequencies from the Frantext corpus, a collection of 18th-20th-century French literature, which I picked because like the written section of the Open American National Corpus, it’s written language, and at 15.6 million words, it was the closest in size to it that I could find.  I searched both the Frantext corpus and the Open American National Corpus through the Sketch Engine web site, purveyor of fine linguistic data in many languages, and the tools for searching it.)
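To make the comparison concrete, here is a minimal Python sketch of the arithmetic. The per-million figures are the ones quoted above; the names `FREQ_PER_MILLION`, `per_million`, and `french_to_english_ratio` are made up for this illustration and don't come from Sketch Engine or from either corpus.

```python
# Per-million frequencies as quoted in this post:
# (English, from the written OANC; French, from Frantext).
FREQ_PER_MILLION = {
    "destined/destiner": (1.25, 76.0),
    "dog/chien": (69.0, 79.0),
    "jump/sauter": (30.0, 43.0),
}


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words,
    so that corpora of different sizes can be compared."""
    return count / corpus_size * 1_000_000


def french_to_english_ratio(pair):
    """How many times more frequent the French word is than the English one."""
    english, french = FREQ_PER_MILLION[pair]
    return french / english


for pair in FREQ_PER_MILLION:
    ratio = french_to_english_ratio(pair)
    print(f"{pair}: the French word is {ratio:.1f}x as frequent")
```

The point of normalizing to "per million" is that the two corpora aren't the same size (11 million words versus 15.6 million), so raw counts wouldn't be comparable. On these numbers, chien and sauter come out at roughly the same rate as their English counterparts, while destiner is about 60 times as frequent as destined.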

So: with destined being a high-register word in English, the sign that you see at the top of this post sounds pretty damn funny.  I ran into it in a bathroom the other day; it translates something like the toilets are routinely stuffed up by paper towels.  Please toss them in the trashcan that’s intended for them.  Americans are often attracted to the French language by way of Molière, or Rousseau, or Voltaire–but, ultimately, it’s just a hell of a lot of fun.

The title of this post is meant to be reminiscent of Michel de Montaigne, the 16th-century French essayist who is considered to be the father of all magazine writers.  Many of his essays have titles like On experience, On idleness (he was a fan), Of the arms of the Parthians, and the like.

Zipf’s Law and my walk to the lab

You know one of the consequences of Zipf’s Law, which describes one aspect of the statistical distribution of the lexicon of a language, namely that it’s a power law (a few words are very common, but most words occur only very rarely): if you’re learning a second language, it’s likely that there will never be a day of your life when you don’t come across words that you don’t know.  I took a different route up the hill to the lab today, which meant that I passed by a lot of houses, rather than walking through the woods.  With the winter at an end, there’s lots of work starting, leading me to run into a lot of large and small construction projects–and all of these new words for me.
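If you'd like to see the shape of that distribution for yourself, here is a rough Python sketch. In its simplest form, Zipf's Law says that a word's frequency is roughly proportional to 1 divided by its rank. The helper names `rank_frequency` and `hapax_share` are made up for this illustration, and the toy sentence is invented and far too small to demonstrate the law; you'd want to feed in a real corpus.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(tokens)
    return [
        (rank, word, n)
        for rank, (word, n) in enumerate(counts.most_common(), start=1)
    ]


def hapax_share(tokens):
    """Fraction of distinct words that occur exactly once."""
    counts = Counter(tokens)
    return sum(1 for n in counts.values() if n == 1) / len(counts)


# Toy text, invented for illustration only.
toy = "the dog saw the cat and the cat saw the bird on the roof".split()

for rank, word, n in rank_frequency(toy):
    print(rank, word, n)
print("share of words seen only once:", hapax_share(toy))
```

Even in a toy example, one word dominates and most of the distinct words show up only once. In a real corpus, those once-only words are the ones that keep surprising you on your walk to work.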

img_0131.jpg

img_0132.jpg