Naming of parts: the illustrated version

Japonica
Glistens like coral in all of the neighboring gardens,
And to-day we have naming of parts.

basic_rifle_parts
Picture source: https://goo.gl/b9U0dY

It’s National Poetry Month, and that means Henry Reed’s achingly beautiful and super-funny Naming of parts.  Getting the humor might require having spent some time in the military, which I did; getting the vocabulary certainly does, as it’s full of technical terms for rifle-parts.  I originally found the version that I give here, with its nice links to some of the difficult vocabulary, on the Sole Arabia Tree web site.  For this year, I’ve added some additional vocabulary notes here.  Go to the Sole Arabia Tree page for a recording of Henry Reed reading the poem.

swivel
Picture source: https://goo.gl/YpZJPA

swivel: “a device joining two parts so that one or both can pivot freely” (Merriam-Webster).  The poem mentions several kinds of swivels on the British-Army-issue rifle of World War II: the upper sling swivel, the lower sling swivel, and the piling swivel.

sling: “a device (as a rope or chain) by which something is lifted or carried” (Merriam-Webster).  See the picture of a rifle above.

easily: the adverbial form of easy.  It never appears in the poem–I add it here for the benefit of the non-native speakers whose English is good enough to be puzzled by these lines in the poem:

You can do it quite easy

If you have any strength in your thumb.

Yes, that sounds weird, and you should say You can do it quite easily if you have any strength in your thumb.  Does Reed use it here to imply something about the level of education of the drill instructor?  Is it a dialectal variant in the United Kingdom?  Was it current at the time that he wrote the poem, published in 1942?  I have no clue.  I do, however, find quite striking the parallel that Lieutenant Colonel Edward Ledford (US Army) draws between the drill instructor’s deadpan “which in your case you have not got,” sometimes interpreted as prefiguring how slaughtered these kids were going to be later, in part because of shortages of equipment, and notorious my-kids-won’t-go-to-war-but-let’s-send-yours Donald Rumsfeld’s dismissal of the concerns of actual American soldiers at the beginning of Bush’s Iraq War:

The scene is in Kuwait. The setting is a less and less endearing and more and more trite town-hall meeting. Soldiers are gathered around. They will move north into Iraq the next day. The soldiers, we soon discover, apparently aren’t feeling real dulce-et-decorum-est-pro-patria-mori.

Playing the role of leader, Donald Rumsfeld places himself among them. He opens the floor to questions and comments. Specialist Thomas Wilson raises his hand. He is called upon.

Wilson: A lot of us are getting ready to move north relatively soon. Our vehicles are not armored. We’re digging pieces of rusted scrap metal and compromised ballistic glass that’s already been shot up.. picking the best out of this scrap to put on our vehicles to take into combat.

Rumsfeld [in a scientific, theoretical, detached tone]: As you know, you go to war with the Army you have. They’re not the Army you might want or wish to have at a later time. [brightening, as if realizing something] If you think about it, you can have all the armor in the world on a tank and a tank can be blown up.

A female Soldier asks a next question, but the audience cannot hear it

Rumsfeld: It is something you prefer not to have to use, obviously, in a perfect world. It’s been used as little as possible.

Lieutenant Colonel Ledford continues his critique of Rumsfeld’s dismissive (and later seen to be deadly, both for us and for Iraqi civilians) words by rewriting them in the style of Naming of parts:

As you know, you go
to war

with the Army you have.

They’re not the Army
you might want

or wish to have
at a later time.

If you think
about it,
you can have
all the armor
in the world
on a tank
and a tank
can be blown
up.

It is something
you prefer not to have to use,
obviously,
in
a perfect world.

It’s been used

as little as possible.

For the rest of Lieutenant Colonel Ledford’s thoughts on the poem, see this web page.


LESSONS OF THE WAR

To Alan Michell

Vixi duellis nuper idoneus
Et militavi non sine gloria

I. NAMING OF PARTS

To-day we have naming of parts. Yesterday,
We had daily cleaning. And to-morrow morning,
We shall have what to do after firing. But to-day,
To-day we have naming of parts. Japonica
Glistens like coral in all of the neighboring gardens,
And to-day we have naming of parts.

This is the lower sling swivel. And this
Is the upper sling swivel, whose use you will see,
When you are given your slings. And this is the piling swivel,
Which in your case you have not got. The branches
Hold in the gardens their silent, eloquent gestures,
Which in our case we have not got.

This is the safety-catch, which is always released
With an easy flick of the thumb. And please do not let me
See anyone using his finger. You can do it quite easy
If you have any strength in your thumb. The blossoms
Are fragile and motionless, never letting anyone see
Any of them using their finger.

And this you can see is the bolt. The purpose of this
Is to open the breech, as you see. We can slide it
Rapidly backwards and forwards: we call this
Easing the spring. And rapidly backwards and forwards
The early bees are assaulting and fumbling the flowers:
They call it easing the Spring.

They call it easing the Spring: it is perfectly easy
If you have any strength in your thumb: like the bolt,
And the breech, and the cocking-piece, and the point of balance,
Which in our case we have not got; and the almond-blossom
Silent in all of the gardens and the bees going backwards and forwards,
For to-day we have naming of parts.

Guillaume Apollinaire: Exercice


Street sign in the Saint-Germain-des-Prés neighborhood.

Guillaume Apollinaire is another one of those folks who shows that you can be both a poet–and a very serious ass-kicker.  Apollinaire tried to join the French army in Paris at the beginning of the First World War, but was turned down–because he wasn’t a French citizen.  (Polish, actually.)  Undaunted, he travelled south, tried again, and this time got in.  He was initially assigned to the artillery, but that wasn’t hard-core enough for him, and he asked for–and received–a transfer to the infantry.  He suffered a head wound in 1916, never really recovered from it, and in his weakened condition, died in the influenza epidemic of 1918.  Here is one of his poems, Exercice.

Exercice

Vers un village de l’arrière
S’en allaient quatre bombardiers
Ils étaient couverts de poussière
Depuis la tête jusqu’aux pieds

Ils regardaient la vaste plaine
En parlant entre eux du passé
Et ne se retournaient qu’à peine
Quand un obus avait toussé

Tous quatre de la classe seize
Parlaient d’antan non d’avenir
Ainsi se prolongeait l’ascèse
Qui les exerçait à mourir


French notes

In the last two lines, note the inversion: not L’ascèse qui les exerçait à mourir se prolongeait ainsi, but Ainsi se prolongeait l’ascèse qui les exerçait à mourir.  If you’d like to read an analysis of the various and sundry kinds of inversion that ainsi can trigger, as well as some quantitative data on ainsi-triggered inversion in Le Monde, see Lena Karssenberg and Karen Lahousse’s paper on the topic.

•    la poussière: dust.
•    la plaine: plain.
•    se retourner: (tourner la tête) turn around, do a double take; (changer de sens, de position) turn over, toss and turn; (se mettre à l’envers) turn over, overturn
•    la peine: punishment, sorrow, trouble—but, that’s not what it means here—see the next entry.
•    à peine: scarcely, hardly
•    un obus: shell (artillery).
•    tousser: to cough
•    d’antan: of yesteryear, of long ago
•    se prolonger: continue; perpetuate itself; persist; linger; go on; be continued; be extended
•    ascèse: This word is a tough one.  It’s not in any of my French-English dictionaries.  In Anne Greet’s translation (see below), it’s rendered as “ascesis.”  I found it in a monolingual (French-French) dictionary; the definition seemed to be something like asceticism.
•    exercer: to train, exercise, practice

What should we make of the past imperfect tense that is used throughout the poem?
Greet’s notes suggest that it produces a detachment between the poet and the four men: “The poet…is not part of the graphic little scene he is painting.  The verbs, in third person and imperfect tense, indicate that he is an omniscient observer.  This role produces a…fine balance in the poem between compassion and detachment.”

Towards a village in the rear
Marched four bombardiers
And they were covered with dirt
From head to foot

They stared at the vast plain
As they talked about the past
And they barely looked around
When a shell made a coughing sound

All four of class sixteen
Spoke of the past not future time
Thus the ascesis dragged on
That practiced them in dying

Translated by Anne Hyde Greet

You like Apollinaire, but like me, have trouble with the French?  I like Anne Hyde Greet’s translation of Calligrammes quite a bit.

Rest In Peace, Jacques Higelin


One of my friends once said this to me: “When I walked out of the room after finishing my bac [the French high school exit exam–your score on it determines a lot about the future course of your life], I said to myself: if I’d spent as much of the last four years studying as I did memorizing Higelin, I’d be going to a much better university.”  Jacques Higelin died yesterday.  Go to his anglophone Wikipedia page and you’ll find a few short paragraphs–go to his francophone page, and it goes on for screen, after screen, after screen.  Here’s the most appropriate song of his that I could think of during this National Poetry Month–scroll down past the video for the lyrics.

J’suis mort qui qui dit mieux
Ben mon pauv’vieux, voilà aut’chose
J’suis mort qui, qui dit mieux
Mort le venin, coupée la rose
J’ai perdu mon âme en chemin
Qui qui la r’trouve s’la mette aux choses
J’ai perdu mon âme en chemin

Qui qui la r’trouve la jette aux chiens

J’m’avais collé avec une fumelle
Ben alors ça c’est la plus belle
J’m’avais collé avec une fumelle
L’jour où j’ai brûlé mes sabots
J’lui avais flanqué un marmot
Maint’nant qu’son père est plus d’ce monde
L’a poussé ce p’tit crève la faim
Faut qu’ma veuve lui cherche un parrain.

Elle lui en avait d’jà trouvé un
Eh j’ai pas les yeux dans ma poche

Elle lui en avait d’jà trouvé un
Dame faut prévoir, en cas d’besoin
C’est lui qui flanquera des taloches
A mon p’tiot pour qu’il s’tienne bien droit
C’est du joli, moi j’trouve ça moche
De cogner sur un plus p’tit qu’soi.

Cela dit dans c’putain d’cimetière
J’ai perdu mon humeur morose
Jamais plus personne ne vient
M’emmerder quand je me repose
A faire l’amour avec la terre
J’ai enfanté des p’tits vers blancs
Qui me nettoient, qui me digèrent
Qui font leur nid au creux d’mes dents.

Arrêtez-moi si je déconne
Arrêtez-moi ou passez m’voir
Sans violettes, sans pleurs ni couronnes
Venez perdre un moment d’cafard
J’vous f’rais visiter des cousins
Morts à la guerre ou morts de rien
Esprit qui vous cligne de l’oeil
Les bras tendus hors du cercueil

Aujourd’hui je vous sens bien lasse
Ne soyez plus intimidée
A mes côtés reste une place
Ne tient qu’à vous de l’occuper
Qu’est c’que tu as ? oui, le temps passe
Et le p’tit va rentrer de l’école
Dis lui q’son père a pas eu d’bol
‘L a raté l’train, c’était l’dernier

Attend un peu, ma femme, ma mie
Y’a un message pour le garçon
J’ai plus ma tête, voilà qu’j’oublie
Où j’ai niché l’accordéon
P’t’être à la cave, p’t’être au grenier
Je n’aurais repos pour qu’il apprenne
mais il est tard, sauve toi je t’aime
Riez pas du pauv’macchabé

Ceux qui ont jamais croqué d’la veuve
Les bordés d’nouilles, les tir à blanc
Qu’ont pas gagné une mort toute neuve
A la tombola des mutants
Peuvent pas savoir ce qui gigote
dans les trous du défunt cerveau
Quand sa moitié dépose une botte de rose
Sur l’chardon du terreau
Quand sa moitié dépose une botte de rose
Sur l’chardon du terreau


French notes

Je suis mort, qui qui dit mieux : This is a complicated line, combining an expression from childish language (qui qui) with qui dit mieux, which is how an auctioneer tries to raise an amount that’s been bid.  An explanation from a friend:

“Qui dit mieux” est l’expression du commissaire priseur, mais pas “qui qui dit mieux”.   
En français commencer une phrase par “qui qui (veut des pâtes ?, chante si fort ?…etc…), est une formulation enfantine ou illettrée pour dire “Qui est-ce qui ? “.  La complexité de la construction grammaticale de ce bout de phrase non visible mentalement dans sa version orale, fait que jeunes élèves et adultes fâchés avec la belle langue le réduisent à “C’est qui qui + verbe” ou “Qui, qui + verbe”.

 

 

À Grenelle

Suddenly Bruant’s poem made sense.

For the 3rd day of National Poetry Month, here’s a slice of life from Aristide Bruant.

My little bookstore.

I live in the most boring arrondissement of Paris.  The 15th typically doesn’t even show up in tourist guide-books–it’s the biggest arrondissement in the city, but it’s just a residential neighborhood, plain and simple.  (Plain and simple is an adverb, not an adjective–see the English notes below.)  The pearl of my little corner of the 15th is a small used bookstore on the boulevard Grenelle.  The floor is almost completely covered with stacks of books, to the point that if the owner ever has a heart attack in there, they will have to empty the store to get the stretcher inside–it’s adorable, and if the owner sees something that he thinks I’ll like, he puts it aside for me.

 

bruant_aristide
Aristide Bruant. He’s usually pictured wearing a wide-brimmed hat, but you can find such depictions anywhere… Source: http://www.dutempsdescerisesauxfeuillesmortes.net, https://goo.gl/rcFw9e

And yet: this being Paris, there are centuries of history everywhere around me.  An afternoon’s walk often takes me through the side streets to the west of the École militaire, the military academy that was meant to increase the size of the French officer corps by making it possible for the sons of non-aristocrats to get into it.  (Napoleon learned his craft there.)  Amongst those streets was the red-light district of this very military neighborhood, and the poet Aristide Bruant immortalized it in À Grenelle.

Much of this poem puzzled the shit out of me (see the English notes below for what that means) until the day that I walked into my little bookstore and the owner showed me something that he was saving for me.  Called Les mots et la chose, the trame (premise?) of Jean-Claude Carrière’s epistolary novel is that a retired lexicographer gets a letter in the mail from a struggling actress who pays her bills by dubbing pornographic films into French.  She’s tired of the limited vocabulary that she’s asked to use, and she requests that the lexicographer suggest some alternatives.  (Note the subjunctive: that the lexicographer suggest, not suggests.)  The rest of the book is his responses, with separate chapters for penises, breasts, la chose itself, etc.

Suddenly Bruant’s poem made sense.  Faire sentinelle: to stand guard, but also to have an erection.  La chapelle: chapel, but also vagina.  Other plays on words are more obvious, at least to a veteran (which I am, but Trump isn’t, having been excused from Vietnam due to a sore foot, although apparently said foot did not deter him from being an enthusiastic athlete).  Montaient à l’assaut de mes mamelons: the word le mamelon is a nipple or a small hill, and lemme tell ya, assaulting a hill is a highly technical undertaking–higher ground gives the defender a major advantage, and assaulting hills is the kind of thing that you really have to practice.  I was also impressed by the technical accuracy of this verse: …des lanciers, // Des dragons et des cuirassiers // Qui me montraient à me tenir en selle… Specifically, the fact that these soldiers who are teaching her “to stay in the saddle” (do French men all share the universally-held American man’s wish to “die in the saddle”?) are all mounted (i.e. on horseback) troops of one sort or another: lanciers and cuirassiers were cavalry troops, and dragons were “mounted infantry,” meaning that they travelled on horseback, but dismounted to fight.

There’s cool stuff in the poem for grammarians, as well–most notably, this line: J’en ai-t-y connu des lanciers…  Us anglophones struggle with both y and en, and finding both of them together and with an inversion…well, good luck finding anything that complicated ever again, and if you do, please tell us about it in the comments…

738_yvette_guilbert
Yvette Guilbert in 1885. I HATE the Toulouse-Lautrec paintings of her, but looking at this photo, you can see why he did what he did with her… Source: franceculture.fr, https://goo.gl/RknwY1

Bruant’s poem was eventually recorded by Yvette Guilbert, and more recently by Patachou.  I hum it in my head whenever my train passes by the Chaussée d’Antin metro station, for reasons that will become clear when you get to the last verse.


À Grenelle

Aristide Bruant

Quand je vois des filles de dix-sept ans,
Ça me fait penser qu’y a bien longtemps
Moi aussi, je l’ai été, pucelle,
A Grenelle!

Mais c’est un quartier plein de soldats,
On en rencontre à tous les pas,
Jour et nuit, ‘font sentinelles,
A Grenelle!

J’en ai-t-y connu des lanciers,
Des dragons et des cuirassiers
Qui me montraient à me tenir en selle
A Grenelle!

Fantassins, officiers, colons,
Montaient à l’assaut de mes mamelons!
Ils me prenaient pour une citadelle!
A Grenelle!

Moi, je les prenais tous pour amants,
Je commandais tous les régiments,
On m’appelait “Mâme la Colonelle”,
A Grenelle!

Mais ça me rapportait que de l’honneur,
Car si l’amour, ça fait le bonheur,
On fait pas fortune avec elle,
A Grenelle!

Bientôt je m’aperçus que mes beaux yeux
Sonnaient l’extinction des feux,
On se mirait plus dans ma prunelle
A Grenelle!

Mes bras, mes jambes, mes appâts,
Tout ça foutait le camp à grands pas,
J’osais plus faire la petite chapelle
A Grenelle!

Aujourd’hui que j’ai plus de position,
Les régiments me font une pension:
On me laisse manger à la gamelle,
A Grenelle!

Ça prouve que quand on est putain,
Faut s’établir Chaussée d’Antin,
Au lieu de se faire une clientèle
A Grenelle!

Scroll down for the English notes.

The Chaussée d’Antin stop on line 7. Picture source: me.

English notes

plain and simple: Clearly; without any complexity (Wiktionary).  Plain and simple is what linguists call a sentential or sentence-level adverb.  It describes the speaker’s attitude towards the assertion being made by the rest of the sentence: in this case, that the assertion is indisputably true.  Plain and simple is unusual in that most sentential adverbs come at the beginning of the sentence (Luckily, we didn’t miss the train); in contrast, plain and simple usually comes at the end of the sentence.  Some examples from the enTenTen13 corpus at the Sketch Engine web site, purveyor of fine linguistic corpora and the tools for searching them:

  • It doesn’t work, plain and simple.
  • Those things are just evil, plain and simple.
  • A mood disorder is an illness, plain and simple.
  • Seriously addressing the long-term fiscal problem means restraining entitlement spending growth, plain and simple.
  • That is the reason for the obesity epidemic, plain and simple.

to verb the shit out of: a delightful English adverbial expression (well, maybe American–I don’t actually know much about British English) that intensifies the action of the verb.

  • Found : At most gay bars, probably confusing the shit out of everyone.
  • When and if it does happen it won’t freak the shit out of you…
  • The group is preparing to shock the shit out of tourists.
  • If there is one thing ATLA is overflowing with, it’s ladies absolutely walloping the shit out of everyone.
  • There’s not a critic in the world who could say anything to me, because I kick the shit out of myself way worse than anybody ever could.
  • What happened here was the jury didn’t like the victim, and so the wrong-doer got a walk, and frankly that should scare the shit out of you.
  • If you want this to be a legitimate sport, start running it like one and stop embarassing the shit out of everyone who has supported your organization since the get go.

Note that the modified verb is usually one with a negative sense–to confuse, to beat, to shock, to wallop (to hit very hard), to scare, to embarrass.  (Yes, it’s spelled wrong in the example above.)  But, it doesn’t have to be a negative verb; using it with a positive one is odd, though, and that gives a certain flavor to such uses.

  • I plan to enjoy the shit out of it.
  • I’d buy the shit out of those tickets.
  • Then go find your Peter Brand and hire the shit out of him before someone else does.
  • Choir! – but you have, right? – they are everyday people who get together on Tuesdays or Wednesdays to sing the shit out of something, usually a popular song from the last 30 years or so.
  • I want to marry the shit out of you and then I want to put a baby inside you as soon as you’ll let me.

I tried to think of a different way to say this… Variability in biomedical languages

If ambiguity is the major problem in natural language processing, variability is the second.

This post is a draft of part of a piece that I’m writing at the moment, and on which I would like your feedback.  The topic is variability in language.  I pay the rent by researching the issues involved in getting computers to understand biomedical language–for example, the language of scientific journal articles, or the language of health records.  I’m in the midst of writing a chapter about this topic for a handbook of computational linguistics.  The audience is people who are interested in computational linguistics, but don’t have any experience with the biomedical domain.  If you’re a reader of this blog, that’s probably not a bad description of you.  So, it would be super-helpful to me to have your critique of this material.  I’m looking for anything that isn’t clear, anything that makes it difficult to understand my prose–anything that you think could be improved.  My grandmother will tell me how wonderful it is, so just feel free to plow into me with both fists–seriously, you’d be surprised at how much pain you can take in your old age, and I’m getting pretty old.  


Variability is the property of being able to express the same proposition in multiple ways.  If ambiguity is the major problem of natural language processing, variability is the second.  From a theoretical perspective, the field of sociolinguistics sees the study of variation in language as the central problem of linguistics, and it makes a strong case for that claim (e.g. Labov 2004)[1].  From a practical perspective in natural language processing, the high degree of variability in natural language prevents us from ever being able to use a dictionary-like data structure (such as hash tables, B-trees, or tries) to accomplish our tasks: we will never have a “dictionary” of all possible sentences (Chomsky 1959)[2].  This kind of approach would be fast and efficient—if only it were possible (Gusfield 1997)[3].
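To make the "dictionary of sentences" point concrete, here is a minimal Python sketch of an exact-match lookup. The table contents and the meaning notation are made up for illustration (the two phrasings are borrowed from the nominalization example later in this piece); nothing here is a real system.

```python
from typing import Optional

# Hypothetical toy "dictionary" of known phrasings. The single entry and the
# meaning notation on the right are illustrative assumptions only.
KNOWN_PHRASINGS = {
    "drug treatment of cancer": "TREATMENT(agent=drug, target=cancer)",
}


def lookup(phrase: str) -> Optional[str]:
    """Return the stored meaning for an exact phrasing, or None if unseen."""
    return KNOWN_PHRASINGS.get(phrase.lower().strip())
```

The lookup is fast, but lookup("cancer treatment with drugs") returns None even though it expresses the same proposition: every variant would need its own entry, and the set of variants is not something we can list in advance.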

Sources of variability

Some of the sources of variability in language are well-known even to the casual reader: for example, synonymy, or the availability of multiple words that have the same dictionary meaning.  A kind of synonymy that is especially relevant in biomedical languages occurs when there is both a technical and a lay or common term for something, such as the lay term heart attack and the technical term myocardial infarction.  Using technical terminology is important for the precision of scientific writing and of medical records (Rey 1979)[4].  However, the use of technical terminology also can make it difficult for patients and their families to learn about their illness or to understand their own health records (Kandula et al. 2010)[5].  One way to deal with this problem is to use natural language processing techniques to replace technical terms with their lay synonyms (Elhadad 2006[6], Elhadad and Sutaria 2007[7], Deléger and Zweigenbaum 2009[8], Leroy et al. 2013a[9], Leroy et al. 2013b[10]) or their definitions (Elhadad 2006)[11] in order to make clinical documents or scientific journal articles accessible to non-professionals.  Doing this computationally, rather than manually, allows it to be done at enormous scales, or on demand.  This is a good example of why to do natural language processing in the biomedical domain: the possibility of doing real good in the world.
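As a rough illustration of the simplest form of lay-term replacement, here is a Python sketch that swaps technical terms for lay synonyms using a lookup table. It is not the method of any of the papers cited above, which are considerably more sophisticated. The function name and the lexicon are assumptions made for the example; only the myocardial infarction / heart attack pair comes from the text.

```python
import re

# Hypothetical mini-lexicon (keys must be lowercase). The first pair is the
# example from the text; the second is an added illustration.
LAY_TERMS = {
    "myocardial infarction": "heart attack",
    "hypertension": "high blood pressure",
}


def simplify(text: str, lexicon: dict = LAY_TERMS) -> str:
    """Replace each technical term in text with its lay synonym."""
    if not lexicon:
        return text
    # Try longer terms first so a multi-word term wins over any shorter
    # term that happens to be contained in it.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lexicon[m.group(0).lower()], text)
```

Even this toy version shows where the hard parts are: it ignores capitalization of the replacement, inflected forms, and context, and it only helps for terms that someone has already put in the lexicon.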

Paraphrase is the phenomenon of different (and typically syntactically different) expressions in language of the same meaning (Ganitkevitch et al. 2013)[12].   Where synonymy operates at the level of words, paraphrase operates at the level of the phrase, or group of words.  Paraphrasing is a source of variability that is especially interesting in the biomedical domain because of how it interacts with the technical vocabulary of the field (Deléger and Zweigenbaum 2008, Deléger 2009, Deléger and Zweigenbaum 2010, Grabar and Hamon 2014)[13],[14],[15],[16]. Funk et al. looked for possibilities to paraphrase or replace synonyms in 41,853 terms from the Gene Ontology, and found that 27,610 out of 41,852 were paraphrasable, or had synonyms, or both[17].  This indicates that the possibilities for variant forms of the same thing occurring in the biomedical literature are tremendous.

But, do those tremendous numbers of variants really occur?  It appears that they do.  Cohen et al. (2008) looked at the incidence of alternative syntactic constructions involving common nominalizations (nouns derived from verbs, such as treatment from to treat) in scientific journal articles, for example drug treatment of cancer and cancer treatment with drugs.  Figure 1 shows a typical finding: for some nominalizations, as many as 15 out of 16 possible variants could be found even in a relatively small corpus[18].

How different can these paraphrases be from each other?  Technical terms in biomedical research can be quite long, which means that there can be multiple candidates for paraphrasing and for replacement of synonyms (see above).  This means that the number of possible paraphrases of a long term can be explosive.  Those paraphrases, even for a short term, can be quite different.  For example, Cohen et al. (2017) examined the relationship between the length of terms in the Gene Ontology and the length of appearances of those terms in the CRAFT corpus of biomedical journal articles, and found that 2-word terms could show up with paraphrases as long as 15 words[19].  The high incidence of just these two forms of variability in language (synonymy and paraphrasing), as well as the large differences that can be seen in forms with the same meanings, illustrate just how much of an issue variability is for natural language processing in general, and in biomedical texts in particular.
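The "explosive" growth can be sketched with a little arithmetic: if each word of a term can independently be swapped for one of several alternatives, the number of variants is the product of the number of options per word. The Python sketch below assumes exactly that independence, which real terms will not always respect, and the three-word term and its alternatives are invented for illustration rather than taken from the Gene Ontology.

```python
from itertools import product
from math import prod

# Hypothetical alternatives for each word slot of a three-word term.
SLOT_ALTERNATIVES = [
    ["cell", "cellular"],
    ["growth", "proliferation"],
    ["regulation", "control", "modulation"],
]


def count_variants(slots: list) -> int:
    """Number of surface forms if every slot varies independently."""
    return prod(len(options) for options in slots)


def enumerate_variants(slots: list) -> list:
    """All surface forms obtained by picking one option per slot."""
    return [" ".join(choice) for choice in product(*slots)]
```

With the slots above, count_variants gives 2 x 2 x 3 = 12 forms for a three-word term, before any syntactic paraphrase (reordering, added prepositions) is considered. Each additional word with k alternatives multiplies the total by k, which is why long terms are the worst case.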


Harsh critiques in the Comments section below, please!

[1] Labov, William. “Quantitative reasoning in linguistics.” Sociolinguistics/Soziolinguistik: An international handbook of the science of language and society 1 (2004): 6-22.

[2] Chomsky, Noam. “A review of BF Skinner’s Verbal Behavior.” Language 35, no. 1 (1959): 26-58.

[3] Gusfield, Dan. Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge university press, 1997.

[4] Rey, Alain. La terminologie: noms et notions. No. 1780. Presses Univ. de France, 1979, p. 56.

[5] Kandula, Sasikiran, Dorothy Curtis, and Qing Zeng-Treitler. “A semantic and syntactic text simplification tool for health content.” In AMIA annual symposium proceedings, vol. 2010, p. 366. American Medical Informatics Association, 2010.

[6] Elhadad, Noemie. “Comprehending technical texts: Predicting and defining unfamiliar terms.” In AMIA annual symposium proceedings, vol. 2006, p. 239. American Medical Informatics Association, 2006.

[7] Elhadad, Noemie, and Komal Sutaria. “Mining a lexicon of technical terms and lay equivalents.” In Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, pp. 49-56. Association for Computational Linguistics, 2007.

[8] Deléger, Louise, and Pierre Zweigenbaum. “Extracting lay paraphrases of specialized expressions from monolingual comparable medical corpora.” In Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora, pp. 2-10. Association for Computational Linguistics, 2009.

[9] Leroy, Gondy, David Kauchak, and Obay Mouradi. “A user-study measuring the effects of lexical simplification and coherence enhancement on perceived and actual text difficulty.” International journal of medical informatics 82, no. 8 (2013): 717-730.

[10] Leroy, Gondy, James E. Endicott, David Kauchak, Obay Mouradi, and Melissa Just. “User evaluation of the effects of a text simplification algorithm using term familiarity on perception, understanding, learning, and information retention.” Journal of medical Internet research 15, no. 7 (2013).

[11] Elhadad, Noemie. “Comprehending technical texts: Predicting and defining unfamiliar terms.” In AMIA annual symposium proceedings, vol. 2006, p. 239. American Medical Informatics Association, 2006.

[12] Ganitkevitch, Juri, Benjamin Van Durme, and Chris Callison-Burch. “PPDB: The paraphrase database.” Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013.

[13] Deléger, Louise, and Pierre Zweigenbaum. “Paraphrase acquisition from comparable medical corpora of specialized and lay texts.” AMIA Annual Symposium Proceedings. Vol. 2008. American Medical Informatics Association, 2008.

[14] Deléger, Louise. Exploitation de corpus parallèles et comparables pour la détection de correspondances lexicales: application au domaine médical. Diss. Paris 6, 2009.

[15] Deléger, Louise, and Pierre Zweigenbaum. “Identifying Paraphrases between Technical and Lay Corpora.” LREC. 2010.

[16] Grabar, Natalia, and Thierry Hamon. “Unsupervised method for the acquisition of general language paraphrases for medical compounds.” Proceedings of the 4th International Workshop on Computational Terminology (Computerm). 2014.

[17] Funk, Christopher S., K. Bretonnel Cohen, Lawrence E. Hunter, and Karin M. Verspoor. “Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition.” Journal of biomedical semantics 7, no. 1 (2016): 52.

[18] Cohen, K. Bretonnel, Martha Palmer, and Lawrence Hunter. “Nominalization and alternations in biomedical language.” PloS one 3.9 (2008): e3158.

[19] Cohen, K. B., K. Verspoor, K. Fort, C. Funk, M. Bada, M. Palmer, and L. E. Hunter. “The Colorado Richly Annotated Full Text (CRAFT) corpus: Multi-model annotation in the biomedical domain.” In Handbook of Linguistic Annotation, pp. 1379-1394. Springer, Dordrecht, 2017.



What makes something interesting? The biomedical language version

What makes any domain an interesting one from the perspective of computational linguistics?

I pay the rent by researching the issues involved in getting computers to understand biomedical language–for example, the language of scientific journal articles, or the language of health records.  I’m in the midst of writing a chapter about this topic for a handbook of computational linguistics.  The audience is people who are interested in computational linguistics, but don’t have any experience with the biomedical domain.  If you’re a reader of this blog, that’s probably not a bad description of you.  So, it would be super-helpful to me to have your critique of my introduction.  I’m looking for anything that isn’t clear, anything that makes it difficult to understand my prose–anything that you think could be improved.  My grandmother will tell me how wonderful it is, so just feel free to plow into me with both fists–seriously, you’d be surprised at how much pain you can take in your old age.  


What makes the biomedical domain an interesting one from the perspective of computational linguistics?  Indeed, what makes any domain an interesting one from the perspective of computational linguistics?  In fact, Roger Shuy has asserted that the notion of any specific kind of data defining a particular area of linguistics is unsupportable.  As he puts it: “There is little reason for the data on which a linguist works to have the right to name that work” (Shuy 2002)[1].

Shuy’s statement is surprising, since he himself is North America’s leading forensic linguist—a linguist whose career has been defined entirely by his excellent work on language as it appears in the legal system.  And, indeed, many computational linguists describe themselves as doing biomedical natural language processing[2].

So, why study computational linguistics in the biomedical domain?  One can identify at least three primary types of reasons: theoretical, practical, and use-case-oriented.

Theoretical aspects of biomedical language

Biomedical languages are of interest to computational linguistics for two reasons: their relevance to questions about the nature and limits of grammar, and the light that they can shed on issues of reproducibility in natural language processing.

Biomedical languages and grammaticality

Biomedical languages are of interest from the perspective of computational linguistics in part because they stretch the limits of what can possibly be grammatical in a natural language.  Since the second half of the 20th century, much of linguistic argumentation has focused on grammaticality, which at a first approximation we can define as the question of whether or not an utterance is within the boundaries of some language (Partee et al. 2012).  Early in the second half of the 20th century, utterances that came under discussion in linguistic debates tended to be either quite ordinary (such as the famous John loves Mary (Fowle 1850)[3]) or interestingly ambiguous: sentences like John loves his wife, and so does Tom (Duží 2012)[4], whose grammaticality (as opposed to their interpretations) was mostly not in question.  Although the discourse of that period of linguistic inquiry, particularly with respect to the development of syntactic theory, was often couched in terms of defining (and constraining) some set of sentences (“strings”), in practice it tended to be more about operations on (and to a much lesser extent, interpretation of) those strings.

This changed in the 1970s and 1980s with the emergence of a research community that explored sublanguages: language associated with a particular genre and a particular kind of interlocutor[5].  Harris (1976) laid out a number of the principles of the sublanguage approach: semantics was embraced, not pushed off to some later date[6].  Although not always formalized as such, lexical preferences and statistical tendencies were taken advantage of (unusual in the era of a linguistics that had a complicated relationship with the lexicon and famously open disdain for statistics (Harris 1995)[7]).  As Grishman (2001) explains, sublanguages were interesting for at least two reasons: they seemed amenable to syntactic description by reducing complex syntactic structures into simpler ones, reminiscent of the transformational analyses that were becoming dominant in linguistics, and they held the promise of mapping to a tractable model of the world, or semantics[8], something that had largely eluded linguistics up to that point[9].

The biomedical domain seemed like a fruitful area of research to the early investigators of the topic, and it was.  Scientific journal articles were one such genre, with the interlocutors being researchers; clinical documents provided another, with the interlocutors being physicians.  Harris et al. (1989) provided an in-depth description of the language of scientific publications about immunology[10].  It set a standard for sublanguage research on biomedical languages that would remain unparalleled for years.  The usefulness of the sublanguage model can be seen in the fact that researchers continue to find it fruitful (some prominent examples in the biomedical domain are reviewed in Demner-Fushman et al. 2009)[11].  Some examples that illustrate particularly well the use of the sublanguage model for semantic representation include Dolbey (2009) in the molecular biology domain[12] and Deléger et al. (2017), which also includes a review of the basic issues and of other approaches to resolving them[13].  Clinical sublanguages soon turned out to be full of data that was ungrammatical on any standard treatment of syntax (see Table 1 for some examples), making it clear that they were good areas for investigating the limits of grammaticality at a time when grammaticality was generally considered a binary characteristic of language with strict semantic constraints.

Chest shows evidence of metastatic disease.
Examination shows the same findings.
x-rays of spine showed extreme arthritic change.
Urinalysis shows 1% proteinuria.
Brain scan shows midline lesion.

Table 1: Examples of ungrammatical sentences from radiology reports.  In English, the verb to show is usually thought of as requiring a sentient subject.  In these sentences, we see a wide range of non-sentient subjects: an anatomical organ (chest), an event (examination), x-ray films (x-rays of spine), a laboratory test (urinalysis), and the output of a computed tomography exam (brain scan).  All of the sentences have “generic” noun phrases where they would normally require an article or demonstrative (chest, examination, x-rays of spine, and brain scan).  Source: Hirschman (1986)[14].  No human subjects approval or HIPAA training is required for use of these examples.
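
For readers who think better in code, here is a minimal sketch of the second observation in the caption, the missing articles.  It only checks whether the first word of each Table 1 sentence is a determiner; the determiner list and the function names are my own assumptions for illustration, not anything from Hirschman (1986), and a real system would need an actual parser.

```python
# Sentences copied from Table 1.
TABLE_1 = [
    "Chest shows evidence of metastatic disease.",
    "Examination shows the same findings.",
    "x-rays of spine showed extreme arthritic change.",
    "Urinalysis shows 1% proteinuria.",
    "Brain scan shows midline lesion.",
]

# Assumption: a small hand-picked set of determiners, far from exhaustive.
DETERMINERS = {
    "the", "a", "an", "this", "that", "these", "those",
    "his", "her", "its", "their",
}


def starts_with_determiner(sentence):
    """Return True if the first whitespace-delimited token is a determiner."""
    tokens = sentence.split()
    return bool(tokens) and tokens[0].lower() in DETERMINERS


def bare_subject_sentences(sentences):
    """Keep the sentences whose first token is not a determiner."""
    return [s for s in sentences if not starts_with_determiner(s)]


if __name__ == "__main__":
    for sentence in bare_subject_sentences(TABLE_1):
        print(sentence)
```

This is a crude surface check: it says nothing about whether a given noun phrase actually requires an article, only that none is there at the start of the sentence.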

[1] Shuy, Roger. Linguistic battles in trademark disputes. Springer, 2002.

[2] The Association for Computational Linguistics Special Interest Group on Biomedical Natural Language Processing has over 100 members at the time of writing.

[3] Fowle, William B. (1850) “English Grammar: Goold Brown.” Common School Journal, pp. 245-249.

[4] Duží, Marie (2012) “Extensional logic of hyperintentions.”  In Düsterhöft, Antje, Meike Klettke, and Klaus-Dieter Schewe, eds. Conceptual Modelling and Its Theoretical Foundations: Essays Dedicated to Bernhard Thalheim on the Occasion of His 60th Birthday. Vol. 7260. Springer Science & Business Media, 2012.

[5] See Chapter 18, Sublanguages and controlled languages, this volume.

[6] Harris, Zellig. “On a theory of language.” The Journal of Philosophy 73.10 (1976): 253-276.

[7] Harris, Randy Allen. The linguistics wars. Oxford University Press, 1995.

[8] Grishman, Ralph. “Adaptive information extraction and sublanguage analysis.” Proc. of IJCAI 2001. 2001.

[9] Harris, Randy Allen. The linguistics wars. Oxford University Press, 1995.

[10] Harris, Z., Gottfried, M., Ryckman, T., Daladier, A., & Mattick, P. (2012). The form of information in science: analysis of an immunology sublanguage (Vol. 104). Springer Science & Business Media.

[11] Demner-Fushman, Dina, Wendy W. Chapman, and Clement J. McDonald. “What can natural language processing do for clinical decision support?.” Journal of biomedical informatics 42.5 (2009): 760-772.

[12] Dolbey, Andrew. “BioFrameNet: a FrameNet extension to the domain of molecular biology.” (2009).

[13] Deléger, Louise, Leonardo Campillos, Anne-Laure Ligozat, and Aurélie Névéol. “Design of an extensive information representation scheme for clinical narratives.” Journal of biomedical semantics 8, no. 1 (2017): 37.

[14] Hirschman, Lynette. “Discovering sublanguage structures.” Analyzing Language in Restricted Domains: Sublanguage Description and Processing (1986): 211-234.


Harsh critiques in the Comments section below, please!

The last duel in France: traces in syntax

The last duel in France leads to a discussion of syntactic theory, ’cause that’s how I roll.

Wanna watch the last duel in France?  Here you go.  Scroll down past the video for an excerpt from an article on the topic from Le monde and the definitions of some of the French vocabulary therein.

The article in Le monde: click here.  Some relevant vocabulary:

retrousser [+ sleeves or pant legs] : to roll up.  Elle avait un de mes pyjamas dont elle avait retroussé les manches.  (Camus, L’étranger)

l’hôtel particulier : like a château, but it’s in a city, versus being in the country, and it could just as well be owned by a bourgeois as an aristocrat–I think it’s actually more likely to have been owned by a bourgeois, at least in Paris.  Don’t quote me on this.

Dans un jardin ombragé par des arbustes bienveillants, enveloppé d’une douceur printanière, chemise blanche, col ouvert, manches retroussées, deux hommes, épée à la main, se jugent, se jaugent, puis, sur un signe de l’arbitre, croisent le fer. Quatre minutes plus tard, le combat cesse un des deux duellistes ayant été touché par deux fois au bras. Cette scène n’est extraite d’aucun roman ou film de cape et d’épée. Elle eut lieu il y a exactement cinquante ans, le 21 avril 1967, dans le parc d’un hôtel particulier de Neuilly-sur-Seine.


English notes

wanna: the written form of the contraction of want + to.  One of the interesting things about this contraction is that it is only possible in specific syntactic contexts, and is absolutely impossible in others.  This lets you distinguish between two meanings of what otherwise looks like the same sentence.  Suppose that the following situations exist:

  1. There is going to be a contest.  Whoever wins the contest will be awarded a horse.  There are a number of horses available, and the winner of the contest will be able to choose the horse that they will receive.
  2. There is going to be a horse race.  One of the horses will win the race.

In situation number 1, if you want to ask someone which of the horses they would choose were they to win the contest, you could ask the question in either of two ways.  The second one is more casual, but they are both completely acceptable from a linguistic point of view:

Which horse do you want to win?

Which horse do you wanna win?

In situation number 2, if you think that someone has a preference regarding the winner of the race, and you want to ask them which of the participating horses they hope will emerge the winner of the race, you only have one option:

Which horse do you want to win?


Google the quoted phrase “which horse do you wanna win” and you will get 5 results, all of them in Japanese.  WTF, you’re wondering…


What you’re seeing in the Google results is sentences that illustrate interesting syntactic phenomena.  Most of the literature on syntax is written about English syntax (blame Chomsky), mostly by (notoriously monolingual) anglophones, and the classic examples in the field are hence mostly in English.  (Actually, the only classic non-English examples that I can think of are in Swiss German–more on that another time, perhaps.)  The which horse do you want to/wanna win sentences are used in classic transformational-generative grammar to argue for the existence of something called a trace.  This is held to be something that is present in the structure of the sentence, but that is not observable–the claim is that you can’t “see” it, but it’s there.  What is that “it”?  The idea is that underlying those two sentences are two “deeper” forms:

  1. For situation 1 (there’s a contest, and the winner gets a horse): Which horse do you want to win [the horse]?
  2. For situation 2 (there’s a horse race, and one of the horses will win): Which horse do you want [the horse] to win?

(Linguists in the audience: yes, I am simplifying this for didactic purposes–no hate mail, please.)  In both cases, the bracketed [the horse] goes away; in the second case, the “trace” that is left behind blocks the contraction of want + to to wanna.
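
If it helps to see the mechanics, here is a toy sketch of the argument in Python.  It is my own illustration, not anybody's actual theory of syntax: the `[t]` marker for the trace and the function names are made up, and sentences are just lists of words.  The only rule is that want and to contract to wanna when they are right next to each other, so a trace sitting between them blocks the contraction.

```python
TRACE = "[t]"  # hypothetical marker for the unpronounced trace


def contract_wanna(tokens):
    """Replace adjacent 'want' + 'to' with 'wanna'; anything in between blocks it."""
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "want" and i + 1 < len(tokens) and tokens[i + 1] == "to":
            out.append("wanna")
            i += 2  # skip both 'want' and 'to'
        else:
            out.append(tokens[i])
            i += 1
    return out


def pronounce(tokens):
    """Drop traces: they are in the structure, but nobody says them out loud."""
    return " ".join(t for t in tokens if t != TRACE)


# Situation 1 (the contest): the trace is after 'win'.
contest = ["which", "horse", "do", "you", "want", "to", "win", TRACE]
# Situation 2 (the horse race): the trace is between 'want' and 'to'.
race = ["which", "horse", "do", "you", "want", TRACE, "to", "win"]

if __name__ == "__main__":
    print(pronounce(contract_wanna(contest)))
    print(pronounce(contract_wanna(race)))
```

Without contraction, both structures are pronounced identically; with it, only the contest structure can come out with wanna, which is the whole point of the classic example.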

Now, I know what you’re thinking: It’s obsessing about things like this that keeps Zipf from ever getting a second date.  …and you’re right, I imagine.