Veterans running for office as Democrats in 2018

It irritates me when people assume that US military people all vote Republican–plenty of us are reliable Democratic voters.

It irritates me when people assume that US military people vote Republican. We’re humans, which means that we are not all the same, and plenty of us are reliable Democratic voters. Trump the war-mongering draft-dodger? A poll in October of last year by the Military Times (the most recent one I can find today) showed the following:

  • 53% of officers oppose him
  • Only 30% of officers support him
  • Only 47% of enlisted personnel support him
  • 38% of enlisted personnel view him unfavorably

I’ve written a number of times about why his approval ratings are so low in the military–today I’ll just leave you with this link to a nice article about veterans running for office in 2018 as Democrats.

apple.news/AdOf4MjO4RGCibkB0mTBW3w

I tried to think of a different way to say this… Variability in biomedical languages

If ambiguity is the major problem in natural language processing, variability is the second.

This post is a draft of part of a piece that I’m writing at the moment, and on which I would like your feedback.  The topic is variability in language.  I pay the rent by researching the issues involved in getting computers to understand biomedical language–for example, the language of scientific journal articles, or the language of health records.  I’m in the midst of writing a chapter about this topic for a handbook of computational linguistics.  The audience is people who are interested in computational linguistics, but don’t have any experience with the biomedical domain.  If you’re a reader of this blog, that’s probably not a bad description of you.  So, it would be super-helpful to me to have your critique of this material.  I’m looking for anything that isn’t clear, anything that makes it difficult to understand my prose–anything that you think could be improved.  My grandmother will tell me how wonderful it is, so just feel free to plow into me with both fists–seriously, you’d be surprised at how much pain you can take in your old age, and I’m getting pretty old.  


Variability is the property of being able to express the same proposition in multiple ways.  If ambiguity is the major problem of natural language processing, variability is the second.  From a theoretical perspective, the field of sociolinguistics sees the study of variation in language as the central problem of linguistics, and it makes a strong case for that claim (e.g. Labov 2004)[1].  From a practical perspective in natural language processing, the high degree of variability in natural language prevents us from ever being able to use a dictionary-like data structure (such as hash tables, B-trees, or tries) to accomplish our tasks: we will never have a “dictionary” of all possible sentences (Chomsky 1959)[2].  This kind of approach would be fast and efficient—if only it were possible (Gusfield 1997)[3].
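
To make the hash-table point concrete, here is a minimal sketch in Python (mine, not part of the chapter; the concept ID is purely illustrative): exact-match lookup is fast, but it succeeds only when the input string is literally identical to a stored key, which variability guarantees will often not be the case.

    # A hypothetical exact-match "dictionary" approach to concept recognition.
    # Hash-table lookup is O(1) on average--fast, but brittle under variability.
    concepts = {
        "myocardial infarction": "C0027051",  # illustrative concept ID
    }

    def recognize(phrase):
        """Return a concept ID only if the phrase matches a stored key exactly."""
        return concepts.get(phrase.lower())

    print(recognize("Myocardial Infarction"))  # C0027051: exact match succeeds
    print(recognize("heart attack"))           # None: same meaning, different form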

Sources of variability

Some of the sources of variability in language are well-known even to the casual reader—for example, synonymy, or the availability of multiple words that have the same dictionary meaning.  A kind of synonymy that is especially relevant in biomedical languages occurs when there is both a technical and a lay or common term for something, such as the lay term heart attack and the technical term myocardial infarction.  Using technical terminology is important for the precision of scientific writing and of medical records (Rey 1979)[4].  However, the use of technical terminology can also make it difficult for patients and their families to learn about their illness or to understand their own health records (Kandula et al. 2010)[5].  One way to deal with this problem is to use natural language processing techniques to replace technical terms with their lay synonyms (Elhadad 2006[6], Elhadad and Sutaria 2007[7], Deléger and Zweigenbaum 2009[8], Leroy et al. 2013a[9], Leroy et al. 2013b[10]) or their definitions (Elhadad 2006)[11] in order to make clinical documents or scientific journal articles accessible to non-professionals.  Doing this computationally, rather than manually, allows it to be done at enormous scale, or on demand.  This is a good example of why one would do natural language processing in the biomedical domain: the possibility of doing real good in the world.
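
If it helps to see the idea in code, here is a minimal sketch of the substitution step in Python, assuming a tiny hand-built technical-to-lay lexicon (the systems cited above induce such lexicons from corpora rather than listing the pairs by hand):

    import re

    # A toy technical-to-lay lexicon; real systems mine these pairs from corpora.
    lay_synonyms = {
        "myocardial infarction": "heart attack",
        "cerebrovascular accident": "stroke",
    }

    def simplify(text):
        """Replace each technical term with its lay synonym, longest terms first,
        so that multi-word terms are not broken up by shorter matches."""
        for term in sorted(lay_synonyms, key=len, reverse=True):
            text = re.sub(re.escape(term), lay_synonyms[term], text,
                          flags=re.IGNORECASE)
        return text

    print(simplify("The patient suffered a myocardial infarction."))
    # The patient suffered a heart attack.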

Paraphrase is the phenomenon of different (and typically syntactically different) expressions in language of the same meaning (Ganitkevitch et al. 2013)[12].  Where synonymy operates at the level of words, paraphrase operates at the level of the phrase, or group of words.  Paraphrasing is a source of variability that is especially interesting in the biomedical domain because of how it interacts with the technical vocabulary of the field (Deléger and Zweigenbaum 2008[13], Deléger 2009[14], Deléger and Zweigenbaum 2010[15], Grabar and Hamon 2014[16]).  Funk et al. examined 41,853 terms from the Gene Ontology for opportunities to paraphrase or substitute synonyms, and found that 27,610 of them were paraphrasable, had synonyms, or both[17].  This indicates that the possibilities for variant forms of the same thing occurring in the biomedical literature are tremendous.

But do those tremendous numbers of variants really occur?  It appears that they do.  Cohen et al. (2008) looked at the incidence of alternative syntactic constructions involving common nominalizations (nouns derived from verbs, such as treatment from to treat) in scientific journal articles—for example, drug treatment of cancer and cancer treatment with drugs.  Figure 1 shows a typical finding: for some nominalizations, as many as 15 out of 16 possible variants could be found even in a relatively small corpus[18].

How different can these paraphrases be from each other?  Technical terms in biomedical research can be quite long, which means that there can be multiple candidates for paraphrasing and for replacement of synonyms (see above).  This means that the number of possible paraphrases of a long term can be explosive.  Those paraphrases, even for a short term, can be quite different—for example, Cohen et al. (2017) examined the relationship between the length of terms in the Gene Ontology and the length of appearances of those terms in the CRAFT corpus of biomedical journal articles, and found that 2-word terms could show up with paraphrases as long as 15 words[19].  The high incidence of just these two forms of variability in language—synonymy and paraphrasing—as well as the large differences that can be seen in forms with the same meanings illustrate just how much of an issue variability is for natural language processing in general, and in biomedical texts in particular.



[1] Labov, William. “Quantitative reasoning in linguistics.” Sociolinguistics/Soziolinguistik: An international handbook of the science of language and society 1 (2004): 6-22.

[2] Chomsky, Noam. “A review of B. F. Skinner’s Verbal Behavior.” Language 35, no. 1 (1959): 26-58.

[3] Gusfield, Dan. Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, 1997.

[4] Rey, Alain. La terminologie: noms et notions. No. 1780. Presses Univ. de France, 1979, p. 56.

[5] Kandula, Sasikiran, Dorothy Curtis, and Qing Zeng-Treitler. “A semantic and syntactic text simplification tool for health content.” In AMIA annual symposium proceedings, vol. 2010, p. 366. American Medical Informatics Association, 2010.

[6] Elhadad, Noemie. “Comprehending technical texts: Predicting and defining unfamiliar terms.” In AMIA annual symposium proceedings, vol. 2006, p. 239. American Medical Informatics Association, 2006.

[7] Elhadad, Noemie, and Komal Sutaria. “Mining a lexicon of technical terms and lay equivalents.” In Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, pp. 49-56. Association for Computational Linguistics, 2007.

[8] Deléger, Louise, and Pierre Zweigenbaum. “Extracting lay paraphrases of specialized expressions from monolingual comparable medical corpora.” In Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora, pp. 2-10. Association for Computational Linguistics, 2009.

[9] Leroy, Gondy, David Kauchak, and Obay Mouradi. “A user-study measuring the effects of lexical simplification and coherence enhancement on perceived and actual text difficulty.” International journal of medical informatics 82, no. 8 (2013): 717-730.

[10] Leroy, Gondy, James E. Endicott, David Kauchak, Obay Mouradi, and Melissa Just. “User evaluation of the effects of a text simplification algorithm using term familiarity on perception, understanding, learning, and information retention.” Journal of medical Internet research 15, no. 7 (2013).

[11] Elhadad, Noemie. “Comprehending technical texts: Predicting and defining unfamiliar terms.” In AMIA annual symposium proceedings, vol. 2006, p. 239. American Medical Informatics Association, 2006.

[12] Ganitkevitch, Juri, Benjamin Van Durme, and Chris Callison-Burch. “PPDB: The paraphrase database.” Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013.

[13] Deléger, Louise, and Pierre Zweigenbaum. “Paraphrase acquisition from comparable medical corpora of specialized and lay texts.” AMIA Annual Symposium Proceedings. Vol. 2008. American Medical Informatics Association, 2008.

[14] Deléger, Louise. Exploitation de corpus parallèles et comparables pour la détection de correspondances lexicales: application au domaine médical. Diss. Paris 6, 2009.

[15] Deléger, Louise, and Pierre Zweigenbaum. “Identifying Paraphrases between Technical and Lay Corpora.” LREC. 2010.

[16] Grabar, Natalia, and Thierry Hamon. “Unsupervised method for the acquisition of general language paraphrases for medical compounds.” Proceedings of the 4th International Workshop on Computational Terminology (Computerm). 2014.

[17] Funk, Christopher S., K. Bretonnel Cohen, Lawrence E. Hunter, and Karin M. Verspoor. “Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition.” Journal of biomedical semantics 7, no. 1 (2016): 52.

[18] Cohen, K. Bretonnel, Martha Palmer, and Lawrence Hunter. “Nominalization and alternations in biomedical language.” PloS one 3.9 (2008): e3158.

[19] Cohen, K. Bretonnel, Karin Verspoor, Karën Fort, Christopher Funk, Michael Bada, Martha Palmer, and Lawrence E. Hunter. “The Colorado Richly Annotated Full Text (CRAFT) corpus: Multi-model annotation in the biomedical domain.” In Handbook of Linguistic Annotation, pp. 1379-1394. Springer, Dordrecht, 2017.


Harsh critiques in the Comments section below, please!

What computational linguists actually do all day: the relative frequencies edition

Scroll down past the picture of the mean-looking warthog.

Hi Zipf,

I spent my first hour this morning looking for papers that describe any tools that do any kind of enrichment analysis over terms found in text, but was generally unsuccessful. Searches containing the terms “concept,” “term,” “enrichment analysis,” “text,” and “natural language processing” have mainly pointed me towards GSEA and GSEA-like tools like Ontologizer that focus on gene sets. Tools that determine what a document is “about” might also be useful.

Do you know of any tools or papers you could point me towards?

Zellig


Hey there, Zellig,

I may be misunderstanding the question, so let me ask you to clarify.  Do you want to know about terms enriched in a document, or in a set of documents?  Gimme an idea about what the input looks like, and I think I’ll have an answer.
Zipf

Hi Zipf,

I think I am interested in looking at each document individually. And I’ll also clarify that the point of the task is not to find concepts, but to determine what effect a concept’s presence or absence in a document has on what it is “about.”

Zellig

OK, so in that case, the easiest thing to do would be… hm… relative frequency versus a background set of documents, or else tf*idf.  Explaining relative frequencies first:

  • your document has 100 words in total
  • mouse occurs 45 times in your document, or frequency = 45/100
  • the occurs 50 times in your document, or frequency = 50/100
  • warthog (I just learned how to say it in French, so warthogs are on my mind–“le phacochère”, if you were wondering, which sounds like a lot to scream if one of those nasty things charges you) occurs 5 times in your document, or frequency = 5/100.  Scroll down past the picture of the mean-looking warthog.
A male southern warthog. Picture source: By Charlesjsharp – Own work, from Sharp Photography, sharpphotography, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=37065293
  • your background data has 1000 words in total
  • mouse occurs 10 times, so frequency = 10/1000
  • the occurs 500 times, so frequency = 500/1000
  • warthog occurs 490 times, so frequency = 490/1000
Relative frequencies (yours : background):
  • mouse = (45/100) : (10/1000), i.e. 45.0
  • the = (50/100) : (500/1000), i.e. 1.0
  • warthog = (5/100) : (490/1000), i.e. 0.1
…from which you conclude that your corpus is about mice, or at least it’s more about mice than the background data set is (’cause the word mouse occurs in your data at a ratio of 45:1 as compared to how often it occurs in the background data set).  You conclude that “the” tells you nothing about either corpus (the ratio is 1.0, meaning that the frequency of the word is about the same in both data sets), and that “warthog” tells you nothing about your corpus, but it does tell you something about the background data (because it only occurs in your data at a ratio of once to every 10 times that it occurs in the background data set).
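
If you’d rather see that as code, here’s a minimal sketch in Python of the arithmetic above, using the counts from the example (Kilgarriff’s paper adds smoothing, which this toy version omits):

    from collections import Counter

    # Word counts from the example above.
    yours = Counter({"mouse": 45, "the": 50, "warthog": 5})          # 100 words total
    background = Counter({"mouse": 10, "the": 500, "warthog": 490})  # 1000 words total

    def relative_frequency(word, target, reference):
        """Frequency of a word in the target corpus divided by its
        frequency in the reference corpus.  No smoothing--see Kilgarriff."""
        target_freq = target[word] / sum(target.values())
        reference_freq = reference[word] / sum(reference.values())
        return target_freq / reference_freq

    for word in yours:
        print(word, round(relative_frequency(word, yours, background), 1))
    # mouse 45.0 / the 1.0 / warthog 0.1
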
The other easy approach: term frequency (count of occurrences of a word in a document), normalized by inverse document frequency (1 over the number of documents in which the word occurs).  This is known as tf*idf (term frequency * inverse document frequency).
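
And a matching sketch of tf*idf, using the simplified 1-over-document-count version of idf described here (textbook formulations usually take a log of N/df, but the intuition is the same):

    def tf_idf(word, document, corpus):
        """Term frequency in one document times inverse document frequency,
        with idf taken as 1 over the number of documents containing the word."""
        tf = document.count(word)
        df = sum(1 for doc in corpus if word in doc)
        return tf * (1 / df) if df else 0.0

    corpus = [
        ["mouse", "mouse", "the", "warthog"],
        ["the", "warthog", "warthog"],
        ["the", "mouse"],
    ]
    print(tf_idf("mouse", corpus[0], corpus))  # 2 * (1/2) = 1.0
    print(tf_idf("the", corpus[0], corpus))    # 1 * (1/3) = 0.33...
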
Back to relative frequencies: that analysis is due to the late Adam Kilgarriff.  (I’m proud to say that we wrote a paper together before his untimely death, and lemme tell you: he really participated!)  Here’s a link to his paper about it.  He gives details about smoothing and the like that you’ll want to know about if you pursue this approach.  I’ll say that people are more familiar with the tf*idf approach, but personally, I think that relative frequency is a lot more intuitively comprehensible.
Zipf

What makes something interesting? The biomedical language version

What makes any domain an interesting one from the perspective of computational linguistics?

I pay the rent by researching the issues involved in getting computers to understand biomedical language–for example, the language of scientific journal articles, or the language of health records.  I’m in the midst of writing a chapter about this topic for a handbook of computational linguistics.  The audience is people who are interested in computational linguistics, but don’t have any experience with the biomedical domain.  If you’re a reader of this blog, that’s probably not a bad description of you.  So, it would be super-helpful to me to have your critique of my introduction.  I’m looking for anything that isn’t clear, anything that makes it difficult to understand my prose–anything that you think could be improved.  My grandmother will tell me how wonderful it is, so just feel free to plow into me with both fists–seriously, you’d be surprised at how much pain you can take in your old age.  


What makes the biomedical domain an interesting one from the perspective of computational linguistics?  Indeed, what makes any domain an interesting one from the perspective of computational linguistics?  In fact, Roger Shuy has asserted that the notion of any specific kind of data defining a particular area of linguistics is unsupportable.  As he puts it: “There is little reason for the data on which a linguist works to have the right to name that work” (Shuy 2002)[1].

Shuy’s statement is surprising, since he himself is North America’s leading forensic linguist—a linguist whose career has been defined entirely by his excellent work on language as it appears in the legal system.  And, indeed, many computational linguists describe themselves as doing biomedical natural language processing[2].

So, why study computational linguistics in the biomedical domain?  One can identify at least three primary types of reasons: theoretical, practical, and use-case-oriented.

Theoretical aspects of biomedical language

Biomedical languages are of interest to computational linguistics for two reasons: their relevance to questions about the nature and limits of grammar, and the light that they can shed on issues of reproducibility in natural language processing.

Biomedical languages and grammaticality

Biomedical languages are of interest from the perspective of computational linguistics in part because they stretch the limits of what can possibly be grammatical in a natural language.  Since the second half of the 20th century, much of linguistic argumentation has focused on grammaticality, which at a first approximation we can define as the question of whether or not an utterance is within the boundaries of some language (Partee et al. 2012).  Early in the second half of the 20th century, utterances that came under discussion in linguistic debates tended to be either quite ordinary (such as the famous John loves Mary (Fowle 1850)[3]), or interestingly ambiguous—sentences like John loves his wife, and so does Tom (Duží 2012)[4] whose grammaticality (as opposed to their interpretations) was mostly not in question.  Although the discourse of that period of linguistic inquiry—particularly with respect to the development of syntactic theory—was often couched in terms of defining—and constraining—some set of sentences (“strings”), in practice it tended to be more about operations on (and to a much lesser extent, interpretation of) those strings.

This changed in the 1970s and 1980s with the emergence of a research community that explored sublanguages: language associated with a particular genre and a particular kind of interlocutor[5].  Harris (1976) laid out a number of the principles of the sublanguage approach: semantics was embraced, not pushed off to some later date[6].  Although not always formalized as such, lexical preferences and statistical tendencies were taken advantage of (unusual in the era of a linguistics that had a complicated relationship with the lexicon and famously open disdain for statistics (Harris 1995)[7]).  As Grishman (2001) explains, sublanguages were interesting for at least two reasons: they seemed amenable to syntactic description by reducing complex syntactic structures to simpler ones, reminiscent of the transformational analyses that were becoming dominant in linguistics, and they held the promise of mapping to a tractable model of the world, or semantics[8]—something that had largely eluded linguistics up to that point[9].

The biomedical domain seemed like a fruitful area of research to the early investigators of the topic, and it was.  Scientific journal articles were one such genre, with the interlocutors being researchers; clinical documents provided another, with the interlocutors being physicians.  Harris et al. (1989) provided an in-depth description of the language of scientific publications about immunology[10].  It set a standard for sublanguage research on biomedical languages that would remain unparalleled for years.  The usefulness of the sublanguage model can be seen in the fact that researchers continue to find it fruitful (some prominent examples in the biomedical domain are reviewed in Demner-Fushman et al. 2009)[11].  Some examples that illustrate particularly well the use of the sublanguage model for semantic representation include Dolbey (2009) in the molecular biology domain[12] and Deléger et al. (2017), which also includes a review of the basic issues and of other approaches to resolving them[13].  Clinical sublanguages soon turned out to be full of data that was ungrammatical on any standard treatment of syntax (see Table 1 for some examples), making it clear that they were good areas for investigating the limits of grammaticality at a time when grammaticality was generally considered a binary characteristic of language with strict semantic constraints.

Chest shows evidence of metastatic disease.
Examination shows the same findings.
x-rays of spine showed extreme arthritic change.
Urinalysis shows 1% proteinuria.
Brain scan shows midline lesion.

Table 1: Examples of ungrammatical sentences from radiology reports.  In English, the verb to show is usually thought of as requiring a sentient subject.  In these sentences, we see a wide range of non-sentient subjects: an anatomical organ (chest), an event (examination), x-ray films (x-rays of spine), a laboratory test (urinalysis), and the output of a computed tomography exam (brain scan).  All of the sentences have “generic” noun phrases where they would normally require an article or demonstrative (chest, examination, x-rays of spine, and brain scan).  Source: Hirschman (1986)[14].  No human subjects approval or HIPAA training is required for use of these examples.

[1] Shuy, Roger. Linguistic battles in trademark disputes. Springer, 2002.

[2] The Association for Computational Linguistics Special Interest Group on Biomedical Natural Language Processing has over 100 members at the time of writing.

[3] Fowle, William B. (1850) “English Grammar: Goold Brown.” Common School Journal, pp. 245-249.

[4] Duží, Marie (2012) “Extensional logic of hyperintentions.”  In Düsterhöft, Antje, Meike Klettke, and Klaus-Dieter Schewe, eds. Conceptual Modelling and Its Theoretical Foundations: Essays Dedicated to Bernhard Thalheim on the Occasion of His 60th Birthday. Vol. 7260. Springer Science & Business Media, 2012.

[5] See Chapter 18, Sublanguages and controlled languages, this volume.

[6] Harris, Zellig. “On a theory of language.” The Journal of Philosophy 73.10 (1976): 253-276.

[7] Harris, Randy Allen. The linguistics wars. Oxford University Press, 1995.

[8] Grishman, Ralph. “Adaptive information extraction and sublanguage analysis.” Proc. of IJCAI 2001. 2001.

[9] Harris, Randy Allen. The linguistics wars. Oxford University Press, 1995.

[10] Harris, Z., Gottfried, M., Ryckman, T., Daladier, A., & Mattick, P. (2012). The form of information in science: analysis of an immunology sublanguage (Vol. 104). Springer Science & Business Media.

[11] Demner-Fushman, Dina, Wendy W. Chapman, and Clement J. McDonald. “What can natural language processing do for clinical decision support?.” Journal of biomedical informatics 42.5 (2009): 760-772.

[12] Dolbey, Andrew. “BioFrameNet: a FrameNet extension to the domain of molecular biology.” (2009).

[13] Deléger, Louise, Leonardo Campillos, Anne-Laure Ligozat, and Aurélie Névéol. “Design of an extensive information representation scheme for clinical narratives.” Journal of biomedical semantics 8, no. 1 (2017): 37.

[14] Hirschman, Lynette. “Discovering sublanguage structures.” Analyzing Language in Restricted Domains: Sublanguage Description and Processing (1986): 211-234.


Harsh critiques in the Comments section below, please!

A time and a place for everything

When to correct the other guy’s grammar–and when not to.

One evening I was riding the métro home, minding my own business, when a very, very drunk man got on.  He was carrying an open bottle of some sort of hard liquor, and occasionally took a swig.  (This and other obscure vocabulary items discussed in the English notes below.)  He was so plastered that he could barely stay on his seat as the train swerved.  He ranted incoherently–really incoherently.  (After he left, I asked the guy next to me: Pardon me sir, was he speaking French?  (If it’s in italics, it happened in French.) He gave me that look that people in Paris (and New York) give you when approached by a stranger before deciding that you’re OK, and then said: Of a sort.)

A young woman got on the train and took a seat.  She had her phone to her ear, and was talking.  The drunk, ranting guy leaned over, put his fingers to his lip, and said: Shhhhhh.

Bizarre, hein?  No–in a Parisian context, this actually wasn’t surprising at all.  The general French approach to politeness is: don’t do anything that would inconvenience the other person.  A very noticeable way that this works out is that in general, the French tend to communicate more quietly than Americans do.  Indeed, the first thing that I notice when I get off the plane in the US is how loud everyone is–I clear Customs, go sit in the United club, and find myself listening to the cell phone conversations of every random stranger within earshot. In Paris, if you see someone talking on the phone on the train, the chances are excellent that they’re not French–it’s just not done.  So, it wasn’t that bizarre for a shitfaced lunatic to interrupt his raving to say shhhhh to someone talking on the phone on the métro—he might have been hammered, but she was being rude.  In America, someone would have said some equivalent of “it’s a free country, she can talk on the phone if she wants to.”  People did hush him up when he got too carried away, but no one criticized him for saying shhhhh to the girl on the phone–that’s just logical, quoi...

For an extended discussion of the “don’t inconvenience the other guy” principle in French culture, see Raymonde Carroll’s Cultural misunderstandings: The French-American experience, or the original French version, Évidences invisibles: Américains et Français au quotidien.  Carroll’s book is the uber-citation on American/French cultural differences.


I thought about the drunk guy on the train and his shhhhh just now when I stepped out on my balcony (I have the good luck to have an apartment on the étage noble) for a cigarette–and overheard a delivery guy in the street below speaking on his phone.  Avant qu’elle apparaisse, he said–before she appears.  Even though I’m in France, where correcting other people’s grammar is just part of daily intercourse, I suppressed the urge to yell avant qu’elle N‘apparaisse–who doesn’t hate to see a good opportunity to use the ne explétif go to waste?–on the theory that this guy’s day was already going poorly enough without the shame of having some random foreigner fuck with his langue de Molière.  A time and a place for everything.

For the meaning of étage noble and the significance of what floor you live on, see this post on Parisian apartment buildings.  English notes below.


English notes

swig: the amount drunk at one time; a gulp.  (Merriam-Webster)  Some examples from the English Preposition Corpus, courtesy of Sketch Engine, purveyor of fine linguistic data sources and search engines therefor (note the lack of an E at the end–therefor is a different word from therefore):

  • I scowled into the night, took a swig of my beer and dumped the rest over the side of the deck .
  • I picked up the bottle beside me and took another long swig.
  •  If, after a stiff swig of nectar, we were to watch further developments, we’d find that in another 100,000 years or so, or even longer, exactly the same thing would happen again, and the compass would swing back suddenly to its original position.

How I used it in the post: He was carrying an open bottle of some sort of hard liquor, and occasionally took a swig.

plastered: slang for drunk.  Some examples from the enTenTen corpus (just under 20 billion words of English scraped from the Web), again courtesy of Sketch Engine:

  • Once Dolly and I got really plastered together.
  • An hour or so later, the Englishman is really plastered. 
  • Jonathan is so ugly; I could only have sex with that double bagger if I was really plastered
  • And by “former glory,” of course, we mean “a time when college-aged people used beer pong as an excuse to get so plastered they sometimes made sexual overtures toward bar stools.”
  • Only to realise the switch happens yet again and you’re there staring at the mouth of Gingy the Gingerbread Man (Midgett in a triplicate role with Sugar Plum) so plastered on that baking sheet like an angry drunk.

How I used it in the post: He was so plastered that he could barely stay on his seat as the train swerved. 

shitfaced: also slang for drunk.  Don’t use this one in front of my grandmother.

  • The night ended with Patty directing my drunk ass to grab the mattress and set up the bed while I was completely stumbling and shitfaced.
  • Let me get this straight — this stuff supposedly gives you more energy … so you can stay out later, drink more and get more thoroughly shitfaced?
  • The end of the week and I’m tired, over-worked and really just in need of deep sleep so I can get to work the next day with a fresh brain that can fire on all six creative cylinders but I opt to get shitfaced on free beer instead. 
  • In the Black Forest they celebrate by getting shitfaced, setting fire to 800-lb straw-packed oak wheels, rolling them down mountainsides into sleepy villages and making bets on the fates of the panicked peasantry as they flee in terror.

How I used it in the post: It wasn’t that bizarre for a shitfaced lunatic to interrupt his raving to say shhhhh to someone talking on the phone on the métro—he might have been hammered, but she was being rude. 

hammered: …and, once again, slang for drunk.

  • By the time we got to the dessert, I was, to put it delicately, hammered , as you can see from the picture above.
  • Made me want to check out more, especially as I was so hammered that I was in danger of keeling over, and consequently remember very little, other than that it was good.
  • I think the only way I’ll ever feel the urge to try that is if I’m already so hammered that it seems like a really good idea.

How I used it in the post: It wasn’t that bizarre for a shitfaced lunatic to interrupt his raving to say shhhhh to someone talking on the phone on the métro—he might have been hammered, but she was being rude. 


Conflict of interest statement: I don’t have one.  Sketch Engine doesn’t pay me to shill their stuff–I pay them to use it.

Cursing incoherently

I’m sitting at the breakfast table one beautiful spring morning when I start cursing in some incoherent mixture of French and English.

I’m sitting at the breakfast table one beautiful spring morning when I start cursing in some incoherent mixture of French and English: fuck!  Mais c’est pas possible !  Bordel de cul ! No!!!  What had happened: I was reading a comic book, and the ending touched me, deeply.  A comic book.  A COMIC BOOK.  I read Céline, and he mostly makes me laugh; I read Jean Genet, and he makes me laugh even more; reading Les liaisons dangereuses, I often shut the book just to let the beauty of a sentence that I had just read sink in.  But, what led me to break out in inarticulate multilingual shouts of rage and sadness was a comic book.  A fucking COMIC BOOK.


I’m hanging out in a bookstore not far from my little deux-pièces (a two-room apartment, very common in Paris).  I’m browsing through a book, and all of a sudden I have to put it down and dash to a quiet, hidden corner of the store, where I burst into sobs.  (For context: I am an American male in his 50s, and American men of my generation do not, not, not cry.)  What caused this sudden storm of emotion: a comic book.  A comic book.  A fucking COMIC BOOK.


Comic books–les bandes dessinées–are considered literature in France, like any other high-brow written form.  It’s not unusual to see men and women in business suits or stereotypically academic clothes (which is to say, blue jeans and a backpack full of journal articles on math or literature) reading one on the train on the way to work in the morning, and comic books can get literary prizes just like anything else.  The series that had me screaming over my breakfast was this one by Peru and Cholet:

Zombies, Tome 1 : La divine comédie


My Uncle John immigrated to the US from the UK as a young man and promptly joined the Army, which sent him to Korea.  Before he died, some oral history project sent someone to interview him about the experience, and we learned things that he had never, ever talked about, like the time that he had to pile up a couple of bodies of his dead pals so that he could shelter behind them while he shot at the North Koreans (or Chinese, or whoever it was that was actually behind the triggers on the other side).  When I was a little tike, he made me solemnly swear to never read a comic book.  I still feel a little guilty every time I pick one up–I feel exempt from fulfilling that particular oath, since I made it as a small child, but as an adult, I take promises super-seriously, and rarely make them.  Hopefully, the quality–and power–of this particular one takes it out of the realm of the kinds of comic books that Uncle John was talking about.  Yes: I was moved to rage and sadness by a comic book.  A comic book.  A fucking COMIC BOOK.


To my surprise, I notice that this is the 500th post on the Zipf’s Law blog.  It’s super-amazing to me that this thing that started out as a way to publicize information about the judo clubs of Paris, and then evolved into a way to keep my family and friends up to date on Parisian adventures that were too long for Facebook posts, has become something else entirely, with, as of today, more than 45,000 page views and just under 28,000 visits.  I thank Ellen Epstein for suggesting the blog in the first place, and all of you who comment on the posts–you give me the positive feedback of knowing that someone out there listens to what I say, and the helpful guidance of pointing out my errors in French, explaining French history and culture to me, and the like.  Even beyond the relief of getting the shit that grouille dans ma tête (swarms around in my head) out of it and “on the page,” you folks who leave comments make this an enriching experience for me.  Thank you again.

English notes:

tike/tyke: a small child.  “When I was a little tike” is a common way of introducing something that you’re going to say about your early childhood.

French notes:

la bande dessinée : comic book, graphic novel.

How to abandon ship

The most important thing is to look before you leap: you have to expect the water to be full of debris, as well as your shipmates, and you don’t want to land on either of them.

May 19th, 2018

United States

Dear Zipf,

Chlöé says that she and her uncle both passed the highest ARC water-safety tests, but that her uncle, who got his cert a generation earlier, had to learn to jump into the water from destroyer-height, wearing a Mae West, without having the vest break his neck on hitting the water.
She wondered whether you’d learned to do this, and if so, how.
Reynaud
March 20, 2018
Zurich
Dear Reynaud,

Yep, sure did. The most important thing is to look before you leap: you have to expect the water to be full of debris, as well as your shipmates, and you don’t want to land on either of them. The vest thing makes perfect sense, but I don’t remember what to do about it–the old kapok vests have a high collar, which is meant to keep your face out of the water if you lose consciousness, and indeed, if forced straight upwards hard enough, it could probably take out your cervical spine. What I do remember how to do is that when you jump, you hold your balls.  And, no: I’m not kidding about your balls.  The idea is to avoid them getting racked up when you hit the water.  Today there are women on board ship, but I don’t know what they’re told to do.  You’re also taught to use a hat, your shirt, or your pants as flotation devices.  That last one is effective, but fucking HARD to do–I got worn out the first time I tried, and had to do it again to pass the test.

The basic thing once you’re safely off of the vessel is to get as far from the ship as possible, as quickly as possible: you don’t want to get sucked down when it sinks, and depending on how deep it is when (if) the engines explode, you could get injured by the shock.

The thing that they didn’t have us practice is swimming with burning oil on the surface.  They told us that at night, the burning oil lights up the water underneath it, so you look for a shaft of darkness, swim up to the surface through it, take a breath, and then submerge again to find your way away from the oil.

Zipf

Here’s a video showing how to use your pants as a flotation device.  This is actually better than what they were teaching when I was a squid (slang for “sailor”), in that we were taught to tie each pants leg individually, which is a hell of a lot harder than what this guy does: tie them together.  Note that this guy is using a floating technique, so he’s not expending very much energy while he prepares his pants–we just treaded water, which is exhausting when you can’t use your arms to help ’cause they’re occupied trying to get your pant legs tied and the @#$% things inflated.
My ship, the USS Biddle. It’s a cruiser, formerly called a destroyer escort–bigger than a destroyer, but smaller than a lot of other things. Picture source: https://www.helis.com/database/unit/1068_USS_Biddle/  Hey, guess who didn’t serve?  Donald Trump–multiple draft deferments for college, and then a claim of a bone spur in his foot.  Snowflake.

English notes

Royal Air Force pilot wearing an inflated “Mae West” flotation device. Note how it comes behind the pilot’s neck–that is meant to keep his head out of the water if he’s unconscious. Picture source: http://www.alamy.com/stock-photo/mae-west.html

  • ARC: American Red Cross.

  • cert: certification.
  • destroyer: a small ship, mostly used to screen big ships from submarines and aircraft.
  • Mae West: a kind of life vest.  It’s named after Mae West, a film star of the era known for playing super-sexy roles.
Mae West showing how you get a life vest named after yourself. Picture source: http://www.selenie.fr/2014/04/mae-west-la-sandaleuse-de-hollywood.html
French notes
This vocabulary comes up in Jean Genet’s lyrical Le miracle de la rose, in the occasional flights of fancy about shipboard promiscuity.
la frégate: frigate.
le destroyer or le contre-torpilleur: destroyer.
le croiseur: cruiser.