Ukraine Notebook: Abandon ship protocol

On January 6th, 2023, two British volunteers, Andrew Bagshaw and Christopher Parry, went missing near Soledar in the Donetsk region of Ukraine. As I write this, in late January 2023, there have been no verified sightings of them, and they are presumed dead or captured. Their vehicle was found soon after their disappearance–locked. Presumably they had to abandon it and head out on foot. You should prepare and practice for this situation. The protocol that I am suggesting for you here is based on the US Navy’s procedure for abandoning a sinking ship. Air crew members: if you can add something, please tell us about it in the comments.

  1. Count to ensure that everyone is present or accounted for.
  2. Check that all survival equipment is on someone’s back or in their pockets: BOB, medical bag, communication tools, escape route maps, water and food.
  3. Destroy all sensitive information.
  4. Consider destroying or disabling the vehicle.
  5. Notify someone.

Everyone present or accounted for: “everyone” means all team members and any passengers. Passengers are most often civilians who are being evacuated from the front (with attendant communication problems related to lack of a shared language), but may also be journalists (who can present their own set of challenges). If you must leave behind bodies, note their location. Take their passport, wallet, telephone, and any other personal effects or useful items (for example, individual first aid kit (IFAK), helmet, body armor, water). Consider leaving identification of some kind with them, or writing identifying information (including nationality so that the appropriate embassy can be contacted) on their clothes or body. Also consider taking a lock of their hair, some bloody clothing, or a cheek swab for later DNA matching. (Back in the day, American medics took fingerprints of bodies that would be buried overseas. See below for a link to an article on digital collection of fingerprints–no pun intended…)

Bug-out bag (“BOB”): this is an easily carryable container, typically a small- or medium-sized backpack, containing everything that you would want to have with you if you had to abandon your home (or office, or vehicle) in an emergency. You can find plenty of advice elsewhere on what a BOB should contain. Customize it for your operating environment, and update it as that changes, as the weather changes, as your crew size changes, etc.

Medical bag: this contains more things than one would have in an individual first aid kit (IFAK, or аптечка). Ours have the usual things, plus a splint, extra bandaging materials, and a radiation monitor.

Destroy all sensitive information: Destroy information on evacuees you have picked up or were on the way to pick up. Delete from your phone/destroy paper maps that show your safehouse, checkpoints, military units, humanitarian centers, routes… You should have left for your day’s mission with a separate map that shows the area that you would have to walk through to reach safety, and nothing else.

Destroying or disabling the vehicle: Even a vehicle that no longer runs is a valuable source of spare parts for the enemy in a war where logistics has been a/the major struggle for both sides. It also might contain sensitive information that you missed when you left it. I don’t know shit (to “not know shit” means to not know anything at all) about destroying vehicles–if you do, dear reader, please tell us about it in the Comments section. (In the Navy, you place explosive charges in relevant places so that the ship is guaranteed to go down.)

Notify someone: Tell them who is with you, where you are, and where you are heading.

The British reporter Tom Mutch knew one of the two missing British volunteers. He describes them as brave guys doing life-saving work–and as under-equipped and ill-prepared for working in a combat zone. You do not have to be ill-prepared. Practice this protocol before you need it. Andrew Bagshaw and Christopher Parry: I hope we see you again some day.

Want to support my work as a medic in Ukraine? If you know me, you can send money to me via PayPal using my email address. $2.99 will buy a pack of gauze, $20 will buy a pair of tactical glasses, $30 will buy a tourniquet, $125 will buy an individual first aid kit, $400 will buy a medical bag, $800 will buy a set of body armor. If you don’t know me, you can donate through any of these organizations. I work with all of them, and they’re all quite good.

Bryan T. Johnson & John A. J. M. Riemen (2019) Digital capture of fingerprints in a disaster victim identification setting: a review and case study, Forensic Sciences Research, 4:4, 293-302, DOI: 10.1080/20961790.2018.1521327

Picture at top of page: van stuck in the mud in the middle of an artillery duel, east side of Bakhmut. Photo by Ori Aviram.

The two books about writing that every grad student should read

Read these two books and graduate school will be much easier for you.

In my previous life, I was a pretty good linguist, although I was also a pretty crappy person. My name was John Peabody Harrington. In addition to being a crappy person (see here for details), I was also a crappy poet. This is because I avoided reading other poets, on the theory that doing so would contaminate my own innate style.

That was stupid of me. In fact, to become a good writer, the single most helpful thing that you can do is to read other writers. Read enough, and you will not only recognize the good ones and be able to take from them what works well, but you will also recognize bad ones, and can try to avoid doing what they do.

The second most important thing you can do: practice. Practice, practice, practice, and practice. In fact, this blog often serves as a place for me to find the rough spots in my writing techniques. For example, I think I’m OK at writing beginnings, but I suck at endings.

The third most important thing you can do: get feedback from other people. The most direct way to do this is to ask them to read, and to comment critically on, your stuff. The indirect way to do this is to read good discussions of writing by people who have done a lot of it. Having read a lot of that kind of stuff, I am going to suggest to you my favorites: two books that every grad student should read. I admire them so much that I keep multiple copies of both of them in my office, and if a student approaches me about doing a project in my group, I hand them a copy of each; part of the deal for me saying “yes” is that they commit to reading them. Ready? Here goes.

  1. They Say/I Say: a hugely popular guide to “argumentative writing,” now in its 5th edition for very good reason. Authors: Gerald Graff and Cathy Birkenstein.

“Argumentative writing” is writing that tries to convince someone of something. In scientific writing, you are trying to convince your reader that (1) your topic is worth the trouble, (2) your data is appropriate for exploring it, (3) your methods could answer the question that your paper is asking, and (4) your results mean what you think they mean.

  2. How to complete and survive a doctoral dissertation: this book is probably older than you are, and some of its technical aspects are charmingly out of date, but much of its advice is timeless: the pluses and minuses of choosing a dissertation advisor who is an internationally-known scholar, versus going with a freshly-minted PhD; how to demonstrate the novelty of your work even in what you believe to be the worst of circumstances; the worst thing you could possibly do in the midst of writing your dissertation; whose responsibility it actually is to get you through graduate school (spoiler alert: yours)… Author: David Sternberg.

Remember what I said about sucking at endings? This is what me sucking at an ending looks like. Read the books, and if you’re too poor to buy them, do a rotation in my lab–I hand them out like candy.

Picture source: the Hyperrhiz 21 blog.

Ukraine Notebook: What medications to bring when you volunteer

Bring these medications to Ukraine when you come to volunteer and your contribution will be even bigger.

  1. Moxifloxacin is what an American combat medic will give you if you have a penetrating wound. I have no idea how to find it in Ukraine, but your doctor can give you a prescription for it. I was very, very happy to have some with me here when a frightened cat sunk a fang very, very deep into my arm. (I was also very, very happy to have clear ballistic glasses with me when I was trying to get her out from under a bathtub while she was trying to scratch me to death, but that’s a topic for a post about ballistic glasses, right? Fang explained in the English notes at the end of the post.)
  2. Do not bring aspirin or ibuprofen. US Department of Defense guidelines say not to take them for a week before entering a war zone. And, yes: since Putin deliberately strikes civilian targets of no military value, all of Ukraine is a war zone.
  3. Meloxicam is what an American combat medic will give you for any battlefield injury. See above regarding the situation in Ukraine.
  4. Acetaminophen (sold in the US as Tylenol or in generic form) is the third thing that an American medic will give you if you are injured.
  5. The antidiarrheal medication of your choice. You shouldn’t travel ANYWHERE without this anyway.
  6. All medications that you normally take. Bring more than you think you will need. All problems in Ukraine are supply chain problems, so do not assume that you will be able to buy ANYTHING wherever it is that you happen to find yourself. Yes, I do understand that it is difficult to get more than your allotted quantity of prescription medications in the US, since your insurance company rations your health care.

Want to help the situation in Ukraine? Base UA/База ЮА is an excellent organization doing evacuation of civilians from the front lines (and a bunch of other stuff). I vouch for them completely. Send PayPal contributions to, and please mention that you found us through the Zipf’s Law blog.

English notes

fang: “A fang is a long, pointed tooth.” (Wikipedia) Fang often occurs with the verbs to bare and to sink into. Examples from Sketch Engine, purveyor of fine linguistic corpora and tools for searching them:

  1. He sank his fangs into her shoulder.
  2. Spike longed to sink his fangs into Xander’s hot flesh.
  3. “Then you deserve this,” he said as he sunk his fangs into the man’s throat and drank hungrily.
  4. Sink those fangs into one of our mini milk chocolate caskets.
  5. How I used it in the post: A frightened cat sunk a fang very, very deep into my arm.

To bare means to uncover completely:

  1. He raised his lips, baring his fangs.
  2. His ears lay back and his fangs were bared.
  3. But a beaten dog will bare its fangs eventually.

Ukraine Notebook: What clothes should I bring?

Volunteering in Ukraine? Choose your clothes carefully and your life will go more smoothly.

No English notes for this post, sorry… For those of you who read this blog to learn new English-language vocabulary, you will find links to the definitions of the words that I would have talked about if I had the time.

“Hot extractions:” rescuing civilians from the front line. This is a common job for foreign volunteers in Ukraine.

  1. Don’t bring cammies unless you are pretty sure you will be joining a military unit. There is currently a regulation against wearing camouflage unless you are officially associated with the military. Also, if you are doing hot extractions, it scares the civilians who you most want to reassure.
  2. Do bring obviously American t-shirts, a ballcap with an American flag or similar insignia, and whatever else you wear that marks you as obviously American. It is a big morale boost for the locals, and they will be very nice to you. (Note that many men wear a ballcap and a “tactical beard” here, so don’t rely on those alone to communicate to people that you are an American.)
  3. Don’t bring 100% cotton t-shirts. You will be hanging up your clothes to dry, and 100% cotton t-shirts take way too long to dry.
  4. Do bring kneepads. Whether you’re fighting, doing hot extractions, or teaching TCCC, you’re going to spend a lot of time kneeling on the ground. Elbow pads might be less necessary–depends on what you’re doing.
  5. Do bring shower shoes–not for showering, but to wear in your living quarters. Ukrainians tend to be maniacal about keeping floors clean, and they don’t wear outdoor shoes inside except in public places. I have even seen guys wearing shower shoes in military headquarters! One US military veteran doing hot extractions here with me commented that he had never seen more Crocs in his life than in Ukraine–this is why. Shower shoes can be bought in Ukraine, but not necessarily in the areas where there’s fighting (most stores are closed there).
  6. Do bring hearing protection. Disposable ear plugs will probably suffice. You might not wear them in the field, but if you find yourself on a firing range without them, you’ll wish you’d found room for a couple pairs…
  7. Do bring a Camelbak or canteen. Bottles of water are currently easy to find except in the areas with the most active fighting, but they are hard to carry. Personally, I have never used my Camelbak because (a) I feel guilty having it when none of my Ukrainian buddies do, and (b) a lot of the time, there isn’t enough drinking water available to fill one anyway. But, my canteen does travel with me–it’s wearable, and small enough that I can usually fill it.
  8. Fireproof/fire-resistant clothes are always a good idea… That said, I don’t know how to buy them without spending phenomenally large amounts of money, and I have exactly no fireproof clothing whatsoever… If you have some insight into this, please tell us about it in the Comments section.
  9. Heavy gloves, tactical or otherwise. No matter what your job is, you’re likely to be moving large amounts of humanitarian aid, and if you are doing hot extractions, there will be rubble everywhere. Hand injuries are actually the thing that I have treated the most here–avoid them if you can.
  10. Do bring tactical pants. You will want to carry some things on your person at all times, such as your passport. My 5.11 Defender jeans have held up OK here, but my tactical pants are definitely more practical.
  11. Do bring shirts that you can wear comfortably under body armor. That means a pullover shirt with a smooth front and no pockets. Rationale: things on the front of your shirt can get pretty uncomfortable when your armor has been pressing on them for a few hours. Probably obvious to you younger kids, but definitely not to us old guys who grew up wearing Vietnam-era jungle fatigues with those four big, baggy pockets on the front.
  12. Do invest in some merino wool clothing. See above regarding how long it takes for 100% cotton garments to dry here…
  13. Do bring clear ballistic glasses. Everyone knows to bring ballistic sunglasses, but clear glasses are crucial in unlit buildings (there is highly unlikely to be electricity in buildings from which people need to be evacuated) and at night. Ophthalmological surgery materials are in short supply here… Much wiser to just wear your fucking eye protection.
  14. Do bring your heavy-duty belt. We might need to drag you to safety by it… If you have a shooting belt with MOLLE attachments, it wouldn’t be a terrible idea to bring it–who ever has enough space on their plate carrier to clip gloves, ballistic glasses, a Leatherman, a magazine pouch full of snacks and cigarettes…?
  15. Do bring full kit! Be careful about export restrictions, e.g. for night vision goggles. The Polish Customs officials are absolute assholes–they even confiscated a buddy’s IFAK (tactical first aid kit).
  16. Do bring patches. People love to know that Americans support Ukraine, and of course soldiers will want to trade them.
  17. Do bring a shemagh if you normally wear one. I often give mine away as a present, and then I miss it until I can find another one… If you don’t know what a shemagh is: you don’t need to bring one.
  18. Do avoid bulk whenever possible. Depending on what your job is, you might spend a lot of time getting in and out of tight vehicles, spaces, etc.
  19. Do bring a poncho/poncho liner.
  20. Do inform yourself about the typical weather during the period when you will be in-country, but don’t assume that you know where you’ll be. Life is unpredictable here, and many foreign volunteers find themselves in multiple regions of the country even during a relatively short stay.

Ukraine: The Russian army against a bunch of grandmothers

This feels less to me like a war of one country against another than of the Russian army versus a bunch of defenseless grandmothers…

This message from a friend who is volunteering as a medic in Ukraine showed up in my Inbox the other day…

Yes, I’ve been in Ukraine since the beginning of May. I’m a medic, as I was in the Navy. You asked about my safety. “Safe” is a very relative concept… When I’m not in the field, my morning routine consists of making a cup of coffee, grabbing a pack of cigarettes, and then sitting on the balcony to watch the morning rocket attack on my entirely residential neighborhood. I say “morning rocket attack” because there has been one almost every day since I got here. Probably sounds horrifying, but since the only people carrying guns around here are soldiers, I actually feel safer in the city than I do in the US.

When I’m in the field, it’s a different story. I mostly do evacuations of civilians from the front. The Russians enthusiastically shell refugee collection points and clearly marked emergency vehicles, and evacuating civilians from the front means going to refugee collection points in clearly marked emergency vehicles. As it happens, I have a relatively high tolerance for danger, so although it’s certainly not “safe,” that’s fine. What’s not OK is that because the Russians hit those places and vehicles so hard, and by this point a large proportion of the people who have not yet left the front are old folks, this feels less to me like a war of one country against another than of the Russian army versus a bunch of defenseless grandmothers…

English notes

in the field: In a military context, this means being out doing whatever it is that you do. Examples:

  • Your service member is headed out into the field and it looks like the entire military gear issuing office is located in your living room. No matter what their training mission might be, they will want to prepare and pack a few things for the field that will make things a little bit easier while they are “work camping.” These are things that have been suggested by actual service members who have been in the field for countless hours, days, weeks, and even months. Source: 35 things every service member needs for the field, from the Daily Mom web site.
  • A good razor with a shave gel that protects throughout the day is key for a service member who is shaving in the field. Source: the Daily Mom web site.


The picture at the top of this post shows old women sheltering in the basement of the refugee collection point in Lysychansk, in the Donbas region, during an artillery strike. My friend spent the attack outside, listening to debris bounce off of his helmet with every hit. The Russians eventually flattened the collection point.
The picture is a screen shot from this video by documentary filmmaker Marharyta Kurbanova, a member of my friend’s team.


I received this photo and the accompanying note from a friend who has been evacuating civilians in the Donbas region of Ukraine.

I’m not much of a picture-taker— pulling out a camera here feels sordid, and I’m not sure that I would want to remember anything about this experience anyway. But, this struck me. In a small town looking for a place to pee—there’s been no running water here for a long time, so it’s better to do your business outside—I came across a bunch of rusting barbed wire with vines grown around it. I saw those vines as a material reflection of just how many years the people in the region have been under attack: not since February 24th, but for eight years now.

It’s one thing to do something bad—we all fuck up sometimes. It’s another thing altogether to PERSIST in doing something bad. Eight years—that’s a lot of persistence in doing something bad.

Putin has to be stopped.

Sent from my iPhone

Heading childhood anxiety off at the pass

Untreated anxiety in children is associated with all sorts of bad things later in life. The good news is, if you do treat it, it can usually clear up.

Untreated anxiety in children is associated with all sorts of bad things later in life–mood disorders, alcohol and drug abuse, suicidality, underachievement in school, and low earning potential. The good news is, if you do treat it, it can usually clear up.

It’s usually easier to prevent something than it is to treat it, so it would be great if we could predict which kids are likely to develop chronic problems with anxiety, and head it off at the pass. That might actually be plausible, since anxiety has a trajectory of development. As Strawn et al. put it, “…the adolescent with panic and generalized anxiety disorders was once a boy with separation anxiety disorder and…a toddler with extreme shyness…”

But, how would one do that prediction?

(If you’re mainly here to improve your English, you will find an explanation of head it off at the pass in the English notes at the end of this post.)

I am a happy practitioner of the write-about-what-you-don’t-know approach to scribbling. Right at this moment I am realizing that I don’t know very much about the development of anxiety in children and adolescents. So, I am reading the paper Research Review: Pediatric anxiety disorders–what have we learnt in the last 10 years?, by Jeffrey Strawn, John Walkup, and a bunch of other folks. It describes a number of risk factors for the development of a variety of anxiety disorders. The risk factors fall into categories of cognitive bias, behavioral tendencies, family environment, parental disorders, substance abuse, and environmental exposure.

When I’m trying to understand a new disease, I sometimes play this game: walk into a restaurant, look around, and pick out the person most likely to suffer from it. (At my advanced age, that is often me, but that’s another story.) For the kinds of risk factors that are related to pediatric and adolescent anxiety disorders, that’s not really an option, so instead, I’m trying something different: I’m writing about some kids who I would expect (based on my limited knowledge) to develop anxiety–or not. So, here’s what we’re gonna do today: we’ll look at my little vignettes, say for each one whether we think the kid is at high risk or low risk of developing an anxiety disorder, and then explain why. I have written these vignettes in the style of the notes that a physician (doctor) writes when they examine a patient. That’s why it sounds odd. (We’ll talk about some of those oddities in the English notes.) Ready? C’est parti.

John is a 5y3m-old male with an uneventful medical history. He is referred to the psychiatry clinic because of bursts of tears and screaming when dropped off at school, persisting into the third week of the school year. He is in his third foster home since his mother was hospitalized for anxiety and depression six months ago.

Reminder: this is not the story of a real child. I have made it up myself for educational purposes.

Likely outcome? John is at high risk of developing an anxiety disorder. His behavior at school–not wanting to be dropped off even after three weeks from the beginning of the academic year–is a kind of behavioral inhibition, and behavioral inhibition is a risk factor for later development of anxiety disorders. Childhood separation events are, too, and John has experienced these multiple times–first with the long and continuing hospitalization of his mother, and then due to repeated changes of foster care placements. Having a parent with an anxiety disorder or depression also increases a child’s risk of developing an anxiety disorder, and John’s mother has both of those–that’s why she’s been hospitalized for so long.

Mary is an 8-year-old healthy-appearing female. She is referred to the environmental health clinic after routine screening at her school suggested a high probability of lead and mercury exposure.

Mary lives with her paternal grandmother since her mother was imprisoned for sale of controlled substances and child endangerment and her father died in an automobile accident soon after. The grandmother seems quite controlling, but reports that she has changed household routines in response to Mary’s fear of sleeping alone.

Reminder: this is not the story of a real child. I have made it up myself for educational purposes.

Probable outcome? Mary is at high risk of developing an anxiety disorder. Exposure to environmental contaminants increases a child’s likelihood of developing subsequent problems with anxiety. Like John, she has experienced multiple separation events, with her mother being sent to prison and her father dying soon after. Over-controlling parenting, such as she is getting from her grandmother, also increases the risk of developing an anxiety disorder. “Family accommodation,” or changes made to group behavior in response to a child’s early anxiety symptoms, also increases the likelihood of developing an anxiety disorder, so despite the use of “but” in the example, this is not a good thing for this kid.

Harry is a cheerful, outgoing adolescent who presents in the Emergency Department with exquisite point tenderness in the right femoral area after a fall experienced while practicing his newly-discovered passion for rock-climbing. His father died in Iraq when Harry was six months old. His mother remarried two years later, and she reports that Harry has been close to his stepfather ever since. He denies alcohol, drug, or tobacco use.

Reminder: this is not the story of a real child. I have made it up myself for educational purposes.

What does the future hold? Harry is at low risk of developing an anxiety disorder. He clearly does not have a fear of trying new things, and is not shy. Although he lost his father, it happened so early in his life that he probably did not experience it as a separation event–recall that his father was deployed to Iraq–and he has always had a close relationship with his stepfather. The lack of alcohol, drug, or tobacco use is relevant in that these kinds of substance abuse (and I say that as someone who cheerfully enjoys fine American tobacco products) are often associated with anxiety disorders.

English notes

To head something off at the pass: to take action in order to prevent something from happening. This is a cowboy thing: a pass is a narrow path at a low point in a mountain range that lets you get through the mountains without having to climb them. You can prevent someone from getting somewhere that they’re trying to go if they have to go through a pass to get there–they’re narrow, and therefore easy to block. Some examples of the use of this expression:

  • Dilbert tries to head off criticism of Trump at the pass by defining it as coming from some “other side.” (Source: Twitter)
  • The apologists for Trump & Trumpies sure managed to tone down his rhetoric & head off his bigotry at the pass, didn’t they. (Source: Twitter)

How I used it in the post: It’s usually easier to prevent something than it is to treat it, so it would be great if we could predict which kids are likely to develop chronic problems with anxiety, and head it off at the pass.

Now let’s look at some of the odd aspects of medical English:

to refer to: in this sense, to refer someone to a treatment facility is to send them to that facility for consultation by a specialist. Your insurance company will usually require you to have a referral from your primary care provider for this kind of thing. How I used it in the post:

  • He is referred to the psychiatry clinic because of bursts of tears and screaming when dropped off at school
  • She is referred to the environmental health clinic after routine screening at her school

to present with something: this expression is used to describe a patient’s state when first meeting with a health care provider. How I used it in the post:

  • Harry is a cheerful, outgoing adolescent who presents in the Emergency Department with exquisite point tenderness in the right femoral area after a fall experienced while practicing his newly-discovered passion for rock-climbing.

Burstiness in language and the liberation of Paris

In which language displays interesting statistical properties, some people get fired, and I learn a few words about the Army.

Twenty-plus years ago, I got my first job as an actual, card-carrying linguist, working for a company that did things with big collections of linguistic data, using them to improve computer programs that did speech recognition, i.e. figuring out what words a person is saying.

One fine day the people that gave us the vast majority of our income sent their big-collection-of-linguistic-data specialist to visit us. We demonstrated to him the computer program that we had built to answer the question: how can you tell when a big collection of linguistic data is big enough? We pointed out how to spot the telltale sign on a graph that means “it’s big enough.” His response: “Oh, that just means that linguistic data is bursty.”

The blue line shows a big collection of linguistic data that is not nearly big enough. The other lines show big collections of linguistic data that are big enough. The telltale sign: a line that has gotten flat. Picture source: Irina Temnikova, Negacy Hailu, Galia Angelova, and K. Bretonnel Cohen. “Measuring closure properties of patent sublanguages.” In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, pp. 659-666. 2013.

What did he mean by “bursty?” We had a guess, but weren’t exactly sure, and given that his company paid us a lot of money and he was their expert, my boss thought it best not to push back. A few months later, they declined to renew our contract, and our owner laid everyone off and went away to do something else. Was it because we didn’t push back on the big-collection-of-linguistic-data expert’s dismissiveness? Probably not–our little company committed far bigger errors, and on a sadly regular basis. Whatever–the job market for computational linguists was not terrible in those days (it’s pretty wonderful now), and I found my second job as an actual, card-carrying linguist pretty quickly. But: burstiness is pretty important, and it continues to bump into my life today, in various and sundry ways, some of which will be of interest to readers of this blog.

What burstiness means: per Wikipedia,

In statistics, burstiness is the intermittent increases and decreases in activity or frequency of an event.

Wikipedia, Burstiness

In plain English: burstiness is present when something doesn’t happen for long periods of time, but then happens a lot, and then goes back to not happening very often. Some things that have this characteristic: hurricanes, and pandemics. Statisticians care about burstiness because bursty things are difficult to characterize with normal statistics, so you have to come up with new techniques to work with them; people like disaster planners and public health experts care about those statistics because it is difficult to predict, and therefore to plan for, things that have weird statistical properties.
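To make "bursty" a little more concrete, here is a minimal sketch in Python of one common way to put a number on it. This is not something from the story above: it is the burstiness coefficient of Goh and Barabási, B = (σ − μ) / (σ + μ), where μ and σ are the mean and standard deviation of the gaps between consecutive events. The function name and the event times are made up for illustration.

```python
from statistics import mean, pstdev


def burstiness(event_times):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu).

    mu and sigma are the mean and (population) standard deviation of
    the gaps between consecutive events. B is -1 for perfectly regular
    events and moves toward +1 as events become more bursty.
    """
    times = sorted(event_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three events")
    mu = mean(gaps)
    sigma = pstdev(gaps)
    if mu + sigma == 0:
        raise ValueError("all events happen at the same time")
    return (sigma - mu) / (sigma + mu)


# Hypothetical event times (think "day on which something happened").
regular = [0, 10, 20, 30, 40, 50]  # one event every 10 days
bursty = [0, 1, 2, 3, 100, 101, 102, 103, 200, 201, 202, 203]

print(burstiness(regular))  # evenly spaced events give -1.0
print(burstiness(bursty))   # clustered events give a value above 0
```

The evenly spaced series comes out at exactly -1, and the series with long quiet stretches punctuated by clusters comes out positive. Other measures exist; I picked this one only because it fits in a few lines.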

From a computational linguist’s perspective, burstiness is important because in big collections of language, you don’t see new words very often, but when you do, sometimes you see a lot of them at once. If you’re trying to do something like build a dictionary for a computer program, you typically do that by finding all of the words in a big collection of linguistic data. But, how do you know when your collection of linguistic data is big enough? See above; the problem is that if you keep growing the collection, you know that there will be bursts of new words, but you can’t keep growing your collection forever–at some point, you have to stop and work with what you have at hand.
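Here is a rough sketch in Python of the kind of curve involved: count how many distinct words you have seen as you read more and more text, and look at how many new ones each additional chunk brings in. This is my own toy illustration, not the program that our company built; the function names, the chunk size, and the tiny "corpus" are all invented for the example.

```python
def vocabulary_growth(tokens, step):
    """Return (tokens_read, distinct_words_seen) pairs, one per `step` tokens."""
    seen = set()
    curve = []
    for position, token in enumerate(tokens, start=1):
        seen.add(token.lower())
        if position % step == 0:
            curve.append((position, len(seen)))
    return curve


def new_words_per_step(curve):
    """Number of previously unseen words contributed by each later step."""
    return [after[1] - before[1] for before, after in zip(curve, curve[1:])]


# A made-up 20-word "corpus": familiar words, then a burst of new ones.
tokens = (
    "the cat sat on the mat the cat sat on the mat "
    "the spahi wore a calot the cat sat"
).split()

curve = vocabulary_growth(tokens, step=4)
print(curve)                      # [(4, 4), (8, 5), (12, 5), (16, 8), (20, 9)]
print(new_words_per_step(curve))  # [1, 0, 3, 1]
```

Between token 8 and token 12 the curve is flat (zero new words), which looks like "big enough." The very next chunk brings three new words at once. That is the trap: with bursty data, a flat stretch is evidence that you can stop, not proof.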

Many of our dear fellow readers are engaged in learning a language that they don’t already speak. I am one of them–if you have been reading this blog for a few years, you have followed my feeble attempts to learn la langue de Molière, also known as “French.” By now I know the language well enough that I can pick up a book in it and not have to turn to a dictionary very often. But, when I do, it typically happens like this…

Right at this moment, I’m reading Paris brûle-t-il ?, “Is Paris burning?,” the work of reference on the liberation of Paris. I typically get through about three pages before I have to look up a word. But, then this morning, I’m reading about the French 2nd Armored Division rolling from Normandy to Paris when I come across this sentence. I had to look up all of the words in bold face:

Glissant en silence sur leurs six roues de caoutchouc, les automitrailleuses des spahis à calots rouges, “chiens de chasse” de la division, ouvraient la marche.

Dominique Lapierre and Larry Collins, Paris brûle-t-il ?, published by Robert Laffont in 1964.
  • l’automitrailleuse: a light armored vehicle.
  • le spahi: native cavalry trooper of the Maghreb.
  • le calot: garrison cap in English; when I was in the Navy, we called them “cunt caps.” A calot has no brim or visor, and therefore can be folded flat and tucked under the epaulet of a military jacket.
A Russian garrison cap, or calot in French, pilotka in Russian.

After that, it was back to my normal rate: about one word every three pages. That certainly counts as “not very often,” and is pretty good for a non-native speaker. To then jump to three words in a single sentence, and then go back to my base rate of one word every three pages, is a good example of burstiness. Once again, we see why one might write a blog like this one–a blog about the statistical properties of language and their implications for people who are trying to learn one. What happened to the dismissive big-collections-of-linguistic-data expert? I don’t know for a fact, but I do know that people who are dismissive of the opinions of others don’t typically have much professional success. Personally, I took what I learned from the experience of working at a failed software start-up to do a better job of being a computational linguist, and have had a wonderfully fun time with it. Want to try a career in computational linguistics yourself? Start here if you are not a graduate student, or here if you are, and I hope you have as much fun with it as I have!

French notes:

Despite what its name would lead one to think, an automitrailleuse does not necessarily carry a machine gun. Here are some pictures of modern automitrailleuses. You’ll notice that some of them look a lot like tanks. The salient differences are that (1) they weigh less, and (2) they have wheels, not treads.

How we're sounding stupid today: On the propriety of examples

My first language is (American) English, but I speak French well enough that if I want French people to believe that I’m an American, I have to convince them of it. Comparing and contrasting French and American political appartenances helps, as does my ability to explain the difference between felonies and misdemeanors and how they affect the length of your prison sentence. Why it doesn’t occur to me to just speak English with them, I couldn’t tell you–I’ll have to try it some time…

My ability to speak French well doesn’t mean that I don’t make absolutely stupid mistakes, though. Case in point: propreté and propriété. One means “cleanliness” and one means “property,” but if I need to say either “cleanliness” or “property” in French, which of those two words propreté and propriété will come out of my mouth is pretty random. How random? I’d guess a 50/50 chance for either of them. So, how often do I say the right/wrong word? Let’s figure it out.

First, we have to make some assumptions. Assumption #1: the probability of me needing to say cleanliness and the probability of me needing to say property are equal. If we don’t make that assumption, then we have to adjust the calculation of how often I say the wrong word to account for how often each of those two words gets said. By me. In French. Complicated? Yes. Hence: Assumption #1.

Assumption #2: the probability of me saying the right word and the wrong word are equal. Otherwise, we have to adjust our calculations of how often I say the wrong word to account for different probabilities for each. By me. In French. Complicated? Yes. Hence: Assumption #1, and Assumption #2.

With those assumptions in place, let’s figure out the possible outcomes in a situation where I need to say one of those words: “cleanliness:”

  1. I need to say “cleanliness” and I say propreté (the right word)
  2. I need to say “cleanliness” and I say propriété (the wrong word)

We have two possible outcomes (that’s the technical term), so the probability of either of them is 1/2, or 0.50, or 50%.

It works the same way if I need to say “property”–there are two outcomes:

  1. I need to say “property” and I say propreté (the wrong word)
  2. I need to say “property” and I say propriété (the right word)
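For the curious, here is a minimal sketch of the calculation that the two assumptions let us dodge. It weights the chance of a mistake on each word by how often that word is needed (the law of total probability). The function and parameter names are mine, and the numbers in the second call are invented purely to show what happens when the assumptions are dropped.

```python
def p_wrong_word(p_need_cleanliness, p_wrong_given_cleanliness, p_wrong_given_property):
    """Overall probability of saying the wrong word, given how often each
    meaning is needed and how often each one gets botched."""
    p_need_property = 1.0 - p_need_cleanliness
    return (p_need_cleanliness * p_wrong_given_cleanliness
            + p_need_property * p_wrong_given_property)


# With Assumption #1 and Assumption #2: everything is 50/50.
print(p_wrong_word(0.5, 0.5, 0.5))  # 0.5

# Without them (hypothetical numbers): "cleanliness" is needed 80% of the
# time and botched half the time; "property" is botched 10% of the time.
print(round(p_wrong_word(0.8, 0.5, 0.1), 2))  # 0.42
```

With both assumptions in place the answer is a flat 50%, which is why the two lists of outcomes above look so symmetrical.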

Back to our original question: how often do I say the right/wrong word? Well… we need to change the question. To wit: to know how often I say the right/wrong word, we would need to know the probability of me saying every word that I say, and calculate the probabilities of me getting them right/wrong.

However: I don’t give a fuck about that. What seems funny to me about the fact that I am equally as likely to fuck up the words cleanliness and property is that they’re so fucking…common. I mean, I don’t have a problem with the vocabulary for talking about, say, why we have the Electoral College or why Beaux Arts Victorian houses aren’t built any more, but I can’t talk about the fact that if my little corner of New Orleans gets flooded in the next couple days, I am going to have some hot, sweaty, bug-infested work ahead of me as soon as I can get a plane ticket back there. Yes, friends and family: I am safe and sound in Colorado.

Note to self: propreté is pretty close to propre, “clean”–maybe that can help me remember? And for practice (just in case writing this blog post wasn’t enough), here are some sentences to practice with, courtesy of the Sketch Engine web site, your home for fine linguistic corpora and the tools for searching them. Scroll down for the answers:

  1. Je suis en train de vendre ma ______.
  2. Il y a des efforts à faire concernant la ______ de la piscine.
  3. Comment conserver la ______ d’une salle de bain ?
  4. Dans les années 60, on a étudié les ______ de trous noirs.
  5.  La couleur blanche est rattachée généralement à la pureté et à la ______ .
  6. Balai vapeur hyper polyvalent – pour plus de ______ dans la maison !
  7. Ton corps même n’est pas ta ______ ; comment pourrais-tu posséder le Tao ? (See Taoist scripture for an explanation.)
  8. Les jeux vidéo ne sont pas la ______ exclusive de ces hommes blancs cishétéros.
  9. Les oreilles : vérifiez régulièrement la ______ des oreilles de votre chien.
  10. Actuellement la ______ appartient à la commune.
  11. Tous les jeux flash présent sur le site restent la ______ de leurs auteurs respectifs.
  12. Votre langue, cher monsieur Walder, est révélatrice de l’état de ______ du sexe de votre femme, point barre.
  13. Les sanitaires sont d’une ______ immaculée et il y a même des machines à laver.
  14. …les métaphores qui transposent certaines ______ d’une catégorie à une autre : “l’homme est un loup pour l’homme”…
  15. Entretien des trottoirs : Chaque Soiséen est responsable de l’état de ______ du trottoir qui borde sa ______.
  16. Nettoyage: Les frais de nettoyage (50,00 Euros) vous seront rendus à la fin de votre séjour selon l’état de ______ de la ______.
  17. Un matériau aux multiples ______ – résistance, ultra______ – et qui s’adaptent aux dimensions de vos projets.
  18. Il n’a cependant pas les ______ ou la ______ du biométhane naturel.

Picture source: Scroll down for the answers to the exercise!

  1. Je suis en train de vendre ma propriété.
  2. Il y a des efforts à faire concernant la propreté de la piscine.
  3. Comment conserver la propreté d’une salle de bain ?
  4. Dans les années 60, on a étudié les propriétés de trous noirs.
  5.  La couleur blanche est rattachée généralement à la pureté et à la propreté .
  6. Balai vapeur hyper polyvalent – pour plus de propreté dans la maison !
  7. Ton corps même n’est pas ta propriété ; comment pourrais-tu posséder le Tao ? (See Taoist scripture for an explanation.)
  8. Les jeux vidéo ne sont pas la propriété exclusive de ces hommes blancs cishétéros.
  9. Les oreilles : vérifiez régulièrement la propreté des oreilles de votre chien.
  10. Actuellement la propriété appartient à la commune.
  11. Tous les jeux flash présent sur le site restent la propriété de leurs auteurs respectifs.
  12. Votre langue, cher monsieur Walder, est révélatrice de l’état de propreté du sexe de votre femme, point barre.
  13. Les sanitaires sont d’une propreté immaculée et il y a même des machines à laver.
  14. …les métaphores qui transposent certaines propriétés d’une catégorie à une autre : “l’homme est un loup pour l’homme”…
  15. Entretien des trottoirs : Chaque Soiséen est responsable de l’état de propreté du trottoir qui borde sa propriété.
  16. Nettoyage: Les frais de nettoyage (50,00 Euros) vous seront rendus à la fin de votre séjour selon l’état de propreté de la propriété.
  17. Un matériau aux multiples propriétés – résistance, ultrapropreté – et qui s’adaptent aux dimensions de vos projets.
  18. Il n’a cependant pas les propriétés ou la propreté du biométhane naturel.

English notes: In my defense, a big part of my problem, I would guess, comes from the fact that English has the word propriety. The Merriam-Webster web site gives these synonyms for it: decency, decorum, form. Examples:

  1. Zipf, I’m not sure about the propriety of that example about the cleanliness of Mr. Walder’s tongue.
  2. President Obama was the very *model* of propriety. Never once did he say or do anything to make America ashamed of him. (Source: Twitter)
  3. Even inside the nation’s prominent law firms preparing to help President Trump wage a legal war challenging the results of the election, concerns are intensifying about the propriety and wisdom of working for Trump, the New York Times reports. (Source: a tweet from the San Francisco Chronicle)

Computational linguistics and misinformation

Computational linguistics takes on the infected swamp that the World-Wide Web has become

In the late 1990s, I worked at a start-up. At the time, it was one of the 25 largest web sites in the world.

Why “largest web site,” and not “biggest web site?” English tends to use “big” to refer to physical objects, and “large” to refer to abstract concepts. Note that I said “tends to”–this is a statistical tendency, not an absolute.

Like a lot of people working for internet-related businesses or causes, we thought that we were making the world a better place. The World-Wide Web was going to democratize so much–access to information, everyone’s ability to communicate their message to a broader world.

20+ years later, we all realize that everyone includes a lot of assholes. From a former president of the United States to random evil-doers in the former Soviet Union, there are people who use the technologies that so many of us well-intentioned people worked so hard on to spread hate, to attack democracy, to spread lies.

Misinformation: things that are not true. Disinformation: deliberately created untrue things. Unlike a simple mistake, misinformation is widely spread about. Unlike a lie, disinformation is widely spread about, too. “Diffused,” if you prefer a technical term. “Propagated.”

People like me who had a hand in developing the kinds of technologies that assholes use to propagate misinformation and disinformation have–belatedly, I would say–begun to try to address the kinds of problems that we helped create. One of these is a shared task on detecting online misinformation. A “shared task” involves a bunch of computer-sciencey-types getting together to define a task–say, finding emails that would be relevant to a court case. They come to an agreement about the definition of the task, about the right contents for a shared data set on which to evaluate performance on that task, and about a metric for evaluating performance on it. You put together a schedule, everybody goes off and builds a computer system for doing the task, you distribute the data, and on some agreed-upon date, everybody submits their systems’ output to the people who organized the task. Then everyone gets together for a workshop in which we compare systems, compare outputs, and see what we can learn from those comparisons.

A day or two ago, an email appeared in my inbox about just such a shared task. Its goal is to deal with misinformation on the Internet. That’s a pretty goddamn big thing to take on, though, isn’t it? So, the participants agreed on a subpart of the misinformation problem that is a bit more tractable:

The TREC Health Misinformation track fosters research on retrieval methods that promote reliable and correct information over misinformation for health-related decision making tasks.

Right away, we know some of the ways that the organizers have defined their hopefully-tractable task:

  1. The word retrieval suggests that participants will be given a set of documents, and that their output should be documents from that set. This mimics the basic structure of the World-Wide Web: a set of documents (on a loose definition of the term “document”) that users search in order to find information.
  2. The word health-related suggests that participants will not need to be able to deal with every possible kind of misinformation–only health-related misinformation. This makes the task considerably more (potentially) achievable, and given the amount of misinformation that has recently been spread on health-related issues such as the current global COVID-19 pandemic, there is potential benefit to the world as a whole if it can be accomplished. (Notice how I snuck in there the inference that health-related is a word, not a something…more than a word? I don’t actually think that–just showing you how discourse works.)
  3. Promote reliable and correct information over misinformation refers to a common aspect of any “retrieval” task (see #1 above): your system is expected to present not just a set of documents, but a ranked list of those documents. Think about it like the page of results that Google gives you when you do a search: you want the most relevant web page to be at the top of the page, not at the bottom, right? So, that’s what the shared task organizers are asking your system to do: rank correct information over misinformation. Of course, if all of the web pages that your system presents to the user are correct, then that is wonderful. (Normally only the top results are considered in terms of scoring your system’s performance.)
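To give a feel for what "only the top results are considered" means in practice, here is a sketch of precision at k, a standard retrieval measure: of the top k documents in your ranked list, what fraction are the ones you wanted? I am not claiming that this is the metric the track organizers use; the function name, the document IDs, and the judgments below are all hypothetical.

```python
def precision_at_k(ranked_doc_ids, correct_doc_ids, k):
    """Fraction of the top k ranked documents that are judged correct."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in correct_doc_ids)
    return hits / k


# Hypothetical judgments: d1 and d2 are correct, d3 and d4 are misinformation.
correct = {"d1", "d2"}

system_a = ["d1", "d2", "d3", "d4"]  # correct information ranked first
system_b = ["d3", "d4", "d1", "d2"]  # misinformation ranked first

print(precision_at_k(system_a, correct, 2))  # 1.0
print(precision_at_k(system_b, correct, 2))  # 0.0
```

Both systems returned exactly the same four documents; the only difference is the order, and that is the whole point of the word "promote" in the task description.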

Want more details? See the TREC Health Misinformation Track web page. Note that all opinions expressed in this post are mine, and they especially do not represent those of TREC, the Text Retrieval Conference, an organization that has run shared tasks for…over twenty years now, wow… And if you feel like slapping a computational person of my advanced age for having helped to create the stinking swamp that the World-Wide Web has become: go for it. But, also recognize that computational linguists are trying to do something to…wait for it…drain that swamp.

The picture at the top of this post is from an article published by the New York Times on April 13th, 2020.

