Roads near the areas of combat in Ukraine have typically been heavily damaged by shelling, tracked vehicles, and the like, and are hell on the vans that extraction teams rely on. If you do not take care of yours, it will not be able to take care of you–or of the civilians for whose lives you are responsible…
Many foreign volunteers in Ukraine spend their time doing “hot extractions”: evacuation of civilians from the front line. That means driving into an area under fire, quickly loading little old grandmas into your vehicle, and getting out of there–fast. The typical crew will include a driver, a navigator, and a medic, and each of them has a crucial role to play in getting everyone to safety. The following list will help you make sure that your vehicle contains everything that you need. Following the list, you will find the rationale for each item, as well as explanations of obscure words (replacing our usual English notes). The driver should go through this list daily, and the team leader should verify that you did so. Have something to add? Tell us about it in the Comments section.
Daily vehicle checklist
Oil and fluids checked
Battery expiration date checked
Lights and turn signals checked
Spare tire air pressure checked
Jack in place
Medical bag in place
Tool kit in place
Crew snacks and water
Water for civilians
Litter present (it’s not what you think–see below, or this video)
Phone cables present
Fire extinguisher present
Jumper cables present
Problems noted: Here you should document anything that you need to take care of before departing on a mission.
Roads near the areas of combat in Ukraine have typically been heavily damaged by shelling, tracked vehicles (tanks, armored personnel carriers, etc.), and the like, and are hell on the vans that extraction teams rely on. If you do not take care of yours, it will not be able to take care of you–or of the civilians for whose lives you are responsible… Hence this checklist. Here are explanations of some of the words that appear on the list.
Spare tire: This is the extra tire that you will use if your tire is damaged. “Spare” means an extra thing that you have in case you need it. “Spare tire” is also a slang word for the fat hanging off of the waist of a man. (On a woman, it’s a “muffin top.”)
Jack: This is the mechanical device that you use to raise a vehicle in order to change a flat tire. This video will help you learn this obscure English word.
Medical bag: This is an easily identifiable bag containing more than you carry in your individual first aid kit (IFAK). In my group, the medical bags include everything that goes in our IFAKs, plus a splint, eye covers, a windlass for improvising junctional tourniquets, a radiation monitor, trauma scissors, and extra gauze. Lots of extra gauze.
Litter: This is a device for carrying an injured person, or more often, an old person who cannot walk. Speed is of the essence if you want to avoid Russian artillery figuring out exactly where you are, so anyone with limited mobility needs to be moved by you, not hobble along at their own speed. This video will help you learn this obscure meaning of the word litter.
On January 6th, 2023, two British volunteers, Andrew Bagshaw and Christopher Parry, went missing near Soledar in the Donetsk region of Ukraine. As I write this, in late January 2023, there have been no verified sightings of them, and they are presumed dead or captured. Their vehicle was found soon after their disappearance–locked. Presumably they had to abandon it and head out on foot. You should prepare and practice for this situation. The protocol that I am suggesting for you here is based on the US Navy’s procedure for abandoning a sinking ship. Air crew members: if you can add something, please tell us about it in the comments.
Count to ensure that everyone is present or accounted for.
Check that all survival equipment is on someone’s back or in their pockets: BOB, medical bag, communication tools, escape route maps, water and food.
Destroy all sensitive information.
Consider destroying or disabling the vehicle.
Everyone present or accounted for: “everyone” means all team members and any passengers. Passengers are most often civilians who are being evacuated from the front (with attendant communication problems related to lack of a shared language), but may also be journalists (who can present their own set of challenges). If you must leave behind bodies, note their location. Take their passport, wallet, telephone, and any other personal effects or useful items (for example, individual first aid kit (IFAK), helmet, body armor, water). Consider leaving identification of some kind with them, or writing identifying information (including nationality so that the appropriate embassy can be contacted) on their clothes or body. Also consider taking a lock of their hair, some bloody clothing, or a cheek swab for later DNA matching. (Back in the day, American medics took fingerprints of bodies that would be buried overseas. See below for a link to an article on digital collection of fingerprints–no pun intended…)
Bug-out bag (“BOB”): this is an easily carryable container, typically a small- or medium-sized backpack, containing everything that you would want to have with you if you had to abandon your home (or office, or vehicle) in an emergency. You can find plenty of advice elsewhere on what a BOB should contain. Customize it for your operating environment, and update it as that changes, as the weather changes, as your crew size changes, etc.
Medical bag: this contains more things than one would have in an individual first aid kit (IFAK, or аптечка). Ours have the usual things, plus a splint, extra bandaging materials, and a radiation monitor.
Destroy all sensitive information: Destroy information on evacuees you have picked up or were on the way to pick up. Delete from your phone/destroy paper maps that show your safehouse, checkpoints, military units, humanitarian centers, routes… You should have left for your day’s mission with a separate map that shows the area that you would have to walk through to reach safety, and nothing else.
Destroying or disabling the vehicle: Even a vehicle that no longer runs is a valuable source of spare parts for the enemy in a war where logistics has been a/the major struggle for both sides. It also might contain sensitive information that you missed when you left it. I don’t know shit (to “not know shit” means to not know anything at all) about destroying vehicles–if you do, dear reader, please tell us about it in the Comments section. (In the Navy, you place explosive charges in relevant places so that the ship is guaranteed to go down.)
Notify someone: Tell them who is with you, where you are, and where you are heading.
The British reporter Tom Mutch knew one of the two missing British volunteers. He describes them as brave guys doing life-saving work–and as under-equipped and ill-prepared for working in a combat zone. You do not have to be ill-prepared. Practice this protocol before you need it. Andrew Bagshaw and Christopher Parry: I hope we see you again some day.
Bring these medications to Ukraine when you come to volunteer and your contribution will be even bigger.
Moxifloxacin is what an American combat medic will give you if you have a penetrating wound. I have no idea how to find it in Ukraine, but your doctor can give you a prescription for it. I was very, very happy to have some with me here when a frightened cat sunk a fang very, very deep into my arm. (I was also very, very happy to have clear ballistic glasses with me when I was trying to get her out from under a bathtub while she was trying to scratch me to death, but that’s a topic for a post about ballistic glasses, right? Fang explained in the English notes at the end of the post.)
Do not bring aspirin or ibuprofen. US Department of Defense guidelines say not to take them for a week before entering a war zone. And, yes: since Putin deliberately strikes civilian targets of no military value, all of Ukraine is a war zone.
Meloxicam is what an American combat medic will give you for any battlefield injury. See above regarding the situation in Ukraine.
Acetaminophen (sold in the US as Tylenol or in generic form) is the third thing that an American medic will give you if you are injured.
The antidiarrheal medication of your choice. You shouldn’t travel ANYWHERE without this anyway.
All medications that you normally take. Bring more than you think you will need. All problems in Ukraine are supply chain problems, so do not assume that you will be able to buy ANYTHING wherever it is that you happen to find yourself. Yes, I do understand that it is difficult to get more than your allotted quantity of prescription medications in the US, since your insurance company rations your health care.
fang: “A fang is a long, pointed tooth.” (Wikipedia) Fang often occurs with the verbs to bare and to sink into. Examples from Sketch Engine, purveyor of fine linguistic corpora and tools for searching them:
He sank his fangs into her shoulder.
Spike longed to sink his fangs into Xander’s hot flesh.
“Then you deserve this,” he said as he sunk his fangs into the man’s throat and drank hungrily.
Sink those fangs into one of our mini milk chocolate caskets.
How I used it in the post: A frightened cat sunk a fang very, very deep into my arm.
This feels less to me like a war of one country against another than of the Russian army versus a bunch of defenseless grandmothers…
This message from a friend who is volunteering as a medic in Ukraine showed up in my Inbox the other day…
Yes, I’ve been in Ukraine since the beginning of May. I’m a medic, as I was in the Navy. You asked about my safety. “Safe” is a very relative concept… When I’m not in the field, my morning routine consists of making a cup of coffee, grabbing a pack of cigarettes, and then sitting on the balcony to watch the morning rocket attack on my entirely residential neighborhood. I say “morning rocket attack” because there has been one almost every day since I got here. Probably sounds horrifying, but since the only people carrying guns around here are soldiers, I actually feel safer in the city than I do in the US. When I’m in the field, it’s a different story. I mostly do evacuations of civilians from the front. The Russians enthusiastically shell refugee collection points and clearly marked emergency vehicles, and evacuating civilians from the front means going to refugee collection points in clearly marked emergency vehicles. As it happens, I have a relatively high tolerance for danger, so although it’s certainly not “safe,” that’s fine. What’s not OK is that because the Russians hit those places and vehicles so hard, and by this point a large proportion of the people who have not yet left the front are old folks, this feels less to me like a war of one country against another than of the Russian army versus a bunch of defenseless grandmothers…
in the field: In a military context, this means being out doing whatever it is that you do. Examples:
Your service member is headed out into the field and it looks like the entire military gear issuing office is located in your living room. No matter what their training mission might be, they will want to prepare and pack a few things for the field that will make things a little bit easier while they are “work camping.” These are things that have been suggested by actual service members who have been in the field for countless hours, days, weeks, and even months. Source: 35 things every service member needs for the field, from the Daily Mom web site.
A good razor with a shave gel that protects throughout the day is key for a service member who is shaving in the field. Source: the Daily Mom web site.
Untreated anxiety in children is associated with all sorts of bad things later in life. The good news is, if you do treat it, it can usually clear up.
Untreated anxiety in children is associated with all sorts of bad things later in life–mood disorders, alcohol and drug abuse, suicidality, underachievement in school, and low earning potential. The good news is, if you do treat it, it can usually clear up.
It’s usually easier to prevent something than it is to treat it, so it would be great if we could predict which kids are likely to develop chronic problems with anxiety, and head it off at the pass. That might actually be plausible, since anxiety has a trajectory of development. As Strawn et al. put it, “…the adolescent with panic and generalized anxiety disorders was once a boy with separation anxiety disorder and…a toddler with extreme shyness…”
But, how would one do that prediction?
I am a happy practitioner of the write-about-what-you-don’t-know approach to scribbling. Right at this moment I am realizing that I don’t know very much about the development of anxiety in children and adolescents. So, I am reading the paper Research Review: Pediatric anxiety disorders–what have we learnt in the last 10 years?, by Jeffrey Strawn, John Walkup, and a bunch of other folks. It describes a number of risk factors for the development of a variety of anxiety disorders. The risk factors fall into categories of cognitive bias, behavioral tendencies, family environment, parental disorders, substance abuse, and environmental exposure.
When I’m trying to understand a new disease, I sometimes play this game: walk into a restaurant, look around, and pick out the person most likely to suffer from it. (At my advanced age, that is often me, but that’s another story.) For the kinds of risk factors that are related to pediatric and adolescent anxiety disorders, that’s not really an option, so instead, I’m trying something different: I’m writing about some kids who I would expect (based on my limited knowledge) to develop anxiety–or not. So, here’s what we’re gonna do today: we’ll look at my little vignettes, say for each one whether we think the kid is at high risk or low risk of developing an anxiety disorder, and then explain why. I have written these vignettes in the style of the notes that a physician (doctor) writes when they examine a patient. That’s why it sounds odd. (We’ll talk about some of those oddities in the English notes.) Ready? C’est parti.
Likely outcome? John is at high risk of developing an anxiety disorder. His behavior at school–not wanting to be dropped off even after three weeks from the beginning of the academic year–is a kind of behavioral inhibition, and behavioral inhibition is a risk factor for later development of anxiety disorders. Childhood separation events are, too, and John has experienced these multiple times–first with the long and continuing hospitalization of his mother, and then due to repeated changes of foster care placements. Having a parent with an anxiety disorder or depression also increases a child’s risk of developing an anxiety disorder, and John’s mother has both of those–that’s why she’s been hospitalized for so long.
Probable outcome? Mary is at high risk of developing an anxiety disorder. Exposure to environmental contaminants increases a child’s likelihood of developing subsequent problems with anxiety. Like John, she has experienced multiple separation events, with her mother being sent to prison and her father dying soon after. Over-controlling parenting, such as she is getting from her grandmother, also increases the risk of developing an anxiety disorder. “Family accommodation,” or changes made to group behavior in response to a child’s early anxiety symptoms, also increases the likelihood of developing an anxiety disorder, so despite the use of “but” in the example, this is not a good thing for this kid.
What does the future hold? Harry is at low risk of developing an anxiety disorder. He clearly does not have a fear of trying new things, and is not shy. Although he lost his father, it happened so early in his life that he probably did not experience it as a separation event–recall that his father was deployed to Iraq–and he has always had a close relationship with his stepfather. The lack of alcohol, drug, or tobacco use is relevant in that these kinds of substance abuse (and I say that as someone who cheerfully enjoys fine American tobacco products) are often associated with anxiety disorders.
To head something off at the pass: to take action in order to prevent something from happening. This is a cowboy thing: a pass is a narrow path at a low point in a mountain range that lets you get through the mountains without having to climb them. You can prevent someone from getting somewhere that they’re trying to go if they have to go through a pass to get there–they’re narrow, and therefore easy to block. Some examples of the use of this expression:
Dilbert tries to head off criticism of Trump at the pass by defining it as coming from some “other side.” (Source: Twitter)
The apologists for Trump & Trumpies sure managed to tone down his rhetoric & head off his bigotry at the pass, didn’t they. (Source: Twitter)
How I used it in the post: It’s usually easier to prevent something than it is to treat it, so it would be great if we could predict which kids are likely to develop chronic problems with anxiety, and head it off at the pass.
Now let’s look at some of the odd aspects of medical English:
to refer to: in this sense, to refer someone to a treatment facility is to send them to that facility for consultation by a specialist. Your insurance company will usually require you to have a referral from your primary care provider for this kind of thing. How I used it in the post:
He is referred to the psychiatry clinic because of bursts of tears and screaming when dropped off at school
She is referred to the environmental health clinic after routine screening at her school
to present with something: this expression is used to describe a patient’s state when first meeting with a health care provider. How I used it in the post:
Harry is a cheerful, outgoing adolescent who presents in the Emergency Department with exquisite point tenderness in the right femoral area after a fall experienced while practicing his newly-discovered passion for rock-climbing.
In which language displays interesting statistical properties, some people get fired, and I learn a few words about the Army.
Twenty-plus years ago, I got my first job as an actual, card-carrying linguist, working for a company that did things with big collections of linguistic data, using them to improve computer programs that did speech recognition, i.e. figuring out what words a person is saying.
One fine day the people that gave us the vast majority of our income sent their big-collection-of-linguistic-data specialist to visit us. We demonstrated to him the computer program that we had built to answer the question how can you tell when a big collection of linguistic data is big enough? We pointed out how to spot the tell-tale sign on a graph that means “it’s big enough.” His response: “Oh, that just means that linguistic data is bursty.”
What did he mean by “bursty?” We had a guess, but weren’t exactly sure, and given that his company paid us a lot of money and he was their expert, my boss thought it best not to push back. A few months later, they declined to renew our contract, and our owner laid everyone off and went away to do something else. Was it because we didn’t push back on the big-collection-of-linguistic-data expert’s dismissiveness? Probably not–our little company committed far bigger errors, and on a sadly regular basis. Whatever–the job market for computational linguists was not terrible in those days (it’s pretty wonderful now), and I found my second job as an actual, card-carrying linguist pretty quickly. But: burstiness is pretty important, and it continues to bump into my life today, in various and sundry ways, some of which will be of interest to readers of this blog.
What burstiness means: per Wikipedia,
In plain English: burstiness is present when something doesn’t happen for long periods of time, but then happens a lot, and then goes back to not happening very often. Some things that have this characteristic: hurricanes, and pandemics. Statisticians care about burstiness because bursty things are difficult to characterize with normal statistics, so you have to come up with new techniques to work with them; people like disaster planners and public health experts care about those statistics because it is difficult to predict, and therefore to plan for, things that have weird statistical properties.
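If you would like to put a number on that plain-English description, here is one way to do it, offered as a sketch and not as the definition from the Wikipedia passage. It looks at the gaps between successive events and compares how much those gaps vary with how long they are on average. The formula is the burstiness coefficient proposed by Goh and Barabási, which is my choice for illustration; the function name and the toy event times are made up.

```python
from statistics import mean, pstdev
from typing import Sequence


def burstiness(event_times: Sequence[float]) -> float:
    """Goh-Barabasi burstiness coefficient B = (sigma - mu) / (sigma + mu),
    where mu and sigma are the mean and standard deviation of the gaps
    between consecutive events.

    B is -1 for perfectly regular events, close to 0 when the gaps vary
    about as much as their average length, and moves toward +1 as long
    silences alternate with tight clusters of events.
    """
    times = sorted(event_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three events to measure burstiness")
    mu = mean(gaps)
    sigma = pstdev(gaps)
    if mu + sigma == 0:
        raise ValueError("all events happen at the same moment")
    return (sigma - mu) / (sigma + mu)


# Hypothetical data: one event every 10 time units, versus events that
# arrive in clusters separated by long quiet stretches.
regular = [0, 10, 20, 30, 40]
bursty = [0, 1, 2, 3, 100, 101, 102, 200]

print(burstiness(regular))
print(burstiness(bursty))
```

The regular series scores at the bottom of the scale, and the clustered one comes out positive, which matches the intuition of "nothing for a long time, then a lot at once."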
From a computational linguist’s perspective, burstiness is important because in big collections of language, you don’t see new words very often, but when you do, sometimes you see a lot of them at once. If you’re trying to do something like build a dictionary for a computer program, you typically do that by finding all of the words in a big collection of linguistic data. But, how do you know when your collection of linguistic data is big enough? See above; the problem is that if you kept growing the collection, you know that there will be bursts of new words, but you can’t keep growing your collection forever–at some point, you have to stop and work with what you have at hand.
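To make the "is it big enough yet?" problem concrete, here is a minimal sketch, assuming that the collection is just a list of tokens and that we read it in fixed-size windows. It is not the program that my old company built; the function names, the window size, and the stopping rule are all invented for illustration.

```python
from typing import Iterable, List


def new_words_per_window(tokens: Iterable[str], window: int = 4) -> List[int]:
    """Count how many never-before-seen words show up in each window of tokens."""
    seen = set()
    counts = []
    new_in_window = 0
    n = 0
    for token in tokens:
        n += 1
        word = token.lower()
        if word not in seen:
            seen.add(word)
            new_in_window += 1
        if n % window == 0:
            counts.append(new_in_window)
            new_in_window = 0
    if n % window != 0:
        # Keep the final, partially filled window.
        counts.append(new_in_window)
    return counts


def looks_big_enough(counts: List[int], tail: int = 2, threshold: int = 0) -> bool:
    """Naive stopping rule (an assumption, not a standard): the last `tail`
    windows each added at most `threshold` new words."""
    return len(counts) >= tail and all(c <= threshold for c in counts[-tail:])


# Toy corpus: familiar words for a while, then a burst of new ones.
text = ("the cat sat on the mat the cat sat on the mat "
        "the spahi wore a calot the cat sat")
counts = new_words_per_window(text.split(), window=4)
print(counts)
print(looks_big_enough(counts[:3]))
print(looks_big_enough(counts))
```

If you had stopped reading after the third window, the naive rule applied to a long enough quiet stretch would have said "big enough," and then the fourth window delivers three new words at once. That is the trouble with bursty data: a quiet stretch tells you less than you would like about what comes next.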
Many of our dear fellow readers are engaged in learning a language that they don’t already speak. I am one of them–if you have been reading this blog for a few years, you have followed my feeble attempts to learn la langue de Molière, also known as “French.” By now I know the language well enough that I can pick up a book in it and not have to turn to a dictionary very often. But, when I do, it typically happens like this…
Right at this moment, I’m reading Paris brûle-t-il ?, “Is Paris burning?,” the work of reference on the liberation of Paris. I typically get through about three pages before I have to look up a word. But, then this morning, I’m reading about the French 2nd Armored Division rolling from Normandy to Paris when I come across this sentence. I had to look up all of the words in bold face:
l’automitrailleuse: a light armored vehicle.
le spahi: native cavalry trooper of the Maghreb.
le calot: garrison cap in English; when I was in the Navy, we called them “cunt caps.” A calot has no brim or visor, and therefore can be folded flat and tucked under the epaulet of a military jacket.
After that, it was back to my normal rate: about one word every three pages. That certainly counts as “not very often,” and is pretty good for a non-native speaker. To then jump to three words in a single sentence, and then go back to my base rate of one word every three pages, is a good example of burstiness. Once again, we see why one might write a blog like this one–a blog about the statistical properties of language and their implications for people who are trying to learn one. What happened to the dismissive big-collections-of-linguistic-data expert? I don’t know for a fact, but I do know that people who are dismissive of the opinions of others don’t typically have much professional success. Personally, I took what I learned from the experience of working at a failed software start-up to do a better job of being a computational linguist, and have had a wonderfully fun time with it. Want to try a career in computational linguistics yourself? Start here if you are not a graduate student, or here if you are, and I hope you have as much fun with it as I have!
Despite what its name would lead one to think, an automitrailleuse does not necessarily carry a machine gun. Here are some pictures of modern automitrailleuses. You’ll notice that some of them look a lot like tanks. The salient differences are that (1) they weigh less, and (2) they have wheels, not treads.
My first language is (American) English, but I speak French well enough that if I want French people to believe that I’m an American, I have to convince them of it. Comparing and contrasting French and American political appartenances helps, as does my ability to explain the difference between felonies and misdemeanors and how they affect the length of your prison sentence. Why it doesn’t occur to me to just speak English with them, I couldn’t tell you–I’ll have to try it some time…
My ability to speak French well doesn’t mean that I don’t make absolutely stupid mistakes, though. Case in point: propreté and propriété. One means “cleanliness” and one means “property,” but if I need to say either “cleanliness” or “property” in French, which of those two words propreté and propriété will come out of my mouth is pretty random. How random? I’d guess a 50/50 chance for either of them. So, how often do I say the right/wrong word? Let’s figure it out.
First, we have to make some assumptions. Assumption #1: the probability of me needing to say cleanliness and the probability of me needing to say property are equal. If we don’t make that assumption, then we have to adjust the calculation of how often I say the wrong word to account for how often each of those two words gets said. By me. In French. Complicated? Yes. Hence: Assumption #1.
Assumption #2: the probability of me saying the right word and the wrong word are equal. Otherwise, we have to adjust our calculations of how often I say the wrong word to account for different probabilities for each. By me. In French. Complicated? Yes. Hence: Assumption #1, and Assumption #2.
With those assumptions in place, let’s figure out the possible outcomes in a situation where I need to say one of those words: “cleanliness:”
I need to say “cleanliness” and I say propreté (the right word)
I need to say “cleanliness” and I say propriété (the wrong word)
We have two possible outcomes (that’s the technical term), and by Assumption #2 they are equally likely, so the probability of either of them is 1/2, or 0.50, or 50%.
It works the same way if I need to say “property”–there are two outcomes:
I need to say “property” and I say propreté (the wrong word)
I need to say “property” and I say propriété (the right word)
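The four cases above can be rolled into a single number, and doing it in code also shows what "adjusting the calculation" would look like if we dropped the assumptions. This is a minimal sketch: the function and parameter names are mine, and the non-50/50 numbers in the second call are invented purely to show the adjustment.

```python
import math


def p_wrong_word(p_need_cleanliness: float = 0.5,
                 p_right_given_cleanliness: float = 0.5,
                 p_right_given_property: float = 0.5) -> float:
    """Probability of saying the wrong word, given that I need one of the two.

    Defaults encode Assumption #1 (each word is needed half the time) and
    Assumption #2 (right and wrong are equally likely for each word).
    """
    p_need_property = 1.0 - p_need_cleanliness
    # Wrong word when I need "cleanliness": I say propriété.
    wrong_cleanliness = p_need_cleanliness * (1.0 - p_right_given_cleanliness)
    # Wrong word when I need "property": I say propreté.
    wrong_property = p_need_property * (1.0 - p_right_given_property)
    return wrong_cleanliness + wrong_property


# Under both assumptions, I am wrong half the time.
print(p_wrong_word())

# Hypothetical numbers without the assumptions: I need "cleanliness" 80% of
# the time and get it right 90% of the time, but "property" is a coin flip.
print(p_wrong_word(0.8, 0.9, 0.5))
```

With the defaults, each of the four outcomes has probability 0.25, and the two wrong ones add up to 0.5. Change any of the three inputs and the weights change with them, which is exactly the complication that the assumptions let us skip.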
Back to our original question: how often do I say the right/wrong word? Well… we need to change the question. To wit: to know how often I say the right/wrong word, we would need to know the probability of me saying every word that I say, and calculate the probabilities of me getting them right/wrong.
However: I don’t give a fuck about that. What seems funny to me about the fact that I am equally as likely to fuck up the words cleanliness and property is that they’re so fucking…common. I mean, I don’t have a problem with the vocabulary for talking about, say, why we have the Electoral College or why Beaux Arts Victorian houses aren’t built any more, but I can’t talk about the fact that if my little corner of New Orleans gets flooded in the next couple days, I am going to have some hot, sweaty, bug-infested work ahead of me as soon as I can get a plane ticket back there. Yes, friends and family: I am safe and sound in Colorado.
Note to self: propreté is pretty close to propre, “clean”–maybe that can help me remember? And for practice (just in case writing this blog post wasn’t enough), here are some sentences to practice with, courtesy of the Sketch Engine web site, your home for fine linguistic corpora and the tools for searching them. Scroll down for the answers:
Je suis en train de vendre ma ______.
Il y a des efforts à faire concernant la ______ de la piscine.
Comment conserver la ______ d’une salle de bain ?
Dans les années 60, on a étudié les ______ de trous noirs.
La couleur blanche est rattachée généralement à la pureté et à la ______ .
Balai vapeur hyper polyvalent – pour plus de ______ dans la maison !
Ton corps même n’est pas ta ______ ; comment pourrais-tu posséder le Tao ? (See Taoist scripture for an explanation.)
Les jeux vidéo ne sont pas la ______ exclusive de ces hommes blancs cishétéros.
Les oreilles : vérifiez régulièrement la ______ des oreilles de votre chien.
Actuellement la ______ appartient à la commune.
Tous les jeux flash présent sur le site restent la ______ de leurs auteurs respectifs.
Votre langue, cher monsieur Walder, est révélatrice de l’état de ______ du sexe de votre femme, point barre.
Les sanitaires sont d’une ______ immaculée et il y a même des machines à laver.
…les métaphores qui transposent certaines ______ d’une catégorie à une autre : “l’homme est un loup pour l’homme”…
Entretien des trottoirs : Chaque Soiséen est responsable de l’état de ______ du trottoir qui borde sa ______.
Nettoyage: Les frais de nettoyage (50,00 Euros) vous seront rendus à la fin de votre séjour selon l’état de ______ de la ______.
Un matériau aux multiples ______ – résistance, ultra______ – et qui s’adaptent aux dimensions de vos projets.
Il n’a cependant pas les ______ ou la ______ du biométhane naturel.
Je suis en train de vendre ma propriété.
Il y a des efforts à faire concernant la propreté de la piscine.
Comment conserver la propreté d’une salle de bain ?
Dans les années 60, on a étudié les propriétés de trous noirs.
La couleur blanche est rattachée généralement à la pureté et à la propreté .
Balai vapeur hyper polyvalent – pour plus de propreté dans la maison !
Ton corps même n’est pas ta propriété ; comment pourrais-tu posséder le Tao ? (See Taoist scripture for an explanation.)
Les jeux vidéo ne sont pas la propriété exclusive de ces hommes blancs cishétéros.
Les oreilles : vérifiez régulièrement la propreté des oreilles de votre chien.
Actuellement la propriété appartient à la commune.
Tous les jeux flash présent sur le site restent la propriété de leurs auteurs respectifs.
Votre langue, cher monsieur Walder, est révélatrice de l’état de propreté du sexe de votre femme, point barre.
Les sanitaires sont d’une propreté immaculée et il y a même des machines à laver.
…les métaphores qui transposent certaines propriétés d’une catégorie à une autre : “l’homme est un loup pour l’homme”…
Entretien des trottoirs : Chaque Soiséen est responsable de l’état de propreté du trottoir qui borde sa propriété.
Nettoyage: Les frais de nettoyage (50,00 Euros) vous seront rendus à la fin de votre séjour selon l’état de propreté de la propriété.
Un matériau aux multiples propriétés – résistance, ultrapropreté – et qui s’adaptent aux dimensions de vos projets.
Il n’a cependant pas les propriétés ou la propreté du biométhane naturel.
English notes: In my defense, a big part of my problem, I would guess, comes from the fact that English has the word propriety. The Merriam-Webster web site gives these synonyms for it: decency, decorum, form. Examples:
Zipf, I’m not sure about the propriety of that example about the cleanliness of Mr. Walder’s tongue.
President Obama was the very *model* of propriety. Never once did he say or do anything to make America ashamed of him. (Source: Twitter)
Even inside the nation’s prominent law firms preparing to help President Trump wage a legal war challenging the results of the election, concerns are intensifying about the propriety and wisdom of working for Trump, the New York Times reports. (Source: a tweet from the San Francisco Chronicle)
Computational linguistics takes on the infected swamp that the World-Wide Web has become
In the late 1990s, I worked at a start-up. At the time, it was one of the 25 largest web sites in the world.
Why “largest web site,” and not “biggest web site?” English tends to use “big” to refer to physical objects, and “large” to refer to abstract concepts. Note that I said “tends to”–this is a statistical tendency, not an absolute.
Like a lot of people working for internet-related businesses or causes, we thought that we were making the world a better place. The World-Wide Web was going to democratize so much–access to information, everyone’s ability to communicate their message to a broader world.
20+ years later, we all realize that everyone includes a lot of assholes. From a former president of the United States to random evil-doers in the former Soviet Union, there are people who use the technologies that so many of us well-intentioned people worked so hard on to spread hate, to attack democracy, to spread lies.
Misinformation: things that are not true. Disinformation: deliberately created untrue things. Unlike a simple mistake, misinformation is widely spread about. Unlike a lie, disinformation is widely spread about, too. “Diffused,” if you prefer a technical term. “Propagated.”
People like me who had a hand in developing the kinds of technologies that assholes use to propagate misinformation and disinformation have–belatedly, I would say–begun to try to address the kinds of problems that we helped create. One of these efforts is a shared task on detecting online misinformation. A “shared task” involves a bunch of computer-sciencey-types getting together to define a task–say, finding emails that would be relevant to a court case. They come to an agreement about the definition of the task, about the right contents for a shared data set on which to evaluate performance on that task, and about a metric for evaluating performance on it. They put together a schedule, everybody goes off and builds a computer system for doing the task, the organizers distribute the data, and on some agreed-upon date, everybody submits their systems’ output to the people who organized the task. Then everyone gets together for a workshop in which we compare systems, compare outputs, and see what we can learn from those comparisons.
A day or two ago, an email appeared in my inbox about just such a shared task. Its goal is to deal with misinformation on the Internet. That’s a pretty goddamn big thing to take on, though, isn’t it? So, the participants agreed on a subpart of the misinformation problem that is a bit more tractable:
Right away, we know some of the ways that the organizers have defined their hopefully-tractable task:
1. The word retrieval suggests that participants will be given a set of documents, and that their output should be documents from that set. This mimics the basic structure of the World-Wide Web: a set of documents (on a loose definition of the term “document”) that users search in order to find information.
2. The word health-related suggests that participants will not need to be able to deal with every possible kind of misinformation–only health-related misinformation. This makes the task considerably more (potentially) achievable, and given the amount of misinformation that has recently been spread on health-related issues such as the current global COVID-19 pandemic, there is potential benefit to the world as a whole if it can be accomplished. (Notice how I snuck in there the inference that health-related is a word, not a something…more than a word? I don’t actually think that–just showing you how discourse works.)
3. Promote reliable and correct information over misinformation refers to a common aspect of any “retrieval” task (see #1 above): your system is expected to present not just a set of documents, but a ranked list of those documents. Think about it like the page of results that Google gives you when you do a search: you want the most relevant web page to be at the top of the page, not at the bottom, right? So, that’s what the shared task organizers are asking your system to do: rank correct information over misinformation. Of course, if all of the web pages that your system presents to the user are correct, then that is wonderful. (Normally only the top results are considered in terms of scoring your system’s performance.)
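To make the ranking idea concrete, here is a minimal Python sketch of two ways one might score a ranked list. Everything in it is my own illustration: the labels "correct" and "misinfo", the function names, and the two toy rankings are made up, and neither measure is claimed to be the metric that the shared task organizers actually use.

```python
# Illustrative only: the labels, function names, and toy rankings below are
# assumptions for this sketch, not the shared task's official scoring.

def precision_at_k(ranked_labels, k):
    """Fraction of the top-k documents that are labeled "correct"."""
    top = ranked_labels[:k]
    if not top:
        return 0.0
    return sum(1 for label in top if label == "correct") / len(top)


def pairwise_preference(ranked_labels):
    """Fraction of (correct, misinfo) document pairs in which the
    correct document is ranked above the misinformation."""
    well_ordered = 0
    total = 0
    for i, higher in enumerate(ranked_labels):
        for lower in ranked_labels[i + 1:]:
            if {higher, lower} == {"correct", "misinfo"}:
                total += 1
                if higher == "correct":
                    well_ordered += 1
    # With no mixed pairs there is nothing to get wrong.
    return well_ordered / total if total else 1.0


# Two hypothetical systems ranking the same five documents (best first).
system_a = ["correct", "correct", "misinfo", "correct", "misinfo"]
system_b = ["misinfo", "correct", "misinfo", "correct", "correct"]

for name, ranking in [("A", system_a), ("B", system_b)]:
    print(name,
          round(precision_at_k(ranking, 3), 2),
          round(pairwise_preference(ranking), 2))
# prints:
# A 0.67 0.83
# B 0.33 0.17
```

Both systems return exactly the same five documents, so a score that ignored order could not tell them apart. System A does better on both measures only because it puts correct documents nearer the top, which is the behavior that "promote reliable and correct information over misinformation" asks for.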
Want more details? See the TREC Health Misinformation Track web page. Note that all opinions expressed in this post are mine, and they especially do not represent those of TREC, the Text Retrieval Conference, an organization that has run shared tasks for…over twenty years now, wow… And if you feel like slapping a computational person of my advanced age for having helped to create the stinking swamp that the World-Wide Web has become: go for it. But, also recognize that computational linguists are trying to do something to…wait for it…drain that swamp.
The picture at the top of this post is from an article published by the New York Times on April 13th, 2020.