February 2020 – Zipf's Law

Seeing the complexity of the simple: Comparative anatomy of the scapula

We can do a lot of things with our arms that a quadruped can’t do with theirs. Throw spears at edible quadrupeds. Throw tomatoes at sopranos. Throw bums out of office.

Being a scientist means finding delight in things that look complicated but are actually governed by pretty simple principles, as well as in things that look pretty simple but are actually pretty complex. Case in point: the scapula.

Common English: shoulder blade.

Technical term: scapula, plural scapulae or scapulas.

French: l’omoplate (nf), la scapula (WordReference)

Scapulae look simple: they’re mostly flat, with a protuberance here and there. Unlike closely associated bones, they don’t get broken very often–one Swedish study found a rate of 10 scapular fractures per 100,000 people, while another Swedish study found 50 clavicular fractures per 100,000 people.

And yet: comparing the scapula across species, you see all kinds of interesting shit. The point of comparative anatomy is that you can understand something better if you compare it to other ways that it could be, but isn’t. So: let’s compare some scapulae.

The most obvious thing about the scapula is that it is positioned differently in different species. The basic situation for most living things with limbs is this: you’re a quadruped (i.e. have four legs), and the scapula is located on the side of the trunk. In contrast: look at a human, and the scapula is on its back. Compare the position of the scapula in this lovely picture of a horse:

horse skeleton — The axis of the scapula is on the line between points A and B. Picture source: http://www.horsecoursesonline.com/college/conformation/lesson_two_893.htm

…with the position of the scapula in this lovely picture of a human:

Human skeleton. Picture source: https://www.pinterest.com/pin/547891110909216204/?lp=true

…and you see the difference. It’s even more striking when you look at our closer relatives. We are primates, and specifically, members of a group of primates known as apes, and even more specifically, of the great apes. One of the biggest differences between us and our various and sundry primate relatives is that we are full-time bipeds. Autrement dit: we walk upright, all the time. In contrast, monkeys–which are primates, but not apes–are full-time quadrupeds. Going along with this difference in locomotion is a difference in the position of the scapula: it’s on our back, but on a monkey’s side.

Here’s a really nice view of a monkey (left) and human (right) trunk. Looking at the left side, labelled “monkey,” you see the typical quadruped architecture: the scapula is on the side of the chest cage. On the right side, labelled “human,” it’s a different story: the scapula is on the back.

The arrows in this illustration make an important point: monkeys also have the typical quadruped chest cage, which is relatively narrow in comparison to its depth. In contrast, the human chest cage is sorta flat–relatively wide in comparison to its depth. (Remember that the human skeleton in the picture is viewed from above, as if you had ripped someone’s head off in order to shit down their neck. (Sorry–a little sailor-talk there. Unlike Trump, I served my country.) In contrast, the monkey is being viewed from the front–I have no great analogy for you here.)

Chest cage comparison, monkey versus human. Picture source: https://evolution.berkeley.edu/evolibrary/article/0_0_0/lines_05

Amongst primates in general, there is quite a bit of variability. Why? Well, there’s quite a bit of variability in the extent to which they are quadrupedal versus bipedal. There’s quite a bit of variability in the capacity of the creature to do stuff with its hands over its head. Here’s a nice layout that shows aspects of the shoulder anatomy across a range from true monkeys, to great apes, ending up with the ape-iest of us all: the anatomically modern human. Start with the sacred monkey in the upper-left corner, and the scapula is clearly on the side of the thoracic cage. End with the human in the bottom-right corner, and it’s clearly on the back. In between…well, gibbons are (lesser) apes, while chimps and gorillas are great apes, like us.

Relative positions of the scapula in monkeys versus apes: the scapula is on the side in monkeys, on the back in apes. Note the angle of the clavicle, too: the more dorsal the scapula is, the more perpendicular the clavicle is to the midline. Picture source: https://link.springer.com/chapter/10.1007/978-3-662-45719-1_1

What functional difference goes along with this structural difference? Well: the quadrupeds are really good at locomotion–it’s difficult to think of a quadruped that can’t outrun a human. Try to catch your dog or your cat for a trip to the vet–good fucking luck, buddy. But, quadrupeds also tend to have a big limitation: although their front limbs are very good at moving back and forth–see above about moving fast–they suck at anything else. For example, we can make big circles with our arms; we can spread them. We don’t have the speed of a quadruped, but we can do a lot of things with our arms that a quadruped can’t do with theirs. Throw spears at edible quadrupeds. Throw tomatoes at sopranos. Throw bums out of office.

There are two broad families of quadrupeds that have their scapula on their back, and they’re pretty fucking interesting. Come back next time for the details.

Position of the scapula in the dog. Picture source: https://janedogs.com/dog-anatomy-terminology/

English notes

There are a lot of expressions that involve the shoulder. For example, to stand shoulder to shoulder with someone has a literal meaning of standing close to someone (Merriam-Webster), and a figurative one of being united with someone, sharing a goal with them (Merriam-Webster). Example:

This resolution was offered in response to President Trump standing shoulder to shoulder with Putin while the Russian President offered the Special Counsel a chance to interview twelve Russian Military Intelligence Officers who’ve been indicted for conducting “large scale cyber operations to interfere with the 2016 presidential election” in exchange for politically-motivated Russian interrogations of U.S. citizens. President Trump initially endorsed Putin’s cynical ploy as “an incredible offer” and during yesterday’s White House press briefing President Trump’s spokesperson said he was still considering it. https://www.reed.senate.gov/news/releases/us-senate-votes-98-0-to-tell-trump-not-to-hand-over-american-citizens-to-putin

To give someone the cold shoulder is to intentionally treat them in a way that is cold or unsympathetic (Merriam-Webster); I would add the meaning of intentionally avoiding or ignoring them. Example:

Most Twitter reactions seem to compare the president’s behavior to that of a child—which is pretty much on the money with what we’ve been saying since the start of this tale. Sure, it’s an improvement over the cold shoulder he gave German Chancellor Angela Merkel when she extended her hand for a handshake during her March White House visit. If the president doesn’t pout does he still earn, like, a half-star on the behavior chart? Even still, pushing is not encouraged, young man. https://www.vanityfair.com/style/2017/05/trump-nato-shove

If you are the kind of person who would watch and carefully re-watch Dubstep Cat videos to see just how far back his arms do and don’t go: we should probably be planning our wedding.

https://youtu.be/aQeIDhz-_eg

These are all true stories: or, Language Generation For Dummies

In the 1990s, a physician examining an odd wart realized that it was, in fact, cancer.

In the 1990s, a surgeon was about to amputate a woman’s foot. To his surprise, he found that there were other ways to treat her condition, and the woman’s foot was saved. The thing that made the difference was a new technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer. Known today as information retrieval, that technology is arguably the “killer app” that makes the Internet as we know it today useful in the daily life of much of the world. Today, it is a familiar technology, and one could be forgiven for assuming that there is nothing left to be learned about it.

Over the course of the past year or two, I’ve been writing a book about writing about what I do for a living. Call it data science, call it natural language processing, call it machine learning–any way you slice it, the structure of the papers that we write in order to spread our little discoveries around is always pretty much the same.

In the 1990s, an emergency room physician noticed the outbreak of what became an epidemic of venereal disease in his American city. He was able to find a new treatment for the disorder, a bacterial infection known as chancroid, and the spread of the painful genital sores was halted. The new information came from a novel technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer. Known today as information retrieval, that technology is arguably the “killer app” that makes the Internet as we know it today useful in the daily life of much of the world. Today, it is a familiar technology, to the point that one might think that there are no open research questions left in the field.

For people like me, that always inspires the same question: could I write a REALLY simple computer program to do this for me?

In the 1990s, a physician examining an odd wart realized that it was, in fact, cancer. Surgery was done, and the patient’s life was saved. The thing that made the difference was a new technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer. We know that technology as information retrieval. By now it is a familiar technology, to the point that one might think that there are no open research questions left in the field.

One of my major obsessions in life is this: how do you START a research paper? More precisely: how do you start a research paper in a way that isn’t BORING?

In the 1990s, an emergency room physician noticed the outbreak of what became an epidemic of venereal disease in his American city. He was able to find a new treatment for the disorder, a bacterial infection known as chancroid, and the spread of the painful genital sores was halted. The new information came from a novel technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer. Known today as information retrieval, that technology is arguably the “killer app” that makes the Internet as we know it today useful in the daily life of much of the world. Today, it is a familiar technology, and one could be forgiven for assuming that there is nothing left to be learned about it.

Well, I do what I do in the medical field, specifically, and in the medical field, we have a saying: if no one dies in the first two sentences, your work is going to be ignored.

In the 1990s, a surgeon was about to amputate a woman’s foot. To his surprise, he found that there were other ways to treat her condition, and the woman’s foot was saved. The new information came from a novel technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer. Known today as information retrieval, that technology is arguably the “killer app” that makes the Internet as we know it today useful in the daily life of much of the world. It is now an old technology, to the point that one might think that there are no open research questions left in the field.

I’m a little bit less pessimistic than that. Inspired by openings like this favorite from a paper by Daniel Gildea and Daniel Jurafsky,

Recent years have been exhilarating ones for natural language understanding. The excitement and rapid advances that had characterized other language-processing tasks … have finally begun to appear in tasks in which understanding and semantics play a greater role. For example, …

In the 1990s, a physician examining an odd wart realized that it was, in fact, cancer. Surgery was done, and the patient’s life was saved. The new information came from a novel technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer. Known today as information retrieval, that technology is arguably the “killer app” that makes the Internet as we know it today useful in the daily life of much of the world. Today, it is a familiar technology, to the point that one might think that there are no open research questions left in the field.

…I wonder if it couldn’t work as well to save someone’s life in the first two sentences?

In the 1990s, a physician examining an odd wart realized that it was, in fact, cancer. Surgery was done, and the patient’s life was saved. The new information came from a novel technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer. We know that technology as information retrieval. Today, it is a familiar technology, to the point that one might think that there are no open research questions left in the field.

So, when I set out to write a simple computer program to generate the openings of research papers on a topic called information retrieval, I went looking for stories where someone landed in a doctor’s office–and came out of it better than one might have expected.

In the 1990s, a physician examining an odd wart realized that it was, in fact, cancer. Surgery was done, and the patient’s life was saved. The new information came from a novel technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer. Known today as information retrieval, that technology is arguably the “killer app” that makes the Internet as we know it today useful in the daily life of much of the world. Today, it is a familiar technology, and one could be forgiven for assuming that there is nothing left to be learned about it.

Finding those happy endings was the hardest part of this whole little after-dinner project.

In the 1990s, a physician examining an odd wart realized that it was, in fact, cancer. Surgery was done, and the patient’s life was saved. The new information came from a novel technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer. We know that technology as information retrieval. Today, it is a familiar technology, and one could be forgiven for assuming that there is nothing left to be learned about it.

From there, it was a simple matter of putting together a set of reasonable first sentences:

my @firstSentences = ("In the 1990s, a physician examining an odd wart realized that it was, in fact, cancer. Surgery was done, and the patient’s life was saved.",

"In the 1990s, a surgeon was about to amputate a woman’s foot. To his surprise, he found that there were other ways to treat her condition, and the woman’s foot was saved.",

"In the 1990s, an emergency room physician noticed the outbreak of what became an epidemic of venereal disease in his American city. He was able to find a new treatment for the disorder, a bacterial infection known as chancroid, and the spread of the painful genital sores was halted.");

In the 1990s, a surgeon was about to amputate a woman’s foot. To his surprise, he found that there were other ways to treat her condition, and the woman’s foot was saved. The new information came from a novel technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer. Known today as information retrieval, that technology is arguably the “killer app” that makes the Internet as we know it today useful in the daily life of much of the world. By now it is a familiar technology, and one might think that there was nothing more to be learned about it.

…and a set of reasonable second sentences…

my @secondSentences = ("The thing that made the difference was a new technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer.",

"The new information came from a novel technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer.");

…and so on. I use a function called rand to pick a random sentence from the sets of possible first sentences, possible second sentences, and so on…

my $first_sentence = rand @firstSentences;

my $second_sentence = rand @secondSentences;

…and then I just glue my randomly selected first, second, third, fourth, and fifth sentences together…

$beginning_of_article = $first_sentence . $second_sentence . $third_sentence . $fourth_sentence . $fifth_sentence;

…et voilà! With at least two options at each of five different “sentence positions” (first sentence, second sentence, etc.), I have at least 2 to the 5th power (or 5 to the second power–I can never remember) possible openings that will work for any paper on information retrieval, ever. That’s more papers on information retrieval than I will write between this evening and the day that I retire or die!
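For anyone who also can never remember, here is a quick sanity check. It assumes exactly two options at each of the five positions. Each position is chosen independently of the others, so the counts multiply:

```latex
\[
\underbrace{2 \times 2 \times 2 \times 2 \times 2}_{\text{five positions}} = 2^{5} = 32,
\qquad\text{whereas}\qquad 5^{2} = 25 .
\]
```

So it’s 2 to the 5th. With the three first sentences listed above and two options at each of the other four positions (an assumption, since only the first two sets are shown), the count would be 3 × 2^4 = 48.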

In the 1990s, an emergency room physician noticed the outbreak of what became an epidemic of venereal disease in his American city. He was able to find a new treatment for the disorder, a bacterial infection known as chancroid, and the spread of the painful genital sores was halted. The new information came from a novel technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer. Known today as information retrieval, that technology is arguably the “killer app” that makes the Internet as we know it today useful in the daily life of much of the world. Today, it is a familiar technology, and one could be forgiven for assuming that there is nothing left to be learned about it.

My VERY simple little program does something called language generation. That means that it produces output in “natural” (i.e., human) language. You can do REALLY fancy things with it–Google can now use its super-sophisticated language generation technology to produce entirely bogus news stories, novels, letters–or scientific articles, for that matter.

So, two differences:

Google’s shit is super-complicated, and mine is super-simple.

Google’s shit is completely made up, and mine is completely true.

Am I fucking kidding? Is Donald Trump beholden to Vladimir Putin???

Technical geekery

Yes, I omitted the detail that rand() returns a random number, not a random sentence: you use that number (truncated to an integer) as an index into the array of sentences.
Yes, I omitted the whitespace that you have to put between the sentences.
Yes, I omitted the my in front of the output variable.
Yes, I will submit one of those as the opening of a paper on information retrieval–how the fuck could I not????
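
For the curious, here is a minimal sketch of what the snippets look like with the first three omissions put back. It is an illustration, not the actual program: it covers only the first two sentence positions (the only ones whose options are listed in this post), it uses just two of the first sentences, and the helper name pick_one is made up for the example. The remaining positions would presumably work the same way.

```perl
use strict;
use warnings;

# Options for the first two sentence positions, taken from the post.
my @firstSentences = (
    "In the 1990s, a physician examining an odd wart realized that it was, in fact, cancer. Surgery was done, and the patient's life was saved.",
    "In the 1990s, a surgeon was about to amputate a woman's foot. To his surprise, he found that there were other ways to treat her condition, and the woman's foot was saved.",
);
my @secondSentences = (
    "The thing that made the difference was a new technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer.",
    "The new information came from a novel technology that allowed the health care provider to search all of the articles in the National Library of Medicine via a computer.",
);

# pick_one is a hypothetical helper (not in the original post).
# rand @options returns a fractional number from 0 up to, but not
# including, the number of options; Perl truncates it to an integer
# when it is used as an array index.
sub pick_one {
    my @options = @_;
    return $options[ rand @options ];
}

my $first_sentence  = pick_one(@firstSentences);
my $second_sentence = pick_one(@secondSentences);

# join supplies the whitespace between sentences, and "my" declares
# the output variable.
my $beginning_of_article = join ' ', $first_sentence, $second_sentence;

print "$beginning_of_article\n";
```

Run it a few times and you should see different combinations, since the choice at each position is random.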

Academic writing and how not to start a paper: Episode 2

Nobody gives a shit about medical terminology. Rethink your opening sentence.

This post is part of an occasional series on writing about academic research. Writing about writing about academic research on my blog allows me to avoid writing my book about writing about academic research. Techniques that other authors have used to avoid actually working on what they’re supposed to be working on include doing the laundry, abusing alcohol, and committing suicide. Personally, I think that writing about academic research on my blog is more adaptive than those techniques. Well, obviously it’s not more adaptive than doing the laundry…

Introduction

Medical terminology is one of the best-studied aspects of the English language. This is important because… But, the complicated structure of its words makes it difficult to translate into other languages. This is a problem because… To address this issue of decomposition, this paper describes a simple parser for biomedical terminology. This would allow…

To: Zipf

Subject: Comments on draft

Zipf, nobody gives a shit about medical terminology. Rethink your opening sentence.

Introduction

When the first author’s grandmother had to visit her doctor–an increasingly common occurrence as she grew older–she understood more or less nothing that she was told. But, she would take careful notes–and then call the first author to find out what it all meant.

What was the problem here? The first author’s grandmother was not an educated woman, but she was no dummy–a quick-witted and articulate woman who loved jokes of considerable linguistic sophistication. The issue here was not the first author’s grandmother. Rather, it was the language used by her doctor–specifically, the highly specialized vocabulary of medicine. The first author was a medic in the military, and subsequently was awarded a doctoral degree in linguistics, writing a dissertation on biomedical language. He has no problem understanding medical terminology. But, for a normal person, the language with which their physician communicates with them can be every bit as much of an obstacle to their treatment as the rationing of care that characterizes the American health care system.

To: Zipf

Subject: Comments on draft

I wish I’d known your grandmother.

The oldest paper on medical terminology in PubMed/MEDLINE, the National Library of Medicine’s database of 27 million scientific publications. The piece was published in 1911–well over a hundred years ago as I write this–and the problems that it raises still have not been solved. No author is listed.

For more Zipfian ravings on the topic of writing about academic research, see here. Or, buy my book. Oh, wait–I’m writing this blog instead of working on the book… Damn it…
