Humans are so good at “resolving” ambiguities that they usually don’t even notice them. Computers, though–computers have no such abilities, unless their designers give them to them.
One of the properties of every known human language is that it is ambiguous. Being “ambiguous” means that something can have more than one interpretation. Humans are so good at “resolving” ambiguities (i.e., figuring out the intended interpretation) that we rarely notice them, but in fact almost everything that you will hear/read or say/write today will be ambiguous in some way or another.
Humans are indeed quite good at resolving ambiguities. If you want to get a computer program to do anything whatsoever with language, though, you have to give it the ability to deal with ambiguity–computer programs are just as incapable of ignoring ambiguity as humans are capable of resolving it. So, one of my standard exercises for students in natural language processing (treatment of language by computers) courses is to have them go through some texts and find the ambiguities. I typically have them do that with cartoons, since their humor is often based on playing with ambiguities. Tomorrow, though, I’ll be teaching at the EUROLAN “summer school” on biomedical natural language processing, so I feel obligated to give the students a biomedical example. Here’s what it’ll be. It’s a text that would be completely typical in a health record (but it is not from an actual patient). I read through it until I found 10 ambiguities, and then stopped–so, you should be able to find at least 10 points of ambiguity here–in just the first two sentences:
CLINICAL HISTORY: This prolonged video/EEG was performed on a 17 year and 4 month-old female. This study was done to completion of Phase I surgical evaluation
TECHNICAL SUMMARY: The patient underwent…
Now, if you’re a normal human, you will not, in fact, be able to find 10 ambiguities in this text–we just don’t notice them, for the most part. And that, in fact, is the point of the exercise. I’ll follow the exercise with an illustration of those 10 points of ambiguity, many–or most–of which the students won’t have noticed. Their computer programs, though–their computer programs won’t be able to miss them, and it’s their very ubiquity that beginning researchers need to have pounded into their heads.
See how many you can come up with, and then watch this space for the (or, at least, some) answers!
At some point in their life, everyone should spend some time in a place where they’ll be stared at
It’s 1981, and my ship has pulled into Istanbul for a week. Being a stupid young sailor, I’m wandering around alone. I pass some old men sitting on a stoop drinking tea (a common pastime for old men in Turkey). One of the old men gets up, walks over to me, spits on a finger, and tries to rub one of my many tattoos off. When he can’t, he shakes his head in disgust and sits down again.
It’s 1981, and my ship has pulled into Istanbul for a week. Being a stupid young sailor, I’m wandering around alone. Going down a busy street, I suddenly find myself surrounded by a crowd of young men. One of the guys emerges from the crowd, and in broken English starts translating for the rest of the crowd, telling me everything that they have to say about how much they love my tattoos.
It’s 2016. I’m waiting in line at an art show in China. A guy walks up to me: excuse me, I can take picture of you with my children? Sure, why not? Smiles all around as pictures are snapped, and we all go back to waiting in line.
My job and my pastimes take me far and wide, and in some of the places that they take me, I look unlike anyone else. Japan, Guatemala, China, Mexico, Turkey–in all of them, I am a “white guy,” a light-skinned, blue-eyed guy in a country where everyone else is brown-skinned, with black hair and brown eyes. In some of those countries, I go places where I may be the only “white guy” that I see all day, and in those countries, I get stared at–a lot. It’s not just me–it’s the experience of any Westerner in those places.
What I’ve learnt in those countries: how good it can feel to be smiled at. This morning I took a walk along the riverfront in Hangzhou, China. Men (and a couple women) did tai chi alone. Women (and a couple men) did synchronized dancing to music. Grandmothers pushed strollers, and grandfathers jogged–often in business casual–occasionally emitting a loud yell or two. (I have no clue what the purpose of the yells is–native speakers, do you have any insight into this?) For 45 minutes, I was the only “white guy” that I saw.
It was unusual for people not to stare at me. Sometimes out of the corner of their eyes, and sometimes quite openly, but almost everyone stared. Some of them, though–some of them smiled at me, too. 你好, they might say. 你好, I would answer. I waved at little kids, and their grandmothers smiled–and made them wave back at me if they were too shy to do it on their own. Not big-deal interactions–but, it always felt so good. What it cost them: nothing. What it gave me: a lot, actually.
I maintain that at some point in their life, everyone should spend some time in a place where they’ll be stared at. It’ll teach you the value of a smile for someone who doesn’t seem to fit. Lots of people get stared at in today’s America–Muslim women in hijab. Black men in nice hotels/white neighborhoods/academic conferences. Any woman at all in a computer science department. A smile at someone else costs nothing–and can give a lot.
on being stared at: I include this one in the English notes because of the commonly-taught, commonly-believed old bullshit that there’s something wrong with ending a sentence with a preposition. Is on being stared at English? Absolutely. Is there any other way to say it? Not that I know of.
their life: This is a good example of the use of a third-person plural pronoun to refer to a singular person. Since there is no reason to assume any particular gender here, some dialects of English use their gender-neutral pronoun, which looks like the plural pronoun, but in this context is not. You can read more about this phenomenon here.
stoop: Besides being a verb with a number of different meanings, stoop can also be a noun. Merriam-Webster defines it as a porch, platform, entrance stairway, or small veranda at a house door. How I used it in the post: I pass some old men sitting on a stoop drinking tea (a common pastime for old men in Turkey).
Once a year I spend a week in Antigua, Guatemala, where I interpret for a group that does free surgeries for people for whom even the almost-free national health care system is too expensive. I spend a lot of time in the recovery room. It’s a challenge–you’re interpreting for people who are half-asleep, and often wearing an oxygen mask–and I do like a challenge. (This use of do explained in the English notes below.) Sometimes the challenges are unexpected ones, though.
One day last year a recovery room nurse asked me to tell a little boy to cough. That’s not unusual in a recovery room–sometimes post-operative secretions in your lungs cause a minor drop in the amount of oxygen that you’re getting, and a cough or two will clear them right up.
Tosa, I said. The kid looked at me uncomprehendingly. Hmmm, I thought to myself–does the kid not speak Spanish? That’s not uncommon in Guatemala, where 70% of the population is indigenous and over 20 Mayan languages are spoken.
The father looked at me and smiled. Tosá, he said. The kid coughed. So: no cough when I said tosa, but tosá elicited the desired response.
The father was using a verbal form that’s used in Guatemala and a few other places in Central and South America. Indeed, it’s probably the most distinctive thing about Guatemalan Spanish. However, although I know a few local regional nouns and usually get a happy laugh when I use them, I had never learnt this particular verbal form–Americans would rarely have an occasion to use or to hear it, as it’s used only in the context of particular social relationships, and it wouldn’t be at all typical for a foreigner to have one of those.
The verbal form in question is called voseo. It’s used in very close relationships–between friends of long duration is the typical one. In Guatemala, the tu form of verbs is used in many situations in which the usted form would be used anywhere else in the Spanish-speaking world–for example, waiters in restaurants and the ubiquitous vendedores ambulantes (people who stroll constantly through the tourist areas selling stuff, primarily Mayan women of a variety of ethnicities from the surrounding pueblos) will typically address you with the formal terms señor or señora (sir or ma’am)–and then use the tu form of verbs with you, which even on my fifth time in-country sounds weird.
So, you’re wondering: how does one form this mysterious conjugation? For starters, let’s go over the present indicative. It’s almost entirely regular, and very easy to relate to the three classes of Spanish verbs.
Spanish verbs end with either -ar, -er, or -ir, with the -ar verbs mostly being homologous with the French -er verbs. (Sorry–I haven’t even thought about the others!) To form the voseo present indicative of almost all verbs, you keep the vowel of the infinitive, add the -s that you would expect in the tu form of the verb, and put the stress on the final syllable. So:
escribir – escribís
decir – decís
venir – venís
tener – tenés
comer – comés
volver – volvés
tomar – tomás
buscar – buscás
caminar – caminás
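The rule above is mechanical enough to sketch in a few lines of Python. This is only an illustration of the regular pattern as described in this post: the function name voseo_present and the ACCENTED table are made up for the example, and it makes no attempt to handle the few irregular verbs that the "almost all" above leaves out.

```python
# Accented counterparts of the three infinitive vowels (-ar, -er, -ir).
ACCENTED = {"a": "á", "e": "é", "i": "í"}


def voseo_present(infinitive: str) -> str:
    """Form the regular voseo present indicative from a Spanish infinitive.

    Hypothetical helper: keep the vowel of the infinitive, stress it
    (written here as an accent), and add -s in place of the final -r.
    """
    stem, vowel, ending = infinitive[:-2], infinitive[-2], infinitive[-1]
    if ending != "r" or vowel not in ACCENTED:
        raise ValueError(f"not an -ar/-er/-ir infinitive: {infinitive!r}")
    return stem + ACCENTED[vowel] + "s"


for verb in ["escribir", "tener", "tomar"]:
    print(verb, "–", voseo_present(verb))
```

The written accent does the work of "put the stress on the final syllable": without it, a word ending in -s would be stressed on the next-to-last syllable.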
Of course, just because I’ve learnt the voseo forms doesn’t mean that I have anyone with whom to use them–as I said, there are only some relationships in which it’s OK. I did use them with the dog at my host family’s apartment. I listened carefully, and they use the formal usted form with him, but he didn’t seem to mind my voseo–although I was sneaking him treats, so who knows…
Enjoying these posts from Guatemala? Why not make a small donation to Surgicorps International, the group with which I come here? You wouldn’t believe how much aspirin we can hand out for the cost of a large meal at McDonald’s–click here to donate. Us volunteers pay our own way–all of your donations go to covering the cost of surgical supplies, housing for patients’ families while their loved one is in the hospital, medications, and the like. Scroll down for the English notes, per usual.
I do like… This use of do emphasizes something. As far as I can tell, the primary use, although not the only one, is to emphasize something that is contrary to expectations. For example, in this Dashiell Hammett quote
I do like a man that tells you right out he’s looking out for himself. Don’t we all? I don’t trust a man that says he’s not. And the man that’s telling the truth when he says he’s not I distrust most of all, because he’s an ass and an ass that’s going contrary to the laws of nature.
…you wouldn’t expect anyone to like a person who is looking out for himself (a very Trumpian behavior, particularly if you’re only looking out for yourself)–hence the do. How I used it in the post:
It’s a challenge–you’re interpreting for people who are half-asleep, and often wearing an oxygen mask–and I do like a challenge. Liking a challenge is presumably at least somewhat contrary to expectations–hence, the do.
In-country: being or taking place in a country that is the focus of activity (such as military operations or scientific research) by the government or citizens of another country (Merriam-Webster)
For me, it became clear that we had crossed some horrible line between sanity and madness when journalists started laughing during news stories. On the plus side, this leads to a discussion of the role of recursion in language.
For me, it became clear that we had crossed some horrible line between sanity and madness when journalists started laughing during news stories. Leaks of stories of hallucinatory misbehavior, treason, criminality, and just plain evil have been coming out of the Trump government so fast that it’s become surreal. Potential reasonable reactions include despair, and humor. Taking the second option, the New York Times web site recently published a satire piece called The White House Leak Template for Journalists. You click on various and sundry choices…
…and it generates a little news story about a leaked Trump administration scandal for you.
Scroll to the bottom of this page, and you’ll find screen shots of the whole thing. It’s sadly hilarious, but behind the hilarity is an important point about how language works.
One of the things that’s interesting about language is that every human language (what we call in my line of work “natural” languages, as opposed to computer languages) is capable of saying an infinite number of things. “Infinite” is a big claim, and you’re right to be skeptical about it. So, let me just show you that with even a very small amount of knowledge of a language, you can say an enormous number of things–much more than you might ever have thought–and as you’ll see at the end of the post, this is a fact that has important implications for the many people reading this blog who are trying to learn a second language.
Let’s suppose that you know how to say a simple declarative sentence in some language or another–my dog ate my shoes. You’ve got a subject, a verb, and an object. Suppose that you know 10 nouns and 10 verbs. You can now say the following number of sentences:
10 nouns * 10 verbs * 9 nouns = 900 sentences
Why only 9 nouns in the object position? Because I’m assuming that you won’t use the same noun for the object as you did for the subject. So, whichever noun you pick for the subject, you now have nine choices left for the object, rather than the 10 that you started with.
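If you don't trust the multiplication, you can have a computer list the sentences and count them. Here's a minimal sketch in Python; the word lists are just placeholders (noun0, verb0, and so on), since the count doesn't depend on which 10 nouns and 10 verbs you happen to know.

```python
from itertools import product

# Placeholder vocabulary: any 10 nouns and 10 verbs would give the same count.
nouns = [f"noun{i}" for i in range(10)]
verbs = [f"verb{i}" for i in range(10)]

# Every subject-verb-object combination, skipping the ones where the
# object is the same noun as the subject.
sentences = [
    (subject, verb, obj)
    for subject, verb, obj in product(nouns, verbs, nouns)
    if subject != obj
]

print(len(sentences))  # 900
```

Of the 1,000 raw combinations, 100 reuse the subject as the object, which is where the 9 comes from.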
Let’s suppose that you have a language–like French or Spanish–that inflects all verbs differently for singular versus plural subjects. Let’s also suppose that in our calculation above, we included only singular forms of the verbs. Add the plural form of the nouns and the plural form of the verbs, and now you have the following additional sentences:
To recap: 900 sentences if you only know the singulars, plus another 1800 if you add the plurals, so you’ve got 2700 sentences that you can say.
Note: this post relies heavily on a branch of math called combinatorics. I stink at combinatorics, so please be kind! Corrections are welcome in the Comments section.
To this point, we’ve only been using nouns and verbs. Let’s add a new kind of word: and. Even if we didn’t know the plural forms of the verbs, and lets us say a truly remarkable number of sentences with just our 10 singular nouns and our 10 singular verbs. Recall how many simple declarative sentences we could say with just 10 nouns and 10 verbs:
10 nouns * 10 verbs * 9 nouns = 900 sentences
Once you’ve picked a noun for the subject, you have 9 nouns left for your object, leaving eight unused nouns. Suppose that you’re going to use and in your object: you have 9 possibilities for the first noun (since you used 1 for the subject), and 8 possibilities for the second one (since you used one for the subject, and you’ve already used one in the object). So, with and, you have the following number of possibilities:
…and if you’re keeping track, that’s 900 + 1800 + 1700 sentences, or 4,400 sentences.
Of course, we’re not done with and yet–since you’ve learnt to use the plural forms of verbs, you can use and in the subject, too. The calculation of the number of sentences that you can make with and in the subject (but just a single noun in the object) is similar to what we just did:
Of course, you can have two nouns in the subject and two nouns in the object, as well–you can do the math. What’s cooler is that you can use and to join together two sentences, too. Let’s take the “formula” that gave us the smallest number of sentences: singular subject, singular verb, singular object. Remember how we calculated the number of sentences that we could make with only 10 nouns and 10 verbs:
10 nouns * 10 verbs * 9 nouns = 900 sentences
How many sentences can you make by joining two sentences together with and? The possible assumptions are numerous. Can you repeat the subject? Why not? (Dogs chase cats and dogs chase balls.) Can you repeat the object? Why not? (Dogs chase cats and children chase cats.) Certainly those are weird, though, so let’s estimate that maybe 10% of our possibilities aren’t going to be OK, and just calculate from the numbers that we used for the simple declarative sentences. That gives us this:
10 nouns * 10 verbs * 9 nouns and 10 nouns * 10 verbs * 9 nouns = 1800 sentences; subtract 10% of that for the ones that repeat too much and you still went from 900 sentences to 1620 sentences with just one additional word.
…in other words: as soon as you throw and into the mix at the level of sentences, you double the number of sentences that you can make. (The last time we tried to total how many sentences we could make, we had 5,920. Double that with and, subtract 10% for the sentences that repeat too much, and you have 10,656 sentences.)
What happens if you add or to your armamentarium? You just doubled the number of sentences that you can make again. How about throwing but in there? You just doubled it again. (We’re around 40,000 sentences right now, even with our 10% adjustment for repeated things.) Add one more tense and you just…well, it just got really, really big. And let’s review what you know–it’s very little:
10 nouns, singular and plural
10 verbs, singular and plural
3 conjunctions: and, or, but
2 tenses
For those of us who are as math-challenged as I am: that’s 23 words and two tenses to give you around 40,000 sentences. Throw in some adjectives… Learn how to turn a simple declarative sentence into a question… Learn a few names… Learn to say he, she, and it… Add because…
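For anyone who wants to check the running arithmetic, here is a small Python sketch of it. It takes the 5,920 running total as given, and it copies this post's own assumptions: each new conjunction doubles the count, and 10% comes off once for sentences that repeat too much. The function names are invented for the example. The last line is a stricter count that I'm adding as an aside: if every simple sentence could be paired with every other, you would multiply rather than double, so the doubling is, if anything, an undercount.

```python
def add_conjunction(count: int) -> int:
    """This post's assumption: one new sentence-joining word doubles the count."""
    return count * 2


def drop_repetitive(count: int) -> int:
    """This post's assumption: about 10% of joined sentences repeat too much."""
    return count * 9 // 10


simple = 10 * 10 * 9                                    # 900
joined_simple = drop_repetitive(add_conjunction(simple))  # 1620

with_and = drop_repetitive(add_conjunction(5920))       # 10656
with_or = add_conjunction(with_and)                     # 21312
with_but = add_conjunction(with_or)                     # 42624

# Aside, not from the post: pairing every simple sentence with every other.
strict_pairs = simple * simple                          # 810000

print(joined_simple, with_and, with_but, strict_pairs)
```

So "around 40,000" is 42,624 under the doubling assumption, and far more under the stricter one.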
Now, I know what you’re thinking: I know a hell of a lot more than 10 nouns and verbs in French, but it sure doesn’t feel like I know how to say very many things. Remember, though: as we discussed recently, you can get a surprisingly long way on a pretty small amount of a language. This is a skill that you can develop with practice: think about simple ways to communicate your wants and needs, and I bet you’ll come up with creative ways to work around your lack of knowledge of a language.
A technical excursus: recursion
When we got into and, we touched on an important mechanism of language that leads to the fact that every human language is capable of saying an infinite number of things. Called recursion, it has a specific definition in mathematical formalism that you can find here; for our purposes, it means that some things in language that we care about, such as sentences, can be made up of other things of the same type. For example, we used recursion when we made the sentence Dogs chase cats and dogs chase balls out of two sentences: dogs chase cats and dogs chase balls. We could also use recursion to make noun phrases (the groups of words that make up the subjects and objects in our examples): the noun phrase my dog and your cat is made up of the noun phrase my dog and the noun phrase your cat. In principle, is there any limit to this? No, actually. You would die before you could say an infinitely long sentence, and even if you could live long enough to hear one, by the time you got to the end you would most likely have forgotten the beginning. But, that doesn’t change the fact that the language, by virtue of having this fundamental property of recursion, can produce an infinite number of things to say.
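Here is what that looks like as a toy program. It's a sketch in Python, with a made-up function name (noun_phrases) and a one-rule grammar that I'm assuming for illustration: a noun phrase is either a basic noun phrase, or a basic noun phrase, then and, then another noun phrase. The function calls itself, which is exactly the recursion in question; the depth argument is only there because a program, like a speaker, has to stop somewhere.

```python
def noun_phrases(nouns, depth):
    """Return every noun phrase that uses 'and' at most `depth` times.

    Toy rule (an assumption for this example): NP -> N | N "and" NP
    """
    if depth == 0:
        return list(nouns)
    # Recursive step: a noun phrase can contain a smaller noun phrase.
    smaller = noun_phrases(nouns, depth - 1)
    joined = [f"{left} and {right}" for left in nouns for right in smaller]
    return list(nouns) + joined


basic = ["my dog", "your cat"]
for depth in range(5):
    print(depth, len(noun_phrases(basic, depth)))
```

With just two basic noun phrases the count goes 2, 6, 14, 30, 62: every extra and you allow roughly doubles it, and nothing in the rule itself ever says stop.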
If no one could ever say an infinitely long sentence, who cares about understanding how and why languages can produce the things? For one thing, infinity is a pretty big deal, and if you’re dealing with a system of any sort that’s capable of infinity, then if you want to be able to understand how it works, you need to understand that aspect of it. I believe it was Chomsky (who in many ways was a horrible thing to have happen to linguistics) who made the analogy that just because no marathon runner can run forever doesn’t mean that it’s not useful and important to understand the physiological mechanisms that let them do it.
You made it this far? Great! Your reward is the New York Times Leak Template. Read it and laugh–then go subscribe to a newspaper. Keeping journalism alive is essential to getting the traitors that are currently running our federal government out of the White House. Feeling geeky? Calculate how many news stories about Trumpworld scandals this would generate–and ask yourself if that would be enough…
One of the disorienting things about being in a foreign country is that you often find that you’re incapable of doing the simplest things–things that you could do without really having to think about them in your country of origin. Getting and maintaining cell phone service? I have spent weeks of my life struggling with that in France. Where to buy a breadbox? No clue–one of the charms of France is that stores are pretty specialized here, but you have to find the right kind for whatever it is that you’re looking for. Fastoche for a French adult, but often baffling for me. Using a credit card? The stories I could tell…
Case in point: I struggle with grammatical points of listening to the news here. I am completely addicted to listening to and reading the news, and one of the nice things about having a bit of familiarity with French is that I can consume news from a whole nother perspective. (A whole nother explained in the English notes.) What throws me off is the use of the conditional mood in French news reporting. (The term mood, as opposed to tense, refers to something like a grammatical structure that communicates something about the reality of a situation, as opposed to the time of its occurrence–the latter is tense. The conditional and the subjunctive are usually described as moods, while the past and the present are tenses (usually–it gets complicated in Bulgarian and other languages in which verbs are inflected for evidentiality, or whether and how the speaker knows something to be true). The future? It varies from language to language. See irrealis if you’re interested.)
In French, one use of the conditional is to convey something like the as-yet-unverified status of something that you’re saying. Here’s an extract from the Tex’s French Grammar description of how this works:
The conditional is also used to give information whose accuracy cannot be guaranteed. Journalists often use it to report events which are not [yet verified].
‘Une tornade vient de s’abattre sur Hubbard, Texas. Il y aurait plusieurs victimes. Un tatou et un écureuil seraient gravement blessés. Restez avec nous, nous devrions avoir plus de détails d’ici quelques secondes …’
‘A tornado just struck in Hubbard, Texas. Allegedly, there are several casualties. An armadillo and a squirrel seem to be seriously wounded. Stay with us, we should have more details in a few seconds …’
Here’s an example of journalistic use of the conditional, from a news story in Le monde about persecution of gays in Chechnya. (I picked Le monde because it’s pretty middle-of-the-road.) Look for auraient été arrêtées:
Here you see it in the title of a web page–note serait, in place of est:
La Tchétchénie serait-elle en train de se «débarrasser des homosexuels» en les torturant dans des camps ? La communauté internationale s’interroge
What’s the point of the torture? To get you to give up the names of other gays. In this news story, watch for aurait procédé and serait ensuite soumis:
Selon ces témoignages de rescapés, la police tchétchène aurait procédé à une vague d’enlèvement de membres de la communauté LGBT ou de personnes soupçonnées d’en faire partie. Les détenus seraient ensuite soumis à des tortures et des interrogatoires pour dénoncer d’autres personnes ayant les mêmes orientations sexuelles.
Just how thoroughly tortured can you be if you’re gay in Chechnya? To death–look for auraient été tuées in this sentence from the same article:
Trois personnes au moins auraient été tuées, selon des sources au sein de la police et du gouvernement.
You’ll notice a repeated pattern in these examples–it’s made explicit that what’s being reported is something that was initially said by someone else:
Selon ces témoignages de rescapés, la police tchétchène aurait procédé à une vague d’enlèvement…
Selon un témoin, il s’agirait de “voyageurs d’Europe de l’Est” qui se sont montrés “incroyablement agressifs”. (Not from a story about gays being tortured in Chechnya–see here)
I’ve heard the construction used in spoken language without that kind of reference to a third party as the origin of the information, e.g. in a report on a big traffic accident that had just happened, while it still wasn’t clear whether the final number of deaths was known. So, naming a source is clearly not necessary–but, it’s probably not an accident that we’re seeing this co-occurrence of source and conditional mood in written news stories.
Want to do something to help? Slacktivism is always an option–click “like” on a Facebook post, or retweet something, and go on about your business. Give 20 euros or 20 bucks, though, and you’ve already done more than most people ever will–and maybe helped save a life in the process, for the cost of a pizza. Even 5 euros/bucks would still be more than most people do, and that’s the cost of a cup of coffee and a croissant. Here are some places where you can make donations:
a whole nother: this means something like an entirely different. It’s so uncommonly used in writing that native speakers typically aren’t even sure how to spell it–WordReference’s spell checker doesn’t recognize it. I was pleasantly surprised to find an entry for it on the Merriam-Webster web site.
When’s the last time you saw a dog shoot a bunch of kids at a grade school, or post a video of someone beheading someone else on Twitter, or vote for Trump?
I’m not necessarily that crazy about people, but I like animals. (Except for man-eating rabbits–I hate man-eating rabbits.) Seriously, when was the last time you saw a dog or a cat sell a teen-ager drugs, or kill a bunch of kids at a grade school (yes, this happened in the US), or vote for Trump (that happens in the US, too)? Yes, my dog bit a couple people on the croupion when they walked into the house uninvited. Yes, my cat once pooped in my favorite sandals. But, rip off a tourist visiting from a foreign land? Sell someone a counterfeit Beanie Baby on eBay? Video someone beheading another living person in the name of God, and distribute it on Twitter? Only a human would do that.
Consequently, when I’m in the US, I carry a leash and a can of cat food in my car. Dogs love cat food, and when I see an obvious runaway/lost dog trotting down the street, I pull over and offer him a whiff. I can usually catch them, and I’ve gotten maybe 12 or 15 dogs back to their happy homes in the 20 years (almost) that I’ve been in my current town.
Something that makes this a hell of a lot easier is if people have had their animal microchipped. In this context, a “microchip” is a little thing about the size of a grain of long-grain rice that a veterinarian injects under a dog or cat’s skin. They don’t notice it in the least, as far as I can tell. A veterinarian can wave a sort of wand over it, and it will send off a signal with an identifying number. The vet sends the number to a company, the company sends back contact information for the owner, et voilà: Spot is home in time for dinner. It’s quite wonderful, really.
This sign’s been around for a while. I walk by it on my way to the train station after work. The effort to get him back to his happy home will definitely be a lot easier than it would have been otherwise: Hector has been chipped. Check out the poster, then scroll down, and let’s talk about how it’s interesting from a linguistic point of view.
The linguistically cool thing is at the bottom: Hector est Pucé. What that means: Hector has been chipped. Now, we know that that’s going to increase the chances of Hector making his way home, but it’s cool from a linguistic point of view, too. Recall from this blog post that French has a class of verbs that relate to undoing some noxious state of infestation–dératiser (to exterminate the rats in something), dénicotiniser (to remove the nicotine from something), and the like. The interesting thing that we noted about these verbs is that they share an odd set of characteristics:
They all have an -is- added on to the end.
They all describe the reversal of a state of affairs that a human could create, but wouldn’t be expected to.
None of them has a corresponding verb for creating that state of affairs. That is, there is no ratiser, nicotiniser, etc. (or that is the claim, at any rate–read the other blog post if you don’t agree).
Now, puce, the word that is being used for a microchip here (it’s also the word for the chip on your credit card), comes from puce, a flea. There is a verb épucer, to deflea, which clearly doesn’t fit the pattern of the verbs about which we just talked. And, here’s an example of pucer! Certainly the meaning here is to microchip, not to infest with fleas–but, it’s worth a second look and a quick blog post anyway, right?
I hope these folks have found their rouquin, their ginger (in the sense of red-haired). I’d like to think that he’s found his way home. If not: I hope he’s happily shacked up with some girl cat somewhere. It would have to be a purely platonic relationship–in addition to being pucé, he’s also been neutered–but, a lifelong flirtation can be pretty exciting in and of itself. The French are pretty damn good at that, too.
Want to be amused/horrified by the stupidity of the world? Go to Google Images, do a search for microchips, and check out some of the “mark of the Beast” stuff that comes up.