December 2018 – Zipf's Law

Châteaux forts: How do French children learn vocabulary?

How do you learn vocabulary in a language with gender if the gender is not marked?

The Christmas holidays took me to the Loire Valley. That’s an area that’s famous for chateaus (châteaux, n.m.pl), and that meant new vocabulary–Zipf’s Law and all that…

…which brings me to a mystery: how are French kids supposed to learn new words correctly when the graphics, diagrams, and the like from which they learn them don’t include the genders of the words? In this post I’ve included four pictures showing terminology related to châteaux forts–what we call “castles” in English. Notice that in only one of the four is the gender of the words marked, and even in that diagram the gender is marked only inconsistently–gender is given here by the form of the definite article, and for terms that are given in the plural (les douves, the moats; les créneaux, crenellations; and les remparts, ramparts), you can’t tell the gender from the definite article.

chateau-fort-46500 — Source: http://www.ikonet.com/fr/ledictionnairevisuel/arts-et-architecture/architecture/chateau-fort.php

Chateau fort de coucy — Source: http://rozsavolgyi.free.fr/cours/Premiere%20partie/Annexes/05-02-03.htm

chateaufort — Source: https://www.mireille33.fr/articles.php?lng=fr&pg=1215 (great page, BTW)

bibliddoc_030i03 — https://www.iletaitunehistoire.com/genres/documentaires/lire/le-chateau-fort-bibliddoc_030

9782742794317_31_1 — http://www.actes-sud-junior.fr/9782742794317-l-silke-moritz-isabelle-liber-achim-ahlgrimm-enigmes-au-chateau-fort.htm

This is not just an idiosyncrasy of medieval vocabulary for castles–it’s a very general phenomenon in French-language educational materials. For example, here’s a diagram of a representative insect from Le grand livre marabout de la nature, edited by Fanny Delahaye:

…a representative bird from the 2004 version of Le petit Larousse compact:

…and one from the 3rd edition of Pierre Kamina’s Petit atlas d’anatomie:

…a non-representative sample chosen by scanning my bookshelf for educational materials with diagrams in them.

How about it, native speakers? (Phil d’Ange, I’m lookin’ at you…) How does a French student learn vocabulary without having the gender of the terms listed on diagrams that are intended to teach them? Concretely: you’re a kid. You’ve got a diagram like the ones shown on this page, and you need to learn the terms thereon. How do you do so, given that the gender is not labelled?

English vocabulary

Idiosyncrasy: From Merriam-Webster: a peculiarity of constitution or temperament; an individualizing characteristic or quality. First known use: 1604. Other words first observed in that year: appreciation, black eye, blotch, and chinchilla. https://www.merriam-webster.com/dictionary/idiosyncrasy

Ducklings and goslings and inklings, oh my

That moment when the elves take your baby and leave one of theirs in its place.

I dragged myself out of bed at 8:30 AM today. Under normal circumstances, if I’m still in bed at 5:45 AM, it means that I had a rough night–I am most definitely both a morning person, and an early riser. Seulement voilà (“the thing is”):

At this time of year, it doesn’t get light outside in Paris until about 8:30 in the morning.
At 2 AM I got obsessed with the need to learn all of the words for baby animals in French.

Morphemes are the things that words are made of. For example, the plural cats has two morphemes: cat, and the s that carries the meaning of plurality. (This happens to be the example from which my child learned what a morpheme is–as a young child, and as we did the dishes together. Must suck to be a linguist’s kid…)

English has an odd little morpheme that refers to things that are small. Like the s of cats, it is what is called a bound morpheme, meaning that it cannot be a word on its own–it has to be attached to something else. (Contrast that with the cat in catnap (a short, light nap), catnip (a plant–it’s basically pot for cats), and cathouse (a brothel–archaic)). Here are a couple of examples:

duckling: a baby duck.
inkling: a small hint, or a small piece of knowledge. (I’ll give some examples of its use later.)

Changeling-fée-irlande-légende-mythologie-3 — Source: http://www.vivre-en-irlande.fr/culture-irlandaise/changeling-fee-legende. See the site for helpful information about how to recognize a foundling, return a foundling, etc.

The -ling morpheme is also not productive: that means that you can’t really use it freely to make “new words.” For example, it’s not clear that anyone would know what you meant if you casually threw the words waterling (parallel to inkling) or penling (parallel to duckling) into a conversation. (Contrast that with -gate, which over the course of my lifetime has become applicable to practically anything, with the meaning of “a scandal related to:” Bridgegate, Pizzagate, etc.) Because it’s not productive, one could list all of the words in English in which it occurs. Limited only by my memory, of course. My best shot at doing so:

duckling: baby duck
gosling: baby goose
foundling: a child who has been found after having been abandoned
changeling: when the elves take away your baby and leave one of their own in its place
inkling: a small hint, idea, trace, piece of knowledge, clue

In the Foundling Hospital grounds, London, c1901 (1901) — The London Foundling Hospital in 1901, from an article about a 1911 foundling lottery in Paris at http://time.com/4433717/paris-baby-raffle-history/.

Now, I know what you’re thinking: Zipf, you’re a drooling idiot. There are lots of words in English that end with -ling: for example, DROOLING. Feeling, wheeling and dealing (French: mic-mac or micmac), healing…

Well… I may be an idiot, but I’m not a drooling one. Here’s the thing: a morpheme is defined by its sound (or spelling)–in our case, ling–and by its meaning. Drooling and gosling (baby goose) contain the same sounds/letters, but not the same meaning of smallness, so it’s not the case that they share the same morpheme. -ling is a pretty textbook (French: typique) example of a non-productive morpheme.
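The form-versus-meaning point can be made concrete with a toy check. This is only a sketch in Python: the word set is a hand-made, hypothetical lexicon copied from the list above, and both function names are invented for illustration. Matching on spelling alone sweeps in drooling; requiring the meaning as well does not.

```python
# Hypothetical hand-made lexicon: words in which -ling carries the "small/young" sense.
# (Assumption: taken from the list in this post; it is not an exhaustive inventory.)
DIMINUTIVE_LING = {"duckling", "gosling", "foundling", "changeling", "inkling"}

def ends_in_ling(word):
    """Form only: does the spelling end in the letters 'ling'?"""
    return word.lower().endswith("ling")

def has_ling_morpheme(word):
    """Form plus meaning: is this one of the listed diminutive -ling words?"""
    return word.lower() in DIMINUTIVE_LING

for word in ["gosling", "drooling", "healing", "inkling"]:
    print(word, ends_in_ling(word), has_ling_morpheme(word))
```

The second function is just a lookup, which is the point: a non-productive morpheme can be handled by listing the words that contain it.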

So, yeah: I don’t sleep much, and I’m trying to learn to speak French, so at 2 AM I got obsessed with learning the names of baby animals in French. This web page got me started, and then I started searching WordReference.com for weird English-language baby animal names (say, gosling), and here you see the results. (Yes, some occur more than once.) At 2 AM, I only knew chiot (puppy), chaton (kitten), and veau (calf)–how about you? And, native speakers (Phil d’Ange, I’m lookin’ at you)–can you add some more?

Adult animals:

Juvenile animals:

English-language example sentences

Foundling:

The Steel Riders Saga is a sci-fi/fantasy novel about Free Wheeler, a foundling discovered by the legendary Steve Thompson during a deep terrain ATV ride. Thompson leads an ATV pack known as the “Steel Riders.” In their fantastical journeys Free Wheeler finds true love and home. (Twitter, @quantum_tide)
Meanwhile, in Australia, there’s a National #GravyDay. I have never heard of anything so glorious! (Nobody in my family cares about gravy as much as I do. I… might be a foundling?) (Twitter, @VG28263355)
@Decervelage Can I just say…Baby Faced Finster. A foundling!! You Naughty Baby!! Hahaha! (Twitter, @TheSuperAmanda)

Inkling:

I’ve mentioned this numerous times on the podcast but… I have an inkling that Nintendo will use Smash DLC to promote upcoming (inc third-party) Switch releases. (Twitter, @pixelpar)
My new resolution is to not read the thread of comments of tweets where I know or have an inkling that it’s not going to be a good thing. (Twitter, @valparkie)
You are a gem of a friend and you don’t have an inkling of how much i appreciate your ignorance of my vices. (Twitter, @Shakti_Shetty)
I don’t have an inkling of what the future holds but I’m excited (Twitter, @JaredTench)
Roommate, Camden *going to Waffle House in Dunn*: “If I get the smallest inkling of a crack-whore, I’m leaving!” (Twitter, @dr_pattyguin)

What computational linguists actually do all day: the regression model edition

What computational linguists actually do all day: The lexical frequency version

In practice, we spend most of our time trying to figure out where we went wrong in writing some computer program or another.

Tell someone that you’re a computational linguist, and the next thing out of their mouth is likely to be either:

How many languages do you speak?, or…
What’s that?

In theory, computational linguists spend their time thinking about fun questions like:

Is natural language Turing-complete?
The relationship, if any, between what we know about words (say, the word dog can be a noun or a verb, and it occurs more often with the words bark and leash than with the word meow) and what we know about the world (say, a dog is a canine, and might like to chase balls, and will eat cat shit if not instructed otherwise).
How Zipf’s Law, which describes the fact that a small number of words are extremely common, while a large number of words are extremely rare, but do occur, might or might not be related to the mathematical phenomenon of the fractal.

In practice, we spend most of our time trying to figure out where we went wrong in writing some computer program or another. (OK: that, and writing grant proposals.) Think that being a computational linguist sounds glamorous? Here’s how I spent my morning.

All I gotta do: go through a bunch of documents and count how often each word in that bunch of documents occurs. Easy-peasy–barely hard enough for a homework in Computational Linguistics 101.
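For scale, the counting itself fits in about a dozen lines. What follows is a minimal sketch in Python, not the Perl script whose bugs are described below (that script is not reproduced here). Lowercasing the text and treating any run of word characters as a "word" are simplifying assumptions, and the two sample sentences are made up.

```python
import re
from collections import Counter

def count_words(documents):
    """Return a Counter mapping each lowercased word to its total frequency."""
    frequencies = Counter()
    for text in documents:
        # Assumption: a "word" is any run of word characters; real tokenizers are fussier.
        words = re.findall(r"\w+", text.lower())
        frequencies.update(words)
    return frequencies

# Made-up documents, for illustration only.
documents = ["The dog chased the ball.", "The cat ignored the dog."]
for word, frequency in count_words(documents).most_common(3):
    print(f"{word}\t{frequency}")
```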

Seulement voilà…

Easy enough to fix–I just failed to give the complete name of the program, and…. marde.

OK, easy enough to fix–I had written

…when I shoulda written

Shoulda: the typical spoken form of should have.

(Note the square bracket near the end of the middle line–I had left it out.) Great–avançons, alors. But, no, fuckashitpiss:

Easy enough to fix–turns out I wrote this:

…when I shoulda written this:

(Note the dollar sign before the rightmost instance of words now.) And so, on we go, but…

…and it’s easy enough to fix–I had written this:

…when I shoulda written this:

(Note the double quote before $frequencies{$words[$i]}\n”;) …and now I’m wondering:

These errors were all on one single line–what other horrors have I hidden in this code, and will they be as easy to find as those were?
What the hell was I thinking when I wrote that line? Was I thinking about the upcoming dissertation defense at 2 PM? Was I thinking about Trump giving my country to China? Was I thinking about tomorrow’s colonoscopy? Who the hell knows, really–whatever it was, it apparently wasn’t this line of code…

Mais retournons… Ah marde, but at least this one will be easy to fix…

…except that I verify the existence of the directory, and then get this:

…which is the exact same error that I got before. So, I go back and look at my code, where I see this, and remember that my error message is supposed to print out the name of the directory that it couldn’t open, but it did no such thing:

…which is ’cause I never gave the program the name of the input directory. So I take care of that, and also tell my program to print out the name of the directory that it couldn’t open if, in fact, it can’t open a directory–as we saw above, I had planned to do this, but of course left out that little detail:

…and now I experience a tiny little bit of success, because my program does not crash. Seulement voilà, it doesn’t actually produce any output:

Note the lack of a bunch of lexical frequencies… So, I go back to my script, and I start looking around in the region of the program where I meant for the output to happen. I don’t see anything obvious in that area, so: I go further up in the code, and start doing what I need to do to convince myself that the earlier parts of the program are working the way that I intended them to. This means printing out the results at intermediate steps of the processing. The resulting code (leaving out a bunch of details) looks like this:

…which does nothing different than it was doing before, so I know that I need to go even further up in the program and, again, print stuff out as I go, resulting in this:

…which, when I run the script, produces this:

…which suggests to me that the directory exists, and that I’m opening it correctly, but that I am either (a) reading its contents incorrectly, or (b) making a mistake when I make a decision about whether or not to open each file. A quick Google search finds the problem for me–I had written this:

…when I shoulda written this:

(Note that the text at the left end of the line was open, and now is opendir.)

Progress! Now I get some output, but note the last line–I’m just getting a bunch of file names, and no word frequencies. I can see the problem right away, though–I have the directory name right, and I have the file name right, but I need to combine them in order to be able to open the file. Doing so gives me this code:
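The original listing is not preserved in this text. As a hypothetical Python analogue of the same fix (the script under discussion is Perl), the sketch below joins each bare file name to its directory before opening it, and names the directory in the error message. The function name, the whitespace tokenization, and the throwaway demo corpus are all assumptions made for illustration.

```python
import os
import tempfile
from collections import Counter

def count_words_in_directory(directory):
    """Count whitespace-separated words in every regular file directly inside `directory`."""
    if not os.path.isdir(directory):
        # Name the directory in the message, so a missing or wrong argument is obvious.
        raise NotADirectoryError(f"Couldn't open directory: {directory!r}")
    frequencies = Counter()
    for file_name in sorted(os.listdir(directory)):
        # listdir returns bare names; join them to the directory to get an openable path.
        path = os.path.join(directory, file_name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                frequencies.update(line.lower().split())
    return frequencies

# Demo on a temporary directory, so the sketch runs without any real corpus.
with tempfile.TemporaryDirectory() as corpus:
    with open(os.path.join(corpus, "a.txt"), "w", encoding="utf-8") as handle:
        handle.write("le chat et le chien\n")
    print(count_words_in_directory(corpus))
```

Opening the bare file name would only work if the script happened to be run from inside the corpus directory, which is why the join matters.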

…which results in my script running successfully for a while, but then crashing, and I know exactly what causes said crash…

…and I know that it’s a bear to fix, and I’ve been working on this fucking task that’s barely difficult enough to make a good homework assignment for two hours, and now it’s time to go to the aforementioned dissertation defense, and… Soupire…

Meme source: https://imgur.com/gallery/fzbkRI8

Gratuitous picture of me and my cat

In which I can’t even get beyond the Introduction.

Your lexicon–the words that you know, and what you know about them–is unlike every other part of your knowledge of your native language in that it continues to grow over the course of your entire life. By the time you’re a young child you know pretty much everything that you’re going to know about your language’s phonetics, phonology, morphology, and syntax. Your lexicon, though–that continues to grow throughout your life.

Now imagine that you try to learn a second language as an adult. Like everyone else who speaks that language, you’re going to be learning new words until you die. But that’s going to be a lot more obvious to you than it is to people who speak it natively, because unlike them, you didn’t spend your entire youth learning the vocabulary of that language–start studying a language in your 50s, and you are literally 50 years behind a native speaker when it comes to learning the lexicon of the language in question.

If you’ve been reading this blog for a while, you know that you don’t have to work very hard to find words that you don’t know: Zipf’s Law, which describes the fact that a small number of words of a language are very, very common, while the rest occur only very rarely–but do occur–ensures that you will be running across new words just going about your daily life.
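In its usual formulation, Zipf’s Law says that a word’s frequency is roughly inversely proportional to its frequency rank. A minimal Python sketch of how one might look at that in a text follows. The toy sentence and both function names are invented for illustration, and a text this small can only show the shape of the computation, not confirm the law.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(words)
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]

def hapax_share(words):
    """Fraction of distinct words that occur exactly once."""
    counts = Counter(words)
    return sum(1 for freq in counts.values() if freq == 1) / len(counts)

# Toy text, invented for illustration; a real check needs a large corpus.
text = "the cat saw the dog and the dog saw the moat near the castle"
words = text.split()
for rank, word, freq in rank_frequency(words):
    print(rank, word, freq)
print(f"share of distinct words seen only once: {hapax_share(words):.2f}")
```

Run on a large corpus, the top few ranks would account for a big share of all the running words, and the words seen only once would make up a large part of the distinct vocabulary. That long tail is where the daily supply of new words comes from.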

Living in France, I have no difficulty whatsoever running into 10 words that I don’t know every single day. Ads on the metro, the services written on a window installer’s truck, the name of a street that I walk by on the way to the lab–that’s all it takes. Living in the US, it’s a bit harder, but it’s totally doable–listening to the radio, watching something on YouTube, or listening to a book on tape will do it. 10 words a day, every day (except the month of December, which I spend reviewing the words that I learned from January to November), and mine de rien, you have a vocabulary of thousands of words.

And yet: as Zipf’s Law would suggest, I still have no problem whatsoever finding 10 new words a day to learn. Case in point: today I wanted to figure out what the symbol ≠ means in the grammar book that I’m working through at the moment (Grammaire progressive du français : niveau perfectionnement, B2 – C2, by Maïa Grégoire and Alina Kostucki). So, I went to the “front matter” of the book–the table of contents and stuff like that. This involved reading the Introduction, where I ran across the following:

WordReference.com found me most of the relevant definitions, and yet: dictionaries being the beautiful but imperfect things that they are (like, say, my cat), it did let me down for a couple words: relever, and mécanisation. To wit:

…même avec un vocabulaire riche et une bonne connaissance de la grammaire, les résultats atteints sont souvent entravés par la persistance de fautes qui ont traversé les différents niveaux d’apprentissage. Bon nombre de ces difficultés tiennent à des interférences avec la langue d’origine et aucune grammaire ” générale ” ne peut prétendre en rendre compte. D’autres, en revanche, relèvent de particularités de la langue française, mal perçues par les étudiants, et que nous tentons d’exposer de la façon la plus claire possible.

My best guess for an English-language equivalent of relever de would be “to arise from.” Here are some examples of to arise from from Sketch Engine, purveyor of fine linguistic corpora and the tools for searching them:

The lectures focus on topics arising from research in science and technology.
The investigation arose from a referral from both Houses of the NSW Parliament. (Arise is an irregular verb, with the past tense form arose.)
He blames Jews for the ills arising from the industrial revolution, e.g., class divisions and hatred.
Leukaemias are devastating diseases of the haemopoietic system that arise from aberrant stem or progenitor cells. (Leukaemia and haemopoietic are the British English spellings of leukemia and hemopoietic.)

But: looking at WordReference, I don’t see to arise from as a possible translation of relever de, or vice versa. Phil d’Ange?

The other problem word: la mécanisation. The only translation of this word in WordReference is…”mechanization”! What that means: I can only guess (see above about how your lexicon grows over the course of your entire life), and none of my guesses would make sense in this context. Mechanized infantry is infantry equipped with armored vehicles to move itself around, and mechanized artillery is artillery equipped with its own transport system, but oral mechanization, as in the sample from my book? I haven’t the faintest clew. (That’s “clue,” for us Americans–something about the faintest clew just demands that you spell it like a Brit.)

À la partie théorique, située sur la page de gauche, correspond, sur la page de droite, une présentation en contexte (parfois illustrée) des points de grammaire, et une série d’exercices de réemploi : exercices à trous, transformations, mécanisation orale, écrit.

Native speakers: can you show an anglophone some love? (To show someone some love means to help them, to do something nice for them, to give them something. Super-slangy.)

Finally, here is a gratuitous picture of a fat old bald guy and his cat Keiko. As you can tell from the amount of light in the dwelling, the photo was taken in America, not in wintertime Paris. The teddy bear on the floor is the property of my cat, and I suggest that you not touch it.

Conflict of interest statement: I have no conflicts of interest to declare. I pay for a subscription to Sketch Engine, I bought the book, and WordReference is free to one and all.
