What computational linguists actually do all day: The recursion edition

I know, I know: computational linguistics sounds like the world’s most glamorous profession, right?  You imagine a bunch of geeks in hip glasses sitting around talking about Sanskrit is-aorist verbs, playing a little foosball after a free sushi lunch in the Google cafeteria, and then writing code to translate Jacques Prévert into idiomatic American English with a little stock ticker in the upper-right corner of their screen so that they can watch the value of their vested options go up, and up, and up, and…

In reality, I’m sitting in the international student dormitory of a well-known East Coast American university.  Yesterday was a good day, because the shitwad in room D2 left his dirty dishes in the sink for the full 48 hours that let me feel fine about throwing the reeking things in the trash can.

But, then I realized something: for the book I’m writing, I can only get easy copyright releases for papers published in 2016 or later.  That means that I need to do a serious analysis of what I’m citing in the book, which means…writing code (the instructions that make up a computer program) to go through a bunch of citations to figure out what year they were published, in which conference or journal, etc., etc., etc.

That means that I write stuff that looks like this:

open (IN, "/Users/transfer/Dropbox/Scripts-new/bioNLP.bib") || die "Couldn't open input file...\n";

…and then spend a lot of time looking at the error message “Couldn’t open input file”, ’cause I was missing the slash at the beginning of this:

/Users/transfer/Dropbox/Scripts-new/dummy.bib

…which I was happy to figure out, but didn’t really find all that interesting.

Then I spent a lot of time writing things like the following:

    if ($line =~ /title.*=\{(.*)\},$/) {
        $DEBUG && print "TITLE: $1\n";
        $entry{"TITLE"} = $1;
    }

…which wasn’t particularly difficult, but caused a little pinprick in my soul, ’cause I knew as I was writing it that it would mess up any time that I had a title with a curly-brace in it ({}), and practicing your profession shittily never feels good.  For reasons that we need not go into, having curly-braces in the title of a work happens a hell of a lot more often than you might think, and fixing that little flaw would require writing something called a recursive function, which really shouldn’t be that complicated for a computational linguist (recursion is one of the fundamental properties of language (the picture at the top of this page is a humorous illustration of recursion (which is probably oxymoronic (and as you might have guessed, these embedded parentheticals are themselves an example of recursion (as is the second sentence of this post (an example, that is–not necessarily a humorous one (unlike the cartoon))))))), and yet still, is more than my little brain de pois chiche (garbanzo bean) can handle on a Sunday morning.
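For the curious, here is a minimal sketch of what such a recursive function might look like, in Perl like the snippets above.  It is an illustration under stated assumptions, not the code from the script in this post: the name read_braced and the sample title are made up, and the sketch assumes that the whole field sits in a single string with balanced braces.  Every time it meets an opening curly-brace it calls itself to read the nested group, which is exactly the recursion part.

```perl
use strict;
use warnings;

# Hypothetical helper: read a brace-delimited value whose opening brace
# sits at position $pos in $text. Returns the text between the braces
# and the position just past the matching closing brace.
sub read_braced {
    my ($text, $pos) = @_;
    substr($text, $pos, 1) eq '{' or die "Expected '{' at position $pos\n";
    $pos++;    # step past the opening brace
    my $content = '';
    while ($pos < length $text) {
        my $char = substr($text, $pos, 1);
        if ($char eq '{') {
            # Nested group: recurse, and keep its braces in the content.
            my ($inner, $next) = read_braced($text, $pos);
            $content .= '{' . $inner . '}';
            $pos = $next;
        }
        elsif ($char eq '}') {
            # This brace closes the current level, so we are done here.
            return ($content, $pos + 1);
        }
        else {
            $content .= $char;
            $pos++;
        }
    }
    die "Unbalanced braces\n";
}

# Made-up sample line with nested braces in the title.
my $line  = 'title = {Parsing {B}io{NLP} titles with {nested {braces}}},';
my $start = index($line, '{');
my ($title, $end) = read_braced($line, $start);
print "TITLE: $title\n";    # prints: TITLE: Parsing {B}io{NLP} titles with {nested {braces}}
```

The function returns the end position as well as the content so that a caller could carry on reading the next field from where this one stopped.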

Then, in order to be able to see any actual output, I had to write code like the following:

        my $output = "";
        for my $field (@fields) {
            #print "$entry{$field}\t";
            $field .= $entry{$field} . "\t";
        }
        $field =~ s/\t$//;
        print "$field\n";
    }

…which was neither particularly challenging nor particularly interesting, but caused my program to crash quite rudely, ’cause for reasons that we need not go into, I should have written

        my $output = "";
        for my $field (@fields) {
            #print "$entry{$field}\t";
            $output .= $entry{$field} . "\t";
        }
        $output =~ s/\t$//;
        print "$output\n";
    }
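A possible alternative, sketched here with made-up sample data (the contents of @fields and %entry below are hypothetical stand-ins, not values from the script in this post): Perl's built-in join puts the tab between the values rather than after each one, so there is no trailing tab to strip and no loop variable to mix up with the output variable.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for one parsed citation.
my @fields = ("TITLE", "YEAR", "VENUE");
my %entry  = (
    TITLE => "An example title",
    YEAR  => "2016",
    VENUE => "BioNLP",
);

# The hash slice @entry{@fields} pulls the values out in @fields order,
# and join() places a tab between them, never after the last one.
my $output = join("\t", @entry{@fields});
print "$output\n";
```

This only works as-is when every field in @fields has a value in %entry; a missing one would trigger an "uninitialized value" warning under use warnings.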

That gave me the first thought I’d had all morning that was actually interesting, as I contemplated how hard I’m pretty sure that it would have been–how impossible I at least hope it would be, for the moment at any rate–for a computer to find and fix that particular bug.

Another half hour or so of work, and now I can actually see what I wanted to know, which is the venues where the works that I cite were published.  This was useful, in that I noticed that one that should be heavily represented in my bibliography in fact barely figures there at all.  But, what it meant was that I needed to Google hither and yon to find out how to search Google Scholar (we’re just getting more and more meta here all the time) by name of conference.  Not particularly challenging; but, not particularly interesting, either.

This is a whiny post, right?  Totally tongue in cheek, though.  Actually, I have the incredible good luck to love what I do, and the book in question really is a labor of…a labor of love.

 


English notes

Something in this post that is perfectly fine English but that I probably would not have written if I didn’t spend a lot of time writing (poorly) in French these days:

I noticed that a publication venue that should be heavily represented in my bibliography in fact barely figures there at all.

An educated speaker of the langue de Molière will be aware that figurer sur une liste is perfectly natural (as far as I know) French.  What I wrote is perfectly fine English, but I would suspect that it doesn’t occur very often, even in written academic or official English.  Why did it pop out of my mouth (well…fingers) today?  French-language interference, which is funny, ’cause in language teaching we often talk about first-language interference (carrying over aspects of the grammar of your native language, such that they fuck up your mastery of a foreign or second language), but I can’t recall ever running into the concept of second-language interference, and French is most definitely a second language for me, not my first.  Go figure…

go figure is an expression that expresses surprise about something that you’ve just been talking about, or an assertion that you are about to make.  How I used it in the post:

I can’t recall ever running into the concept of second-language interference, and French is most definitely a second language for me, not my first.  Go figure…

 

 

7 thoughts on “What computational linguists actually do all day: The recursion edition”

  1. After reading that I wonder :

    If your brain is a garbanzo bean (from a previous post of yours a metaphor (a comparison) (or so I believe)) I begin to wonder to what my brain could be compared. I had to pause and look up each highlighted reference. I looked back at the cartoon to understand the Joke. Which I enjoyed immensely once I understood!

    I appreciated the tongue-in-cheek (another metaphor?) tone of the piece.

    I was so excited to understand more of what someone in computational linguistics does all day. And now sated (somewhat) I am aware of how much I do not know about computational linguistics and really anything.

    Thanks for the peek in.
    I enjoyed it even though I am aware that my brain and knowledge may be more like the toenail of an ant’s foot. I also wonder if that even exists.

  2. I can totally understand second languages interferences, as long as the second language is reasonably known, each time the second language owns certain expressions, or words, that are perfectly handy when they are painfully lacking in our first language .

    1. Yes, it’s really chiant when English lacks something in French that’s REALLY useful, like the distinction between “langue” and “langage”! And frankly, French needs a robust past-tense subjunctive, like Spanish still has…

      1. Hey man, we have the subjonctif imparfait and plus-que-parfait that do the job . I admit few people know them nowadays but I confess I’ve been more and more using them recently in oral speech when they are appropriate because it’s hurting me when I don’t . Don’t know why after all these years of life without them, but I do it and people understand what I mean ( most of the time they don’t even notice), anyway I don’t do it for them but for my digestion ha ha .

  3. Even more, I miss the conditionnel passé 2° forme that comes naturally when one uses the subjonctif imparfait or plus-que-parfait . Instead of “j’aurais aimé que cela arrive plus tôt” it gives “J’eusse aimé que cela arrivât plus tôt” . That’s French, the rest is just pidgin .

      1. Which tense ? the conditionnel passé 2° forme ? We can use it anytime instead of the ordinary conditionnel passé (1° forme): instead of “j’aurais voulu” say “j’eusse voulu” . The situations are the same, the meaning is the same, we can use both in the same cases . The 2° forme was once literary, normal for XVIth or XVIIth real literature and now it fell down but it’s still an object of fun for French intellectuals among their mates . Use it anytime you need a conditionnel passé ( “je serais venu plus tôt si j’avais su” becomes “Je fusse venu plus tôt si j’avais su”) . Note that the conditionnel passé 2° forme is made with the auxiliary conjugated at the subjonctif imparfait ( j’eusse, je fusse) exactly like the subjonctif plus-que-parfait, in fact they are exactly similar, the difference being only in the rest of the sentence .

        We say “LE Grevisse”, masculine, to mean Maurice Grevisse’s top opus, “Le bon usage”. I heard it was THE reference so I’d say yes – I can’t say, at the age of 8 my father, a grammar venerator, taught me the complete French grammar one lesson per day for a whole summer . It was a clever move, because he made sure I understood the sometimes rigorous/sometimes subtle intelligence BEHIND the French grammar, just like the French best mind, sometimes mathematic/sometimes artistic . Thanks to this fundamental learning I could learn old Greek and Latin more easily than my peers, later it helped me to learn faster Spanish, Italian, Portuguese when traveling, and even for the skeletal telegraphic tool that is called English this early teaching helped me a little .
        Try Le Grevisse for sure, but Grevisse also wrote other useful books such as :

        -Précis de grammaire française (1939) (also known as Le petit Grevisse)
        -Exercices sur la grammaire française (1942)
        -Cours de dictées (1944)
        -Le Français correct (1973)
        -Savoir accorder le participe passé (1975)
        -Quelle préposition ? (1977).
        Any of those is useful for a motivated learner .
