I know, I know: computational linguistics sounds like the world’s most glamorous profession, right? You imagine a bunch of geeks in hip glasses sitting around talking about Sanskrit is-aorist verbs, playing a little foosball after a free sushi lunch in the Google cafeteria, and then writing code to translate Jacques Prévert into idiomatic American English with a little stock ticker in the upper-right corner of their screen so that they can watch the value of their vested options go up, and up, and up, and…
In reality, I’m sitting in the international student dormitory of a well-known East Coast American university. Yesterday was a good day, because the shitwad in room D2 left his dirty dishes in the sink for the full 48 hours that let me feel fine about throwing the reeking things in the trash can.
But then I realized something: for the book I’m writing, I can only get easy copyright releases for papers published in 2016 or later. That means that I need to do a serious analysis of what I’m citing in the book, which means…writing code (the computer language that makes up a program) to go through a bunch of citations to figure out what year they were published, in which conference or journal, etc., etc., etc.
That means that I write stuff that looks like this:
open (IN, "/Users/transfer/Dropbox/Scripts-new/bioNLP.bib") || die "Couldn't open input file...\n";
…and then spend a lot of time looking at the error message “Couldn’t open input file”, ’cause I was missing the slash at the beginning of this:
/Users/transfer/Dropbox/Scripts-new/dummy.bib
…which I was happy to figure out, but didn’t really find all that interesting.
Then I spent a lot of time writing things like the following:
if ($line =~ /title.*=\{(.*)\},$/) { $DEBUG && print "TITLE: $1\n"; $entry{"TITLE"} = $1; }
…which wasn’t particularly difficult, but caused a little pinprick in my soul, ’cause I knew as I was writing it that it would mess up any time that I had a title with a curly-brace in it ({}), and practicing your profession shittily never feels good. For reasons that we need not go into, having curly-braces in the title of a work happens a hell of a lot more often than you might think, and fixing that little flaw would require writing something called a recursive function, which really shouldn’t be that complicated for a computational linguist (recursion is one of the fundamental properties of language (the picture at the top of this page is a humorous illustration of recursion (which is probably oxymoronic (and as you might have guessed, these embedded parentheticals are themselves an example of recursion (as is the second sentence of this post (an example, that is–not necessarily a humorous one (unlike the cartoon))))))), and yet is still more than my little brain de pois chiche (garbanzo bean) can handle on a Sunday morning.
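For the curious, here is a minimal sketch of one way to cope with nested curly-braces, assuming that each field sits on a single line. It is not the code from the script described in this post: the helper name extract_braced_field and the sample line are made up for illustration, and it leans on a recursive regular expression (available in Perl 5.10 and later) rather than a hand-written recursive function.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value of a brace-delimited BibTeX field
# found on one line, allowing nested braces inside the value.
sub extract_braced_field {
    my ($line, $name) = @_;
    # Group 1 matches one balanced {...} group. Inside it, (?1) calls
    # group 1 again for every nested group, which is where the recursion is.
    if ($line =~ /\b\Q$name\E\s*=\s*(\{(?:[^{}]++|(?1))*\})/i) {
        my $value = $1;
        $value =~ s/^\{//;    # drop the outer opening brace
        $value =~ s/\}$//;    # drop the outer closing brace
        return $value;
    }
    return;    # field not found on this line, or its braces do not balance
}

# Made-up sample line with nested braces in the title.
my $line = 'title = {A corpus of {B}io{NLP} abstracts},';
print extract_braced_field($line, 'title'), "\n";
# prints: A corpus of {B}io{NLP} abstracts
```

The \b in front of the field name keeps a booktitle line from being mistaken for a title line. An entry whose title runs over several lines would still need more work than this.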
Then, in order to be able to see any actual output, I had to write code like the following:
my $output = ""; for my $field (@fields) { #print "$entry{$field}\t"; $field .= $entry{$field} . "\t"; } $field =~ s/\t$//; print "$field\n"; }
…which was neither particularly challenging nor particularly interesting, but caused my program to crash quite rudely, ’cause for reasons that we need not go into, I should have written
my $output = ""; for my $field (@fields) { #print "$entry{$field}\t"; $output .= $entry{$field} . "\t"; } $output =~ s/\t$//; print "$output\n"; }
That gave me the first thought I’d had all morning that was actually interesting, as I contemplated how hard I’m pretty sure that it would have been–how impossible I at least hope it would be, for the moment at any rate–for a computer to find and fix that particular bug.
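A different way to sidestep that kind of slip, sketched here under the assumption that the goal is simply one tab-separated line per citation, is to let join and map do the bookkeeping, so that there is no accumulator variable to mistype and no trailing tab to strip. The field names and the sample entry are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical field list and entry, standing in for one parsed citation.
my @fields = ("TITLE", "YEAR", "VENUE");
my %entry  = (
    TITLE => "A made-up title",
    YEAR  => 2016,
    VENUE => "A made-up venue",
);

# map looks each field up, turning a missing one into an empty string.
# join puts tabs only between the values, so nothing needs trimming after.
my $output = join("\t", map { defined $entry{$_} ? $entry{$_} : "" } @fields);
print "$output\n";
```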
Another half hour or so of work, and now I can actually see what I wanted to know, which is the venues where the works that I cite were published. This was useful, in that I noticed that one that should be heavily represented in my bibliography in fact barely figures there at all. But, what it meant was that I needed to Google hither and yon to find out how to search Google Scholar (we’re just getting more and more meta here all the time) by name of conference. Not particularly challenging; but, not particularly interesting, either.
This is a whiny post, right? Totally tongue in cheek, though. Actually, I have the incredible good luck to love what I do, and the book in question really is a labor of…a labor of love.
English notes
Something in this post that is perfectly fine English but that I probably would not have written if I didn’t spend a lot of time writing (poorly) in French these days:
I noticed that a publication venue that should be heavily represented in my bibliography in fact barely figures there at all.
An educated speaker of the langue de Molière will be aware that figurer sur une liste (“to appear on a list”) is perfectly natural (as far as I know) French. What I wrote is perfectly fine English, but I would suspect that it doesn’t occur very often, even in written academic or official English. Why did it pop out of my mouth (well…fingers) today? French-language interference, which is funny, ’cause in language teaching we often talk about first-language interference (carrying over aspects of the grammar of your native language, such that they fuck up your mastery of a foreign or second language), but I can’t recall ever running into the concept of second-language interference, and French is most definitely a second language for me, not my first. Go figure…
“Go figure” is an expression of surprise about something that you’ve just been talking about, or about an assertion that you are about to make. How I used it in the post:
I can’t recall ever running into the concept of second-language interference, and French is most definitely a second language for me, not my first. Go figure…