What computational linguists actually do all day: The variable reuse edition

I know, I know–computational linguistics seems like the most glamorous job in the world, right?

Dearly Beloved Colleagues,

I just spent several frustrating hours trying to fix a bug in my code.  In the end, the bug was purely a logic bug, and it was purely the product of poor variable-naming.

Code is the instructions that you write in a computer language, for a program to execute.

Here’s what happened.  I’m writing the world’s simplest script–I just need to read in some files that contain values for features for individual files–or, to put it better: for individual papers that I want to classify.

A script is a kind of computer program, typically one that does a relatively simple task.

…and, with that, I think you can already guess what happened.  I was opening files that contained features that I had extracted from other files, and I reused a variable name.  Consequently, once my script reached some critical length, I could no longer keep track in my own head of the code that I was editing.  So, my test cases found a simple bug, and in the process of fixing that bug, I got myself so confused that I was mixing up the “files” in the sense of “papers that I’m classifying” and the “files” in the sense of “files containing feature values from papers,” and the next thing you know, several hours have gone by.

A variable is something in a computer program whose value can be changed.  It’s the opposite of a constant, which is something whose value cannot be changed.  For example, the number 3 is a constant–its value will always be 3.  On the other hand, a computer program might contain something called length_of_word, intended to store the length of some word that you’re looking at, and that length could be anything, in principle.  (Really?  How about 0?  Or a negative number?  This kind of unstated assumption is one way that computer programs can go wrong.)
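One way to keep that kind of assumption from staying unstated is to write it into the code. Here is a minimal sketch in Python. The function name word_length is hypothetical, and treating an empty word as an error is an assumption made for illustration; some programs would be perfectly happy with a length of 0.

```python
def word_length(word):
    """Return the number of characters in word.

    Assumption for this sketch: an empty word is a mistake, so it is rejected.
    """
    if not isinstance(word, str):
        raise TypeError("word must be a string")
    if word == "":
        raise ValueError("word must not be empty")
    # len() never returns a negative number, so only the 0 case needs a check.
    return len(word)


# The variable from the definition above: its value depends on the word.
length_of_word = word_length("zombie")
```

With the check in place, the question "How about 0?" gets answered the moment the program runs, rather than several hours later.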

This is one of those things that gets fixed by (1) printing out my code on actual paper, noticing the same variable name in two clearly-marked-off-as-different sections of the code, and thinking “Zipf, you might be even more stupid than you knew…”; (2) sitting in the Philadelphia sun with a pack of cigarettes and a quality zombie novel for a while (Déchirés, by Peter Stenson–the zombie apocalypse comes and the only people who survive are meth addicts–I think you can come to your own conclusion about the metaphor a lot quicker than I fixed my code); and then (3) going back, looking at the code, and seeing immediately how I managed to confuse the heck out of myself.

My error here was in reusing one variable name to store two different kinds of information.  This is a classic error in computer programming.  I either didn’t notice that I was doing it when I moved from the first part of the program to the second part, or more likely, noticed it but didn’t think that it would be a problem because the script was relatively short and simple.  Variable reuse is not a problem for the program itself, which will run either way; rather, it is a problem for the programmer, because variable reuse is a great way to confuse yourself.  That’s exactly what I did–bad Zipf, bad!
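For the curious, here is a minimal sketch in Python of what separate names for the two kinds of “file” can look like. This is not my actual script: the names paper_id, feature_path, read_features, and load_all_features are made up for illustration, and so is the assumed file layout of one tab-separated feature file per paper.

```python
import os


def read_features(feature_path):
    """Read one feature file; each non-blank line is 'name<TAB>value' (assumed format)."""
    features = {}
    with open(feature_path, encoding="utf-8") as feature_file:
        for line in feature_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            name, value = line.split("\t")
            features[name] = float(value)
    return features


def load_all_features(feature_dir, paper_ids):
    """Map each paper ID to the features stored in <feature_dir>/<paper_id>.tsv."""
    features_by_paper = {}
    for paper_id in paper_ids:
        # paper_id names a paper that I want to classify.
        # feature_path names a file of feature values extracted from that paper.
        feature_path = os.path.join(feature_dir, paper_id + ".tsv")
        features_by_paper[paper_id] = read_features(feature_path)
    return features_by_paper
```

Nothing here is called just “file.”  The paper and the file of feature values about the paper each get their own name, so a loop over one can’t be mistaken for a loop over the other.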

Happy Saturday from Penn Student Housing, where either the kid in D3 is going to stop throwing rotting chicken in the communal trash can or he’s going to wake up with it in his bed,

Zipf

I notice that I’ve been writing a lot of whiny posts about computational linguistics lately.  In fact I LOVE my job, enough so that I am probably one of the happiest people you know–or don’t know.  Want the English-language version of Déchirés?  Here it is: Fiend.  I read it three times in English before I read it in French, so it MUST be good, right?
