One of the major trends in my field today is the use of Amazon Mechanical Turk (AMT) to create linguistic resources, particularly for natural language processing. With AMT, tasks that require human intelligence (for example, deciding which synonym of a word is being used in a particular context, labeling a photograph with the things that it pictures, or deciding whether or not a web page is relevant to a search query) are given to humans in very small increments, usually with the goal of using the humans’ data to train a computer to do the same task. It is a form of crowdsourcing: using the public to do a (typically large) job in (typically) small pieces, as Wikipedia does.
Karën Fort of the Sorbonne and Gilles Adda of LIMSI have researched the ethics of the AMT model for work and for remuneration. The AMT model turns out to raise many issues, including a number of ethical ones. Karën and Gilles have worked to develop a charter for the ethical use of this and other crowdsourcing platforms. (Full disclosure: Karën, Gilles, and I published an editorial on the use of Amazon Mechanical Turk in our field in the journal Computational Linguistics.) If you click on the picture, it will take you to a set of slides that Karën prepared for a talk on the subject. Zipf’s Law strikes in the domain of ethics as much as anywhere else. Here are some words that I had to look up to read the slides:
▪ une ombre: shade, shadow.
▪ la zone d’ombre: gray zone.
▪ promouvoir: to promote.
▪ le vaut bien: is well worth it.
▪ la plate-forme: platform.
▪ la myriadisation: crowdsourcing.
▪ délocalisé: outsourced.
▪ la foule: crowd, mob, masses.
▪ le travail parcellisé: microworking.
▪ découpé: cut into pieces.
That’s ten words just to get to slide 10 out of 30, but that’s about all I can handle in a single day—more Zipf’s Law words next time.