The ethics of crowdsourcing for linguistic resource construction in French

[Image: screenshot dated 2014-10-06]

One of the major trends in my field today is the use of Amazon Mechanical Turk (AMT) to create linguistic resources, particularly for natural language processing.  Using AMT, tasks that require human intelligence (for example, deciding which synonym of a word is being used in a particular context, or labeling a photograph with the things that it pictures, or deciding whether or not a web page is relevant to a search query) are given to humans in very small increments, usually with the goal of using the humans’ data to train a computer to do the same task.  It is a form of crowdsourcing: using the public to do a (typically large) job in (typically) small amounts, as with Wikipedia.

Karën Fort of the Sorbonne and Gilles Adda of LIMSI have researched the ethics of the AMT model for work and for remuneration.  The AMT model turns out to raise many issues, including a number of ethical ones.  Karën and Gilles have worked to develop a charter for ethical use of this and other crowdsourcing platforms.  (Full disclosure: Karën and Gilles and I published an editorial on the use of Amazon Mechanical Turk in our field in the journal Computational Linguistics.)   If you click on the picture, it will take you to a set of slides that Karën prepared for a talk on the subject.  Zipf’s Law strikes in the domain of ethics as much as anywhere else—here are some words that I had to look up to read the slides:

▪    une ombre: shade, shadow.
▪    la zone d’ombre: gray zone.
▪    promouvoir: to promote.
▪    le vaut bien: to be worth it.
▪    la plate-forme: platform.
▪    la myriadisation: crowdsourcing.
▪    délocalisé: outsourced.
▪    la foule: crowd, mob, masses.
▪    le travail parcellisé: microworking.
▪    découpé: cut into pieces.

That’s ten words just to get to slide 10 out of 30, but that’s about all I can handle in a single day—more Zipf’s Law words next time.
