If you’re a regular reader of this blog, you probably have some interest in language: language in the abstract, or le langage in French, as opposed to (or in addition to) any particular language, or la langue in French. Here I’m reblogging the reading list for one week of a course on language processing that I’m currently teaching. The theme of the week is data in language processing: what you (might) mean when you talk about “data” with respect to language; what kinds of data there are; where that data comes from; and how to create some data if you can’t find the kind you need.
I’m posting this particular reading list because I often suspect that many people who know I’m a linguist imagine that I spend my days sitting around discussing how funny irregular verbs are, or how cool it is that French has three verbs that mean “go back,” or whatever. What you’ll find on this list has very little to do with coolness or the lack thereof, and a lot to do with data formats, data set sizes, statistics, and a bit of ethics. Personally, I find this stuff fascinating, but either way it’s often worth getting a glimpse of what we call in my field “the sausage-making process.” Enjoy! (Or go watch the latest episode of “The Walking Dead.” It’s pretty good.)
Here are some suggested readings for Week 5. Remember that I do not distribute my lecture notes. Note also that you are responsible for all of the material on which I lecture. These readings are not required, but they are intended to cover everything that I talk about in our lectures (modulo the caution in the preceding sentence). All of them are available for free online except for the books (although the Good and Hardin book is available for free as well). All of them should be available in an academic library. Feel free to contact me if you have trouble finding a copy of either book.
- Banko, Michele, and Eric Brill. “Scaling to very very large corpora for natural language disambiguation.” Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2001.
- Miller, Greg. “A scientist’s nightmare: software problem leads to five…
Apart from your central theme, as usual, allow me to snigger: language is “langage” in French; we sold the “u” to our neighbours to buy Calais.
Three verbs meaning “go back”? Revenir, retourner, repartir, rebrousser chemin, faire demi-tour, and, for a particular place, rentrer, regagner, réintégrer… That’s just one minute’s worth, without checking colloquial, slang, or high literary registers.
“WE SHALL BE BACK!”
Uh… “NO PASARAN!” “ARRIBA LA REVOLUCION!” “Да здравствует Советский Союз!” “! במשך חינם פלסטין” I’m learning Southern Martian, but I’m just a beginner.
It’s funny, isn’t it… people’s perceptions of what the reality of a particular discipline is. As you know, you have my utmost respect for even just one of the pieces of work you have been, or are, involved in. Being any sort of expert, whether you like that overused (English) word or not (I generally don’t), implies a vast amount of learning and then a vast amount of continuing learning, researching, understanding, learning, listening, reading, understanding, learning, you get my drift… I am frankly in awe of what you share, and I am happy to look at these works and see if I can get my small brain anywhere near them.
Actually, I am of the “write about what you DON’T know” school! 🙂