Les sous-langages: sublanguages

The lines show how the number of word types grows as increasing numbers of tokens are observed. The blue line (BNRC) is unrestricted Bulgarian text. The red line (epicrises) is Bulgarian clinical documents. The clinical documents show lexical constraints: for a given number of tokens, the number of word types is much smaller, and it tends toward a finite limit.

I had to look up all of these words today in order to be able to explain just one aspect of my research. One of the things that I work on is the topic of sublanguages (explained below). Looking for material on the subject in French, I came across the doctoral dissertation Sur la notion de sous-langage, by Roland Dachelet. Even in the context of discussing my own research, Zipf’s Law strikes.

  • le sous-langage: sublanguage.
  • le domaine: domain. A sublanguage is a variety of language associated with a specific domain—medicine, biology, weather, sports reporting.
  • spécialisé: specialized. Being related to a specific domain, a sublanguage is specialized.
  • la contrainte: constraint. Sublanguages are generally associated with constraints—constraints on the kinds of subjects and arguments that a verb in the domain can have, for instance; constraints on syntactic structures; constraints on the set of words.
  • le lexique: in this case, the set of words in a text (its vocabulary). It has other meanings, too, such as a bilingual dictionary. Typically the set of words in a sublanguage is constrained.
  • la morphologie: morphology (how words are put together).
  • une ambiguïté: ambiguity. The fundamental problem of language processing—if most things in language didn’t have multiple possible interpretations, computers could just look everything up.
  • la variabilité: variability. The other major problem of language processing—there are so many ways to express the same thing.
  • la caractérisation: characterization. The current challenge in sublanguages is to characterize them automatically—that is, with a computer, as opposed to a human doing it manually.
  • la syntaxe: syntax. This is how phrases are structured.
  • syntaxique: syntactic.
  • une analyse syntaxique: syntactic analysis.
  • la structure: structure. Syntax is mostly about structure.
  • la sémantique: semantics.
  • sous-jacent: below, underlying, implicit (the sense in which I need it). Important aspects of language, such as syntactic structure, are implicit in the sense that they are not visibly indicated in the stream of language.
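
The lexical constraint described in the figure caption (fewer word types for the same number of tokens) can be measured with a simple type/token growth curve. The following is a minimal sketch, not the method behind the figure: it assumes whitespace tokenization and lowercasing, the function name type_token_curve is hypothetical, and the two toy "corpora" are invented English stand-ins for unrestricted and clinical text.

```python
def type_token_curve(tokens, step=1):
    """Return (tokens_seen, types_seen) pairs, sampled every `step` tokens."""
    seen = set()   # distinct word types observed so far
    curve = []
    for i, token in enumerate(tokens, start=1):
        seen.add(token.lower())
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve


# Toy data, invented for illustration only (12 tokens each).
open_text = "the cat sat on a mat while two dogs chased seven birds".split()
restricted_text = (
    "patient stable patient afebrile patient stable "
    "no fever patient stable afebrile stable"
).split()

open_curve = type_token_curve(open_text, step=4)
restricted_curve = type_token_curve(restricted_text, step=4)

print("open:      ", open_curve)
print("restricted:", restricted_curve)
```

In the toy data, the restricted text stops gaining new types after eight tokens while the open text keeps growing, which is the flattening that the red line shows in the figure. Real corpora would need a proper tokenizer and far more text before the curves mean anything.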
