Les sous-langages: sublanguages

The lines show how the number of word types grows as more and more tokens are observed. The blue line (BNRC) is unrestricted Bulgarian text. The red line (epicrises) is Bulgarian clinical documents. The clinical documents show lexical constraints: for a given number of tokens, the number of word types is much smaller, and the vocabulary tends toward finiteness.
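For anyone who wants to see how a curve like this is computed, here is a minimal sketch in Python. It is not the code behind the figure: the function name `type_token_curve`, the simple `\w+` tokenizer, and the two toy English "corpora" are all made up for illustration. The idea is just to read the text token by token and record how many distinct word types have been seen so far.

```python
import re


def type_token_curve(text, step=1):
    """Return (tokens_seen, types_seen) pairs as the text is read left to right.

    Tokenization here is a deliberately crude assumption: lowercase the text
    and take every run of word characters as one token.
    """
    tokens = re.findall(r"\w+", text.lower())
    seen = set()   # distinct word types observed so far
    curve = []
    for i, token in enumerate(tokens, start=1):
        seen.add(token)
        # Record a point every `step` tokens, and always at the very end.
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve


# Invented toy examples, standing in for unrestricted and clinical text.
open_text = "the cat sat on the mat while a dog ran past the old red barn"
restricted_text = (
    "patient stable patient afebrile patient stable "
    "no fever patient stable no pain"
)

if __name__ == "__main__":
    for name, text in [("open", open_text), ("restricted", restricted_text)]:
        tokens_seen, types_seen = type_token_curve(text)[-1]
        print(f"{name}: {types_seen} types in {tokens_seen} tokens")
```

With real corpora you would plot the pairs for each text on the same axes. A constrained sublanguage should flatten out much sooner than unrestricted text, which is the pattern the figure describes.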

I had to look up all of these words today in order to be able to explain just one aspect of my research. One of the things that I work on is the topic of sublanguages (explained below). Looking for material on the subject in French, I came across the doctoral dissertation Sur la notion de sous-langage, by Roland Dachelet. Even in the context of discussing my own research, Zipf’s Law strikes.

  • le sous-langage: sublanguage.
  • le domaine: domain. A sublanguage is a variety of language associated with a specific domain—medicine, biology, weather, sports reporting.
  • spécialisé: specialized. Being related to a specific domain, a sublanguage is specialized.
  • la contrainte: constraint. Sublanguages are generally associated with constraints—constraints on the kinds of subjects and arguments that a verb in the domain can have, for instance; constraints on syntactic structures; constraints on the set of words.
  • le lexique: in this case, the set of words in a text—vocabulary. It has other meanings, too, such as bilingual dictionary. Typically the set of words in a sublanguage is constrained.
  • la morphologie: morphology (how words are put together).
  • une ambiguïté: ambiguity. The fundamental problem of language processing—if most things in language didn’t have multiple possible interpretations, computers could just look everything up.
  • la variabilité: variability. The other major problem of language processing—there are so many ways to express the same thing.
  • la caractérisation: characterization. The current challenge in sublanguages is to characterize them automatically—that is, with a computer, as opposed to a human doing it manually.
  • la syntaxe: syntax. This is how phrases are structured.
  • syntaxique: syntactic.
  • une analyse syntaxique: syntactic analysis.
  • la structure: structure. Syntax is mostly about structure.
  • la sémantique: semantics.
  • sous-jacent: below, underlying, implicit (the sense in which I need it). Important aspects of language, such as syntactic structure, are implicit in the sense that they are not visibly indicated in the stream of language.