Design Perceptive

Vocabulary Behavior and the Locality Principle

by tim on Nov.30, 2008, under Artificial Intelligence, Dialogue Systems, Hypothesis

Temporal locality, in computer memory management, says that if an item is used, it is likely to be used again soon. It is one part of a broader idea in computer science called the locality principle. This principle holds that the memory locations a running program references over a short period of time tend to form relatively well-predictable clusters [1]. It was developed in the 1960s as the theory behind a robust page replacement algorithm for virtual memory. Since then, the principle has been applied in a number of ways. I think we can apply it in another useful and interesting way.
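The 1960s page replacement theory is Denning's working-set model. The working set W(t, τ) is the set of distinct pages referenced in the last τ references up to time t. Locality shows up as a working set that stays small and stable within a phase of execution and then shifts. The sketch below is a minimal illustration under stated assumptions: the function name `working_set` and the toy reference string are hypothetical, and time is counted in references (0-indexed) rather than in real time.

```python
from typing import Hashable, Sequence, Set


def working_set(refs: Sequence[Hashable], t: int, tau: int) -> Set[Hashable]:
    """Return W(t, tau): the distinct items referenced in the window of the
    last `tau` references ending at index t (inclusive).

    Hypothetical helper for illustration; time is measured in references.
    """
    if tau <= 0:
        return set()
    start = max(0, t - tau + 1)  # clamp the window at the start of the trace
    return set(refs[start:t + 1])


# Toy page-reference string with two phases: pages 1-3, then pages 7-9.
refs = [1, 2, 1, 3, 2, 1, 7, 8, 7, 9, 8, 7]
for t in (5, 11):
    print(t, sorted(working_set(refs, t, tau=4)))
```

With a window of four references, the working set at t = 5 is {1, 2, 3} and at t = 11 it is {7, 8, 9}. Each phase touches only a small cluster of pages, which is what makes predicting the next references practical.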

We might apply this principle to vocabulary selection in speaker-independent speech recognition systems. Because vocabulary size and composition affect recognition performance, it is important to keep the vocabulary reasonably small at any given point in a conversation. Yet we also want vocabularies that are relevant to the context. We can, of course, hand-code our own vocabularies and vocabulary swaps, much as programmers once had to manage their own page transfers. But this is prone to human error and does not take advantage of the power of computation. How can we automate vocabulary selection and swapping?
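One way to picture automated swapping is to treat the recognizer's active vocabulary like a fixed-size page cache. Words heard recently stay resident, and when the vocabulary is full the least recently used word is evicted. This is a minimal sketch, assuming plain LRU replacement (a simpler stand-in for the working-set policy, not a method the post commits to). The class `ActiveVocabulary` and its methods are hypothetical. A real system would also have to reload its grammar or language model whenever the resident set changes, and that step is not shown.

```python
from collections import OrderedDict
from typing import Iterable, List


class ActiveVocabulary:
    """Hypothetical bounded recognizer vocabulary managed like a page cache.

    Recently referenced words stay resident; when capacity is exceeded the
    least recently used word is evicted (LRU replacement).
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._words: OrderedDict = OrderedDict()  # ordered least -> most recent
        self.hits = 0
        self.misses = 0

    def reference(self, word: str) -> bool:
        """Record a use of `word`; return True if it was already resident."""
        word = word.lower()
        if word in self._words:
            self._words.move_to_end(word)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self._words[word] = None
        if len(self._words) > self.capacity:
            self._words.popitem(last=False)  # evict the least recently used word
        return False

    def observe(self, utterance: Iterable[str]) -> None:
        """Feed every word of an utterance through the cache."""
        for w in utterance:
            self.reference(w)

    def words(self) -> List[str]:
        """Resident words, least to most recently used."""
        return list(self._words)


vocab = ActiveVocabulary(capacity=4)
vocab.observe("book a flight to boston".split())
vocab.observe("flight to boston please".split())
print(vocab.words(), vocab.hits, vocab.misses)
```

After these two utterances the resident vocabulary is `['flight', 'to', 'boston', 'please']`, with 3 hits and 6 misses. The second utterance mostly reuses words from the first, so the small vocabulary already holds them.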

Denning gives us a hint in his article “The Locality Principle” in Communications of the ACM [1]. The relevant passage is worth quoting here:

The locality principle flows from human cognitive and coordinative behavior. The mind focuses on a small part of the sensory field and can work most quickly on the objects of its attention … The locality principle will be useful wherever there is an advantage in reducing the apparent distance from a process to the objects it accesses.

So it seems to me, and this is my hypothesis, that the vocabulary of human discourse follows the same principle. Simply put, working vocabularies exhibit locality of reference with respect to the vocabulary domain of the discourse context. This is something I intend to explore over the next year.
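One possible way to quantify this hypothesis borrows a standard measure from memory-hierarchy analysis: LRU stack (reuse) distance. This is offered as an assumption, not as the method the post commits to. For each word in a transcript, count how many distinct other words occurred since its previous use. An LRU vocabulary holding C words already contains a word exactly when that distance is less than C. Strong locality would show up as many small distances. The helper names and the toy token list are hypothetical.

```python
import math
from typing import List, Sequence


def reuse_distances(tokens: Sequence[str]) -> List[float]:
    """LRU stack distance per token: the number of distinct other words seen
    since the token's previous occurrence; math.inf marks a first occurrence."""
    stack: List[str] = []  # most recently used word is at the end
    out: List[float] = []
    for tok in tokens:
        if tok in stack:
            idx = stack.index(tok)
            out.append(len(stack) - 1 - idx)  # distinct words used after it
            stack.pop(idx)
        else:
            out.append(math.inf)
        stack.append(tok)
    return out


def locality_hit_rate(tokens: Sequence[str], capacity: int) -> float:
    """Fraction of tokens an LRU vocabulary of `capacity` words would already hold."""
    d = reuse_distances(tokens)
    return sum(1 for x in d if x < capacity) / len(d) if d else 0.0


tokens = "a b a c b a".split()
print(reuse_distances(tokens))  # [inf, inf, 1, inf, 2, 2]
print(locality_hit_rate(tokens, capacity=3))
```

Plotting the hit rate against vocabulary capacity for real dialogue transcripts would show how small a context-driven vocabulary could be while still covering most of what speakers actually say.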

  1. Denning, P. “The Locality Principle.” Communications of the ACM, July 2005.

1 Comment for this entry

  • tim

    Zipf’s law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, which occurs twice as often as the fourth most frequent word, etc. For example, in the Brown Corpus “the” is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69971 out of slightly over 1 million). True to Zipf’s Law, the second-place word “of” accounts for slightly over 3.5% of words (36411 occurrences), followed by “and” (28852). Only 135 vocabulary items are needed to account for half the Brown Corpus. – Wikipedia on Zipf’s Law.
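Ideal Zipf's law says the word at rank r has frequency f(1)/r, where f(1) is the frequency of the top word. The sketch below compares that prediction with the Brown Corpus counts quoted above. It also estimates how many top-ranked words would cover a given share of all tokens, which bears directly on vocabulary sizing. This is a minimal sketch, assuming frequencies exactly proportional to 1/rank and a vocabulary size you supply. Real corpora deviate from the ideal curve, as the count for "and" shows. The function names are hypothetical.

```python
from typing import Dict


def zipf_predicted(top_count: int, rank: int) -> float:
    """Ideal Zipf: the count at `rank` is the top word's count divided by rank."""
    return top_count / rank


def words_for_coverage(n_types: int, fraction: float) -> int:
    """Smallest k such that the k most frequent of `n_types` word types cover at
    least `fraction` of all tokens, assuming counts exactly proportional to 1/rank."""
    total = sum(1.0 / r for r in range(1, n_types + 1))  # harmonic number H_n
    running = 0.0
    for k in range(1, n_types + 1):
        running += 1.0 / k
        if running >= fraction * total:
            return k
    return n_types  # guard against floating-point shortfall at fraction == 1.0


# Brown Corpus counts quoted in the comment above.
observed: Dict[str, int] = {"the": 69971, "of": 36411, "and": 28852}
for rank, (word, count) in enumerate(observed.items(), start=1):
    pred = zipf_predicted(observed["the"], rank)
    print(f"rank {rank}: {word!r} observed {count}, ideal Zipf {pred:.0f}")
```

Because the cumulative share grows like a harmonic number, a small number of top-ranked words covers a large fraction of running text. That is consistent with the observation that only 135 items account for half the Brown Corpus.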
