Vocabulary Behavior and the Locality Principle
by tim on Nov.30, 2008, under Artificial Intelligence, Dialogue Systems, Hypothesis
Temporal locality, with regard to computer memory management, says that if an item is used, it will probably be used again soon. It is part of a principle in computer science called the locality principle. This principle maintains that the data locations referenced over a short period of time in a running computer often form relatively predictable clusters [1]. It was developed in the 1960s as the theory behind a robust page replacement algorithm for virtual memory. Since then, the principle has been applied in a number of ways. I think we can use it in another useful and interesting way.
We might apply this principle to vocabulary selection for speaker-independent speech recognition systems. Because vocabulary size and composition affect speech recognition performance, it is important to maintain a reasonable vocabulary size at any given time in a conversation. Yet we want vocabularies that are relevant to the context. Obviously, we can code our own vocabularies and vocabulary swaps, like the programmers of yesterday who had to manage their own page transfers. But this is prone to human error and does not take advantage of the power of computation. How can we automate vocabulary selection and swapping?
Denning gives us a hint in his Communications of the ACM article, “The Locality Principle”. It is useful to quote the passage here.
The locality principle flows from human cognitive and coordinative behavior. The mind focuses on a small part of the sensory field and can work most quickly on the objects of its attention… …The locality principle will be useful wherever there is an advantage in reducing the apparent distance from a process to the objects it accesses.
So it seems to me, and this is my hypothesis, that the vocabulary of human discourse follows the same principle. Simply put, working vocabularies exhibit locality of reference within the vocabulary domain of the discourse context. This is something I intend to explore over the next year.
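To make the hypothesis a little more concrete, here is a minimal sketch of what a "working vocabulary" might look like if we borrow the sliding-window idea from working-set page management. Everything in it is an assumption for illustration: the class name `WorkingVocabulary`, the `window` parameter, and the `hit_rate` helper are hypothetical, and a real recognizer would need something far richer than exact word matching. The point is only that a high hit rate with a small window would be evidence that discourse vocabulary behaves locally.

```python
from collections import Counter, deque


class WorkingVocabulary:
    """Hypothetical helper: the distinct words seen in the last `window` tokens."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self._recent = deque()    # the last `window` tokens, oldest first
        self._counts = Counter()  # occurrences of each word inside the window

    def observe(self, word):
        """Record one word and evict the token that falls out of the window."""
        self._recent.append(word)
        self._counts[word] += 1
        if len(self._recent) > self.window:
            old = self._recent.popleft()
            self._counts[old] -= 1
            if self._counts[old] == 0:
                del self._counts[old]

    def __contains__(self, word):
        return word in self._counts

    def active(self):
        """Return the current working vocabulary as a set of words."""
        return set(self._counts)


def hit_rate(tokens, window):
    """Fraction of tokens already in the working vocabulary when they occur."""
    vocab = WorkingVocabulary(window)
    hits = 0
    for tok in tokens:
        if tok in vocab:
            hits += 1
        vocab.observe(tok)
    return hits / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    # Toy utterance, assumed for illustration only.
    words = "book a flight to boston then book a hotel in boston".split()
    print(hit_rate(words, window=8))
```

Running `hit_rate` over transcribed conversations with different window sizes would be one rough way to test whether a small, recently used vocabulary covers most of what is said next.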
1. Denning, P. “The Locality Principle.” Communications of the ACM, July 2005.
December 8th, 2008 on 9:41 am
Zipf’s law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, which occurs twice as often as the fourth most frequent word, etc. For example, in the Brown Corpus “the” is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69971 out of slightly over 1 million). True to Zipf’s Law, the second-place word “of” accounts for slightly over 3.5% of words (36411 occurrences), followed by “and” (28852). Only 135 vocabulary items are needed to account for half the Brown Corpus. – Wikipedia on Zipf’s Law.
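As a quick check on the figures quoted above, the short sketch below compares the Brown Corpus counts with the ideal Zipf prediction, under which the top word should occur r times as often as the word at rank r. The function name `zipf_ratios` is made up for this example, and only the three counts given in the quote are used.

```python
# Brown Corpus counts as quoted above, in rank order.
brown_top = [("the", 69971), ("of", 36411), ("and", 28852)]


def zipf_ratios(ranked_counts):
    """Return (word, rank, observed f(1)/f(rank), ideal Zipf ratio) per entry."""
    top = ranked_counts[0][1]
    return [
        (word, rank, top / count, float(rank))
        for rank, (word, count) in enumerate(ranked_counts, start=1)
    ]


if __name__ == "__main__":
    for word, rank, observed, ideal in zipf_ratios(brown_top):
        print(f"{word!r}: rank {rank}, observed {observed:.2f}, ideal {ideal:.0f}")
```

With these counts the observed ratios come out to roughly 1.92 for “of” and 2.43 for “and”, against ideal values of 2 and 3, so the fit is approximate rather than exact.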