Learning: Language acquisition (continued)
The aim of this project is to simulate the task of learning a language.
Based upon Chomsky's X-bar theory, algorithms have been developed to
determine the appropriate part of speech (POS) of a particular word and
to put the word in its proper context. Once the most probable POS has
been determined, the word is then networked to other words within the
sentence, fixing a permanent relationship. In this way, nouns are matched
with the adjectives that describe them, linked to verbs, and coupled to other
nouns that can modify them. All these connections are stored in multiple arrays
and indexed to the source from which the links were derived.
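As a rough illustration of how such links might be stored, here is a minimal Python sketch. It is not the project's actual code: the class name WordNetwork, its methods, the relation labels, and the source name "page1.html" are all hypothetical, and a dictionary of lists stands in for the multiple arrays mentioned above.

```python
from collections import defaultdict


class WordNetwork:
    """Hypothetical store for word-to-word links, each indexed to its source."""

    def __init__(self):
        # Maps (head word, relation) -> list of (dependent word, source id).
        self.links = defaultdict(list)

    def add_link(self, dependent, head, relation, source):
        """Record that `dependent` relates to `head`, as seen in `source`."""
        self.links[(head, relation)].append((dependent, source))

    def dependents(self, head, relation):
        """Return every word linked to `head` under `relation`."""
        return [word for word, _ in self.links.get((head, relation), [])]

    def sources(self, head, relation, dependent):
        """Return the sources from which a given link was derived."""
        return [src for word, src in self.links.get((head, relation), [])
                if word == dependent]


# Assumed example: "The hungry dog eats." with the POS already chosen.
network = WordNetwork()
network.add_link("hungry", "dog", "adjective->noun", "page1.html")
network.add_link("dog", "eats", "noun->verb", "page1.html")
```

Keying each link by head word and relation keeps lookups such as "which adjectives describe dog?" cheap, while the source id stored with every link preserves the connection back to the page it came from.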
The current implementation of this project includes 6 main components.
1. A web crawler which acquires web pages based on a specified topic.
2. A sentence parser which excises full sentences from HTML, omitting titles, HTML tags, and scripts.
3. A word parser which identifies words and retrieves a dictionary entry with the possible POS of each word.
4. An iterative X-bar algorithm which selects the most probable POS for each word.
5. A networking subroutine which identifies and stores all possible word-to-word interactions, e.g. adjective --> noun, adverb --> verb, noun --> verb.
6. A user interface to facilitate access to the networked database by asking questions or providing new sentences to be parsed and networked.
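To make component 2 more concrete, the following is a minimal Python sketch of a sentence parser built on the standard library's html.parser module. It is an assumption-laden illustration, not the project's implementation: the class name SentenceExtractor is hypothetical, headings (h1 to h6) are treated as titles alongside the title element, and a "full sentence" is assumed to start with a capital letter and end with a period, question mark, or exclamation mark.

```python
import re
from html.parser import HTMLParser


class SentenceExtractor(HTMLParser):
    """Hypothetical sentence parser: keeps body text, drops titles and scripts."""

    # Assumption: "titles" covers the <title> element and headings.
    SKIPPED = {"script", "style", "title", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # text fragments found outside skipped elements

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def sentences(self):
        # Collapse whitespace, then keep only runs that end in . ! or ?
        text = " ".join(" ".join(self._chunks).split())
        return re.findall(r"[A-Z][^.!?]*[.!?]", text)


page = (
    "<html><head><title>Dogs page</title>"
    "<script>var x = 1;</script></head>"
    "<body><h1>Dogs</h1>"
    "<p>The hungry dog eats. Does it sleep? It runs</p></body></html>"
)
extractor = SentenceExtractor()
extractor.feed(page)
found = extractor.sentences()
```

The trailing fragment "It runs" is dropped because it has no closing punctuation, which matches the requirement that only full sentences be excised. A simple pattern like this will split on abbreviations such as "e.g.", so a real parser would need a more careful sentence boundary rule.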