Module ppattach
Read lines from the Prepositional Phrase Attachment Corpus.
The PP Attachment Corpus contains several files having the format:
sentence_id verb noun1 preposition noun2 attachment
For example:
42960 gives authority to administration V
46742 gives inventors of microchip N
The PP attachment is to the verb phrase (V) or noun phrase (N),
i.e.:
(VP gives (NP authority) (PP to administration))
(VP gives (NP inventors (PP of microchip)))
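The record format and the two bracketings above can be illustrated with a minimal sketch. The names `PPInstance`, `parse_line`, and `bracket` are hypothetical helpers introduced only for this example; they are not part of this module, and the sketch assumes each record is a single whitespace-separated line with exactly six fields.

```python
from typing import NamedTuple


class PPInstance(NamedTuple):
    """One corpus record (hypothetical container, not part of ppattach)."""
    sentence_id: str
    verb: str
    noun1: str
    preposition: str
    noun2: str
    attachment: str  # 'V' (verb phrase) or 'N' (noun phrase)


def parse_line(line: str) -> PPInstance:
    """Split one whitespace-separated record into its six fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    inst = PPInstance(*fields)
    if inst.attachment not in ("V", "N"):
        raise ValueError(f"unknown attachment label: {inst.attachment!r}")
    return inst


def bracket(inst: PPInstance) -> str:
    """Render the bracketed structure implied by the attachment label."""
    pp = f"(PP {inst.preposition} {inst.noun2})"
    if inst.attachment == "V":
        # The PP attaches to the verb phrase, as a sibling of the NP.
        return f"(VP {inst.verb} (NP {inst.noun1}) {pp})"
    # The PP attaches inside the noun phrase.
    return f"(VP {inst.verb} (NP {inst.noun1} {pp}))"


for record in ("42960 gives authority to administration V",
               "46742 gives inventors of microchip N"):
    print(bracket(parse_line(record)))
```

Run on the two example records, the loop prints the same two bracketed structures shown above.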
The corpus contains the following files:
training: training set
devset: development test set, used for algorithm development
test: test set, used to report results
bitstrings: word classes derived from Mutual Information Clustering for the Wall Street Journal
Ratnaparkhi, Adwait (1994). A Maximum Entropy Model for Prepositional
Phrase Attachment. Proceedings of the ARPA Human Language Technology
Conference. [http://www.cis.upenn.edu/~adwait/papers/hlt94.ps]
The PP Attachment Corpus is distributed with NLTK with the permission
of the author.
dictionary(files=['training', 'devset', 'test'])
items = ['training', 'devset', 'test']
item_name = {'devset': 'development test set', 'test': 'test s...
item_name
- Value:
{'devset': 'development test set',
 'test': 'test set',
 'training': 'training set'}