"""
Classes and interfaces for producing tree structures that represent
the internal organization of a text. This task is known as X{parsing}
the text, and the resulting tree structures are called the text's
X{parses}. Typically, the text is a single sentence, and the tree
structure represents the syntactic structure of the sentence.
However, parsers can also be used in other domains. For example,
parsers can be used to derive the morphological structure of the
morphemes that make up a word, or to derive the discourse structure
for a set of utterances.

Sometimes, a single piece of text can be represented by more than one
tree structure. Texts represented by more than one tree structure are
called X{ambiguous} texts. A text can be ambiguous in two ways:

  - The text has multiple correct parses.
  - There is not enough information to decide which of several
    candidate parses is correct.

The parser module does I{not} distinguish between these two types of
ambiguity.

The parser module defines C{ParseI}, a standard interface for parsing
texts, and two simple implementations of that interface,
C{ShiftReduce} and C{RecursiveDescent}. It also contains
sub-modules for specialized kinds of parsing:

  - C{nltk.parser.chart} defines chart parsing, which uses dynamic
    programming to efficiently parse texts.
  - C{nltk.parser.probabilistic} defines probabilistic parsing, which
    associates a probability with each parse.
"""
"""
A processing class for deriving trees that represent possible
structures for a sequence of tokens. These tree structures are
known as X{parses}. Typically, parsers are used to derive syntax
trees for sentences, but they can also be used to derive other
kinds of tree structure, such as morphological trees and discourse
structures.
"""
"""
Derive a parse tree that represents the structure of the given
sentence's words, and return a C{Tree}. If no parse is found,
then return C{None}. If multiple parses are found, then
return the best parse.

The parse tree derives a structure for the subtokens, but does
not modify them. In particular, the leaves of the tree
should be equal to the list of subtokens.

@param sent: The sentence to be parsed
@type sent: L{list} of L{string}
"""
raise NotImplementedError()

"""
@return: A parse tree that represents the structure of the
    sentence. If no parse is found, then return C{None}.

@rtype: L{Tree}
@param sent: The sentence to be parsed
@type sent: L{list} of L{string}
"""

"""
@return: A list of the parse trees for the sentence. When possible,
    this list should be sorted from most likely to least likely.

@rtype: C{list} of L{Tree}
@param sent: The sentence to be parsed
@type sent: L{list} of L{string}
"""

"""
@return: A probability distribution over the parse trees for the sentence.

@rtype: L{ProbDistI}
@param sent: The sentence to be parsed
@type sent: L{list} of L{string}
"""

"""
@return: A dictionary mapping from the parse trees for the
    sentence to numeric scores.

@rtype: C{dict}
@param sent: The sentence to be parsed
@type sent: L{list} of L{string}
"""

"""
An abstract base class for parsers. C{AbstractParse} provides
default implementations for:

  - L{parse} (based on C{get_parse})
  - L{get_parse_list} (based on C{get_parse})
  - L{get_parse} (based on C{get_parse_list})

Note that subclasses must override either C{get_parse} or
C{get_parse_list} (or both), to avoid infinite recursion.
"""
"""
Construct a new parser.
"""
# Refuse direct instantiation of the abstract base class.
if self.__class__ is AbstractParse:
    raise AssertionError("Abstract classes can't be instantiated")

def parse(self, sentence):
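The C{AbstractParse} docstring describes default methods that are defined in terms of each other, so a subclass has to override at least one of them. The following is a minimal sketch of that pattern, not the NLTK implementation: the class names C{SketchParser} and C{FlatParser} and the tuple used as a stand-in tree are hypothetical, chosen only for illustration.

```python
class SketchParser(object):
    """Hypothetical stand-in for an abstract parser base class."""

    def get_parse(self, sent):
        # Default: take the first parse from the list, or None if empty.
        parses = self.get_parse_list(sent)
        return parses[0] if parses else None

    def get_parse_list(self, sent):
        # Default: wrap the single parse in a list (empty if none found).
        tree = self.get_parse(sent)
        return [] if tree is None else [tree]


class FlatParser(SketchParser):
    """Overrides get_parse only, which breaks the mutual recursion."""

    def get_parse(self, sent):
        # Toy "tree": a root label paired with the unmodified leaves.
        if not sent:
            return None
        return ("S", list(sent))


parser = FlatParser()
single = parser.get_parse(["the", "dog", "barks"])
several = parser.get_parse_list(["the", "dog", "barks"])
nothing = parser.get_parse_list([])
```

Because C{FlatParser} overrides C{get_parse}, the inherited C{get_parse_list} terminates. Calling either method on a bare C{SketchParser} would recurse until the interpreter's recursion limit is reached, which is the failure the docstring warns about.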