"""
Read tokens, phonemes and audio data from the NLTK TIMIT Corpus.

This corpus contains a selected portion of the TIMIT corpus.

* 16 speakers from 8 dialect regions
* 1 male and 1 female from each dialect region
* total 130 sentences (10 sentences per speaker.  Note that some
  sentences are shared between speakers; in particular, sa1 and sa2
  are spoken by all speakers.)
* total 160 recordings of sentences (10 recordings per speaker)
* audio format: NIST Sphere, single channel, 16kHz sampling,
  16 bit sample, PCM encoding


Module contents
---------------

The timit module provides 4 functions and 4 data items.

* items

  List of items in the corpus.  There are 160 items in total, each of
  which corresponds to a unique utterance by a speaker.  Here's an
  example of an item in the list:

      dr1-fvmh0:sx206
        - _----  _---
        | | |    | |
        | | |    | `--- sentence number
        | | |    `----- sentence type (a:all, i:shared, x:exclusive)
        | | `---------- speaker ID
        | `------------ sex (m:male, f:female)
        `-------------- dialect region (1..8)

* speakers

  List of speaker IDs.  An example of a speaker ID:

      dr1-fvmh0

  Note that if you split an item ID on the colon and take the first
  element of the result, you get a speaker ID.

      >>> itemid = 'dr1-fvmh0:sx206'
      >>> spkrid, sentid = itemid.split(':')
      >>> spkrid
      'dr1-fvmh0'

  The second element of the result is a sentence ID.
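  Continuing the example above:

      >>> sentid
      'sx206'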

* dictionary

  Phonetic dictionary of the words contained in this corpus.  This is a
  Python dictionary that maps words to phoneme lists.
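
  For example, a lookup looks like this (the word and transcription shown
  here are illustrative; actual entries come from timitdic.txt):

      timit.dictionary['she']      # -> a phoneme list such as ['sh', 'iy1']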

* spkrinfo

  Speaker information table.  It's a Python dictionary that maps speaker
  IDs to records of 10 fields.  The speaker IDs are the same as the ones
  in timit.speakers.  Each record is a dictionary from field names to
  values, and the fields are as follows:

    id          speaker ID as defined in the original TIMIT speaker info table
    sex         speaker gender (M:male, F:female)
    dr          speaker dialect region (1:new england, 2:northern,
                3:north midland, 4:south midland, 5:southern,
                6:new york city, 7:western, 8:army brat (moved around))
    use         corpus type (TRN:training, TST:test)
                in this sample corpus only TRN is available
    recdate     recording date
    birthdate   speaker birth date
    ht          speaker height
    race        speaker race (WHT:white, BLK:black, AMR:american indian,
                SPN:spanish-american, ORN:oriental, ???:unknown)
    edu         speaker education level (HS:high school, AS:associate degree,
                BS:bachelor's degree (BS or BA), MS:master's degree (MS or MA),
                PHD:doctorate degree (PhD, JD, MD), ??:unknown)
    comments    comments by the recorder
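
  For example (dr1-fvmh0 is the female speaker from the item example
  above, so her sex field is 'F'):

      >>> timit.spkrinfo['dr1-fvmh0']['sex']
      'F'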

The 4 functions are as follows.

* raw(sentences=items, offset=False)

  Given a list of items, returns an iterator of word lists, each of which
  corresponds to an item (sentence).  If offset is set to True, each
  element of a word list is a tuple of word (string), start offset and
  end offset, where an offset is represented as a number of 16kHz samples.
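
  A minimal usage sketch (after from nltk_lite.corpora import timit),
  using the example item from above:

      for words in timit.raw(sentences='dr1-fvmh0:sx206'):
          print words              # one list of word strings per item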

* phonetic(sentences=items, offset=False)

  Given a list of items, returns an iterator of phoneme lists, each of
  which corresponds to an item (sentence).  If offset is set to True,
  each element of a phoneme list is a tuple of phoneme (string), start
  offset and end offset, where an offset is represented as a number of
  16kHz samples.
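
  For instance, with offsets enabled each entry carries its sample range:

      for phones in timit.phonetic(sentences='dr1-fvmh0:sx206', offset=True):
          print phones[:3]         # [(phoneme, start, end), ...]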

* audiodata(item, start=0, end=None)

  Given an item, returns a chunk of audio samples formatted as a string
  of bytes.  If start and end are omitted, all samples of the recording
  are returned.  If only end is omitted, samples from the start offset
  to the end of the recording are returned.
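
  For example, to grab the first second of the example recording
  (16000 samples at 16kHz):

      data = timit.audiodata('dr1-fvmh0:sx206', start=0, end=16000)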

* play(data)

  Play the given audio samples.  The audio samples can be obtained from
  the timit.audiodata function.
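
  For example, the chunk read above can be played back directly (on
  platforms where playback is supported):

      timit.play(data)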

"""

from nltk_lite.corpora import get_basedir
from nltk_lite import tokenize
from itertools import islice
import sys, os, re, time

# OSS audio playback needs the ossaudiodev module, which is not available
# on every platform; guard the import so the corpus reader still loads
# elsewhere (play() will then report that playback is unsupported).
if sys.platform.startswith('linux') or sys.platform.startswith('freebsd'):
    import ossaudiodev
    PLAY_ENABLED = True
else:
    PLAY_ENABLED = False

__all__ = ["items", "raw", "phonetic", "speakers", "dictionary", "spkrinfo",
           "audiodata", "play"]

PREFIX = os.path.join(get_basedir(), "timit")

speakers = []
items = []
dictionary = {}
spkrinfo = {}

# Scan the corpus directory: each speaker has a directory named like
# dr1-fvmh0, and every .txt file inside it corresponds to one item.
for f in os.listdir(PREFIX):
    if re.match("^dr[0-9]-[a-z]{4}[0-9]$", f):
        speakers.append(f)
        for g in os.listdir(os.path.join(PREFIX, f)):
            if g.endswith(".txt"):
                items.append(f + ':' + g[:-4])
speakers.sort()
items.sort()

# Parse the pronunciation dictionary.  Each non-comment line of
# timitdic.txt holds a word followed by its transcription between
# slashes:  word  /phoneme phoneme .../
for l in open(os.path.join(PREFIX, "timitdic.txt")):
    if l[0] == ';': continue
    a = l.strip().split(None, 1)
    dictionary[a[0]] = a[1].strip('/').split()

# Parse the speaker information table.  The first 54 columns of each
# record hold nine whitespace-separated fields; the rest of the line is
# the free-text comment field.
header = ['id', 'sex', 'dr', 'use', 'recdate', 'birthdate', 'ht', 'race',
          'edu', 'comments']
for l in open(os.path.join(PREFIX, "spkrinfo.txt")):
    if l[0] == ';': continue
    rec = l[:54].split() + [l[54:].strip()]
    key = "dr%s-%s%s" % (rec[2], rec[1].lower(), rec[0].lower())
    spkrinfo[key] = dict((header[i], rec[i]) for i in range(10))
164
166 if isinstance(sentences,str):
167 sentences = [sentences]
168 for sent in sentences:
169 fnam = os.path.sep.join([PREFIX] + sent.split(':')) + ext
170 r = []
171 for l in open(fnam):
172 if not l.strip(): continue
173 a = l.split()
174 if offset:
175 r.append((a[2],int(a[0]),int(a[1])))
176 else:
177 r.append(a[2])
178 yield r
179

def raw(sentences=items, offset=False):
    """
    Given a list of items, returns an iterator of word lists, each of
    which corresponds to an item (sentence).  If offset is set to True,
    each element of a word list is a tuple of word (string), start offset
    and end offset, where an offset is represented as a number of 16kHz
    samples.

    @param sentences: List of items (sentences) for which the tokenized
        word lists will be returned.  If there is only one item, its item
        ID may be passed as a string.
    @type sentences: list of strings or a string
    @param offset: If True, the start and end offsets accompany each word
        in the returned lists.  Note that an offset is represented by the
        number of 16kHz samples.
    @type offset: bool
    @return: Lists of strings (words) if offset is False; lists of tuples
        (word, start offset, end offset) if offset is True.
    """
    return _prim(".wrd", sentences, offset)


def phonetic(sentences=items, offset=False):
    """
    Given a list of items, returns an iterator of phoneme lists, each of
    which corresponds to an item (sentence).  If offset is set to True,
    each element of a phoneme list is a tuple of phoneme (string), start
    offset and end offset, where an offset is represented as a number of
    16kHz samples.

    @param sentences: List of items (sentences) for which the phoneme
        lists will be returned.  If there is only one item, its item ID
        may be passed as a string.
    @type sentences: list of strings or a string
    @param offset: If True, the start and end offsets accompany each
        phoneme in the returned lists.  Note that an offset is represented
        by the number of 16kHz samples.
    @type offset: bool
    @return: Lists of strings (phonemes) if offset is False; lists of
        tuples (phoneme, start offset, end offset) if offset is True.
    """
    return _prim(".phn", sentences, offset)


def audiodata(item, start=0, end=None):
    """
    Given an item, returns a chunk of audio samples formatted as a string
    of bytes.  If start and end are omitted, all samples of the recording
    are returned.  If only end is omitted, samples from the start offset
    to the end of the recording are returned.

    @param item: item ID (e.g. 'dr1-fvmh0:sx206')
    @type item: string
    @param start: start offset
    @type start: integer (number of 16kHz frames)
    @param end: end offset
    @type end: integer (number of 16kHz frames) or None to indicate
        the end of file
    @return: string of sequence of bytes of audio samples
    """
    assert(end is None or end > start)
    headersize = 44
    fnam = os.path.join(PREFIX, item.replace(':', os.path.sep)) + '.wav'
    # each 16-bit sample occupies two bytes, after a 44-byte header
    if end is None:
        data = open(fnam, 'rb').read()
    else:
        data = open(fnam, 'rb').read(headersize + end * 2)
    return data[headersize + start * 2:]


def play(data):
    """
    Play the given audio samples.

    @param data: audio samples
    @type data: string of bytes of audio samples
    """
    if not PLAY_ENABLED:
        print >>sys.stderr, "sorry, currently we don't support audio playback on this platform:", sys.platform
        return

    try:
        dsp = ossaudiodev.open('w')
    except IOError, e:
        print >>sys.stderr, "can't acquire the audio device; please activate your audio device."
        print >>sys.stderr, "system error message:", str(e)
        return

    # the corpus recordings are mono, 16-bit little-endian, 16kHz
    dsp.setfmt(ossaudiodev.AFMT_S16_LE)
    dsp.channels(1)
    dsp.speed(16000)
    dsp.write(data)
    dsp.close()


def demo():
    from nltk_lite.corpora import timit

    print "6th item (timit.items[5])"
    print "-------------------------"
    itemid = timit.items[5]
    spkrid, sentid = itemid.split(':')
    print " item id:    ", itemid
    print " speaker id: ", spkrid
    print " sentence id:", sentid
    print
    record = timit.spkrinfo[spkrid]
    print " speaker information:"
    print "   TIMIT speaker id: ", record['id']
    print "   speaker sex:      ", record['sex']
    print "   dialect region:   ", record['dr']
    print "   data type:        ", record['use']
    print "   recording date:   ", record['recdate']
    print "   date of birth:    ", record['birthdate']
    print "   speaker height:   ", record['ht']
    print "   speaker race:     ", record['race']
    print "   speaker education:", record['edu']
    print "   comments:         ", record['comments']
    print

    print " words of the sentence:"
    print "   ", timit.raw(sentences=itemid).next()
    print

    print " words of the sentence with offsets (first 3):"
    print "   ", timit.raw(sentences=itemid, offset=True).next()[:3]
    print

    print " phonemes of the sentence (first 10):"
    print "   ", timit.phonetic(sentences=itemid).next()[:10]
    print

    print " phonemes of the sentence with offsets (first 3):"
    print "   ", timit.phonetic(sentences=itemid, offset=True).next()[:3]
    print

    print " looking up dictionary for words of the sentence..."
    words = timit.raw(sentences=itemid).next()
    for word in words:
        print "   %-5s:" % word, timit.dictionary[word]
    print


    print "audio playback:"
    print "---------------"
    print " playing sentence", sentid, "by speaker", spkrid, "(a.k.a. %s)" % record["id"], "..."
    data = timit.audiodata(itemid)
    timit.play(data)
    print
    print " playing words:"
    words = timit.raw(sentences=itemid, offset=True).next()
    for word, start, end in words:
        print "   playing %-10s in 1.5 seconds ..." % `word`
        time.sleep(1.5)
        data = timit.audiodata(itemid, start, end)
        timit.play(data)
    print
    print " playing phonemes (first 10):"
    phones = timit.phonetic(sentences=itemid, offset=True).next()
    for phone, start, end in phones[:10]:
        print "   playing %-10s in 1.5 seconds ..." % `phone`
        time.sleep(1.5)
        data = timit.audiodata(itemid, start, end)
        timit.play(data)
    print


    # play sentence sa1 (which is spoken by every speaker) as read by each
    # of the female speakers in the sample
    sentid = 'sa1'
    for spkr in timit.speakers:
        if timit.spkrinfo[spkr]['sex'] == 'F':
            itemid = spkr + ':' + sentid
            print " playing sentence %s of speaker %s ..." % (sentid, spkr)
            data = timit.audiodata(itemid)
            timit.play(data)
    print

if __name__ == '__main__':
    demo()