1 """
2 This module provides utilities for treating Python dictionaries as X{feature
3 structures}. Specifically, it contains the C{unify} function, which can be used
4 to merge the properties of two dictionaries, and the C{Variable} class, which
5 holds an unknown value to be used in unification.
6
7 A X{feature structure} is a mapping from feature names to feature values,
8 where:
9
10 - Each X{feature name} is a case sensitive string.
11 - Each X{feature value} can be a base value (such as a string), a
12 variable, or a nested feature structure.
13
14 However, feature structures are not a specialized class; they are represented
15 by dictionaries, or more generally by anything that responds to the C{has_key}
16 method. The YAML representation can be used to create and display feature
17 structures intuitively:
18
19 >>> f1 = yaml.load('''
20 ... A:
21 ... B: b
22 ... D: d
23 ... ''')
24 >>> f2 = yaml.load('''
25 ... A:
26 ... C: c
27 ... D: d
28 ... ''')
29 >>> print show(unify(f1, f2))
30 A:
31 B: b
32 C: c
33 D: d
34
35 Feature structures are typically used to represent partial information
36 about objects. A feature name that is not mapped to a value stands
37 for a feature whose value is unknown (I{not} a feature without a
38 value). Two feature structures that represent (potentially
39 overlapping) information about the same object can be combined by
40 X{unification}. When two inconsistant feature structures are unified,
41 the unification fails and raises an error.
42
43 Features can be specified using X{feature paths}, or tuples of feature names
44 that specify paths through the nested feature structures to a value.
45
46 Feature structures may contain reentrant feature values. A
47 X{reentrant feature value} is a single feature value that can be
48 accessed via multiple feature paths. Unification preserves the
49 reentrance relations imposed by both of the unified feature
50 structures. After unification, any extensions to a reentrant feature
51 value will be visible using any of its feature paths.
52
53 Feature structure variables are encoded using the L{Variable} class. The scope
54 of a variable is determined by the X{bindings} used when the structure
55 including that variable is unified. Bindings can be reused between unifications
56 to ensure that variables with the same name get the same value.
57
58 """
59
60 from copy import copy, deepcopy
61 import re
62 import yaml
63
64 import sys
65
67 """
68 An exception that is raised when two values cannot be unified.
69 """
70 pass
71
73 """
74 Determine whether to treat a given object as a feature structure. The
75 test is whether it responds to C{has_key}. This can be overridden if the
76 object includes an attribute or method called C{_no_feature}.
77
78 @param obj: The object to be tested
79 @type obj: C{object}
80 @return: True iff the object can be treated as a feature structure
81 @rtype: C{bool}
82 """
83 return ('has_key' in dir(obj)) and ('_no_feature' not in dir(obj))
84
86 """
87 _FORWARD is a singleton value, used in unification as a flag that a value
88 has been forwarded to another object.
89
90 This class itself is used as the singleton value. It cannot be
91 instantiated.
92 """
94 raise TypeError, "The _FORWARD class is not meant to be instantiated"
95
97 """
98 A Variable is an object that can be used in unification to hold an
99 initially unknown value. Two equivalent Variables, for example, can be used
100 to require that two features have the same value.
101
102 When a Variable is assigned a value, it will eventually be replaced by
103 that value. However, in order to make that value show up everywhere the
104 variable appears, the Variable temporarily stores its assigned value and
105 becomes a I{bound variable}. Bound variables do not appear in the results
106 of unification.
107
108 Variables are distinguished by their name, and by the dictionary of
109 I{bindings} that is being used to determine their values. Two variables can
110 have the same name but be associated with two different binding
111 dictionaries: those variables are not equal.
112 """
113 _next_numbered_id = 1
114
115 - def __init__(self, name=None, value=None):
116 """
117 Construct a new feature structure variable.
118
119 The value should be left at its default of None; it is only used
120 internally to copy bound variables.
121
122 @type name: C{string}
123 @param name: An identifier for this variable. Two C{Variable} objects
124 with the same name will be given the same value in a given dictionary
125 of bindings.
126 """
127 self._uid = Variable._next_numbered_id
128 Variable._next_numbered_id += 1
129 if name is None: name = self._uid
130 self._name = str(name)
131 self._value = value
132
134 """
135 @return: This variable's name.
136 @rtype: C{string}
137 """
138 return self._name
139
141 """
142 If this varable is bound, find its value. If it is unbound or aliased
143 to an unbound variable, returns None.
144
145 @return: The value of this variable, if any.
146 @rtype: C{object}
147 """
148 if isinstance(self._value, Variable): return self._value.value()
149 else: return self._value
151 """
152 @return: A copy of this variable.
153 @rtype: C{Variable}
154 """
155 return Variable(self.name(), self.value())
156
158 """
159 Variables are aliased to other variables by one variable _forwarding_
160 to the other. The first variable simply has the second as its value,
161 but it acts like the second variable's _value_ is its value.
162
163 forwarded_self returns the final Variable object that actually stores
164 the value.
165
166 @return: The C{Variable} responsible for storing this variable's value.
167 @rtype: C{Variable}
168 """
169 if isinstance(self._value, Variable):
170 return self._value.forwarded_self()
171 else: return self
172
173 - def bindValue(self, value, ourbindings, otherbindings):
174 """
175 Bind this variable to a value. C{ourbindings} are the bindings that
176 accompany the feature structure this variable came from;
177 C{otherbindings} are the bindings from the structure it's being unified
178 with.
179
180 @type value: C{object}
181 @param value: The value to be assigned.
182 @type ourbindings: C{dict}
183 @param ourbindings: The bindings associated with this variable.
184 @type otherbindings: C{dict}
185 @param otherbindings: The bindings associated with the value being
186 assigned. (May be identical to C{ourbindings}.)
187 """
188 if isinstance(self._value, Variable):
189
190 return self._value.bindValue(value, ourbindings, otherbindings)
191 if self._value is None:
192
193 self._value = value
194 else:
195
196
197 self._value = unify(self._value, value, ourbindings, otherbindings)
198
199 - def forwardTo(self, other, ourbindings, otherbindings):
200 """
201 A unification wants this variable to be aliased to another variable.
202 Forward this variable to the other one, and return the other.
203
204 @type other: C{Variable}
205 @param other: The variable to replace this one.
206 @type ourbindings: C{dict}
207 @param ourbindings: The bindings associated with this variable.
208 @type otherbindings: C{dict}
209 @param otherbindings: The bindings associated with the other variable.
210 (May be identical to C{ourbindings}.)
211 @return: C{other}
212 @rtype: C{Variable}
213 """
214 other.bindValue(self.value(), ourbindings, otherbindings)
215 self._value = other
216 return other
217
218 - def __hash__(self): return hash(self._uid)
220 """
221 Variables are equal if they are the same object or forward to the
222 same object. Variables with the same name may still be unequal.
223 """
224 if not isinstance(other, Variable): return -1
225 if isinstance(self._value, Variable): return cmp(self._value, other)
226 else: return cmp(self._uid, other._uid)
228 if self._value is None: return '?%s' % self._name
229 else: return '?%s: %r' % (self._name, self._value)
230
232 """
233 Works like yaml.dump(), but with output suited for doctests. Flow style
234 is always off, and there is no blank line at the end.
235 """
236 return yaml.dump(data, default_flow_style=False).strip()
237
239 "Output variables in YAML as ?name."
240 return dumper.represent_scalar(u'!var', u'?%s' % var.name())
241 yaml.add_representer(Variable, variable_representer)
242
248 yaml.add_constructor(u'!var', variable_constructor)
249 yaml.add_implicit_resolver(u'!var', re.compile(r'^\?\w+$'))
250
252 """
253 Make a deep copy of a feature structure, preserving reentrance using the
254 C{memo} dictionary. Meanwhile, variables are replaced by their bound
255 values, if these values are already known, and variables with unknown
256 values are given placeholder bindings.
257 """
258 if memo is None: memo = {}
259 if id(feature) in memo: return memo[id(feature)]
260 if isinstance(feature, Variable) and bindings is not None:
261 if not bindings.has_key(feature.name()):
262 bindings[feature.name()] = feature.copy()
263 result = _copy_and_bind(bindings[feature.name()], None, memo)
264 else:
265 if isMapping(feature):
266
267 result = feature.__class__()
268 for (key, value) in feature.items():
269 result[key] = _copy_and_bind(value, bindings, memo)
270 else: result = feature
271 memo[id(feature)] = result
272 memo[id(result)] = result
273 return result
274
275 -def unify(feature1, feature2, bindings1=None, bindings2=None):
276 """
277 In general, the 'unify' procedure takes two values, and either returns a
278 value that provides the information provided by both values, or fails if
279 that is impossible.
280
281 These values can have any type, but fall into a few general cases:
282 - Values that respond to C{has_key} represent feature structures. The
283 C{unify} procedure will recurse into them and unify their inner values.
284 - L{Variable}s represent an unknown value, and are handled specially.
285 The values assigned to variables are tracked using X{bindings}.
286 - C{None} represents the absence of information.
287 - Any other value is considered a X{base value}. Base values are
288 compared to each other with the == operation.
289
290 The value 'None' represents the absence of any information. It specifies no
291 properties and acts as the identity in unification.
292 >>> unify(3, None)
293 3
294
295 >>> unify(None, 'fish')
296 'fish'
297
298 A base value unifies with itself, but not much else.
299 >>> unify(True, True)
300 True
301
302 >>> unify([1], [1])
303 [1]
304
305 >>> unify('a', 'b')
306 Traceback (most recent call last):
307 ...
308 UnificationFailure
309
310 When two mappings (representing feature structures, and usually implemented
311 as dictionaries) are unified, any chain of keys that accesses a value in
312 either mapping will access an equivalent or more specific value in the
313 unified mapping. If this is not possible, UnificationFailure is raised.
314
315 >>> f1 = dict(A=dict(B='b'))
316 >>> f2 = dict(A=dict(C='c'))
317 >>> unify(f1, f2) == dict(A=dict(B='b', C='c'))
318 True
319
320 The empty dictionary specifies no features. It unifies with any mapping.
321 >>> unify({}, dict(foo='bar'))
322 {'foo': 'bar'}
323
324 >>> unify({}, True)
325 Traceback (most recent call last):
326 ...
327 UnificationFailure
328
329 Representing dictionaries in YAML form is useful for making feature
330 structures readable:
331
332 >>> f1 = yaml.load("number: singular")
333 >>> f2 = yaml.load("person: 3")
334 >>> print show(unify(f1, f2))
335 number: singular
336 person: 3
337
338 >>> f1 = yaml.load('''
339 ... A:
340 ... B: b
341 ... D: d
342 ... ''')
343 >>> f2 = yaml.load('''
344 ... A:
345 ... C: c
346 ... D: d
347 ... ''')
348 >>> print show(unify(f1, f2))
349 A:
350 B: b
351 C: c
352 D: d
353
354 Variables are names for unknown values. Variables are assigned values
355 that will make unification succeed. The values of variables can be reused
356 in later unifications if you provide a dictionary of _bindings_ from
357 variables to their values.
358 >>> bindings = {}
359 >>> print unify(Variable('x'), 5, bindings)
360 5
361
362 >>> print bindings
363 {'x': 5}
364
365 >>> print unify({'a': Variable('x')}, {}, bindings)
366 {'a': 5}
367
368 The same variable name can be reused in different binding dictionaries
369 without collision. In some cases, you may want to provide two separate
370 binding dictionaries to C{unify} -- one for each feature structure, so
371 their variables do not collide.
372
373 In the following examples, two different feature structures use the
374 variable ?x to require that two values are equal. The values assigned to
375 ?x are consistent within each structure, but would be inconsistent if every
376 ?x had to have the same value.
377
378 >>> f1 = yaml.load('''
379 ... a: 1
380 ... b: 1
381 ... c: ?x
382 ... d: ?x
383 ... ''')
384 >>> f2 = yaml.load('''
385 ... a: ?x
386 ... b: ?x
387 ... c: 2
388 ... d: 2
389 ... ''')
390 >>> bindings1 = {}
391 >>> bindings2 = {}
392 >>> print show(unify(f1, f2, bindings1, bindings2))
393 a: 1
394 b: 1
395 c: 2
396 d: 2
397
398 >>> print bindings1
399 {'x': 2}
400
401 >>> print bindings2
402 {'x': 1}
403
404 Feature structures can involve _reentrant_ values, where multiple feature
405 paths lead to the same value. This is represented by the features having
406 the same Python object as a value. (This kind of identity can be tested
407 using the C{is} operator.)
408
409 Unification preserves the properties of reentrance. So if a reentrant value
410 is changed by unification, it is changed everywhere it occurs, and it is
411 still reentrant. Reentrant features can even form cycles; these
412 cycles can now be printed through the current YAML library.
413
414 >>> f1 = yaml.load('''
415 ... A: &1 # &1 defines a reference in YAML...
416 ... B: b
417 ... E:
418 ... F: *1 # and *1 uses the previously defined reference.
419 ... ''')
420 >>> f1['E']['F']['B']
421 'b'
422 >>> f1['A'] is f1['E']['F']
423 True
424 >>> f2 = yaml.load('''
425 ... A:
426 ... C: c
427 ... E:
428 ... F:
429 ... D: d
430 ... ''')
431 >>> f3 = unify(f1, f2)
432 >>> print show(f3)
433 A: &id001
434 B: b
435 C: c
436 D: d
437 E:
438 F: *id001
439 >>> f3['A'] is f3['E']['F'] # Showing that the reentrance still holds.
440 True
441
442 This unification creates a cycle:
443 >>> f1 = yaml.load('''
444 ... F: &1 {}
445 ... G: *1
446 ... ''')
447 >>> f2 = yaml.load('''
448 ... F:
449 ... H: &2 {}
450 ... G: *2
451 ... ''')
452 >>> f3 = unify(f1, f2)
453 >>> print f3
454 {'G': {'H': {...}}, 'F': {'H': {...}}}
455 >>> print f3['F'] is f3['G']
456 True
457 >>> print f3['F'] is f3['G']['H']
458 True
459 >>> print f3['F'] is f3['G']['H']['H']
460 True
461
462 A cycle can also be created using variables instead of reentrance.
463 Here we supply a single set of bindings, so that it is used on both sides
464 of the unification, making ?x mean the same thing in both feature
465 structures.
466
467 >>> f1 = yaml.load('''
468 ... F:
469 ... H: ?x
470 ... ''')
471 >>> f2 = yaml.load('''
472 ... F: ?x
473 ... ''')
474 >>> f3 = unify(f1, f2, {})
475 >>> print f3
476 {'F': {'H': {...}}}
477 >>> print f3['F'] is f3['F']['H']
478 True
479 >>> print f3['F'] is f3['F']['H']['H']
480 True
481
482 Two sets of bindings can be provided because the variable names on each
483 side of the unification may be unrelated. An example involves unifying the
484 following two structures, which each require that two values are
485 equivalent, and happen to both use ?x to express that requirement.
486
487 >>> f1 = yaml.load('''
488 ... a: 1
489 ... b: 1
490 ... c: ?x
491 ... d: ?x
492 ... ''')
493 >>> f2 = yaml.load('''
494 ... a: ?x
495 ... b: ?x
496 ... c: 2
497 ... d: 2
498 ... ''')
499 >>> bindings1 = {}
500 >>> bindings2 = {}
501 >>> # We could avoid defining two empty dictionaries by simply using the
502 >>> # defaults, with unify(f1, f2) -- but we want to be able to examine
503 >>> # the bindings afterward.
504 >>> print show(unify(f1, f2, bindings1, bindings2))
505 a: 1
506 b: 1
507 c: 2
508 d: 2
509 >>> print bindings1
510 {'x': 2}
511 >>> print bindings2
512 {'x': 1}
513
514 If a variable is unified with another variable, the two variables are
515 _aliased_ to each other; they share the same value, similarly to reentrant
516 feature structures. This is represented in a set of bindings as one
517 variable having the other as its value.
518 >>> f1 = yaml.load('''
519 ... a: ?x
520 ... b: ?x
521 ... ''')
522 >>> f2 = yaml.load('''
523 ... b: ?y
524 ... c: ?y
525 ... ''')
526 >>> bindings = {}
527 >>> print show(unify(f1, f2, bindings))
528 a: &id001 ?y
529 b: *id001
530 c: *id001
531 >>> print bindings
532 {'x': ?y}
533
534 Reusing the same variable bindings ensures that appropriate bindings are
535 made after the fact:
536 >>> bindings = {}
537 >>> f1 = {'a': Variable('x')}
538 >>> f2 = unify(f1, {'a': {}}, bindings)
539 >>> f3 = unify(f2, {'b': Variable('x')}, bindings)
540 >>> print show(f3)
541 a: &id001 {}
542 b: *id001
543 >>> print bindings
544 {'x': {}}
545
546 @param feature1: The first object to be unified.
547 @type feature1: C{object} (probably a mapping)
548 @param feature2: The second object to be unified.
549 @type feature2: C{object} (probably a mapping)
550 @param bindings1: The variable bindings associated with the first object.
551 @type bindings1: C{dict} or None
552 @param bindings2: The variable bindings associated with the second object,
553 if these are distinct from C{bindings1}.
554 @type bindings2: C{dict} or None
555 @return: The result of unifying the two objects.
556 @rtype: C{object} (probably a mapping)
557 """
558
559 if bindings1 is None and bindings2 is None:
560 bindings1 = {}
561 bindings2 = {}
562 else:
563 if bindings1 is None: bindings1 = {}
564 if bindings2 is None: bindings2 = bindings1
565
566
567
568
569 copymemo = {}
570 copy1 = _copy_and_bind(feature1, bindings1, copymemo)
571 copy2 = _copy_and_bind(feature2, bindings2, copymemo)
572
573 for b in (bindings1, bindings2):
574 for (vname, value) in b.items():
575 value_id = id(value)
576 if value_id in copymemo:
577 b[vname] = copymemo[value_id]
578
579
580 unified = _destructively_unify(copy1, copy2, bindings1, bindings2, {})
581
582 _apply_forwards_to_bindings(bindings1)
583 _apply_forwards_to_bindings(bindings2)
584 _apply_forwards(unified, {})
585 unified = _lookup_values(unified, {}, remove=False)
586 _lookup_values(bindings1, {}, remove=True)
587 _lookup_values(bindings2, {}, remove=True)
588
589 return unified
590
592 """
593 Attempt to unify C{self} and C{other} by modifying them
594 in-place. If the unification succeeds, then C{self} will
595 contain the unified value, and the value of C{other} is
596 undefined. If the unification fails, then a
597 UnificationFailure is raised, and the values of C{self}
598 and C{other} are undefined.
599 """
600 if memo.has_key((id(feature1), id(feature2))):
601 return memo[id(feature1), id(feature2)]
602 unified = _do_unify(feature1, feature2, bindings1, bindings2, memo)
603 memo[id(feature1), id(feature2)] = unified
604 return unified
605
606 -def _do_unify(feature1, feature2, bindings1, bindings2, memo):
607 """
608 Do the actual work of _destructively_unify when the result isn't memoized.
609 """
610
611
612 if feature1 is None: return feature2
613 if feature2 is None: return feature1
614 if feature1 is feature2: return feature1
615
616
617 if isinstance(feature1, Variable):
618 if isinstance(feature2, Variable):
619
620
621 return feature1.forwardTo(feature2, bindings1, bindings2)
622 else:
623 feature1.bindValue(feature2, bindings1, bindings2)
624 return feature1
625 if isinstance(feature2, Variable):
626 feature2.bindValue(feature1, bindings2, bindings1)
627 return feature2
628
629
630
631 if not isMapping(feature1):
632 if feature1 == feature2: return feature1
633 else:
634 raise UnificationFailure
635 if not isMapping(feature2): raise UnificationFailure
636
637
638
639
640 while feature2.has_key(_FORWARD): feature2 = feature2[_FORWARD]
641 feature2[_FORWARD] = feature1
642 for (fname, val2) in feature2.items():
643 if fname == _FORWARD: continue
644 val1 = feature1.get(fname)
645 feature1[fname] = _destructively_unify(val1, val2, bindings1, bindings2, memo)
646 return feature1
647
663
665 """
666 The unification procedure creates _bound variables_, which are Variable
667 objects that have been assigned a value. Bound variables are not useful
668 in the end result, however, so they should be replaced by their values.
669
670 This procedure takes a mapping, which may be a feature structure or a
671 binding dictionary, and replaces bound variables with their values.
672
673 If the dictionary is a binding dictionary, then 'remove' should be set to
674 True. This ensures that unbound, unaliased variables are removed from the
675 dictionary. If the variable name 'x' is mapped to the unbound variable ?x,
676 then, it should be removed. This is not done with features, because a
677 feature named 'x' can of course have a variable ?x as its value.
678 """
679 if isinstance(mapping, Variable):
680
681
682
683
684
685 var = mapping
686 if var.value() is not None:
687 return var.value()
688 else:
689 return var.forwarded_self()
690 if not isMapping(mapping): return mapping
691 if visited.has_key(id(mapping)): return mapping
692 visited[id(mapping)] = True
693
694 for fname, fval in mapping.items():
695 if isMapping(fval):
696 _lookup_values(fval, visited)
697 elif isinstance(fval, Variable):
698 if fval.value() is not None:
699 mapping[fname] = fval.value()
700 if isMapping(mapping[fname]):
701 _lookup_values(mapping[fname], visited)
702 else:
703 newval = fval.forwarded_self()
704 if remove and newval.name() == fname:
705 del mapping[fname]
706 else:
707 mapping[fname] = newval
708 return mapping
709
720
722 "Run unit tests on unification."
723 import doctest
724 doctest.testmod()
725
726 if __name__ == "__main__":
727 test()
728