Package morfologik.fsa
Class FSA
java.lang.Object
morfologik.fsa.FSA
- All Implemented Interfaces:
Iterable<ByteBuffer>
- Direct Known Subclasses:
CFSA
,CFSA2
,ConstantArcSizeFSA
,FSA5
This is a top abstract class for handling finite state automata. These
automata are arc-based, a design described in Jan Daciuk's Incremental
Construction of Finite-State Automata and Transducers, and Their Use in the
Natural Language Processing (PhD thesis, Technical University of Gdansk).
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionabstract int
getArc
(int node, byte label) int
getArcCount
(int node) abstract byte
getArcLabel
(int arc) abstract int
getEndNode
(int arc) abstract int
getFirstArc
(int node) getFlags()
abstract int
getNextArc
(int arc) int
getRightLanguageCount
(int node) abstract int
final Iterable
<ByteBuffer> getSequences
(int node) Returns an iterator over all binary sequences starting at the given FSA state (node) and ending in final nodes.abstract boolean
isArcFinal
(int arc) abstract boolean
isArcTerminal
(int arc) final Iterator
<ByteBuffer> iterator()
Returns an iterator over all binary sequences starting from the initial FSA state (node) and ending in final nodes.static FSA
read
(InputStream stream) A factory for reading automata in any of the supported versions.static <T extends FSA>
Tread
(InputStream stream, Class<? extends T> clazz) A factory for reading a specific FSA subclass, including proper casting.protected static final byte[]
<T extends StateVisitor>
TvisitAllStates
(T v) Visit all states.private boolean
visitInPostOrder
(StateVisitor v, int node, BitSet visited) Private recursion.<T extends StateVisitor>
TvisitInPostOrder
(T v) Same asvisitInPostOrder(StateVisitor, int)
, starting from root automaton node.<T extends StateVisitor>
TvisitInPostOrder
(T v, int node) Visits all states reachable fromnode
in postorder.private void
visitInPreOrder
(StateVisitor v, int node, BitSet visited) Private recursion.<T extends StateVisitor>
TvisitInPreOrder
(T v) Same asvisitInPreOrder(StateVisitor, int)
, starting from root automaton node.<T extends StateVisitor>
TvisitInPreOrder
(T v, int node) Visits all states in preorder.Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Constructor Details
-
FSA
public FSA()
-
-
Method Details
-
getRootNode
public abstract int getRootNode()- Returns:
- Returns the identifier of the root node of this automaton. Returns 0 if the start node is also the end node (the automaton is empty).
-
getFirstArc
public abstract int getFirstArc(int node) - Parameters:
node
- Identifier of the node.- Returns:
- Returns the identifier of the first arc leaving
node
or 0 if the node has no outgoing arcs.
-
getNextArc
public abstract int getNextArc(int arc) - Parameters:
arc
- The arc's identifier.- Returns:
- Returns the identifier of the next arc after
arc
and leavingnode
. Zero is returned if no more arcs are available for the node.
-
getArc
public abstract int getArc(int node, byte label) - Parameters:
node
- Identifier of the node.label
- The arc's label.- Returns:
- Returns the identifier of an arc leaving
node
and labeled withlabel
. An identifier equal to 0 means the node has no outgoing arc labeledlabel
.
-
getArcLabel
public abstract byte getArcLabel(int arc) - Parameters:
arc
- The arc's identifier.- Returns:
- Return the label associated with a given
arc
.
-
isArcFinal
public abstract boolean isArcFinal(int arc) - Parameters:
arc
- The arc's identifier.- Returns:
- Returns
true
if the destination node at the end of thisarc
corresponds to an input sequence created when building this automaton.
-
isArcTerminal
public abstract boolean isArcTerminal(int arc) - Parameters:
arc
- The arc's identifier.- Returns:
- Returns
true
if thisarc
does not have a terminating node (@linkgetEndNode(int)
will throw an exception). ImpliesisArcFinal(int)
.
-
getEndNode
public abstract int getEndNode(int arc) - Parameters:
arc
- The arc's identifier.- Returns:
- Return the end node pointed to by a given
arc
. Terminal arcs (those that point to a terminal state) have no end node representation and throw a runtime exception.
-
getFlags
- Returns:
- Returns a set of flags for this FSA instance.
-
getArcCount
public int getArcCount(int node) - Parameters:
node
- Identifier of the node.- Returns:
- Calculates and returns the number of arcs of a given node.
-
getRightLanguageCount
public int getRightLanguageCount(int node) - Parameters:
node
- Identifier of the node.- Returns:
- Returns the number of sequences reachable from the given state if
the automaton was compiled with
FSAFlags.NUMBERS
. The size of the right language of the state, in other words. - Throws:
UnsupportedOperationException
- If the automaton was not compiled withFSAFlags.NUMBERS
. The value can then be computed by manual count ofgetSequences(int)
.
-
getSequences
Returns an iterator over all binary sequences starting at the given FSA state (node) and ending in final nodes. This corresponds to a set of suffixes of a given prefix from all sequences stored in the automaton.The returned iterator is a
ByteBuffer
whose contents changes on each call toIterator.next()
. The keep the contents between calls toIterator.next()
, one must copy the buffer to some other location.Important. It is guaranteed that the returned byte buffer is backed by a byte array and that the content of the byte buffer starts at the array's index 0.
- Parameters:
node
- Identifier of the starting node from which to return subsequences.- Returns:
- An iterable over all sequences encoded starting at the given node.
-
getSequences
- Returns:
- Returns all sequences encoded in the automaton.
-
iterator
Returns an iterator over all binary sequences starting from the initial FSA state (node) and ending in final nodes. The returned iterator is aByteBuffer
whose contents changes on each call toIterator.next()
. The keep the contents between calls toIterator.next()
, one must copy the buffer to some other location.Important. It is guaranteed that the returned byte buffer is backed by a byte array and that the content of the byte buffer starts at the array's index 0.
- Specified by:
iterator
in interfaceIterable<ByteBuffer>
-
visitAllStates
Visit all states. The order of visiting is undefined. This method may be faster than traversing the automaton in post or preorder since it can scan states linearly. Returning false fromStateVisitor.accept(int)
immediately terminates the traversal.- Type Parameters:
T
- A subclass ofStateVisitor
.- Parameters:
v
- Visitor to receive traversal calls.- Returns:
- Returns the argument (for access to anonymous class fields).
-
visitInPostOrder
Same asvisitInPostOrder(StateVisitor, int)
, starting from root automaton node.- Type Parameters:
T
- A subclass ofStateVisitor
.- Parameters:
v
- Visitor to receive traversal calls.- Returns:
- Returns the argument (for access to anonymous class fields).
-
visitInPostOrder
Visits all states reachable fromnode
in postorder. Returning false fromStateVisitor.accept(int)
immediately terminates the traversal.- Type Parameters:
T
- A subclass ofStateVisitor
.- Parameters:
v
- Visitor to receive traversal calls.node
- Identifier of the node.- Returns:
- Returns the argument (for access to anonymous class fields).
-
visitInPostOrder
Private recursion. -
visitInPreOrder
Same asvisitInPreOrder(StateVisitor, int)
, starting from root automaton node.- Type Parameters:
T
- A subclass ofStateVisitor
.- Parameters:
v
- Visitor to receive traversal calls.- Returns:
- Returns the argument (for access to anonymous class fields).
-
visitInPreOrder
Visits all states in preorder. Returning false fromStateVisitor.accept(int)
skips traversal of all sub-states of a given state.- Type Parameters:
T
- A subclass ofStateVisitor
.- Parameters:
v
- Visitor to receive traversal calls.node
- Identifier of the node.- Returns:
- Returns the argument (for access to anonymous class fields).
-
readRemaining
- Parameters:
in
- The input stream.- Returns:
- Reads all remaining bytes from an input stream and returns them as a byte array.
- Throws:
IOException
- Rethrown if an I/O exception occurs.
-
visitInPreOrder
Private recursion. -
read
A factory for reading automata in any of the supported versions.- Parameters:
stream
- The input stream to read automaton data from. The stream is not closed.- Returns:
- Returns an instantiated automaton. Never null.
- Throws:
IOException
- If the input stream does not represent an automaton or is otherwise invalid.
-
read
public static <T extends FSA> T read(InputStream stream, Class<? extends T> clazz) throws IOException A factory for reading a specific FSA subclass, including proper casting.- Type Parameters:
T
- A subclass ofFSA
to cast the read automaton to.- Parameters:
stream
- The input stream to read automaton data from. The stream is not closed.clazz
- A subclass ofFSA
to cast the read automaton to.- Returns:
- Returns an instantiated automaton. Never null.
- Throws:
IOException
- If the input stream does not represent an automaton, is otherwise invalid or the class of the automaton read from the input stream is not assignable toclazz
.
-