Commit | Line | Data |
---|---|---|
34e49164 C |
1 | (** |
2 | Generalized suffix trees (GSTs). | |
3 | ||
4 | Computes generalized suffix trees from list of strings. A terminal symbol is | |
5 | implicitly added to them, but is not returned in the word labeling nodes and | |
6 | leaves. This should allow a rather transparent handling of GSTs. | |
7 | ||
8 | Node-based accesses are provided (sequences, root, children, suffix links, | |
9 | node labels, index), as well as a functional for synthesizing attributes from | |
10 | a GST. A readable representation of GSTs is derived from the later. | |
ae4735db | 11 | |
34e49164 C |
12 | *) |
13 | (* made by Sebastien Ferre *) | |
14 | ||
15 | (* extension by Yoann Padioleau: the function add, allowing to extend a gst | |
16 | (internally use now a DynArray instead of an Array) | |
17 | *) | |
18 | ||
19 | type node | |
20 | (** Type of nodes in GSTs. *) | |
21 | ||
22 | type t | |
23 | (** Type of GSTs. *) | |
24 | ||
25 | val make : string list -> t | |
26 | (** [make l_str] computes a GST based on the set of strings given in [l_str]. *) | |
27 | ||
28 | val add : string -> t -> unit | |
29 | (** [add l_str gst] add a new string in the GST. Does it via a side effect. *) | |
30 | ||
31 | val string_list : t -> string list | |
32 | (** [string_list gst] returns the list of strings from which [gst] was computed. *) | |
33 | ||
34 | val string : t -> int -> string | |
35 | (** [string gst k] returns the sequence number [k] (starting from 0). *) | |
36 | ||
37 | val root : t -> node | |
38 | (** [root gst] returns the root node of the gst. *) | |
39 | ||
40 | val word : t -> node -> string | |
41 | (** [word gst n] returns the word labeling node [n] in [gst]. *) | |
42 | ||
43 | val children : t -> node -> node list | |
44 | (** [children gst n] returns a list of the children nodes of [n] in [gst]. *) | |
45 | ||
46 | val linked_node : t -> node -> node | |
47 | (** [linked_node gst n] returns the node pointed by the suffix link from [n] in [gst]. *) | |
48 | ||
49 | val index : t -> node -> int * int | |
50 | (** [index gst n] returns the index of a leaf [n] in [gst]. | |
51 | This index is a pair [(k,i)], where [k] is the number of the sequence (as used by [string]), and | |
52 | [i] is the position of the related suffix (starting from [0] as usual in strings). | |
53 | @raise Invalid_argument "Suffix_tree.index: not a leaf" if [n] is not a leaf (has some child). *) | |
54 | ||
55 | val implicit_node : t -> string -> node * string * node | |
56 | (** [implicit_node gst word] returns an implicit_node [(node,word',child)], where [node] is the lowest | |
57 | node in the suffix tre such that the concatenation of the word recognized by [node] and [word'] is | |
58 | equal to [word], if [word'] is not the empty string, then [child] is the child node of [node], whose | |
59 | label has [word'] as a prefix. | |
60 | @raise Not_found when [word] is not a substring of [string_list gst]. *) | |
61 | ||
62 | ||
63 | val fold : t -> ('h -> node -> bool) -> ('h -> node -> 'h) -> ('s list -> 'h -> node -> 's) -> 'h -> 's | |
64 | (** [fold gst filter herit synth init] computes some attribute(s) over a GST by using the 3 functions | |
65 | [filter], [herit], [synth], and the initial value [init] inherited by the root node. ['h] is the type | |
66 | of inherited attributes, and ['s] is the type of synthesized attributes, and so the type of the result. | |
67 | ||
68 | The meaning of 3 functions is as follows: | |
69 | - [filter h child] returns [true] if the node [child] must be explored given the inherited value of the current | |
70 | node (parent of [child]), | |
71 | - [herit h child] returns the value inherited by [child] given the inherited value of the current node | |
72 | (parent of [child]), | |
73 | - [synth l h node] returns the synthesized value of the current node, given its inherited value [h], and | |
74 | the list [l] of synthesized values of explored children of [node] (according to [filter]). | |
75 | ||
76 | *) | |
77 | ||
78 | val fold_node : t -> ('h -> node -> bool) -> ('h -> node -> 'h) -> ('s list -> 'h -> node -> 's) -> 'h -> node -> 's | |
79 | (** Same as [fold], except the computation starts and finishes at the last argument node. *) | |
80 | ||
81 | val fold_s : t -> ('s list -> node -> 's) -> 's | |
ae4735db | 82 | (** [fold_s gst synth] is equivalent to [fold gst filter herit synth init], where there is no filtering, and |
34e49164 C |
83 | no inherited values: purely synthetic. *) |
84 | ||
85 | val fold_s_node : t -> ('s list -> node -> 's) -> node -> 's | |
86 | (** Same as [fold_s], except the computation starts and finishes at the last argument node. *) | |
87 | ||
88 | val fold_fs : t -> (node -> bool) -> ('s list -> node -> 's) -> 's | |
89 | (** [fold_fs gst filter synth] is equivalent to [fold gst filter herit synth init], where there is no inherited | |
90 | values. *) | |
91 | ||
92 | type tree = Node of string * tree list | Leaf of string * (int * int) | |
93 | val readable : t -> tree | |
94 | (** [readable gst] returns a (more) readable representation of [gst]. | |
95 | Each node and leaf is decorated by its word label, and leaves are | |
96 | also decorated by their index. *) | |
97 | ||
98 | val exact_matches : t -> string -> (int * int) list |