Commit | Line | Data |
---|---|---|
7f918cf1 CE |
1 | TypeIndexedValues |
2 | ================= | |
3 | ||
4 | <:StandardML:Standard ML> does not support ad hoc polymorphism. This | |
5 | presents a challenge to programmers. The problem is that at first | |
6 | glance there seems to be no practical way to implement something like | |
7 | a function for converting a value of any type to a string or a | |
8 | function for computing a hash value for a value of any type. | |
9 | Fortunately there are ways to implement type-indexed values in SML as | |
10 | discussed in <!Cite(Yang98)>. Various articles such as | |
11 | <!Cite(Danvy98)>, <!Cite(Ramsey11)>, <!Cite(Elsman04)>, | |
12 | <!Cite(Kennedy04)>, and <!Cite(Benton05)> also contain examples of | |
13 | type-indexed values. | |
14 | ||
15 | *NOTE:* The technique used in the following example uses an early (and | |
16 | somewhat broken) variation of the basic technique used in an | |
17 | experimental generic programming library (see | |
18 | <!ViewGitFile(mltonlib,master,com/ssh/generic/unstable/README)>) that can | |
19 | be found from the MLton repository. The generic programming library | |
20 | also includes a more advanced generic pretty printing function (see | |
21 | <!ViewGitFile(mltonlib,master,com/ssh/generic/unstable/public/value/pretty.sig)>). | |
22 | ||
23 | == Example: Converting any SML value to (roughly) SML syntax == | |
24 | ||
25 | Consider the problem of converting any SML value to a textual | |
26 | presentation that matches the syntax of SML as closely as possible. | |
27 | One solution is a type-indexed function that maps a given type to a | |
28 | function that maps any value (of the type) to its textual | |
29 | presentation. A type-indexed function like this can be useful for a | |
30 | variety of purposes. For example, one could use it to show debugging | |
31 | information. We'll call this function "`show`". | |
32 | ||
33 | We'll do a fairly complete implementation of `show`. We do not | |
34 | distinguish infix and nonfix constructors, but that is not an | |
35 | intrinsic property of SML datatypes. We also don't reconstruct a type | |
36 | name for the value, although it would be particularly useful for | |
37 | functional values. To reconstruct type names, some changes would be | |
38 | needed and the reader is encouraged to consider how to do that. A | |
39 | more realistic implementation would use some pretty printing | |
40 | combinators to compute a layout for the result. This should be a | |
41 | relatively easy change (given a suitable pretty printing library). | |
42 | Cyclic values (through references and arrays) do not have a standard | |
43 | textual presentation and it is impossible to convert arbitrary | |
44 | functional values (within SML) to a meaningful textual presentation. | |
45 | Finally, it would also make sense to show sharing of references and | |
46 | arrays. We'll leave these improvements to an actual library | |
47 | implementation. | |
48 | ||
49 | The following code uses the <:Fixpoints:fixpoint framework> and other | |
50 | utilities from an Extended Basis library (see | |
51 | <!ViewGitFile(mltonlib,master,com/ssh/extended-basis/unstable/README)>). | |
52 | ||
53 | === Signature === | |
54 | ||
55 | Let's consider the design of the `SHOW` signature: | |
56 | [source,sml] | |
57 | ---- | |
58 | infixr --> | |
59 | ||
60 | signature SHOW = sig | |
61 | type 'a t (* complete type-index *) | |
62 | type 'a s (* incomplete sum *) | |
63 | type ('a, 'k) p (* incomplete product *) | |
64 | type u (* tuple or unlabelled product *) | |
65 | type l (* record or labelled product *) | |
66 | ||
67 | val show : 'a t -> 'a -> string | |
68 | ||
69 | (* user-defined types *) | |
70 | val inj : ('a -> 'b) -> 'b t -> 'a t | |
71 | ||
72 | (* tuples and records *) | |
73 | val * : ('a, 'k) p * ('b, 'k) p -> (('a, 'b) product, 'k) p | |
74 | ||
75 | val U : 'a t -> ('a, u) p | |
76 | val L : string -> 'a t -> ('a, l) p | |
77 | ||
78 | val tuple : ('a, u) p -> 'a t | |
79 | val record : ('a, l) p -> 'a t | |
80 | ||
81 | (* datatypes *) | |
82 | val + : 'a s * 'b s -> (('a, 'b) sum) s | |
83 | ||
84 | val C0 : string -> unit s | |
85 | val C1 : string -> 'a t -> 'a s | |
86 | ||
87 | val data : 'a s -> 'a t | |
88 | ||
89 | val Y : 'a t Tie.t | |
90 | ||
91 | (* exceptions *) | |
92 | val exn : exn t | |
93 | val regExn : (exn -> ('a * 'a s) option) -> unit | |
94 | ||
95 | (* some built-in type constructors *) | |
96 | val refc : 'a t -> 'a ref t | |
97 | val array : 'a t -> 'a array t | |
98 | val list : 'a t -> 'a list t | |
99 | val vector : 'a t -> 'a vector t | |
100 | val --> : 'a t * 'b t -> ('a -> 'b) t | |
101 | ||
102 | (* some built-in base types *) | |
103 | val string : string t | |
104 | val unit : unit t | |
105 | val bool : bool t | |
106 | val char : char t | |
107 | val int : int t | |
108 | val word : word t | |
109 | val real : real t | |
110 | end | |
111 | ---- | |
112 | ||
113 | While some details are shaped by the specific requirements of `show`, | |
114 | there are a number of (design) patterns that translate to other | |
115 | type-indexed values. The former kind of details are mostly shaped by | |
116 | the syntax of SML values that `show` is designed to produce. To this | |
117 | end, abstract types and phantom types are used to distinguish | |
118 | incomplete record, tuple, and datatype type-indices from each other | |
119 | and from complete type-indices. Also, names of record labels and | |
120 | datatype constructors need to be provided by the user. | |
121 | ||
122 | ==== Arbitrary user-defined datatypes ==== | |
123 | ||
124 | Perhaps the most important pattern is how the design supports | |
125 | arbitrary user-defined datatypes. A number of combinators together | |
126 | conspire to provide the functionality. First of all, to support new | |
127 | user-defined types, a combinator taking a conversion function to a | |
128 | previously supported type is provided: | |
129 | [source,sml] | |
130 | ---- | |
131 | val inj : ('a -> 'b) -> 'b t -> 'a t | |
132 | ---- | |
133 | ||
134 | An injection function is sufficient in this case, but in the general | |
135 | case, an embedding with injection and projection functions may be | |
136 | needed. | |
137 | ||
138 | To support products (tuples and records) a product combinator is | |
139 | provided: | |
140 | [source,sml] | |
141 | ---- | |
142 | val * : ('a, 'k) p * ('b, 'k) p -> (('a, 'b) product, 'k) p | |
143 | ---- | |
144 | The second (phantom) type variable `'k` is there to distinguish | |
145 | between labelled and unlabelled products and the type `p` | |
146 | distinguishes incomplete products from complete type-indices of type | |
147 | `t`. Most type-indexed values do not need to make such distinctions. | |
148 | ||
149 | To support sums (datatypes) a sum combinator is provided: | |
150 | [source,sml] | |
151 | ---- | |
152 | val + : 'a s * 'b s -> (('a, 'b) sum) s | |
153 | ---- | |
154 | Again, the purpose of the type `s` is to distinguish incomplete sums | |
155 | from complete type-indices of type `t`, which usually isn't necessary. | |
156 | ||
157 | Finally, to support recursive datatypes, including sets of mutually | |
158 | recursive datatypes, a <:Fixpoints:fixpoint tier> is provided: | |
159 | [source,sml] | |
160 | ---- | |
161 | val Y : 'a t Tie.t | |
162 | ---- | |
163 | ||
164 | Together these combinators (with the more domain specific combinators | |
165 | `U`, `L`, `tuple`, `record`, `C0`, `C1`, and `data`) enable one to | |
166 | encode a type-index for any user-defined datatype. | |
167 | ||
168 | ==== Exceptions ==== | |
169 | ||
170 | The `exn` type in SML is a <:UniversalType:universal type> into which | |
171 | all types can be embedded. SML also allows a program to generate new | |
172 | exception variants at run-time. Thus a mechanism is required to register | |
173 | handlers for particular variants: | |
174 | [source,sml] | |
175 | ---- | |
176 | val exn : exn t | |
177 | val regExn : (exn -> ('a * 'a s) option) -> unit | |
178 | ---- | |
179 | ||
180 | The universal `exn` type-index then makes use of the registered | |
181 | handlers. The above particular form of handler, which converts an | |
182 | exception value to a value of some type and a type-index for that type | |
183 | (essentially an existential type) is designed to make it convenient to | |
184 | write handlers. To write a handler, one can conveniently reuse | |
185 | existing type-indices: | |
186 | [source,sml] | |
187 | ---- | |
188 | exception Int of int | |
189 | ||
190 | local | |
191 | open Show | |
192 | in | |
193 | val () = regExn (fn Int v => SOME (v, C1"Int" int) | |
194 | | _ => NONE) | |
195 | end | |
196 | ---- | |
197 | ||
198 | Note that a single handler may actually handle an arbitrary number of | |
199 | different exceptions. | |
200 | ||
201 | ==== Other types ==== | |
202 | ||
203 | Some built-in and standard types typically require special treatment | |
204 | due to their special nature. The most important of these are arrays | |
205 | and references, because cyclic data (ignoring closures) and observable | |
206 | sharing can only be constructed through them. | |
207 | ||
208 | When arrow types are really supported, unlike in this case, they | |
209 | usually need special treatment due to the contravariance of arguments. | |
210 | ||
211 | Lists and vectors require special treatment in the case of `show`, | |
212 | because of their special syntax. This isn't usually the case. | |
213 | ||
214 | The set of base types to support also needs to be considered unless | |
215 | one exports an interface for constructing type-indices for entirely | |
216 | new base types. | |
217 | ||
218 | == Usage == | |
219 | ||
220 | Before going to the implementation, let's look at some examples. For | |
221 | the following examples, we'll assume a structure binding | |
222 | `Show :> SHOW`. If you want to try the examples immediately, just | |
223 | skip forward to the implementation. | |
224 | ||
225 | To use `show`, one first needs a type-index, which is then given to | |
226 | `show`. To show a list of integers, one would use the type-index | |
227 | `list int`, which has the type `int list Show.t`: | |
228 | [source,sml] | |
229 | ---- | |
230 | val "[3, 1, 4]" = | |
231 | let open Show in show (list int) end | |
232 | [3, 1, 4] | |
233 | ---- | |
234 | ||
235 | Likewise, to show a list of lists of characters, one would use the | |
236 | type-index `list (list char)`, which has the type `char list list | |
237 | Show.t`: | |
238 | [source,sml] | |
239 | ---- | |
240 | val "[[#\"a\", #\"b\", #\"c\"], []]" = | |
241 | let open Show in show (list (list char)) end | |
242 | [[#"a", #"b", #"c"], []] | |
243 | ---- | |
244 | ||
245 | Handling standard types is not particularly interesting. It is more | |
246 | interesting to see how user-defined types can be handled. Although | |
247 | the `option` datatype is a standard type, it requires no special | |
248 | support, so we can treat it as a user-defined type. Options can be | |
249 | encoded easily using a sum: | |
250 | [source,sml] | |
251 | ---- | |
252 | fun option t = let | |
253 | open Show | |
254 | in | |
255 | inj (fn NONE => INL () | |
256 | | SOME v => INR v) | |
257 | (data (C0"NONE" + C1"SOME" t)) | |
258 | end | |
259 | ||
260 | val "SOME 5" = | |
261 | let open Show in show (option int) end | |
262 | (SOME 5) | |
263 | ---- | |
264 | ||
265 | Readers new to type-indexed values might want to type annotate each | |
266 | subexpression of the above example as an exercise. (Use a compiler to | |
267 | check your annotations.) | |
268 | ||
269 | Using a product, user specified records can be also be encoded easily: | |
270 | [source,sml] | |
271 | ---- | |
272 | val abc = let | |
273 | open Show | |
274 | in | |
275 | inj (fn {a, b, c} => a & b & c) | |
276 | (record (L"a" (option int) * | |
277 | L"b" real * | |
278 | L"c" bool)) | |
279 | end | |
280 | ||
281 | val "{a = SOME 1, b = 3.0, c = false}" = | |
282 | let open Show in show abc end | |
283 | {a = SOME 1, b = 3.0, c = false} | |
284 | ---- | |
285 | ||
286 | As you can see, both of the above use `inj` to inject user-defined | |
287 | types to the general purpose sum and product types. | |
288 | ||
289 | Of particular interest is whether recursive datatypes and cyclic data | |
290 | can be handled. For example, how does one write a type-index for a | |
291 | recursive datatype such as a cyclic graph? | |
292 | [source,sml] | |
293 | ---- | |
294 | datatype 'a graph = VTX of 'a * 'a graph list ref | |
295 | fun arcs (VTX (_, r)) = r | |
296 | ---- | |
297 | ||
298 | Using the `Show` combinators, we could first write a new type-index | |
299 | combinator for `graph`: | |
300 | [source,sml] | |
301 | ---- | |
302 | fun graph a = let | |
303 | open Tie Show | |
304 | in | |
305 | fix Y (fn graph_a => | |
306 | inj (fn VTX (x, y) => x & y) | |
307 | (data (C1"VTX" | |
308 | (tuple (U a * | |
309 | U (refc (list graph_a))))))) | |
310 | end | |
311 | ---- | |
312 | ||
313 | To show a graph with integer labels | |
314 | [source,sml] | |
315 | ---- | |
316 | val a_graph = let | |
317 | val a = VTX (1, ref []) | |
318 | val b = VTX (2, ref []) | |
319 | val c = VTX (3, ref []) | |
320 | val d = VTX (4, ref []) | |
321 | val e = VTX (5, ref []) | |
322 | val f = VTX (6, ref []) | |
323 | in | |
324 | arcs a := [b, d] | |
325 | ; arcs b := [c, e] | |
326 | ; arcs c := [a, f] | |
327 | ; arcs d := [f] | |
328 | ; arcs e := [d] | |
329 | ; arcs f := [e] | |
330 | ; a | |
331 | end | |
332 | ---- | |
333 | we could then simply write | |
334 | [source,sml] | |
335 | ---- | |
336 | val "VTX (1, ref [VTX (2, ref [VTX (3, ref [VTX (1, %0), \ | |
337 | \VTX (6, ref [VTX (5, ref [VTX (4, ref [VTX (6, %3)])])] as %3)]), \ | |
338 | \VTX (5, ref [VTX (4, ref [VTX (6, ref [VTX (5, %2)])])] as %2)]), \ | |
339 | \VTX (4, ref [VTX (6, ref [VTX (5, ref [VTX (4, %1)])])] as %1)] as %0)" = | |
340 | let open Show in show (graph int) end | |
341 | a_graph | |
342 | ---- | |
343 | ||
344 | There is a subtle gotcha with cyclic data. Consider the following code: | |
345 | [source,sml] | |
346 | ---- | |
347 | exception ExnArray of exn array | |
348 | ||
349 | val () = let | |
350 | open Show | |
351 | in | |
352 | regExn (fn ExnArray a => | |
353 | SOME (a, C1"ExnArray" (array exn)) | |
354 | | _ => NONE) | |
355 | end | |
356 | ||
357 | val a_cycle = let | |
358 | val a = Array.fromList [Empty] | |
359 | in | |
360 | Array.update (a, 0, ExnArray a) ; a | |
361 | end | |
362 | ---- | |
363 | ||
364 | Although the above looks innocent enough, the evaluation of | |
365 | [source,sml] | |
366 | ---- | |
367 | val "[|ExnArray %0|] as %0" = | |
368 | let open Show in show (array exn) end | |
369 | a_cycle | |
370 | ---- | |
371 | goes into an infinite loop. To avoid this problem, the type-index | |
372 | `array exn` must be evaluated only once, as in the following: | |
373 | [source,sml] | |
374 | ---- | |
375 | val array_exn = let open Show in array exn end | |
376 | ||
377 | exception ExnArray of exn array | |
378 | ||
379 | val () = let | |
380 | open Show | |
381 | in | |
382 | regExn (fn ExnArray a => | |
383 | SOME (a, C1"ExnArray" array_exn) | |
384 | | _ => NONE) | |
385 | end | |
386 | ||
387 | val a_cycle = let | |
388 | val a = Array.fromList [Empty] | |
389 | in | |
390 | Array.update (a, 0, ExnArray a) ; a | |
391 | end | |
392 | ||
393 | val "[|ExnArray %0|] as %0" = | |
394 | let open Show in show array_exn end | |
395 | a_cycle | |
396 | ---- | |
397 | ||
398 | Cyclic data (excluding closures) in Standard ML can only be | |
399 | constructed imperatively through arrays and references (combined with | |
400 | exceptions or recursive datatypes). Before recursing to a reference | |
401 | or an array, one needs to check whether that reference or array has | |
402 | already been seen before. When `ref` or `array` is called with a | |
403 | type-index, a new cyclicity checker is instantiated. | |
404 | ||
405 | == Implementation == | |
406 | ||
407 | [source,sml] | |
408 | ---- | |
409 | structure SmlSyntax = struct | |
410 | local | |
411 | structure CV = CharVector and C = Char | |
412 | in | |
413 | val isSym = Char.contains "!%&$#+-/:<=>?@\\~`^|*" | |
414 | ||
415 | fun isSymId s = 0 < size s andalso CV.all isSym s | |
416 | ||
417 | fun isAlphaNumId s = | |
418 | 0 < size s | |
419 | andalso C.isAlpha (CV.sub (s, 0)) | |
420 | andalso CV.all (fn c => C.isAlphaNum c | |
421 | orelse #"'" = c | |
422 | orelse #"_" = c) s | |
423 | ||
424 | fun isNumLabel s = | |
425 | 0 < size s | |
426 | andalso #"0" <> CV.sub (s, 0) | |
427 | andalso CV.all C.isDigit s | |
428 | ||
429 | fun isId s = isAlphaNumId s orelse isSymId s | |
430 | ||
431 | fun isLongId s = List.all isId (String.fields (#"." <\ op =) s) | |
432 | ||
433 | fun isLabel s = isId s orelse isNumLabel s | |
434 | end | |
435 | end | |
436 | ||
437 | structure Show :> SHOW = struct | |
438 | datatype 'a t = IN of exn list * 'a -> bool * string | |
439 | type 'a s = 'a t | |
440 | type ('a, 'k) p = 'a t | |
441 | type u = unit | |
442 | type l = unit | |
443 | ||
444 | fun show (IN t) x = #2 (t ([], x)) | |
445 | ||
446 | (* user-defined types *) | |
447 | fun inj inj (IN b) = IN (b o Pair.map (id, inj)) | |
448 | ||
449 | local | |
450 | fun surround pre suf (_, s) = (false, concat [pre, s, suf]) | |
451 | fun parenthesize x = if #1 x then surround "(" ")" x else x | |
452 | fun construct tag = | |
453 | (fn (_, s) => (true, concat [tag, " ", s])) o parenthesize | |
454 | fun check p m s = if p s then () else raise Fail (m^s) | |
455 | in | |
456 | (* tuples and records *) | |
457 | fun (IN l) * (IN r) = | |
458 | IN (fn (rs, a & b) => | |
459 | (false, concat [#2 (l (rs, a)), | |
460 | ", ", | |
461 | #2 (r (rs, b))])) | |
462 | ||
463 | val U = id | |
464 | fun L l = (check SmlSyntax.isLabel "Invalid label: " l | |
465 | ; fn IN t => IN (surround (l^" = ") "" o t)) | |
466 | ||
467 | fun tuple (IN t) = IN (surround "(" ")" o t) | |
468 | fun record (IN t) = IN (surround "{" "}" o t) | |
469 | ||
470 | (* datatypes *) | |
471 | fun (IN l) + (IN r) = IN (fn (rs, INL a) => l (rs, a) | |
472 | | (rs, INR b) => r (rs, b)) | |
473 | ||
474 | fun C0 c = (check SmlSyntax.isId "Invalid constructor: " c | |
475 | ; IN (const (false, c))) | |
476 | fun C1 c (IN t) = (check SmlSyntax.isId "Invalid constructor: " c | |
477 | ; IN (construct c o t)) | |
478 | ||
479 | val data = id | |
480 | ||
481 | fun Y ? = Tie.iso Tie.function (fn IN x => x, IN) ? | |
482 | ||
483 | (* exceptions *) | |
484 | local | |
485 | val handlers = ref ([] : (exn -> unit t option) list) | |
486 | in | |
487 | val exn = IN (fn (rs, e) => let | |
488 | fun lp [] = | |
489 | C0(concat ["<exn:", | |
490 | General.exnName e, | |
491 | ">"]) | |
492 | | lp (f::fs) = | |
493 | case f e | |
494 | of NONE => lp fs | |
495 | | SOME t => t | |
496 | val IN f = lp (!handlers) | |
497 | in | |
498 | f (rs, ()) | |
499 | end) | |
500 | fun regExn f = | |
501 | handlers := (Option.map | |
502 | (fn (x, IN f) => | |
503 | IN (fn (rs, ()) => | |
504 | f (rs, x))) o f) | |
505 | :: !handlers | |
506 | end | |
507 | ||
508 | (* some built-in type constructors *) | |
509 | local | |
510 | fun cyclic (IN t) = let | |
511 | exception E of ''a * bool ref | |
512 | in | |
513 | IN (fn (rs, v : ''a) => let | |
514 | val idx = Int.toString o length | |
515 | fun lp (E (v', c)::rs) = | |
516 | if v' <> v then lp rs | |
517 | else (c := false ; (false, "%"^idx rs)) | |
518 | | lp (_::rs) = lp rs | |
519 | | lp [] = let | |
520 | val c = ref true | |
521 | val r = t (E (v, c)::rs, v) | |
522 | in | |
523 | if !c then r | |
524 | else surround "" (" as %"^idx rs) r | |
525 | end | |
526 | in | |
527 | lp rs | |
528 | end) | |
529 | end | |
530 | ||
531 | fun aggregate pre suf toList (IN t) = | |
532 | IN (surround pre suf o | |
533 | (fn (rs, a) => | |
534 | (false, | |
535 | String.concatWith | |
536 | ", " | |
537 | (map (#2 o curry t rs) | |
538 | (toList a))))) | |
539 | in | |
540 | fun refc ? = (cyclic o inj ! o C1"ref") ? | |
541 | fun array ? = (cyclic o aggregate "[|" "|]" (Array.foldr op:: [])) ? | |
542 | fun list ? = aggregate "[" "]" id ? | |
543 | fun vector ? = aggregate "#[" "]" (Vector.foldr op:: []) ? | |
544 | end | |
545 | ||
546 | fun (IN _) --> (IN _) = IN (const (false, "<fn>")) | |
547 | ||
548 | (* some built-in base types *) | |
549 | local | |
550 | fun mk toS = (fn x => (false, x)) o toS o (fn (_, x) => x) | |
551 | in | |
552 | val string = | |
553 | IN (surround "\"" "\"" o mk (String.translate Char.toString)) | |
554 | val unit = IN (mk (fn () => "()")) | |
555 | val bool = IN (mk Bool.toString) | |
556 | val char = IN (surround "#\"" "\"" o mk Char.toString) | |
557 | val int = IN (mk Int.toString) | |
558 | val word = IN (surround "0wx" "" o mk Word.toString) | |
559 | val real = IN (mk Real.toString) | |
560 | end | |
561 | end | |
562 | end | |
563 | ||
564 | (* Handlers for standard top-level exceptions *) | |
565 | val () = let | |
566 | open Show | |
567 | fun E0 name = SOME ((), C0 name) | |
568 | in | |
569 | regExn (fn Bind => E0"Bind" | |
570 | | Chr => E0"Chr" | |
571 | | Div => E0"Div" | |
572 | | Domain => E0"Domain" | |
573 | | Empty => E0"Empty" | |
574 | | Match => E0"Match" | |
575 | | Option => E0"Option" | |
576 | | Overflow => E0"Overflow" | |
577 | | Size => E0"Size" | |
578 | | Span => E0"Span" | |
579 | | Subscript => E0"Subscript" | |
580 | | _ => NONE) | |
581 | ; regExn (fn Fail s => SOME (s, C1"Fail" string) | |
582 | | _ => NONE) | |
583 | end | |
584 | ---- | |
585 | ||
586 | ||
587 | == Also see == | |
588 | ||
589 | There are a number of related techniques. Here are some of them. | |
590 | ||
591 | * <:Fold:> | |
592 | * <:StaticSum:> |