Import Upstream version 20180207
[hcoop/debian/mlton.git] / doc / guide / src / TypeIndexedValues.adoc
CommitLineData
7f918cf1
CE
1TypeIndexedValues
2=================
3
4<:StandardML:Standard ML> does not support ad hoc polymorphism. This
5presents a challenge to programmers. The problem is that at first
6glance there seems to be no practical way to implement something like
7a function for converting a value of any type to a string or a
8function for computing a hash value for a value of any type.
9Fortunately there are ways to implement type-indexed values in SML as
10discussed in <!Cite(Yang98)>. Various articles such as
11<!Cite(Danvy98)>, <!Cite(Ramsey11)>, <!Cite(Elsman04)>,
12<!Cite(Kennedy04)>, and <!Cite(Benton05)> also contain examples of
13type-indexed values.
14
15*NOTE:* The technique used in the following example uses an early (and
16somewhat broken) variation of the basic technique used in an
17experimental generic programming library (see
18<!ViewGitFile(mltonlib,master,com/ssh/generic/unstable/README)>) that can
19be found from the MLton repository. The generic programming library
20also includes a more advanced generic pretty printing function (see
21<!ViewGitFile(mltonlib,master,com/ssh/generic/unstable/public/value/pretty.sig)>).
22
23== Example: Converting any SML value to (roughly) SML syntax ==
24
25Consider the problem of converting any SML value to a textual
26presentation that matches the syntax of SML as closely as possible.
27One solution is a type-indexed function that maps a given type to a
28function that maps any value (of the type) to its textual
29presentation. A type-indexed function like this can be useful for a
30variety of purposes. For example, one could use it to show debugging
31information. We'll call this function "`show`".
32
33We'll do a fairly complete implementation of `show`. We do not
34distinguish infix and nonfix constructors, but that is not an
35intrinsic property of SML datatypes. We also don't reconstruct a type
36name for the value, although it would be particularly useful for
37functional values. To reconstruct type names, some changes would be
38needed and the reader is encouraged to consider how to do that. A
39more realistic implementation would use some pretty printing
40combinators to compute a layout for the result. This should be a
41relatively easy change (given a suitable pretty printing library).
42Cyclic values (through references and arrays) do not have a standard
43textual presentation and it is impossible to convert arbitrary
44functional values (within SML) to a meaningful textual presentation.
45Finally, it would also make sense to show sharing of references and
46arrays. We'll leave these improvements to an actual library
47implementation.
48
49The following code uses the <:Fixpoints:fixpoint framework> and other
50utilities from an Extended Basis library (see
51<!ViewGitFile(mltonlib,master,com/ssh/extended-basis/unstable/README)>).
52
53=== Signature ===
54
55Let's consider the design of the `SHOW` signature:
56[source,sml]
57----
58infixr -->
59
60signature SHOW = sig
61 type 'a t (* complete type-index *)
62 type 'a s (* incomplete sum *)
63 type ('a, 'k) p (* incomplete product *)
64 type u (* tuple or unlabelled product *)
65 type l (* record or labelled product *)
66
67 val show : 'a t -> 'a -> string
68
69 (* user-defined types *)
70 val inj : ('a -> 'b) -> 'b t -> 'a t
71
72 (* tuples and records *)
73 val * : ('a, 'k) p * ('b, 'k) p -> (('a, 'b) product, 'k) p
74
75 val U : 'a t -> ('a, u) p
76 val L : string -> 'a t -> ('a, l) p
77
78 val tuple : ('a, u) p -> 'a t
79 val record : ('a, l) p -> 'a t
80
81 (* datatypes *)
82 val + : 'a s * 'b s -> (('a, 'b) sum) s
83
84 val C0 : string -> unit s
85 val C1 : string -> 'a t -> 'a s
86
87 val data : 'a s -> 'a t
88
89 val Y : 'a t Tie.t
90
91 (* exceptions *)
92 val exn : exn t
93 val regExn : (exn -> ('a * 'a s) option) -> unit
94
95 (* some built-in type constructors *)
96 val refc : 'a t -> 'a ref t
97 val array : 'a t -> 'a array t
98 val list : 'a t -> 'a list t
99 val vector : 'a t -> 'a vector t
100 val --> : 'a t * 'b t -> ('a -> 'b) t
101
102 (* some built-in base types *)
103 val string : string t
104 val unit : unit t
105 val bool : bool t
106 val char : char t
107 val int : int t
108 val word : word t
109 val real : real t
110end
111----
112
113While some details are shaped by the specific requirements of `show`,
114there are a number of (design) patterns that translate to other
115type-indexed values. The former kind of details are mostly shaped by
116the syntax of SML values that `show` is designed to produce. To this
117end, abstract types and phantom types are used to distinguish
118incomplete record, tuple, and datatype type-indices from each other
119and from complete type-indices. Also, names of record labels and
120datatype constructors need to be provided by the user.
121
122==== Arbitrary user-defined datatypes ====
123
124Perhaps the most important pattern is how the design supports
125arbitrary user-defined datatypes. A number of combinators together
126conspire to provide the functionality. First of all, to support new
127user-defined types, a combinator taking a conversion function to a
128previously supported type is provided:
129[source,sml]
130----
131val inj : ('a -> 'b) -> 'b t -> 'a t
132----
133
134An injection function is sufficient in this case, but in the general
135case, an embedding with injection and projection functions may be
136needed.
137
138To support products (tuples and records) a product combinator is
139provided:
140[source,sml]
141----
142val * : ('a, 'k) p * ('b, 'k) p -> (('a, 'b) product, 'k) p
143----
144The second (phantom) type variable `'k` is there to distinguish
145between labelled and unlabelled products and the type `p`
146distinguishes incomplete products from complete type-indices of type
147`t`. Most type-indexed values do not need to make such distinctions.
148
149To support sums (datatypes) a sum combinator is provided:
150[source,sml]
151----
152val + : 'a s * 'b s -> (('a, 'b) sum) s
153----
154Again, the purpose of the type `s` is to distinguish incomplete sums
155from complete type-indices of type `t`, which usually isn't necessary.
156
157Finally, to support recursive datatypes, including sets of mutually
158recursive datatypes, a <:Fixpoints:fixpoint tier> is provided:
159[source,sml]
160----
161val Y : 'a t Tie.t
162----
163
164Together these combinators (with the more domain specific combinators
165`U`, `L`, `tuple`, `record`, `C0`, `C1`, and `data`) enable one to
166encode a type-index for any user-defined datatype.
167
168==== Exceptions ====
169
170The `exn` type in SML is a <:UniversalType:universal type> into which
171all types can be embedded. SML also allows a program to generate new
172exception variants at run-time. Thus a mechanism is required to register
173handlers for particular variants:
174[source,sml]
175----
176val exn : exn t
177val regExn : (exn -> ('a * 'a s) option) -> unit
178----
179
180The universal `exn` type-index then makes use of the registered
181handlers. The above particular form of handler, which converts an
182exception value to a value of some type and a type-index for that type
183(essentially an existential type) is designed to make it convenient to
184write handlers. To write a handler, one can conveniently reuse
185existing type-indices:
186[source,sml]
187----
188exception Int of int
189
190local
191 open Show
192in
193 val () = regExn (fn Int v => SOME (v, C1"Int" int)
194 | _ => NONE)
195end
196----
197
198Note that a single handler may actually handle an arbitrary number of
199different exceptions.
200
201==== Other types ====
202
203Some built-in and standard types typically require special treatment
204due to their special nature. The most important of these are arrays
205and references, because cyclic data (ignoring closures) and observable
206sharing can only be constructed through them.
207
208When arrow types are really supported, unlike in this case, they
209usually need special treatment due to the contravariance of arguments.
210
211Lists and vectors require special treatment in the case of `show`,
212because of their special syntax. This isn't usually the case.
213
214The set of base types to support also needs to be considered unless
215one exports an interface for constructing type-indices for entirely
216new base types.
217
218== Usage ==
219
220Before going to the implementation, let's look at some examples. For
221the following examples, we'll assume a structure binding
222`Show :> SHOW`. If you want to try the examples immediately, just
223skip forward to the implementation.
224
225To use `show`, one first needs a type-index, which is then given to
226`show`. To show a list of integers, one would use the type-index
227`list int`, which has the type `int list Show.t`:
228[source,sml]
229----
230val "[3, 1, 4]" =
231 let open Show in show (list int) end
232 [3, 1, 4]
233----
234
235Likewise, to show a list of lists of characters, one would use the
236type-index `list (list char)`, which has the type `char list list
237Show.t`:
238[source,sml]
239----
240val "[[#\"a\", #\"b\", #\"c\"], []]" =
241 let open Show in show (list (list char)) end
242 [[#"a", #"b", #"c"], []]
243----
244
245Handling standard types is not particularly interesting. It is more
246interesting to see how user-defined types can be handled. Although
247the `option` datatype is a standard type, it requires no special
248support, so we can treat it as a user-defined type. Options can be
249encoded easily using a sum:
250[source,sml]
251----
252fun option t = let
253 open Show
254in
255 inj (fn NONE => INL ()
256 | SOME v => INR v)
257 (data (C0"NONE" + C1"SOME" t))
258end
259
260val "SOME 5" =
261 let open Show in show (option int) end
262 (SOME 5)
263----
264
265Readers new to type-indexed values might want to type annotate each
266subexpression of the above example as an exercise. (Use a compiler to
267check your annotations.)
268
269Using a product, user specified records can be also be encoded easily:
270[source,sml]
271----
272val abc = let
273 open Show
274in
275 inj (fn {a, b, c} => a & b & c)
276 (record (L"a" (option int) *
277 L"b" real *
278 L"c" bool))
279end
280
281val "{a = SOME 1, b = 3.0, c = false}" =
282 let open Show in show abc end
283 {a = SOME 1, b = 3.0, c = false}
284----
285
286As you can see, both of the above use `inj` to inject user-defined
287types to the general purpose sum and product types.
288
289Of particular interest is whether recursive datatypes and cyclic data
290can be handled. For example, how does one write a type-index for a
291recursive datatype such as a cyclic graph?
292[source,sml]
293----
294datatype 'a graph = VTX of 'a * 'a graph list ref
295fun arcs (VTX (_, r)) = r
296----
297
298Using the `Show` combinators, we could first write a new type-index
299combinator for `graph`:
300[source,sml]
301----
302fun graph a = let
303 open Tie Show
304in
305 fix Y (fn graph_a =>
306 inj (fn VTX (x, y) => x & y)
307 (data (C1"VTX"
308 (tuple (U a *
309 U (refc (list graph_a)))))))
310end
311----
312
313To show a graph with integer labels
314[source,sml]
315----
316val a_graph = let
317 val a = VTX (1, ref [])
318 val b = VTX (2, ref [])
319 val c = VTX (3, ref [])
320 val d = VTX (4, ref [])
321 val e = VTX (5, ref [])
322 val f = VTX (6, ref [])
323in
324 arcs a := [b, d]
325 ; arcs b := [c, e]
326 ; arcs c := [a, f]
327 ; arcs d := [f]
328 ; arcs e := [d]
329 ; arcs f := [e]
330 ; a
331end
332----
333we could then simply write
334[source,sml]
335----
336val "VTX (1, ref [VTX (2, ref [VTX (3, ref [VTX (1, %0), \
337 \VTX (6, ref [VTX (5, ref [VTX (4, ref [VTX (6, %3)])])] as %3)]), \
338 \VTX (5, ref [VTX (4, ref [VTX (6, ref [VTX (5, %2)])])] as %2)]), \
339 \VTX (4, ref [VTX (6, ref [VTX (5, ref [VTX (4, %1)])])] as %1)] as %0)" =
340 let open Show in show (graph int) end
341 a_graph
342----
343
344There is a subtle gotcha with cyclic data. Consider the following code:
345[source,sml]
346----
347exception ExnArray of exn array
348
349val () = let
350 open Show
351in
352 regExn (fn ExnArray a =>
353 SOME (a, C1"ExnArray" (array exn))
354 | _ => NONE)
355end
356
357val a_cycle = let
358 val a = Array.fromList [Empty]
359in
360 Array.update (a, 0, ExnArray a) ; a
361end
362----
363
364Although the above looks innocent enough, the evaluation of
365[source,sml]
366----
367val "[|ExnArray %0|] as %0" =
368 let open Show in show (array exn) end
369 a_cycle
370----
371goes into an infinite loop. To avoid this problem, the type-index
372`array exn` must be evaluated only once, as in the following:
373[source,sml]
374----
375val array_exn = let open Show in array exn end
376
377exception ExnArray of exn array
378
379val () = let
380 open Show
381in
382 regExn (fn ExnArray a =>
383 SOME (a, C1"ExnArray" array_exn)
384 | _ => NONE)
385end
386
387val a_cycle = let
388 val a = Array.fromList [Empty]
389in
390 Array.update (a, 0, ExnArray a) ; a
391end
392
393val "[|ExnArray %0|] as %0" =
394 let open Show in show array_exn end
395 a_cycle
396----
397
398Cyclic data (excluding closures) in Standard ML can only be
399constructed imperatively through arrays and references (combined with
400exceptions or recursive datatypes). Before recursing to a reference
401or an array, one needs to check whether that reference or array has
402already been seen before. When `ref` or `array` is called with a
403type-index, a new cyclicity checker is instantiated.
404
405== Implementation ==
406
407[source,sml]
408----
409structure SmlSyntax = struct
410 local
411 structure CV = CharVector and C = Char
412 in
413 val isSym = Char.contains "!%&$#+-/:<=>?@\\~`^|*"
414
415 fun isSymId s = 0 < size s andalso CV.all isSym s
416
417 fun isAlphaNumId s =
418 0 < size s
419 andalso C.isAlpha (CV.sub (s, 0))
420 andalso CV.all (fn c => C.isAlphaNum c
421 orelse #"'" = c
422 orelse #"_" = c) s
423
424 fun isNumLabel s =
425 0 < size s
426 andalso #"0" <> CV.sub (s, 0)
427 andalso CV.all C.isDigit s
428
429 fun isId s = isAlphaNumId s orelse isSymId s
430
431 fun isLongId s = List.all isId (String.fields (#"." <\ op =) s)
432
433 fun isLabel s = isId s orelse isNumLabel s
434 end
435end
436
437structure Show :> SHOW = struct
438 datatype 'a t = IN of exn list * 'a -> bool * string
439 type 'a s = 'a t
440 type ('a, 'k) p = 'a t
441 type u = unit
442 type l = unit
443
444 fun show (IN t) x = #2 (t ([], x))
445
446 (* user-defined types *)
447 fun inj inj (IN b) = IN (b o Pair.map (id, inj))
448
449 local
450 fun surround pre suf (_, s) = (false, concat [pre, s, suf])
451 fun parenthesize x = if #1 x then surround "(" ")" x else x
452 fun construct tag =
453 (fn (_, s) => (true, concat [tag, " ", s])) o parenthesize
454 fun check p m s = if p s then () else raise Fail (m^s)
455 in
456 (* tuples and records *)
457 fun (IN l) * (IN r) =
458 IN (fn (rs, a & b) =>
459 (false, concat [#2 (l (rs, a)),
460 ", ",
461 #2 (r (rs, b))]))
462
463 val U = id
464 fun L l = (check SmlSyntax.isLabel "Invalid label: " l
465 ; fn IN t => IN (surround (l^" = ") "" o t))
466
467 fun tuple (IN t) = IN (surround "(" ")" o t)
468 fun record (IN t) = IN (surround "{" "}" o t)
469
470 (* datatypes *)
471 fun (IN l) + (IN r) = IN (fn (rs, INL a) => l (rs, a)
472 | (rs, INR b) => r (rs, b))
473
474 fun C0 c = (check SmlSyntax.isId "Invalid constructor: " c
475 ; IN (const (false, c)))
476 fun C1 c (IN t) = (check SmlSyntax.isId "Invalid constructor: " c
477 ; IN (construct c o t))
478
479 val data = id
480
481 fun Y ? = Tie.iso Tie.function (fn IN x => x, IN) ?
482
483 (* exceptions *)
484 local
485 val handlers = ref ([] : (exn -> unit t option) list)
486 in
487 val exn = IN (fn (rs, e) => let
488 fun lp [] =
489 C0(concat ["<exn:",
490 General.exnName e,
491 ">"])
492 | lp (f::fs) =
493 case f e
494 of NONE => lp fs
495 | SOME t => t
496 val IN f = lp (!handlers)
497 in
498 f (rs, ())
499 end)
500 fun regExn f =
501 handlers := (Option.map
502 (fn (x, IN f) =>
503 IN (fn (rs, ()) =>
504 f (rs, x))) o f)
505 :: !handlers
506 end
507
508 (* some built-in type constructors *)
509 local
510 fun cyclic (IN t) = let
511 exception E of ''a * bool ref
512 in
513 IN (fn (rs, v : ''a) => let
514 val idx = Int.toString o length
515 fun lp (E (v', c)::rs) =
516 if v' <> v then lp rs
517 else (c := false ; (false, "%"^idx rs))
518 | lp (_::rs) = lp rs
519 | lp [] = let
520 val c = ref true
521 val r = t (E (v, c)::rs, v)
522 in
523 if !c then r
524 else surround "" (" as %"^idx rs) r
525 end
526 in
527 lp rs
528 end)
529 end
530
531 fun aggregate pre suf toList (IN t) =
532 IN (surround pre suf o
533 (fn (rs, a) =>
534 (false,
535 String.concatWith
536 ", "
537 (map (#2 o curry t rs)
538 (toList a)))))
539 in
540 fun refc ? = (cyclic o inj ! o C1"ref") ?
541 fun array ? = (cyclic o aggregate "[|" "|]" (Array.foldr op:: [])) ?
542 fun list ? = aggregate "[" "]" id ?
543 fun vector ? = aggregate "#[" "]" (Vector.foldr op:: []) ?
544 end
545
546 fun (IN _) --> (IN _) = IN (const (false, "<fn>"))
547
548 (* some built-in base types *)
549 local
550 fun mk toS = (fn x => (false, x)) o toS o (fn (_, x) => x)
551 in
552 val string =
553 IN (surround "\"" "\"" o mk (String.translate Char.toString))
554 val unit = IN (mk (fn () => "()"))
555 val bool = IN (mk Bool.toString)
556 val char = IN (surround "#\"" "\"" o mk Char.toString)
557 val int = IN (mk Int.toString)
558 val word = IN (surround "0wx" "" o mk Word.toString)
559 val real = IN (mk Real.toString)
560 end
561 end
562end
563
564(* Handlers for standard top-level exceptions *)
565val () = let
566 open Show
567 fun E0 name = SOME ((), C0 name)
568in
569 regExn (fn Bind => E0"Bind"
570 | Chr => E0"Chr"
571 | Div => E0"Div"
572 | Domain => E0"Domain"
573 | Empty => E0"Empty"
574 | Match => E0"Match"
575 | Option => E0"Option"
576 | Overflow => E0"Overflow"
577 | Size => E0"Size"
578 | Span => E0"Span"
579 | Subscript => E0"Subscript"
580 | _ => NONE)
581 ; regExn (fn Fail s => SOME (s, C1"Fail" string)
582 | _ => NONE)
583end
584----
585
586
587== Also see ==
588
589There are a number of related techniques. Here are some of them.
590
591* <:Fold:>
592* <:StaticSum:>