Import Upstream version 20180207
[hcoop/debian/mlton.git] / doc / guide / src / TypeIndexedValues.adoc
1 TypeIndexedValues
2 =================
3
4 <:StandardML:Standard ML> does not support ad hoc polymorphism. This
5 presents a challenge to programmers. The problem is that at first
6 glance there seems to be no practical way to implement something like
7 a function for converting a value of any type to a string or a
8 function for computing a hash value for a value of any type.
9 Fortunately there are ways to implement type-indexed values in SML as
10 discussed in <!Cite(Yang98)>. Various articles such as
11 <!Cite(Danvy98)>, <!Cite(Ramsey11)>, <!Cite(Elsman04)>,
12 <!Cite(Kennedy04)>, and <!Cite(Benton05)> also contain examples of
13 type-indexed values.
14
15 *NOTE:* The technique used in the following example uses an early (and
16 somewhat broken) variation of the basic technique used in an
17 experimental generic programming library (see
18 <!ViewGitFile(mltonlib,master,com/ssh/generic/unstable/README)>) that can
19 be found from the MLton repository. The generic programming library
20 also includes a more advanced generic pretty printing function (see
21 <!ViewGitFile(mltonlib,master,com/ssh/generic/unstable/public/value/pretty.sig)>).
22
23 == Example: Converting any SML value to (roughly) SML syntax ==
24
25 Consider the problem of converting any SML value to a textual
26 presentation that matches the syntax of SML as closely as possible.
27 One solution is a type-indexed function that maps a given type to a
28 function that maps any value (of the type) to its textual
29 presentation. A type-indexed function like this can be useful for a
30 variety of purposes. For example, one could use it to show debugging
31 information. We'll call this function "`show`".
32
33 We'll do a fairly complete implementation of `show`. We do not
34 distinguish infix and nonfix constructors, but that is not an
35 intrinsic property of SML datatypes. We also don't reconstruct a type
36 name for the value, although it would be particularly useful for
37 functional values. To reconstruct type names, some changes would be
38 needed and the reader is encouraged to consider how to do that. A
39 more realistic implementation would use some pretty printing
40 combinators to compute a layout for the result. This should be a
41 relatively easy change (given a suitable pretty printing library).
42 Cyclic values (through references and arrays) do not have a standard
43 textual presentation and it is impossible to convert arbitrary
44 functional values (within SML) to a meaningful textual presentation.
45 Finally, it would also make sense to show sharing of references and
46 arrays. We'll leave these improvements to an actual library
47 implementation.
48
49 The following code uses the <:Fixpoints:fixpoint framework> and other
50 utilities from an Extended Basis library (see
51 <!ViewGitFile(mltonlib,master,com/ssh/extended-basis/unstable/README)>).
52
53 === Signature ===
54
55 Let's consider the design of the `SHOW` signature:
56 [source,sml]
57 ----
58 infixr -->
59
60 signature SHOW = sig
61 type 'a t (* complete type-index *)
62 type 'a s (* incomplete sum *)
63 type ('a, 'k) p (* incomplete product *)
64 type u (* tuple or unlabelled product *)
65 type l (* record or labelled product *)
66
67 val show : 'a t -> 'a -> string
68
69 (* user-defined types *)
70 val inj : ('a -> 'b) -> 'b t -> 'a t
71
72 (* tuples and records *)
73 val * : ('a, 'k) p * ('b, 'k) p -> (('a, 'b) product, 'k) p
74
75 val U : 'a t -> ('a, u) p
76 val L : string -> 'a t -> ('a, l) p
77
78 val tuple : ('a, u) p -> 'a t
79 val record : ('a, l) p -> 'a t
80
81 (* datatypes *)
82 val + : 'a s * 'b s -> (('a, 'b) sum) s
83
84 val C0 : string -> unit s
85 val C1 : string -> 'a t -> 'a s
86
87 val data : 'a s -> 'a t
88
89 val Y : 'a t Tie.t
90
91 (* exceptions *)
92 val exn : exn t
93 val regExn : (exn -> ('a * 'a s) option) -> unit
94
95 (* some built-in type constructors *)
96 val refc : 'a t -> 'a ref t
97 val array : 'a t -> 'a array t
98 val list : 'a t -> 'a list t
99 val vector : 'a t -> 'a vector t
100 val --> : 'a t * 'b t -> ('a -> 'b) t
101
102 (* some built-in base types *)
103 val string : string t
104 val unit : unit t
105 val bool : bool t
106 val char : char t
107 val int : int t
108 val word : word t
109 val real : real t
110 end
111 ----
112
113 While some details are shaped by the specific requirements of `show`,
114 there are a number of (design) patterns that translate to other
115 type-indexed values. The former kind of details are mostly shaped by
116 the syntax of SML values that `show` is designed to produce. To this
117 end, abstract types and phantom types are used to distinguish
118 incomplete record, tuple, and datatype type-indices from each other
119 and from complete type-indices. Also, names of record labels and
120 datatype constructors need to be provided by the user.
121
122 ==== Arbitrary user-defined datatypes ====
123
124 Perhaps the most important pattern is how the design supports
125 arbitrary user-defined datatypes. A number of combinators together
126 conspire to provide the functionality. First of all, to support new
127 user-defined types, a combinator taking a conversion function to a
128 previously supported type is provided:
129 [source,sml]
130 ----
131 val inj : ('a -> 'b) -> 'b t -> 'a t
132 ----
133
134 An injection function is sufficient in this case, but in the general
135 case, an embedding with injection and projection functions may be
136 needed.
137
138 To support products (tuples and records) a product combinator is
139 provided:
140 [source,sml]
141 ----
142 val * : ('a, 'k) p * ('b, 'k) p -> (('a, 'b) product, 'k) p
143 ----
144 The second (phantom) type variable `'k` is there to distinguish
145 between labelled and unlabelled products and the type `p`
146 distinguishes incomplete products from complete type-indices of type
147 `t`. Most type-indexed values do not need to make such distinctions.
148
149 To support sums (datatypes) a sum combinator is provided:
150 [source,sml]
151 ----
152 val + : 'a s * 'b s -> (('a, 'b) sum) s
153 ----
154 Again, the purpose of the type `s` is to distinguish incomplete sums
155 from complete type-indices of type `t`, which usually isn't necessary.
156
157 Finally, to support recursive datatypes, including sets of mutually
158 recursive datatypes, a <:Fixpoints:fixpoint tier> is provided:
159 [source,sml]
160 ----
161 val Y : 'a t Tie.t
162 ----
163
164 Together these combinators (with the more domain specific combinators
165 `U`, `L`, `tuple`, `record`, `C0`, `C1`, and `data`) enable one to
166 encode a type-index for any user-defined datatype.
167
168 ==== Exceptions ====
169
170 The `exn` type in SML is a <:UniversalType:universal type> into which
171 all types can be embedded. SML also allows a program to generate new
172 exception variants at run-time. Thus a mechanism is required to register
173 handlers for particular variants:
174 [source,sml]
175 ----
176 val exn : exn t
177 val regExn : (exn -> ('a * 'a s) option) -> unit
178 ----
179
180 The universal `exn` type-index then makes use of the registered
181 handlers. The above particular form of handler, which converts an
182 exception value to a value of some type and a type-index for that type
183 (essentially an existential type) is designed to make it convenient to
184 write handlers. To write a handler, one can conveniently reuse
185 existing type-indices:
186 [source,sml]
187 ----
188 exception Int of int
189
190 local
191 open Show
192 in
193 val () = regExn (fn Int v => SOME (v, C1"Int" int)
194 | _ => NONE)
195 end
196 ----
197
198 Note that a single handler may actually handle an arbitrary number of
199 different exceptions.
200
201 ==== Other types ====
202
203 Some built-in and standard types typically require special treatment
204 due to their special nature. The most important of these are arrays
205 and references, because cyclic data (ignoring closures) and observable
206 sharing can only be constructed through them.
207
208 When arrow types are really supported, unlike in this case, they
209 usually need special treatment due to the contravariance of arguments.
210
211 Lists and vectors require special treatment in the case of `show`,
212 because of their special syntax. This isn't usually the case.
213
214 The set of base types to support also needs to be considered unless
215 one exports an interface for constructing type-indices for entirely
216 new base types.
217
218 == Usage ==
219
220 Before going to the implementation, let's look at some examples. For
221 the following examples, we'll assume a structure binding
222 `Show :> SHOW`. If you want to try the examples immediately, just
223 skip forward to the implementation.
224
225 To use `show`, one first needs a type-index, which is then given to
226 `show`. To show a list of integers, one would use the type-index
227 `list int`, which has the type `int list Show.t`:
228 [source,sml]
229 ----
230 val "[3, 1, 4]" =
231 let open Show in show (list int) end
232 [3, 1, 4]
233 ----
234
235 Likewise, to show a list of lists of characters, one would use the
236 type-index `list (list char)`, which has the type `char list list
237 Show.t`:
238 [source,sml]
239 ----
240 val "[[#\"a\", #\"b\", #\"c\"], []]" =
241 let open Show in show (list (list char)) end
242 [[#"a", #"b", #"c"], []]
243 ----
244
245 Handling standard types is not particularly interesting. It is more
246 interesting to see how user-defined types can be handled. Although
247 the `option` datatype is a standard type, it requires no special
248 support, so we can treat it as a user-defined type. Options can be
249 encoded easily using a sum:
250 [source,sml]
251 ----
252 fun option t = let
253 open Show
254 in
255 inj (fn NONE => INL ()
256 | SOME v => INR v)
257 (data (C0"NONE" + C1"SOME" t))
258 end
259
260 val "SOME 5" =
261 let open Show in show (option int) end
262 (SOME 5)
263 ----
264
265 Readers new to type-indexed values might want to type annotate each
266 subexpression of the above example as an exercise. (Use a compiler to
267 check your annotations.)
268
269 Using a product, user specified records can be also be encoded easily:
270 [source,sml]
271 ----
272 val abc = let
273 open Show
274 in
275 inj (fn {a, b, c} => a & b & c)
276 (record (L"a" (option int) *
277 L"b" real *
278 L"c" bool))
279 end
280
281 val "{a = SOME 1, b = 3.0, c = false}" =
282 let open Show in show abc end
283 {a = SOME 1, b = 3.0, c = false}
284 ----
285
286 As you can see, both of the above use `inj` to inject user-defined
287 types to the general purpose sum and product types.
288
289 Of particular interest is whether recursive datatypes and cyclic data
290 can be handled. For example, how does one write a type-index for a
291 recursive datatype such as a cyclic graph?
292 [source,sml]
293 ----
294 datatype 'a graph = VTX of 'a * 'a graph list ref
295 fun arcs (VTX (_, r)) = r
296 ----
297
298 Using the `Show` combinators, we could first write a new type-index
299 combinator for `graph`:
300 [source,sml]
301 ----
302 fun graph a = let
303 open Tie Show
304 in
305 fix Y (fn graph_a =>
306 inj (fn VTX (x, y) => x & y)
307 (data (C1"VTX"
308 (tuple (U a *
309 U (refc (list graph_a)))))))
310 end
311 ----
312
313 To show a graph with integer labels
314 [source,sml]
315 ----
316 val a_graph = let
317 val a = VTX (1, ref [])
318 val b = VTX (2, ref [])
319 val c = VTX (3, ref [])
320 val d = VTX (4, ref [])
321 val e = VTX (5, ref [])
322 val f = VTX (6, ref [])
323 in
324 arcs a := [b, d]
325 ; arcs b := [c, e]
326 ; arcs c := [a, f]
327 ; arcs d := [f]
328 ; arcs e := [d]
329 ; arcs f := [e]
330 ; a
331 end
332 ----
333 we could then simply write
334 [source,sml]
335 ----
336 val "VTX (1, ref [VTX (2, ref [VTX (3, ref [VTX (1, %0), \
337 \VTX (6, ref [VTX (5, ref [VTX (4, ref [VTX (6, %3)])])] as %3)]), \
338 \VTX (5, ref [VTX (4, ref [VTX (6, ref [VTX (5, %2)])])] as %2)]), \
339 \VTX (4, ref [VTX (6, ref [VTX (5, ref [VTX (4, %1)])])] as %1)] as %0)" =
340 let open Show in show (graph int) end
341 a_graph
342 ----
343
344 There is a subtle gotcha with cyclic data. Consider the following code:
345 [source,sml]
346 ----
347 exception ExnArray of exn array
348
349 val () = let
350 open Show
351 in
352 regExn (fn ExnArray a =>
353 SOME (a, C1"ExnArray" (array exn))
354 | _ => NONE)
355 end
356
357 val a_cycle = let
358 val a = Array.fromList [Empty]
359 in
360 Array.update (a, 0, ExnArray a) ; a
361 end
362 ----
363
364 Although the above looks innocent enough, the evaluation of
365 [source,sml]
366 ----
367 val "[|ExnArray %0|] as %0" =
368 let open Show in show (array exn) end
369 a_cycle
370 ----
371 goes into an infinite loop. To avoid this problem, the type-index
372 `array exn` must be evaluated only once, as in the following:
373 [source,sml]
374 ----
375 val array_exn = let open Show in array exn end
376
377 exception ExnArray of exn array
378
379 val () = let
380 open Show
381 in
382 regExn (fn ExnArray a =>
383 SOME (a, C1"ExnArray" array_exn)
384 | _ => NONE)
385 end
386
387 val a_cycle = let
388 val a = Array.fromList [Empty]
389 in
390 Array.update (a, 0, ExnArray a) ; a
391 end
392
393 val "[|ExnArray %0|] as %0" =
394 let open Show in show array_exn end
395 a_cycle
396 ----
397
398 Cyclic data (excluding closures) in Standard ML can only be
399 constructed imperatively through arrays and references (combined with
400 exceptions or recursive datatypes). Before recursing to a reference
401 or an array, one needs to check whether that reference or array has
402 already been seen before. When `ref` or `array` is called with a
403 type-index, a new cyclicity checker is instantiated.
404
405 == Implementation ==
406
407 [source,sml]
408 ----
409 structure SmlSyntax = struct
410 local
411 structure CV = CharVector and C = Char
412 in
413 val isSym = Char.contains "!%&$#+-/:<=>?@\\~`^|*"
414
415 fun isSymId s = 0 < size s andalso CV.all isSym s
416
417 fun isAlphaNumId s =
418 0 < size s
419 andalso C.isAlpha (CV.sub (s, 0))
420 andalso CV.all (fn c => C.isAlphaNum c
421 orelse #"'" = c
422 orelse #"_" = c) s
423
424 fun isNumLabel s =
425 0 < size s
426 andalso #"0" <> CV.sub (s, 0)
427 andalso CV.all C.isDigit s
428
429 fun isId s = isAlphaNumId s orelse isSymId s
430
431 fun isLongId s = List.all isId (String.fields (#"." <\ op =) s)
432
433 fun isLabel s = isId s orelse isNumLabel s
434 end
435 end
436
437 structure Show :> SHOW = struct
438 datatype 'a t = IN of exn list * 'a -> bool * string
439 type 'a s = 'a t
440 type ('a, 'k) p = 'a t
441 type u = unit
442 type l = unit
443
444 fun show (IN t) x = #2 (t ([], x))
445
446 (* user-defined types *)
447 fun inj inj (IN b) = IN (b o Pair.map (id, inj))
448
449 local
450 fun surround pre suf (_, s) = (false, concat [pre, s, suf])
451 fun parenthesize x = if #1 x then surround "(" ")" x else x
452 fun construct tag =
453 (fn (_, s) => (true, concat [tag, " ", s])) o parenthesize
454 fun check p m s = if p s then () else raise Fail (m^s)
455 in
456 (* tuples and records *)
457 fun (IN l) * (IN r) =
458 IN (fn (rs, a & b) =>
459 (false, concat [#2 (l (rs, a)),
460 ", ",
461 #2 (r (rs, b))]))
462
463 val U = id
464 fun L l = (check SmlSyntax.isLabel "Invalid label: " l
465 ; fn IN t => IN (surround (l^" = ") "" o t))
466
467 fun tuple (IN t) = IN (surround "(" ")" o t)
468 fun record (IN t) = IN (surround "{" "}" o t)
469
470 (* datatypes *)
471 fun (IN l) + (IN r) = IN (fn (rs, INL a) => l (rs, a)
472 | (rs, INR b) => r (rs, b))
473
474 fun C0 c = (check SmlSyntax.isId "Invalid constructor: " c
475 ; IN (const (false, c)))
476 fun C1 c (IN t) = (check SmlSyntax.isId "Invalid constructor: " c
477 ; IN (construct c o t))
478
479 val data = id
480
481 fun Y ? = Tie.iso Tie.function (fn IN x => x, IN) ?
482
483 (* exceptions *)
484 local
485 val handlers = ref ([] : (exn -> unit t option) list)
486 in
487 val exn = IN (fn (rs, e) => let
488 fun lp [] =
489 C0(concat ["<exn:",
490 General.exnName e,
491 ">"])
492 | lp (f::fs) =
493 case f e
494 of NONE => lp fs
495 | SOME t => t
496 val IN f = lp (!handlers)
497 in
498 f (rs, ())
499 end)
500 fun regExn f =
501 handlers := (Option.map
502 (fn (x, IN f) =>
503 IN (fn (rs, ()) =>
504 f (rs, x))) o f)
505 :: !handlers
506 end
507
508 (* some built-in type constructors *)
509 local
510 fun cyclic (IN t) = let
511 exception E of ''a * bool ref
512 in
513 IN (fn (rs, v : ''a) => let
514 val idx = Int.toString o length
515 fun lp (E (v', c)::rs) =
516 if v' <> v then lp rs
517 else (c := false ; (false, "%"^idx rs))
518 | lp (_::rs) = lp rs
519 | lp [] = let
520 val c = ref true
521 val r = t (E (v, c)::rs, v)
522 in
523 if !c then r
524 else surround "" (" as %"^idx rs) r
525 end
526 in
527 lp rs
528 end)
529 end
530
531 fun aggregate pre suf toList (IN t) =
532 IN (surround pre suf o
533 (fn (rs, a) =>
534 (false,
535 String.concatWith
536 ", "
537 (map (#2 o curry t rs)
538 (toList a)))))
539 in
540 fun refc ? = (cyclic o inj ! o C1"ref") ?
541 fun array ? = (cyclic o aggregate "[|" "|]" (Array.foldr op:: [])) ?
542 fun list ? = aggregate "[" "]" id ?
543 fun vector ? = aggregate "#[" "]" (Vector.foldr op:: []) ?
544 end
545
546 fun (IN _) --> (IN _) = IN (const (false, "<fn>"))
547
548 (* some built-in base types *)
549 local
550 fun mk toS = (fn x => (false, x)) o toS o (fn (_, x) => x)
551 in
552 val string =
553 IN (surround "\"" "\"" o mk (String.translate Char.toString))
554 val unit = IN (mk (fn () => "()"))
555 val bool = IN (mk Bool.toString)
556 val char = IN (surround "#\"" "\"" o mk Char.toString)
557 val int = IN (mk Int.toString)
558 val word = IN (surround "0wx" "" o mk Word.toString)
559 val real = IN (mk Real.toString)
560 end
561 end
562 end
563
564 (* Handlers for standard top-level exceptions *)
565 val () = let
566 open Show
567 fun E0 name = SOME ((), C0 name)
568 in
569 regExn (fn Bind => E0"Bind"
570 | Chr => E0"Chr"
571 | Div => E0"Div"
572 | Domain => E0"Domain"
573 | Empty => E0"Empty"
574 | Match => E0"Match"
575 | Option => E0"Option"
576 | Overflow => E0"Overflow"
577 | Size => E0"Size"
578 | Span => E0"Span"
579 | Subscript => E0"Subscript"
580 | _ => NONE)
581 ; regExn (fn Fail s => SOME (s, C1"Fail" string)
582 | _ => NONE)
583 end
584 ----
585
586
587 == Also see ==
588
589 There are a number of related techniques. Here are some of them.
590
591 * <:Fold:>
592 * <:StaticSum:>