[hcoop/debian/mlton.git] / doc / guide / src / StaticSum.adoc

StaticSum
=========

While SML makes it impossible to write functions whose types would
depend on the values of their arguments, or so called dependently
typed functions, it is possible, and arguably commonplace, to write
functions whose types depend on the types of their arguments.  Indeed,
the types of parametrically polymorphic functions like `map` and
`foldl` can be said to depend on the types of their arguments.  What
is less commonplace, however, is to write functions whose behavior
would depend on the types of their arguments.  Nevertheless, there are
several techniques for writing such functions.
<:TypeIndexedValues:Type-indexed values> and <:Fold:fold> are two such
techniques.  This page presents another such technique dubbed static
sums.


== Ordinary Sums ==

Consider the sum type as defined below:
[source,sml]
----
structure Sum = struct
   datatype ('a, 'b) t = INL of 'a | INR of 'b
end
----

While a generic sum type such as defined above is very useful, it has
a number of limitations.  As an example, we could write the function
`out` to extract the value from a sum as follows:
[source,sml]
----
fun out (s : ('a, 'a) Sum.t) : 'a =
    case s
     of Sum.INL a => a
      | Sum.INR a => a
----

As can be seen from the type of `out`, it is limited in the sense that
it requires both variants of the sum to have the same type.  So, `out`
cannot be used to extract the value of a sum of two different types,
such as the type `(int, real) Sum.t`.  As another example of a
limitation, consider the following attempt at a `succ` function:
[source,sml]
----
fun succ (s : (int, real) Sum.t) : ??? =
    case s
     of Sum.INL i => i + 1
      | Sum.INR r => Real.nextAfter (r, Real.posInf)
----

The above definition of `succ` cannot be typed, because there is no
type for the codomain within SML.


== Static Sums ==

Interestingly, it is possible to define values `inL`, `inR`, and
`match` that satisfy the laws
----
match (inL x) (f, g) = f x
match (inR x) (f, g) = g x
----
and do not suffer from the same limitions.  The definitions are
actually quite trivial:
[source,sml]
----
structure StaticSum = struct
   fun inL x (f, _) = f x
   fun inR x (_, g) = g x
   fun match x = x
end
----

Now, given the `succ` function defined as
[source,sml]
----
fun succ s =
    StaticSum.match s
       (fn i => i + 1,
        fn r => Real.nextAfter (r, Real.posInf))
----
we get
[source,sml]
----
succ (StaticSum.inL 1) = 2
succ (StaticSum.inR Real.maxFinite) = Real.posInf
----

To better understand how this works, consider the following signature
for static sums:
[source,sml]
----
structure StaticSum :> sig
   type ('dL, 'cL, 'dR, 'cR, 'c) t
   val inL : 'dL -> ('dL, 'cL, 'dR, 'cR, 'cL) t
   val inR : 'dR -> ('dL, 'cL, 'dR, 'cR, 'cR) t
   val match : ('dL, 'cL, 'dR, 'cR, 'c) t -> ('dL -> 'cL) * ('dR -> 'cR) -> 'c
end = struct
   type ('dL, 'cL, 'dR, 'cR, 'c) t = ('dL -> 'cL) * ('dR -> 'cR) -> 'c
   open StaticSum
end
----

Above, `'d` stands for domain and `'c` for codomain.  The key
difference between an ordinary sum type, like `(int, real) Sum.t`, and
a static sum type, like `(int, real, real, int, real) StaticSum.t`, is
that the ordinary sum type says nothing about the type of the result
of deconstructing a sum while the static sum type specifies the type.

With the sealed static sum module, we get the type
[source,sml]
----
val succ : (int, int, real, real, 'a) StaticSum.t -> 'a
----
for the previously defined `succ` function.  The type specifies that
`succ` maps a left `int` to an `int` and a right `real` to a `real`.
For example, the type of `StaticSum.inL 1` is
`(int, 'cL, 'dR, 'cR, 'cL) StaticSum.t`.  Unifying this with the
argument type of `succ` gives the type `(int, int, real, real, int)
StaticSum.t -> int`.

The `out` function is quite useful on its own.  Here is how it can be
defined:
[source,sml]
----
structure StaticSum = struct
   open StaticSum
   val out : ('a, 'a, 'b, 'b, 'c) t -> 'c =
    fn s => match s (fn x => x, fn x => x)
end
----

Due to the value restriction, lack of first class polymorphism and
polymorphic recursion, the usefulness and convenience of static sums
is somewhat limited in SML.  So, don't throw away the ordinary sum
type just yet.  Static sums can nevertheless be quite useful.


=== Example: Send and Receive with Argument Type Dependent Result Types ===

In some situations it would seem useful to define functions whose
result type would depend on some of the arguments.  Traditionally such
functions have been thought to be impossible in SML and the solution
has been to define multiple functions.  For example, the
http://www.standardml.org/Basis/socket.html[`Socket` structure] of the
Basis library defines 16 `send` and 16 `recv` functions.  In contrast,
the Net structure
(<!ViewGitFile(mltonlib,master,com/sweeks/basic/unstable/net.sig)>) of the
Basic library designed by Stephen Weeks defines only a single `send`
and a single `receive` and the result types of the functions depend on
their arguments.  The implementation
(<!ViewGitFile(mltonlib,master,com/sweeks/basic/unstable/net.sml)>) uses
static sums (with a slighly different signature:
<!ViewGitFile(mltonlib,master,com/sweeks/basic/unstable/static-sum.sig)>).


=== Example: Picking Monad Results ===

Suppose that we need to write a parser that accepts a pair of integers
and returns their sum given a monadic parsing combinator library.  A
part of the signature of such library could look like this
[source,sml]
----
signature PARSING = sig
   include MONAD
   val int : int t
   val lparen : unit t
   val rparen : unit t
   val comma : unit t
   (* ... *)
end
----
where the `MONAD` signature could be defined as
[source,sml]
----
signature MONAD = sig
   type 'a t
   val return : 'a -> 'a t
   val >>= : 'a t * ('a -> 'b t) -> 'b t
end
infix >>=
----

The straightforward, but tedious, way to write the desired parser is:
[source,sml]
----
val p = lparen >>= (fn _ =>
        int    >>= (fn x =>
        comma  >>= (fn _ =>
        int    >>= (fn y =>
        rparen >>= (fn _ =>
        return (x + y))))))
----

In Haskell, the parser could be written using the `do` notation
considerably less verbosely as:
[source,haskell]
----
p = do { lparen ; x <- int ; comma ; y <- int ; rparen ; return $ x + y }
----

SML doesn't provide a `do` notation, so we need another solution.

Suppose we would have a "pick" notation for monads that would allows
us to write the parser as
[source,sml]
----
val p = `lparen ^ \int ^ `comma ^ \int ^ `rparen @ (fn x & y => x + y)
----
using four auxiliary combinators: +&grave;+, `\`, `^`, and `@`.

Roughly speaking

* +&grave;p+ means that the result of `p` is dropped,
* `\p` means that the result of `p` is taken,
* `p ^ q` means that results of `p` and `q` are taken as a product, and
* `p @ a` means that the results of `p` are passed to the function `a` and that result is returned.

The difficulty is in implementing the concatenation combinator `^`.
The type of the result of the concatenation depends on the types of
the arguments.

Using static sums and the <:ProductType:product type>, the pick
notation for monads can be implemented as follows:
[source,sml]
----
functor MkMonadPick (include MONAD) = let
   open StaticSum
in
   struct
      fun `a = inL (a >>= (fn _ => return ()))
      val \ = inR
      fun a @ f = out a >>= (return o f)
      fun a ^ b =
          (match b o match a)
             (fn a =>
                 (fn b => inL (a >>= (fn _ => b)),
                  fn b => inR (a >>= (fn _ => b))),
              fn a =>
                 (fn b => inR (a >>= (fn a => b >>= (fn _ => return a))),
                  fn b => inR (a >>= (fn a => b >>= (fn b => return (a & b))))))
   end
end
----

The above implementation is inefficient, however.  It uses many more
bind operations, `>>=`, than necessary.  That can be solved with an
additional level of abstraction:
[source,sml]
----
functor MkMonadPick (include MONAD) = let
   open StaticSum
in
   struct
      fun `a = inL (fn b => a >>= (fn _ => b ()))
      fun \a = inR (fn b => a >>= b)
      fun a @ f = out a (return o f)
      fun a ^ b =
          (match b o match a)
             (fn a => (fn b => inL (fn c => a (fn () => b c)),
                       fn b => inR (fn c => a (fn () => b c))),
              fn a => (fn b => inR (fn c => a (fn a => b (fn () => c a))),
                       fn b => inR (fn c => a (fn a => b (fn b => c (a & b))))))
   end
end
----

After instantiating and opening either of the above monad pick
implementations, the previously given definition of `p` can be
compiled and results in a parser whose result is of type `int`.  Here
is a functor to test the theory:
[source,sml]
----
functor Test (Arg : PARSING) = struct
   local
      structure Pick = MkMonadPick (Arg)
      open Pick Arg
   in
      val p : int t =
          `lparen ^ \int ^ `comma ^ \int ^ `rparen @ (fn x & y => x + y)
   end
end
----


== Also see ==

There are a number of related techniques.  Here are some of them.

* <:Fold:>
* <:TypeIndexedValues:>
Commit	Line	Data
7f918cf1 CE	1	StaticSum
	2	=========
	3
	4	While SML makes it impossible to write functions whose types would
	5	depend on the values of their arguments, or so called dependently
	6	typed functions, it is possible, and arguably commonplace, to write
	7	functions whose types depend on the types of their arguments. Indeed,
	8	the types of parametrically polymorphic functions like `map` and
	9	`foldl` can be said to depend on the types of their arguments. What
	10	is less commonplace, however, is to write functions whose behavior
	11	would depend on the types of their arguments. Nevertheless, there are
	12	several techniques for writing such functions.
	13	<:TypeIndexedValues:Type-indexed values> and <:Fold:fold> are two such
	14	techniques. This page presents another such technique dubbed static
	15	sums.
	16
	17
	18	== Ordinary Sums ==
	19
	20	Consider the sum type as defined below:
	21	[source,sml]
	22	----
	23	structure Sum = struct
	24	datatype ('a, 'b) t = INL of 'a \| INR of 'b
	25	end
	26	----
	27
	28	While a generic sum type such as defined above is very useful, it has
	29	a number of limitations. As an example, we could write the function
	30	`out` to extract the value from a sum as follows:
	31	[source,sml]
	32	----
	33	fun out (s : ('a, 'a) Sum.t) : 'a =
	34	case s
	35	of Sum.INL a => a
	36	\| Sum.INR a => a
	37	----
	38
	39	As can be seen from the type of `out`, it is limited in the sense that
	40	it requires both variants of the sum to have the same type. So, `out`
	41	cannot be used to extract the value of a sum of two different types,
	42	such as the type `(int, real) Sum.t`. As another example of a
	43	limitation, consider the following attempt at a `succ` function:
	44	[source,sml]
	45	----
	46	fun succ (s : (int, real) Sum.t) : ??? =
	47	case s
	48	of Sum.INL i => i + 1
	49	\| Sum.INR r => Real.nextAfter (r, Real.posInf)
	50	----
	51
	52	The above definition of `succ` cannot be typed, because there is no
	53	type for the codomain within SML.
	54
	55
	56	== Static Sums ==
	57
	58	Interestingly, it is possible to define values `inL`, `inR`, and
	59	`match` that satisfy the laws
	60	----
	61	match (inL x) (f, g) = f x
	62	match (inR x) (f, g) = g x
	63	----
	64	and do not suffer from the same limitions. The definitions are
65	actually quite trivial:
66	[source,sml]
67	----
68	structure StaticSum = struct
69	fun inL x (f, _) = f x
70	fun inR x (_, g) = g x
71	fun match x = x
72	end
73	----
74
75	Now, given the `succ` function defined as
76	[source,sml]
77	----
78	fun succ s =
79	StaticSum.match s
80	(fn i => i + 1,
81	fn r => Real.nextAfter (r, Real.posInf))
82	----
83	we get
84	[source,sml]
85	----
86	succ (StaticSum.inL 1) = 2
87	succ (StaticSum.inR Real.maxFinite) = Real.posInf
88	----
89
90	To better understand how this works, consider the following signature
91	for static sums:
92	[source,sml]
93	----
94	structure StaticSum :> sig
95	type ('dL, 'cL, 'dR, 'cR, 'c) t
96	val inL : 'dL -> ('dL, 'cL, 'dR, 'cR, 'cL) t
97	val inR : 'dR -> ('dL, 'cL, 'dR, 'cR, 'cR) t
98	val match : ('dL, 'cL, 'dR, 'cR, 'c) t -> ('dL -> 'cL) * ('dR -> 'cR) -> 'c
99	end = struct
100	type ('dL, 'cL, 'dR, 'cR, 'c) t = ('dL -> 'cL) * ('dR -> 'cR) -> 'c
101	open StaticSum
102	end
103	----
104
105	Above, `'d` stands for domain and `'c` for codomain. The key
106	difference between an ordinary sum type, like `(int, real) Sum.t`, and
107	a static sum type, like `(int, real, real, int, real) StaticSum.t`, is
108	that the ordinary sum type says nothing about the type of the result
109	of deconstructing a sum while the static sum type specifies the type.
110
111	With the sealed static sum module, we get the type
112	[source,sml]
113	----
114	val succ : (int, int, real, real, 'a) StaticSum.t -> 'a
115	----
116	for the previously defined `succ` function. The type specifies that
117	`succ` maps a left `int` to an `int` and a right `real` to a `real`.
118	For example, the type of `StaticSum.inL 1` is
119	`(int, 'cL, 'dR, 'cR, 'cL) StaticSum.t`. Unifying this with the
120	argument type of `succ` gives the type `(int, int, real, real, int)
121	StaticSum.t -> int`.
122
123	The `out` function is quite useful on its own. Here is how it can be
124	defined:
125	[source,sml]
126	----
127	structure StaticSum = struct
128	open StaticSum
129	val out : ('a, 'a, 'b, 'b, 'c) t -> 'c =
130	fn s => match s (fn x => x, fn x => x)
131	end
132	----
133
134	Due to the value restriction, lack of first class polymorphism and
135	polymorphic recursion, the usefulness and convenience of static sums
136	is somewhat limited in SML. So, don't throw away the ordinary sum
137	type just yet. Static sums can nevertheless be quite useful.
138
139
140	=== Example: Send and Receive with Argument Type Dependent Result Types ===
141
142	In some situations it would seem useful to define functions whose
143	result type would depend on some of the arguments. Traditionally such
144	functions have been thought to be impossible in SML and the solution
145	has been to define multiple functions. For example, the
146	http://www.standardml.org/Basis/socket.html[`Socket` structure] of the
147	Basis library defines 16 `send` and 16 `recv` functions. In contrast,
148	the Net structure
149	(<!ViewGitFile(mltonlib,master,com/sweeks/basic/unstable/net.sig)>) of the
150	Basic library designed by Stephen Weeks defines only a single `send`
151	and a single `receive` and the result types of the functions depend on
152	their arguments. The implementation
153	(<!ViewGitFile(mltonlib,master,com/sweeks/basic/unstable/net.sml)>) uses
154	static sums (with a slighly different signature:
155	<!ViewGitFile(mltonlib,master,com/sweeks/basic/unstable/static-sum.sig)>).
156
157
158	=== Example: Picking Monad Results ===
159
160	Suppose that we need to write a parser that accepts a pair of integers
161	and returns their sum given a monadic parsing combinator library. A
162	part of the signature of such library could look like this
163	[source,sml]
164	----
165	signature PARSING = sig
166	include MONAD
167	val int : int t
168	val lparen : unit t
169	val rparen : unit t
170	val comma : unit t
171	(* ... *)
172	end
173	----
174	where the `MONAD` signature could be defined as
175	[source,sml]
176	----
177	signature MONAD = sig
178	type 'a t
179	val return : 'a -> 'a t
180	val >>= : 'a t * ('a -> 'b t) -> 'b t
181	end
182	infix >>=
183	----
184
185	The straightforward, but tedious, way to write the desired parser is:
186	[source,sml]
187	----
188	val p = lparen >>= (fn _ =>
189	int >>= (fn x =>
190	comma >>= (fn _ =>
191	int >>= (fn y =>
192	rparen >>= (fn _ =>
193	return (x + y))))))
194	----
195
196	In Haskell, the parser could be written using the `do` notation
197	considerably less verbosely as:
198	[source,haskell]
199	----
200	p = do { lparen ; x <- int ; comma ; y <- int ; rparen ; return $ x + y }
201	----
202
203	SML doesn't provide a `do` notation, so we need another solution.
204
205	Suppose we would have a "pick" notation for monads that would allows
206	us to write the parser as
207	[source,sml]
208	----
209	val p = `lparen ^ \int ^ `comma ^ \int ^ `rparen @ (fn x & y => x + y)
210	----
211	using four auxiliary combinators: +&grave;+, `\`, `^`, and `@`.
212
213	Roughly speaking
214
215	* +&grave;p+ means that the result of `p` is dropped,
216	* `\p` means that the result of `p` is taken,
217	* `p ^ q` means that results of `p` and `q` are taken as a product, and
218	* `p @ a` means that the results of `p` are passed to the function `a` and that result is returned.
219
220	The difficulty is in implementing the concatenation combinator `^`.
221	The type of the result of the concatenation depends on the types of
222	the arguments.
223
224	Using static sums and the <:ProductType:product type>, the pick
225	notation for monads can be implemented as follows:
226	[source,sml]
227	----
228	functor MkMonadPick (include MONAD) = let
229	open StaticSum
230	in
231	struct
232	fun `a = inL (a >>= (fn _ => return ()))
233	val \ = inR
234	fun a @ f = out a >>= (return o f)
235	fun a ^ b =
236	(match b o match a)
237	(fn a =>
238	(fn b => inL (a >>= (fn _ => b)),
239	fn b => inR (a >>= (fn _ => b))),
240	fn a =>
241	(fn b => inR (a >>= (fn a => b >>= (fn _ => return a))),
242	fn b => inR (a >>= (fn a => b >>= (fn b => return (a & b))))))
243	end
244	end
245	----
246
247	The above implementation is inefficient, however. It uses many more
248	bind operations, `>>=`, than necessary. That can be solved with an
249	additional level of abstraction:
250	[source,sml]
251	----
252	functor MkMonadPick (include MONAD) = let
253	open StaticSum
254	in
255	struct
256	fun `a = inL (fn b => a >>= (fn _ => b ()))
257	fun \a = inR (fn b => a >>= b)
258	fun a @ f = out a (return o f)
259	fun a ^ b =
260	(match b o match a)
261	(fn a => (fn b => inL (fn c => a (fn () => b c)),
262	fn b => inR (fn c => a (fn () => b c))),
263	fn a => (fn b => inR (fn c => a (fn a => b (fn () => c a))),
264	fn b => inR (fn c => a (fn a => b (fn b => c (a & b))))))
265	end
266	end
267	----
268
269	After instantiating and opening either of the above monad pick
270	implementations, the previously given definition of `p` can be
271	compiled and results in a parser whose result is of type `int`. Here
272	is a functor to test the theory:
273	[source,sml]
274	----
275	functor Test (Arg : PARSING) = struct
276	local
277	structure Pick = MkMonadPick (Arg)
278	open Pick Arg
279	in
280	val p : int t =
281	`lparen ^ \int ^ `comma ^ \int ^ `rparen @ (fn x & y => x + y)
282	end
283	end
284	----
285
286
287	== Also see ==
288
289	There are a number of related techniques. Here are some of them.
290
291	* <:Fold:>
292	* <:TypeIndexedValues:>