ConcurrentMLImplementation
==========================

Here are some notes on MLton's implementation of <:ConcurrentML:>.

Concurrent ML was originally implemented for SML/NJ. It was ported to
MLton in the summer of 2004. The main difference between the
implementations is that SML/NJ uses continuations to implement CML
threads, while MLton uses its underlying <:MLtonThread:thread>
package. Presently, MLton's threads are a little more heavyweight
than SML/NJ's continuations, but it's pretty clear that there is some
fat there that could be trimmed.

The implementation of CML in SML/NJ is built upon the first-class
continuations of the `SMLofNJ.Cont` module.
[source,sml]
----
type 'a cont
val callcc: ('a cont -> 'a) -> 'a
val isolate: ('a -> unit) -> 'a cont
val throw: 'a cont -> 'a -> 'b
----

The implementation of CML in MLton is built upon the first-class
threads of the <:MLtonThread:> module.
[source,sml]
----
type 'a t
val new: ('a -> unit) -> 'a t
val prepare: 'a t * 'a -> Runnable.t
val switch: ('a t -> Runnable.t) -> 'a
----

The port is relatively straightforward, because CML always throws to a
continuation at most once. Hence, an "abstract" implementation of
CML could be built upon first-class one-shot continuations, which map
equally well to SML/NJ's continuations and MLton's threads.

The "essence" of the port is to transform:
----
callcc (fn k => ... throw k' v')
----
{empty}to
----
switch (fn t => ... prepare (t', v'))
----
which suffices for the vast majority of the CML implementation.
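
For instance (a hypothetical fragment, not taken from the CML
sources), a routine that suspends the current computation onto a
queue of waiters looks like this in the two styles; here `Queue`,
`dispatch`, and `next` are assumed scheduler helpers, where
`dispatch` throws to some ready continuation and `next` returns the
next `Runnable.t` from the ready queue:
[source,sml]
----
(* SML/NJ style: capture the current continuation, enqueue it, and
 * dispatch some other ready continuation. *)
fun suspend (waiters: unit cont Queue.queue): unit =
   callcc (fn k => (Queue.enqueue (waiters, k); dispatch ()))

(* MLton style: capture the current thread, enqueue it, and hand the
 * scheduler the next ready thread to run. *)
fun suspend (waiters: unit MLton.Thread.t Queue.queue): unit =
   MLton.Thread.switch
   (fn t => (Queue.enqueue (waiters, t); next ()))
----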

There was only one complicated transformation: blocking multiple base
events. In SML/NJ CML, the representation of base events is given by:
[source,sml]
----
datatype 'a event_status
  = ENABLED of {prio: int, doFn: unit -> 'a}
  | BLOCKED of {
        transId: trans_id ref,
        cleanUp: unit -> unit,
        next: unit -> unit
      } -> 'a
type 'a base_evt = unit -> 'a event_status
----
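
For example (a hypothetical base event, not from the CML sources), an
always-enabled event carrying a constant value could be represented
as follows; the priority value here is arbitrary:
[source,sml]
----
(* An event that is always ready and yields v when chosen. *)
fun alwaysBaseEvt (v: 'a): 'a base_evt =
   fn () => ENABLED {prio = 1, doFn = fn () => v}
----
A blocked event, by contrast, returns `BLOCKED f`, where `f` records
the synchronization context (`transId`, `cleanUp`, `next`) and only
returns once some other thread enables the event.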

When synchronizing on a set of base events, which are all blocked, we
must invoke each `BLOCKED` function with the same `transId` and
`cleanUp`. (The `transId` is checked and set to `CANCEL` by the
`cleanUp` function, which is invoked by the first enabled event; this
"fizzles" every other event in the synchronization group that later
becomes enabled.) However, each `BLOCKED` function is implemented by
a `callcc`, so that when the event is enabled, it throws back to the
point of synchronization. Hence, the `next` function (which doesn't
return) is invoked by the `BLOCKED` function to escape the `callcc`
and continue in the thread performing the synchronization. In SML/NJ
this is implemented as follows:
[source,sml]
----
fun ext ([], blockFns) = callcc (fn k => let
      val throw = throw k
      val (transId, setFlg) = mkFlg ()
      fun log [] = S.atomicDispatch ()
        | log (blockFn :: r) =
            throw (blockFn {
                transId = transId,
                cleanUp = setFlg,
                next = fn () => log r
              })
      in
        log blockFns; error "[log]"
      end)
----

(Note that `S.atomicDispatch` invokes the next continuation on the
ready queue.) This doesn't map well to the MLton thread model.
Although it follows the
----
callcc (fn k => ... throw k v)
----
model, the fact that `blockFn` will also attempt to do
----
callcc (fn k' => ... next ())
----
means that the naive transformation will result in nested `switch`-es.

We need to think a little more about what this code is trying to do.
Essentially, each `blockFn` wants to capture the current
continuation, hold on to it until the event is enabled, and continue
with `next`; when the event is enabled, before invoking the
continuation and returning to the synchronization point, the
`cleanUp` and other event-specific operations are performed.

To accomplish the same effect in the MLton thread implementation, we
have the following:
[source,sml]
----
datatype 'a status =
   ENABLED of {prio: int, doitFn: unit -> 'a}
 | BLOCKED of {transId: trans_id,
               cleanUp: unit -> unit,
               next: unit -> rdy_thread} -> 'a

type 'a base = unit -> 'a status

fun ext ([], blockFns): 'a =
   S.atomicSwitch
   (fn (t: 'a S.thread) =>
    let
       val (transId, cleanUp) = TransID.mkFlg ()
       fun log blockFns: S.rdy_thread =
          case blockFns of
             [] => S.next ()
           | blockFn :: blockFns =>
                (S.prep o S.new)
                (fn _ => fn () =>
                 let
                    val () = S.atomicBegin ()
                    val x = blockFn {transId = transId,
                                     cleanUp = cleanUp,
                                     next = fn () => log blockFns}
                 in S.switch (fn _ => S.prepVal (t, x))
                 end)
    in
       log blockFns
    end)
----

To avoid the nested `switch`-es, I run the `blockFn` in its own
thread, whose only purpose is to return to the synchronization point.
This corresponds to the `throw (blockFn {...})` in the SML/NJ
implementation. I'm worried that this implementation might be a
little expensive, starting a new thread for each blocked event
(though this only happens when there are multiple blocked events in a
synchronization group). But I don't see another way of implementing
this behavior in the MLton thread model.

Note that another way of thinking about what is going on is to
consider each `blockFn` as prepending a different set of actions to
the thread `t`. It might be possible to provide a
`MLton.Thread.unsafePrepend` operation:
[source,sml]
----
fun unsafePrepend (T r: 'a t, f: 'b -> 'a): 'b t =
   let
      val t =
         case !r of
            Dead => raise Fail "prepend to a Dead thread"
          | New g => New (g o f)
          | Paused (g, t) => Paused (fn h => g (f o h), t)
   in (* r := Dead; *)
      T (ref t)
   end
----
I have commented out the `r := Dead`, which would allow multiple
prepends to the same thread (i.e., not destroying the original thread
in the process). Of course, only one of the threads could be run: if
the original thread were in the `Paused` state, then multiple threads
would share the underlying runtime/primitive thread. Now, this
matches the "one-shot" nature of CML continuations/threads, but I'm
not comfortable with extending `MLton.Thread` with such an unsafe
operation.

Other than this complication with blocking multiple base events, the
port was quite routine. (As a very pleasant surprise, the CML
implementation in SML/NJ doesn't use any SML/NJ-isms.) There is a
slight difference in the way in which critical sections are handled
in SML/NJ and MLton; since `MLton.Thread.switch` _always_ leaves a
critical section, it is sometimes necessary to add additional
`atomicBegin`-s/`atomicEnd`-s to ensure that we remain in a critical
section after a thread switch.

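Concretely (a hypothetical sketch; `S.atomicBegin`, `S.atomicEnd`,
`S.switch`, `S.next`, and the waiter queue are assumed names, not the
actual ones from the sources), code that must remain in a critical
section on both sides of a thread switch ends up bracketed like this:
[source,sml]
----
fun atomicSuspend (waiters: unit S.thread Queue.queue): unit =
   let
      val () = S.atomicBegin ()
      (* ... atomic work before the switch ... *)
      val () = S.switch (fn t =>
                  (Queue.enqueue (waiters, t)
                   ; S.next ()))
      (* The switch left the critical section, so when another
       * thread resumes us we must re-enter it to finish the atomic
       * work after the switch. *)
      val () = S.atomicBegin ()
      (* ... atomic work after resumption ... *)
   in
      S.atomicEnd ()
   end
----
This is the same pattern as the `S.atomicBegin ()` at the start of
each per-`blockFn` thread in the `ext` code above.
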
While looking at virtually every file in the core CML implementation,
I took the liberty of simplifying things where it seemed possible; in
terms of style, the implementation is about half-way between Reppy's
original and MLton's.

Some changes of note:

* `util/` contains all pertinent data structures: (functional and
imperative) queues, (functional) priority queues. Hence, it should be
easier to switch in more efficient or real-time implementations.

* `core-cml/scheduler.sml`: in both implementations, this is where
most of the interesting action takes place. I've made the connection
between `MLton.Thread.t`-s and `ThreadId.thread_id`-s more abstract
than it is in the SML/NJ implementation, and encapsulated all of the
`MLton.Thread` operations in this module.

* Eliminated all of the "by hand" inlining.


== Future Extensions ==

The CML documentation says the following:
____

----
CML.joinEvt: thread_id -> unit event
----

* `joinEvt tid`
+
creates an event value for synchronizing on the termination of the
thread with the ID `tid`. There are three ways that a thread may
terminate: the function that was passed to `spawn` (or `spawnc`) may
return; it may call the `exit` function; or it may have an uncaught
exception. Note that `joinEvt` does not distinguish between these
cases; it also does not become enabled if the named thread deadlocks
(even if it is garbage collected).
____

I believe that `MLton.Finalizable` might be able to relax that last
restriction. Upon the creation of a `'a Scheduler.thread`, we could
attach a finalizer to the underlying `'a MLton.Thread.t` that enables
the `joinEvt` (in the associated `ThreadID.thread_id`) when the
`'a MLton.Thread.t` becomes unreachable.

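Such a finalizer might be attached roughly as follows (a sketch; the
`enableJoinEvt` operation on `ThreadID.thread_id` is an assumed name,
not one from the sources):
[source,sml]
----
fun newThread (tid: ThreadID.thread_id, f: 'a -> unit)
      : 'a MLton.Thread.t MLton.Finalizable.t =
   let
      val t = MLton.Finalizable.new (MLton.Thread.new f)
      (* When the underlying thread becomes unreachable, enable the
       * thread's joinEvt so that joiners do not block forever. *)
      val () =
         MLton.Finalizable.addFinalizer (t, fn _ => enableJoinEvt tid)
   in
      t
   end
----
All later uses of the thread would have to go through
`MLton.Finalizable.withValue`, so that the thread is not finalized
while it is still in use.
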
I don't know why CML doesn't have
----
CML.kill: thread_id -> unit
----
which has a fairly simple implementation -- setting a kill flag in
the `thread_id` and adjusting the scheduler to discard any killed
threads that it takes off the ready queue. The fairness of the
scheduler ensures that a killed thread will eventually be discarded.
The semantics are a little murky for blocked threads that are killed,
though. For example, consider a thread blocked on `SyncVar.mTake mv`
and a thread blocked on `SyncVar.mGet mv`. If the first thread is
killed while blocked, and a third thread does `SyncVar.mPut (mv, x)`,
then we might expect that we'll enable the second thread, and never
the first. But when only the ready queue is able to discard killed
threads, the `SyncVar.mPut` could enable the first thread (putting it
on the ready queue, from which it will be discarded) and leave the
second thread blocked. We could solve this by adjusting the
`TransID.trans_id` type and the "cleaner" functions to look for both
canceled transactions and transactions on killed threads.

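A sketch of that scheme (all names hypothetical, not from the
sources):
[source,sml]
----
(* In ThreadID: each thread_id carries a kill flag
 * (other fields elided). *)
datatype thread_id = TID of {killed: bool ref}

fun kill (TID {killed}) = killed := true
fun isKilled (TID {killed}) = !killed

(* In the scheduler: skip killed threads when dequeueing from the
 * ready queue; fairness guarantees that each killed thread is
 * eventually dequeued and dropped. *)
fun next (): S.rdy_thread =
   let
      val (tid, t) = Queue.dequeue readyQ
   in
      if isKilled tid then next () else t
   end
----
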
John Reppy says that <!Cite(MarlowEtAl01)> and <!Cite(FlattFindler04)>
explain why `CML.kill` would be a bad idea.

Between `CML.timeOutEvt` and `CML.kill`, one could give an efficient
solution to the recent `comp.lang.ml` post about terminating a
function that doesn't complete in a given time.
[source,sml]
----
fun timeOut (f: unit -> 'a, t: Time.time): 'a option =
   let
      val iv = SyncVar.iVar ()
      val tid = CML.spawn (fn () => SyncVar.iPut (iv, f ()))
   in
      CML.select
      [CML.wrap (CML.timeOutEvt t, fn () => (CML.kill tid; NONE)),
       CML.wrap (SyncVar.iGetEvt iv, fn x => SOME x)]
   end
----


== Space Safety ==

There are some CML-related posts on the MLton mailing list:

* http://www.mlton.org/pipermail/mlton/2004-May/

that discuss concerns that SML/NJ's implementation is not space
efficient, because multi-shot continuations can be held indefinitely
on event queues. MLton is better off because of the one-shot nature
of its threads -- when an event enables a thread, all other copies of
the thread waiting in other event queues get turned into dead threads
(of zero size).