Backport from sid to buster
[hcoop/debian/mlton.git] / doc / guide / src / Regions.adoc
1 Regions
2 =======
3
4 In region-based memory management, the heap is divided into a
5 collection of regions into which objects are allocated. At compile
6 time, either in the source program or through automatic inference,
7 allocation points are annotated with the region in which the
8 allocation will occur. Typically, although not always, the regions
9 are allocated and deallocated according to a stack discipline.
10
11 MLton does not use region-based memory management; it uses traditional
12 <:GarbageCollection:>. We have considered integrating regions with
13 MLton, but in our opinion it is far from clear that regions would
14 provide MLton with improved performance, while they would certainly
15 add a lot of complexity to the compiler and complicate reasoning about
16 and achieving <:SpaceSafety:>. Region-based memory management and
17 garbage collection have different strengths and weaknesses; it's
18 pretty easy to come up with programs that do significantly better
19 under regions than under GC, and vice versa. We believe that it is
20 the case that common SML idioms tend to work better under GC than
21 under regions.
22
23 One common argument for regions is that the region operations can all
24 be done in (approximately) constant time; therefore, you eliminate GC
25 pause times, leading to a real-time GC. However, because of space
26 safety concerns (see below), we believe that region-based memory
27 management for SML must also include a traditional garbage collector.
28 Hence, to achieve real-time memory management for MLton/SML, we
29 believe that it would be both easier and more efficient to implement a
30 traditional real-time garbage collector than it would be to implement
31 a region system.
32
33 == Regions, the ML Kit, and space safety ==
34
35 The <:MLKit:ML Kit> pioneered the use of regions for compiling
36 Standard ML. The ML Kit maintains a stack of regions at run time. At
37 compile time, it uses region inference to decide when data can be
38 allocated in a stack-like manner, assigning it to an appropriate
39 region. The ML Kit has put a lot of effort into improving the
40 supporting analyses and representations of regions, which are all
41 necessary to improve the performance.
42
43 Unfortunately, under a pure stack-based region system, space leaks are
44 inevitable in theory, and costly in practice. Data for which region
45 inference can not determine the lifetime is moved into the "global
46 region" whose lifetime is the entire program. There are two ways in
47 which region inference will place an object to the global region.
48
49 * When the inference is too conservative, that is, when the data is
50 used in a stack-like manner but the region inference can't figure it
51 out.
52
53 * When data is not used in a stack-like manner. In this case,
54 correctness requires region inference to place the object
55
56 This global region is a source of space leaks. No matter what region
57 system you use, there are some programs such that the global region
58 must exist, and its size will grow to an unbounded multiple of the
59 live data size. For these programs one must have a GC to achieve
60 space safety.
61
62 To solve this problem, the ML Kit has undergone work to combine
63 garbage collection with region-based memory management.
64 <!Cite(HallenbergEtAl02)> and <!Cite(Elsman03)> describe the addition
65 of a garbage collector to the ML Kit's region-based system. These
66 papers provide convincing evidence for space leaks in the global
67 region. They show a number of benchmarks where the memory usage of
68 the program running with just regions is a large multiple (2, 10, 50,
69 even 150) of the program running with regions plus GC.
70
71 These papers also give some numbers to show the ML Kit with just
72 regions does better than either a system with just GC or a combined
73 system. Unfortunately, a pure region system isn't practical because
74 of the lack of space safety. And the other performance numbers are
75 not so convincing, because they compare to an old version of SML/NJ
76 and not at all with MLton. It would be interesting to see a
77 comparison with a more serious collector.
78
79 == Regions, Garbage Collection, and Cyclone ==
80
81 One possibility is to take Cyclone's approach, and provide both
82 region-based memory management and garbage collection, but at the
83 programmer's option (<!Cite(GrossmanEtAl02)>, <!Cite(HicksEtAl03)>).
84
85 One might ask whether we might do the same thing -- i.e., provide a
86 `MLton.Regions` structure with explicit region based memory
87 management operations, so that the programmer could use them when
88 appropriate. <:MatthewFluet:> has thought about this question
89
90 * http://www.cs.cornell.edu/People/fluet/rgn-monad/index.html
91
92 Unfortunately, his conclusion is that the SML type system is too weak
93 to support this option, although there might be a "poor-man's" version
94 with dynamic checks.