Commit | Line | Data |
---|---|---|
7f918cf1 CE |
1 | Regions |
2 | ======= | |
3 | ||
4 | In region-based memory management, the heap is divided into a | |
5 | collection of regions into which objects are allocated. At compile | |
6 | time, either in the source program or through automatic inference, | |
7 | allocation points are annotated with the region in which the | |
8 | allocation will occur. Typically, although not always, the regions | |
9 | are allocated and deallocated according to a stack discipline. | |
10 | ||
11 | MLton does not use region-based memory management; it uses traditional | |
12 | <:GarbageCollection:>. We have considered integrating regions with | |
13 | MLton, but in our opinion it is far from clear that regions would | |
14 | provide MLton with improved performance, while they would certainly | |
15 | add a lot of complexity to the compiler and complicate reasoning about | |
16 | and achieving <:SpaceSafety:>. Region-based memory management and | |
17 | garbage collection have different strengths and weaknesses; it's | |
18 | pretty easy to come up with programs that do significantly better | |
19 | under regions than under GC, and vice versa. We believe that it is | |
20 | the case that common SML idioms tend to work better under GC than | |
21 | under regions. | |
22 | ||
23 | One common argument for regions is that the region operations can all | |
24 | be done in (approximately) constant time; therefore, you eliminate GC | |
25 | pause times, leading to a real-time GC. However, because of space | |
26 | safety concerns (see below), we believe that region-based memory | |
27 | management for SML must also include a traditional garbage collector. | |
28 | Hence, to achieve real-time memory management for MLton/SML, we | |
29 | believe that it would be both easier and more efficient to implement a | |
30 | traditional real-time garbage collector than it would be to implement | |
31 | a region system. | |
32 | ||
33 | == Regions, the ML Kit, and space safety == | |
34 | ||
35 | The <:MLKit:ML Kit> pioneered the use of regions for compiling | |
36 | Standard ML. The ML Kit maintains a stack of regions at run time. At | |
37 | compile time, it uses region inference to decide when data can be | |
38 | allocated in a stack-like manner, assigning it to an appropriate | |
39 | region. The ML Kit has put a lot of effort into improving the | |
40 | supporting analyses and representations of regions, which are all | |
41 | necessary to improve the performance. | |
42 | ||
43 | Unfortunately, under a pure stack-based region system, space leaks are | |
44 | inevitable in theory, and costly in practice. Data for which region | |
45 | inference can not determine the lifetime is moved into the "global | |
46 | region" whose lifetime is the entire program. There are two ways in | |
47 | which region inference will place an object to the global region. | |
48 | ||
49 | * When the inference is too conservative, that is, when the data is | |
50 | used in a stack-like manner but the region inference can't figure it | |
51 | out. | |
52 | ||
53 | * When data is not used in a stack-like manner. In this case, | |
54 | correctness requires region inference to place the object | |
55 | ||
56 | This global region is a source of space leaks. No matter what region | |
57 | system you use, there are some programs such that the global region | |
58 | must exist, and its size will grow to an unbounded multiple of the | |
59 | live data size. For these programs one must have a GC to achieve | |
60 | space safety. | |
61 | ||
62 | To solve this problem, the ML Kit has undergone work to combine | |
63 | garbage collection with region-based memory management. | |
64 | <!Cite(HallenbergEtAl02)> and <!Cite(Elsman03)> describe the addition | |
65 | of a garbage collector to the ML Kit's region-based system. These | |
66 | papers provide convincing evidence for space leaks in the global | |
67 | region. They show a number of benchmarks where the memory usage of | |
68 | the program running with just regions is a large multiple (2, 10, 50, | |
69 | even 150) of the program running with regions plus GC. | |
70 | ||
71 | These papers also give some numbers to show the ML Kit with just | |
72 | regions does better than either a system with just GC or a combined | |
73 | system. Unfortunately, a pure region system isn't practical because | |
74 | of the lack of space safety. And the other performance numbers are | |
75 | not so convincing, because they compare to an old version of SML/NJ | |
76 | and not at all with MLton. It would be interesting to see a | |
77 | comparison with a more serious collector. | |
78 | ||
79 | == Regions, Garbage Collection, and Cyclone == | |
80 | ||
81 | One possibility is to take Cyclone's approach, and provide both | |
82 | region-based memory management and garbage collection, but at the | |
83 | programmer's option (<!Cite(GrossmanEtAl02)>, <!Cite(HicksEtAl03)>). | |
84 | ||
85 | One might ask whether we might do the same thing -- i.e., provide a | |
86 | `MLton.Regions` structure with explicit region based memory | |
87 | management operations, so that the programmer could use them when | |
88 | appropriate. <:MatthewFluet:> has thought about this question | |
89 | ||
90 | * http://www.cs.cornell.edu/People/fluet/rgn-monad/index.html | |
91 | ||
92 | Unfortunately, his conclusion is that the SML type system is too weak | |
93 | to support this option, although there might be a "poor-man's" version | |
94 | with dynamic checks. |