Import Upstream version 20180207
[hcoop/debian/mlton.git] / doc / guide / src / ShareZeroVec.adoc
CommitLineData
7f918cf1
CE
1ShareZeroVec
2============
3
4<:ShareZeroVec:> is an optimization pass for the <:SSA:>
5<:IntermediateLanguage:>, invoked from <:SSASimplify:>.
6
7== Description ==
8
9An SSA optimization to share zero-length vectors.
10
11From <!ViewGitCommit(mlton,be8c5f576)>, which replaced the use of the
12`Array_array0Const` primitive in the Basis Library implementation with a
13(nullary) `Vector_vector` primitive:
14
15________
16
17The original motivation for the `Array_array0Const` primitive was to share the
18heap space required for zero-length vectors among all vectors (of a given type).
19It was claimed that this optimization is important, e.g., in a self-compile,
20where vectors are used for lots of syntax tree elements and many of those
21vectors are empty. See:
22http://www.mlton.org/pipermail/mlton-devel/2002-February/021523.html
23
24Curiously, the full effect of this optimization has been missing for quite some
25time (perhaps since the port of <:ConstantPropagation:> to the SSA IL). While
26<:ConstantPropagation:> has "globalized" the nullary application of the
27`Array_array0Const` primitive, it also simultaneously transformed it to an
28application of the `Array_uninit` (previously, the `Array_array`) primitive to
29the zero constant. The hash-consing of globals, meant to create exactly one
30global for each distinct constant, treats `Array_uninit` primitives as unequal
31(appropriately, since `Array_uninit` allocates an array with identity (though
32the identity may be supressed by a subsequent `Array_toVector`)), hence each
33distinct `Array_array0Const` primitive in the program remained as distinct
34globals. The limited amount of inlining prior to <:ConstantPropagation:> meant
35that there were typically fewer than a dozen "copies" of the same empty vector
36in a program for a given type.
37
38As a "functional" primitive, a nullary `Vector_vector` is globalized by
39ClosureConvert, but is further recognized by ConstantPropagation and hash-consed
40into a unique instance for each type.
41________
42
43However, a single, shared, global `Vector_vector ()` inhibits the
44coercion-based optimizations of `Useless`. For example, consider the
45following program:
46
47[source,sml]
48----
49 val n = valOf (Int.fromString (hd (CommandLine.arguments ())))
50
51 val v1 = Vector.tabulate (n, fn i =>
52 let val w = Word16.fromInt i
53 in (w - 0wx1, w, w + 0wx1 + w)
54 end)
55 val v2 = Vector.map (fn (w1, w2, w3) => (w1, 0wx2 * w2, 0wx3 * w3)) v1
56 val v3 = VectorSlice.vector (VectorSlice.slice (v1, 1, SOME (n - 2)))
57 val ans1 = Vector.foldl (fn ((w1,w2,w3),w) => w + w1 + w2 + w3) 0wx0 v1
58 val ans2 = Vector.foldl (fn ((_,w2,_),w) => w + w2) 0wx0 v2
59 val ans3 = Vector.foldl (fn ((_,w2,_),w) => w + w2) 0wx0 v3
60
61 val _ = print (concat ["ans1 = ", Word16.toString ans1, " ",
62 "ans2 = ", Word16.toString ans2, " ",
63 "ans3 = ", Word16.toString ans3, "\n"])
64----
65
66We would like `v2` and `v3` to be optimized from
67`(word16 * word16 * word16) vector` to `word16 vector` because only
68the 2nd component of the elements is needed to compute the answer.
69
70With `Array_array0Const`, each distinct occurrence of
71`Array_array0Const((word16 * word16 * word16))` arising from
72polyvariance and inlining remained a distinct
73`Array_uninit((word16 * word16 * word16)) (0x0)` global, which
74resulted in distinct occurrences for the
75`val v1 = Vector.tabulate ...` and for the
76`val v2 = Vector.map ...`. The latter could be optimized to
77`Array_uninit(word16) (0x0)` by `Useless`, because its result only
78flows to places requiring the 2nd component of the elements.
79
80With `Vector_vector ()`, the distinct occurrences of
81`Vector_vector((word16 * word16 * word16)) ()` arising from
82polyvariance are globalized during `ClosureConvert`, those global
83references may be further duplicated by inlining, but the distinct
84occurrences of `Vector_vector((word16 * word16 * word16)) ()` are
85merged to a single occurrence. Because this result flows to places
86requiring all three components of the elements, it remains
87`Vector_vector((word16 * word16 * word16)) ()` after
88`Useless`. Furthermore, because one cannot (in constant time) coerce a
89`(word16 * word16 * word16) vector` to a `word16 vector`, the `v2`
90value remains of type `(word16 * word16 * word16) vector`.
91
92One option would be to drop the 0-element vector "optimization"
93entirely. This costs some space (no sharing of empty vectors) and
94some time (allocation and garbage collection of empty vectors).
95
96Another option would be to reinstate the `Array_array0Const` primitive
97and associated `ConstantPropagation` treatment. But, the semantics
98and purpose of `Array_array0Const` was poorly understood, resulting in
99this break.
100
101The <:ShareZeroVec:> pass pursues a different approach: perform the 0-element
102vector "optimization" as a separate optimization, after
103`ConstantPropagation` and `Useless`. A trivial static analysis is
104used to match `val v: t vector = Array_toVector(t) (a)` with
105corresponding `val a: array = Array_uninit(t) (l)` and the later are
106expanded to
107`val a: t array = if 0 = l then zeroArr_[t] else Array_uninit(t) (l)`
108with a single global `val zeroArr_[t] = Array_uninit(t) (0)` created
109for each distinct type (after coercion-based optimizations).
110
111One disadvantage of this approach, compared to the `Vector_vector(t) ()`
112approach, is that `Array_toVector` is applied each time a vector
113is created, even if it is being applied to the `zeroArr_[t]`
114zero-length array. (Although, this was the behavior of the
115`Array_array0Const` approach.) This updates the object header each
116time, whereas the `Vector_vector(t) ()` approach would have updated
117the object header once, when the global was created, and the
118`zeroVec_[t]` global and the `Array_toVector` result would flow to the
119join point.
120
121It would be possible to properly share zero-length vectors, but doing
122so is a more sophisticated analysis and transformation, because there
123can be arbitrary code between the
124`val a: t array = Array_uninit(t) (l)` and the corresponding
125`val v: v vector = Array_toVector(t) (a)`, although, in practice,
126nothing happens when a zero-length vector is created. It may be best
127to pursue a more general "array to vector" optimization that
128transforms creations of static-length vectors (e.g., all the
129`Vector.new<N>` functions) into `Vector_vector` primitives (some of
130which could be globalized).
131
132== Implementation ==
133
134* <!ViewGitFile(mlton,master,mlton/ssa/share-zero-vec.fun)>
135
136== Details and Notes ==
137
138{empty}