Commit | Line | Data |
---|---|---|
7f918cf1 CE |
1 | ShareZeroVec |
2 | ============ | |
3 | ||
4 | <:ShareZeroVec:> is an optimization pass for the <:SSA:> | |
5 | <:IntermediateLanguage:>, invoked from <:SSASimplify:>. | |
6 | ||
7 | == Description == | |
8 | ||
9 | An SSA optimization to share zero-length vectors. | |
10 | ||
11 | From <!ViewGitCommit(mlton,be8c5f576)>, which replaced the use of the | |
12 | `Array_array0Const` primitive in the Basis Library implementation with a | |
13 | (nullary) `Vector_vector` primitive: | |
14 | ||
15 | ________ | |
16 | ||
17 | The original motivation for the `Array_array0Const` primitive was to share the | |
18 | heap space required for zero-length vectors among all vectors (of a given type). | |
19 | It was claimed that this optimization is important, e.g., in a self-compile, | |
20 | where vectors are used for lots of syntax tree elements and many of those | |
21 | vectors are empty. See: | |
22 | http://www.mlton.org/pipermail/mlton-devel/2002-February/021523.html | |
23 | ||
24 | Curiously, the full effect of this optimization has been missing for quite some | |
25 | time (perhaps since the port of <:ConstantPropagation:> to the SSA IL). While | |
26 | <:ConstantPropagation:> has "globalized" the nullary application of the | |
27 | `Array_array0Const` primitive, it also simultaneously transformed it to an | |
28 | application of the `Array_uninit` (previously, the `Array_array`) primitive to | |
29 | the zero constant. The hash-consing of globals, meant to create exactly one | |
30 | global for each distinct constant, treats `Array_uninit` primitives as unequal | |
31 | (appropriately, since `Array_uninit` allocates an array with identity (though | |
32 | the identity may be supressed by a subsequent `Array_toVector`)), hence each | |
33 | distinct `Array_array0Const` primitive in the program remained as distinct | |
34 | globals. The limited amount of inlining prior to <:ConstantPropagation:> meant | |
35 | that there were typically fewer than a dozen "copies" of the same empty vector | |
36 | in a program for a given type. | |
37 | ||
38 | As a "functional" primitive, a nullary `Vector_vector` is globalized by | |
39 | ClosureConvert, but is further recognized by ConstantPropagation and hash-consed | |
40 | into a unique instance for each type. | |
41 | ________ | |
42 | ||
43 | However, a single, shared, global `Vector_vector ()` inhibits the | |
44 | coercion-based optimizations of `Useless`. For example, consider the | |
45 | following program: | |
46 | ||
47 | [source,sml] | |
48 | ---- | |
49 | val n = valOf (Int.fromString (hd (CommandLine.arguments ()))) | |
50 | ||
51 | val v1 = Vector.tabulate (n, fn i => | |
52 | let val w = Word16.fromInt i | |
53 | in (w - 0wx1, w, w + 0wx1 + w) | |
54 | end) | |
55 | val v2 = Vector.map (fn (w1, w2, w3) => (w1, 0wx2 * w2, 0wx3 * w3)) v1 | |
56 | val v3 = VectorSlice.vector (VectorSlice.slice (v1, 1, SOME (n - 2))) | |
57 | val ans1 = Vector.foldl (fn ((w1,w2,w3),w) => w + w1 + w2 + w3) 0wx0 v1 | |
58 | val ans2 = Vector.foldl (fn ((_,w2,_),w) => w + w2) 0wx0 v2 | |
59 | val ans3 = Vector.foldl (fn ((_,w2,_),w) => w + w2) 0wx0 v3 | |
60 | ||
61 | val _ = print (concat ["ans1 = ", Word16.toString ans1, " ", | |
62 | "ans2 = ", Word16.toString ans2, " ", | |
63 | "ans3 = ", Word16.toString ans3, "\n"]) | |
64 | ---- | |
65 | ||
66 | We would like `v2` and `v3` to be optimized from | |
67 | `(word16 * word16 * word16) vector` to `word16 vector` because only | |
68 | the 2nd component of the elements is needed to compute the answer. | |
69 | ||
70 | With `Array_array0Const`, each distinct occurrence of | |
71 | `Array_array0Const((word16 * word16 * word16))` arising from | |
72 | polyvariance and inlining remained a distinct | |
73 | `Array_uninit((word16 * word16 * word16)) (0x0)` global, which | |
74 | resulted in distinct occurrences for the | |
75 | `val v1 = Vector.tabulate ...` and for the | |
76 | `val v2 = Vector.map ...`. The latter could be optimized to | |
77 | `Array_uninit(word16) (0x0)` by `Useless`, because its result only | |
78 | flows to places requiring the 2nd component of the elements. | |
79 | ||
80 | With `Vector_vector ()`, the distinct occurrences of | |
81 | `Vector_vector((word16 * word16 * word16)) ()` arising from | |
82 | polyvariance are globalized during `ClosureConvert`, those global | |
83 | references may be further duplicated by inlining, but the distinct | |
84 | occurrences of `Vector_vector((word16 * word16 * word16)) ()` are | |
85 | merged to a single occurrence. Because this result flows to places | |
86 | requiring all three components of the elements, it remains | |
87 | `Vector_vector((word16 * word16 * word16)) ()` after | |
88 | `Useless`. Furthermore, because one cannot (in constant time) coerce a | |
89 | `(word16 * word16 * word16) vector` to a `word16 vector`, the `v2` | |
90 | value remains of type `(word16 * word16 * word16) vector`. | |
91 | ||
92 | One option would be to drop the 0-element vector "optimization" | |
93 | entirely. This costs some space (no sharing of empty vectors) and | |
94 | some time (allocation and garbage collection of empty vectors). | |
95 | ||
96 | Another option would be to reinstate the `Array_array0Const` primitive | |
97 | and associated `ConstantPropagation` treatment. But, the semantics | |
98 | and purpose of `Array_array0Const` was poorly understood, resulting in | |
99 | this break. | |
100 | ||
101 | The <:ShareZeroVec:> pass pursues a different approach: perform the 0-element | |
102 | vector "optimization" as a separate optimization, after | |
103 | `ConstantPropagation` and `Useless`. A trivial static analysis is | |
104 | used to match `val v: t vector = Array_toVector(t) (a)` with | |
105 | corresponding `val a: array = Array_uninit(t) (l)` and the later are | |
106 | expanded to | |
107 | `val a: t array = if 0 = l then zeroArr_[t] else Array_uninit(t) (l)` | |
108 | with a single global `val zeroArr_[t] = Array_uninit(t) (0)` created | |
109 | for each distinct type (after coercion-based optimizations). | |
110 | ||
111 | One disadvantage of this approach, compared to the `Vector_vector(t) ()` | |
112 | approach, is that `Array_toVector` is applied each time a vector | |
113 | is created, even if it is being applied to the `zeroArr_[t]` | |
114 | zero-length array. (Although, this was the behavior of the | |
115 | `Array_array0Const` approach.) This updates the object header each | |
116 | time, whereas the `Vector_vector(t) ()` approach would have updated | |
117 | the object header once, when the global was created, and the | |
118 | `zeroVec_[t]` global and the `Array_toVector` result would flow to the | |
119 | join point. | |
120 | ||
121 | It would be possible to properly share zero-length vectors, but doing | |
122 | so is a more sophisticated analysis and transformation, because there | |
123 | can be arbitrary code between the | |
124 | `val a: t array = Array_uninit(t) (l)` and the corresponding | |
125 | `val v: v vector = Array_toVector(t) (a)`, although, in practice, | |
126 | nothing happens when a zero-length vector is created. It may be best | |
127 | to pursue a more general "array to vector" optimization that | |
128 | transforms creations of static-length vectors (e.g., all the | |
129 | `Vector.new<N>` functions) into `Vector_vector` primitives (some of | |
130 | which could be globalized). | |
131 | ||
132 | == Implementation == | |
133 | ||
134 | * <!ViewGitFile(mlton,master,mlton/ssa/share-zero-vec.fun)> | |
135 | ||
136 | == Details and Notes == | |
137 | ||
138 | {empty} |