CommonArg
=========

<:CommonArg:> is an optimization pass for the <:SSA:>
<:IntermediateLanguage:>, invoked from <:SSASimplify:>.

== Description ==

It optimizes instances of `Goto` transfers that pass the same
arguments to the same label; e.g.
----
L_1 ()
  ...
  z1 = ?
  ...
  L_3 (x, y, z1)
L_2 ()
  ...
  z2 = ?
  ...
  L_3 (x, y, z2)
L_3 (a, b, c)
  ...
----

This code can be simplified to:
----
L_1 ()
  ...
  z1 = ?
  ...
  L_3 (z1)
L_2 ()
  ...
  z2 = ?
  ...
  L_3 (z2)
L_3 (c)
  a = x
  b = y
----
which saves a number of resources: the time of setting up the
arguments for the jump to `L_3`, the space (either stack or
pseudo-registers) for the arguments of `L_3`, etc.  It may also expose
some other optimizations, if more information is known about `x` or
`y`.

== Implementation ==

* <!ViewGitFile(mlton,master,mlton/ssa/common-arg.fun)>

== Details and Notes ==

Three analyses were originally proposed to drive the optimization
transformation.  Only the _Dominator Analysis_ is currently
implemented.  (Implementations of the other analyses are available in
the <:Sources:repository history>.)

=== Syntactic Analysis ===

The simplest analysis I could think of maintains
----
varInfo: Var.t -> Var.t option list ref
----
initialized to `[]`.

* For each variable `v` bound in a `Statement.t` or in the
`Function.t` args, then `List.push(varInfo v, NONE)`.
* For each `L (x1, ..., xn)` transfer where `(a1, ..., an)` are the
formals of `L`, then `List.push(varInfo ai, SOME xi)`.
* For each block argument `a` used in an unknown context (e.g.,
arguments of blocks used as continuations, handlers, arith success,
runtime return, or case switch labels), then
`List.push(varInfo a, NONE)`.

Now, any block argument `a` such that `varInfo a = xs`, where all of
the elements of `xs` are equal to `SOME x`, can be optimized by
setting `a = x` at the beginning of the block and dropping the
argument from `Goto` transfers.
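
The bookkeeping above can be sketched in Python (an illustrative
model, not MLton's SML implementation; the representation -- lists of
bound variables, `(formals, actuals)` pairs per `Goto` transfer, and
unknown-context block arguments -- is a hypothetical stand-in for the
SSA program).  The check also admits the self-reference `SOME a`,
anticipating the refinement described next.

```python
# Illustrative sketch, not MLton's implementation.  A varInfo entry of
# None plays the role of NONE; a variable name plays the role of SOME x.

def syntactic_common_args(bound_vars, gotos, unknown_args):
    var_info = {}  # Var -> Var.t option list (as a Python list)

    def push(v, entry):
        var_info.setdefault(v, []).append(entry)

    for v in bound_vars:            # vars bound by statements / fn args
        push(v, None)
    for formals, actuals in gotos:  # L (x1, ..., xn), formals (a1, ..., an)
        for ai, xi in zip(formals, actuals):
            push(ai, xi)
    for a in unknown_args:          # block args used in unknown contexts
        push(a, None)

    # A block argument a can be replaced by x when every entry is
    # SOME x (allowing the self-reference SOME a as well).
    replacements = {}
    for a, entries in var_info.items():
        others = {x for x in entries if x is not None and x != a}
        if None not in entries and len(others) == 1:
            replacements[a] = others.pop()
    return replacements

# The example above: L_1 and L_2 both jump to L_3 (a, b, c).
gotos = [(("a", "b", "c"), ("x", "y", "z1")),
         (("a", "b", "c"), ("x", "y", "z2"))]
print(syntactic_common_args(["x", "y", "z1", "z2"], gotos, []))
# → {'a': 'x', 'b': 'y'}
```

Here `a` and `b` are replaceable (every `Goto` passes `x` and `y` for
them), while `c` receives the distinct actuals `z1` and `z2` and is
kept.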

That takes care of the example above.  We can clearly do slightly
better, by changing the transformation criterion to the following: any
block argument `a` such that `varInfo a = xs`, where all of the
elements of `xs` are equal to `SOME x` _or_ are equal to `SOME a`, can
be optimized by setting `a = x` at the beginning of the block and
dropping the argument from `Goto` transfers.  This optimizes a case
like:
----
L_1 ()
  ... z1 = ? ...
  L_3 (x, y, z1)
L_2 ()
  ... z2 = ? ...
  L_3 (x, y, z2)
L_3 (a, b, c)
  ... w = ? ...
  case w of
    true => L_4 | false => L_5
L_4 ()
  ...
  L_3 (a, b, w)
L_5 ()
  ...
----
where a common argument is passed to a loop (and is invariant through
the loop).  Of course, the <:LoopInvariant:> optimization pass would
normally introduce a local loop and essentially reduce this to the
first example, but I have seen this in practice, which suggests that
some optimizations after <:LoopInvariant:> do enough simplification
to introduce (new) loop-invariant arguments.

=== Fixpoint Analysis ===

However, the above analysis and transformation doesn't cover the cases
where eliminating one common argument exposes the opportunity to
eliminate other common arguments.  For example:
----
L_1 ()
  ...
  L_3 (x)
L_2 ()
  ...
  L_3 (x)
L_3 (a)
  ...
  L_5 (a)
L_4 ()
  ...
  L_5 (x)
L_5 (b)
  ...
----

One pass of analysis and transformation would eliminate the argument
to `L_3` and rewrite the `L_5 (a)` transfer to `L_5 (x)`, thereby
exposing the opportunity to eliminate the common argument to `L_5`.

The interdependency between the arguments to `L_3` and `L_5` suggests
performing some sort of fixed-point analysis.  This analysis is
relatively simple; maintain
----
varInfo: Var.t -> VarLattice.t
----
{empty}where
----
VarLattice.t ~=~ Bot | Point of Var.t | Top
----
(but is implemented by the <:FlatLattice:> functor with a `lessThan`
list and `value ref` under the hood), initialized to `Bot`.

* For each variable `v` bound in a `Statement.t` or in the
`Function.t` args, then `VarLattice.<= (Point v, varInfo v)`.
* For each `L (x1, ..., xn)` transfer where `(a1, ..., an)` are the
formals of `L`, then `VarLattice.<= (varInfo xi, varInfo ai)`.
* For each block argument `a` used in an unknown context, then
`VarLattice.<= (Point a, varInfo a)`.

Now, any block argument `a` such that `varInfo a = Point x` can be
optimized by setting `a = x` at the beginning of the block and
dropping the argument from `Goto` transfers.
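
A minimal Python model of this fixpoint may help (an illustrative
sketch under the same hypothetical program representation as before:
the real pass uses the <:FlatLattice:> functor, while here the `<=`
constraints are simply re-run until nothing changes):

```python
# Illustrative sketch of the fixpoint analysis over the flat lattice
# Bot | Point of Var.t | Top.  Points are modeled as ("Point", v).

BOT, TOP = "Bot", "Top"

def join(old, new):
    """Least upper bound in the flat lattice."""
    if new == BOT or old == new:
        return old
    if old == BOT:
        return new
    return TOP  # two distinct points, or anything joined with Top

def fixpoint_common_args(bound_vars, gotos, unknown_args):
    info = {}  # Var -> lattice value; absent means Bot

    def get(v):
        return info.get(v, BOT)

    # Point v <= varInfo v for bound vars; Point a <= varInfo a for
    # block args used in unknown contexts.
    seeds = [(v, ("Point", v)) for v in bound_vars]
    seeds += [(a, ("Point", a)) for a in unknown_args]
    changed = True
    while changed:
        changed = False
        for v, val in seeds:
            if join(get(v), val) != get(v):
                info[v] = join(get(v), val)
                changed = True
        for formals, actuals in gotos:  # varInfo xi <= varInfo ai
            for ai, xi in zip(formals, actuals):
                if join(get(ai), get(xi)) != get(ai):
                    info[ai] = join(get(ai), get(xi))
                    changed = True

    # varInfo a = Point x (with x <> a) means a can be set to x.
    return {a: val[1] for a, val in info.items()
            if isinstance(val, tuple) and val[1] != a}
```

On the chained example this finds `a = x` and `b = x` in a single
analysis; on the example below, where `L_0 (x)` receives both `n` and
`m`, `varInfo x` collapses to `Top` and the sketch eliminates nothing.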

Now, with the last example, we introduce the ordering constraints:
----
varInfo x <= varInfo a
varInfo a <= varInfo b
varInfo x <= varInfo b
----

Assuming that `varInfo x = Point x`, then we get `varInfo a = Point x`
and `varInfo b = Point x`, and we optimize the example as desired.

But that is a rather weak assumption.  It's quite possible for
`varInfo x = Top`.  For example, consider:
----
G_1 ()
  ... n = 1 ...
  L_0 (n)
G_2 ()
  ... m = 2 ...
  L_0 (m)
L_0 (x)
  ...
L_1 ()
  ...
  L_3 (x)
L_2 ()
  ...
  L_3 (x)
L_3 (a)
  ...
  L_5 (a)
L_4 ()
  ...
  L_5 (x)
L_5 (b)
  ...
----

Now `varInfo x = varInfo a = varInfo b = Top`.  What went wrong here?
When `varInfo x` went to `Top`, it got propagated all the way through
to `a` and `b`, and prevented the elimination of any common arguments.
What we'd like to do instead is, when `varInfo x` goes to `Top`, to
propagate `Point x` -- we have no hope of eliminating `x`, but if we
hold `x` constant, then we have a chance of eliminating arguments for
which `x` is passed as an actual.

=== Dominator Analysis ===

Does anyone see where this is going yet?  Pausing for a little
thought, <:MatthewFluet:> realized that he had once before tried
proposing this kind of "fix" to a fixed-point analysis -- when we were
first investigating the <:Contify:> optimization in light of John
Reppy's CWS paper.  Of course, that "fix" failed because it defined a
non-monotonic function and one couldn't take the fixed point.  But
<:StephenWeeks:> suggested a dominator-based approach, and we were
able to show that, indeed, the dominator analysis subsumed both the
previous call-based analysis and the cont-based analysis.  And a
moment's reflection reveals further parallels: when
`varInfo: Var.t -> Var.t option list ref`, we have something analogous
to the call analysis, and when `varInfo: Var.t -> VarLattice.t`, we
have something analogous to the cont analysis.  Maybe there is
something analogous to the dominator approach (and therefore superior
to the previous analyses).

And this turns out to be the case.  Construct the graph `G` as follows:
----
nodes(G) = {Root} U Var.t
edges(G) = {Root -> v | v bound in a Statement.t or
                        in the Function.t args} U
           {xi -> ai | L (x1, ..., xn) transfer where (a1, ..., an)
                       are the formals of L} U
           {Root -> a | a is a block argument used in an unknown context}
----

Let `idom(x)` be the immediate dominator of `x` in `G` with root
`Root`.  Now, any block argument `a` such that `idom(a) = x <> Root`
can be optimized by setting `a = x` at the beginning of the block and
dropping the argument from `Goto` transfers.
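
This construction can also be sketched in Python (same hypothetical
representation as before; for clarity the naive iterative
dominator-set algorithm is used here, where a production compiler
would use a fast immediate-dominator algorithm):

```python
# Illustrative sketch: build G as defined above, compute dominator
# sets iteratively, and report block arguments whose immediate
# dominator is not Root.

def dominator_common_args(bound_vars, gotos, unknown_args):
    edges = [("Root", v) for v in bound_vars]
    edges += [("Root", a) for a in unknown_args]
    block_args = set()
    for formals, actuals in gotos:  # {xi -> ai} per L (x1, ..., xn)
        edges += zip(actuals, formals)
        block_args.update(formals)

    nodes = {"Root"} | {n for e in edges for n in e}
    preds = {n: set() for n in nodes}
    for s, t in edges:
        preds[t].add(s)

    # Dom(n) = {n} U intersection of Dom(p) over predecessors p.
    dom = {n: set(nodes) for n in nodes}
    dom["Root"] = {"Root"}
    changed = True
    while changed:
        changed = False
        for n in nodes - {"Root"}:
            if not preds[n]:
                continue  # unreachable from Root
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True

    def idom(n):
        # The immediate dominator is the strict dominator dominated by
        # every other strict dominator: the one with the largest Dom set.
        strict = dom[n] - {n}
        return max(strict, key=lambda d: len(dom[d])) if strict else None

    return {a: idom(a) for a in block_args
            if idom(a) not in (None, "Root")}
```

On the example where the fixpoint analysis collapsed to `Top`, `G` has
edges `Root -> n`, `Root -> m`, `n -> x`, `m -> x`, `x -> a`,
`x -> b`, and `a -> b`, so `idom(a) = idom(b) = x` and both common
arguments are still eliminated.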

Furthermore, experimental evidence suggests (and we are confident that
a formal presentation could prove) that the dominator analysis
subsumes the "syntactic" and "fixpoint" based analyses in this context
as well, and that the dominator analysis gets "everything" in one go.

=== Final Thoughts ===

I must admit, I was rather surprised at this progression and final
result.  At the outset, I never would have thought of a connection
between the <:Contify:> and <:CommonArg:> optimizations.  They would
seem to be two completely different optimizations, although this may
not really be the case.  As one of the reviewers of the ICFP paper
said:
____
I understand that such a form of CPS might be convenient in some
cases, but when we're talking about analyzing code to detect that some
continuation is constant, I think it makes a lot more sense to make
all the continuation arguments completely explicit.

I believe that making all the continuation arguments explicit will
show that the optimization can be generalized to eliminating constant
arguments, whether continuations or not.
____

What I think the common argument optimization shows is that the
dominator analysis does slightly better than the reviewer puts it: we
find more than just constant continuations; we find common
continuations.  And I think this is further justified by the fact that
I have observed the common argument optimization eliminate some
`env_X` arguments, which would appear to correspond to determining
that, while the closure being executed isn't constant, it is at least
the same as the closure being passed elsewhere.

At first, I was curious whether or not we had missed a bigger picture
with the dominator analysis.  When we wrote the contification paper, I
assumed that the dominator analysis was a specialized solution to a
specialized problem; we never suggested that it was a technique suited
to a larger class of analyses.  After initially finding a connection
between <:Contify:> and <:CommonArg:> (and thinking that the only
connection was the technique), I wondered if the dominator technique
really was applicable to a larger class of analyses.  That is still a
question, but after writing up the above, I suspect that the "real
story" is that the dominator analysis is a solution to the common
argument optimization, and that the <:Contify:> optimization is
specializing <:CommonArg:> to the case of continuation arguments (with
a different transformation at the end).  (Note: a whole-program,
inter-procedural common argument analysis doesn't really make sense in
our <:SSA:> <:IntermediateLanguage:>, because the only way of passing
values between functions is as arguments -- unless, of course, the
common argument is also a constant argument, in which case
<:ConstantPropagation:> could lift it to a global.  The
inter-procedural <:Contify:> optimization works out because there we
move the function to the argument.)

Anyways, it's still unclear to me whether or not the dominator-based
approach solves other kinds of problems.

=== Phase Ordering ===

On the downside, the optimization doesn't have a huge impact on
runtime, although it does predictably save some code size.  I stuck it
in the optimization sequence after <:Flatten:> and (the third round
of) <:LocalFlatten:>, since it seems to me that we could have cases
where some components of a tuple used as an argument are common, but
the whole tuple isn't.  I think it makes sense to add it after
<:IntroduceLoops:> and <:LoopInvariant:> (even though <:CommonArg:>
gets some things that <:LoopInvariant:> gets, it doesn't get all of
them).  I also think that it makes sense to add it before
<:CommonSubexp:>, since identifying variables could expose more common
subexpressions.  A similar thought would seem to apply to
<:RedundantTests:>.