Import Upstream version 20180207
[hcoop/debian/mlton.git] / doc / guide / localhost / CommonArg
1 <!DOCTYPE html>
2 <html lang="en">
3 <head>
4 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
5 <meta name="generator" content="AsciiDoc 8.6.9">
6 <title>CommonArg</title>
7 <link rel="stylesheet" href="./asciidoc.css" type="text/css">
8 <link rel="stylesheet" href="./pygments.css" type="text/css">
9
10
11 <script type="text/javascript" src="./asciidoc.js"></script>
12 <script type="text/javascript">
13 /*<![CDATA[*/
14 asciidoc.install();
15 /*]]>*/
16 </script>
17 <link rel="stylesheet" href="./mlton.css" type="text/css">
18 </head>
19 <body class="article">
20 <div id="banner">
21 <div id="banner-home">
22 <a href="./Home">MLton 20180207</a>
23 </div>
24 </div>
25 <div id="header">
26 <h1>CommonArg</h1>
27 </div>
28 <div id="content">
29 <div id="preamble">
30 <div class="sectionbody">
31 <div class="paragraph"><p><a href="CommonArg">CommonArg</a> is an optimization pass for the <a href="SSA">SSA</a>
32 <a href="IntermediateLanguage">IntermediateLanguage</a>, invoked from <a href="SSASimplify">SSASimplify</a>.</p></div>
33 </div>
34 </div>
35 <div class="sect1">
36 <h2 id="_description">Description</h2>
37 <div class="sectionbody">
38 <div class="paragraph"><p>It optimizes instances of <span class="monospaced">Goto</span> transfers that pass the same
39 arguments to the same label; e.g.</p></div>
40 <div class="listingblock">
41 <div class="content monospaced">
42 <pre>L_1 ()
43 ...
44 z1 = ?
45 ...
46 L_3 (x, y, z1)
47 L_2 ()
48 ...
49 z2 = ?
50 ...
51 L_3 (x, y, z2)
52 L_3 (a, b, c)
53 ...</pre>
54 </div></div>
55 <div class="paragraph"><p>This code can be simplified to:</p></div>
56 <div class="listingblock">
57 <div class="content monospaced">
58 <pre>L_1 ()
59 ...
60 z1 = ?
61 ...
62 L_3 (z1)
63 L_2 ()
64 ...
65 z2 = ?
66 ...
67 L_3 (z2)
68 L_3 (c)
69 a = x
70 b = y</pre>
71 </div></div>
72 <div class="paragraph"><p>which saves a number of resources: time of setting up the arguments
73 for the jump to <span class="monospaced">L_3</span>, space (either stack or pseudo-registers) for
74 the arguments of <span class="monospaced">L_3</span>, etc. It may also expose some other
75 optimizations, if more information is known about <span class="monospaced">x</span> or <span class="monospaced">y</span>.</p></div>
76 </div>
77 </div>
78 <div class="sect1">
79 <h2 id="_implementation">Implementation</h2>
80 <div class="sectionbody">
81 <div class="ulist"><ul>
82 <li>
83 <p>
84 <a href="https://github.com/MLton/mlton/blob/master/mlton/ssa/common-arg.fun"><span class="monospaced">common-arg.fun</span></a>
85 </p>
86 </li>
87 </ul></div>
88 </div>
89 </div>
90 <div class="sect1">
91 <h2 id="_details_and_notes">Details and Notes</h2>
92 <div class="sectionbody">
93 <div class="paragraph"><p>Three analyses were originally proposed to drive the optimization
94 transformation. Only the <em>Dominator Analysis</em> is currently
95 implemented. (Implementations of the other analyses are available in
96 the <a href="Sources">repository history</a>.)</p></div>
97 <div class="sect2">
98 <h3 id="_syntactic_analysis">Syntactic Analysis</h3>
99 <div class="paragraph"><p>The simplest analysis I could think of maintains</p></div>
100 <div class="listingblock">
101 <div class="content monospaced">
102 <pre>varInfo: Var.t -&gt; Var.t option list ref</pre>
103 </div></div>
104 <div class="paragraph"><p>initialized to <span class="monospaced">[]</span>.</p></div>
105 <div class="ulist"><ul>
106 <li>
107 <p>
108 For each variable <span class="monospaced">v</span> bound in a <span class="monospaced">Statement.t</span> or in the
109 <span class="monospaced">Function.t</span> args, then <span class="monospaced">List.push(varInfo v, NONE)</span>.
110 </p>
111 </li>
112 <li>
113 <p>
114 For each <span class="monospaced">L (x1, ..., xn)</span> transfer where <span class="monospaced">(a1, ..., an)</span> are the
115 formals of <span class="monospaced">L</span>, then <span class="monospaced">List.push(varInfo ai, SOME xi)</span>.
116 </p>
117 </li>
118 <li>
119 <p>
120 For each block argument a used in an unknown context (e.g.,
121 arguments of blocks used as continuations, handlers, arith success,
122 runtime return, or case switch labels), then
123 <span class="monospaced">List.push(varInfo a, NONE)</span>.
124 </p>
125 </li>
126 </ul></div>
127 <div class="paragraph"><p>Now, any block argument <span class="monospaced">a</span> such that <span class="monospaced">varInfo a = xs</span>, where all of
128 the elements of <span class="monospaced">xs</span> are equal to <span class="monospaced">SOME x</span>, can be optimized by
129 setting <span class="monospaced">a = x</span> at the beginning of the block and dropping the
130 argument from <span class="monospaced">Goto</span> transfers.</p></div>
131 <div class="paragraph"><p>That takes care of the example above. We can clearly do slightly
132 better, by changing the transformation criteria to the following: any
133 block argument a such that <span class="monospaced">varInfo a = xs</span>, where all of the elements
134 of <span class="monospaced">xs</span> are equal to <span class="monospaced">SOME x</span> <em>or</em> are equal to <span class="monospaced">SOME a</span>, can be
135 optimized by setting <span class="monospaced">a = x</span> at the beginning of the block and
136 dropping the argument from <span class="monospaced">Goto</span> transfers. This optimizes a case
137 like:</p></div>
138 <div class="listingblock">
139 <div class="content monospaced">
140 <pre>L_1 ()
141 ... z1 = ? ...
142 L_3 (x, y, z1)
143 L_2 ()
144 ... z2 = ? ...
145 L_3(x, y, z2)
146 L_3 (a, b, c)
147 ... w = ? ...
148 case w of
149 true =&gt; L_4 | false =&gt; L_5
150 L_4 ()
151 ...
152 L_3 (a, b, w)
153 L_5 ()
154 ...</pre>
155 </div></div>
156 <div class="paragraph"><p>where a common argument is passed to a loop (and is invariant through
157 the loop). Of course, the <a href="LoopInvariant">LoopInvariant</a> optimization pass would
158 normally introduce a local loop and essentially reduce this to the
159 first example, but I have seen this in practice, which suggests that
160 some optimizations after <a href="LoopInvariant">LoopInvariant</a> do enough simplifications
161 to introduce (new) loop invariant arguments.</p></div>
162 </div>
163 <div class="sect2">
164 <h3 id="_fixpoint_analysis">Fixpoint Analysis</h3>
165 <div class="paragraph"><p>However, the above analysis and transformation doesn&#8217;t cover the cases
166 where eliminating one common argument exposes the opportunity to
167 eliminate other common arguments. For example:</p></div>
168 <div class="listingblock">
169 <div class="content monospaced">
170 <pre>L_1 ()
171 ...
172 L_3 (x)
173 L_2 ()
174 ...
175 L_3 (x)
176 L_3 (a)
177 ...
178 L_5 (a)
179 L_4 ()
180 ...
181 L_5 (x)
182 L_5 (b)
183 ...</pre>
184 </div></div>
185 <div class="paragraph"><p>One pass of analysis and transformation would eliminate the argument
186 to <span class="monospaced">L_3</span> and rewrite the <span class="monospaced">L_5(a)</span> transfer to <span class="monospaced">L_5 (x)</span>, thereby
187 exposing the opportunity to eliminate the common argument to <span class="monospaced">L_5</span>.</p></div>
188 <div class="paragraph"><p>The interdependency the arguments to <span class="monospaced">L_3</span> and <span class="monospaced">L_5</span> suggest
189 performing some sort of fixed-point analysis. This analysis is
190 relatively simple; maintain</p></div>
191 <div class="listingblock">
192 <div class="content monospaced">
193 <pre>varInfo: Var.t -&gt; VarLattice.t</pre>
194 </div></div>
195 <div class="paragraph"><p>where</p></div>
196 <div class="listingblock">
197 <div class="content monospaced">
198 <pre>VarLattice.t ~=~ Bot | Point of Var.t | Top</pre>
199 </div></div>
200 <div class="paragraph"><p>(but is implemented by the <a href="FlatLattice">FlatLattice</a> functor with a <span class="monospaced">lessThan</span>
201 list and <span class="monospaced">value ref</span> under the hood), initialized to <span class="monospaced">Bot</span>.</p></div>
202 <div class="ulist"><ul>
203 <li>
204 <p>
205 For each variable <span class="monospaced">v</span> bound in a <span class="monospaced">Statement.t</span> or in the
206 <span class="monospaced">Function.t</span> args, then <span class="monospaced">VarLattice.&lt;= (Point v, varInfo v)</span>
207 </p>
208 </li>
209 <li>
210 <p>
211 For each <span class="monospaced">L (x1, ..., xn)</span> transfer where <span class="monospaced">(a1, ..., an)</span> are the
212 formals of <span class="monospaced">L</span>}, then <span class="monospaced">VarLattice.&lt;= (varInfo xi, varInfo ai)</span>.
213 </p>
214 </li>
215 <li>
216 <p>
217 For each block argument a used in an unknown context, then
218 <span class="monospaced">VarLattice.&lt;= (Point a, varInfo a)</span>.
219 </p>
220 </li>
221 </ul></div>
222 <div class="paragraph"><p>Now, any block argument a such that <span class="monospaced">varInfo a = Point x</span> can be
223 optimized by setting <span class="monospaced">a = x</span> at the beginning of the block and
224 dropping the argument from <span class="monospaced">Goto</span> transfers.</p></div>
225 <div class="paragraph"><p>Now, with the last example, we introduce the ordering constraints:</p></div>
226 <div class="listingblock">
227 <div class="content monospaced">
228 <pre>varInfo x &lt;= varInfo a
229 varInfo a &lt;= varInfo b
230 varInfo x &lt;= varInfo b</pre>
231 </div></div>
232 <div class="paragraph"><p>Assuming that <span class="monospaced">varInfo x = Point x</span>, then we get <span class="monospaced">varInfo a = Point x</span>
233 and <span class="monospaced">varInfo b = Point x</span>, and we optimize the example as desired.</p></div>
234 <div class="paragraph"><p>But, that is a rather weak assumption. It&#8217;s quite possible for
235 <span class="monospaced">varInfo x = Top</span>. For example, consider:</p></div>
236 <div class="listingblock">
237 <div class="content monospaced">
238 <pre>G_1 ()
239 ... n = 1 ...
240 L_0 (n)
241 G_2 ()
242 ... m = 2 ...
243 L_0 (m)
244 L_0 (x)
245 ...
246 L_1 ()
247 ...
248 L_3 (x)
249 L_2 ()
250 ...
251 L_3 (x)
252 L_3 (a)
253 ...
254 L_5(a)
255 L_4 ()
256 ...
257 L_5(x)
258 L_5 (b)
259 ...</pre>
260 </div></div>
261 <div class="paragraph"><p>Now <span class="monospaced">varInfo x = varInfo a = varInfo b = Top</span>. What went wrong here?
262 When <span class="monospaced">varInfo x</span> went to <span class="monospaced">Top</span>, it got propagated all the way through
263 to <span class="monospaced">a</span> and <span class="monospaced">b</span>, and prevented the elimination of any common arguments.
264 What we&#8217;d like to do instead is when <span class="monospaced">varInfo x</span> goes to <span class="monospaced">Top</span>,
265 propagate on <span class="monospaced">Point x</span>&#8201;&#8212;&#8201;we have no hope of eliminating <span class="monospaced">x</span>, but if
266 we hold <span class="monospaced">x</span> constant, then we have a chance of eliminating arguments
267 for which <span class="monospaced">x</span> is passed as an actual.</p></div>
268 </div>
269 <div class="sect2">
270 <h3 id="_dominator_analysis">Dominator Analysis</h3>
271 <div class="paragraph"><p>Does anyone see where this is going yet? Pausing for a little
272 thought, <a href="MatthewFluet">MatthewFluet</a> realized that he had once before tried
273 proposing this kind of "fix" to a fixed-point analysis&#8201;&#8212;&#8201;when we were
274 first investigating the <a href="Contify">Contify</a> optimization in light of John
275 Reppy&#8217;s CWS paper. Of course, that "fix" failed because it defined a
276 non-monotonic function and one couldn&#8217;t take the fixed point. But,
277 <a href="StephenWeeks">StephenWeeks</a> suggested a dominator based approach, and we were
278 able to show that, indeed, the dominator analysis subsumed both the
279 previous call based analysis and the cont based analysis. And, a
280 moment&#8217;s reflection reveals further parallels: when
281 <span class="monospaced">varInfo: Var.t -&gt; Var.t option list ref</span>, we have something analogous
282 to the call analysis, and when <span class="monospaced">varInfo: Var.t -&gt; VarLattice.t</span>, we
283 have something analogous to the cont analysis. Maybe there is
284 something analogous to the dominator approach (and therefore superior
285 to the previous analyses).</p></div>
286 <div class="paragraph"><p>And this turns out to be the case. Construct the graph <span class="monospaced">G</span> as follows:</p></div>
287 <div class="listingblock">
288 <div class="content monospaced">
289 <pre>nodes(G) = {Root} U Var.t
290 edges(G) = {Root -&gt; v | v bound in a Statement.t or
291 in the Function.t args} U
292 {xi -&gt; ai | L(x1, ..., xn) transfer where (a1, ..., an)
293 are the formals of L} U
294 {Root -&gt; a | a is a block argument used in an unknown context}</pre>
295 </div></div>
296 <div class="paragraph"><p>Let <span class="monospaced">idom(x)</span> be the immediate dominator of <span class="monospaced">x</span> in <span class="monospaced">G</span> with root
297 <span class="monospaced">Root</span>. Now, any block argument a such that <span class="monospaced">idom(a) = x &lt;&gt; Root</span> can
298 be optimized by setting <span class="monospaced">a = x</span> at the beginning of the block and
299 dropping the argument from <span class="monospaced">Goto</span> transfers.</p></div>
300 <div class="paragraph"><p>Furthermore, experimental evidence suggests (and we are confident that
301 a formal presentation could prove) that the dominator analysis
302 subsumes the "syntactic" and "fixpoint" based analyses in this context
303 as well and that the dominator analysis gets "everything" in one go.</p></div>
304 </div>
305 <div class="sect2">
306 <h3 id="_final_thoughts">Final Thoughts</h3>
307 <div class="paragraph"><p>I must admit, I was rather surprised at this progression and final
308 result. At the outset, I never would have thought of a connection
309 between <a href="Contify">Contify</a> and <a href="CommonArg">CommonArg</a> optimizations. They would seem
310 to be two completely different optimizations. Although, this may not
311 really be the case. As one of the reviewers of the ICFP paper said:</p></div>
312 <div class="quoteblock">
313 <div class="content">
314 <div class="paragraph"><p>I understand that such a form of CPS might be convenient in some
315 cases, but when we&#8217;re talking about analyzing code to detect that some
316 continuation is constant, I think it makes a lot more sense to make
317 all the continuation arguments completely explicit.</p></div>
318 <div class="paragraph"><p>I believe that making all the continuation arguments explicit will
319 show that the optimization can be generalized to eliminating constant
320 arguments, whether continuations or not.</p></div>
321 </div>
322 <div class="attribution">
323 </div></div>
324 <div class="paragraph"><p>What I think the common argument optimization shows is that the
325 dominator analysis does slightly better than the reviewer puts it: we
326 find more than just constant continuations, we find common
327 continuations. And I think this is further justified by the fact that
328 I have observed common argument eliminate some <span class="monospaced">env_X</span> arguments which
329 would appear to correspond to determining that while the closure being
330 executed isn&#8217;t constant it is at least the same as the closure being
331 passed elsewhere.</p></div>
332 <div class="paragraph"><p>At first, I was curious whether or not we had missed a bigger picture
333 with the dominator analysis. When we wrote the contification paper, I
334 assumed that the dominator analysis was a specialized solution to a
335 specialized problem; we never suggested that it was a technique suited
336 to a larger class of analyses. After initially finding a connection
337 between <a href="Contify">Contify</a> and <a href="CommonArg">CommonArg</a> (and thinking that the only
338 connection was the technique), I wondered if the dominator technique
339 really was applicable to a larger class of analyses. That is still a
340 question, but after writing up the above, I&#8217;m suspecting that the
341 "real story" is that the dominator analysis is a solution to the
342 common argument optimization, and that the <a href="Contify">Contify</a> optimization is
343 specializing <a href="CommonArg">CommonArg</a> to the case of continuation arguments (with
344 a different transformation at the end). (Note, a whole-program,
345 inter-procedural common argument analysis doesn&#8217;t really make sense
346 (in our <a href="SSA">SSA</a> <a href="IntermediateLanguage">IntermediateLanguage</a>), because the only way of
347 passing values between functions is as arguments. (Unless of course
348 in the case that the common argument is also a constant argument, in
349 which case <a href="ConstantPropagation">ConstantPropagation</a> could lift it to a global.) The
350 inter-procedural <a href="Contify">Contify</a> optimization works out because there we
351 move the function to the argument.)</p></div>
352 <div class="paragraph"><p>Anyways, it&#8217;s still unclear to me whether or not the dominator based
353 approach solves other kinds of problems.</p></div>
354 </div>
355 <div class="sect2">
356 <h3 id="_phase_ordering">Phase Ordering</h3>
357 <div class="paragraph"><p>On the downside, the optimization doesn&#8217;t have a huge impact on
358 runtime, although it does predictably saved some code size. I stuck
359 it in the optimization sequence after <a href="Flatten">Flatten</a> and (the third round
360 of) <a href="LocalFlatten">LocalFlatten</a>, since it seems to me that we could have cases
361 where some components of a tuple used as an argument are common, but
362 the whole tuple isn&#8217;t. I think it makes sense to add it after
363 <a href="IntroduceLoops">IntroduceLoops</a> and <a href="LoopInvariant">LoopInvariant</a> (even though <a href="CommonArg">CommonArg</a>
364 get some things that <a href="LoopInvariant">LoopInvariant</a> gets, it doesn&#8217;t get all of
365 them). I also think that it makes sense to add it before
366 <a href="CommonSubexp">CommonSubexp</a>, since identifying variables could expose more common
367 subexpressions. I would think a similar thought applies to
368 <a href="RedundantTests">RedundantTests</a>.</p></div>
369 </div>
370 </div>
371 </div>
372 </div>
373 <div id="footnotes"><hr></div>
374 <div id="footer">
375 <div id="footer-text">
376 </div>
377 <div id="footer-badges">
378 </div>
379 </div>
380 </body>
381 </html>