Import Upstream version 20180207
[hcoop/debian/mlton.git] / doc / guide / localhost / ShareZeroVec
CommitLineData
7f918cf1
CE
1<!DOCTYPE html>\r
2<html lang="en">\r
3<head>\r
4<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\r
5<meta name="generator" content="AsciiDoc 8.6.9">\r
6<title>ShareZeroVec</title>\r
7<link rel="stylesheet" href="./asciidoc.css" type="text/css">\r
8<link rel="stylesheet" href="./pygments.css" type="text/css">\r
9\r
10\r
11<script type="text/javascript" src="./asciidoc.js"></script>\r
12<script type="text/javascript">\r
13/*<![CDATA[*/\r
14asciidoc.install();\r
15/*]]>*/\r
16</script>\r
17<link rel="stylesheet" href="./mlton.css" type="text/css">\r
18</head>\r
19<body class="article">\r
20<div id="banner">\r
21<div id="banner-home">\r
22<a href="./Home">MLton 20180207</a>\r
23</div>\r
24</div>\r
25<div id="header">\r
26<h1>ShareZeroVec</h1>\r
27</div>\r
28<div id="content">\r
29<div id="preamble">\r
30<div class="sectionbody">\r
31<div class="paragraph"><p><a href="ShareZeroVec">ShareZeroVec</a> is an optimization pass for the <a href="SSA">SSA</a>\r
32<a href="IntermediateLanguage">IntermediateLanguage</a>, invoked from <a href="SSASimplify">SSASimplify</a>.</p></div>\r
33</div>\r
34</div>\r
35<div class="sect1">\r
36<h2 id="_description">Description</h2>\r
37<div class="sectionbody">\r
38<div class="paragraph"><p>An SSA optimization to share zero-length vectors.</p></div>\r
39<div class="paragraph"><p>From <a href="https://github.com/MLton/mlton/commit/be8c5f576"><span class="monospaced">be8c5f576</span></a>, which replaced the use of the\r
40<span class="monospaced">Array_array0Const</span> primitive in the Basis Library implementation with a\r
41(nullary) <span class="monospaced">Vector_vector</span> primitive:</p></div>\r
42<div class="quoteblock">\r
43<div class="content">\r
44<div class="paragraph"><p>The original motivation for the <span class="monospaced">Array_array0Const</span> primitive was to share the\r
45heap space required for zero-length vectors among all vectors (of a given type).\r
46It was claimed that this optimization is important, e.g., in a self-compile,\r
47where vectors are used for lots of syntax tree elements and many of those\r
48vectors are empty. See:\r
49<a href="http://www.mlton.org/pipermail/mlton-devel/2002-February/021523.html">http://www.mlton.org/pipermail/mlton-devel/2002-February/021523.html</a></p></div>\r
50<div class="paragraph"><p>Curiously, the full effect of this optimization has been missing for quite some\r
51time (perhaps since the port of <a href="ConstantPropagation">ConstantPropagation</a> to the SSA IL). While\r
52<a href="ConstantPropagation">ConstantPropagation</a> has "globalized" the nullary application of the\r
53<span class="monospaced">Array_array0Const</span> primitive, it also simultaneously transformed it to an\r
54application of the <span class="monospaced">Array_uninit</span> (previously, the <span class="monospaced">Array_array</span>) primitive to\r
55the zero constant. The hash-consing of globals, meant to create exactly one\r
56global for each distinct constant, treats <span class="monospaced">Array_uninit</span> primitives as unequal\r
57(appropriately, since <span class="monospaced">Array_uninit</span> allocates an array with identity (though\r
58the identity may be supressed by a subsequent <span class="monospaced">Array_toVector</span>)), hence each\r
59distinct <span class="monospaced">Array_array0Const</span> primitive in the program remained as distinct\r
60globals. The limited amount of inlining prior to <a href="ConstantPropagation">ConstantPropagation</a> meant\r
61that there were typically fewer than a dozen "copies" of the same empty vector\r
62in a program for a given type.</p></div>\r
63<div class="paragraph"><p>As a "functional" primitive, a nullary <span class="monospaced">Vector_vector</span> is globalized by\r
64ClosureConvert, but is further recognized by ConstantPropagation and hash-consed\r
65into a unique instance for each type.</p></div>\r
66</div>\r
67<div class="attribution">\r
68</div></div>\r
69<div class="paragraph"><p>However, a single, shared, global <span class="monospaced">Vector_vector ()</span> inhibits the\r
70coercion-based optimizations of <span class="monospaced">Useless</span>. For example, consider the\r
71following program:</p></div>\r
72<div class="listingblock">\r
73<div class="content"><div class="highlight"><pre><span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valOf</span><span class="w"> </span><span class="p">(</span><span class="n">Int</span><span class="p">.</span><span class="n">fromString</span><span class="w"> </span><span class="p">(</span><span class="n">hd</span><span class="w"> </span><span class="p">(</span><span class="n">CommandLine</span><span class="p">.</span><span class="n">arguments</span><span class="w"> </span><span class="p">())))</span><span class="w"></span>\r
74\r
75<span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">v1</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">tabulate</span><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"></span>\r
76<span class="w"> </span><span class="k">let</span><span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">fromInt</span><span class="w"> </span><span class="n">i</span><span class="w"></span>\r
77<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">(</span><span class="n">w</span><span class="w"> </span><span class="n">-</span><span class="w"> </span><span class="mh">0wx1</span><span class="p">,</span><span class="w"> </span><span class="n">w</span><span class="p">,</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="mh">0wx1</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w</span><span class="p">)</span><span class="w"></span>\r
78<span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"></span>\r
79<span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">v2</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">map</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">(</span><span class="n">w1</span><span class="p">,</span><span class="w"> </span><span class="n">w2</span><span class="p">,</span><span class="w"> </span><span class="n">w3</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">(</span><span class="n">w1</span><span class="p">,</span><span class="w"> </span><span class="mh">0wx2</span><span class="w"> </span><span class="n">*</span><span class="w"> </span><span class="n">w2</span><span class="p">,</span><span class="w"> </span><span class="mh">0wx3</span><span class="w"> </span><span class="n">*</span><span class="w"> </span><span class="n">w3</span><span class="p">))</span><span class="w"> </span><span class="n">v1</span><span class="w"></span>\r
80<span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">v3</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">VectorSlice</span><span class="p">.</span><span class="n">vector</span><span class="w"> </span><span class="p">(</span><span class="n">VectorSlice</span><span class="p">.</span><span class="n">slice</span><span class="w"> </span><span class="p">(</span><span class="n">v1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">SOME</span><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="n">-</span><span class="w"> </span><span class="mi">2</span><span class="p">)))</span><span class="w"></span>\r
81<span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">ans1</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">foldl</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">((</span><span class="n">w1</span><span class="p">,</span><span class="n">w2</span><span class="p">,</span><span class="n">w3</span><span class="p">),</span><span class="n">w</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w1</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w2</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w3</span><span class="p">)</span><span class="w"> </span><span class="mh">0wx0</span><span class="w"> </span><span class="n">v1</span><span class="w"></span>\r
82<span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">ans2</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">foldl</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">((_,</span><span class="n">w2</span><span class="p">,_),</span><span class="n">w</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w2</span><span class="p">)</span><span class="w"> </span><span class="mh">0wx0</span><span class="w"> </span><span class="n">v2</span><span class="w"></span>\r
83<span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">ans3</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">foldl</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">((_,</span><span class="n">w2</span><span class="p">,_),</span><span class="n">w</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w2</span><span class="p">)</span><span class="w"> </span><span class="mh">0wx0</span><span class="w"> </span><span class="n">v3</span><span class="w"></span>\r
84\r
85<span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="p">_</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">print</span><span class="w"> </span><span class="p">(</span><span class="n">concat</span><span class="w"> </span><span class="p">[</span><span class="s">&quot;ans1 = &quot;</span><span class="p">,</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">toString</span><span class="w"> </span><span class="n">ans1</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="p">,</span><span class="w"></span>\r
86<span class="w"> </span><span class="s">&quot;ans2 = &quot;</span><span class="p">,</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">toString</span><span class="w"> </span><span class="n">ans2</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="p">,</span><span class="w"></span>\r
87<span class="w"> </span><span class="s">&quot;ans3 = &quot;</span><span class="p">,</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">toString</span><span class="w"> </span><span class="n">ans3</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">])</span><span class="w"></span>\r
88</pre></div></div></div>\r
89<div class="paragraph"><p>We would like <span class="monospaced">v2</span> and <span class="monospaced">v3</span> to be optimized from\r
90<span class="monospaced">(word16 * word16 * word16) vector</span> to <span class="monospaced">word16 vector</span> because only\r
91the 2nd component of the elements is needed to compute the answer.</p></div>\r
92<div class="paragraph"><p>With <span class="monospaced">Array_array0Const</span>, each distinct occurrence of\r
93<span class="monospaced">Array_array0Const((word16 * word16 * word16))</span> arising from\r
94polyvariance and inlining remained a distinct\r
95<span class="monospaced">Array_uninit((word16 * word16 * word16)) (0x0)</span> global, which\r
96resulted in distinct occurrences for the\r
97<span class="monospaced">val v1 = Vector.tabulate ...</span> and for the\r
98<span class="monospaced">val v2 = Vector.map ...</span>. The latter could be optimized to\r
99<span class="monospaced">Array_uninit(word16) (0x0)</span> by <span class="monospaced">Useless</span>, because its result only\r
100flows to places requiring the 2nd component of the elements.</p></div>\r
101<div class="paragraph"><p>With <span class="monospaced">Vector_vector ()</span>, the distinct occurrences of\r
102<span class="monospaced">Vector_vector((word16 * word16 * word16)) ()</span> arising from\r
103polyvariance are globalized during <span class="monospaced">ClosureConvert</span>, those global\r
104references may be further duplicated by inlining, but the distinct\r
105occurrences of <span class="monospaced">Vector_vector((word16 * word16 * word16)) ()</span> are\r
106merged to a single occurrence. Because this result flows to places\r
107requiring all three components of the elements, it remains\r
108<span class="monospaced">Vector_vector((word16 * word16 * word16)) ()</span> after\r
109<span class="monospaced">Useless</span>. Furthermore, because one cannot (in constant time) coerce a\r
110<span class="monospaced">(word16 * word16 * word16) vector</span> to a <span class="monospaced">word16 vector</span>, the <span class="monospaced">v2</span>\r
111value remains of type <span class="monospaced">(word16 * word16 * word16) vector</span>.</p></div>\r
112<div class="paragraph"><p>One option would be to drop the 0-element vector "optimization"\r
113entirely. This costs some space (no sharing of empty vectors) and\r
114some time (allocation and garbage collection of empty vectors).</p></div>\r
115<div class="paragraph"><p>Another option would be to reinstate the <span class="monospaced">Array_array0Const</span> primitive\r
116and associated <span class="monospaced">ConstantPropagation</span> treatment. But, the semantics\r
117and purpose of <span class="monospaced">Array_array0Const</span> was poorly understood, resulting in\r
118this break.</p></div>\r
119<div class="paragraph"><p>The <a href="ShareZeroVec">ShareZeroVec</a> pass pursues a different approach: perform the 0-element\r
120vector "optimization" as a separate optimization, after\r
121<span class="monospaced">ConstantPropagation</span> and <span class="monospaced">Useless</span>. A trivial static analysis is\r
122used to match <span class="monospaced">val v: t vector = Array_toVector(t) (a)</span> with\r
123corresponding <span class="monospaced">val a: array = Array_uninit(t) (l)</span> and the later are\r
124expanded to\r
125<span class="monospaced">val a: t array = if 0 = l then zeroArr_[t] else Array_uninit(t) (l)</span>\r
126with a single global <span class="monospaced">val zeroArr_[t] = Array_uninit(t) (0)</span> created\r
127for each distinct type (after coercion-based optimizations).</p></div>\r
128<div class="paragraph"><p>One disadvantage of this approach, compared to the <span class="monospaced">Vector_vector(t) ()</span>\r
129approach, is that <span class="monospaced">Array_toVector</span> is applied each time a vector\r
130is created, even if it is being applied to the <span class="monospaced">zeroArr_[t]</span>\r
131zero-length array. (Although, this was the behavior of the\r
132<span class="monospaced">Array_array0Const</span> approach.) This updates the object header each\r
133time, whereas the <span class="monospaced">Vector_vector(t) ()</span> approach would have updated\r
134the object header once, when the global was created, and the\r
135<span class="monospaced">zeroVec_[t]</span> global and the <span class="monospaced">Array_toVector</span> result would flow to the\r
136join point.</p></div>\r
137<div class="paragraph"><p>It would be possible to properly share zero-length vectors, but doing\r
138so is a more sophisticated analysis and transformation, because there\r
139can be arbitrary code between the\r
140<span class="monospaced">val a: t array = Array_uninit(t) (l)</span> and the corresponding\r
141<span class="monospaced">val v: v vector = Array_toVector(t) (a)</span>, although, in practice,\r
142nothing happens when a zero-length vector is created. It may be best\r
143to pursue a more general "array to vector" optimization that\r
144transforms creations of static-length vectors (e.g., all the\r
145<span class="monospaced">Vector.new&lt;N&gt;</span> functions) into <span class="monospaced">Vector_vector</span> primitives (some of\r
146which could be globalized).</p></div>\r
147</div>\r
148</div>\r
149<div class="sect1">\r
150<h2 id="_implementation">Implementation</h2>\r
151<div class="sectionbody">\r
152<div class="ulist"><ul>\r
153<li>\r
154<p>\r
155<a href="https://github.com/MLton/mlton/blob/master/mlton/ssa/share-zero-vec.fun"><span class="monospaced">share-zero-vec.fun</span></a>\r
156</p>\r
157</li>\r
158</ul></div>\r
159</div>\r
160</div>\r
161<div class="sect1">\r
162<h2 id="_details_and_notes">Details and Notes</h2>\r
163<div class="sectionbody">\r
164<div class="paragraph"><p></p></div>\r
165</div>\r
166</div>\r
167</div>\r
168<div id="footnotes"><hr></div>\r
169<div id="footer">\r
170<div id="footer-text">\r
171</div>\r
172<div id="footer-badges">\r
173</div>\r
174</div>\r
175</body>\r
176</html>\r