Import Upstream version 20180207
[hcoop/debian/mlton.git] / doc / guide / localhost / ShareZeroVec
1 <!DOCTYPE html>
2 <html lang="en">
3 <head>
4 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
5 <meta name="generator" content="AsciiDoc 8.6.9">
6 <title>ShareZeroVec</title>
7 <link rel="stylesheet" href="./asciidoc.css" type="text/css">
8 <link rel="stylesheet" href="./pygments.css" type="text/css">
9
10
11 <script type="text/javascript" src="./asciidoc.js"></script>
12 <script type="text/javascript">
13 /*<![CDATA[*/
14 asciidoc.install();
15 /*]]>*/
16 </script>
17 <link rel="stylesheet" href="./mlton.css" type="text/css">
18 </head>
19 <body class="article">
20 <div id="banner">
21 <div id="banner-home">
22 <a href="./Home">MLton 20180207</a>
23 </div>
24 </div>
25 <div id="header">
26 <h1>ShareZeroVec</h1>
27 </div>
28 <div id="content">
29 <div id="preamble">
30 <div class="sectionbody">
31 <div class="paragraph"><p><a href="ShareZeroVec">ShareZeroVec</a> is an optimization pass for the <a href="SSA">SSA</a>
32 <a href="IntermediateLanguage">IntermediateLanguage</a>, invoked from <a href="SSASimplify">SSASimplify</a>.</p></div>
33 </div>
34 </div>
35 <div class="sect1">
36 <h2 id="_description">Description</h2>
37 <div class="sectionbody">
38 <div class="paragraph"><p>An SSA optimization to share zero-length vectors.</p></div>
39 <div class="paragraph"><p>From <a href="https://github.com/MLton/mlton/commit/be8c5f576"><span class="monospaced">be8c5f576</span></a>, which replaced the use of the
40 <span class="monospaced">Array_array0Const</span> primitive in the Basis Library implementation with a
41 (nullary) <span class="monospaced">Vector_vector</span> primitive:</p></div>
42 <div class="quoteblock">
43 <div class="content">
44 <div class="paragraph"><p>The original motivation for the <span class="monospaced">Array_array0Const</span> primitive was to share the
45 heap space required for zero-length vectors among all vectors (of a given type).
46 It was claimed that this optimization is important, e.g., in a self-compile,
47 where vectors are used for lots of syntax tree elements and many of those
48 vectors are empty. See:
49 <a href="http://www.mlton.org/pipermail/mlton-devel/2002-February/021523.html">http://www.mlton.org/pipermail/mlton-devel/2002-February/021523.html</a></p></div>
50 <div class="paragraph"><p>Curiously, the full effect of this optimization has been missing for quite some
51 time (perhaps since the port of <a href="ConstantPropagation">ConstantPropagation</a> to the SSA IL). While
52 <a href="ConstantPropagation">ConstantPropagation</a> has "globalized" the nullary application of the
53 <span class="monospaced">Array_array0Const</span> primitive, it also simultaneously transformed it to an
54 application of the <span class="monospaced">Array_uninit</span> (previously, the <span class="monospaced">Array_array</span>) primitive to
55 the zero constant. The hash-consing of globals, meant to create exactly one
56 global for each distinct constant, treats <span class="monospaced">Array_uninit</span> primitives as unequal
57 (appropriately, since <span class="monospaced">Array_uninit</span> allocates an array with identity (though
58 the identity may be supressed by a subsequent <span class="monospaced">Array_toVector</span>)), hence each
59 distinct <span class="monospaced">Array_array0Const</span> primitive in the program remained as distinct
60 globals. The limited amount of inlining prior to <a href="ConstantPropagation">ConstantPropagation</a> meant
61 that there were typically fewer than a dozen "copies" of the same empty vector
62 in a program for a given type.</p></div>
63 <div class="paragraph"><p>As a "functional" primitive, a nullary <span class="monospaced">Vector_vector</span> is globalized by
64 ClosureConvert, but is further recognized by ConstantPropagation and hash-consed
65 into a unique instance for each type.</p></div>
66 </div>
67 <div class="attribution">
68 </div></div>
69 <div class="paragraph"><p>However, a single, shared, global <span class="monospaced">Vector_vector ()</span> inhibits the
70 coercion-based optimizations of <span class="monospaced">Useless</span>. For example, consider the
71 following program:</p></div>
72 <div class="listingblock">
73 <div class="content"><div class="highlight"><pre><span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valOf</span><span class="w"> </span><span class="p">(</span><span class="n">Int</span><span class="p">.</span><span class="n">fromString</span><span class="w"> </span><span class="p">(</span><span class="n">hd</span><span class="w"> </span><span class="p">(</span><span class="n">CommandLine</span><span class="p">.</span><span class="n">arguments</span><span class="w"> </span><span class="p">())))</span><span class="w"></span>
74
75 <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">v1</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">tabulate</span><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"></span>
76 <span class="w"> </span><span class="k">let</span><span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">fromInt</span><span class="w"> </span><span class="n">i</span><span class="w"></span>
77 <span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">(</span><span class="n">w</span><span class="w"> </span><span class="n">-</span><span class="w"> </span><span class="mh">0wx1</span><span class="p">,</span><span class="w"> </span><span class="n">w</span><span class="p">,</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="mh">0wx1</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w</span><span class="p">)</span><span class="w"></span>
78 <span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"></span>
79 <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">v2</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">map</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">(</span><span class="n">w1</span><span class="p">,</span><span class="w"> </span><span class="n">w2</span><span class="p">,</span><span class="w"> </span><span class="n">w3</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">(</span><span class="n">w1</span><span class="p">,</span><span class="w"> </span><span class="mh">0wx2</span><span class="w"> </span><span class="n">*</span><span class="w"> </span><span class="n">w2</span><span class="p">,</span><span class="w"> </span><span class="mh">0wx3</span><span class="w"> </span><span class="n">*</span><span class="w"> </span><span class="n">w3</span><span class="p">))</span><span class="w"> </span><span class="n">v1</span><span class="w"></span>
80 <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">v3</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">VectorSlice</span><span class="p">.</span><span class="n">vector</span><span class="w"> </span><span class="p">(</span><span class="n">VectorSlice</span><span class="p">.</span><span class="n">slice</span><span class="w"> </span><span class="p">(</span><span class="n">v1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">SOME</span><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="n">-</span><span class="w"> </span><span class="mi">2</span><span class="p">)))</span><span class="w"></span>
81 <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">ans1</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">foldl</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">((</span><span class="n">w1</span><span class="p">,</span><span class="n">w2</span><span class="p">,</span><span class="n">w3</span><span class="p">),</span><span class="n">w</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w1</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w2</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w3</span><span class="p">)</span><span class="w"> </span><span class="mh">0wx0</span><span class="w"> </span><span class="n">v1</span><span class="w"></span>
82 <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">ans2</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">foldl</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">((_,</span><span class="n">w2</span><span class="p">,_),</span><span class="n">w</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w2</span><span class="p">)</span><span class="w"> </span><span class="mh">0wx0</span><span class="w"> </span><span class="n">v2</span><span class="w"></span>
83 <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">ans3</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">foldl</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">((_,</span><span class="n">w2</span><span class="p">,_),</span><span class="n">w</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w2</span><span class="p">)</span><span class="w"> </span><span class="mh">0wx0</span><span class="w"> </span><span class="n">v3</span><span class="w"></span>
84
85 <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="p">_</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">print</span><span class="w"> </span><span class="p">(</span><span class="n">concat</span><span class="w"> </span><span class="p">[</span><span class="s">&quot;ans1 = &quot;</span><span class="p">,</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">toString</span><span class="w"> </span><span class="n">ans1</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="p">,</span><span class="w"></span>
86 <span class="w"> </span><span class="s">&quot;ans2 = &quot;</span><span class="p">,</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">toString</span><span class="w"> </span><span class="n">ans2</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="p">,</span><span class="w"></span>
87 <span class="w"> </span><span class="s">&quot;ans3 = &quot;</span><span class="p">,</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">toString</span><span class="w"> </span><span class="n">ans3</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">])</span><span class="w"></span>
88 </pre></div></div></div>
89 <div class="paragraph"><p>We would like <span class="monospaced">v2</span> and <span class="monospaced">v3</span> to be optimized from
90 <span class="monospaced">(word16 * word16 * word16) vector</span> to <span class="monospaced">word16 vector</span> because only
91 the 2nd component of the elements is needed to compute the answer.</p></div>
92 <div class="paragraph"><p>With <span class="monospaced">Array_array0Const</span>, each distinct occurrence of
93 <span class="monospaced">Array_array0Const((word16 * word16 * word16))</span> arising from
94 polyvariance and inlining remained a distinct
95 <span class="monospaced">Array_uninit((word16 * word16 * word16)) (0x0)</span> global, which
96 resulted in distinct occurrences for the
97 <span class="monospaced">val v1 = Vector.tabulate ...</span> and for the
98 <span class="monospaced">val v2 = Vector.map ...</span>. The latter could be optimized to
99 <span class="monospaced">Array_uninit(word16) (0x0)</span> by <span class="monospaced">Useless</span>, because its result only
100 flows to places requiring the 2nd component of the elements.</p></div>
101 <div class="paragraph"><p>With <span class="monospaced">Vector_vector ()</span>, the distinct occurrences of
102 <span class="monospaced">Vector_vector((word16 * word16 * word16)) ()</span> arising from
103 polyvariance are globalized during <span class="monospaced">ClosureConvert</span>, those global
104 references may be further duplicated by inlining, but the distinct
105 occurrences of <span class="monospaced">Vector_vector((word16 * word16 * word16)) ()</span> are
106 merged to a single occurrence. Because this result flows to places
107 requiring all three components of the elements, it remains
108 <span class="monospaced">Vector_vector((word16 * word16 * word16)) ()</span> after
109 <span class="monospaced">Useless</span>. Furthermore, because one cannot (in constant time) coerce a
110 <span class="monospaced">(word16 * word16 * word16) vector</span> to a <span class="monospaced">word16 vector</span>, the <span class="monospaced">v2</span>
111 value remains of type <span class="monospaced">(word16 * word16 * word16) vector</span>.</p></div>
112 <div class="paragraph"><p>One option would be to drop the 0-element vector "optimization"
113 entirely. This costs some space (no sharing of empty vectors) and
114 some time (allocation and garbage collection of empty vectors).</p></div>
115 <div class="paragraph"><p>Another option would be to reinstate the <span class="monospaced">Array_array0Const</span> primitive
116 and associated <span class="monospaced">ConstantPropagation</span> treatment. But, the semantics
117 and purpose of <span class="monospaced">Array_array0Const</span> was poorly understood, resulting in
118 this break.</p></div>
119 <div class="paragraph"><p>The <a href="ShareZeroVec">ShareZeroVec</a> pass pursues a different approach: perform the 0-element
120 vector "optimization" as a separate optimization, after
121 <span class="monospaced">ConstantPropagation</span> and <span class="monospaced">Useless</span>. A trivial static analysis is
122 used to match <span class="monospaced">val v: t vector = Array_toVector(t) (a)</span> with
123 corresponding <span class="monospaced">val a: array = Array_uninit(t) (l)</span> and the later are
124 expanded to
125 <span class="monospaced">val a: t array = if 0 = l then zeroArr_[t] else Array_uninit(t) (l)</span>
126 with a single global <span class="monospaced">val zeroArr_[t] = Array_uninit(t) (0)</span> created
127 for each distinct type (after coercion-based optimizations).</p></div>
128 <div class="paragraph"><p>One disadvantage of this approach, compared to the <span class="monospaced">Vector_vector(t) ()</span>
129 approach, is that <span class="monospaced">Array_toVector</span> is applied each time a vector
130 is created, even if it is being applied to the <span class="monospaced">zeroArr_[t]</span>
131 zero-length array. (Although, this was the behavior of the
132 <span class="monospaced">Array_array0Const</span> approach.) This updates the object header each
133 time, whereas the <span class="monospaced">Vector_vector(t) ()</span> approach would have updated
134 the object header once, when the global was created, and the
135 <span class="monospaced">zeroVec_[t]</span> global and the <span class="monospaced">Array_toVector</span> result would flow to the
136 join point.</p></div>
137 <div class="paragraph"><p>It would be possible to properly share zero-length vectors, but doing
138 so is a more sophisticated analysis and transformation, because there
139 can be arbitrary code between the
140 <span class="monospaced">val a: t array = Array_uninit(t) (l)</span> and the corresponding
141 <span class="monospaced">val v: v vector = Array_toVector(t) (a)</span>, although, in practice,
142 nothing happens when a zero-length vector is created. It may be best
143 to pursue a more general "array to vector" optimization that
144 transforms creations of static-length vectors (e.g., all the
145 <span class="monospaced">Vector.new&lt;N&gt;</span> functions) into <span class="monospaced">Vector_vector</span> primitives (some of
146 which could be globalized).</p></div>
147 </div>
148 </div>
149 <div class="sect1">
150 <h2 id="_implementation">Implementation</h2>
151 <div class="sectionbody">
152 <div class="ulist"><ul>
153 <li>
154 <p>
155 <a href="https://github.com/MLton/mlton/blob/master/mlton/ssa/share-zero-vec.fun"><span class="monospaced">share-zero-vec.fun</span></a>
156 </p>
157 </li>
158 </ul></div>
159 </div>
160 </div>
161 <div class="sect1">
162 <h2 id="_details_and_notes">Details and Notes</h2>
163 <div class="sectionbody">
164 <div class="paragraph"><p></p></div>
165 </div>
166 </div>
167 </div>
168 <div id="footnotes"><hr></div>
169 <div id="footer">
170 <div id="footer-text">
171 </div>
172 <div id="footer-badges">
173 </div>
174 </div>
175 </body>
176 </html>