Commit | Line | Data |
---|---|---|
7f918cf1 CE |
1 | <!DOCTYPE html>\r |
2 | <html lang="en">\r | |
3 | <head>\r | |
4 | <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\r | |
5 | <meta name="generator" content="AsciiDoc 8.6.9">\r | |
6 | <title>ShareZeroVec</title>\r | |
7 | <link rel="stylesheet" href="./asciidoc.css" type="text/css">\r | |
8 | <link rel="stylesheet" href="./pygments.css" type="text/css">\r | |
9 | \r | |
10 | \r | |
11 | <script type="text/javascript" src="./asciidoc.js"></script>\r | |
12 | <script type="text/javascript">\r | |
13 | /*<![CDATA[*/\r | |
14 | asciidoc.install();\r | |
15 | /*]]>*/\r | |
16 | </script>\r | |
17 | <link rel="stylesheet" href="./mlton.css" type="text/css">\r | |
18 | </head>\r | |
19 | <body class="article">\r | |
20 | <div id="banner">\r | |
21 | <div id="banner-home">\r | |
22 | <a href="./Home">MLton 20180207</a>\r | |
23 | </div>\r | |
24 | </div>\r | |
25 | <div id="header">\r | |
26 | <h1>ShareZeroVec</h1>\r | |
27 | </div>\r | |
28 | <div id="content">\r | |
29 | <div id="preamble">\r | |
30 | <div class="sectionbody">\r | |
31 | <div class="paragraph"><p><a href="ShareZeroVec">ShareZeroVec</a> is an optimization pass for the <a href="SSA">SSA</a>\r | |
32 | <a href="IntermediateLanguage">IntermediateLanguage</a>, invoked from <a href="SSASimplify">SSASimplify</a>.</p></div>\r | |
33 | </div>\r | |
34 | </div>\r | |
35 | <div class="sect1">\r | |
36 | <h2 id="_description">Description</h2>\r | |
37 | <div class="sectionbody">\r | |
38 | <div class="paragraph"><p>An SSA optimization to share zero-length vectors.</p></div>\r | |
39 | <div class="paragraph"><p>From <a href="https://github.com/MLton/mlton/commit/be8c5f576"><span class="monospaced">be8c5f576</span></a>, which replaced the use of the\r | |
40 | <span class="monospaced">Array_array0Const</span> primitive in the Basis Library implementation with a\r | |
41 | (nullary) <span class="monospaced">Vector_vector</span> primitive:</p></div>\r | |
42 | <div class="quoteblock">\r | |
43 | <div class="content">\r | |
44 | <div class="paragraph"><p>The original motivation for the <span class="monospaced">Array_array0Const</span> primitive was to share the\r | |
45 | heap space required for zero-length vectors among all vectors (of a given type).\r | |
46 | It was claimed that this optimization is important, e.g., in a self-compile,\r | |
47 | where vectors are used for lots of syntax tree elements and many of those\r | |
48 | vectors are empty. See:\r | |
49 | <a href="http://www.mlton.org/pipermail/mlton-devel/2002-February/021523.html">http://www.mlton.org/pipermail/mlton-devel/2002-February/021523.html</a></p></div>\r | |
50 | <div class="paragraph"><p>Curiously, the full effect of this optimization has been missing for quite some\r | |
51 | time (perhaps since the port of <a href="ConstantPropagation">ConstantPropagation</a> to the SSA IL). While\r | |
52 | <a href="ConstantPropagation">ConstantPropagation</a> has "globalized" the nullary application of the\r | |
53 | <span class="monospaced">Array_array0Const</span> primitive, it also simultaneously transformed it to an\r | |
54 | application of the <span class="monospaced">Array_uninit</span> (previously, the <span class="monospaced">Array_array</span>) primitive to\r | |
55 | the zero constant. The hash-consing of globals, meant to create exactly one\r | |
56 | global for each distinct constant, treats <span class="monospaced">Array_uninit</span> primitives as unequal\r | |
57 | (appropriately, since <span class="monospaced">Array_uninit</span> allocates an array with identity (though\r | |
58 | the identity may be supressed by a subsequent <span class="monospaced">Array_toVector</span>)), hence each\r | |
59 | distinct <span class="monospaced">Array_array0Const</span> primitive in the program remained as distinct\r | |
60 | globals. The limited amount of inlining prior to <a href="ConstantPropagation">ConstantPropagation</a> meant\r | |
61 | that there were typically fewer than a dozen "copies" of the same empty vector\r | |
62 | in a program for a given type.</p></div>\r | |
63 | <div class="paragraph"><p>As a "functional" primitive, a nullary <span class="monospaced">Vector_vector</span> is globalized by\r | |
64 | ClosureConvert, but is further recognized by ConstantPropagation and hash-consed\r | |
65 | into a unique instance for each type.</p></div>\r | |
66 | </div>\r | |
67 | <div class="attribution">\r | |
68 | </div></div>\r | |
69 | <div class="paragraph"><p>However, a single, shared, global <span class="monospaced">Vector_vector ()</span> inhibits the\r | |
70 | coercion-based optimizations of <span class="monospaced">Useless</span>. For example, consider the\r | |
71 | following program:</p></div>\r | |
72 | <div class="listingblock">\r | |
73 | <div class="content"><div class="highlight"><pre><span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valOf</span><span class="w"> </span><span class="p">(</span><span class="n">Int</span><span class="p">.</span><span class="n">fromString</span><span class="w"> </span><span class="p">(</span><span class="n">hd</span><span class="w"> </span><span class="p">(</span><span class="n">CommandLine</span><span class="p">.</span><span class="n">arguments</span><span class="w"> </span><span class="p">())))</span><span class="w"></span>\r | |
74 | \r | |
75 | <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">v1</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">tabulate</span><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="p">=></span><span class="w"></span>\r | |
76 | <span class="w"> </span><span class="k">let</span><span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">fromInt</span><span class="w"> </span><span class="n">i</span><span class="w"></span>\r | |
77 | <span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">(</span><span class="n">w</span><span class="w"> </span><span class="n">-</span><span class="w"> </span><span class="mh">0wx1</span><span class="p">,</span><span class="w"> </span><span class="n">w</span><span class="p">,</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="mh">0wx1</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w</span><span class="p">)</span><span class="w"></span>\r | |
78 | <span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"></span>\r | |
79 | <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">v2</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">map</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">(</span><span class="n">w1</span><span class="p">,</span><span class="w"> </span><span class="n">w2</span><span class="p">,</span><span class="w"> </span><span class="n">w3</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">(</span><span class="n">w1</span><span class="p">,</span><span class="w"> </span><span class="mh">0wx2</span><span class="w"> </span><span class="n">*</span><span class="w"> </span><span class="n">w2</span><span class="p">,</span><span class="w"> </span><span class="mh">0wx3</span><span class="w"> </span><span class="n">*</span><span class="w"> </span><span class="n">w3</span><span class="p">))</span><span class="w"> </span><span class="n">v1</span><span class="w"></span>\r | |
80 | <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">v3</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">VectorSlice</span><span class="p">.</span><span class="n">vector</span><span class="w"> </span><span class="p">(</span><span class="n">VectorSlice</span><span class="p">.</span><span class="n">slice</span><span class="w"> </span><span class="p">(</span><span class="n">v1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">SOME</span><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="n">-</span><span class="w"> </span><span class="mi">2</span><span class="p">)))</span><span class="w"></span>\r | |
81 | <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">ans1</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">foldl</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">((</span><span class="n">w1</span><span class="p">,</span><span class="n">w2</span><span class="p">,</span><span class="n">w3</span><span class="p">),</span><span class="n">w</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w1</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w2</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w3</span><span class="p">)</span><span class="w"> </span><span class="mh">0wx0</span><span class="w"> </span><span class="n">v1</span><span class="w"></span>\r | |
82 | <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">ans2</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">foldl</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">((_,</span><span class="n">w2</span><span class="p">,_),</span><span class="n">w</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w2</span><span class="p">)</span><span class="w"> </span><span class="mh">0wx0</span><span class="w"> </span><span class="n">v2</span><span class="w"></span>\r | |
83 | <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">ans3</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">foldl</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">((_,</span><span class="n">w2</span><span class="p">,_),</span><span class="n">w</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w2</span><span class="p">)</span><span class="w"> </span><span class="mh">0wx0</span><span class="w"> </span><span class="n">v3</span><span class="w"></span>\r | |
84 | \r | |
85 | <span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="p">_</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">print</span><span class="w"> </span><span class="p">(</span><span class="n">concat</span><span class="w"> </span><span class="p">[</span><span class="s">"ans1 = "</span><span class="p">,</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">toString</span><span class="w"> </span><span class="n">ans1</span><span class="p">,</span><span class="w"> </span><span class="s">" "</span><span class="p">,</span><span class="w"></span>\r | |
86 | <span class="w"> </span><span class="s">"ans2 = "</span><span class="p">,</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">toString</span><span class="w"> </span><span class="n">ans2</span><span class="p">,</span><span class="w"> </span><span class="s">" "</span><span class="p">,</span><span class="w"></span>\r | |
87 | <span class="w"> </span><span class="s">"ans3 = "</span><span class="p">,</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">toString</span><span class="w"> </span><span class="n">ans3</span><span class="p">,</span><span class="w"> </span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">])</span><span class="w"></span>\r | |
88 | </pre></div></div></div>\r | |
89 | <div class="paragraph"><p>We would like <span class="monospaced">v2</span> and <span class="monospaced">v3</span> to be optimized from\r | |
90 | <span class="monospaced">(word16 * word16 * word16) vector</span> to <span class="monospaced">word16 vector</span> because only\r | |
91 | the 2nd component of the elements is needed to compute the answer.</p></div>\r | |
92 | <div class="paragraph"><p>With <span class="monospaced">Array_array0Const</span>, each distinct occurrence of\r | |
93 | <span class="monospaced">Array_array0Const((word16 * word16 * word16))</span> arising from\r | |
94 | polyvariance and inlining remained a distinct\r | |
95 | <span class="monospaced">Array_uninit((word16 * word16 * word16)) (0x0)</span> global, which\r | |
96 | resulted in distinct occurrences for the\r | |
97 | <span class="monospaced">val v1 = Vector.tabulate ...</span> and for the\r | |
98 | <span class="monospaced">val v2 = Vector.map ...</span>. The latter could be optimized to\r | |
99 | <span class="monospaced">Array_uninit(word16) (0x0)</span> by <span class="monospaced">Useless</span>, because its result only\r | |
100 | flows to places requiring the 2nd component of the elements.</p></div>\r | |
101 | <div class="paragraph"><p>With <span class="monospaced">Vector_vector ()</span>, the distinct occurrences of\r | |
102 | <span class="monospaced">Vector_vector((word16 * word16 * word16)) ()</span> arising from\r | |
103 | polyvariance are globalized during <span class="monospaced">ClosureConvert</span>, those global\r | |
104 | references may be further duplicated by inlining, but the distinct\r | |
105 | occurrences of <span class="monospaced">Vector_vector((word16 * word16 * word16)) ()</span> are\r | |
106 | merged to a single occurrence. Because this result flows to places\r | |
107 | requiring all three components of the elements, it remains\r | |
108 | <span class="monospaced">Vector_vector((word16 * word16 * word16)) ()</span> after\r | |
109 | <span class="monospaced">Useless</span>. Furthermore, because one cannot (in constant time) coerce a\r | |
110 | <span class="monospaced">(word16 * word16 * word16) vector</span> to a <span class="monospaced">word16 vector</span>, the <span class="monospaced">v2</span>\r | |
111 | value remains of type <span class="monospaced">(word16 * word16 * word16) vector</span>.</p></div>\r | |
112 | <div class="paragraph"><p>One option would be to drop the 0-element vector "optimization"\r | |
113 | entirely. This costs some space (no sharing of empty vectors) and\r | |
114 | some time (allocation and garbage collection of empty vectors).</p></div>\r | |
115 | <div class="paragraph"><p>Another option would be to reinstate the <span class="monospaced">Array_array0Const</span> primitive\r | |
116 | and associated <span class="monospaced">ConstantPropagation</span> treatment. But, the semantics\r | |
117 | and purpose of <span class="monospaced">Array_array0Const</span> was poorly understood, resulting in\r | |
118 | this break.</p></div>\r | |
119 | <div class="paragraph"><p>The <a href="ShareZeroVec">ShareZeroVec</a> pass pursues a different approach: perform the 0-element\r | |
120 | vector "optimization" as a separate optimization, after\r | |
121 | <span class="monospaced">ConstantPropagation</span> and <span class="monospaced">Useless</span>. A trivial static analysis is\r | |
122 | used to match <span class="monospaced">val v: t vector = Array_toVector(t) (a)</span> with\r | |
123 | corresponding <span class="monospaced">val a: array = Array_uninit(t) (l)</span> and the later are\r | |
124 | expanded to\r | |
125 | <span class="monospaced">val a: t array = if 0 = l then zeroArr_[t] else Array_uninit(t) (l)</span>\r | |
126 | with a single global <span class="monospaced">val zeroArr_[t] = Array_uninit(t) (0)</span> created\r | |
127 | for each distinct type (after coercion-based optimizations).</p></div>\r | |
128 | <div class="paragraph"><p>One disadvantage of this approach, compared to the <span class="monospaced">Vector_vector(t) ()</span>\r | |
129 | approach, is that <span class="monospaced">Array_toVector</span> is applied each time a vector\r | |
130 | is created, even if it is being applied to the <span class="monospaced">zeroArr_[t]</span>\r | |
131 | zero-length array. (Although, this was the behavior of the\r | |
132 | <span class="monospaced">Array_array0Const</span> approach.) This updates the object header each\r | |
133 | time, whereas the <span class="monospaced">Vector_vector(t) ()</span> approach would have updated\r | |
134 | the object header once, when the global was created, and the\r | |
135 | <span class="monospaced">zeroVec_[t]</span> global and the <span class="monospaced">Array_toVector</span> result would flow to the\r | |
136 | join point.</p></div>\r | |
137 | <div class="paragraph"><p>It would be possible to properly share zero-length vectors, but doing\r | |
138 | so is a more sophisticated analysis and transformation, because there\r | |
139 | can be arbitrary code between the\r | |
140 | <span class="monospaced">val a: t array = Array_uninit(t) (l)</span> and the corresponding\r | |
141 | <span class="monospaced">val v: v vector = Array_toVector(t) (a)</span>, although, in practice,\r | |
142 | nothing happens when a zero-length vector is created. It may be best\r | |
143 | to pursue a more general "array to vector" optimization that\r | |
144 | transforms creations of static-length vectors (e.g., all the\r | |
145 | <span class="monospaced">Vector.new<N></span> functions) into <span class="monospaced">Vector_vector</span> primitives (some of\r | |
146 | which could be globalized).</p></div>\r | |
147 | </div>\r | |
148 | </div>\r | |
149 | <div class="sect1">\r | |
150 | <h2 id="_implementation">Implementation</h2>\r | |
151 | <div class="sectionbody">\r | |
152 | <div class="ulist"><ul>\r | |
153 | <li>\r | |
154 | <p>\r | |
155 | <a href="https://github.com/MLton/mlton/blob/master/mlton/ssa/share-zero-vec.fun"><span class="monospaced">share-zero-vec.fun</span></a>\r | |
156 | </p>\r | |
157 | </li>\r | |
158 | </ul></div>\r | |
159 | </div>\r | |
160 | </div>\r | |
161 | <div class="sect1">\r | |
162 | <h2 id="_details_and_notes">Details and Notes</h2>\r | |
163 | <div class="sectionbody">\r | |
164 | <div class="paragraph"><p></p></div>\r | |
165 | </div>\r | |
166 | </div>\r | |
167 | </div>\r | |
168 | <div id="footnotes"><hr></div>\r | |
169 | <div id="footer">\r | |
170 | <div id="footer-text">\r | |
171 | </div>\r | |
172 | <div id="footer-badges">\r | |
173 | </div>\r | |
174 | </div>\r | |
175 | </body>\r | |
176 | </html>\r |