Import Upstream version 20180207
[hcoop/debian/mlton.git] / doc / guide / localhost / GoogleSummerOfCode2015
1 <!DOCTYPE html>
2 <html lang="en">
3 <head>
4 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
5 <meta name="generator" content="AsciiDoc 8.6.9">
6 <title>Google Summer of Code (2015)</title>
7 <link rel="stylesheet" href="./asciidoc.css" type="text/css">
8 <link rel="stylesheet" href="./pygments.css" type="text/css">
9
10
11 <script type="text/javascript" src="./asciidoc.js"></script>
12 <script type="text/javascript">
13 /*<![CDATA[*/
14 asciidoc.install(2);
15 /*]]>*/
16 </script>
17 <link rel="stylesheet" href="./mlton.css" type="text/css">
18 </head>
19 <body class="article">
20 <div id="banner">
21 <div id="banner-home">
22 <a href="./Home">MLton 20180207</a>
23 </div>
24 </div>
25 <div id="header">
26 <h1>Google Summer of Code (2015)</h1>
27 <div id="toc">
28 <div id="toctitle">Table of Contents</div>
29 <noscript><p><b>JavaScript must be enabled in your browser to display the table of contents.</b></p></noscript>
30 </div>
31 </div>
32 <div id="content">
33 <div class="sect1">
34 <h2 id="_mentors">Mentors</h2>
35 <div class="sectionbody">
36 <div class="paragraph"><p>The following developers have agreed to serve as mentors for the 2015 Google Summer of Code:</p></div>
37 <div class="ulist"><ul>
38 <li>
39 <p>
40 <a href="http://www.cs.rit.edu/%7Emtf">Matthew Fluet</a>
41 </p>
42 </li>
43 <li>
44 <p>
45 <a href="http://www.cse.buffalo.edu/%7Elziarek/">Lukasz (Luke) Ziarek</a>
46 </p>
47 </li>
48 </ul></div>
49 </div>
50 </div>
51 <div class="sect1">
52 <h2 id="_ideas_list">Ideas List</h2>
53 <div class="sectionbody">
54 <div class="sect2">
55 <h3 id="_design_and_implement_a_heap_profiler">Design and Implement a Heap Profiler</h3>
56 <div class="paragraph"><p>A heap profile is a description of the space usage of a program. A
57 heap profile is concerned with the allocation, retention, and
58 deallocation (via garbage collection) of heap data during the
59 execution of a program. A heap profile can be used to diagnose
60 performance problems in a functional program that arise from space
61 leaks. This project aims to design and implement a heap profiler for
62 MLton compiled programs.</p></div>
63 <div class="paragraph"><p>Background:</p></div>
64 <div class="openblock">
65 <div class="content">
66 <div class="ulist"><ul>
67 <li>
68 <p>
69 <a href="http://portal.acm.org/citation.cfm?doid=583854.582451">GCspy: an adaptable heap visualisation framework</a>; Tony Printezis and Richard Jones
70 </p>
71 </li>
72 <li>
73 <p>
74 <a href="http://journals.cambridge.org/action/displayAbstract?aid=1349892">New dimensions in heap profiling</a>; Colin Runciman and Niklas R&ouml;jemo
75 </p>
76 </li>
77 <li>
78 <p>
79 <a href="http://www.springerlink.com/content/710501660722gw37/">Heap profiling for space efficiency</a>; Colin Runciman and Niklas R&ouml;jemo
80 </p>
81 </li>
82 <li>
83 <p>
84 <a href="http://journals.cambridge.org/action/displayAbstract?aid=1323096">Heap profiling of lazy functional programs</a>; Colin Runciman and David Wakeling
85 </p>
86 </li>
87 </ul></div>
88 </div></div>
89 <div class="paragraph"><p>Recommended Skills: C and SML programming experience; some experience with UI and visualization</p></div>
90 </div>
91 <div class="sect2">
92 <h3 id="_garbage_collector_improvements">Garbage Collector Improvements</h3>
93 <div class="paragraph"><p>The garbage collector plays a significant role in the performance of
94 functional languages. Garbage collect too often, and program
95 performance suffers due to the excessive time spent in the garbage
96 collector. Garbage collect not often enough, and program performance
97 suffers due to the excessive space used by the uncollected
98 garbage. One particular issue is ensuring that a program utilizing a
99 garbage collector "plays nice" with other processes on the system, by
100 not using too much or too little physical memory. While there are some
101 reasonable theoretical results about garbage collections with heaps of
102 fixed size, there seems to be insufficient work that really looks
103 carefully at the question of dynamically resizing the heap in response
104 to the live data demands of the application and, similarly, in
105 response to the behavior of the operating system and other
106 processes. This project aims to investigate improvements to the memory
107 behavior of MLton compiled programs through better tuning of the
108 garbage collector.</p></div>
109 <div class="paragraph"><p>Background:</p></div>
110 <div class="openblock">
111 <div class="content">
112 <div class="ulist"><ul>
113 <li>
114 <p>
115 <a href="http://gchandbook.org/">The Garbage Collection Handbook: The Art of Automatic Memory Management</a>; Richard Jones, Antony Hosking, Eliot Moss
116 </p>
117 </li>
118 <li>
119 <p>
120 <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.1020">Dual-Mode Garbage Collection</a>; Patrick Sansom
121 </p>
122 </li>
123 <li>
124 <p>
125 <a href="http://portal.acm.org/citation.cfm?doid=1029873.1029881">Automatic Heap Sizing: Taking Real Memory into Account</a>; Ting Yang, Matthew Hertz, Emery D. Berger, Scott F. Kaplan, and J. Eliot B. Moss
126 </p>
127 </li>
128 <li>
129 <p>
130 <a href="http://portal.acm.org/citation.cfm?doid=1152649.1152652">Controlling Garbage Collection and Heap Growth to Reduce the Execution Time of Java Applications</a>; Tim Brecht, Eshrat Arjomandi, Chang Li, and Hang Pham
131 </p>
132 </li>
133 <li>
134 <p>
135 <a href="http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4145125">Isla Vista Heap Sizing: Using Feedback to Avoid Paging</a>; Chris Grzegorczyk, Sunil Soman, Chandra Krintz, and Rich Wolski
136 </p>
137 </li>
138 <li>
139 <p>
140 <a href="http://portal.acm.org/citation.cfm?doid=1806651.1806669">The Economics of Garbage Collection</a>; Jeremy Singer, Richard E. Jones, Gavin Brown, and Mikel Luján
141 </p>
142 </li>
143 <li>
144 <p>
145 <a href="http://www.dcs.gla.ac.uk/%7Ejsinger/pdfs/tfp12.pdf">Automated Heap Sizing in the Poly/ML Runtime (Position Paper)</a>; David White, Jeremy Singer, Jonathan Aitken, and David Matthews
146 </p>
147 </li>
148 <li>
149 <p>
150 <a href="http://portal.acm.org/citation.cfm?doid=2555670.2466481">Control Theory for Principled Heap Sizing</a>; David R. White, Jeremy Singer, Jonathan M. Aitken, and Richard E. Jones
151 </p>
152 </li>
153 </ul></div>
154 </div></div>
155 <div class="paragraph"><p>Recommended Skills: C programming experience; some operating systems and/or systems programming experience; some compiler and garbage collector experience</p></div>
156 </div>
157 <div class="sect2">
158 <h3 id="_heap_allocated_activation_records">Heap-allocated Activation Records</h3>
159 <div class="paragraph"><p>Activation records (a.k.a., stack frames) are traditionally allocated
160 on a stack. This naturally corresponds to the call-return pattern of
161 function invocation. However, there are some disadvantages to
162 stack-allocated activation records. In a functional programming
163 language, functions may be deeply recursive, resulting in call stacks
164 that are much larger than typically supported by the operating system;
165 hence, a functional programming language implementation will typically
166 store its stack in its heap. Furthermore, a functional programming
167 language implementation must handle and recover from stack overflow,
168 by allocating a larger stack (again, in its heap) and copying
169 activation records from the old stack to the new stack. In the
170 presence of threads, stacks must be allocated in a heap and, in the
171 presence of a garbage collector, should be garbage collected when
172 unreachable. While heap-allocated activation records avoid many of
173 these disadvantages, they have not been widely implemented. This
174 project aims to implement and evaluate heap-allocated activation
175 records in the MLton compiler.</p></div>
176 <div class="paragraph"><p>Background:</p></div>
177 <div class="openblock">
178 <div class="content">
179 <div class="ulist"><ul>
180 <li>
181 <p>
182 <a href="http://journals.cambridge.org/action/displayAbstract?aid=1295104">Empirical and Analytic Study of Stack Versus Heap Cost for Languages with Closures</a>; Andrew W. Appel and Zhong Shao
183 </p>
184 </li>
185 <li>
186 <p>
187 <a href="http://portal.acm.org/citation.cfm?doid=182590.156783">Space-efficient closure representations</a>; Zhong Shao and Andrew W. Appel
188 </p>
189 </li>
190 <li>
191 <p>
192 <a href="http://portal.acm.org/citation.cfm?doid=93548.93554">Representing control in the presence of first-class continuations</a>; R. Hieb, R. Kent Dybvig, and Carl Bruggeman
193 </p>
194 </li>
195 </ul></div>
196 </div></div>
197 <div class="paragraph"><p>Recommended Skills: SML programming experience; some middle- and back-end compiler experience</p></div>
198 </div>
199 <div class="sect2">
200 <h3 id="_correctly_rounded_floating_point_binary_to_decimal_and_decimal_to_binary_conversion_routines_in_standard_ml">Correctly Rounded Floating-point Binary-to-Decimal and Decimal-to-Binary Conversion Routines in Standard ML</h3>
201 <div class="paragraph"><p>The
202 <a href="http://en.wikipedia.org/wiki/IEEE_754-2008">IEEE Standard for Floating-Point Arithmetic (IEEE 754)</a>
203 is the de facto representation for floating-point computation.
204 However, it is a <em>binary</em> (base 2) representation of floating-point
205 values, while many applications call for input and output of
206 floating-point values in <em>decimal</em> (base 10) representation. The
207 <em>decimal-to-binary</em> conversion problem takes a decimal floating-point
208 representation (e.g., a string like <span class="monospaced">"0.1"</span>) and returns the best
209 binary floating-point representation of that number. The
210 <em>binary-to-decimal</em> conversion problem takes a binary floating-point
211 representation and returns a decimal floating-point representation
212 using the smallest number of digits that allow the decimal
213 floating-point representation to be converted to the original binary
214 floating-point representation. For both conversion routines, "best"
215 is dependent upon the current floating-point rounding mode.</p></div>
216 <div class="paragraph"><p>MLton uses David Gay&#8217;s
217 <a href="http://www.netlib.org/fp/gdtoa.tgz">gdtoa library</a> for floating-point
218 conversions. While this is an exellent library, it generalizes the
219 decimal-to-binary and binary-to-decimal conversion routines beyond
220 what is required by the
221 <a href="http://standardml.org/Basis/">Standard ML Basis Library</a> and induces an
222 external dependency on the compiler. Native implementations of these
223 conversion routines in Standard ML would obviate the dependency on the
224 <span class="monospaced">gdtoa</span> library, while also being able to take advantage of Standard
225 ML features in the implementation (e.g., the published algorithms
226 often require use of infinite precision arithmetic, which is provided
227 by the <span class="monospaced">IntInf</span> structure in Standard ML, but is provided in an ad hoc
228 fasion in the <span class="monospaced">gdtoa</span> library).</p></div>
229 <div class="paragraph"><p>This project aims to develop a native implementation of the conversion
230 routines in Standard ML.</p></div>
231 <div class="paragraph"><p>Background:</p></div>
232 <div class="openblock">
233 <div class="content">
234 <div class="ulist"><ul>
235 <li>
236 <p>
237 <a href="http://dl.acm.org/citation.cfm?doid=103162.103163">What every computer scientist should know about floating-point arithmetic</a>; David Goldberg
238 </p>
239 </li>
240 <li>
241 <p>
242 <a href="http://dl.acm.org/citation.cfm?doid=93542.93559">How to print floating-point numbers accurately</a>; Guy L. Steele, Jr. and Jon L. White
243 </p>
244 </li>
245 <li>
246 <p>
247 <a href="http://dl.acm.org/citation.cfm?doid=93542.93557">How to read floating point numbers accurately</a>; William D. Clinger
248 </p>
249 </li>
250 <li>
251 <p>
252 <a href="http://cm.bell-labs.com/cm/cs/doc/90/4-10.ps.gz">Correctly Rounded Binary-Decimal and Decimal-Binary Conversions</a>; David Gay
253 </p>
254 </li>
255 <li>
256 <p>
257 <a href="http://dl.acm.org/citation.cfm?doid=249069.231397">Printing floating-point numbers quickly and accurately</a>; Robert G. Burger and R. Kent Dybvig
258 </p>
259 </li>
260 <li>
261 <p>
262 <a href="http://dl.acm.org/citation.cfm?doid=1806596.1806623">Printing floating-point numbers quickly and accurately with integers</a>; Florian Loitsch
263 </p>
264 </li>
265 </ul></div>
266 </div></div>
267 <div class="paragraph"><p>Recommended Skills: SML programming experience; algorithm design and implementation</p></div>
268 </div>
269 <div class="sect2">
270 <h3 id="_implement_source_level_debugging">Implement Source-level Debugging</h3>
271 <div class="paragraph"><p>Debugging is a fact of programming life. Unfortunately, most SML
272 implementations (including MLton) provide little to no source-level
273 debugging support. This project aims to add basic to intermediate
274 source-level debugging support to the MLton compiler. MLton already
275 supports source-level profiling, which can be used to attribute bytes
276 allocated or time spent in source functions. It should be relatively
277 straightforward to leverage this source-level information into basic
278 source-level debugging support, with the ability to set/unset
279 breakpoints and step through declarations and functions. It may be
280 possible to also provide intermediate source-level debugging support,
281 with the ability to inspect in-scope variables of basic types (e.g.,
282 types compatible with MLton&#8217;s foreign function interface).</p></div>
283 <div class="paragraph"><p>Background:</p></div>
284 <div class="openblock">
285 <div class="content">
286 <div class="ulist"><ul>
287 <li>
288 <p>
289 <a href="http://mlton.org/HowProfilingWorks">MLton&#8201;&#8212;&#8201;How Profiling Works</a>
290 </p>
291 </li>
292 <li>
293 <p>
294 <a href="http://mlton.org/ForeignFunctionInterfaceTypes">MLton&#8201;&#8212;&#8201;Foreign Function Interface Types</a>
295 </p>
296 </li>
297 <li>
298 <p>
299 <a href="http://dwarfstd.org/">DWARF Debugging Standard</a>
300 </p>
301 </li>
302 <li>
303 <p>
304 <a href="http://sourceware.org/gdb/current/onlinedocs/stabs/index.html">STABS Debugging Format</a>
305 </p>
306 </li>
307 </ul></div>
308 </div></div>
309 <div class="paragraph"><p>Recommended Skills: SML programming experience; some compiler experience</p></div>
310 </div>
311 <div class="sect2">
312 <h3 id="_region_based_memory_management">Region Based Memory Management</h3>
313 <div class="paragraph"><p>Region based memory management is an alternative automatic memory
314 management scheme to garbage collection. Regions can be inferred by
315 the compiler (e.g., Cyclone and MLKit) or provided to the programmer
316 through a library. Since many students do not have extensive
317 experience with compilers we plan on adopting the later approach.
318 Creating a viable region based memory solution requires the removal of
319 the GC and changes to the allocator. Additionally, write barriers
320 will be necessary to ensure references between two ML objects is never
321 established if the left hand side of the assignment has a longer
322 lifetime than the right hand side. Students will need to come up with
323 an appropriate interface for creating, entering, and exiting regions
324 (examples include RTSJ scoped memory and SCJ scoped memory).</p></div>
325 <div class="paragraph"><p>Background:</p></div>
326 <div class="openblock">
327 <div class="content">
328 <div class="ulist"><ul>
329 <li>
330 <p>
331 Cyclone
332 </p>
333 </li>
334 <li>
335 <p>
336 MLKit
337 </p>
338 </li>
339 <li>
340 <p>
341 RTSJ + SCJ scopes
342 </p>
343 </li>
344 </ul></div>
345 </div></div>
346 <div class="paragraph"><p>Recommended Skills: SML programming experience; C programming experience; some compiler and garbage collector experience</p></div>
347 </div>
348 <div class="sect2">
349 <h3 id="_adding_real_time_capabilities">Adding Real-Time Capabilities</h3>
350 <div class="paragraph"><p>This project focuses on exposing real-time APIs from a real-time OS
351 kernel at the SML level. This will require mapping the current MLton
352 (or <a href="http://multimlton.cs.purdue.edu">MultiMLton</a>) threading framework
353 to real-time threads that the RTOS provides. This will include
354 associating priorities with MLton threads and building priority based
355 scheduling algorithms. Additionally, support for perdioc, aperiodic,
356 and sporadic tasks should be supported. A real-time SML library will
357 need to be created to provide a forward facing interface for
358 programmers. Stretch goals include reworking the MLton <span class="monospaced">atomic</span>
359 statement and associated synchronization primitives built on top of
360 the MLton <span class="monospaced">atomic</span> statement.</p></div>
361 <div class="paragraph"><p>Recommended Skills: SML programming experience; C programming experience; real-time experience a plus but not required</p></div>
362 </div>
363 <div class="sect2">
364 <h3 id="_real_time_garbage_collection">Real-Time Garbage Collection</h3>
365 <div class="paragraph"><p>This project focuses on modifications to the MLton GC to support
366 real-time garbage collection. We will model the real-time GC on the
367 Schism RTGC. The first task will be to create a fixed size runtime
368 object representation. Large structures will need to be represented
369 as a linked lists of fixed sized objects. Arrays and vectors will be
370 transferred into dense trees. Compaction and copying can therefore be
371 removed from the GC algorithms that MLton currently supports. Lastly,
372 the GC will be made concurrent, allowing for the execution of the GC
373 threads as the lowest priority task in the system. Stretch goals
374 include a priority aware mechanism for the GC to signal to real-time
375 ML threads that it needs to scan their stack and identification of
376 places where the stack is shallow to bound priority inversion during
377 this procedure.</p></div>
378 <div class="paragraph"><p>Recommended Skills: C programming experience; garbage collector experience a plus but not required</p></div>
379 </div>
380 </div>
381 </div>
382 </div>
383 <div id="footnotes"><hr></div>
384 <div id="footer">
385 <div id="footer-text">
386 </div>
387 <div id="footer-badges">
388 </div>
389 </div>
390 </body>
391 </html>