Import Upstream version 20180207
[hcoop/debian/mlton.git] / doc / guide / src / GoogleSummerOfCode2015.adoc
1 Google Summer of Code (2015)
2 ============================
3 :toc:
4
5 == Mentors ==
6
7 The following developers have agreed to serve as mentors for the 2015 Google Summer of Code:
8
9 * http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
10 * http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
11 /////
12 * http://people.cs.uchicago.edu/~jhr/[John Reppy]
13 * http://www.cs.purdue.edu/homes/chandras[KC Sivaramakrishnan]
14 * http://www.cs.purdue.edu/homes/suresh/[Suresh Jagannathan]
15 /////
16
17 == Ideas List ==
18
19 /////
20 === Implement a Partial Redundancy Elimination (PRE) Optimization ===
21
22 Partial redundancy elimination (PRE) is a program transformation that
23 removes operations that are redundant on some, but not necessarily all
24 paths, through the program. PRE can subsume both common subexpression
25 elimination and loop-invariant code motion, and is therefore a
26 potentially powerful optimization. However, a naïve implementation of
27 PRE on a program in static single assignment (SSA) form is unlikely to
28 be effective. This project aims to adapt and implement the GVN-PRE
29 algorithm of Thomas VanDrunen in MLton's SSA intermediate language.
30
31 Background:
32 --
33 * http://cs.wheaton.edu/%7Etvandrun/writings/thesis.pdf[Partial Redundancy Elimination for Global Value Numbering]; Thomas VanDrunen
34 * http://www.cs.purdue.edu/research/technical_reports/2003/TR%2003-032.pdf[Corner-cases in Value-Based Partial Redundancy Elimination]; Thomas VanDrunen and Antony L. Hosking
35 * http://www.springerlink.com/content/w06m3cw453nphm1u/[Value-Based Partial Redundancy Elimination]; Thomas VanDrunen and Antony L. Hosking
36 * http://onlinelibrary.wiley.com/doi/10.1002/spe.618/abstract[Anticipation-based Partial Redundancy Elimination for Static Single Assignment Form]; Thomas VanDrunen and Antony L. Hosking
37 * http://portal.acm.org/citation.cfm?doid=319301.319348[Partial Redundancy Elimination in SSA Form]; Robert Kennedy, Sun Chan, Shin-Ming Liu, Raymond Lo, Peng Tu, and Fred Chow
38 --
39
40 Recommended Skills: SML programming experience; some middle-end compiler experience
41
42 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
43 /////
44
45 === Design and Implement a Heap Profiler ===
46
47 A heap profile is a description of the space usage of a program. A
48 heap profile is concerned with the allocation, retention, and
49 deallocation (via garbage collection) of heap data during the
50 execution of a program. A heap profile can be used to diagnose
51 performance problems in a functional program that arise from space
52 leaks. This project aims to design and implement a heap profiler for
53 MLton compiled programs.
54
55 Background:
56 --
57 * http://portal.acm.org/citation.cfm?doid=583854.582451[GCspy: an adaptable heap visualisation framework]; Tony Printezis and Richard Jones
58 * http://journals.cambridge.org/action/displayAbstract?aid=1349892[New dimensions in heap profiling]; Colin Runciman and Niklas Röjemo
59 * http://www.springerlink.com/content/710501660722gw37/[Heap profiling for space efficiency]; Colin Runciman and Niklas Röjemo
60 * http://journals.cambridge.org/action/displayAbstract?aid=1323096[Heap profiling of lazy functional programs]; Colin Runciman and David Wakeling
61 --
62
63 Recommended Skills: C and SML programming experience; some experience with UI and visualization
64
65 /////
66 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
67 /////
68
69 === Garbage Collector Improvements ===
70
71 The garbage collector plays a significant role in the performance of
72 functional languages. Garbage collect too often, and program
73 performance suffers due to the excessive time spent in the garbage
74 collector. Garbage collect not often enough, and program performance
75 suffers due to the excessive space used by the uncollected
76 garbage. One particular issue is ensuring that a program utilizing a
77 garbage collector "plays nice" with other processes on the system, by
78 not using too much or too little physical memory. While there are some
79 reasonable theoretical results about garbage collections with heaps of
80 fixed size, there seems to be insufficient work that really looks
81 carefully at the question of dynamically resizing the heap in response
82 to the live data demands of the application and, similarly, in
83 response to the behavior of the operating system and other
84 processes. This project aims to investigate improvements to the memory
85 behavior of MLton compiled programs through better tuning of the
86 garbage collector.
87
88 Background:
89 --
90 * http://gchandbook.org/[The Garbage Collection Handbook: The Art of Automatic Memory Management]; Richard Jones, Antony Hosking, Eliot Moss
91 * http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.1020[Dual-Mode Garbage Collection]; Patrick Sansom
92 * http://portal.acm.org/citation.cfm?doid=1029873.1029881[Automatic Heap Sizing: Taking Real Memory into Account]; Ting Yang, Matthew Hertz, Emery D. Berger, Scott F. Kaplan, and J. Eliot B. Moss
93 * http://portal.acm.org/citation.cfm?doid=1152649.1152652[Controlling Garbage Collection and Heap Growth to Reduce the Execution Time of Java Applications]; Tim Brecht, Eshrat Arjomandi, Chang Li, and Hang Pham
94 * http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4145125[Isla Vista Heap Sizing: Using Feedback to Avoid Paging]; Chris Grzegorczyk, Sunil Soman, Chandra Krintz, and Rich Wolski
95 * http://portal.acm.org/citation.cfm?doid=1806651.1806669[The Economics of Garbage Collection]; Jeremy Singer, Richard E. Jones, Gavin Brown, and Mikel Luján
96 * http://www.dcs.gla.ac.uk/%7Ejsinger/pdfs/tfp12.pdf[Automated Heap Sizing in the Poly/ML Runtime (Position Paper)]; David White, Jeremy Singer, Jonathan Aitken, and David Matthews
97 * http://portal.acm.org/citation.cfm?doid=2555670.2466481[Control Theory for Principled Heap Sizing]; David R. White, Jeremy Singer, Jonathan M. Aitken, and Richard E. Jones
98 --
99
100 Recommended Skills: C programming experience; some operating systems and/or systems programming experience; some compiler and garbage collector experience
101
102 /////
103 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
104 /////
105
106 === Heap-allocated Activation Records ===
107
108 Activation records (a.k.a., stack frames) are traditionally allocated
109 on a stack. This naturally corresponds to the call-return pattern of
110 function invocation. However, there are some disadvantages to
111 stack-allocated activation records. In a functional programming
112 language, functions may be deeply recursive, resulting in call stacks
113 that are much larger than typically supported by the operating system;
114 hence, a functional programming language implementation will typically
115 store its stack in its heap. Furthermore, a functional programming
116 language implementation must handle and recover from stack overflow,
117 by allocating a larger stack (again, in its heap) and copying
118 activation records from the old stack to the new stack. In the
119 presence of threads, stacks must be allocated in a heap and, in the
120 presence of a garbage collector, should be garbage collected when
121 unreachable. While heap-allocated activation records avoid many of
122 these disadvantages, they have not been widely implemented. This
123 project aims to implement and evaluate heap-allocated activation
124 records in the MLton compiler.
125
126 Background:
127 --
128 * http://journals.cambridge.org/action/displayAbstract?aid=1295104[Empirical and Analytic Study of Stack Versus Heap Cost for Languages with Closures]; Andrew W. Appel and Zhong Shao
129 * http://portal.acm.org/citation.cfm?doid=182590.156783[Space-efficient closure representations]; Zhong Shao and Andrew W. Appel
130 * http://portal.acm.org/citation.cfm?doid=93548.93554[Representing control in the presence of first-class continuations]; R. Hieb, R. Kent Dybvig, and Carl Bruggeman
131 --
132
133 Recommended Skills: SML programming experience; some middle- and back-end compiler experience
134
135 /////
136 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
137 /////
138
139 === Correctly Rounded Floating-point Binary-to-Decimal and Decimal-to-Binary Conversion Routines in Standard ML ===
140
141 The
142 http://en.wikipedia.org/wiki/IEEE_754-2008[IEEE Standard for Floating-Point Arithmetic (IEEE 754)]
143 is the de facto representation for floating-point computation.
144 However, it is a _binary_ (base 2) representation of floating-point
145 values, while many applications call for input and output of
146 floating-point values in _decimal_ (base 10) representation. The
147 _decimal-to-binary_ conversion problem takes a decimal floating-point
148 representation (e.g., a string like +"0.1"+) and returns the best
149 binary floating-point representation of that number. The
150 _binary-to-decimal_ conversion problem takes a binary floating-point
151 representation and returns a decimal floating-point representation
152 using the smallest number of digits that allow the decimal
153 floating-point representation to be converted to the original binary
154 floating-point representation. For both conversion routines, "best"
155 is dependent upon the current floating-point rounding mode.
156
157 MLton uses David Gay's
158 http://www.netlib.org/fp/gdtoa.tgz[gdtoa library] for floating-point
159 conversions. While this is an exellent library, it generalizes the
160 decimal-to-binary and binary-to-decimal conversion routines beyond
161 what is required by the
162 http://standardml.org/Basis/[Standard ML Basis Library] and induces an
163 external dependency on the compiler. Native implementations of these
164 conversion routines in Standard ML would obviate the dependency on the
165 +gdtoa+ library, while also being able to take advantage of Standard
166 ML features in the implementation (e.g., the published algorithms
167 often require use of infinite precision arithmetic, which is provided
168 by the +IntInf+ structure in Standard ML, but is provided in an ad hoc
169 fasion in the +gdtoa+ library).
170
171 This project aims to develop a native implementation of the conversion
172 routines in Standard ML.
173
174 Background:
175 --
176 * http://dl.acm.org/citation.cfm?doid=103162.103163[What every computer scientist should know about floating-point arithmetic]; David Goldberg
177 * http://dl.acm.org/citation.cfm?doid=93542.93559[How to print floating-point numbers accurately]; Guy L. Steele, Jr. and Jon L. White
178 * http://dl.acm.org/citation.cfm?doid=93542.93557[How to read floating point numbers accurately]; William D. Clinger
179 * http://cm.bell-labs.com/cm/cs/doc/90/4-10.ps.gz[Correctly Rounded Binary-Decimal and Decimal-Binary Conversions]; David Gay
180 * http://dl.acm.org/citation.cfm?doid=249069.231397[Printing floating-point numbers quickly and accurately]; Robert G. Burger and R. Kent Dybvig
181 * http://dl.acm.org/citation.cfm?doid=1806596.1806623[Printing floating-point numbers quickly and accurately with integers]; Florian Loitsch
182 --
183
184 Recommended Skills: SML programming experience; algorithm design and implementation
185
186 /////
187 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
188 /////
189
190 === Implement Source-level Debugging ===
191
192 Debugging is a fact of programming life. Unfortunately, most SML
193 implementations (including MLton) provide little to no source-level
194 debugging support. This project aims to add basic to intermediate
195 source-level debugging support to the MLton compiler. MLton already
196 supports source-level profiling, which can be used to attribute bytes
197 allocated or time spent in source functions. It should be relatively
198 straightforward to leverage this source-level information into basic
199 source-level debugging support, with the ability to set/unset
200 breakpoints and step through declarations and functions. It may be
201 possible to also provide intermediate source-level debugging support,
202 with the ability to inspect in-scope variables of basic types (e.g.,
203 types compatible with MLton's foreign function interface).
204
205 Background:
206 --
207 * http://mlton.org/HowProfilingWorks[MLton -- How Profiling Works]
208 * http://mlton.org/ForeignFunctionInterfaceTypes[MLton -- Foreign Function Interface Types]
209 * http://dwarfstd.org/[DWARF Debugging Standard]
210 * http://sourceware.org/gdb/current/onlinedocs/stabs/index.html[STABS Debugging Format]
211 --
212
213 Recommended Skills: SML programming experience; some compiler experience
214
215 /////
216 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
217 /////
218
219 === Region Based Memory Management ===
220
221 Region based memory management is an alternative automatic memory
222 management scheme to garbage collection. Regions can be inferred by
223 the compiler (e.g., Cyclone and MLKit) or provided to the programmer
224 through a library. Since many students do not have extensive
225 experience with compilers we plan on adopting the later approach.
226 Creating a viable region based memory solution requires the removal of
227 the GC and changes to the allocator. Additionally, write barriers
228 will be necessary to ensure references between two ML objects is never
229 established if the left hand side of the assignment has a longer
230 lifetime than the right hand side. Students will need to come up with
231 an appropriate interface for creating, entering, and exiting regions
232 (examples include RTSJ scoped memory and SCJ scoped memory).
233
234 Background:
235 --
236 * Cyclone
237 * MLKit
238 * RTSJ + SCJ scopes
239 --
240
241 Recommended Skills: SML programming experience; C programming experience; some compiler and garbage collector experience
242
243 /////
244 Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
245 /////
246
247 === Adding Real-Time Capabilities ===
248
249 This project focuses on exposing real-time APIs from a real-time OS
250 kernel at the SML level. This will require mapping the current MLton
251 (or http://multimlton.cs.purdue.edu[MultiMLton]) threading framework
252 to real-time threads that the RTOS provides. This will include
253 associating priorities with MLton threads and building priority based
254 scheduling algorithms. Additionally, support for perdioc, aperiodic,
255 and sporadic tasks should be supported. A real-time SML library will
256 need to be created to provide a forward facing interface for
257 programmers. Stretch goals include reworking the MLton +atomic+
258 statement and associated synchronization primitives built on top of
259 the MLton +atomic+ statement.
260
261 Recommended Skills: SML programming experience; C programming experience; real-time experience a plus but not required
262
263 /////
264 Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
265 /////
266
267 === Real-Time Garbage Collection ===
268
269 This project focuses on modifications to the MLton GC to support
270 real-time garbage collection. We will model the real-time GC on the
271 Schism RTGC. The first task will be to create a fixed size runtime
272 object representation. Large structures will need to be represented
273 as a linked lists of fixed sized objects. Arrays and vectors will be
274 transferred into dense trees. Compaction and copying can therefore be
275 removed from the GC algorithms that MLton currently supports. Lastly,
276 the GC will be made concurrent, allowing for the execution of the GC
277 threads as the lowest priority task in the system. Stretch goals
278 include a priority aware mechanism for the GC to signal to real-time
279 ML threads that it needs to scan their stack and identification of
280 places where the stack is shallow to bound priority inversion during
281 this procedure.
282
283 Recommended Skills: C programming experience; garbage collector experience a plus but not required
284
285 /////
286 Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
287 /////
288
289 /////
290 === Concurrent{nbsp}ML Improvements ===
291
292 http://cml.cs.uchicago.edu/[Concurrent ML] is an SML concurrency
293 library based on synchronous message passing. MLton has a partial
294 implementation of the CML message-passing primitives, but its use in
295 real-world applications has been stymied by the lack of completeness
296 and thread-safe I/O libraries. This project would aim to flesh out
297 the CML implementation in MLton to be fully compatible with the
298 "official" version distributed as part of SML/NJ. Furthermore, time
299 permitting, runtime system support could be added to allow use of
300 modern OS features, such as asynchronous I/O, in the implementation of
301 CML's system interfaces.
302
303 Background
304 --
305 * http://cml.cs.uchicago.edu/
306 * http://mlton.org/ConcurrentML
307 * http://mlton.org/ConcurrentMLImplementation
308 --
309
310 Recommended Skills: SML programming experience; knowledge of concurrent programming; some operating systems and/or systems programming experience
311
312 Mentor: http://people.cs.uchicago.edu/~jhr/[John Reppy]
313 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
314 /////
315
316 /////
317 === SML3d Development ===
318
319 The SML3d Project is a collection of libraries to support 3D graphics
320 programming using Standard ML and the http://opengl.org/[OpenGL]
321 graphics API. It currently requires the MLton implementation of SML
322 and is supported on Linux, Mac OS X, and Microsoft Windows. There is
323 also support for http://www.khronos.org/opencl/[OpenCL]. This project
324 aims to continue development of the SML3d Project.
325
326 Background
327 --
328 * http://sml3d.cs.uchicago.edu/
329 --
330
331 Mentor: http://people.cs.uchicago.edu/~jhr/[John Reppy]
332 /////