Import Upstream version 20180207
[hcoop/debian/mlton.git] / doc / guide / src / GoogleSummerOfCode2015.adoc
CommitLineData
7f918cf1
CE
1Google Summer of Code (2015)
2============================
3:toc:
4
5== Mentors ==
6
7The following developers have agreed to serve as mentors for the 2015 Google Summer of Code:
8
9* http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
10* http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
11/////
12* http://people.cs.uchicago.edu/~jhr/[John Reppy]
13* http://www.cs.purdue.edu/homes/chandras[KC Sivaramakrishnan]
14* http://www.cs.purdue.edu/homes/suresh/[Suresh Jagannathan]
15/////
16
17== Ideas List ==
18
19/////
20=== Implement a Partial Redundancy Elimination (PRE) Optimization ===
21
22Partial redundancy elimination (PRE) is a program transformation that
23removes operations that are redundant on some, but not necessarily all
24paths, through the program. PRE can subsume both common subexpression
25elimination and loop-invariant code motion, and is therefore a
26potentially powerful optimization. However, a naïve implementation of
27PRE on a program in static single assignment (SSA) form is unlikely to
28be effective. This project aims to adapt and implement the GVN-PRE
29algorithm of Thomas VanDrunen in MLton's SSA intermediate language.
30
31Background:
32--
33* http://cs.wheaton.edu/%7Etvandrun/writings/thesis.pdf[Partial Redundancy Elimination for Global Value Numbering]; Thomas VanDrunen
34* http://www.cs.purdue.edu/research/technical_reports/2003/TR%2003-032.pdf[Corner-cases in Value-Based Partial Redundancy Elimination]; Thomas VanDrunen and Antony L. Hosking
35* http://www.springerlink.com/content/w06m3cw453nphm1u/[Value-Based Partial Redundancy Elimination]; Thomas VanDrunen and Antony L. Hosking
36* http://onlinelibrary.wiley.com/doi/10.1002/spe.618/abstract[Anticipation-based Partial Redundancy Elimination for Static Single Assignment Form]; Thomas VanDrunen and Antony L. Hosking
37* http://portal.acm.org/citation.cfm?doid=319301.319348[Partial Redundancy Elimination in SSA Form]; Robert Kennedy, Sun Chan, Shin-Ming Liu, Raymond Lo, Peng Tu, and Fred Chow
38--
39
40Recommended Skills: SML programming experience; some middle-end compiler experience
41
42Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
43/////
44
45=== Design and Implement a Heap Profiler ===
46
47A heap profile is a description of the space usage of a program. A
48heap profile is concerned with the allocation, retention, and
49deallocation (via garbage collection) of heap data during the
50execution of a program. A heap profile can be used to diagnose
51performance problems in a functional program that arise from space
52leaks. This project aims to design and implement a heap profiler for
53MLton compiled programs.
54
55Background:
56--
57* http://portal.acm.org/citation.cfm?doid=583854.582451[GCspy: an adaptable heap visualisation framework]; Tony Printezis and Richard Jones
58* http://journals.cambridge.org/action/displayAbstract?aid=1349892[New dimensions in heap profiling]; Colin Runciman and Niklas Röjemo
59* http://www.springerlink.com/content/710501660722gw37/[Heap profiling for space efficiency]; Colin Runciman and Niklas Röjemo
60* http://journals.cambridge.org/action/displayAbstract?aid=1323096[Heap profiling of lazy functional programs]; Colin Runciman and David Wakeling
61--
62
63Recommended Skills: C and SML programming experience; some experience with UI and visualization
64
65/////
66Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
67/////
68
69=== Garbage Collector Improvements ===
70
71The garbage collector plays a significant role in the performance of
72functional languages. Garbage collect too often, and program
73performance suffers due to the excessive time spent in the garbage
74collector. Garbage collect not often enough, and program performance
75suffers due to the excessive space used by the uncollected
76garbage. One particular issue is ensuring that a program utilizing a
77garbage collector "plays nice" with other processes on the system, by
78not using too much or too little physical memory. While there are some
79reasonable theoretical results about garbage collections with heaps of
80fixed size, there seems to be insufficient work that really looks
81carefully at the question of dynamically resizing the heap in response
82to the live data demands of the application and, similarly, in
83response to the behavior of the operating system and other
84processes. This project aims to investigate improvements to the memory
85behavior of MLton compiled programs through better tuning of the
86garbage collector.
87
88Background:
89--
90* http://gchandbook.org/[The Garbage Collection Handbook: The Art of Automatic Memory Management]; Richard Jones, Antony Hosking, Eliot Moss
91* http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.1020[Dual-Mode Garbage Collection]; Patrick Sansom
92* http://portal.acm.org/citation.cfm?doid=1029873.1029881[Automatic Heap Sizing: Taking Real Memory into Account]; Ting Yang, Matthew Hertz, Emery D. Berger, Scott F. Kaplan, and J. Eliot B. Moss
93* http://portal.acm.org/citation.cfm?doid=1152649.1152652[Controlling Garbage Collection and Heap Growth to Reduce the Execution Time of Java Applications]; Tim Brecht, Eshrat Arjomandi, Chang Li, and Hang Pham
94* http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4145125[Isla Vista Heap Sizing: Using Feedback to Avoid Paging]; Chris Grzegorczyk, Sunil Soman, Chandra Krintz, and Rich Wolski
95* http://portal.acm.org/citation.cfm?doid=1806651.1806669[The Economics of Garbage Collection]; Jeremy Singer, Richard E. Jones, Gavin Brown, and Mikel Luján
96* http://www.dcs.gla.ac.uk/%7Ejsinger/pdfs/tfp12.pdf[Automated Heap Sizing in the Poly/ML Runtime (Position Paper)]; David White, Jeremy Singer, Jonathan Aitken, and David Matthews
97* http://portal.acm.org/citation.cfm?doid=2555670.2466481[Control Theory for Principled Heap Sizing]; David R. White, Jeremy Singer, Jonathan M. Aitken, and Richard E. Jones
98--
99
100Recommended Skills: C programming experience; some operating systems and/or systems programming experience; some compiler and garbage collector experience
101
102/////
103Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
104/////
105
106=== Heap-allocated Activation Records ===
107
108Activation records (a.k.a., stack frames) are traditionally allocated
109on a stack. This naturally corresponds to the call-return pattern of
110function invocation. However, there are some disadvantages to
111stack-allocated activation records. In a functional programming
112language, functions may be deeply recursive, resulting in call stacks
113that are much larger than typically supported by the operating system;
114hence, a functional programming language implementation will typically
115store its stack in its heap. Furthermore, a functional programming
116language implementation must handle and recover from stack overflow,
117by allocating a larger stack (again, in its heap) and copying
118activation records from the old stack to the new stack. In the
119presence of threads, stacks must be allocated in a heap and, in the
120presence of a garbage collector, should be garbage collected when
121unreachable. While heap-allocated activation records avoid many of
122these disadvantages, they have not been widely implemented. This
123project aims to implement and evaluate heap-allocated activation
124records in the MLton compiler.
125
126Background:
127--
128* http://journals.cambridge.org/action/displayAbstract?aid=1295104[Empirical and Analytic Study of Stack Versus Heap Cost for Languages with Closures]; Andrew W. Appel and Zhong Shao
129* http://portal.acm.org/citation.cfm?doid=182590.156783[Space-efficient closure representations]; Zhong Shao and Andrew W. Appel
130* http://portal.acm.org/citation.cfm?doid=93548.93554[Representing control in the presence of first-class continuations]; R. Hieb, R. Kent Dybvig, and Carl Bruggeman
131--
132
133Recommended Skills: SML programming experience; some middle- and back-end compiler experience
134
135/////
136Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
137/////
138
139=== Correctly Rounded Floating-point Binary-to-Decimal and Decimal-to-Binary Conversion Routines in Standard ML ===
140
141The
142http://en.wikipedia.org/wiki/IEEE_754-2008[IEEE Standard for Floating-Point Arithmetic (IEEE 754)]
143is the de facto representation for floating-point computation.
144However, it is a _binary_ (base 2) representation of floating-point
145values, while many applications call for input and output of
146floating-point values in _decimal_ (base 10) representation. The
147_decimal-to-binary_ conversion problem takes a decimal floating-point
148representation (e.g., a string like +"0.1"+) and returns the best
149binary floating-point representation of that number. The
150_binary-to-decimal_ conversion problem takes a binary floating-point
151representation and returns a decimal floating-point representation
152using the smallest number of digits that allow the decimal
153floating-point representation to be converted to the original binary
154floating-point representation. For both conversion routines, "best"
155is dependent upon the current floating-point rounding mode.
156
157MLton uses David Gay's
158http://www.netlib.org/fp/gdtoa.tgz[gdtoa library] for floating-point
159conversions. While this is an exellent library, it generalizes the
160decimal-to-binary and binary-to-decimal conversion routines beyond
161what is required by the
162http://standardml.org/Basis/[Standard ML Basis Library] and induces an
163external dependency on the compiler. Native implementations of these
164conversion routines in Standard ML would obviate the dependency on the
165+gdtoa+ library, while also being able to take advantage of Standard
166ML features in the implementation (e.g., the published algorithms
167often require use of infinite precision arithmetic, which is provided
168by the +IntInf+ structure in Standard ML, but is provided in an ad hoc
169fasion in the +gdtoa+ library).
170
171This project aims to develop a native implementation of the conversion
172routines in Standard ML.
173
174Background:
175--
176* http://dl.acm.org/citation.cfm?doid=103162.103163[What every computer scientist should know about floating-point arithmetic]; David Goldberg
177* http://dl.acm.org/citation.cfm?doid=93542.93559[How to print floating-point numbers accurately]; Guy L. Steele, Jr. and Jon L. White
178* http://dl.acm.org/citation.cfm?doid=93542.93557[How to read floating point numbers accurately]; William D. Clinger
179* http://cm.bell-labs.com/cm/cs/doc/90/4-10.ps.gz[Correctly Rounded Binary-Decimal and Decimal-Binary Conversions]; David Gay
180* http://dl.acm.org/citation.cfm?doid=249069.231397[Printing floating-point numbers quickly and accurately]; Robert G. Burger and R. Kent Dybvig
181* http://dl.acm.org/citation.cfm?doid=1806596.1806623[Printing floating-point numbers quickly and accurately with integers]; Florian Loitsch
182--
183
184Recommended Skills: SML programming experience; algorithm design and implementation
185
186/////
187Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
188/////
189
190=== Implement Source-level Debugging ===
191
192Debugging is a fact of programming life. Unfortunately, most SML
193implementations (including MLton) provide little to no source-level
194debugging support. This project aims to add basic to intermediate
195source-level debugging support to the MLton compiler. MLton already
196supports source-level profiling, which can be used to attribute bytes
197allocated or time spent in source functions. It should be relatively
198straightforward to leverage this source-level information into basic
199source-level debugging support, with the ability to set/unset
200breakpoints and step through declarations and functions. It may be
201possible to also provide intermediate source-level debugging support,
202with the ability to inspect in-scope variables of basic types (e.g.,
203types compatible with MLton's foreign function interface).
204
205Background:
206--
207* http://mlton.org/HowProfilingWorks[MLton -- How Profiling Works]
208* http://mlton.org/ForeignFunctionInterfaceTypes[MLton -- Foreign Function Interface Types]
209* http://dwarfstd.org/[DWARF Debugging Standard]
210* http://sourceware.org/gdb/current/onlinedocs/stabs/index.html[STABS Debugging Format]
211--
212
213Recommended Skills: SML programming experience; some compiler experience
214
215/////
216Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
217/////
218
219=== Region Based Memory Management ===
220
221Region based memory management is an alternative automatic memory
222management scheme to garbage collection. Regions can be inferred by
223the compiler (e.g., Cyclone and MLKit) or provided to the programmer
224through a library. Since many students do not have extensive
225experience with compilers we plan on adopting the later approach.
226Creating a viable region based memory solution requires the removal of
227the GC and changes to the allocator. Additionally, write barriers
228will be necessary to ensure references between two ML objects is never
229established if the left hand side of the assignment has a longer
230lifetime than the right hand side. Students will need to come up with
231an appropriate interface for creating, entering, and exiting regions
232(examples include RTSJ scoped memory and SCJ scoped memory).
233
234Background:
235--
236* Cyclone
237* MLKit
238* RTSJ + SCJ scopes
239--
240
241Recommended Skills: SML programming experience; C programming experience; some compiler and garbage collector experience
242
243/////
244Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
245/////
246
247=== Adding Real-Time Capabilities ===
248
249This project focuses on exposing real-time APIs from a real-time OS
250kernel at the SML level. This will require mapping the current MLton
251(or http://multimlton.cs.purdue.edu[MultiMLton]) threading framework
252to real-time threads that the RTOS provides. This will include
253associating priorities with MLton threads and building priority based
254scheduling algorithms. Additionally, support for perdioc, aperiodic,
255and sporadic tasks should be supported. A real-time SML library will
256need to be created to provide a forward facing interface for
257programmers. Stretch goals include reworking the MLton +atomic+
258statement and associated synchronization primitives built on top of
259the MLton +atomic+ statement.
260
261Recommended Skills: SML programming experience; C programming experience; real-time experience a plus but not required
262
263/////
264Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
265/////
266
267=== Real-Time Garbage Collection ===
268
269This project focuses on modifications to the MLton GC to support
270real-time garbage collection. We will model the real-time GC on the
271Schism RTGC. The first task will be to create a fixed size runtime
272object representation. Large structures will need to be represented
273as a linked lists of fixed sized objects. Arrays and vectors will be
274transferred into dense trees. Compaction and copying can therefore be
275removed from the GC algorithms that MLton currently supports. Lastly,
276the GC will be made concurrent, allowing for the execution of the GC
277threads as the lowest priority task in the system. Stretch goals
278include a priority aware mechanism for the GC to signal to real-time
279ML threads that it needs to scan their stack and identification of
280places where the stack is shallow to bound priority inversion during
281this procedure.
282
283Recommended Skills: C programming experience; garbage collector experience a plus but not required
284
285/////
286Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
287/////
288
289/////
290=== Concurrent{nbsp}ML Improvements ===
291
292http://cml.cs.uchicago.edu/[Concurrent ML] is an SML concurrency
293library based on synchronous message passing. MLton has a partial
294implementation of the CML message-passing primitives, but its use in
295real-world applications has been stymied by the lack of completeness
296and thread-safe I/O libraries. This project would aim to flesh out
297the CML implementation in MLton to be fully compatible with the
298"official" version distributed as part of SML/NJ. Furthermore, time
299permitting, runtime system support could be added to allow use of
300modern OS features, such as asynchronous I/O, in the implementation of
301CML's system interfaces.
302
303Background
304--
305* http://cml.cs.uchicago.edu/
306* http://mlton.org/ConcurrentML
307* http://mlton.org/ConcurrentMLImplementation
308--
309
310Recommended Skills: SML programming experience; knowledge of concurrent programming; some operating systems and/or systems programming experience
311
312Mentor: http://people.cs.uchicago.edu/~jhr/[John Reppy]
313Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
314/////
315
316/////
317=== SML3d Development ===
318
319The SML3d Project is a collection of libraries to support 3D graphics
320programming using Standard ML and the http://opengl.org/[OpenGL]
321graphics API. It currently requires the MLton implementation of SML
322and is supported on Linux, Mac OS X, and Microsoft Windows. There is
323also support for http://www.khronos.org/opencl/[OpenCL]. This project
324aims to continue development of the SML3d Project.
325
326Background
327--
328* http://sml3d.cs.uchicago.edu/
329--
330
331Mentor: http://people.cs.uchicago.edu/~jhr/[John Reppy]
332/////