HowProfilingWorks
=================

Here's how <:Profiling:> works.  If profiling is on, the front end
(elaborator) inserts `Enter` and `Leave` statements into the source
program for function entry and exit.  For example,
[source,sml]
----
fun f n = if n = 0 then 0 else 1 + f (n - 1)
----
becomes
[source,sml]
----
fun f n =
   let
      val () = Enter "f"
      val res = (if n = 0 then 0 else 1 + f (n - 1))
                handle e => (Leave "f"; raise e)
      val () = Leave "f"
   in
      res
   end
----
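
The transformed function above is schematic, since `Enter` and `Leave` are compiler-internal statements rather than SML functions. As a rough, runnable sketch of the same shape, the following defines hypothetical `Enter` and `Leave` functions that maintain a stack of names; the `stack` and `trace` references and the `Negative` exception are assumptions made for illustration only.

[source,sml]
----
(* Hypothetical stand-ins for the compiler-inserted profiling statements. *)
val stack : string list ref = ref []
val trace : string list ref = ref []   (* events, most recent first *)

fun Enter name =
   (stack := name :: !stack
    ; trace := ("Enter " ^ name) :: !trace)

fun Leave name =
   case !stack of
      top :: rest =>
         if top = name
            then (stack := rest; trace := ("Leave " ^ name) :: !trace)
         else raise Fail ("mismatched Leave " ^ name)
    | [] => raise Fail ("Leave with empty stack: " ^ name)

exception Negative

(* Instrumented in the same shape as the example above, with an extra
   branch that raises so the handler path can be exercised. *)
fun f n =
   let
      val () = Enter "f"
      val res = (if n < 0 then raise Negative
                 else if n = 0 then 0
                 else 1 + f (n - 1))
                handle e => (Leave "f"; raise e)
      val () = Leave "f"
   in
      res
   end

val three = f 3
val depthAfterNormal = length (!stack)
val recovered = (ignore (f ~1); false) handle Negative => true
val depthAfterRaise = length (!stack)
----

In this sketch the stack is empty again after both the normal return and the raised exception, which is the purpose of the `handle e => (Leave "f"; raise e)` wrapper.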

In fact, each `Enter` and `Leave` carries more than the source function
name: it also records lexical nesting and file position.

Most of the middle of the compiler ignores, but preserves, `Enter` and
`Leave`.  However, so that profiling preserves tail calls, the
<:Shrink:SSA shrinker> has an optimization that notices when the only
operations that cause a call to be a nontail call are profiling
operations, and if so, moves them before the call, turning it into a
tail call.  If you observe a program in which a tail call appears to
be turned into a nontail call when compiled with profiling, please
<:Bug:report a bug>.
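
To illustrate why moving the profiling operation matters, here is a source-level sketch. It is not the shrinker itself, which works on SSA; the `enter` and `leave` functions and the depth counters are hypothetical, and the exception handler is omitted for brevity. With `leave` placed after the recursive call, the call is no longer in tail position and the profiling depth grows with the recursion. With `leave` moved before the call, the call is a tail call again and the depth stays constant.

[source,sml]
----
(* Hypothetical profiling stack; only its depth matters here. *)
val depth = ref 0
val maxDepth = ref 0

fun enter () =
   (depth := !depth + 1
    ; if !depth > !maxDepth then maxDepth := !depth else ())

fun leave () = depth := !depth - 1

(* Naive placement: leave runs after the recursive call returns,
   so the recursive call is not a tail call. *)
fun sumNaive (n, acc) =
   let
      val () = enter ()
      val res = if n = 0 then acc else sumNaive (n - 1, acc + n)
      val () = leave ()
   in
      res
   end

(* leave moved before the call: the recursive call is in tail
   position again. *)
fun sumMoved (n, acc) =
   (enter ()
    ; if n = 0
         then (leave (); acc)
      else (leave (); sumMoved (n - 1, acc + n)))

val naiveResult = sumNaive (1000, 0)
val naiveMax = !maxDepth
val () = (depth := 0; maxDepth := 0)
val movedResult = sumMoved (1000, 0)
val movedMax = !maxDepth
----

Both versions compute the same sum, but `naiveMax` is 1001 while `movedMax` is 1.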

The `checkProf` function in
<!ViewGitFile(mlton,master,mlton/ssa/type-check.fun)> checks that
the `Enter`/`Leave` statements match up.
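
As a loose illustration of what "match up" means, the sketch below checks a straight-line sequence of statements. It is not the real `checkProf`, which operates on SSA control flow; the `stmt` datatype and the `balanced` function are hypothetical names introduced only for this example.

[source,sml]
----
(* Hypothetical, simplified statement type. *)
datatype stmt = Enter of string | Leave of string | Other

(* Returns true when every Leave matches the most recent open Enter
   and nothing is left open at the end of the sequence. *)
fun balanced stmts =
   let
      fun go ([], stack) = null stack
        | go (Enter f :: rest, stack) = go (rest, f :: stack)
        | go (Leave f :: rest, top :: stack) =
             f = top andalso go (rest, stack)
        | go (Leave _ :: _, []) = false
        | go (Other :: rest, stack) = go (rest, stack)
   in
      go (stmts, [])
   end

val ok = balanced [Enter "f", Other, Enter "g", Leave "g", Leave "f"]
val wrongName = balanced [Enter "f", Leave "g"]
val unclosed = balanced [Enter "f", Other]
----

Here `ok` is `true`, while `wrongName` and `unclosed` are both `false`.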

In the backend, just before translating to the <:Machine: Machine IL>,
the profiler uses the `Enter`/`Leave` statements to infer the "local"
portion of the control stack at each program point.  The profiler then
removes the ++Enter++s/++Leave++s and inserts different information
depending on which kind of profiling is happening.  For time profiling
(with the <:AMD64Codegen:> and <:X86Codegen:>), the profiler inserts
labels that cover the code (i.e. each statement has a unique label in
its basic block that prefixes it) and associates each label with the
local control stack.  For time profiling (with the <:CCodegen:> and
<:LLVMCodegen:>), the profiler inserts code that sets a global field
that records the local control stack.  For allocation profiling, the
profiler inserts calls to a C function that will maintain byte counts.
With stack profiling, the profiler also inserts a call to a C function
at each nontail call in order to maintain information at runtime about
what SML functions are on the stack.
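
The inference of the local control stack can be pictured with a small sketch over a straight-line statement list. This is only a model under simplifying assumptions (no branches, a hypothetical `stmt` type, and a hypothetical `annotate` function); the actual backend pass works on the compiler's own intermediate language.

[source,sml]
----
(* Hypothetical, simplified statement type; Work stands for any
   non-profiling statement. *)
datatype stmt = Enter of string | Leave of string | Work of string

fun pop [] = []
  | pop (_ :: rest) = rest

(* Drop Enter/Leave and pair each remaining statement with the local
   control stack (innermost first) in effect when it runs. *)
fun annotate stmts =
   let
      fun go ([], _, acc) = rev acc
        | go (Enter f :: rest, stack, acc) = go (rest, f :: stack, acc)
        | go (Leave _ :: rest, stack, acc) = go (rest, pop stack, acc)
        | go (Work w :: rest, stack, acc) =
             go (rest, stack, (w, stack) :: acc)
   in
      go (stmts, [], [])
   end

val annotated =
   annotate [Enter "main", Work "a", Enter "f", Work "b", Leave "f",
             Work "c", Leave "main"]
----

The result pairs `"a"` with `["main"]`, `"b"` with `["f", "main"]`, and `"c"` with `["main"]`, which is the kind of per-statement information a label or a global field would then carry.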

At run time, the profiler associates counters (either clock ticks or
byte counts) with source functions.  When the program finishes, the
profiler writes the counts out to the `mlmon.out` file.  Then,
`mlprof` uses source information stored in the executable to
associate the counts in the `mlmon.out` file with source
functions.

For time profiling, the profiler catches the `SIGPROF` signal 100
times per second and increments the appropriate counter, determined by
looking at the label prefixing the current program counter and mapping
that to the current source function.
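
The counting step can be simulated in a few lines. In this sketch the label names, the `labelToFunction` table, and the `samples` list are all made up; each element of `samples` stands for the label found at one `SIGPROF` delivery, and no real signal handling is involved.

[source,sml]
----
(* Hypothetical label table: each code label maps to a source function. *)
val labelToFunction =
   [("L_0", "main"), ("L_1", "f"), ("L_2", "f"), ("L_3", "g")]

fun functionOf label =
   case List.find (fn (l, _) => l = label) labelToFunction of
      SOME (_, f) => f
    | NONE => "<unknown>"

(* Add one tick to the counter of the function that owns the label. *)
fun tick (label, counts) =
   let
      val f = functionOf label
      fun bump [] = [(f, 1)]
        | bump ((g, n) :: rest) =
             if g = f then (g, n + 1) :: rest else (g, n) :: bump rest
   in
      bump counts
   end

(* One label per simulated SIGPROF delivery. *)
val samples = ["L_1", "L_2", "L_0", "L_3", "L_1"]
val counts = foldl tick [] samples
----

With these samples, `f` receives 3 ticks and `main` and `g` receive 1 each. At 100 ticks per second, each tick corresponds to roughly 10 milliseconds of run time.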

== Caveats ==

A few clock ticks or allocated bytes may be missed at the very end of
the program, after the data is written.

Profiling has not been tested with signals or threads.  In particular,
stack profiling may behave strangely.