HowProfilingWorks
=================

Here's how <:Profiling:> works. If profiling is on, the front end
(elaborator) inserts `Enter` and `Leave` statements into the source
program for function entry and exit. For example,
[source,sml]
----
fun f n = if n = 0 then 0 else 1 + f (n - 1)
----
becomes
[source,sml]
----
fun f n =
   let
      val () = Enter "f"
      val res = (if n = 0 then 0 else 1 + f (n - 1))
                handle e => (Leave "f"; raise e)
      val () = Leave "f"
   in
      res
   end
----

Actually, `Enter` and `Leave` carry a bit more information than just the
source function name: there is also lexical nesting and file position.
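
The `handle` clause in the rewrite is what keeps `Enter` and `Leave`
balanced when an exception escapes. This can be simulated in ordinary
SML, as sketched below under some assumptions: `enter`, `leave`, and the
`trace` log are hypothetical stand-ins for the compiler-internal
statements (which also carry the extra information mentioned above),
and `g` is a hypothetical variant of `f` whose base case raises.

[source,sml]
----
(* Hypothetical stand-ins for the compiler-internal Enter/Leave
   statements: each call prepends an event to a global log. *)
datatype event = E of string | L of string

val trace : event list ref = ref []
fun enter name = trace := E name :: !trace
fun leave name = trace := L name :: !trace

(* f after an elaborator-style rewrite. *)
fun f n =
   let
      val () = enter "f"
      val res = (if n = 0 then 0 else 1 + f (n - 1))
                handle e => (leave "f"; raise e)
      val () = leave "f"
   in
      res
   end

(* Hypothetical variant whose base case raises, so the handler is what
   emits the matching leave for every active call. *)
fun g n =
   let
      val () = enter "g"
      val res = (if n = 0 then raise Fail "base" else 1 + g (n - 1))
                handle e => (leave "g"; raise e)
      val () = leave "g"
   in
      res
   end

(* Run h n from an empty log; return the result (NONE if Fail escaped)
   and the events in the order they happened. *)
fun traced (h : int -> int) n =
   let
      val () = trace := []
      val r = SOME (h n) handle Fail _ => NONE
   in
      (r, rev (!trace))
   end
----

With this sketch, `traced f 2` returns `SOME 2` with three `E "f"`
events followed by three `L "f"` events, and `traced g 1` returns `NONE`
but still records a matching `L "g"` for each `E "g"`.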

Most of the middle of the compiler ignores `Enter` and `Leave` but
preserves them. However, so that profiling preserves tail calls, the
<:Shrink:SSA shrinker> has an optimization that notices when the only
operations that make a call a nontail call are profiling operations
and, if so, moves them before the call, turning it into a tail call.
If you observe a tail call that appears to become a nontail call when
the program is compiled with profiling, please <:Bug:report a bug>.
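
At the source level, the effect of that shrinker optimization can be
pictured as below. This is only an illustration under assumptions: the
real transformation happens on SSA, and `enter`, `leave`, and the
`events` log are hypothetical stand-ins for the profiling statements. In
`loopNontail` the recursive call is followed by a `leave` and sits under
a handler, so it is not a tail call. In `loopTail` the `leave` has been
moved ahead of the call, which leaves the call in tail position.

[source,sml]
----
(* Hypothetical event log standing in for Enter/Leave statements. *)
val events : string list ref = ref []
fun enter name = events := ("Enter " ^ name) :: !events
fun leave name = events := ("Leave " ^ name) :: !events

(* As the elaborator would write it: the recursive call is followed by
   a leave and covered by a handler, so it is NOT a tail call. *)
fun loopNontail n =
   let
      val () = enter "loop"
      val res = (if n = 0 then 0 else loopNontail (n - 1))
                handle e => (leave "loop"; raise e)
      val () = leave "loop"
   in
      res
   end

(* After moving the profiling operation ahead of the call: the
   recursive call is the last thing evaluated, so it is a tail call. *)
fun loopTail n =
   (enter "loop";
    if n = 0 then (leave "loop"; 0)
    else (leave "loop"; loopTail (n - 1)))

(* Run h n from an empty log; return the result and the events in
   the order they happened. *)
fun run (h : int -> int) n =
   (events := []; let val r = h n in (r, rev (!events)) end)
----

Both versions return the same result and record balanced events:
`run loopNontail 2` nests them (three `Enter loop` events, then three
`Leave loop` events), whereas `run loopTail 2` alternates them. Only
`loopTail` makes its recursive call in tail position, so it runs in
constant stack space.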

The `checkProf` function in
<!ViewGitFile(mlton,master,mlton/ssa/type-check.fun)> checks that the
`Enter`/`Leave` statements match up.
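
To give a rough idea of what "match up" means, the sketch below checks a
single straight-line sequence of statements using a stack of open
`Enter`s. The `stmt` type and the `matched` function are hypothetical
simplifications. The real `checkProf` operates on SSA programs, whose
control flow this sketch does not model.

[source,sml]
----
(* Hypothetical, simplified statements: only the profiling markers
   matter here; everything else is Other. *)
datatype stmt = Enter of string | Leave of string | Other

(* Each Leave must close the most recently opened, still-unmatched
   Enter for the same function, and nothing may remain open at the
   end of the sequence. *)
fun matched (stmts : stmt list) : bool =
   let
      fun go ([], stack) = null stack
        | go (Enter f :: rest, stack) = go (rest, f :: stack)
        | go (Leave f :: rest, g :: stack) = f = g andalso go (rest, stack)
        | go (Leave _ :: _, []) = false
        | go (Other :: rest, stack) = go (rest, stack)
   in
      go (stmts, [])
   end
----

For example, `[Enter "f", Other, Enter "g", Leave "g", Leave "f"]` is
accepted, while `[Enter "f", Leave "g"]` and `[Enter "f"]` are rejected.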

In the backend, just before translating to the <:Machine: Machine IL>,
the profiler uses the `Enter`/`Leave` statements to infer the "local"
portion of the control stack at each program point. The profiler then
removes the ++Enter++s/++Leave++s and inserts different information
depending on which kind of profiling is happening. For time profiling
with the <:AMD64Codegen:> and <:X86Codegen:>, the profiler inserts
labels that cover the code (i.e., each statement has a unique label in
its basic block that prefixes it) and associates each label with the
local control stack. For time profiling with the <:CCodegen:> and
<:LLVMCodegen:>, the profiler instead inserts code that sets a global
field recording the local control stack. For allocation profiling, the
profiler inserts calls to a C function that maintains byte counts.
With stack profiling, the profiler also inserts a call to a C function
at each nontail call in order to maintain information at run time
about which SML functions are on the stack.
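
The label-based scheme for time profiling with the native codegens can
be approximated as follows: walk a block while maintaining the stack of
open `Enter`s, drop the profiling statements, and pair each remaining
statement with a fresh label and the local stack in effect there. The
`stmt` type, the `"L0"`, `"L1"`, ... label names, and `labelBlock` are
all hypothetical, and this sketch handles only a single block that
starts with an empty local stack.

[source,sml]
----
(* Hypothetical, simplified statements: the profiling markers plus
   ordinary statements carrying some text. *)
datatype stmt = Enter of string | Leave of string | Other of string

(* Drop Enter/Leave and pair every remaining statement with a fresh,
   hypothetical label ("L0", "L1", ...) and the local stack in effect
   there, innermost function first. Leave names are assumed to have
   been checked already, so only the stack depth matters here. *)
fun labelBlock (stmts : stmt list) : (string * string * string list) list =
   let
      fun go ([], _, _, acc) = rev acc
        | go (Enter f :: rest, stack, n, acc) = go (rest, f :: stack, n, acc)
        | go (Leave _ :: rest, _ :: stack, n, acc) = go (rest, stack, n, acc)
        | go (Leave _ :: _, [], _, _) = raise Fail "unmatched Leave"
        | go (Other s :: rest, stack, n, acc) =
             go (rest, stack, n + 1, ("L" ^ Int.toString n, s, stack) :: acc)
   in
      go (stmts, [], 0, [])
   end
----

For a block `Enter "f"; x := 1; Enter "g"; y := 2; Leave "g"; z := 3;
Leave "f"`, this yields `L0` with stack `["f"]`, `L1` with `["g", "f"]`,
and `L2` with `["f"]`.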

At run time, the profiler associates counters (either clock ticks or
byte counts) with source functions. When the program finishes, the
profiler writes the counts out to the `mlmon.out` file. Then
`mlprof` uses source information stored in the executable to
associate the counts in the `mlmon.out` file with source functions.
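
The final association step, from counters to source functions, amounts
to indexing into a table of names. In this sketch, `names` stands in for
the source information stored in the executable and `counts` for the
counters read back from `mlmon.out`. The shared indexing, the `report`
function, and the integer percentages are assumptions made for
illustration, not the actual `mlmon.out` layout or `mlprof` output.

[source,sml]
----
(* names: hypothetical stand-in for the source information in the
   executable; counts: hypothetical per-function counters read back
   from mlmon.out. Both are assumed to be indexed the same way. *)
fun report (names : string vector, counts : int vector)
    : (string * int * int) list =
   let
      val total = Vector.foldl (op +) 0 counts
      (* Integer percentage of the total, guarding against no data. *)
      fun pct c = if total = 0 then 0 else c * 100 div total
      fun entry i =
         let val c = Vector.sub (counts, i)
         in (Vector.sub (names, i), c, pct c) end
   in
      (* Keep only functions that were charged something. *)
      List.filter (fn (_, c, _) => c > 0)
                  (List.tabulate (Vector.length counts, entry))
   end
----

For instance, names `main`, `f`, `g` with counts 10, 30, 0 give
`[("main", 10, 25), ("f", 30, 75)]`.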

For time profiling, the profiler catches the `SIGPROF` signal 100
times per second and increments the appropriate counter, which it
determines by looking at the label prefixing the current program
counter and mapping that label to the current source function.
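
The lookup performed on each tick can be modeled as a binary search for
the greatest label address that does not exceed the program counter.
This is an SML model only: the real handler runs in the C runtime on
machine addresses, and the `labels` table of (address, function index)
pairs, `findLabel`, and `tick` are hypothetical.

[source,sml]
----
(* labels: hypothetical (code address, source-function index) pairs,
   sorted by address. Returns the function of the greatest label
   address <= pc, or NONE if pc precedes every label. *)
fun findLabel (labels : (int * int) vector, pc : int) : int option =
   let
      (* best: function of the last label seen with address <= pc;
         labels in [lo, hi) have not been examined yet. *)
      fun search (lo, hi, best) =
         if lo >= hi then best
         else
            let
               val mid = lo + (hi - lo) div 2
               val (addr, func) = Vector.sub (labels, mid)
            in
               if addr <= pc then search (mid + 1, hi, SOME func)
               else search (lo, mid, best)
            end
   in
      search (0, Vector.length labels, NONE)
   end

(* One simulated SIGPROF tick: charge it to the function owning pc. *)
fun tick (labels, counts : int array, pc) =
   case findLabel (labels, pc) of
      SOME func => Array.update (counts, func, Array.sub (counts, func) + 1)
    | NONE => ()
----

With labels at addresses 100, 140, 200, and 260 belonging to functions
0, 1, 0, and 2, ticks at 105, 150, 199, 210, 300, and 50 charge two ticks
each to functions 0 and 1, one to function 2, and ignore the tick at 50.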

== Caveats ==

There may be a few missed clock ticks or bytes allocated at the very
end of the program, after the data is written.

Profiling has not been tested with signals or threads. In particular,
stack profiling may behave strangely.
73 | ||
74 | Profiling has not been tested with signals or threads. In particular, | |
75 | stack profiling may behave strangely. |