Commit | Line | Data |
---|---|---|
1 | Copyright (c) 2001, 2002, Lucent Technologies, Bell Laboratories |
2 | ||
3 | author: Matthias Blume (blume@research.bell-labs.com) | |
4 | ||
5 | This directory contains ML-NLFFI-Gen, a glue-code generator for | |
6 | the new "NLFFI" foreign function interface. The generator reads | |
7 | C source code and emits ML code along with a description file for CM. | |
8 | ||
9 | Compiling this generator requires the C-Kit ($/ckit-lib.cm) to be | |
10 | installed. | |
11 | ||
12 | --------------------------------------------------------------------- | |
13 | ||
14 | February 21, 2002: Major changes: | |
15 | ||
16 | I reworked the glue code generator in a way that lets generated code | |
17 | scale better -- at the expense of some (mostly academic) generality. | |
18 | ||
19 | Changes involve the following: | |
20 | ||
21 | 1. The functorization is gone. | |
22 | ||
23 | 2. Every top-level C declaration results in a separate top-level | |
24 | ML equivalent (implemented by its own ML source file). | |
25 | ||
26 | 3. Incomplete pointer types are treated just like their complete | |
27 | versions -- the only difference being that no RTTI will be | |
28 | available for them. In the "light" interface, this rules out | |
29 | precisely those operations over them that C would disallow. | |
30 | ||
31 | 4. All related C sources must be supplied to ml-nlffigen together. | |
32 | Types incomplete in one source but complete in another get | |
33 | automatically completed in a cross-file fashion. | |
34 | ||
35 | 5. The handle for the shared library to link to is now abstracted as | |
36 | a function closure. Moreover, it must be supplied as a top-level | |
37 | variable (by the programmer). For this purpose, ml-nlffigen has | |
38 | corresponding command-line options. | |
39 | ||
40 | These changes mean that even very large (in number of exported definitions) | |
41 | libraries such as GTK can now be handled gracefully without |
42 | reaching the limits of the ML compiler's abilities. | |
43 | ||
44 | [The example of GTK -- for which ml-nlffigen creates several thousands (!) | |
45 | of separate ML source files -- puts an unusual burden on CM, though. |
46 | However, aside from running a bit longer than usual, CM handles loads |
47 | of this magnitude just fine. Stabilizing the resulting library solves |
48 | the problem entirely as far as later clients are concerned.] | |
49 | ||
50 | ||
51 | Sketch of translation- (and naming-) scheme: | |
52 | ||
53 | struct foo { ... } | |
54 | --> structure ST_foo in st-foo.sml (not exported) | |
55 | basic type info (name, size) | |
56 | & structure S_foo in s-foo.sml | |
57 | abstract interface to the type | |
58 | field accessors f_xxx (unless -light) | |
59 | and f_xxx' (unless -heavy) | |
60 | field types t_f_xxx | |
61 | field RTTI typ_f_xxx | |
62 | ||
63 | & (unless "-nosucvt" was set) | |
64 | structures IS_foo in <a>/is-foo.sml | |
65 | (see discussion of struct *foo below) | |
66 | ||
67 | union foo { ... } | |
68 | --> structure UT_foo in ut-foo.sml (not exported) | |
69 | basic type info (name, size) | |
70 | & structure U_foo in u-foo.sml | |
71 | abstract interface to the type | |
72 | field accessors f_xxx (unless -light) | |
73 | and f_xxx' (unless -heavy) | |
74 | field types t_f_xxx | |
75 | field RTTI typ_f_xxx | |
76 | ||
77 | & (unless "-nosucvt" was set) | |
78 | structures IU_foo in <a>/iu-foo.sml | |
79 | (see discussion of union *foo below) | |
80 | ||
81 | struct { ... } | |
82 | like struct <n> { ... }, where <n> is a fresh integer or 'bar | |
83 | if 'struct { ... }' occurs in the context of a | |
84 | 'typedef struct { ... } bar' | |
85 | ||
86 | union { ... } | |
87 | like union <n> { ... }, where <n> is a fresh integer or 'bar | |
88 | if 'union { ... }' occurs in the context of a | |
89 | 'typedef union { ... } bar' | |
90 | ||
91 | ||
92 | enum foo { ... } | |
93 | --> structure E_foo in e-foo.sml | |
94 | external type mlrep with | |
95 | enum constants e_xxx | |
96 | conversion functions between tag enum and mlrep | |
97 | between mlrep and sint | |
98 | access functions (get/set) that operate on mlrep | |
99 | (as an alternative to C.Get.enum/C.Set.enum which | |
100 | operate on sint) | |
101 | ||
102 | If the command-line option "-ec" ("-enum-constructors") was set |
103 | and the values of all enum constants are different from each | |
104 | other, then mlrep will be a datatype (thus making it possible | |
105 | to pattern-match). | |
106 | ||
107 | enum { ... } | |
108 | If this construct appears in the context of a surrounding | |
109 | (non-anonymous) struct or union or typedef, the enumeration gets | |
110 | assigned an artificial tag (just like similar structs and unions, | |
111 | see above). | |
112 | ||
113 | Unless the command-line option "-nocollect" was specified, |
114 | all constants in other (truly) unnamed enumerations will be |
115 | collected into a single enumeration represented by structure E_'. |
116 | This single enumeration is then treated like a regular enumeration |
117 | (including handling of "-ec" -- see above). |
118 | ||
119 | With "-nocollect", each such enumeration is instead assigned a fresh |
120 | integer tag (again, just like in the struct/union case). |
121 | ||
122 | T foo (T, ..., T) (global function/function prototype) | |
123 | --> structure F_foo in f-foo.sml | |
124 | containing three/four members: | |
125 | typ : RTTI | |
126 | fptr: thunkified fptr representing the C function | |
127 | maybe f' : light-weight function wrapper around fptr | |
128 | Turned off by -heavy (see below). | |
129 | maybe f : heavy-weight function wrapper around fptr | |
130 | Turned off by -light (see below). | |
131 | ||
132 | T foo; (global variable) | |
133 | --> structure G_foo in g-foo.sml | |
134 | containing three members: | |
135 | t : type | |
136 | typ : RTTI | |
137 | obj : thunkified object representing the C variable | |
138 | ||
139 | struct foo * (without existing definition of struct foo; incomplete type) | |
140 | --> an internal structure ST_foo with a type "tag" (just like in | |
141 | the struct foo { ... } case) | |
142 | The difference is that no structure S_foo will be generated, | |
143 | so there is no field-access interface and no RTTI (size or typ) | |
144 | for this. All "light-weight" functions referring to this | |
145 | pointer type will be generated; heavy-weight functions will |
146 | be generated only if they do not require access to RTTI. | |
147 | ||
148 | If "-heavy" was specified but a heavy interface function | |
149 | cannot be generated because of incomplete types, then its | |
150 | light counterpart will be generated anyway. |
151 | ||
152 | union foo * Same as with struct foo *, but replace S_foo with U_foo | |
153 | and ST_foo with UT_foo. | |
154 | ||
155 | Additional files for implementing function entry sequences are created | |
156 | and used internally. They do not contribute exports, though. | |
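The exported part of the naming scheme above can be summarized in a few lines of shell. This is only a sketch: nlffi_names is a hypothetical helper (it is not part of ml-nlffigen), it assumes that no -prefix option is in effect, and it covers only the exported structures, not the internal ST_/UT_ ones or the IS_/IU_ ones.

```shell
# Hypothetical helper: print the exported ML structure name and the
# generated file name for a C declaration, following the scheme above.
# Assumes no -prefix was given.
nlffi_names() {
    kind=$1 name=$2
    case $kind in
        struct)   p=S ;;   # struct foo { ... }  -> S_foo in s-foo.sml
        union)    p=U ;;   # union foo { ... }   -> U_foo in u-foo.sml
        enum)     p=E ;;   # enum foo { ... }    -> E_foo in e-foo.sml
        function) p=F ;;   # T foo (T, ..., T)   -> F_foo in f-foo.sml
        variable) p=G ;;   # T foo;              -> G_foo in g-foo.sml
        *) echo "unknown kind: $kind" >&2; return 1 ;;
    esac
    lower=$(printf '%s' "$p" | tr 'A-Z' 'a-z')
    printf '%s_%s %s-%s.sml\n' "$p" "$name" "$lower" "$name"
}

nlffi_names struct foo      # prints: S_foo s-foo.sml
nlffi_names function bar    # prints: F_bar f-bar.sml
```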
157 | ||
158 | ||
159 | Command-line options for ml-nlffigen: | |
160 | ||
161 | General syntax: ml-nlffigen <option> ... [--] <C-file> ... | |
162 | ||
163 | Environment variables: | |
164 | ||
165 | Ml-nlffigen looks at the environment variable FFIGEN_CPP to obtain | |
166 | the template string for the cpp command line. If FFIGEN_CPP is not | |
167 | set, the template defaults to "gcc -E -U__GNUC__ %o %s > %t". | |
168 | The actual command line is obtained by substituting occurrences of |
169 | %o with the list of cpp options, %s with the name of the source, |
170 | and %t with the name of a temporary file holding the pre-processed code. |
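To see what a given template turns into, the substitution can be imitated in plain shell. This is a rough sketch, not what ml-nlffigen runs internally: expand_cpp_template is a hypothetical helper, the option string, source name and temporary file name are made up for illustration, and the sed-based replacement assumes that none of the substituted strings contain '|', '&' or '\'.

```shell
# Hypothetical helper: expand a cpp command-line template as described
# above (%o -> cpp options, %s -> source file, %t -> temporary file).
# Assumes the substituted strings contain no '|', '&' or '\' characters.
expand_cpp_template() {
    tmpl=$1 opts=$2 src=$3 tmp=$4
    printf '%s\n' "$tmpl" |
        sed -e "s|%o|$opts|g" -e "s|%s|$src|g" -e "s|%t|$tmp|g"
}

# The default template quoted in the text.
default_tmpl='gcc -E -U__GNUC__ %o %s > %t'

# Print the command line that would result for foo.h.
expand_cpp_template "$default_tmpl" "-DDEBUG -I/opt/foo/include" foo.h /tmp/foo.i
```

The same helper can be pointed at a custom template (for example, the value of $FFIGEN_CPP) to check that all three placeholders are present before running the generator.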
171 | ||
172 | Options: | |
173 | ||
174 | -dir <dir> output directory where all generated files are placed | |
175 | -d <dir> default: "NLFFI-Generated" | |
176 | ||
177 | -allSU instructs ml-nlffigen to include all structs and unions, | |
178 | even those that are defined in included files (as opposed | |
179 | to files explicitly listed as arguments) | |
180 | default: off | |
181 | ||
182 | -width <w> sets output line width (just a guess) to <w> | |
183 | -w <w> default: 75 | |
184 | ||
185 | -smloption <x> instructs ml-nlffigen to include <x> into the list | |
186 | of options to annotate .sml entries in the generated .cm | |
187 | file with. By default, the list consists just of "noguid". | |
188 | -guid Removes the default "noguid" from the list of sml options. | |
189 | (This re-enables strict handling of type- and object-identity | |
190 | but can have negative impact on CM cutoff recompilation | |
191 | performance if the programmer routinely removes the entire | |
192 | tree of ml-nlffigen-generated files during development.) | |
193 | ||
194 | (* | |
195 | -lambdasplit <x> instructs ml-nlffigen to generate "lambdasplit" | |
196 | -ls <x> options for all ML files (see CM manual for what this means; | |
197 | it does not currently work anyway because cross-module | |
198 | inlining is broken). | |
199 | default: nothing | |
200 | *) | |
201 | ||
202 | -target <t> Sets the target to <t> (which must be one of "sparc-unix", | |
203 | -t <t> "x86-unix", or "x86-win32"). | |
204 | default: current architecture | |
205 | ||
206 | -light suppress "heavy" versions of function wrappers and | |
207 | -l field accessors; also resets any earlier -heavy to default | |
208 | default: not suppressed | |
209 | ||
210 | -heavy suppress "light" versions of function wrappers and | |
211 | -h field accessors; also resets any earlier -light to default | |
212 | default: not suppressed | |
213 | ||
214 | -namedargs instruct ml-nlffigen to generate function wrappers that |
215 | -na use named arguments (ML records) instead of tuples if | |
216 | there is enough information for this in the C source; | |
217 | (this is not always very useful) | |
218 | default: off | |
219 | ||
220 | -nocollect Do not do the following: | |
221 | Collect enum constants from truly unnamed enumerations | |
222 | (those without tags that occur at toplevel or in an | |
223 | unnamed context, i.e., not in a typedef or another | |
224 | named struct or union) into a single artificial | |
225 | enumeration tagged by ' (single apostrophe). The corresponding |
226 | ML-side representative will be a structure named E_'. | |
227 | ||
228 | -enum-constructors | |
229 | -ec When possible (i.e., if all values of a given enumeration | |
230 | are different from each other), make the ML representation | |
231 | type of the enumeration a datatype. The default (and | |
232 | fallback) is to make that type the same as MLRep.Signed.int. | |
233 | ||
234 | -libhandle <h> Use the variable <h> to refer to the handle to the | |
235 | -lh <h> shared library object. Given the constraints of CM, <h> | |
236 | must have the form of a long ML identifier, e.g., | |
237 | MyLibrary.libhandle. | |
238 | default: Library.libh | |
239 | ||
240 | -include <f> Mention file <f> in the generated .cm file. This option | |
241 | -add <f> is necessary at least once for providing the library handle. | |
242 | It can be used arbitrarily many times, resulting in more | |
243 | than one such programmer-supplied file to be mentioned. | |
244 | If <f> is relative, then it must be relative to the directory | |
245 | specified in the -dir <dir> option. | |
246 | ||
247 | -cmfile <f> Specify name of the generated .cm file, relative to | |
248 | -cm <f> the directory specified by the -dir <dir> option. | |
249 | default: nlffi-generated.cm | |
250 | ||
251 | -cppopt <o> The string <o> gets added to the list of options to be | |
252 | passed to cpp (the C preprocessor). The list of options | |
253 | gets substituted for %o in the cpp command line template. | |
254 | ||
255 | -U<x> The string -U<x> gets added to the list of cpp options. | |
256 | ||
257 | -D<x> The string -D<x> gets added to the list of cpp options. | |
258 | ||
259 | -I<x> The string -I<x> gets added to the list of cpp options. | |
260 | ||
261 | -version Just write the version number of ml-nlffigen to standard | |
262 | output and then quit. | |
263 | ||
264 | -match <r> Normally ml-nlffigen will include ML definitions for a C | |
265 | -m <r> declaration if the C declaration textually appears in | |
266 | one of the files specified at the command line. Definitions | |
267 | in #include-d files will normally not appear (unless | |
268 | their absence would lead to inconsistencies). | |
269 | By specifying -match <r>, ml-nlffigen will also include | |
270 | definitions that occur in recursively #include-d files | |
271 | for which the AWK-style regular expression <r> matches | |
272 | their names. | |
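Before committing to a -match expression, it can help to preview which header names it would select. The sketch below uses awk itself for the matching. preview_match is a hypothetical helper and the file names are examples only; it assumes the expression may match anywhere in the name (whether ml-nlffigen anchors the match is not stated here), and awk's dialect may differ in detail from the one ml-nlffigen implements.

```shell
# Hypothetical helper: read file names from standard input (one per line)
# and print those that the AWK-style regular expression $1 matches
# anywhere in the name.
preview_match() {
    awk -v re="$1" '$0 ~ re'
}

# Example: which of these headers would 'g(tk|lib)' pick up?
printf '%s\n' /usr/include/gtk/gtk.h /usr/include/glib.h /usr/include/stdio.h |
    preview_match 'g(tk|lib)'
```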
273 | ||
274 | -prefix <p> Generated ML structure names will all have prefix <p> | |
275 | -p <p> (in addition to the usual "S_" or "U_" or "F_" ...) | |
276 | ||
277 | -gensym <g> Names "gensym-ed" by ml-nlffigen (for anonymous struct/union/ | |
278 | -g <g> enums) will get an additional suffix _<g>. (This should | |
279 | be used if output from several independent runs of |
280 | ml-nlffigen are to coexist in the same ML program.) | |
281 | ||
282 | -- Terminate processing of options; remaining arguments are |
283 | taken to be C sources. | |
284 | ||
285 | ---------------------------------------------------------------------- | |
286 | ||
287 | Sample usage: | |
288 | ||
289 | Suppose we have a C interface defined in foo.h. | |
290 | ||
291 | 1. Running ml-nlffigen: | |
292 | ||
293 | It is best to let a tool such as Unix' "make" handle the invocation of | |
294 | ml-nlffigen. The following "Makefile" can be used as a template for | |
295 | other projects: | |
296 | ||
297 | +---------------------------------------------------------- | |
298 | |FILES = foo.h | |
299 | |H = FooH.libh | |
300 | |D = FFI | |
301 | |HF = ../foo-h.sml | |
302 | |CF = foo.cm | |
303 | | | |
304 | |$(D)/$(CF): $(FILES) | |
305 | | ml-nlffigen -include $(HF) -libhandle $(H) -dir $(D) -cmfile $(CF) $^ | |
306 | +---------------------------------------------------------- | |
307 | ||
308 | Suppose the above file is stored as "foo.make". Running | |
309 | ||
310 | $ make -f foo.make | |
311 | ||
312 | will generate a subdirectory "FFI" full of ML files corresponding to | |
313 | the definitions in foo.h. Access to the generated ML code is gained | |
314 | by referring to the CM library FFI/foo.cm; the .cm-file (foo.cm) is |
315 | also produced by ml-nlffigen. | |
316 | ||
317 | 2. The ML code uses the library handle specified in the command line | |
318 | (here: FooH.libh) for dynamic linking. The type of FooH.libh must | |
319 | be: | |
320 | ||
321 | FooH.libh : string -> unit -> CMemory.addr | |
322 | ||
323 | That is, FooH.libh takes the name of a symbol and produces that | |
324 | symbol's suspended address. | |
325 | ||
326 | The code that implements FooH.libh must be provided by the programmer. | |
327 | In the above example, we assume that it is stored in file foo-h.sml. | |
328 | The name of that file must appear in the generated .cm-file, hence the | |
329 | "-include" command-line argument. | |
330 | ||
331 | Notice that the name provided to ml-nlffigen must be relative to the | |
332 | output directory. Therefore, in our case it is "../foo-h.sml" and not | |
333 | just foo-h.sml (because the full path would be FFI/../foo-h.sml). | |
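Getting this relative path wrong is easy, so a small pre-flight check can be run before invoking the generator. The following is a sketch under stated assumptions: include_resolves is a hypothetical helper (not part of ml-nlffigen), and treating absolute names as used unchanged is an assumption inferred from the wording of the -include option.

```shell
# Hypothetical pre-flight check: succeed only if the file named by
# -include exists when resolved relative to the -dir output directory.
include_resolves() {
    dir=$1 inc=$2
    case $inc in
        /*) path=$inc ;;        # absolute name: assumed to be used as-is
        *)  path=$dir/$inc ;;   # relative name: resolved against -dir
    esac
    [ -f "$path" ]
}

# Demonstration in a scratch directory laid out like the example above.
scratch=$(mktemp -d)
mkdir "$scratch/FFI"
: > "$scratch/foo-h.sml"

( cd "$scratch" &&
  include_resolves FFI ../foo-h.sml && echo "../foo-h.sml: ok" )
( cd "$scratch" &&
  include_resolves FFI foo-h.sml || echo "foo-h.sml: not found under FFI/" )
rm -rf "$scratch"
```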
334 | ||
335 | 3. To actually implement FooH.libh, use the "DynLinkage" module. | |
336 | Suppose the shared library's name is "/usr/lib/foo.so". Here is | |
337 | the corresponding contents of foo-h.sml: | |
338 | ||
339 | +------------------------------------------------------------- | |
340 | |structure FooH = struct | |
341 | | local | |
342 | | val lh = DynLinkage.open_lib | |
343 | | { name = "/usr/lib/foo.so", global = true, lazy = true } | |
344 | | in | |
345 | | fun libh s = let | |
346 | | val sh = DynLinkage.lib_symbol (lh, s) | |
347 | | in | |
348 | | fn () => DynLinkage.addr sh | |
349 | | end | |
350 | | end | |
351 | |end | |
352 | +------------------------------------------------------------- | |
353 | ||
354 | If all the symbols you are linking to are already available within | |
355 | the ML runtime system, then you don't need to open a new shared | |
356 | object. As a result, your FooH implementation would look like this: | |
357 | ||
358 | +------------------------------------------------------------- | |
359 | |structure FooH = struct | |
360 | | fun libh s = let | |
361 | | val sh = DynLinkage.lib_symbol (DynLinkage.main_lib, s) | |
362 | | in | |
363 | | fn () => DynLinkage.addr sh | |
364 | | end | |
365 | |end | |
366 | +------------------------------------------------------------- | |
367 | ||
368 | If the symbols you are accessing are strewn across several separate |
369 | shared objects, then there are two possible solutions: | |
370 | ||
371 | a) Open several shared libraries and perform a trial-and-error search | |
372 | for every symbol you are looking up. (The DynLinkage module raises | |
373 | an exception (DynLinkError of string) if the lookup fails. This | |
374 | could be used to daisy-chain lookup operations.) | |
375 | ||
376 | [Be careful: Sometimes there are non-obvious inter-dependencies | |
377 | between shared libraries. Consider using DynLinkage.open_lib' | |
378 | to express those.] | |
379 | ||
380 | b) A simpler and more robust way of accessing several shared libraries | |
381 | is to create a new "summary" library object at the OS level. | |
382 | Suppose you are trying to access /usr/lib/foo.so and /usr/lib/bar.so. |
383 | The solution is to make a "foobar.so" object by saying: | |
384 | ||
385 | $ ld -shared -o foobar.so /usr/lib/foo.so /usr/lib/bar.so | |
386 | ||
387 | The ML code then refers to foobar.so and the Linux dynamic loader |
388 | does the rest. | |
389 | ||
390 | 4. To put it all together, let's wrap it up in a .cm-file. For example, | |
391 | if we simply want to directly make the ml-nlffigen-generated definitions | |
392 | available to the "end user", we could write this wrapper .cm-file | |
393 | (let's call it foo.cm): | |
394 | ||
395 | +------------------------------------------------------------- | |
396 | |library | |
397 | | library(FFI/foo.cm) | |
398 | |is | |
399 | | $/basis.cm | |
400 | | $/c.cm | |
401 | | FFI/foo.cm : make (-f foo.make) | |
402 | +------------------------------------------------------------- | |
403 | ||
404 | Now, saying | |
405 | ||
406 | $ sml -m foo.cm | |
407 | ||
408 | is all one needs to do in order to compile. (CM will automatically |
409 | invoke "make", so you don't have to run "make" separately.) | |
410 | ||
411 | If the goal is not to export the "raw" ml-nlffigen-generated stuff | |
412 | but rather something more nicely "wrapped", consider writing wrapper | |
413 | ML code. Suppose you have wrapper definitions for structure Foo_a | |
414 | and structure Foo_b with code for those in wrapper-foo-a.sml and |
415 | wrapper-foo-b.sml. In this case the corresponding .cm-file would |
416 | look like the following: | |
417 | ||
418 | +------------------------------------------------------------- | |
419 | |library | |
420 | | structure Foo_a | |
421 | | structure Foo_b | |
422 | |is | |
423 | | $/basis.cm | |
424 | | $/c.cm | |
425 | | FFI/foo.cm : make (-f foo.make) | |
426 | | wrapper-foo-a.sml | |
427 | | wrapper-foo-b.sml | |
428 | +------------------------------------------------------------- |