Commit | Line | Data |
---|---|---|
7f918cf1 CE |
1 | MLNLFFIImplementation |
2 | ===================== | |
3 | ||
4 | MLton's implementation(s) of the <:MLNLFFI:> library differs from the | |
5 | SML/NJ implementation in two important ways: | |
6 | ||
7 | * MLton cannot utilize the `Unsafe.cast` "cheat" described in Section | |
8 | 3.7 of <!Cite(Blume01)>. (MLton's representation of | |
9 | <:Closure:closures> and | |
10 | <:PackedRepresentation:aggressive representation> optimizations make | |
11 | an `Unsafe.cast` even more "unsafe" than in other implementations.) | |
12 | + | |
13 | -- | |
14 | We have considered two solutions: | |
15 | ||
16 | ** One solution is to utilize an additional type parameter (as | |
17 | described in Section 3.7 of <!Cite(Blume01)>): | |
18 | + | |
19 | -- | |
20 | __________ | |
21 | [source,sml] | |
22 | ---- | |
23 | signature C = sig | |
24 | type ('t, 'f, 'c) obj | |
25 | eqtype ('t, 'f, 'c) obj' | |
26 | ... | |
27 | type ('o, 'f) ptr | |
28 | eqtype ('o, 'f) ptr' | |
29 | ... | |
30 | type 'f fptr | |
31 | type 'f fptr' | |
32 | ... | |
33 | structure T : sig | |
34 | type ('t, 'f) typ | |
35 | ... | |
36 | end | |
37 | end | |
38 | ---- | |
39 | ||
40 | The rule for `('t, 'f, 'c) obj`,`('t, 'f, 'c) ptr`, and also `('t, 'f) | |
41 | T.typ` is that whenever `F fptr` occurs within the instantiation of | |
42 | `'t`, then `'f` must be instantiated to `F`. In all other cases, `'f` | |
43 | will be instantiated to `unit`. | |
44 | __________ | |
45 | ||
46 | (In the actual MLton implementation, an abstract type `naf` | |
47 | (not-a-function) is used instead of `unit`.) | |
48 | ||
49 | While this means that type-annotated programs may not type-check under | |
50 | both the SML/NJ implementation and the MLton implementation, this | |
51 | should not be a problem in practice. Tools, like `ml-nlffigen`, which | |
52 | are necessarily implementation dependent (in order to make | |
53 | <:CallingFromSMLToCFunctionPointer:calls through a C function | |
54 | pointer>), may be easily extended to emit the additional type | |
55 | parameter. Client code which uses such generated glue-code (e.g., | |
56 | Section 1 of <!Cite(Blume01)>) need rarely write type-annotations, | |
57 | thanks to the magic of type inference. | |
58 | -- | |
59 | ||
60 | ** The above implementation suffers from two disadvantages. | |
61 | + | |
62 | -- | |
63 | First, it changes the MLNLFFI Library interface, meaning that the same | |
64 | program may not type-check under both the SML/NJ implementation and | |
65 | the MLton implementation (though, in light of type inference and the | |
66 | richer `MLRep` structure provided by MLton, this point is mostly | |
67 | moot). | |
68 | ||
69 | Second, it appears to unnecessarily duplicate type information. For | |
70 | example, an external C variable of type `int (* f[3])(int)` (that is, | |
71 | an array of three function pointers), would be represented by the SML | |
72 | type `(((sint -> sint) fptr, dec dg3) arr, sint -> sint, rw) obj`. | |
73 | One might well ask why the `'f` instantiation (`sint -> sint` in this | |
74 | case) cannot be _extracted_ from the `'t` instantiation | |
75 | (`((sint -> sint) fptr, dec dg3) arr` in this case), obviating the | |
76 | need for a separate _function-type_ type argument. There are a number | |
77 | of components to a complete answer to this question. Foremost is the | |
78 | fact that <:StandardML: Standard ML> supports neither (general) | |
79 | type-level functions nor intensional polymorphism. | |
80 | ||
81 | A more direct answer for MLNLFFI is that in the SML/NJ implementation, | |
82 | the definitions of the types `('t, 'c) obj` and `('t, 'c) ptr` are made | |
83 | in such a way that the type variables `'t` and `'c` are <:PhantomType: | |
84 | phantom> (not contributing to the run-time representation of an | |
85 | `('t, 'c) obj` or `('t, 'c) ptr` value), despite the fact that the | |
86 | types `((sint -> sint) fptr, rw) ptr` and | |
87 | `((double -> double) fptr, rw) ptr` necessarily carry distinct (and | |
88 | type incompatible) run-time (C-)type information (RTTI), corresponding | |
89 | to the different calling conventions of the two C functions. The | |
90 | `Unsafe.cast` "cheat" overcomes the type incompatibility without | |
91 | introducing a new type variable (as in the first solution above). | |
92 | ||
93 | Hence, the reason that the _function-type_ type cannot be extracted from | |
94 | the `'t` type variable instantiation is that the type of the | |
95 | representation of RTTI doesn't even _see_ the (phantom) `'t` type | |
96 | variable. The solution which presents itself is to give up on the | |
97 | phantomness of the `'t` type variable, making it available to the | |
98 | representation of RTTI. | |
99 | ||
100 | This is not without some small drawbacks. Because many of the types | |
101 | used to instantiate `'t` carry more structure than is strictly | |
102 | necessary for `'t`'s RTTI, it is sometimes necessary to wrap and | |
103 | unwrap RTTI to accommodate the additional structure. (In the other | |
104 | implementations, the corresponding operations can pass along the RTTI | |
105 | unchanged.) However, these coercions contribute minuscule overhead; | |
106 | in fact, in a majority of cases, MLton's optimizations will completely | |
107 | eliminate the RTTI from the final program. | |
108 | -- | |
109 | ||
110 | The implementation distributed with MLton uses the second solution. | |
111 | ||
112 | Bonus question: Why can't one use a <:UniversalType: universal type> | |
113 | to eliminate the use of `Unsafe.cast`? | |
114 | ||
115 | ** Answer: ??? | |
116 | -- | |
117 | ||
118 | * MLton (in both of the above implementations) provides a richer | |
119 | `MLRep` structure, utilizing ++Int__<N>__++ and ++Word__<N>__++ | |
120 | structures. | |
121 | + | |
122 | -- | |
123 | [source,sml] | |
124 | ---- | |
125 | structure MLRep = struct | |
126 | structure Char = | |
127 | struct | |
128 | structure Signed = Int8 | |
129 | structure Unsigned = Word8 | |
130 | (* word-style bit-operations on integers... *) | |
131 | structure SignedBitops = IntBitOps(structure I = Signed | |
132 | structure W = Unsigned) | |
133 | end | |
134 | structure Short = | |
135 | struct | |
136 | structure Signed = Int16 | |
137 | structure Unsigned = Word16 | |
138 | (* word-style bit-operations on integers... *) | |
139 | structure SignedBitops = IntBitOps(structure I = Signed | |
140 | structure W = Unsigned) | |
141 | end | |
142 | structure Int = | |
143 | struct | |
144 | structure Signed = Int32 | |
145 | structure Unsigned = Word32 | |
146 | (* word-style bit-operations on integers... *) | |
147 | structure SignedBitops = IntBitOps(structure I = Signed | |
148 | structure W = Unsigned) | |
149 | end | |
150 | structure Long = | |
151 | struct | |
152 | structure Signed = Int32 | |
153 | structure Unsigned = Word32 | |
154 | (* word-style bit-operations on integers... *) | |
155 | structure SignedBitops = IntBitOps(structure I = Signed | |
156 | structure W = Unsigned) | |
157 | end | |
158 | structure LongLong = | |
159 | struct | |
160 | structure Signed = Int64 | |
161 | structure Unsigned = Word64 | |
162 | (* word-style bit-operations on integers... *) | |
163 | structure SignedBitops = IntBitOps(structure I = Signed | |
164 | structure W = Unsigned) | |
165 | end | |
166 | structure Float = Real32 | |
167 | structure Double = Real64 | |
168 | end | |
169 | ---- | |
170 | ||
171 | This would appear to be a better interface, even when an | |
172 | implementation must choose `Int32` and `Word32` as the representation | |
173 | for smaller C-types. | |
174 | -- |
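
The `MLRep` listing above applies an `IntBitOps` functor whose body is not shown on this page. As a rough illustration of what "word-style bit-operations on integers" could look like, the following is a minimal sketch, not MLton's actual definition. The helper names `toWord`, `fromWord`, and `lift2` are hypothetical, and the sketch assumes that `W.wordSize` equals the precision of `I` and that the optional Basis Library structures `Int32` and `Word32` are available.

[source,sml]
----
(* Hypothetical sketch only: the real IntBitOps functor in the MLNLFFI
   sources may have a different signature and different contents. *)
functor IntBitOps (structure I : INTEGER
                   structure W : WORD) =
struct
   (* Reinterpret a signed integer as a word of the same width.
      W.fromLargeInt keeps the low-order bits (two's complement). *)
   fun toWord (i : I.int) : W.word = W.fromLargeInt (I.toLarge i)

   (* Reinterpret a word as a signed integer, with sign extension. *)
   fun fromWord (w : W.word) : I.int = I.fromLarge (W.toLargeIntX w)

   (* Lift a binary word operation to the integer type. *)
   fun lift2 f (x, y) = fromWord (f (toWord x, toWord y))

   val andb = lift2 W.andb
   val orb  = lift2 W.orb
   val xorb = lift2 W.xorb
   fun notb x = fromWord (W.notb (toWord x))
end

(* Usage, mirroring the shape of MLRep.Int in the listing above. *)
structure SignedBitops = IntBitOps (structure I = Int32
                                    structure W = Word32)

val masked   = SignedBitops.andb (~1, 0xFF)   (* 255 *)
val allOnes  = SignedBitops.notb 0            (* ~1 *)
----

Routing every operation through the unsigned structure keeps the integer interface free of overflow checks for bit manipulation, which is presumably why each `MLRep` substructure pairs a `Signed` and an `Unsigned` structure of equal width.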