MLNLFFIImplementation
=====================

MLton's implementation(s) of the <:MLNLFFI:> library differ from the
SML/NJ implementation in two important ways:

* MLton cannot utilize the `Unsafe.cast` "cheat" described in Section
3.7 of <!Cite(Blume01)>.  (MLton's representation of
<:Closure:closures> and
<:PackedRepresentation:aggressive representation> optimizations make
an `Unsafe.cast` even more "unsafe" than in other implementations.)
+
--
We have considered two solutions:

** One solution is to utilize an additional type parameter (as
described in Section 3.7 of <!Cite(Blume01)>):
+
--
__________
[source,sml]
----
signature C = sig
   type ('t, 'f, 'c) obj
   eqtype ('t, 'f, 'c) obj'
   ...
   type ('o, 'f) ptr
   eqtype ('o, 'f) ptr'
   ...
   type 'f fptr
   type 'f fptr'
   ...
   structure T : sig
      type ('t, 'f) typ
      ...
   end
end
----

The rule for `('t, 'f, 'c) obj`, `('t, 'f, 'c) ptr`, and also
`('t, 'f) T.typ` is that whenever `F fptr` occurs within the
instantiation of `'t`, then `'f` must be instantiated to `F`.  In all
other cases, `'f` will be instantiated to `unit`.
__________

(In the actual MLton implementation, an abstract type `naf`
(not-a-function) is used instead of `unit`.)

While this means that type-annotated programs may not type-check under
both the SML/NJ implementation and the MLton implementation, this
should not be a problem in practice.  Tools, like `ml-nlffigen`, which
are necessarily implementation dependent (in order to make
<:CallingFromSMLToCFunctionPointer:calls through a C function
pointer>), may be easily extended to emit the additional type
parameter.  Client code which uses such generated glue-code (e.g.,
Section 1 of <!Cite(Blume01)>) need rarely write type-annotations,
thanks to the magic of type inference.
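
As a rough illustration of the rule quoted above, the following
self-contained sketch models only the run-time type information.  The
structure name `TypSketch`, the `rtti` datatype, the `arr`, `elem`,
and `call` operations, and the stand-in `addr = word` are assumptions
made for this example; they are not the MLNLFFI library's definitions.
Only the roles of `naf`, `fptr`, and `typ` follow the text.

[source,sml]
----
structure TypSketch :> sig
   type addr = word                 (* hypothetical stand-in for a C address *)
   type naf                         (* "not-a-function" *)
   type sint
   type 'f fptr
   type 't arr
   type ('t, 'f) typ
   val sint : (sint, naf) typ
   val fptr : (addr -> 'f) -> ('f fptr, 'f) typ
   val arr  : ('t, 'f) typ * int -> ('t arr, 'f) typ
   val elem : ('t arr, 'f) typ -> ('t, 'f) typ
   val call : ('f fptr, 'f) typ * addr -> 'f
end = struct
   type addr = word
   datatype naf = NAF
   type sint = int
   type 'f fptr = addr              (* 'f is phantom here *)
   type 't arr = addr               (* 't is phantom here *)
   (* The RTTI only ever sees 'f, never 't. *)
   datatype 'f rtti = Base | Fptr of addr -> 'f | Arr of 'f rtti * int
   type ('t, 'f) typ = 'f rtti
   val sint = Base
   fun fptr mk = Fptr mk
   fun arr (t, n) = Arr (t, n)
   fun elem (Arr (t, _)) = t
     | elem _ = raise Fail "elem: not an array"
   fun call (Fptr mk, a) = mk a
     | call _ = raise Fail "call: not a function pointer"
end

(* 'f is instantiated to int -> int because (int -> int) fptr occurs in 't. *)
val inc : ((int -> int) TypSketch.fptr, int -> int) TypSketch.typ =
   TypSketch.fptr (fn _ => fn x => x + 1)
val incs = TypSketch.arr (inc, 3)
val r = TypSketch.call (TypSketch.elem incs, 0w0) 41
----

Only the binding of `inc` carries an annotation, and it is there for
exposition; the types of `incs` and `r` are inferred.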
--

** The above implementation suffers from two disadvantages.
+
--
First, it changes the MLNLFFI Library interface, meaning that the same
program may not type-check under both the SML/NJ implementation and
the MLton implementation (though, in light of type inference and the
richer `MLRep` structure provided by MLton, this point is mostly
moot).

Second, it appears to unnecessarily duplicate type information.  For
example, an external C variable of type `int (* f[3])(int)` (that is,
an array of three function pointers) would be represented by the SML
type `(((sint -> sint) fptr, dec dg3) arr, sint -> sint, rw) obj`.
One might well ask why the `'f` instantiation (`sint -> sint` in this
case) cannot be _extracted_ from the `'t` instantiation
(`((sint -> sint) fptr, dec dg3) arr` in this case), obviating the
need for a separate _function-type_ type argument.  There are a number
of components to a complete answer to this question.  Foremost is the
fact that <:StandardML: Standard ML> supports neither (general)
type-level functions nor intensional polymorphism.

A more direct answer for MLNLFFI is that in the SML/NJ implementation,
the types `('t, 'c) obj` and `('t, 'c) ptr` are defined in such a way
that the type variables `'t` and `'c` are <:PhantomType: phantom> (not
contributing to the run-time representation of an `('t, 'c) obj` or
`('t, 'c) ptr` value), despite the fact that the types
`((sint -> sint) fptr, rw) ptr` and
`((double -> double) fptr, rw) ptr` necessarily carry distinct (and
type incompatible) run-time (C-)type information (RTTI), corresponding
to the different calling conventions of the two C functions.  The
`Unsafe.cast` "cheat" overcomes the type incompatibility without
introducing a new type variable (as in the first solution above).
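
To make "phantom" concrete, here is a minimal sketch in which neither
type parameter of `obj` appears in its representation.  The structure
`PhantomSketch` and its operations are hypothetical and are far
simpler than the library's `obj` type; the point is only that `'t` and
`'c` constrain uses at compile time while the run-time value is the
same `int ref` throughout.

[source,sml]
----
structure PhantomSketch :> sig
   type ro
   type rw
   type ('t, 'c) obj
   val new    : int -> (int, rw) obj
   val freeze : ('t, rw) obj -> ('t, ro) obj
   val get    : (int, 'c) obj -> int
   val set    : (int, rw) obj * int -> unit
end = struct
   datatype ro = RO
   datatype rw = RW
   type ('t, 'c) obj = int ref      (* neither 't nor 'c is used *)
   fun new i = ref i
   fun freeze r = r                 (* identity at run time *)
   fun get r = !r
   fun set (r, i) = r := i
end

val cell = PhantomSketch.new 1
val () = PhantomSketch.set (cell, 2)
val frozen = PhantomSketch.freeze cell
val v = PhantomSketch.get frozen
(* PhantomSketch.set (frozen, 3) would be rejected by the type checker. *)
----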

Hence, the reason that the _function-type_ type cannot be extracted
from the `'t` type variable instantiation is that the type of the
representation of RTTI doesn't even _see_ the (phantom) `'t` type
variable.  The solution which presents itself is to give up on the
phantomness of the `'t` type variable, making it available to the
representation of RTTI.

This is not without some small drawbacks.  Because many of the types
used to instantiate `'t` carry more structure than is strictly
necessary for `'t`'s RTTI, it is sometimes necessary to wrap and
unwrap RTTI to accommodate the additional structure.  (In the other
implementations, the corresponding operations can pass along the RTTI
unchanged.)  However, these coercions contribute minuscule overhead;
in fact, in a majority of cases, MLton's optimizations will completely
eliminate the RTTI from the final program.
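
One way to picture a non-phantom `'t` is the following sketch, in
which the representation of the RTTI is determined by `'t` itself.
This is an illustration under simplifying assumptions and not MLton's
actual representation: the structure `RttiSketch`, the stand-in
`addr = word`, and the choice to let `fptr` and `arr` denote their
RTTI payloads directly are all hypothetical.

[source,sml]
----
structure RttiSketch :> sig
   type addr = word                 (* hypothetical stand-in for a C address *)
   type sint
   type 'f fptr
   type 't arr
   type 't typ
   val sint : sint typ
   val fptr : (addr -> 'f) -> 'f fptr typ
   val arr  : 't typ * int -> 't arr typ
   val elem : 't arr typ -> 't typ
   val call : 'f fptr typ * addr -> 'f
end = struct
   type addr = word
   type sint = unit                 (* a base type needs no payload *)
   type 'f fptr = addr -> 'f        (* payload: how to call through the pointer *)
   type 't arr = 't * int           (* payload: element RTTI and length *)
   type 't typ = 't                 (* the RTTI now "sees" 't *)
   val sint = ()
   fun fptr mk = mk
   fun arr (t, n) = (t, n)          (* wrap the element RTTI *)
   fun elem (t, _) = t              (* unwrap it again *)
   fun call (mk, a) = mk a
end

val incs = RttiSketch.arr (RttiSketch.fptr (fn _ => fn x => x + 1), 3)
val r = RttiSketch.call (RttiSketch.elem incs, 0w0) 41
----

No separate function-type parameter and no cast is needed in this
sketch, and `arr` and `elem` are the kind of wrapping and unwrapping
mentioned above.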
--

The implementation distributed with MLton uses the second solution.

Bonus question: Why can't one use a <:UniversalType: universal type>
to eliminate the use of `Unsafe.cast`?

** Answer: ???
--

* MLton (in both of the above implementations) provides a richer
`MLRep` structure, utilizing ++Int__<N>__++ and ++Word__<N>__++
structures.
+
--
[source,sml]
----
structure MLRep = struct
   structure Char =
      struct
         structure Signed = Int8
         structure Unsigned = Word8
         (* word-style bit-operations on integers... *)
         structure SignedBitops = IntBitOps(structure I = Signed
                                            structure W = Unsigned)
      end
   structure Short =
      struct
         structure Signed = Int16
         structure Unsigned = Word16
         (* word-style bit-operations on integers... *)
         structure SignedBitops = IntBitOps(structure I = Signed
                                            structure W = Unsigned)
      end
   structure Int =
      struct
         structure Signed = Int32
         structure Unsigned = Word32
         (* word-style bit-operations on integers... *)
         structure SignedBitops = IntBitOps(structure I = Signed
                                            structure W = Unsigned)
      end
   structure Long =
      struct
         structure Signed = Int32
         structure Unsigned = Word32
         (* word-style bit-operations on integers... *)
         structure SignedBitops = IntBitOps(structure I = Signed
                                            structure W = Unsigned)
      end
   structure LongLong =
      struct
         structure Signed = Int64
         structure Unsigned = Word64
         (* word-style bit-operations on integers... *)
         structure SignedBitops = IntBitOps(structure I = Signed
                                            structure W = Unsigned)
      end
   structure Float = Real32
   structure Double = Real64
end
----

This would appear to be a better interface, even when an
implementation must choose `Int32` and `Word32` as the representation
for smaller C-types.
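
The "word-style bit-operations on integers" can be pictured with a
functor that converts a signed value to the unsigned type of the same
width, applies the word operation, and converts back.  The functor
below is a hypothetical stand-in written for this page: the name
`IntBitOpsSketch` and the operation names `andb`, `orb`, `xorb`, and
`notb` are assumptions and need not match what the library's
`IntBitOps` exports.  It relies only on the Basis `INTEGER` and `WORD`
signatures and assumes `I` and `W` have the same width.

[source,sml]
----
functor IntBitOpsSketch (structure I : INTEGER
                         structure W : WORD) =
struct
   (* Reinterpret a signed integer as a word of the same width. *)
   fun toW i = W.fromLargeInt (I.toLarge i)
   (* Reinterpret a word as a signed integer, sign-extending. *)
   fun toI w = I.fromLarge (W.toLargeIntX w)
   fun andb (i, j) = toI (W.andb (toW i, toW j))
   fun orb  (i, j) = toI (W.orb  (toW i, toW j))
   fun xorb (i, j) = toI (W.xorb (toW i, toW j))
   fun notb i = toI (W.notb (toW i))
end

(* Instantiated as MLRep.Char is above: Int8 paired with Word8. *)
structure CharBits = IntBitOpsSketch (structure I = Int8
                                      structure W = Word8)
val x = CharBits.andb (~1, 0x0F)
val y = CharBits.notb 0
----

With these definitions `x` is `15` and `y` is `~1`, both of type
`Int8.int`.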
--