| 1 | MLNLFFIImplementation |
| 2 | ===================== |
| 3 | |
| 4 | MLton's implementation(s) of the <:MLNLFFI:> library differs from the |
| 5 | SML/NJ implementation in two important ways: |
| 6 | |
| 7 | * MLton cannot utilize the `Unsafe.cast` "cheat" described in Section |
| 8 | 3.7 of <!Cite(Blume01)>. (MLton's representation of |
| 9 | <:Closure:closures> and |
| 10 | <:PackedRepresentation:aggressive representation> optimizations make |
| 11 | an `Unsafe.cast` even more "unsafe" than in other implementations.) |
| 12 | + |
| 13 | -- |
| 14 | We have considered two solutions: |
| 15 | |
| 16 | ** One solution is to utilize an additional type parameter (as |
| 17 | described in Section 3.7 of <!Cite(Blume01)>): |
| 18 | + |
| 19 | -- |
| 20 | __________ |
| 21 | [source,sml] |
| 22 | ---- |
| 23 | signature C = sig |
| 24 | type ('t, 'f, 'c) obj |
| 25 | eqtype ('t, 'f, 'c) obj' |
| 26 | ... |
| 27 | type ('o, 'f) ptr |
| 28 | eqtype ('o, 'f) ptr' |
| 29 | ... |
| 30 | type 'f fptr |
| 31 | type 'f fptr' |
| 32 | ... |
| 33 | structure T : sig |
| 34 | type ('t, 'f) typ |
| 35 | ... |
| 36 | end |
| 37 | end |
| 38 | ---- |
| 39 | |
| 40 | The rule for `('t, 'f, 'c) obj`, `('t, 'f, 'c) ptr`, and also `('t, 'f) |
| 41 | T.typ` is that whenever `F fptr` occurs within the instantiation of |
| 42 | `'t`, then `'f` must be instantiated to `F`. In all other cases, `'f` |
| 43 | will be instantiated to `unit`. |
| 44 | __________ |
| 45 | |
| 46 | (In the actual MLton implementation, an abstract type `naf` |
| 47 | (not-a-function) is used instead of `unit`.) |
| 48 | |
| 49 | While this means that type-annotated programs may not type-check under |
| 50 | both the SML/NJ implementation and the MLton implementation, this |
| 51 | should not be a problem in practice. Tools, like `ml-nlffigen`, which |
| 52 | are necessarily implementation dependent (in order to make |
| 53 | <:CallingFromSMLToCFunctionPointer:calls through a C function |
| 54 | pointer>), may be easily extended to emit the additional type |
| 55 | parameter. Client code which uses such generated glue-code (e.g., |
| 56 | Section 1 of <!Cite(Blume01)>) need rarely write type-annotations, |
| 57 | thanks to the magic of type inference. |
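
The role of the extra parameter can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the MLNLFFI source: the structure `ExtraParamSketch`, its one-parameter `ptr` type, and the `caller` function are hypothetical names, and an ordinary SML function stands in for a C function pointer. It only shows how `'f` tracks the function type occurring inside `'t`, so that the call site needs no `Unsafe.cast`.

[source,sml]
----
(* Hypothetical toy model; none of these names come from MLNLFFI. *)
structure ExtraParamSketch :> sig
   type naf                          (* "not a function" marker *)
   type 'f fptr
   type 't ptr
   type ('t, 'f) typ
   val sint : (int, naf) typ
   val fptr : 'f -> ('f fptr, 'f) typ
   val ptr : ('t, 'f) typ -> ('t ptr, 'f) typ
   val caller : ('t, 'f) typ -> 'f option
end = struct
   type naf = unit
   type 'f fptr = unit
   type 't ptr = unit
   (* Only 'f appears in the representation; 't stays phantom. *)
   type ('t, 'f) typ = 'f option
   val sint = NONE
   fun fptr f = SOME f
   fun ptr t = t                     (* 'f is passed along unchanged *)
   fun caller t = t
end

(* t : ((int -> int) fptr ptr, int -> int) typ *)
val t = ExtraParamSketch.ptr (ExtraParamSketch.fptr (fn x => x + 1))

(* The function is recovered at its precise type, without a cast. *)
val r = case ExtraParamSketch.caller t of
           SOME f => f 41
         | NONE => 0
----

In this toy model, wrapping a type in `ptr` leaves `'f` untouched, which mirrors the rule that `'f` must be instantiated to `F` whenever `F fptr` occurs within `'t`.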
| 58 | -- |
| 59 | |
| 60 | ** The above implementation suffers from two disadvantages. |
| 61 | + |
| 62 | -- |
| 63 | First, it changes the MLNLFFI Library interface, meaning that the same |
| 64 | program may not type-check under both the SML/NJ implementation and |
| 65 | the MLton implementation (though, in light of type inference and the |
| 66 | richer `MLRep` structure provided by MLton, this point is mostly |
| 67 | moot). |
| 68 | |
| 69 | Second, it appears to unnecessarily duplicate type information. For |
| 70 | example, an external C variable of type `int (* f[3])(int)` (that is, |
| 71 | an array of three function pointers), would be represented by the SML |
| 72 | type `(((sint -> sint) fptr, dec dg3) arr, sint -> sint, rw) obj`. |
| 73 | One might well ask why the `'f` instantiation (`sint -> sint` in this |
| 74 | case) cannot be _extracted_ from the `'t` instantiation |
| 75 | (`((sint -> sint) fptr, dec dg3) arr` in this case), obviating the |
| 76 | need for a separate _function-type_ type argument. There are a number |
| 77 | of components to a complete answer to this question. Foremost is the |
| 78 | fact that <:StandardML: Standard ML> supports neither (general) |
| 79 | type-level functions nor intensional polymorphism. |
| 80 | |
| 81 | A more direct answer for MLNLFFI is that in the SML/NJ implementation, |
| 82 | the definitions of the types `('t, 'c) obj` and `('t, 'c) ptr` are made |
| 83 | in such a way that the type variables `'t` and `'c` are <:PhantomType: |
| 84 | phantom> (not contributing to the run-time representation of an |
| 85 | `('t, 'c) obj` or `('t, 'c) ptr` value), despite the fact that the |
| 86 | types `((sint -> sint) fptr, rw) ptr` and |
| 87 | `((double -> double) fptr, rw) ptr` necessarily carry distinct (and |
| 88 | type incompatible) run-time (C-)type information (RTTI), corresponding |
| 89 | to the different calling conventions of the two C functions. The |
| 90 | `Unsafe.cast` "cheat" overcomes the type incompatibility without |
| 91 | introducing a new type variable (as in the first solution above). |
| 92 | |
| 93 | Hence, the reason that the _function-type_ type argument cannot be extracted from |
| 94 | the `'t` type variable instantiation is that the type of the |
| 95 | representation of RTTI doesn't even _see_ the (phantom) `'t` type |
| 96 | variable. The solution which presents itself is to give up on the |
| 97 | phantomness of the `'t` type variable, making it available to the |
| 98 | representation of RTTI. |
| 99 | |
| 100 | This is not without some small drawbacks. Because many of the types |
| 101 | used to instantiate `'t` carry more structure than is strictly |
| 102 | necessary for `'t`'s RTTI, it is sometimes necessary to wrap and |
| 103 | unwrap RTTI to accommodate the additional structure. (In the other |
| 104 | implementations, the corresponding operations can pass along the RTTI |
| 105 | unchanged.) However, these coercions contribute minuscule overhead; |
| 106 | in fact, in a majority of cases, MLton's optimizations will completely |
| 107 | eliminate the RTTI from the final program. |
| 108 | -- |
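
The difference between a phantom and a non-phantom `'t` can be sketched in a few lines. This is an illustrative sketch only: the structures `PhantomRtti` and `VisibleRtti`, the `zero` field, and the sizes 4 and 8 are assumptions made for the example and do not reflect the actual RTTI representation in either implementation.

[source,sml]
----
(* Hypothetical names throughout; not the MLNLFFI representation. *)

(* 't is phantom: the representation is a bare size, so nothing of
   type 't can ever be recovered from an RTTI value. *)
structure PhantomRtti :> sig
   type 't rtti
   val sint : int rtti
   val double : real rtti
   val size : 't rtti -> int
end = struct
   type 't rtti = int
   val sint = 4       (* assumed size in bytes *)
   val double = 8     (* assumed size in bytes *)
   fun size n = n
end

(* 't is visible: the representation mentions 't, so typed data can
   be stored in the RTTI and read back without any cast. *)
structure VisibleRtti :> sig
   type 't rtti
   val sint : int rtti
   val double : real rtti
   val size : 't rtti -> int
   val zero : 't rtti -> 't
end = struct
   type 't rtti = {size : int, zero : 't}
   val sint = {size = 4, zero = 0}
   val double = {size = 8, zero = 0.0}
   fun size ({size = n, ...} : 't rtti) = n
   fun zero ({zero = z, ...} : 't rtti) = z
end

val z : real = VisibleRtti.zero VisibleRtti.double
val n : int = VisibleRtti.zero VisibleRtti.sint
----

A function such as `zero` cannot be written against `PhantomRtti` without a cast, because its representation carries nothing of type `'t`.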
| 109 | |
| 110 | The implementation distributed with MLton uses the second solution. |
| 111 | |
| 112 | Bonus question: Why can't one use a <:UniversalType: universal type> |
| 113 | to eliminate the use of `Unsafe.cast`? |
| 114 | |
| 115 | ** Answer: ??? |
| 116 | -- |
| 117 | |
| 118 | * MLton (in both of the above implementations) provides a richer |
| 119 | `MLRep` structure, utilizing ++Int__<N>__++ and ++Word__<N>__++ |
| 120 | structures. |
| 121 | + |
| 122 | -- |
| 123 | [source,sml] |
| 124 | ---- |
| 125 | structure MLRep = struct |
| 126 | structure Char = |
| 127 | struct |
| 128 | structure Signed = Int8 |
| 129 | structure Unsigned = Word8 |
| 130 | (* word-style bit-operations on integers... *) |
| 131 | structure SignedBitops = IntBitOps(structure I = Signed |
| 132 | structure W = Unsigned) |
| 133 | end |
| 134 | structure Short = |
| 135 | struct |
| 136 | structure Signed = Int16 |
| 137 | structure Unsigned = Word16 |
| 138 | (* word-style bit-operations on integers... *) |
| 139 | structure SignedBitops = IntBitOps(structure I = Signed |
| 140 | structure W = Unsigned) |
| 141 | end |
| 142 | structure Int = |
| 143 | struct |
| 144 | structure Signed = Int32 |
| 145 | structure Unsigned = Word32 |
| 146 | (* word-style bit-operations on integers... *) |
| 147 | structure SignedBitops = IntBitOps(structure I = Signed |
| 148 | structure W = Unsigned) |
| 149 | end |
| 150 | structure Long = |
| 151 | struct |
| 152 | structure Signed = Int32 |
| 153 | structure Unsigned = Word32 |
| 154 | (* word-style bit-operations on integers... *) |
| 155 | structure SignedBitops = IntBitOps(structure I = Signed |
| 156 | structure W = Unsigned) |
| 157 | end |
| 158 | structure LongLong = |
| 159 | struct |
| 160 | structure Signed = Int64 |
| 161 | structure Unsigned = Word64 |
| 162 | (* word-style bit-operations on integers... *) |
| 163 | structure SignedBitops = IntBitOps(structure I = Signed |
| 164 | structure W = Unsigned) |
| 165 | end |
| 166 | structure Float = Real32 |
| 167 | structure Double = Real64 |
| 168 | end |
| 169 | ---- |
| 170 | |
| 171 | This would appear to be a better interface, even when an |
| 172 | implementation must choose `Int32` and `Word32` as the representation |
| 173 | for smaller C-types. |
| 174 | -- |
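
The "word-style bit-operations on integers" mentioned in the comments above can be sketched as a functor over a matching pair of integer and word structures. This is a hedged sketch, not the library's `IntBitOps`: the functor name `IntBitOpsSketch`, the helpers `toW` and `fromW`, and the choice of operations are assumptions for illustration. It uses only the Basis Library `INTEGER` and `WORD` signatures and assumes `I` and `W` have the same bit width.

[source,sml]
----
(* Hypothetical sketch; the real IntBitOps may differ. *)
functor IntBitOpsSketch (structure I : INTEGER
                         structure W : WORD) =
struct
   (* Reinterpret the two's-complement bits of an integer as a word. *)
   fun toW (i : I.int) : W.word = W.fromLargeInt (I.toLarge i)

   (* Reinterpret a word as a signed integer (sign-extending).
      Assumes I and W have the same width; otherwise I.fromLarge
      may raise Overflow. *)
   fun fromW (w : W.word) : I.int = I.fromLarge (W.toLargeIntX w)

   fun andb (x, y) = fromW (W.andb (toW x, toW y))
   fun orb (x, y) = fromW (W.orb (toW x, toW y))
   fun xorb (x, y) = fromW (W.xorb (toW x, toW y))
   fun notb x = fromW (W.notb (toW x))
end

structure Bits = IntBitOpsSketch (structure I = Int32
                                  structure W = Word32)

(* ~1 has all bits set, so masking with 0x0F yields 15. *)
val low = Bits.andb (0x0F, ~1)
(* Complementing 0 sets every bit, which reads back as ~1. *)
val allOnes = Bits.notb 0
----

Instantiating the functor with `Int8`/`Word8` or `Int64`/`Word64` follows the same pattern as the `MLRep` substructures shown above.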