2 Date: Tue, 23 Jul 2002 11:49:57 -0400 (EDT)
3 From: Matthew Fluet <fluet@CS.Cornell.EDU>
6 John and SML implementers,
8 Here is a loose collection of notes I've taken while starting to
9 update the MLton implementation of the SML Basis Library to the latest
10 version. They span quite a range: errata and typos, signature
11 constraint concerns, and some design questions. Thus far, I've looked
12 at the structures that had been grouped under the headings General,
13 Text, Integer, Reals, Lists, and Arrays and Vectors (i.e., excluding
14 IO, System, and Posix) in the "old" web specification.
16 A few high level comments:
18 * As an organizational principle, I liked the grouping of modules into
19 larger collections used in the "old" web specification better than
20 the long alphabetical list.
21 * I'm quite happy to see opaque signature matches for most structures.
22 In particular, I think it will help avoid porting problems between
23 implementations that provide different INTEGER structures, especially
24 when LargeInt = Int in one implementation and LargeInt = IntInf in another.
27 Required and optional components, Top-level:
29 * A number of structures have an opaque signature match in
30 overview.html, but not in the corresponding structure specific page:
31 General, Bool, Option, List, ListPair, IntInf,
32 Array, ArraySlice, Vector, VectorSlice.
33 * Word8Array2 is listed as required in overview.html,
34 but its signature, MONO_ARRAY2, is not required.
35 Furthermore, Word8Array2 is marked optional in mono-array2.html.
36 I don't quite see a rationale for Word8Array2 being required.
37 * With the addition of val ~ : word -> word to the WORD signature,
38 presumably ~ should be overloaded at num, rather than at intreal.
42 * In pack-float.html, the where type clauses are incorrect:
43 structure PackRealBig :> PACK_REAL
44 where type PackRealBig.real = Real.real
46 structure PackRealBig :> PACK_REAL
47 where type real = Real.real
48 * Likewise, in most places, references to basic types are unqualified,
49 so perhaps the where clause should read
50 where type real = real
51 for the PackRealBig and PackRealLittle structures.
55 * In vector-slice.html, the description of subslice references |arr|
56 when it should reference |sl|.
57 * In {[mono-]array[-slice],[mono-]vector[-slice]}.html, the
58 description of findi references appi when it should reference findi.
59 * In mono-array-slice.html, structure CharArraySlice has the clause
60 where type array = CharVector.vector
62 where type array = CharArray.array.
63 * In mono-{vector[-slice],array[-slice],array2}.html, there are
64 Word<N> structures but no (default word) Word structures.
65 * In mono-vector.html, structure CharVector has the clause
66 where type elem = Char.char
67 while the other monomorphic vectors of basic types reference
68 the unqualified type; i.e. structure BoolVector has the clause
69 where type elem = bool.
70 * There are no "See also"'s into MONO_VECTOR_SLICE or MONO_ARRAY_SLICE
71 from MONO_VECTOR or MONO_ARRAY.
72 * A long discussion about types defined in
73 [MONO_]{ARRAY,VECTOR}[_SLICE] signatures; deferred to a separate
78 * Ordering of comparison functions (>, >=, etc.) and unary negation
79 are different within INTEGER and WORD.
80 * Ordering of functions in CHAR seems awkward.
81 * Ordering of full, slice, subslice different in ARRAY_SLICE and
83 * Ordering of foldi/fold and modifi/modify different in ARRAY2 and
86 Top-level and opaque signatures:
87 * I think it would be useful to see the entire top-level of required
88 structures written out with their respective signature constraints.
89 For example, in the description of the Math structure, the spec
90 reads: "The top-level structure Math provides these functions for
91 the default real type Real.real." Because the top-level Math
92 structure has an opaque signature match (in overview.html), then the
93 sentence above implies that there ought to be the constraint
94 where type real = real (or Real.real).
95 Granted, none of the other structures in overview.html have where
96 clauses, and most type constraints are documented in the structure
97 specific pages, but the constraint on the top-level Math.real
98 slipped my mind when I first looked at it.
102 ******************************************************************************
103 ******************************************************************************
105 Date: Tue, 23 Jul 2002 11:54:09 -0400 (EDT)
106 From: Matthew Fluet <fluet@CS.Cornell.EDU>
109 As promised, here is a longish look at the types used in Arrays and
112 Array and Vector design:
114 * The ARRAY signature includes type 'a vector.
115 Presumably, type 'a Array.vector = type 'a Vector.vector, but no
116 constraint makes this explicit.
117 * MONO_ARRAY_SLICE includes type vector and type vector_slice,
118 while the ARRAY_SLICE signature explicitly references
119 'a VectorSlice.slice and 'a Vector.vector.
120 * VECTOR_SLICE doesn't include 'a vector, but has
121 val mapi : (int * 'a -> 'b) -> 'a slice -> 'b vector
122 val map : ('a -> 'b) -> 'a slice -> 'b vector;
123 On the other hand, full, slice, base, vector, and concat
124 reference 'a Vector.vector.
126 For consistency, I'd prefer to see
127 signature VECTOR =
128 sig type 'a vector ... end
129 signature VECTOR_SLICE =
130 sig type 'a vector type 'a slice ... end
131 signature ARRAY =
132 sig type 'a vector type 'a array ... end
133 signature ARRAY_SLICE =
134 sig type 'a vector type 'a vector_slice
135 type 'a array type 'a slice ... end
136 signature MONO_VECTOR =
137 sig type elem type vector ... end
138 signature MONO_VECTOR_SLICE =
139 sig type elem type vector type slice ... end
140 signature MONO_ARRAY =
141 sig type elem type vector type array ... end
142 signature MONO_ARRAY_SLICE =
143 sig type elem type vector type vector_slice
144 type array type slice ... end
146 structure Vector :> VECTOR
147 structure VectorSlice :> VECTOR_SLICE
148 where type 'a vector = 'a Vector.vector
149 structure Array :> ARRAY
150 where type 'a vector = 'a Vector.vector
151 structure ArraySlice :> ARRAY_SLICE
152 where type 'a vector = 'a Vector.vector
153 where type 'a vector_slice = 'a VectorSlice.slice
154 where type 'a array = 'a Array.array
155 structure BoolVector :> MONO_VECTOR
156 where type elem = bool
157 structure BoolVectorSlice :> MONO_VECTOR_SLICE
158 where type elem = bool
159 where type vector = BoolVector.vector
160 structure BoolArray :> MONO_ARRAY
161 where type elem = bool
162 where type vector = BoolVector.vector
163 structure BoolArraySlice :> MONO_ARRAY_SLICE
164 where type elem = bool
165 where type vector = BoolVector.vector
166 where type vector_slice = BoolVectorSlice.slice
167 where type array = BoolArray.array
169 While semantically, this shouldn't be any different than the
170 specification, it could affect type-error messages. For example, if I
171 have the structure Foo:
173 structure Foo = struct
176 fun copyVec0 {src: vector_slice,
177 dst: array} = copyVec {src = src, dst = dst, di = 0}
180 which I decide to generalize to polymorphic array slices, then just
181 changing BoolArraySlice to ArraySlice will lead to different
182 type-error messages: either "unbound type constructor: vector_slice"
183 (under the specification) or "type constructor vector_slice given 0
184 arguments, wants 1" (under the signatures given above); and an arity
185 error for array in either case. It's not much of an argument, but I
186 need to replace vector_slice with 'a VectorSlice.slice under the
187 specification, while I only need to add 'a under the sigs above.
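
A minimal sketch of the generalized Foo as it would have to be written under the specification. It assumes that ARRAY_SLICE exposes only 'a slice, so the vector-slice and array types are spelled out as 'a VectorSlice.slice and 'a Array.array. The names Foo and copyVec0 come from the example above; the values dst and src are only illustrative.

```sml
structure Foo = struct
  open ArraySlice

  (* Copy a vector slice to the front of an array.  The two types must be
     written out in full, because (by assumption) ARRAY_SLICE does not
     provide vector_slice or array itself. *)
  fun copyVec0 {src : 'a VectorSlice.slice, dst : 'a Array.array} =
    copyVec {src = src, dst = dst, di = 0}
end

(* Illustrative use: copy [1, 2, 3] into a fresh three-element array. *)
val dst = Array.array (3, 0)
val src = VectorSlice.full (Vector.fromList [1, 2, 3])
val () = Foo.copyVec0 {src = src, dst = dst}
```

Under the signatures proposed above, the same function could keep the unqualified names and would only need the added 'a.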
191 * Why not have an ARRAY2_REGION analogous to ARRAY_SLICE?
192 Likewise, how about VECTOR2 and VECTOR2_REGION?
193 I think the decision to separate Arrays and Vectors from
194 their corresponding slices is a nice design choice, and I'd be in
195 favor of extending it to multi-dimensional ones.
196 * Should ARRAY2 have findi/find, exists, all? collate?
198 ******************************************************************************
199 ******************************************************************************
201 Date: Thu, 25 Jul 2002 15:20:01 +0200
202 From: Andreas Rossberg <rossberg@ps.uni-sb.de>
205 Like Matthew I started implementing the latest version of the Basis spec
206 for Alice and Hamlet. I'm quite happy with most of the changes. It was a
207 surprise to discover the presence of a Windows structure, though :-)
209 Here is my list of comments, some of which may duplicate observations
210 already made by Matthew. They primarily cover global issues and the
211 required part of the library, though I haven't looked deeper into the IO
212 and Posix parts yet. I also included some proposals for modest additions
213 to the library, which I believe are useful and fit its spirit.
216 Trivial bugs, typos, cosmetics
217 ------------------------------
220 - INT_INF appears in the list of required signatures.
221 - WordArray2 appears under the list of required structures,
222 instead of optional ones.
225 - Typo in description of allEq: double "the".
228 - The scan example uses the deprecated "all" function.
231 - Typo in synopsis of subslice: s/opt/sz/.
232 - Typo in description of subslice: s/|arr|/|sl|/.
233 - Typo in description of findi: s/appi/findi/.
234 - Signature sometimes uses Vector.vector instead of plain vector.
235 - The equation for mapi can be simplified to:
236 Vector.fromList (foldri (fn (i,a,l) => f(i,a)::l) [] slice)
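
The simplified mapi equation can be checked against a library's own VectorSlice.mapi with a few lines. This is only a sketch: the helper name mapiViaFoldri and the test values are made up here, and it assumes that foldri and mapi number the elements of a slice in the same way.

```sml
(* mapi written through foldri, following the equation above. *)
fun mapiViaFoldri f slice =
  Vector.fromList
    (VectorSlice.foldri (fn (i, a, l) => f (i, a) :: l) [] slice)

(* A two-element slice taken from the middle of a vector. *)
val sl = VectorSlice.slice (Vector.fromList [10, 20, 30, 40], 1, SOME 2)

val viaLibrary = VectorSlice.mapi (fn (i, x) => i + x) sl
val viaFoldri  = mapiViaFoldri (fn (i, x) => i + x) sl

(* int vectors admit equality, so the two results compare directly. *)
val agree = (viaLibrary = viaFoldri)
```

Note that the foldri version applies f from right to left, so the two definitions can be told apart if f has side effects.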
238 * MONO_VECTOR_SLICE and ARRAY_SLICE and MONO_ARRAY_SLICE:
239 - Typo in synopsis of subslice: s/opt/sz/.
240 - Typo in description of findi: s/appi/findi/.
243 - Accidental "val" keyword in synopsis of some functions.
246 - The "where" constraints contain erroneously qualified ids.
247 - The specification of the TEXT_IO signature is not valid SML'97,
248 since StreamIO is specified twice. You might want to add a
249 comment regarding that.
250 - The constraints for types vector and elem are redundant
251 (in fact, invalid), because the signature TEXT_STREAM_IO
252 already specifies the necessary equations.
254 * The use of variable names is sometimes inconsistent:
255 - Predicate arguments to higher-order functions are usually
256 named "f" (eg. List.all), sometimes "p" (eg. String.tokens,
257 StringCvt.splitl), and sometimes even "pred" (eg. ListPair.all).
258 - Similarly, fold functions mostly use "init" to name initial
259 accumulators, except in the List and ListPair modules.
263 Ambiguities / Unclear Details
264 -----------------------------
267 - The subsection about dependencies among optional modules has
268 disappeared. Does that mean that there aren't any anymore?
269 (The nice subsection about design rules and conventions also
272 * The intended meaning of opaque signature constraints is not always
273 clear to me. Sometimes the prose contains remarks about additional
274 equalities that are not apparent from the signature constraints.
275 For example, is or isn't
276 - Text.Char.char = Char.char ? (and so on for the rest of Text)
277 - LargeInt.int = IntN.int (for some structure IntN) ?
278 (likewise LargeWord.word, LargeReal.real)
279 - Char.string = String.string ?
280 - Math.real = Real.real ?
281 In particular, the spec sometimes speaks of "equal structures",
282 which has no real technical meaning in SML'97.
283 Note that from the opaque matching on the overview page one might
284 even conclude that General.unit <> {} !
286 * The type specification of String.string and CharVector.vector
288 structure String :> STRING
289 where type string = CharVector.vector
290 structure CharVector :> MONO_VECTOR
291 where type vector = String.string
292 Likewise for Substring.substring and CharVectorSlice.slice.
293 A respective defining structure should be chosen.
296 - Function fromString has a special case that is not covered by
297 implementing the function through straight-forward iterative
298 application of the Char.scan function, namely a trailing gap
299 escape (\f...f\) as in "foo\\ \\" or "foo\\ \\\000" (where \000
300 is a non-convertible character). Several implementations I
301 tried get that detail wrong, so a corresponding note might be
302 in order. Moreover, it is not completely obvious from the
303 description what the result should be for strings that contain
304 a gap escape as the only convertible sequence, e.g. "\\ \\" or
305 "\\ \\\000" - it is supposed to be SOME "", I guess.
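
The four inputs mentioned above can be probed directly. The sketch below only prints what a given implementation returns and does not assert any result, since the correct answers are exactly what is in question; the names probes and show are illustrative.

```sml
(* In SML source, "\\ \\" denotes backslash, space, backslash, which is a
   gap escape from the point of view of String.fromString. *)
val probes =
  [ ("foo\\ \\",     "trailing gap escape")
  , ("foo\\ \\\000", "gap escape, then a non-convertible character")
  , ("\\ \\",        "gap escape only")
  , ("\\ \\\000",    "gap escape only, then a non-convertible character") ]

fun show NONE = "NONE"
  | show (SOME s) = "SOME \"" ^ String.toString s ^ "\""

val () =
  List.app
    (fn (input, what) =>
        print (what ^ ": " ^ show (String.fromString input) ^ "\n"))
    probes
```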
308 - Shouldn't span raise Span if i' < i? Otherwise, contrary
309 to the prose, it in fact accepts arguments where ss' is
310 left to ss, as long as they overlap (which is rather odd).
311 - For the curried triml/trimr it is not clear whether a
312 Subscript exception has to be raised already if k < 0 but no
313 second argument is applied.
317 Naming and structuring
318 ----------------------
320 Its nicely chosen regular naming conventions and structure are two of
321 the aspects I like most about the Standard Basis. The following list
322 enumerates the few cases where I feel that the spec violates its own
326 - The fromLargeWord and toLargeWord functions should drop
327 the "Word" suffix to be consistent with the corresponding
328 functions in the REAL and INTEGER signatures.
331 - The functions contains/notContains should be moved to the
332 STRING signature, as they are similar to find/exist
333 operations and thus functionality of the aggregate. The
334 type string could then be removed from the signature.
336 * ARRAY_SLICE and MONO_ARRAY_SLICE:
337 - The function copyVec seems completely out of place: it
338 operates neither on array slices nor on vectors. But honestly
339 I have got no idea where else to put it :-(
341 * STRING and SUBSTRING:
342 - There is a certain asymmetry between slices and substrings
343 which tends to confuse at least myself when hacking. For more
344 consistency I propose:
345 (1) changing the type of Substring.substring to
346 string * int * int option -> substring
347 (for consistency with VectorSlice.slice),
348 (2) renaming Substring.slice to Substring.subsubstring,
349 (for consistency with VectorSlice.subslice),
350 (3) removing Substring.{app,foldl,foldr} (there are no similar
351 functions in the STRING signature, and in both cases they
352 are available through CharVector/CharVectorSlice),
353 (4) removing String.extract and Substring.extract (the same
354 functionality is available through CharVector[Slice]).
355 - I believe the deprecated Substring.all can be removed for good.
356 After all, there are more serious incompatible changes being
357 made (e.g. array copying functions).
359 * Vectors and arrays:
360 - While the lib consistently uses the to/from convention for
361 conversions on basic types, it sometimes uses ad hoc conventions
362 for aggregates. I propose renaming:
363 (1) Array.vector to Array.toVector
364 (2) VectorSlice.vector to VectorSlice.toVector,
365 (3) ArraySlice.vector to ArraySlice.toVector,
366 (4) Substring.string to Substring.toString.
367 - Since the copy functions have only 3, mostly distinctly typed
368 arguments now, there no longer seems to be a strong reason to
369 require passing those by notationally heavy records.
372 - The presence of bit fiddling operators in that signature is
373 something that feels exceptionally ad-hoc. Either they should
374 be available for all integer types, or there should be a
375 separate WORD_INF, with appropriate conversions, that makes
379 - Now that there is Word.~ (which is good) it seems rather odd
380 that the toplevel ~ is not overloaded for words, i.e. does not
384 - I really like the idea of structuring the library namespace as
385 it has been done with the OS and Posix structures. I would
386 prefer to see something similar being done for the added
387 network functionality. More precisely, I propose
388 (1) moving the structures Socket, INetSock, GenericSock, and
389 the three Net*DB structures into a new wrapper structure
390 Net (renaming Net*DB to *DB),
391 (2) defining a corresponding signature NET,
392 (3) renaming the signatures SOCKET, GENERIC_SOCK and INET_SOCK
393 to NET_SOCKET, NET_GENERIC_SOCK and NET_INET_SOCK, resp.,
394 (4) moving UnixSock to the Unix structure (renamed as Socket).
398 Misc. proposals for additional functionality
399 --------------------------------------------
401 Here is a small collection of miscellaneous simple functions which I
402 believe the library is still lacking, either because they are commonly
403 useful or because they would make the library more regular.
405 * LIST and LIST_PAIR:
406 - The IMHO single most convenient extension to the library would
407 be indexed morphisms on lists, i.e. adding
408 val appi : (int * 'a -> unit) -> 'a list -> unit
409 val mapi : (int * 'a -> 'b) -> 'a list -> 'b list
410 val foldli : (int * 'a * 'b -> 'b) -> 'b -> 'a list -> 'b
411 val foldri : (int * 'a * 'b -> 'b) -> 'b -> 'a list -> 'b
412 val findi : (int * 'a -> bool) -> 'a list -> (int * 'a) option
413 - Likewise for LIST_PAIR.
414 - LIST_PAIR does not support partial mapping:
415 val mapPartial : ('a * 'b -> 'c option) ->
416 'a list * 'b list -> 'c list
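
For concreteness, the indexed list functions proposed above could be sketched as follows. The structure name ListIndexed is hypothetical, indices are assumed to start at 0, and this is one possible reading of the proposal rather than a reference implementation.

```sml
structure ListIndexed = struct
  (* Left fold that also passes the position of each element. *)
  fun foldli f init l =
    let fun loop (_, acc, []) = acc
          | loop (i, acc, x :: xs) = loop (i + 1, f (i, x, acc), xs)
    in loop (0, init, l) end

  (* Right fold that also passes the position of each element. *)
  fun foldri f init l =
    let fun loop (_, []) = init
          | loop (i, x :: xs) = f (i, x, loop (i + 1, xs))
    in loop (0, l) end

  fun appi f l = foldli (fn (i, x, ()) => f (i, x)) () l

  (* Written directly so that f is applied from left to right, as in List.map. *)
  fun mapi f l =
    let fun loop (_, []) = []
          | loop (i, x :: xs) =
              let val y = f (i, x) in y :: loop (i + 1, xs) end
    in loop (0, l) end

  fun findi p l =
    let fun loop (_, []) = NONE
          | loop (i, x :: xs) =
              if p (i, x) then SOME (i, x) else loop (i + 1, xs)
    in loop (0, l) end
end

(* Illustrative use: label each string with its position. *)
val labelled =
  ListIndexed.mapi (fn (i, s) => Int.toString i ^ ":" ^ s) ["a", "b"]
```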
418 * LIST, VECTOR, ARRAY, etc.:
419 - Another function on lists that would be very useful from my
421 val appr : ('a -> unit) -> 'a list -> unit
422 and its indexed sibling
423 val appri : (int * 'a -> unit) -> 'a list -> unit
424 which traverse the list from right to left.
425 - Likewise for all aggregate types.
426 - All aggregates come with a fromList function. I often feel the
427 need to have inverse toList functions. Use of foldr is obfuscating.
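
A short sketch of the functions asked for above, written with what the library already offers. All names here are illustrative; the toList definitions are the foldr idiom that the proposal would hide behind a name.

```sml
(* Apply f to the elements of a list from right to left. *)
fun appr f l = List.app f (List.rev l)

(* Indexed variant: each element keeps its original position. *)
fun appri f l =
  let fun loop (_, []) = ()
        | loop (i, x :: xs) = (loop (i + 1, xs); f (i, x))
  in loop (0, l) end

(* Inverses of fromList for vectors and arrays. *)
fun vectorToList v = Vector.foldr (op ::) [] v
fun arrayToList a = Array.foldr (op ::) [] a
```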
430 - Often using isSome is a bit clumsy. I thus propose adding the dual
431 val isNone : 'a option -> bool
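
As a sketch, the proposed dual is a one-liner over the existing Option.isSome:

```sml
(* True exactly when the option carries no value. *)
fun isNone opt = not (Option.isSome opt)
```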
433 * STRING and SUBSTRING:
434 - For historical reasons we have {String,Substring}.size instead
435 of *.length, which is inconsistent with all other aggregates and
436 frequently lets me mix them up when I use them side by side.
437 I propose adding aliases
442 * WideChar and WideString:
443 - There is no convenient way to convert between the standard and
444 wide character set. Would it be reasonable to introduce LargeChar
445 and LargeString structures (and so on) and have the CHAR and
446 STRING signatures enriched by fromLarge/toLarge functions, as for
447 numbers? That would also allow a program to select the widest
448 character set available (which is currently impossible within the
452 - I don't quite see the rationale for which signatures contain a
453 scan function and which don't. I believe it makes sense to have
454 scan in every signature that has fromString.
455 - There should be a function
456 val scanC : (Char.char, 'a) StringCvt.reader
457 -> (char, 'a) StringCvt.reader
458 to scan strings as C characters. This would make Char.fromCString
459 and particularly String.fromCString more modular.
460 - How about a dual writer abstraction as with
461 type ('a,'b) writer = 'a * 'b -> 'b option
462 and supporting fmt functions for basic types? Such a thing might
463 be useful for writing to streams or buffers.
466 For some time now I have been trying to use vectors more often
467 instead of an often inappropriate list representation. This is
468 sometimes made more difficult simply because the library support
469 isn't as good as for lists. It improved in the updated version
474 - Vector.append (though I guess concat is good enough),
475 - most of all: a VectorPair structure.
478 - Giving every basic type a (default) hash function in addition to
479 comparison would be quite useful in conjunction with container
482 * There is no defining structure for references. I would like to see
486 datatype ref = datatype ref
488 val := : 'a ref * 'a -> unit
489 val swap : 'a ref * 'a ref -> unit (* or :=: ? *)
490 val map : ('a -> 'a) -> 'a ref -> 'a ref
491 You might then consider removing ! and := from GENERAL.
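
One way such a structure might look is sketched below. Several choices are assumptions rather than part of the proposal: the type is re-exported with a plain type abbreviation instead of datatype replication, swap exchanges the contents of the two references, and map allocates a fresh reference instead of updating its argument in place.

```sml
structure Ref = struct
  type 'a ref = 'a ref            (* abbreviation for the top-level type *)

  val ! = !
  val op := = op :=

  (* Exchange the contents of two references. *)
  fun swap (r1, r2) =
    let val x = !r1 in r1 := !r2; r2 := x end

  (* Assumed reading: return a new reference holding f applied to the
     old contents, leaving the argument untouched. *)
  fun map (f : 'a -> 'a) (r : 'a ref) : 'a ref = ref (f (!r))
end
```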
493 * Signature conventions:
494 Some additional conventions would make use of Basis types as
495 functor arguments more convenient:
496 - Each signature defining an abstract type should make that
497 type available under the alias "t" as well (this includes
498 monomorphic types as well as polymorphic ones).
499 - Every equality type should come with an explicit equality
501 val eq : t * t -> bool
502 to move away from the reliance on eqtypes.
503 - There should be a uniform name for canonical constructor
504 functions, e.g. "new" (or at least an alias).
507 Andreas Rossberg, rossberg@ps.uni-sb.de
509 ******************************************************************************
510 ******************************************************************************
512 Date: Fri, 2 Aug 2002 14:04:16 +0100
513 From: David Matthews <David.Matthews@deanvillage.com>
516 I've been having another look at the Basis library implementation in
517 Poly/ML and in particular the I/O library. I'm still not sure I fully
518 understand the implications of the Stream IO (functional IO) layer and
519 in particular the way "canInput" works and interacts with "input".
521 The definition says that canInput(f, n) returns SOME k "if a call to
522 input would return immediately with at least k characters".
523 Specifically it does not say "if a call to inputN(f, k) would return
524 immediately". Secondly it says that it "should attempt to return as
525 large a k as possible" and gives the example of a buffer containing 10
526 characters with the user calling canInput(f, 15). This suggests that a
527 call to canInput could have the effect of committing the stream since a
528 perfectly good implementation of "input" would be to return what was
529 left of the buffer, i.e. 10 characters, and only read from the
530 underlying stream on a subsequent call to "input". Yet after a call to
531 canInput(f, 15) which returns SOME 15 the call to "input" is forced to
532 return at least 15. In other words a call to canInput changes the
533 behaviour of a subsequent call to "input". Generally, what is the
534 behaviour of canInput with an argument larger than the buffer size? How
535 far ahead is canInput expected to read?
537 A few other notes of things I've discovered, some of which are trivial:
539 The signature for TextIO.StreamIO contains duplicates of
540 where type StreamIO.reader = TextPrimIO.reader
541 where type StreamIO.writer = TextPrimIO.writer
543 There are declared constants for platformWin32Windows2000 and
544 platformWin32WindowsXP in the Windows structure. When I proposed the
545 Windows.Config structure I didn't include constants for these versions
546 of the OS because the underlying GetVersionEx function returns the same
547 value, VER_PLATFORM_WIN32_NT in the dwPlatformId field for NT, Windows
548 2000 and XP. It is possible to distinguish these but only using the
549 major and minor version fields. Windows CE does give a different value
550 for the platformID. I would say it is confusing to have these here
551 because it implies that it's possible to discriminate on the basis of
552 the platformID field.
554 The example definition of input1 at the bottom of STREAM_IO returns a
555 value of type elem option * instream when the signature says it should
556 be (elem * instream) option.
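
A sketch of input1 with the type the signature asks for, written in terms of inputN and specialized to TextIO.StreamIO so that it can be tried out directly. The abbreviation S and the value first are illustrative, and this is not the text of the example in the specification.

```sml
structure S = TextIO.StreamIO

(* Returns NONE at end of stream, otherwise the next element paired
   with the rest of the stream. *)
fun input1 (f : S.instream) : (S.elem * S.instream) option =
  let val (v, f') = S.inputN (f, 1)
  in
    if CharVector.length v = 0 then NONE
    else SOME (CharVector.sub (v, 0), f')
  end

(* Illustrative use on an in-memory stream. *)
val first =
  case input1 (TextIO.getInstream (TextIO.openString "ab")) of
      NONE => NONE
    | SOME (c, _) => SOME c
```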
558 Description of "input" function in STREAM_IO signature. The word "ay"
564 ******************************************************************************
565 ******************************************************************************
567 Date: Fri, 11 Oct 2002 17:46:59 -0400 (EDT)
568 From: Matthew Fluet <fluet@CS.Cornell.EDU>
571 Following up my previous post, here is another loose collection of
572 notes I've taken while updating the MLton implementation of the SML
573 Basis Library. This includes the structures that had been grouped
574 under the headings System, Posix, and IO in the "old" web
577 Required and optional components:
578 * The optional functors PrimIO, StreamIO, and ImperativeIO are not
579 listed among the optional components in overview.html.
582 * The discussion for the ListPair structure says:
583 "Note that a function requiring equal length arguments may determine
584 this lazily, i.e. , it may act as though the lists have equal length
585 and invoke the user-supplied function argument, but raise the
586 exception when it arrives at the end of one list before the end of the
588 Such an implementation choice seems to go against the spirit that
589 programs run under conforming implementations of the Basis Library
590 should behave the same.
593 * In posix.html, last sentence in Discussion: "onsult" instead of
596 * In posix-signal.html, in Discussion: "The name of the coressponding
597 ..." sentence is repeated.
599 * In the discussion of POSIX_ERROR:
600 "The name of a corresponding POSIX error can be derived by
601 capitalizing all letters and adding the character ``E'' as a
602 prefix. For example, the POSIX error associated with nodev is
603 ENODEV. The only exception to this rule is the error toobig, whose
604 associated POSIX error is E2BIG."
605 It isn't clear if this is the intended semantics for errorName and
609 * The type time now includes "negative values moving to the past."
610 In the absence of negative values, the text for the
611 to{Seconds,Milliseconds,Microseconds} functions to drop fractions of
612 the time unit was unambiguous. With negative values, I would
613 interpret this as rounding towards zero. Is this correct? Would it
614 be clearer to describe the rounding as such?
615 * The + and - functions are required to raise Overflow, although most
616 other "result not representable as a time value" error raises Time.
617 * The - function is written prefix instead of infix in the
619 * The scan and fromString functions do not specify how to treat a
620 value with greater precision than the internal representation;
621 should it have rounding or truncation semantics? Also, the
622 functions are required to raise Overflow for an unrepresentable
626 * The nice introduction to IO that appears at
627 http://cm.bell-labs.com/cm/cs/what/smlnj/doc/basis/pages/io-explain.html
628 doesn't seem to be included with the new pages.
629 * The functor arguments in PrimIO, StreamIO, and ImperativeIO functors
630 don't match; some use structure A: MONO_ARRAY and others use
631 structure Array: MONO_ARRAY.
634 * The PRIM_IO signature requires pos to be an eqtype, but the PrimIO
635 functor argument only requires pos to be a type.
636 * readArr[NB], write{Vec,Arr}[NB] take "slices" (records of type {buf:
637 {vector,array}, i: int, sz: int option}) but there is no description of the
638 appropriate action to take when the slices are invalid. Presumably,
639 they should raise Subscript.
640 * There are a number of "contradictory" statements:
641 "Readers and writers should not, in general, raise the IO.Io
642 exception. It is assumed that the higher levels will appropriately
643 handle these exceptions."
644 "A reader is required to raise IO.Io if any of its functions, except
645 close or getPos, is invoked after a call to close. A writer is
646 required to raise IO.Io if any of its functions, except close, is
647 invoked after a call to close."
648 "closes the reader and frees operating system resources. Further
649 operations on the reader (besides close and getPos) raise
651 "closes the writer and frees operating system resources. Further
652 operations (other than close) raise IO.ClosedStream."
653 * The augment_reader and augment_writer functions may introduce new
654 functions. Should the synthesized operations handle IO.Io
655 exceptions and change the function field? Maybe this falls under
656 the "intentionally unspecified" clause.
658 StreamIO() and STREAM_IO:
659 * What is the difference between a terminated output stream and a
660 closed output stream? Some operations say what to do when the
661 stream is terminated or closed, but many are unspecified when the
662 other condition holds. I resolved this by looking at the IO
663 introduction mentioned above, where it discusses stream states.
664 But, closeOut is still confusing: "flushes f's buffers, marks the
665 stream closed, and closes the underlying writer. This operation has
666 no effect if f is already closed. If f is terminated, it should
667 close the underlying writer." Shouldn't closeOut always execute the
668 underlying writer's close function? The only way to terminate an
669 outstream is to getOutstream, but I would really expect
670 TextIO.closeOut to "really" close the underlying
671 file/outstream/writer.
672 * The IO structure has dropped the TerminatedStream exception, but
673 there seem to be sufficient cases when a stream should raise an
674 exception when it is terminated.
675 * The semantics of the vector returned by getReader are unclear. At
676 the very least, the source code for SML/NJ and Poly/ML have very
677 different interpretations, and I've chosen yet another. I think
678 part of the problem is that the word "[un]consumed" only appears in
679 the description of this function, so it's unclear what corresponds
681 * I suspect the example under endOfStream is wrong:
683 In these cases the StreamIO.instream will also have multiple EOF's;
684 that is, it can be that
686 val true = endOfStream(f)
687 val ("",f') = input f
688 val true = endOfStream(f')
689 val ("xyz",f'') = input f
691 The fact that input f can return two different values would seem to
692 violate the principal argument for functional streams! Looking at
693 the aforementioned IO introduction in the "old" pages, I see the
694 more reasonable example:
696 Consequently, the following is not guaranteed to be true:
698 let val z = TextIO.StreamIO.endOfStream f
699 val (a,f') = TextIO.StreamIO.input f
700 val x = TextIO.StreamIO.endOfStream f'
701 in x=z (* not necessarily true! *)
704 whereas the following is guaranteed to be true:
706 let val z = TextIO.StreamIO.endOfStream f
707 val (a,f') = TextIO.StreamIO.input f
708 val x = TextIO.StreamIO.endOfStream f (* note, no prime! *)
709 in x=z (* guaranteed true! *)
711 * David Matthews's post on Aug. 2 raised questions about canInput
712 which are unresolved.
715 * Various operations in IO take "slices", but aren't expressed in
716 terms of {Vector,Array}Slice structures. One difficulty with this
717 is that the slice types are not in scope within the IO signatures.
719 I would really advocate making the VectorSlice structure a
720 substructure of the Vector structure (and likewise for arrays).
721 Even if this isn't done for the polymorphic vector/array structures,
722 it would be extremely beneficial for the monomorphic structures,
723 where in the {Prim,Stream,Imperative}IO functors, it is impossible
724 to access the corresponding monomorphic vector/array slice
725 structures. I found myself using Vector.tabulate when I really
726 wanted ArraySlice.vector.
728 The "old" MONO_ARRAY signature included structure Vector:
729 MONO_VECTOR which gave access to the corresponding monomorphic
734 ******************************************************************************
735 ******************************************************************************
737 Date: Fri, 13 Dec 2002 15:57:55 +0100
738 From: Andreas Rossberg <rossberg@ps.uni-sb.de>
741 Here is a collection of issues and comments we gathered when
742 implementing the I/O stack from the Standard Basis (primitive, stream,
743 imperative I/O) for Alice. While in general the specification seems to
744 be pretty precise and complete, we sometimes found it hard to understand
745 the semantic details of stream I/O, especially since many of them can
746 only be derived indirectly from the examples in the discussion section
747 and there appear to be some minor ambiguities and inconsistencies. Also,
748 the PrimIO and StreamIO functors cannot always be implemented as
749 suggested, because of their parametricity in types such as position and
752 As a general note, the I/O interface does not seem to have been designed
753 with concurrency in mind. In particular, augmenting readers and writers
754 cannot be made thread-safe, AFAWCS. This is a bit of a problem for us,
755 since Alice is relying on concurrency. However, that does not seem to be
756 an issue easily solved.
758 - Leif Kornstaedt, Andreas Rossberg
766 - function field: (pedantic) The wording seems to imply that only
767 functions from STREAM_IO raise the Io exception, but this is
768 clearly not the case (consider TextIO.openIn to name just one).
770 * datatype buffer_mode:
772 - There is no specification of what precisely line buffering is
773 supposed to mean, in particular for non-text streams.
777 The PRIM_IO signature
778 ---------------------
782 - (pedantic) It says that "higher level I/O facilities do not
783 access the OS structure directly...". That's somewhat misleading
784 since OS does not provide the same functionality anyway (if any
785 structure does, it is the Posix structure).
789 - Unlike for writers, it is not specified what the minimal set of
790 operations is that a reader must support.
792 - It is not specified whether multiple end-of-streams may occur.
793 Since they are anticipated for StreamIO, one should expect them
794 to be possible for underlying readers as well. However, this
795 requires clarification of the semantics of several operations.
797 - readArr, readArrNB: It is specified nowhere what the option for
798 sz is supposed to mean, i.e. what the semantics of NONE is
799 (presumably as for slices).
801 - readVec, readVecNB: Unlike all other similar read and write
802 functions, these two do not accept an option for the size
805 - avail: The description suggests that the function can be used as
806 a hint by inputAll. However, this information is too inaccurate
807 to be useful, since (apart from translation issues) the physical
808 size of elements cannot be obtained (in particular in the
809 StreamIO functor, which is parametric in the element type). In
810 practice, endPos seems to be more useful for this purpose. So it
811 is not clear what purpose avail could actually serve at all at
812 the abstraction level provided by readers.
815 (1) May it block? For example, when reading from a terminal or
816 from another kind of stream, this can be naturally expected.
818 (2) Which position is returned if there are multiple
821 - getPos, setPos, endPos, verifyPos: Description should start with
824 - setPos, endPos: Should not raise an exception if unimplemented,
825 but rather be NONE. Actually, the implementation notes on writers
826 state that endPos *must* be implemented for readers.
828 - Implementation note, item 6: Why is it likely that the client
829 uses getPos frequently? And why should the reader count
830 *untranslated* elements (and how would there be actual elements
832 (See also comments on STREAM_IO.filePosIn)
836 - writeVec, writeArr, writeVecNB, writeArrNB:
837 (1) Again, it is not specified what the optional size means.
839 (2) When may k < sz occur without an I/O failure? If it is
840 arbitrary, then there appears to be no correct way to write a
841 sequence of elements, because it is neither possible to detect
842 partial element writes (which are explained in the paragraph
843 before the Implementation Notes), nor to complete such writes.
844 This particularly implies that the StreamIO functor cannot
845 implement flushing correctly (see below).
847 - getPos, setPos, endPos, verifyPos: Description should start with
850 - getPos, setPos: Should not raise an exception if unimplemented,
853 - last paragraph before Implementation Note: Typo, double "plus".
855 - first sentence in Implementation Note: (pedantic) Why is this
856 put into the implementation notes when it actually seems to be a
857 requirement of the specification?
859 - last paragraph of Implementation Note:
860 (1) States that readers must implement getPos, which seems to be
861 contradicted by its optional type.
863 (2) Typo, double "need".
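
The partial-write question raised for writeVec/writeArr above can be
illustrated with the obvious retry loop. This is only a sketch: it assumes
a slice-based write function of type CharVectorSlice.slice -> int, and the
name writeAll is hypothetical. The loop resubmits whatever was not
accepted, but it has no way to notice or repair a partially written
element, which is exactly the gap described above.

```sml
(* writeAll : (CharVectorSlice.slice -> int) -> CharVectorSlice.slice -> unit
   Keep calling the (assumed) write function until the slice is empty. *)
fun writeAll writeVec sl =
  if CharVectorSlice.length sl = 0 then ()
  else
    let
      val k = writeVec sl   (* number of elements accepted this time *)
    in
      if k <= 0
      then raise Fail "writeAll: writer made no progress"
      else writeAll writeVec (CharVectorSlice.subslice (sl, k, NONE))
    end
```

The guard against k <= 0 is an addition of this sketch, so that a
misbehaving writer cannot make the loop spin forever.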
867 - Is this supposed to support random access? Note that for types
868 generated with the PrimIO functor it cannot (see below)! That
869 seems to make this function rather useless.
871 * augmentReader, augmentWriter:
873 - It is not possible to synthesize operations in a way that is
874 thread-safe in concurrent systems, hence it should be noted that
875 augmenting is potentially dangerous.
877 * There is no reference to the PrimIO functor.
886 - Since the implementation is necessarily parametric in the pos
887 type, openVector, nullRd, nullWr cannot create readers that
888 allow random access, although one would expect that at least for
893 - Structure names A and V are inconsistent with the StreamIO and
894 ImperativeIO functors.
896 - Type pos has to be an eqtype to match the result signature.
898 - Since the extract and copy functions have been removed/changed
899 from ARRAY and VECTOR signatures, the PrimIO functor now
900 naturally requires slice structures for efficient
901 implementation. (Likewise the StreamIO functor)
905 - Type sharing of the pos type is not specified, though essential
906 for this functor being useful at all.
911 The STREAM_IO signature
912 -----------------------
916 - An exception likely to be raised by the underlying
917 reader/writer is Size, which is not mentioned. OTOH, Fail can
918 only occur in the rare case of user-supplied readers/writers, as
919 the Basis itself is supposed to never raise it.
923 - A note on the meaning of this type would be desirable, since its
924 canonical representation is (outstream * pos) rather than pos.
925 (That also may have caused confusion in the discussion of
926 imperative I/O, see below.)
930 - The signature of this function is inconsistent with all other
931 input functions. It should rather have type
933 instream -> elem option * instream
935 which in fact appears to be the type assumed in the discussion
936 example relating input1 to inputN.
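
For comparison, the proposed result type can be sketched on top of inputN
for text streams. The name input1' is hypothetical, and the code assumes
that TextIO.StreamIO.vector is string, as it is for the standard TextIO
structure.

```sml
structure S = TextIO.StreamIO

(* input1' : S.instream -> S.elem option * S.instream
   Same shape as input and inputN: always returns the rest of the stream. *)
fun input1' (f : S.instream) : S.elem option * S.instream =
  let
    val (v, f') = S.inputN (f, 1)
  in
    (if String.size v = 0 then NONE else SOME (String.sub (v, 0)), f')
  end
```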
944 - This function is somewhat underspecified for n=0. In particular,
945 may it block? Is it required to raise Io if the underlying
948 * input, input1, inputN, inputAll:
950 - (pedantic) Descriptions speak of "underlying system calls",
951 although the reader may not actually depend on system calls.
952 Preferably speak of "underlying reader" only.
956 - Likewise, description speaks of "releasing system resources".
957 This should be replaced by saying that it closes the underlying
958 reader (which is not even specified as is).
962 - Does the function attempt to close the stream even if flushing
965 - Why is it possible to close terminated streams? That seems to
966 allow unfortunate interference with another stream that has been
967 created from the extracted writer.
969 * mkInstream, getReader:
971 - The table seems to imply that mkInstream always augments its
972 reader. This is inappropriate for concurrent environments (see
975 - Should getReader return the original or the augmented reader?
977 - The table still includes the removed getPosIn and setPosIn
980 * mkOutstream, getWriter:
986 - There seems to be no way to implement this function for buffered
987 I/O, because the reader position that corresponds to a
988 mid-block-element is not available and cannot be calculated in
989 general. So how is this meant?
991 - Typo, s/character/element/
999 - It is non-obvious what the precise meaning of "terminating" a
1000 stream is. If this is merely setting a status flag then a
1001 corresponding note would be helpful.
1005 - May this flush the stream (and hence raise Io exceptions)?
1009 - This may raise an exception because the position has been
1010 invalidated after obtaining it (e.g. by file truncation
1011 performed by another process).
1013 - Typo, s/underlying device/underlying writer/
1015 * setBufferMode, getBufferMode:
1017 - There is no specification of the semantics of line buffering, in
1018 particular for non-text streams.
1019 (See also comments on StreamIO functor)
1021 - It is not specified whether the stream may be flushed when set
1022 to LINE_BUF mode (may cause Io exception). It seems unreasonable
1023 to require it not to do so (assuming that line buffering is
1024 intended to maintain the invariant that the buffer never
1025 contains line breaks).
1027 - The synopsis of this function uses "ostr", while all others
1028 use "f" for streams.
1030 * setPosOut, setBufferMode, getWriter:
1032 - Can raise an exception if flushing fails.
1036 - The statement that closing a stream just causes the
1037 not-yet-determined part of the stream to be empty should
1038 probably be generalised to explain what *truncating* a stream
1039 means (getReader also truncates the stream).
1041 - Example of freshly opened stream:
1042 s/mkInstream r/mkInstream(r, vector [])/
1046 s/mkInstream r/mkInstream(r, vector [])/
1049 - input1/inputN relation example:
1050 (1) Inconsistent with the actual typing of input1 (see above).
1052 (2) Typo, s/inputN f/inputN(f,1)/
1054 - Unbuffered I/O, 1st example:
1056 s/mkInstream(reader)/mkInstream(reader, vector [])/
1057 s/PrimIO.Rd{chunkSize,...}/(PrimIO.RD{chunkSize,...}, v)/
1059 (2) More importantly, the actual condition appears to be
1060 incorrect. It should read:
1061 (chunkSize > 1 orelse length v = 1) andalso endOfStream f'
1063 - Unbuffered I/O, 2nd example:
1064 s/mkInstream(reader)/mkInstream(reader, vector [])/
1065 s/PrimIO.Rd{chunkSize,...}/(PrimIO.RD{chunkSize,...}, v)/
1066 The condition must be corrected as above.
1068 * There is no reference to the StreamIO functor.
1072 The StreamIO functor
1073 --------------------
1077 - It is impossible for this functor to support line buffering,
1078 since it has no way of knowing which element constitutes a line
1079 break. This could be solved by changing the someElem functor
1080 argument to a breakElem argument.
1082 - It is also impossible to utilize reader's endPos for
1083 pre-allocation, because the functor is parametric in the
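
The breakElem suggestion above might look roughly as follows. This is a
sketch only: the functor LineCheck and the sameElem argument are
hypothetical. An explicit equality test is passed in because elem is not
an eqtype in MONO_VECTOR, so a breakElem value alone would not be enough
to recognise a line break.

```sml
(* Hypothetical functor taking a line-break element instead of someElem. *)
functor LineCheck (structure Vector : MONO_VECTOR
                   val breakElem : Vector.elem
                   val sameElem : Vector.elem * Vector.elem -> bool) =
struct
  (* True if the vector contains at least one line-break element, i.e.
     a line-buffered stream would have to flush after writing it. *)
  fun containsBreak v =
    Vector.exists (fn e => sameElem (e, breakElem)) v
end

structure TextCheck =
  LineCheck (structure Vector = CharVector
             val breakElem = #"\n"
             val sameElem = fn (a : char, b) => a = b)
```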
1088 - Since the extract and copy functions have been removed/changed
1089 from ARRAY and VECTOR signatures, the StreamIO functor now
1090 naturally requires slice structures for efficient
1091 implementation. (Likewise the PrimIO functor)
1095 - Type sharing of the result types is not specified.
1097 * Discussion, paragraph on flushing:
1099 - Most of this discussion rather belongs to the description of
1102 - Everything said here is not restricted to flushOut, but applies
1103 to flushing in general.
1105 - Unfortunately, it is left unspecified where flushing may happen
1106 and, consequently, where respective Io exceptions may occur.
1108 - Write retries as suggested here seem to be impossible to
1109 implement correctly using the writer interface as specified (see
1110 comments on PRIM_IO.writer).
1112 - According to the writer description, write operations may never
1113 return an element count of 0, so the last sentence is
1116 * Discussion, last paragraph:
1120 * Implementation note:
1122 - 3rd bullet: typo, s/PrimIO.augmentIn/PrimIO.augmentReader/
1124 - 5th and 6th bullet: The endPos function cannot be utilized as
1125 suggested, because the functor is necessarily parametric in the
1130 The IMPERATIVE_IO signature
1131 ---------------------------
1135 - It is unfortunate that imperative I/O is asymmetric with respect
1136 to providing (limited) random access on input vs. output streams
1137 - the former requires going down to the lower-level stream I/O.
1138 That makes imperative I/O a somewhat incomplete abstraction
1141 - Likewise, it would be desirable if there were ways for
1142 performing full-fledged random access without leaving the
1143 imperative I/O abstraction layer, at least for streams where it
1144 is suitable (e.g. BinIO). Despite the statement in the
1145 discussion this is neither available for input nor for output
1146 streams (see comments below).
1150 - Typo, s/S.closeIn/StreamIO.closeIn/
1154 - Typo, s/S.flushOut/StreamIO.flushOut/
1158 - Typo, s/S.closeOut/StreamIO.closeOut/
1162 - Equivalences, last line: s/StreamIO.output/StreamIO.flushOut/
1164 - Paragraph about random-access on output streams: It says that
1165 BinIO.StreamIO.out_pos = Position.int. This is not true, we have
1166 BinPrimIO.pos = Position.int, but that is a completely different
1167 type. In fact, it is impossible to implement out_pos as
1170 * There is no reference to the ImperativeIO functor.
1174 The ImperativeIO functor
1175 ------------------------
1179 - The Array argument is unnecessary.
1183 - Type sharing of the result types is not specified.
1187 The TEXT_STREAM_IO signature
1188 ----------------------------
1192 - Why bother separating this signature from STREAM_IO?
1193 => outputSubstr can easily be generalised to outputSlice
1195 => if line buffering is part of STREAM_IO, inputLine
1200 The TextIO structure
1201 --------------------
1205 - Systems providing WideText should also provide a WideTextIO
1206 structure (they have to provide WideTextPrimIO already, which
1207 seems inconsistent).
1211 - Duplicated type constraints for StreamIO.reader and
1217 --------------------
1221 - Type sharing with BinPrimIO is not specified (unlike for
1222 TextIO), i.e. the following constraints are missing:
1224 where type StreamIO.reader = BinPrimIO.reader
1225 where type StreamIO.writer = BinPrimIO.writer
1226 where type StreamIO.pos = BinPrimIO.pos
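
A short sketch of client code that depends on these constraints. The
function name fromReader is hypothetical. It type-checks only if
BinIO.StreamIO.reader and BinPrimIO.reader are the same type, which is
what the first missing constraint would guarantee.

```sml
(* Wrap a binary primitive reader as an imperative BinIO input stream.
   The empty vector is the initial buffer contents for mkInstream. *)
fun fromReader (rd : BinPrimIO.reader) : BinIO.instream =
  BinIO.mkInstream
    (BinIO.StreamIO.mkInstream (rd, Word8Vector.fromList []))
```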
1228 ******************************************************************************
1229 ******************************************************************************
1230 ******************************************************************************
1231 ******************************************************************************
1233 Doing host/network byte order conversions on ML side.
1236 * Semantics of setNBIO, getNREAD, getATMARK are unclear;
1237 Don't seem to be accessible via {get,set}sockopt;
1238 Instead, using ioctl.
1240 ******************************************************************************
1241 ******************************************************************************
1244 * Within structure S, the type mode is constrained equal to flags,
1245 but flags is an eqtype.
1248 * "This is the type of positions in the underlying readers and
1249 writers. In some instantiations of this signature (e.g.,
1250 TextIO.StreamIO), pos is abstract; in others (e.g., BinIO.StreamIO)
1251 it is Position.int." But, the equality of BinIO.StreamIO.pos and
1252 Position.int is never specified in any where constraint of BinIO.
1253 * How can filePosIn be implemented with completely abstract pos?
1257 * (In general, probably a good idea to look at the entire top-level
1258 structure/signature matches and choose a consistent usage of base
1259 types. For example, Int:>INTEGER would seem to hide the top-level
1260 int; unless Int is opened afterwards. But, then what about all the
1261 other structures that reference int? Is top-level int = Int.int or
1262 is Int.int = top-level int.)
1263 --> I think I'm biased from looking at the MLton implementation,
1264 because I'm finding it hard to think about how to really express all
1265 of the sharing constraints in a way that will be acceptable. This
1266 might be the wrong way to look at things: the listing of structures
1267 and signatures with clauses doesn't correspond to a build order, it
1268 corresponds to the way the environment should look to the program.
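
The concern about opaque matches hiding base types can be seen in a small
sketch. All names here (NUM, Impl, Hidden, Shared) are hypothetical and
stand in for INTEGER and Int.

```sml
signature NUM =
sig
  type int
  val fromInt : Int.int -> int
end

structure Impl =
struct
  type int = Int.int
  fun fromInt (x : Int.int) = x
end

(* Opaque match without a where clause: Hidden.int is a new abstract
   type, so (Hidden.fromInt 1 + 2) would be rejected by the type checker. *)
structure Hidden :> NUM = Impl

(* With the where clause the equality with the top-level int is visible. *)
structure Shared :> NUM where type int = Int.int = Impl

val three = Shared.fromInt 1 + 2
```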
1270 Sequences and Slices:
1271 Why not existsi, alli?
1274 Why no vector: int * 'a -> 'a vector?
1279 If one defines VECTOR_SLICE by including a type 'a vector and replacing
1280 'a Vector.vector with the local 'a vector, but then binds
1281 structure Vector: VECTOR
1282 structure VectorSlice: VECTOR_SLICE where type 'a vector = 'a Vector.vector
1283 at the top-level, does one violate the basis spec?
1284 Rationale: it's easiest to implement Vector and VectorSlice
1285 simultaneously, say with VectorSlice as a substructure of Vector (in
1286 fact, with all of the Vector operations being dispatched to the
1287 corresponding VectorSlice ops with full slices), so Vector isn't in
1288 scope for the VECTOR_SLICE.
1289 *** No, it's not o.k., because opening VectorSlice will introduce a binding
1290 for 'a vector; but, if we're lucky, John will accept the proposal.
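
A cut-down sketch of the arrangement discussed above, using the
hypothetical names MY_VECTOR_SLICE and MyVectorSlice with only two
operations. It shows both the where type binding and the extra type
binding that open brings into scope.

```sml
signature MY_VECTOR_SLICE =
sig
  type 'a vector          (* local type instead of 'a Vector.vector *)
  type 'a slice
  val full : 'a vector -> 'a slice
  val length : 'a slice -> int
end

structure MyVectorSlice
  :> MY_VECTOR_SLICE where type 'a vector = 'a Vector.vector =
struct
  type 'a vector = 'a Vector.vector
  type 'a slice = 'a vector * int * int   (* base, start, length *)
  fun full v = (v, 0, Vector.length v)
  fun length (_, _, n) = n
end

local
  (* Opening the structure also binds the type name vector, which the
     signature in the specification would not do. *)
  open MyVectorSlice
in
  val n = length (full (Vector.fromList [1, 2, 3]))
end
```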
1293 toString prepends a #"~" even when the class is NAN?
1294 *** I guess this is o.k.; there is an explicit sign field.
1297 structure Pack<N>Big :> PACK_WORD (* OPTIONAL *)
1298 structure Pack<N>Little :> PACK_WORD (* OPTIONAL *)
1300 val subVec : Word8Vector.vector * int -> LargeWord.word
1301 i.e., reference to LargeWord.word.
1305 val subVec : Word8Vector.vector * int -> word
1307 structure Pack<N>Big :> PACK_WORD with word = Word<N>.word (* OPTIONAL *)
1308 Should there be PackBig and PackLittle with word = Word.word?
1309 Should there be PackLargeBig with word = LargeWord.word?
1310 There aren't many structures that refine on LargeXYZ; most refine on XYZ<N>.
1311 *** O.k., we always unpack into a LargeWord, which we could then
1312 Word<N>.fromLargeWord back to the size. I guess this is o.k.; it
1313 lets an implementation give more Pack<N>Big structures than there
1314 are Word<N> structures.
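
As an illustration of the unpack-then-narrow step, here is a sketch that
assumes the implementation provides the optional PackWord32Big and Word32
structures. It uses Word32.fromLarge, the current name for the
fromLargeWord conversion mentioned above.

```sml
val bytes = Word8Vector.fromList [0wx12, 0wx34, 0wx56, 0wx78]

(* subVec yields a LargeWord.word; narrow it back to 32 bits. *)
val w : Word32.word = Word32.fromLarge (PackWord32Big.subVec (bytes, 0))
```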
1317 + why are Int32_gtu and Int32_geu primitive?
1318 Why not just Word.fromInt and use Word comparisons?
1319 + Real:>REAL doesn't match basis because it may perform
1320 arithmetic at extended precision. Should this be mentioned
1322 + QUESTION: proc-env.sml
1323 + QUESTION: char.sml
1324 + check uses of {Vector,Array}Slice.slice for replacement by unsafeSlice.
1327 ******************************************************************************
1328 ******************************************************************************
1331 I'm not quite sure how the ('a, 'b) proc type is supposed to work in
1332 practice; the old Unix structure just used them as
1333 TextIO.{in,out}streams. My suspicion is that we're supposed to use
1334 Posix.IO.mk{Bin,Text}{Reader,Writer} functions and then use the type
1335 system to ensure that if we force a stream to be bin or text, then all
1336 other uses have to be the same. I also suspect that we're only
1337 supposed to lift the file_desc up to an instream/outstream once; i.e.,
1338 multiple textInstreamOf calls should continue to return the same
1339 TextIO.instream. That would seem to suggest we need an 'a option ref
1340 that can be banged at the first call to a streamOf function, and
1341 subsequent calls just return the value there.
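
The 'a option ref idea can be sketched as a small helper. The name once is
hypothetical, and this only shows the caching pattern, not the actual
Unix structure.

```sml
(* once make returns a function that runs make on its first call and
   hands back the cached result on every later call. *)
fun once (make : unit -> 'a) : unit -> 'a =
  let
    val cache = ref NONE
  in
    fn () =>
      case !cache of
          SOME x => x
        | NONE =>
            let val x = make ()
            in cache := SOME x; x
            end
  end
```

A proc value could then hold one such function per stream, so that
repeated textInstreamOf calls would return the same TextIO.instream.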
1345 return a text or binary instream connected to the standard output
1346 stream of the process pr. Note the multiple calls to these
1347 functions on the same proc will result in multiple streams that
1348 all share the same underlying Unix stream.
1352 return a text or binary outstream connected to the standard input
1353 stream of the process pr. Note the multiple calls to these
1354 functions on the same proc will result in multiple streams that
1355 all share the same underlying Unix stream.
1358 returns a pair of input and output text streams associated with
1359 pr. This function is equivalent to (textInstream pr, textOutstream
1360 pr) and is provided for backward compatibility.