@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009,
-@c 2010, 2011 Free Software Foundation, Inc.
+@c 2010, 2011, 2013 Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.
@node Input and Output
* Port Types:: Types of port and how to make them.
* R6RS I/O Ports:: The R6RS port API.
* I/O Extensions:: Using and extending ports in C.
+* BOM Handling:: Handling of Unicode byte order marks.
@end menu
@end defvr
@deffn {Scheme Procedure} port-encoding port
-@deffnx {C Function} scm_port_encoding
+@deffnx {C Function} scm_port_encoding (port)
Returns, as a string, the character encoding that @var{port} uses to interpret
its input and output. The value @code{#f} is equivalent to @code{"ISO-8859-1"}.
@end deffn
created.
@end deffn
+@deffn {Scheme Variable} %default-port-conversion-strategy
+The fluid that defines the conversion strategy for newly created ports,
+and for other conversion routines such as @code{scm_to_stringn},
+@code{scm_from_stringn}, @code{string->pointer}, and
+@code{pointer->string}.
+
+Its value must be one of the symbols described above, with the same
+semantics: @code{'error}, @code{'substitute}, or @code{'escape}.
+
+When Guile starts, its value is @code{'substitute}.
+
+Note that @code{(set-port-conversion-strategy! #f @var{sym})} is
+equivalent to @code{(fluid-set! %default-port-conversion-strategy
+@var{sym})}.
+@end deffn
@node Reading
@deffn {Scheme Procedure} unread-char cobj [port]
@deffnx {C Function} scm_unread_char (cobj, port)
-Place @var{char} in @var{port} so that it will be read by the
+Place character @var{cobj} in @var{port} so that it will be read by the
next read operation. If called multiple times, the unread characters
will be read again in last-in first-out order. If @var{port} is
not supplied, the current input port is used.
@var{message} can contain @code{~A} (was @code{%s}) and
@code{~S} (was @code{%S}) escapes. When printed,
the escapes are replaced with corresponding members of
-@var{ARGS}:
+@var{args}:
@code{~A} formats using @code{display} and @code{~S} formats
using @code{write}.
If @var{destination} is @code{#t}, then use the current output
@deffn {Scheme Procedure} seek fd_port offset whence
@deffnx {C Function} scm_seek (fd_port, offset, whence)
-Sets the current position of @var{fd/port} to the integer
+Sets the current position of @var{fd_port} to the integer
@var{offset}, which is interpreted according to the value of
@var{whence}.
@defvar SEEK_END
Seek from the end of the file.
@end defvar
-If @var{fd/port} is a file descriptor, the underlying system
+If @var{fd_port} is a file descriptor, the underlying system
call is @code{lseek}. @var{port} may be a string port.
The value returned is the new position in the file. This means
@deffn {Scheme Procedure} ftell fd_port
@deffnx {C Function} scm_ftell (fd_port)
Return an integer representing the current position of
-@var{fd/port}, measured from the beginning. Equivalent to:
+@var{fd_port}, measured from the beginning. Equivalent to:
@lisp
(seek port 0 SEEK_CUR)
@end lisp
@end deffn
+In the past, Guile did not have a procedure that would just read out all
+of the characters from a port. As a workaround, many people just called
+@code{read-delimited} with no delimiters, knowing that would produce the
+behavior they wanted. This prompted Guile developers to add some
+routines that would read all characters from a port. So it is that
+@code{(ice-9 rdelim)} is also the home for procedures that can reading
+undelimited text:
+
+@deffn {Scheme Procedure} read-string [port] [count]
+Read all of the characters out of @var{port} and return them as a
+string. If the @var{count} is present, treat it as a limit to the
+number of characters to read.
+
+By default, read from the current input port, with no size limit on the
+result. This procedure always returns a string, even if no characters
+were read.
+@end deffn
+
+@deffn {Scheme Procedure} read-string! buf [port] [start] [end]
+Fill @var{buf} with characters read from @var{port}, defaulting to the
+current input port. Return the number of characters read.
+
+If @var{start} or @var{end} are specified, store data only into the
+substring of @var{str} bounded by @var{start} and @var{end} (which
+default to the beginning and end of the string, respectively).
+@end deffn
+
Some of the aforementioned I/O functions rely on the following C
primitives. These will mainly be of interest to people hacking Guile
internals.
strongly recommended that file ports be closed explicitly when no
longer required (@pxref{Ports}).
-@deffn {Scheme Procedure} open-file filename mode
+@deffn {Scheme Procedure} open-file filename mode @
+ [#:guess-encoding=#f] [#:encoding=#f]
+@deffnx {C Function} scm_open_file_with_encoding @
+ (filename, mode, guess_encoding, encoding)
@deffnx {C Function} scm_open_file (filename, mode)
Open the file whose name is @var{filename}, and return a port
representing that file. The attributes of the port are
Scheme character.
To provide this property, the file will be opened with the 8-bit
-character encoding "ISO-8859-1", ignoring any coding declaration or port
-encoding. @xref{Ports}, for more information on port encodings.
+character encoding "ISO-8859-1", ignoring the default port encoding.
+@xref{Ports}, for more information on port encodings.
Note that while it is possible to read and write binary data as
characters or strings, it is usually better to treat bytes as octets,
because of its port encoding ramifications.
@end table
-If a file cannot be opened with the access
-requested, @code{open-file} throws an exception.
+Unless binary mode is requested, the character encoding of the new port
+is determined as follows: First, if @var{guess-encoding} is true, the
+@code{file-encoding} procedure is used to guess the encoding of the file
+(@pxref{Character Encoding of Source Files}). If @var{guess-encoding}
+is false or if @code{file-encoding} fails, @var{encoding} is used unless
+it is also false. As a last resort, the default port encoding is used.
+@xref{Ports}, for more information on port encodings. It is an error to
+pass a non-false @var{guess-encoding} or @var{encoding} if binary mode
+is requested.
+
+If a file cannot be opened with the access requested, @code{open-file}
+throws an exception.
+
+When the file is opened, its encoding is set to the current
+@code{%default-port-encoding}, unless the @code{b} flag was supplied.
+Sometimes it is desirable to honor Emacs-style coding declarations in
+files@footnote{Guile 2.0.0 to 2.0.7 would do this by default. This
+behavior was deemed inappropriate and disabled starting from Guile
+2.0.8.}. When that is the case, the @code{file-encoding} procedure can
+be used as follows (@pxref{Character Encoding of Source Files,
+@code{file-encoding}}):
-When the file is opened, this procedure will scan for a coding
-declaration (@pxref{Character Encoding of Source Files}). If a coding
-declaration is found, it will be used to interpret the file. Otherwise,
-the port's encoding will be used. To suppress this behavior, open the
-file in binary mode and then set the port encoding explicitly using
-@code{set-port-encoding!}.
+@example
+(let* ((port (open-input-file file))
+ (encoding (file-encoding port)))
+ (set-port-encoding! port (or encoding (port-encoding port))))
+@end example
In theory we could create read/write ports which were buffered
in one direction only. However this isn't included in the
@end deffn
@rnindex open-input-file
-@deffn {Scheme Procedure} open-input-file filename
-Open @var{filename} for input. Equivalent to
+@deffn {Scheme Procedure} open-input-file filename @
+ [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
+
+Open @var{filename} for input. If @var{binary} is true, open the port
+in binary mode, otherwise use text mode. @var{encoding} and
+@var{guess-encoding} determine the character encoding as described above
+for @code{open-file}. Equivalent to
@lisp
-(open-file @var{filename} "r")
+(open-file @var{filename}
+ (if @var{binary} "rb" "r")
+ #:guess-encoding @var{guess-encoding}
+ #:encoding @var{encoding})
@end lisp
@end deffn
@rnindex open-output-file
-@deffn {Scheme Procedure} open-output-file filename
-Open @var{filename} for output. Equivalent to
+@deffn {Scheme Procedure} open-output-file filename @
+ [#:encoding=#f] [#:binary=#f]
+
+Open @var{filename} for output. If @var{binary} is true, open the port
+in binary mode, otherwise use text mode. @var{encoding} specifies the
+character encoding as described above for @code{open-file}. Equivalent
+to
@lisp
-(open-file @var{filename} "w")
+(open-file @var{filename}
+ (if @var{binary} "wb" "w")
+ #:encoding @var{encoding})
@end lisp
@end deffn
-@deffn {Scheme Procedure} call-with-input-file filename proc
-@deffnx {Scheme Procedure} call-with-output-file filename proc
+@deffn {Scheme Procedure} call-with-input-file filename proc @
+ [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
+@deffnx {Scheme Procedure} call-with-output-file filename proc @
+ [#:encoding=#f] [#:binary=#f]
@rnindex call-with-input-file
@rnindex call-with-output-file
Open @var{filename} for input or output, and call @code{(@var{proc}
way if not otherwise referenced.
@end deffn
-@deffn {Scheme Procedure} with-input-from-file filename thunk
-@deffnx {Scheme Procedure} with-output-to-file filename thunk
-@deffnx {Scheme Procedure} with-error-to-file filename thunk
+@deffn {Scheme Procedure} with-input-from-file filename thunk @
+ [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
+@deffnx {Scheme Procedure} with-output-to-file filename thunk @
+ [#:encoding=#f] [#:binary=#f]
+@deffnx {Scheme Procedure} with-error-to-file filename thunk @
+ [#:encoding=#f] [#:binary=#f]
@rnindex with-input-from-file
@rnindex with-output-to-file
Open @var{filename} and call @code{(@var{thunk})} with the new port
The current port setting is managed with @code{dynamic-wind}, so the
previous value is restored no matter how @var{thunk} exits (eg.@: an
exception), and if @var{thunk} is re-entered (via a captured
-continuation) then it's set again to the @var{FILENAME} port.
+continuation) then it's set again to the @var{filename} port.
The port is closed when @var{thunk} returns normally, but not when
exited via an exception or new continuation. This ensures it's still
Calls the one-argument procedure @var{proc} with a newly created output
port. When the function returns, the string composed of the characters
written into the port is returned. @var{proc} should not close the port.
-
-Note that which characters can be written to a string port depend on the port's
-encoding. The default encoding of string ports is specified by the
-@code{%default-port-encoding} fluid (@pxref{Ports,
-@code{%default-port-encoding}}). For instance, it is an error to write Greek
-letter alpha to an ISO-8859-1-encoded string port since this character cannot be
-represented with ISO-8859-1:
-
-@example
-(define alpha (integer->char #x03b1)) ; GREEK SMALL LETTER ALPHA
-
-(with-fluids ((%default-port-encoding "ISO-8859-1"))
- (call-with-output-string
- (lambda (p)
- (display alpha p))))
-
-@result{}
-Throw to key `encoding-error'
-@end example
-
-Changing the string port's encoding to a Unicode-capable encoding such as UTF-8
-solves the problem.
@end deffn
@deffn {Scheme Procedure} call-with-input-string string proc
Calls the zero-argument procedure @var{thunk} with the current output
port set temporarily to a new string port. It returns a string
composed of the characters written to the current output.
-
-See @code{call-with-output-string} above for character encoding considerations.
@end deffn
@deffn {Scheme Procedure} with-input-from-string string thunk
* R6RS Textual Output:: Textual output.
@end menu
-A subset of the @code{(rnrs io ports)} module is provided by the
-@code{(ice-9 binary-ports)} module. It contains binary input/output
-procedures and does not rely on R6RS support.
+A subset of the @code{(rnrs io ports)} module, plus one non-standard
+procedure @code{unget-bytevector} (@pxref{R6RS Binary Input}), is
+provided by the @code{(ice-9 binary-ports)} module. It contains binary
+input/output procedures and does not rely on R6RS support.
@node R6RS File Names
@subsubsection File Names
An exception with this type is raised when one of the operations for
textual output to a port encounters a character that cannot be
translated into bytes by the output direction of the port's transcoder.
-@var{Char} is the character that could not be encoded.
+@var{char} is the character that could not be encoded.
@end deffn
@deffn {Scheme Syntax} error-handling-mode @var{error-handling-mode-symbol}
raised.
@quotation Note
- Only the name of @var{error-handling-style-symbol} is significant.
+ Only the name of @var{error-handling-mode-symbol} is significant.
@end quotation
The error-handling mode of a transcoder specifies the behavior
symbol.
@var{eol-style} may be omitted, in which case it defaults to the native
-end-of-line style of the underlying platform. @var{Handling-mode} may
+end-of-line style of the underlying platform. @var{handling-mode} may
be omitted, in which case it defaults to @code{replace}. The result is
a transcoder with the behavior specified by its arguments.
@end deffn
@end deffn
@deffn {Scheme Procedure} textual-port? port
-Always return @var{#t}, as all ports can be used for textual I/O in
+Always return @code{#t}, as all ports can be used for textual I/O in
Guile.
@end deffn
-@deffn {Scheme Procedure} transcoded-port obj
+@deffn {Scheme Procedure} transcoded-port binary-port transcoder
The @code{transcoded-port} procedure
returns a new textual port with the specified @var{transcoder}.
Otherwise the new textual port's state is largely the same as
@node R6RS Input Ports
@subsubsection Input Ports
-@deffn {Scheme Procedure} input-port? obj@
+@deffn {Scheme Procedure} input-port? obj
Returns @code{#t} if the argument is an input port (or a combined input
and output port), and returns @code{#f} otherwise.
@end deffn
-@deffn {Scheme Procedure} port-eof? port
+@deffn {Scheme Procedure} port-eof? input-port
Returns @code{#t}
if the @code{lookahead-u8} procedure (if @var{input-port} is a binary port)
or the @code{lookahead-char} procedure (if @var{input-port} is a textual port)
@deffnx {Scheme Procedure} open-file-input-port filename file-options
@deffnx {Scheme Procedure} open-file-input-port filename file-options buffer-mode
@deffnx {Scheme Procedure} open-file-input-port filename file-options buffer-mode maybe-transcoder
-@var{Maybe-transcoder} must be either a transcoder or @code{#f}.
+@var{maybe-transcoder} must be either a transcoder or @code{#f}.
The @code{open-file-input-port} procedure returns an
input port for the named file. The @var{file-options} and
end-of-file.
Optionally, if @var{get-position} is not @code{#f}, it must be a thunk
-that will be called when @var{port-position} is invoked on the custom
+that will be called when @code{port-position} is invoked on the custom
binary port and should return an integer indicating the position within
the underlying data stream; if @var{get-position} was not supplied, the
-returned port does not support @var{port-position}.
+returned port does not support @code{port-position}.
Likewise, if @var{set-position!} is not @code{#f}, it should be a
-one-argument procedure. When @var{set-port-position!} is invoked on the
+one-argument procedure. When @code{set-port-position!} is invoked on the
custom binary input port, @var{set-position!} is passed an integer
indicating the position of the next byte is to read.
@deffn {Scheme Procedure} get-bytevector-some port
@deffnx {C Function} scm_get_bytevector_some (port)
-Read from @var{port}, blocking as necessary, until data are available or
-and end-of-file is reached. Return either a new bytevector containing
-the data read or the end-of-file object.
+Read from @var{port}, blocking as necessary, until bytes are available
+or an end-of-file is reached. Return either the end-of-file object or a
+new bytevector containing some of the available bytes (at least one),
+and update the port position to point just past these bytes.
@end deffn
@deffn {Scheme Procedure} get-bytevector-all port
end-of-file object (if no data were available).
@end deffn
+The @code{(ice-9 binary-ports)} module provides the following procedure
+as an extension to @code{(rnrs io ports)}:
+
+@deffn {Scheme Procedure} unget-bytevector port bv [start [count]]
+@deffnx {C Function} scm_unget_bytevector (port, bv, start, count)
+Place the contents of @var{bv} in @var{port}, optionally starting at
+index @var{start} and limiting to @var{count} octets, so that its bytes
+will be read from left-to-right as the next bytes from @var{port} during
+subsequent read operations. If called multiple times, the unread bytes
+will be read again in last-in first-out order.
+@end deffn
+
@node R6RS Textual Input
@subsubsection Textual Input
-@deffn {Scheme Procedure} get-char port
+@deffn {Scheme Procedure} get-char textual-input-port
Reads from @var{textual-input-port}, blocking as necessary, until a
complete character is available from @var{textual-input-port},
or until an end of file is reached.
character is read, @code{get-char} returns the end-of-file object.
@end deffn
-@deffn {Scheme Procedure} lookahead-char port
+@deffn {Scheme Procedure} lookahead-char textual-input-port
The @code{lookahead-char} procedure is like @code{get-char}, but it does
not update @var{textual-input-port} to point past the character.
@end deffn
-@deffn {Scheme Procedure} get-string-n port count
+@deffn {Scheme Procedure} get-string-n textual-input-port count
-@var{Count} must be an exact, non-negative integer object, representing
+@var{count} must be an exact, non-negative integer object, representing
the number of characters to be read.
The @code{get-string-n} procedure reads from @var{textual-input-port},
before an end of file, the end-of-file object is returned.
@end deffn
-@deffn {Scheme Procedure} get-string-n! port string start count
+@deffn {Scheme Procedure} get-string-n! textual-input-port string start count
-@var{Start} and @var{count} must be exact, non-negative integer objects,
+@var{start} and @var{count} must be exact, non-negative integer objects,
with @var{count} representing the number of characters to be read.
-@var{String} must be a string with at least $@var{start} + @var{count}$
+@var{string} must be a string with at least $@var{start} + @var{count}$
characters.
The @code{get-string-n!} procedure reads from @var{textual-input-port}
file, the end-of-file object is returned.
@end deffn
-@deffn {Scheme Procedure} get-string-all port count
+@deffn {Scheme Procedure} get-string-all textual-input-port
Reads from @var{textual-input-port} until an end of file, decoding
characters in the same manner as @code{get-string-n} and
@code{get-string-n!}.
precedes the end of file, the end-of-file object is returned.
@end deffn
-@deffn {Scheme Procedure} get-line port
+@deffn {Scheme Procedure} get-line textual-input-port
Reads from @var{textual-input-port} up to and including the linefeed
character or end of file, decoding characters in the same manner as
@code{get-string-n} and @code{get-string-n!}.
@end quotation
@end deffn
-@deffn {Scheme Procedure} get-datum port count
+@deffn {Scheme Procedure} get-datum textual-input-port count
Reads an external representation from @var{textual-input-port} and returns the
datum it represents. The @code{get-datum} procedure returns the next
datum that can be parsed from the given @var{textual-input-port}, updating
@deffn {Scheme Procedure} put-char port char
Writes @var{char} to the port. The @code{put-char} procedure returns
+an unspecified value.
@end deffn
@deffn {Scheme Procedure} put-string port string
@code{put-string} procedure returns an unspecified value.
@end deffn
-@deffn {Scheme Procedure} put-datum port datum
+@deffn {Scheme Procedure} put-datum textual-output-port datum
@var{datum} should be a datum value. The @code{put-datum} procedure
writes an external representation of @var{datum} to
@var{textual-output-port}. The specific external representation is
@end table
+@node BOM Handling
+@subsection Handling of Unicode byte order marks.
+@cindex BOM
+@cindex byte order mark
+
+This section documents the finer points of Guile's handling of Unicode
+byte order marks (BOMs). A byte order mark (U+FEFF) is typically found
+at the start of a UTF-16 or UTF-32 stream, to allow readers to reliably
+determine the byte order. Occasionally, a BOM is found at the start of
+a UTF-8 stream, but this is much less common and not generally
+recommended.
+
+Guile attempts to handle BOMs automatically, and in accordance with the
+recommendations of the Unicode Standard, when the port encoding is set
+to @code{UTF-8}, @code{UTF-16}, or @code{UTF-32}. In brief, Guile
+automatically writes a BOM at the start of a UTF-16 or UTF-32 stream,
+and automatically consumes one from the start of a UTF-8, UTF-16, or
+UTF-32 stream.
+
+As specified in the Unicode Standard, a BOM is only handled specially at
+the start of a stream, and only if the port encoding is set to
+@code{UTF-8}, @code{UTF-16} or @code{UTF-32}. If the port encoding is
+set to @code{UTF-16BE}, @code{UTF-16LE}, @code{UTF-32BE}, or
+@code{UTF-32LE}, then BOMs are @emph{not} handled specially, and none of
+the special handling described in this section applies.
+
+@itemize @bullet
+@item
+To ensure that Guile will properly detect the byte order of a UTF-16 or
+UTF-32 stream, you must perform a textual read before any writes, seeks,
+or binary I/O. Guile will not attempt to read a BOM unless a read is
+explicitly requested at the start of the stream.
+
+@item
+If a textual write is performed before the first read, then an arbitrary
+byte order will be chosen. Currently, big endian is the default on all
+platforms, but that may change in the future. If you wish to explicitly
+control the byte order of an output stream, set the port encoding to
+@code{UTF-16BE}, @code{UTF-16LE}, @code{UTF-32BE}, or @code{UTF-32LE},
+and explicitly write a BOM (@code{#\xFEFF}) if desired.
+
+@item
+If @code{set-port-encoding!} is called in the middle of a stream, Guile
+treats this as a new logical ``start of stream'' for purposes of BOM
+handling, and will forget about any BOMs that had previously been seen.
+Therefore, it may choose a different byte order than had been used
+previously. This is intended to support multiple logical text streams
+embedded within a larger binary stream.
+
+@item
+Binary I/O operations are not guaranteed to update Guile's notion of
+whether the port is at the ``start of the stream'', nor are they
+guaranteed to produce or consume BOMs.
+
+@item
+For ports that support seeking (e.g. normal files), the input and output
+streams are considered linked: if the user reads first, then a BOM will
+be consumed (if appropriate), but later writes will @emph{not} produce a
+BOM. Similarly, if the user writes first, then later reads will
+@emph{not} consume a BOM.
+
+@item
+For ports that do not support seeking (e.g. pipes, sockets, and
+terminals), the input and output streams are considered
+@emph{independent} for purposes of BOM handling: the first read will
+consume a BOM (if appropriate), and the first write will @emph{also}
+produce a BOM (if appropriate). However, the input and output streams
+will always use the same byte order.
+
+@item
+Seeks to the beginning of a file will set the ``start of stream'' flags.
+Therefore, a subsequent textual read or write will consume or produce a
+BOM. However, unlike @code{set-port-encoding!}, if a byte order had
+already been chosen for the port, it will remain in effect after a seek,
+and cannot be changed by the presence of a BOM. Seeks anywhere other
+than the beginning of a file clear the ``start of stream'' flags.
+@end itemize
+
@c Local Variables:
@c TeX-master: "guile.texi"
@c End: