@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
-@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009
+@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009, 2010
@c Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.
-@page
@node Input and Output
@section Input and Output
Ports are garbage collected in the usual way (@pxref{Memory
Management}), and will be closed at that time if not already closed.
-In this case any errors occuring in the close will not be reported.
+In this case any errors occurring in the close will not be reported.
Usually a program will want to explicitly close so as to be sure all
its operations have been successful. Of course if a program has
abandoned something due to an error or other condition then closing
available, so files bigger than 2 Gbytes (@math{2^31} bytes) can be
read and written on a 32-bit system.
+Each port has an associated character encoding that controls how bytes
+read from the port are converted to characters and string and controls
+how characters and strings written to the port are converted to bytes.
+When ports are created, they inherit their character encoding from the
+current locale, but, that can be modified after the port is created.
+
+Currently, the ports only work with @emph{non-modal} encodings. Most
+encodings are non-modal, meaning that the conversion of bytes to a
+string doesn't depend on its context: the same byte sequence will always
+return the same string. A couple of modal encodings are in common use,
+like ISO-2022-JP and ISO-2022-KR, and they are not yet supported.
+
+Each port also has an associated conversion strategy: what to do when
+a Guile character can't be converted to the port's encoded character
+representation for output. There are three possible strategies: to
+raise an error, to replace the character with a hex escape, or to
+replace the character with a substitute character.
+
@rnindex input-port?
@deffn {Scheme Procedure} input-port? x
@deffnx {C Function} scm_input_port_p (x)
@var{x}))}.
@end deffn
+@deffn {Scheme Procedure} set-port-encoding! port enc
+@deffnx {C Function} scm_set_port_encoding_x (port, enc)
+Sets the character encoding that will be used to interpret all port I/O.
+@var{enc} is a string containing the name of an encoding. Valid
+encoding names are those
+@url{http://www.iana.org/assignments/character-sets, defined by IANA}.
+@end deffn
+
+@defvr {Scheme Variable} %default-port-encoding
+A fluid containing @code{#f} or the name of the encoding to
+be used by default for newly created ports (@pxref{Fluids and Dynamic
+States}). The value @code{#f} is equivalent to @code{"ISO-8859-1"}.
+
+New ports are created with the encoding appropriate for the current
+locale if @code{setlocale} has been called or the value specified by
+this fluid otherwise.
+@end defvr
+
+@deffn {Scheme Procedure} port-encoding port
+@deffnx {C Function} scm_port_encoding
+Returns, as a string, the character encoding that @var{port} uses to interpret
+its input and output. The value @code{#f} is equivalent to @code{"ISO-8859-1"}.
+@end deffn
+
+@deffn {Scheme Procedure} set-port-conversion-strategy! port sym
+@deffnx {C Function} scm_set_port_conversion_strategy_x (port, sym)
+Sets the behavior of the interpreter when outputting a character that
+is not representable in the port's current encoding. @var{sym} can be
+either @code{'error}, @code{'substitute}, or @code{'escape}. If it is
+@code{'error}, an error will be thrown when an nonconvertible character
+is encountered. If it is @code{'substitute}, then nonconvertible
+characters will be replaced with approximate characters, or with
+question marks if no approximately correct character is available. If
+it is @code{'escape}, it will appear as a hex escape when output.
+
+If @var{port} is an open port, the conversion error behavior
+is set for that port. If it is @code{#f}, it is set as the
+default behavior for any future ports that get created in
+this thread.
+@end deffn
+
+@deffn {Scheme Procedure} port-conversion-strategy port
+@deffnx {C Function} scm_port_conversion_strategy (port)
+Returns the behavior of the port when outputting a character that is
+not representable in the port's current encoding. It returns the
+symbol @code{error} if unrepresentable characters should cause
+exceptions, @code{substitute} if the port should try to replace
+unrepresentable characters with question marks or approximate
+characters, or @code{escape} if unrepresentable characters should be
+converted to string escapes.
+
+If @var{port} is @code{#f}, then the current default behavior will be
+returned. New ports will have this default behavior when they are
+created.
+@end deffn
+
+
@node Reading
@subsection Reading
The output is designed to be machine readable, and can be read back
with @code{read} (@pxref{Reading}). Strings are printed in
-doublequotes, with escapes if necessary, and characters are printed in
+double quotes, with escapes if necessary, and characters are printed in
@samp{#\} notation.
@end deffn
output port if not given.
The output is designed for human readability, it differs from
-@code{write} in that strings are printed without doublequotes and
+@code{write} in that strings are printed without double quotes and
escapes, and characters are printed as per @code{write-char}, not in
@samp{#\} form.
@end deffn
The delimited-I/O module can be accessed with:
-@smalllisp
+@lisp
(use-modules (ice-9 rdelim))
-@end smalllisp
+@end lisp
It can be used to read or write lines of text, or read text delimited by
a specified set of characters. It's similar to the @code{(scsh rdelim)}
@end lisp
@end deffn
-Some of the abovementioned I/O functions rely on the following C
+Some of the aforementioned I/O functions rely on the following C
primitives. These will mainly be of interest to people hacking Guile
internals.
The Block-string-I/O module can be accessed with:
-@smalllisp
+@lisp
(use-modules (ice-9 rw))
-@end smalllisp
+@end lisp
It currently contains procedures that help to implement the
@code{(scsh rw)} module in guile-scsh.
(For reference, Guile leaves text versus binary up to the C library,
@code{b} here just adds @code{O_BINARY} to the underlying @code{open}
call, when that flag is available.)
+
+Also, open the file using the 8-bit character encoding "ISO-8859-1",
+ignoring any coding declaration or port encoding.
+
+Note that, when reading or writing binary data with ports, the
+bytevector ports in the @code{(rnrs io ports)} module are preferred,
+as they return vectors, and not strings (@pxref{R6RS I/O Ports}).
@end table
If a file cannot be opened with the access
requested, @code{open-file} throws an exception.
+When the file is opened, this procedure will scan for a coding
+declaration (@pxref{Character Encoding of Source Files}). If present
+will use that encoding for interpreting the file. Otherwise, the
+port's encoding will be used. To supress this behavior, open
+the file in binary mode and then set the port encoding explicitly
+using @code{set-port-encoding!}.
+
In theory we could create read/write ports which were buffered
in one direction only. However this isn't included in the
current interfaces.
@rnindex open-input-file
@deffn {Scheme Procedure} open-input-file filename
Open @var{filename} for input. Equivalent to
-@smalllisp
+@lisp
(open-file @var{filename} "r")
-@end smalllisp
+@end lisp
@end deffn
@rnindex open-output-file
@deffn {Scheme Procedure} open-output-file filename
Open @var{filename} for output. Equivalent to
-@smalllisp
+@lisp
(open-file @var{filename} "w")
-@end smalllisp
+@end lisp
@end deffn
@deffn {Scheme Procedure} call-with-input-file filename proc
Open @var{filename} for input or output, and call @code{(@var{proc}
port)} with the resulting port. Return the value returned by
@var{proc}. @var{filename} is opened as per @code{open-input-file} or
-@code{open-output-file} respectively, and an error is signalled if it
+@code{open-output-file} respectively, and an error is signaled if it
cannot be opened.
When @var{proc} returns, the port is closed. If @var{proc} does not
-return (eg.@: if it throws an error), then the port might not be
+return (e.g.@: if it throws an error), then the port might not be
closed automatically, though it will be garbage collected in the usual
way if not otherwise referenced.
@end deffn
@code{current-output-port}, or @code{current-error-port}. Return the
value returned by @var{thunk}. @var{filename} is opened as per
@code{open-input-file} or @code{open-output-file} respectively, and an
-error is signalled if it cannot be opened.
+error is signaled if it cannot be opened.
When @var{thunk} returns, the port is closed and the previous setting
of the respective current port is restored.
The following allow string ports to be opened by analogy to R4R*
file port facilities:
+With string ports, the port-encoding is treated differently than other
+types of ports. When string ports are created, they do not inherit a
+character encoding from the current locale. They are given a
+default locale that allows them to handle all valid string characters.
+Typically one should not modify a string port's character encoding
+away from its default.
+
@deffn {Scheme Procedure} call-with-output-string proc
@deffnx {C Function} scm_call_with_output_string (proc)
Calls the one-argument procedure @var{proc} with a newly created output
port. When the function returns, the string composed of the characters
written into the port is returned. @var{proc} should not close the port.
+
+Note that which characters can be written to a string port depend on the port's
+encoding. The default encoding of string ports is specified by the
+@code{%default-port-encoding} fluid (@pxref{Ports,
+@code{%default-port-encoding}}). For instance, it is an error to write Greek
+letter alpha to an ISO-8859-1-encoded string port since this character cannot be
+represented with ISO-8859-1:
+
+@example
+(define alpha (integer->char #x03b1)) ; GREEK SMALL LETTER ALPHA
+
+(with-fluids ((%default-port-encoding "ISO-8859-1"))
+ (call-with-output-string
+ (lambda (p)
+ (display alpha p))))
+
+@result{}
+Throw to key `encoding-error'
+@end example
+
+Changing the string port's encoding to a Unicode-capable encoding such as UTF-8
+solves the problem.
@end deffn
@deffn {Scheme Procedure} call-with-input-string string proc
Calls the zero-argument procedure @var{thunk} with the current output
port set temporarily to a new string port. It returns a string
composed of the characters written to the current output.
+
+See @code{call-with-output-string} above for character encoding considerations.
@end deffn
@deffn {Scheme Procedure} with-input-from-string string thunk
@node Port Implementation
@subsubsection Port Implementation
-@cindex Port implemenation
+@cindex Port implementation
This section describes how to implement a new port type in C.