@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
-@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009
+@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009, 2010
@c Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.
-@page
@node Input and Output
@section Input and Output
When ports are created, they inherit their character encoding from the
current locale, but, that can be modified after the port is created.
+Currently, ports only work with @emph{non-modal} encodings. Most
+encodings are non-modal, meaning that the conversion of bytes to a
+string doesn't depend on context: the same byte sequence always
+yields the same string. A couple of modal encodings are in common use,
+such as ISO-2022-JP and ISO-2022-KR, and they are not yet supported.
+
Each port also has an associated conversion strategy: what to do when
a Guile character can't be converted to the port's encoded character
representation for output. There are three possible strategies: to
@deffn {Scheme Procedure} set-port-encoding! port enc
@deffnx {C Function} scm_set_port_encoding_x (port, enc)
-Sets the character encoding that will be used to interpret all port
-I/O. @var{enc} is a string containing the name of an encoding.
+Sets the character encoding that will be used to interpret all port I/O.
+@var{enc} is a string containing the name of an encoding. Valid
+encoding names are those
+@url{http://www.iana.org/assignments/character-sets, defined by IANA}.
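+
+For illustration only, the following sketch opens an output port on a
+hypothetical file name, switches it to UTF-8, and reads the setting back
+with @code{port-encoding}:
+
+@example
+;; The file name is an assumption made for this example.
+(define port (open-output-file "/tmp/encoding-example.txt"))
+(set-port-encoding! port "UTF-8")
+(define enc (port-encoding port))
+(close-port port)
+enc
+@result{} "UTF-8"
+@end example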
@end deffn
+@defvr {Scheme Variable} %default-port-encoding
+A fluid containing @code{#f} or the name of the encoding to
+be used by default for newly created ports (@pxref{Fluids and Dynamic
+States}). The value @code{#f} is equivalent to @code{"ISO-8859-1"}.
+
New ports are created with the encoding appropriate for the current
-locale if @code{setlocale} has been called or ISO-8859-1 otherwise,
-and this procedure can be used to modify that encoding.
+locale if @code{setlocale} has been called, or with the encoding specified
+by this fluid otherwise.
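+
+As a minimal sketch, the fluid can be rebound for a dynamic extent with
+@code{with-fluids}; the encoding name @code{"UTF-8"} is only an
+illustrative choice:
+
+@example
+(with-fluids ((%default-port-encoding "UTF-8"))
+  ;; Within this extent, the fluid holds the rebound value; the
+  ;; previous value is restored on exit.
+  (fluid-ref %default-port-encoding))
+@result{} "UTF-8"
+@end example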
+@end defvr
@deffn {Scheme Procedure} port-encoding port
@deffnx {C Function} scm_port_encoding
-Returns, as a string, the character encoding that @var{port} uses to
-interpret its input and output.
+Returns, as a string, the character encoding that @var{port} uses to interpret
+its input and output. The value @code{#f} is equivalent to @code{"ISO-8859-1"}.
@end deffn
@deffn {Scheme Procedure} set-port-conversion-strategy! port sym
(For reference, Guile leaves text versus binary up to the C library,
@code{b} here just adds @code{O_BINARY} to the underlying @code{open}
call, when that flag is available.)
+
+Also, open the file using the 8-bit character encoding "ISO-8859-1",
+ignoring any coding declaration or port encoding.
+
+Note that, when reading or writing binary data with ports, the
+bytevector ports in the @code{(rnrs io ports)} module are preferred,
+as they return bytevectors rather than strings (@pxref{R6RS I/O Ports}).
@end table
If a file cannot be opened with the access
requested, @code{open-file} throws an exception.
+When the file is opened, this procedure will scan for a coding
+declaration (@pxref{Character Encoding of Source Files}). If one is
+present, it will use that encoding for interpreting the file. Otherwise,
+the port's encoding will be used. To suppress this behavior, open
+the file in binary mode and then set the port encoding explicitly
+using @code{set-port-encoding!}.
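+
+A minimal sketch of that last approach follows; the helper name
+@code{open-input-file/encoding} is hypothetical and not part of Guile:
+
+@example
+;; Open FILENAME for reading in binary mode, which skips the scan for
+;; a coding declaration, then set the encoding ENC explicitly.
+(define (open-input-file/encoding filename enc)
+  (let ((port (open-file filename "rb")))
+    (set-port-encoding! port enc)
+    port))
+@end example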
+
In theory we could create read/write ports which were buffered
in one direction only. However this isn't included in the
current interfaces.
Calls the one-argument procedure @var{proc} with a newly created output
port. When the function returns, the string composed of the characters
written into the port is returned. @var{proc} should not close the port.
+
+Note that the set of characters that can be written to a string port depends on
+the port's encoding. The default encoding of string ports is specified by the
+@code{%default-port-encoding} fluid (@pxref{Ports,
+@code{%default-port-encoding}}). For instance, it is an error to write the Greek
+letter alpha to an ISO-8859-1-encoded string port since this character cannot be
+represented with ISO-8859-1:
+
+@example
+(define alpha (integer->char #x03b1)) ; GREEK SMALL LETTER ALPHA
+
+(with-fluids ((%default-port-encoding "ISO-8859-1"))
+ (call-with-output-string
+ (lambda (p)
+ (display alpha p))))
+
+@result{}
+Throw to key `encoding-error'
+@end example
+
+Changing the string port's encoding to a Unicode-capable encoding such as UTF-8
+solves the problem.
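+
+As a sketch of that fix, binding the fluid to @code{"UTF-8"} lets the
+same character be written and read back unchanged (the name
+@code{result} is introduced here only for illustration):
+
+@example
+(define alpha (integer->char #x03b1)) ; GREEK SMALL LETTER ALPHA
+
+(define result
+  (with-fluids ((%default-port-encoding "UTF-8"))
+    (call-with-output-string
+      (lambda (p)
+        (display alpha p)))))
+
+(string=? result (string alpha))
+@result{} #t
+@end example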
@end deffn
@deffn {Scheme Procedure} call-with-input-string string proc
Calls the zero-argument procedure @var{thunk} with the current output
port set temporarily to a new string port. It returns a string
composed of the characters written to the current output.
+
+See @code{call-with-output-string} above for character encoding considerations.
@end deffn
@deffn {Scheme Procedure} with-input-from-string string thunk