@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 2010 Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@node Web
@section @acronym{HTTP}, the Web, and All That
@cindex Web
@cindex WWW
@cindex HTTP

When Guile started back in the mid-nineties, the GNU system was still
focused on producing a good POSIX implementation. This is why Guile's
POSIX support is good, and has been so for a while.

But times change, and in a way these days the web is the new POSIX: a
standard and a motley set of implementations on which much computing is
done. So today's Guile also supports the web at the programming
language level, by defining common data types and operations for the
technologies underpinning the web: URIs, HTTP, and XML.

It is particularly important to define native web data types. Though
the web is text in motion, programming the web in text is like
programming with @code{goto}: muddy, and error-prone. Most current
security problems on the web are due to treating the web as text instead
of as instances of the proper data types.

In addition, common web data types help programmers to share code.

Well. That's all very nice and opinionated and such, but how do I use
the thing? Read on!

@menu
* URIs::                        Uniform Resource Identifiers.
* HTTP::                        The Hyper-Text Transfer Protocol.
* Requests::                    HTTP requests.
* Responses::                   HTTP responses.
* Web Handlers::                A simple web application interface.
* Web Server::                  Serving HTTP to the internet.
@end menu

@node URIs
@subsection Uniform Resource Identifiers

Guile provides a standard data type for Uniform Resource Identifiers
(URIs), as defined in RFC 3986.

The generic URI syntax is as follows:

@example
URI := scheme ":" ["//" [userinfo "@@"] host [":" port]] path \
       [ "?" query ] [ "#" fragment ]
@end example

So, all URIs have a scheme and a path. Some URIs have a host, and some
of those have ports and userinfo. Any URI might have a query part or a
fragment.

Userinfo is something of an abstraction, as some legacy URI schemes
allowed userinfo of the form @code{@var{username}:@var{passwd}}.
Passwords don't belong in URIs, so the RFC does not want to condone
this, but neither can it say that what is before the @code{@@} sign is
just a username, so the RFC punts on the issue and calls it
@dfn{userinfo}.

Also, strictly speaking, a URI with a fragment is a @dfn{URI
reference}. A fragment is typically not serialized when sending a URI
over the wire; that is, it is not part of the identifier of a resource.
It only identifies a part of a given resource. But it's useful to have
a field for it in the URI record itself, so we hope you will forgive the
inconsistency.

@example
(use-modules (web uri))
@end example

The following procedures can be found in the @code{(web uri)}
module. Load it into your Guile, using a form like the above, to have
access to them.

@defun build-uri scheme [#:userinfo] [#:host] [#:port] [#:path] [#:query] [#:fragment] [#:validate?]
Construct a URI object. If @var{validate?} is true, also run some
consistency checks to make sure that the constructed URI is valid.
@end defun

@defun uri? x
@defunx uri-scheme uri
@defunx uri-userinfo uri
@defunx uri-host uri
@defunx uri-port uri
@defunx uri-path uri
@defunx uri-query uri
@defunx uri-fragment uri
A predicate and field accessors for the URI record type.
@end defun
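
As a small illustration of the constructor and accessors, the following
sketch builds a URI and reads back some of its fields. It assumes that
the scheme is given as a symbol and that the host and path are strings;
the name @code{home} and the address used are only for illustration.

@example
(use-modules (web uri))

;; Build a URI from its parts; fields that are not given stay unset.
(define home
  (build-uri 'http #:host "www.gnu.org" #:path "/software/guile/"))

(uri? home)        ; true
(uri-scheme home)  ; the symbol http
(uri-host home)    ; "www.gnu.org"
(uri-path home)    ; "/software/guile/"
(uri-query home)   ; #f, as no query was given
@end example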

@defun declare-default-port! scheme port
Declare a default port for the given URI scheme.

Default ports are for printing URI objects: a default port is not
printed.
@end defun

@defun parse-uri string
Parse @var{string} into a URI object. Returns @code{#f} if the string
could not be parsed.
@end defun

@defun unparse-uri uri
Serialize @var{uri} to a string.
@end defun

@defun uri-decode str [#:charset]
Percent-decode the given @var{str}, according to @var{charset}.

Note that this function should not generally be applied to a full URI
string. For paths, use @code{split-and-decode-uri-path} instead. For
query strings, split the query on @code{&} and @code{=} boundaries, and
decode the components separately.

Note that percent-encoded strings encode @emph{bytes}, not characters.
There is no guarantee that a given byte sequence is a valid string
encoding. Therefore this routine may signal an error if the decoded
bytes are not valid for the given encoding. Pass @code{#f} for
@var{charset} if you want decoded bytes as a bytevector directly.
@end defun
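
The advice about query strings can be sketched as follows. The helper
@code{decode-query} is hypothetical, not part of @code{(web uri)}; it
splits first and decodes afterwards, so that an encoded @code{&} or
@code{=} inside a value cannot be mistaken for a delimiter.

@example
(use-modules (web uri))

;; Hypothetical helper: split a raw query string on & and =, then
;; percent-decode each name and value separately.
(define (decode-query query)
  (map (lambda (pair)
         (let ((idx (string-index pair #\=)))
           (if idx
               (cons (uri-decode (substring pair 0 idx))
                     (uri-decode (substring pair (+ idx 1))))
               ;; A name with no value at all.
               (cons (uri-decode pair) #f))))
       (string-split query #\&)))

(define fields (decode-query "name=Guile%20Scheme&lang=en"))
;; fields is (("name" . "Guile Scheme") ("lang" . "en"))
@end example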

@defun uri-encode str [#:charset] [#:unescaped-chars]
Percent-encode any character not in @var{unescaped-chars}.

Percent-encoding first writes out the given character to a bytevector
within the given @var{charset}, then encodes each byte as
@code{%@var{HH}}, where @var{HH} is the hexadecimal representation of
the byte.
@end defun

@defun split-and-decode-uri-path path
Split @var{path} into its components, and decode each component,
removing empty components.

For example, @code{"/foo/bar/"} decodes to the two-element list,
@code{("foo" "bar")}.
@end defun

@defun encode-and-join-uri-path parts
URI-encode each element of @var{parts}, which should be a list of
strings, and join the parts together with @code{/} as a delimiter.
@end defun
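
The two path procedures are roughly inverses of each other, as this
short sketch suggests. The path used is made up for the example. Note
that joining only puts @code{/} between the parts, so a leading slash
has to be added by the caller if one is wanted.

@example
(use-modules (web uri))

;; Split and decode: empty components are dropped, %20 becomes a space.
(define parts (split-and-decode-uri-path "/docs/hello%20world/"))
;; parts is ("docs" "hello world")

;; Encode and join: the space is percent-encoded again.
(define path (encode-and-join-uri-path parts))
;; path is "docs/hello%20world"
@end example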

@node HTTP
@subsection The Hyper-Text Transfer Protocol

The initial motivation for including web functionality in Guile, rather
than relying on an external package, was to establish a standard base on
which people can share code. To that end, we continue the focus on data
types by providing a number of low-level parsers and unparsers for
elements of the HTTP protocol.

If you want to skip the low-level details for now and move on to web
pages, see @ref{Web Server}. Otherwise, load the HTTP module, and read
on.

@example
(use-modules (web http))
@end example

The focus of the @code{(web http)} module is to parse and unparse
standard HTTP headers, representing them to Guile as native data
structures. For example, a @code{Date:} header will be represented as a
SRFI-19 date record (@pxref{SRFI-19}), rather than as a string.

Guile tries to follow RFCs fairly strictly---the road to perdition being
paved with compatibility hacks---though some allowances are made for
not-too-divergent texts.

The first bit is to define a registry of parsers, validators, and
unparsers, keyed by header name. That is the function of the
@code{<header-decl>} object.

@defun make-header-decl sym name multiple? parser validator writer
@defunx header-decl? x
@defunx header-decl-sym decl
@defunx header-decl-name decl
@defunx header-decl-multiple? decl
@defunx header-decl-parser decl
@defunx header-decl-validator decl
@defunx header-decl-writer decl
A constructor, predicate, and field accessors for the
@code{<header-decl>} type. The fields are as follows:

@table @code
@item sym
The symbol name for this header field, always in lower-case. For
example, @code{"Content-Length"} has a symbolic name of
@code{content-length}.
@item name
The string name of the header, in its preferred capitalization.
@item multiple?
@code{#t} iff this header may appear multiple times in a message.
@item parser
A procedure which takes a string and returns a parsed value.
@item validator
A predicate, returning @code{#t} iff the value is valid for this header.
@item writer
A procedure which writes a value to the port given in the second
argument.
@end table
@end defun

@defun declare-header! sym name [#:multiple?] [#:parser] [#:validator] [#:writer]
Make a header declaration, as above, and register it by symbol and by
name.
@end defun

@defun lookup-header-decl name
Return the @code{<header-decl>} object registered for the given
@var{name}.

@var{name} may be a symbol or a string. Strings are mapped to headers in
a case-insensitive fashion.
@end defun

@defun valid-header? sym val
Returns a true value iff @var{val} is a valid Scheme value for the
header with name @var{sym}.
@end defun

Now that we have a generic interface for reading and writing headers, we
do just that.

@defun read-header port
Reads one HTTP header from @var{port}. Returns two values: the header
name and the parsed Scheme value. May raise an exception if the header
was known but the value was invalid.

Returns @code{#f} for both values if the end of the message headers was
reached (i.e., a blank line).
@end defun

@defun parse-header name val
Parse @var{val}, a string, with the parser for the header named
@var{name}.

Returns two values, the header name and parsed value. If a parser was
found, the header name will be returned as a symbol. If a parser was not
found, both the header name and the value are returned as strings.
@end defun

@defun write-header name val port
Writes the given header name and value to @var{port}. If @var{name} is a
symbol, looks up a declared header and uses that writer. Otherwise the
value is written using @code{display}.
@end defun

@defun read-headers port
Read the headers of an HTTP message from @var{port}, returning them as
an ordered alist.
@end defun

@defun write-headers headers port
Write the given header alist to @var{port}. Doesn't write the final
@code{\r\n}, as the user might want to add another header.
@end defun
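
To make the idea of native header values concrete, here is a minimal
sketch that reads a header block from a string port. The header text is
made up for the example, and the sketch assumes that
@code{Content-Length} is among the declared headers, so that its value
is parsed to an integer and its name is returned as a symbol.

@example
(use-modules (web http))

;; Two header lines, then the blank line that ends the headers.
(define raw "Content-Length: 42\r\nConnection: close\r\n\r\n")

(define headers (call-with-input-string raw read-headers))

(assq-ref headers 'content-length)    ; 42, an integer, not "42"
(valid-header? 'content-length 42)    ; true
(valid-header? 'content-length "42")  ; #f, a string is not valid here
@end example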

The @code{(web http)} module also has some utility procedures to read
and write request and response lines.

@defun parse-http-method str [start] [end]
Parse an HTTP method from @var{str}. The result is an upper-case symbol,
like @code{GET}.
@end defun

@defun parse-http-version str [start] [end]
Parse an HTTP version from @var{str}, returning it as a major-minor
pair. For example, @code{HTTP/1.1} parses as the pair of integers,
@code{(1 . 1)}.
@end defun
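
A brief sketch of these two parsers at work, using the whole string in
each case (that is, leaving out the optional @var{start} and @var{end}
arguments):

@example
(use-modules (web http))

(define method (parse-http-method "GET"))
;; method is the symbol GET

(define version (parse-http-version "HTTP/1.1"))
;; version is the pair (1 . 1)
@end example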

@defun parse-request-uri str [start] [end]
Parse a URI from an HTTP request line. Note that URIs in requests do not
have to have a scheme or host name. The result is a URI object.
@end defun

@defun read-request-line port
Read the first line of an HTTP request from @var{port}, returning three
values: the method, the URI, and the version.
@end defun

@defun write-request-line method uri version port
Write the first line of an HTTP request to @var{port}.
@end defun

@defun read-response-line port
Read the first line of an HTTP response from @var{port}, returning three
values: the HTTP version, the response code, and the "reason phrase".
@end defun

@defun write-response-line version code reason-phrase port
Write the first line of an HTTP response to @var{port}.
@end defun
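
The line-oriented procedures work on ports, so string ports are a
convenient way to try them out. The following is a sketch only: the
request text is made up, and the three return values are collected into
a list purely to make them easy to inspect.

@example
(use-modules (web http) (web uri))

;; Read a request line; the method, URI and version come back as
;; three values, gathered here into a list.
(define request-line
  (call-with-values
      (lambda ()
        (call-with-input-string "GET /index.html HTTP/1.1\r\n"
                                read-request-line))
    list))

(car request-line)              ; the method, a symbol
(uri-path (cadr request-line))  ; the path of the request URI
(caddr request-line)            ; the version, a major-minor pair

;; Write a response line to a string instead of a socket.
(define status-line
  (call-with-output-string
    (lambda (port)
      (write-response-line '(1 . 1) 200 "OK" port))))
@end example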

@node Requests
@subsection HTTP Requests

@example
(use-modules (web request))
@end example

@defun request? x
@defunx request-method request
@defunx request-uri request
@defunx request-version request
@defunx request-headers request
@defunx request-meta request
@defunx request-port request
A predicate and field accessors for the request record type.
@end defun

@defun read-request port [meta]
Read an HTTP request from @var{port}, optionally attaching the given
metadata, @var{meta}.

As a side effect, sets the encoding on @var{port} to ISO-8859-1
(latin-1), so that reading one character reads one byte. See the
discussion of character sets in "HTTP Requests" in the manual, for more
information.
@end defun

@defun build-request [#:method] [#:uri] [#:version] [#:headers] [#:port] [#:meta] [#:validate-headers?]
Construct an HTTP request object. If @var{validate-headers?} is true,
the headers are each run through their respective validators.
@end defun

@defun write-request r port
Write the given HTTP request to @var{port}.

Returns a new request, whose @code{request-port} will continue writing
on @var{port}, perhaps using some transfer encoding.
@end defun

@defun read-request-body/latin-1 r
Reads the request body from @var{r}, as a string.

Assumes that the request port has ISO-8859-1 encoding, so that the
number of characters to read is the same as the
@code{request-content-length}. Returns @code{#f} if there was no request
body.
@end defun

@defun write-request-body/latin-1 r body
Write @var{body}, a string encodable in ISO-8859-1, to the port
corresponding to the HTTP request @var{r}.
@end defun

@defun read-request-body/bytevector r
Reads the request body from @var{r}, as a bytevector. Returns @code{#f}
if there was no request body.
@end defun

@defun write-request-body/bytevector r bv
Write @var{bv}, a bytevector, to the port corresponding to the HTTP
request @var{r}.
@end defun

@defun request-accept request [default='()]
@defunx request-accept-charset request [default='()]
@defunx request-accept-encoding request [default='()]
@defunx request-accept-language request [default='()]
@defunx request-allow request [default='()]
@defunx request-authorization request [default=#f]
@defunx request-cache-control request [default='()]
@defunx request-connection request [default='()]
@defunx request-content-encoding request [default='()]
@defunx request-content-language request [default='()]
@defunx request-content-length request [default=#f]
@defunx request-content-location request [default=#f]
@defunx request-content-md5 request [default=#f]
@defunx request-content-range request [default=#f]
@defunx request-content-type request [default=#f]
@defunx request-date request [default=#f]
@defunx request-expect request [default='()]
@defunx request-expires request [default=#f]
@defunx request-from request [default=#f]
@defunx request-host request [default=#f]
@defunx request-if-match request [default=#f]
@defunx request-if-modified-since request [default=#f]
@defunx request-if-none-match request [default=#f]
@defunx request-if-range request [default=#f]
@defunx request-if-unmodified-since request [default=#f]
@defunx request-last-modified request [default=#f]
@defunx request-max-forwards request [default=#f]
@defunx request-pragma request [default='()]
@defunx request-proxy-authorization request [default=#f]
@defunx request-range request [default=#f]
@defunx request-referer request [default=#f]
@defunx request-te request [default=#f]
@defunx request-trailer request [default='()]
@defunx request-transfer-encoding request [default='()]
@defunx request-upgrade request [default='()]
@defunx request-user-agent request [default=#f]
@defunx request-via request [default='()]
@defunx request-warning request [default='()]
Return the value of the given header in @var{request}, or @var{default}
if the request has no such header.
@end defun

@defun request-absolute-uri r [default-host] [default-port]
@end defun
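
As a sketch of how the request procedures fit together, the following
reads a request from a string port and then queries it. The request
text is made up, and a real server would read from a socket instead.
Headers that were not sent yield the defaults shown in the list of
header accessors above.

@example
(use-modules (web request))

;; A request line, one header, and the blank line ending the headers.
(define raw "GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n")

(define req (call-with-input-string raw read-request))

(request-method req)          ; the symbol GET
(request-version req)         ; the pair (1 . 1)
(request-content-length req)  ; #f, the default, as none was sent
(request-accept req)          ; the empty list, likewise the default
@end example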


@node Responses
@subsection HTTP Responses

@example
(use-modules (web response))
@end example

@defun response? x
@defunx response-version response
@defunx response-code response
@defunx response-headers response
@defunx response-port response
A predicate and field accessors for the response record type.
@end defun

@defun response-reason-phrase response
Return the reason phrase given in @var{response}, or the standard reason
phrase for the response's code.
@end defun

@defun read-response port
Read an HTTP response from @var{port}.

As a side effect, sets the encoding on @var{port} to ISO-8859-1
(latin-1), so that reading one character reads one byte. See the
discussion of character sets in "HTTP Responses" in the manual, for more
information.
@end defun

@defun build-response [#:version] [#:code] [#:reason-phrase] [#:headers] [#:port] [#:validate-headers?]
Construct an HTTP response object. If @var{validate-headers?} is true,
the headers are each run through their respective validators.
@end defun

@defun extend-response r k v . additional
Extend an HTTP response by setting additional HTTP headers @var{k},
@var{v}. Returns a new HTTP response.
@end defun

@defun adapt-response-version response version
Adapt the given response to a different HTTP version. Returns a new HTTP
response.

The idea is that many applications might just build a response for the
default HTTP version, and this procedure could handle a number of
programmatic transformations to respond to older HTTP versions (0.9 and
1.0). But currently this function is a bit heavy-handed, just updating
the version field.
@end defun
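
A short sketch of building a response and adapting it for an older
client. The names @code{not-found} and @code{legacy} are only for
illustration. No reason phrase is given, so the standard one for the
code is used.

@example
(use-modules (web response))

(define not-found (build-response #:code 404))

(response-code not-found)           ; 404
(response-reason-phrase not-found)  ; the standard phrase for 404

;; Adapt for an HTTP/1.0 client; a new response is returned.
(define legacy (adapt-response-version not-found '(1 . 0)))

(response-version legacy)           ; the pair (1 . 0)
@end example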

@defun write-response r port
Write the given HTTP response to @var{port}.

Returns a new response, whose @code{response-port} will continue writing
on @var{port}, perhaps using some transfer encoding.
@end defun

@defun read-response-body/latin-1 r
Reads the response body from @var{r}, as a string.

Assumes that the response port has ISO-8859-1 encoding, so that the
number of characters to read is the same as the
@code{response-content-length}. Returns @code{#f} if there was no
response body.
@end defun

@defun write-response-body/latin-1 r body
Write @var{body}, a string encodable in ISO-8859-1, to the port
corresponding to the HTTP response @var{r}.
@end defun

@defun read-response-body/bytevector r
Reads the response body from @var{r}, as a bytevector. Returns @code{#f}
if there was no response body.
@end defun

@defun write-response-body/bytevector r bv
Write @var{bv}, a bytevector, to the port corresponding to the HTTP
response @var{r}.
@end defun

@defun response-accept-ranges response [default=#f]
@defunx response-age response [default='()]
@defunx response-allow response [default='()]
@defunx response-cache-control response [default='()]
@defunx response-connection response [default='()]
@defunx response-content-encoding response [default='()]
@defunx response-content-language response [default='()]
@defunx response-content-length response [default=#f]
@defunx response-content-location response [default=#f]
@defunx response-content-md5 response [default=#f]
@defunx response-content-range response [default=#f]
@defunx response-content-type response [default=#f]
@defunx response-date response [default=#f]
@defunx response-etag response [default=#f]
@defunx response-expires response [default=#f]
@defunx response-last-modified response [default=#f]
@defunx response-location response [default=#f]
@defunx response-pragma response [default='()]
@defunx response-proxy-authenticate response [default=#f]
@defunx response-retry-after response [default=#f]
@defunx response-server response [default=#f]
@defunx response-trailer response [default='()]
@defunx response-transfer-encoding response [default='()]
@defunx response-upgrade response [default='()]
@defunx response-vary response [default='()]
@defunx response-via response [default='()]
@defunx response-warning response [default='()]
@defunx response-www-authenticate response [default=#f]
Return the value of the given header in @var{response}, or
@var{default} if the response has no such header.
@end defun


@node Web Handlers
@subsection Web Handlers

from request to response

@node Web Server
@subsection Web Server

@code{(web server)} is a generic web server interface, along with a main
loop implementation for web servers controlled by Guile.

The lowest layer is the @code{<server-impl>} object, which defines a set
of hooks to open a server, read a request from a client, write a
response to a client, and close a server. These hooks (@code{open},
@code{read}, @code{write}, and @code{close}, respectively) are bound
together in a @code{<server-impl>} object. Procedures in this module
take a @code{<server-impl>} object, if needed.

A @code{<server-impl>} may also be looked up by name. If you pass the
@code{http} symbol to @code{run-server}, Guile looks for a variable
named @code{http} in the @code{(web server http)} module, which should
be bound to a @code{<server-impl>} object. Such a binding is made by
instantiation of the @code{define-server-impl} syntax. In this way the
@code{run-server} loop can automatically load other backends if
available.

The life cycle of a server goes as follows:

@enumerate
@item
The @code{open} hook is called, to open the server. @code{open} takes 0
or more arguments, depending on the backend, and returns an opaque
server socket object, or signals an error.

@item
The @code{read} hook is called, to read a request from a new client.
The @code{read} hook takes one argument, the server socket. It
should return three values: an opaque client socket, the
request, and the request body. The request should be a
@code{<request>} object, from @code{(web request)}. The body should be a
string or a bytevector, or @code{#f} if there is no body.

If the read failed, the @code{read} hook may return @code{#f} for the
client socket, request, and body.

@item
A user-provided handler procedure is called, with the request
and body as its arguments. The handler should return two
values: the response, as a @code{<response>} record from
@code{(web response)}, and the response body as a string, bytevector, or
@code{#f} if not present. We also allow the response to be simply an
alist of headers, in which case a default response object is
constructed with those headers.

@item
The @code{write} hook is called with three arguments: the client
socket, the response, and the body. The @code{write} hook returns no
values.

@item
At this point the request handling is complete. For a loop, we
loop back and try to read a new request.

@item
If the user interrupts the loop, the @code{close} hook is called on
the server socket.
@end enumerate
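
The life cycle above can be sketched by hand with the procedures of this
module. This is an illustration under stated assumptions, not the
actual implementation of @code{run-server}: the helper name
@code{serve-until-interrupted} is hypothetical, and error handling is
reduced to making sure the server is closed when the loop is left.

@example
(use-modules (web server))

;; Hypothetical helper, not part of (web server): drive the hooks of
;; the implementation named IMPL-NAME, e.g. the symbol http.
(define (serve-until-interrupted handler impl-name open-params state)
  (let* ((impl (lookup-server-impl impl-name))
         (server (open-server impl open-params)))   ; open hook
    (dynamic-wind
      (lambda () #t)
      (lambda ()
        (let loop ((state state))
          ;; One round: read hook, user handler, write hook.
          (loop (serve-one-client handler impl server state))))
      (lambda ()
        (close-server impl server)))))              ; close hook
@end example

Threading @var{state} through the loop is what lets a handler carry
values from one request to the next without global variables.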

@defun define-server-impl name open read write close
Make a @code{<server-impl>} object with the hooks @var{open},
@var{read}, @var{write}, and @var{close}, and bind it to the symbol
@var{name} in the current module.
@end defun

@defun lookup-server-impl impl
Look up a server implementation. If @var{impl} is a server
implementation already, it is returned directly. If it is a symbol, the
binding named @var{impl} in the @code{(web server @var{impl})} module is
looked up. Otherwise an error is signaled.

Currently a server implementation is a somewhat opaque type, useful only
for passing to other procedures in this module, like @code{read-client}.
@end defun

@defun open-server impl open-params
Open a server for the given implementation. Returns one value, the new
server object. The implementation's @code{open} procedure is applied to
@var{open-params}, which should be a list.
@end defun

@defun read-client impl server
Read a new client from @var{server}, by applying the implementation's
@code{read} procedure to the server. If successful, returns three
values: an object corresponding to the client, a request object, and the
request body. If any exception occurs, returns @code{#f} for all three
values.
@end defun

@defun handle-request handler request body state
Handle a given request, returning the response and body.

The response and response body are produced by calling the given
@var{handler} with @var{request} and @var{body} as arguments.

The elements of @var{state} are also passed to @var{handler} as
arguments, and may be returned as additional values. The new
@var{state}, collected from the @var{handler}'s return values, is then
returned as a list. The idea is that a server loop receives a handler
from the user, along with whatever state values the user is interested
in, allowing the user's handler to explicitly manage its state.
@end defun

@defun sanitize-response request response body
"Sanitize" the given response and body, making them appropriate for the
given request.

As a convenience to web handler authors, @var{response} may be given as
an alist of headers, in which case it is used to construct a default
response. Ensures that the response version corresponds to the request
version. If @var{body} is a string, encodes the string to a bytevector,
in an encoding appropriate for @var{response}. Adds a
@code{content-length} and @code{content-type} header, as necessary.

If @var{body} is a procedure, it is called with a port as an argument,
and the output collected as a bytevector. In the future we might try to
instead use a compressing, chunk-encoded port, and call this procedure
later, in the @code{write-client} procedure. Authors are advised not to
rely on the procedure being called at any particular time.
@end defun

@defun write-client impl server client response body
Write an HTTP response and body to @var{client}. If the server and
client support persistent connections, it is the implementation's
responsibility to keep track of the client thereafter, presumably by
attaching it to the @var{server} argument somehow.
@end defun

@defun close-server impl server
Release resources allocated by a previous invocation of
@code{open-server}.
@end defun

@defun serve-one-client handler impl server state
Read one request from @var{server}, call @var{handler} on the request
and body, and write the response to the client. Returns the new state
produced by the handler procedure.
@end defun

@defun run-server handler [impl] [open-params] . state
Run Guile's built-in web server.

@var{handler} should be a procedure that takes two or more arguments,
the HTTP request and request body, and returns two or more values, the
response and response body.

For example, here is a simple "Hello, World!" server:

@example
(define (handler request body)
  (values '((content-type . ("text/plain")))
          "Hello, World!"))
(run-server handler)
@end example

The response and body will be run through @code{sanitize-response}
before sending back to the client.

Additional arguments to @var{handler} are taken from @var{state}.
Additional return values are accumulated into a new @var{state}, which
will be used for subsequent requests. In this way a handler can
explicitly manage its state.

The default server implementation is @code{http}, which accepts
@var{open-params} like @code{(#:port 8081)}, among others. See "Web
Server" in the manual, for more information.
@end defun
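
To illustrate the state mechanism, here is a sketch of a handler that
counts visits. The handler name and the message are made up; the header
alist follows the form used in the "Hello, World!" example. The call to
@code{run-server} is left as a comment because it does not return until
the server is interrupted.

@example
(use-modules (web server))

;; COUNT is the single piece of state.  The third return value
;; becomes the COUNT passed along with the next request.
(define (counting-handler request body count)
  (values '((content-type . ("text/plain")))
          (string-append "You are visitor number "
                         (number->string count))
          (+ count 1)))

;; Serve with the default http implementation, no open-params, and
;; an initial count of 1:
;;
;;   (run-server counting-handler 'http '() 1)
@end example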

@example
(use-modules (web server))
@end example


@c Local Variables:
@c TeX-master: "guile.texi"
@c End: