X-Git-Url: http://git.hcoop.net/bpt/guile.git/blobdiff_plain/5cdab8b8f6088ac6b8f7f78b8c32201a92a84ccd..adf06a72d53821d34e14fa69b20b10e2f809f593:/doc/ref/web.texi diff --git a/doc/ref/web.texi b/doc/ref/web.texi index ea5cd4644..c59f9580d 100644 --- a/doc/ref/web.texi +++ b/doc/ref/web.texi @@ -1,6 +1,6 @@ @c -*-texinfo-*- @c This is part of the GNU Guile Reference Manual. -@c Copyright (C) 2010 Free Software Foundation, Inc. +@c Copyright (C) 2010, 2011, 2012, 2013 Free Software Foundation, Inc. @c See the file guile.texi for copying conditions. @node Web @@ -9,37 +9,161 @@ @cindex WWW @cindex HTTP -When Guile started back in the mid-nineties, the GNU system was still -focused on producing a good POSIX implementation. This is why Guile's -POSIX support is good, and has been so for a while. - -But times change, and in a way these days the web is the new POSIX: a -standard and a motley set of implementations on which much computing is -done. So today's Guile also supports the web at the programming -language level, by defining common data types and operations for the -technologies underpinning the web: URIs, HTTP, and XML. - -It is particularly important to define native web data types. Though -the web is text in motion, programming the web in text is like -programming with @code{goto}: muddy, and error-prone. Most current -security problems on the web are due to treating the web as text instead -of as instances of the proper data types. - -In addition, common web data types help programmers to share code. - -Well. That's all very nice and opinionated and such, but how do I use -the thing? Read on! +It has always been possible to connect computers together and share +information between them, but the rise of the World Wide Web over the +last couple of decades has made it much easier to do so. The result is +a richly connected network of computation, in which Guile forms a part. 
+ +By ``the web'', we mean the HTTP protocol@footnote{Yes, the P is for +protocol, but this phrase appears repeatedly in RFC 2616.} as handled by +servers, clients, proxies, caches, and the various kinds of messages and +message components that can be sent and received by that protocol, +notably HTML. + +On one level, the web is text in motion: the protocols themselves are +textual (though the payload may be binary), and it's possible to create +a socket and speak text to the web. But such an approach is obviously +primitive. This section details the higher-level data types and +operations provided by Guile: URIs, HTTP request and response records, +and a conventional web server implementation. + +The material in this section is arranged in ascending order, in which +later concepts build on previous ones. If you prefer to start with the +highest-level perspective, @pxref{Web Examples}, and work your way +back. @menu +* Types and the Web:: Types prevent bugs and security problems. * URIs:: Universal Resource Identifiers. * HTTP:: The Hyper-Text Transfer Protocol. * HTTP Headers:: How Guile represents specific header values. +* Transfer Codings:: HTTP Transfer Codings. * Requests:: HTTP requests. * Responses:: HTTP responses. +* Web Client:: Accessing web resources over HTTP. * Web Server:: Serving HTTP to the internet. * Web Examples:: How to use this thing. @end menu +@node Types and the Web +@subsection Types and the Web + +It is a truth universally acknowledged, that a program with good use of +data types, will be free from many common bugs. Unfortunately, the +common practice in web programming seems to ignore this maxim. This +subsection makes the case for expressive data types in web programming. + +By ``expressive data types'', we mean that the data types @emph{say} +something about how a program solves a problem. 
For example, if we +choose to represent dates using SRFI 19 date records (@pxref{SRFI-19}), +this indicates that there is a part of the program that will always have +valid dates. Error handling for a number of basic cases, like invalid +dates, occurs on the boundary in which we produce a SRFI 19 date record +from other types, like strings. + +With regard to the web, data types are helpful in the two broad phases +of HTTP messages: parsing and generation. + +Consider a server, which has to parse a request, and produce a response. +Guile will parse the request into an HTTP request object +(@pxref{Requests}), with each header parsed into an appropriate Scheme +data type. This transition from an incoming stream of characters to +typed data is a state change in a program---the strings might parse, or +they might not, and something has to happen if they do not. (Guile +throws an error in this case.) But after you have the parsed request, +``client'' code (code built on top of the Guile web framework) will not +have to check for syntactic validity. The types already make this +information manifest. + +This state change on the parsing boundary makes programs more robust, +as they themselves are freed from the need to do a number of common +error checks, and they can use normal Scheme procedures to handle a +request instead of ad-hoc string parsers. + +The need for types on the response generation side (in a server) is more +subtle, though not less important. Consider the example of a POST +handler, which prints out the text that a user submits from a form. +Such a handler might include a procedure like this: + +@example +;; First, a helper procedure +(define (para . contents) + (string-append "<p>" (string-concatenate contents) "</p>")) + +;; Now the meat of our simple web application +(define (you-said text) + (para "You said: " text)) + +(display (you-said "Hi!")) +@print{} <p>You said: Hi!</p> +@end example + +This is a perfectly valid implementation, provided that the incoming +text does not contain the special HTML characters @samp{<}, @samp{>}, or +@samp{&}. But this provision of a restricted character set is not +reflected anywhere in the program itself: we must @emph{assume} that the +programmer understands this, and performs the check elsewhere. + +Unfortunately, the short history of the practice of programming does not +bear out this assumption. A @dfn{cross-site scripting} (@acronym{XSS}) +vulnerability is just such a common error in which unfiltered user input +is allowed into the output. A user could submit a crafted comment to +your web site which results in visitors running malicious JavaScript, +within the security context of your domain: + +@example +(display (you-said "