1 \input texinfo
2 @setfilename ../../info/url
3 @settitle URL Programmer's Manual
4
5 @iftex
6 @c @finalout
7 @end iftex
8 @c @setchapternewpage odd
9 @c @smallbook
10
11 @tex
12 \overfullrule=0pt
13 %\global\baselineskip 30pt % for printing in double space
14 @end tex
15 @dircategory World Wide Web
16 @dircategory GNU Emacs Lisp
17 @direntry
18 * URL: (url). URL loading package.
19 @end direntry
20
21 @copying
22 This file documents the URL loading package.
23
24 Copyright @copyright{} 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2002,
25 2004, 2005, 2006, 2007, 2008 Free Software Foundation, Inc.
26
27 @quotation
28 Permission is granted to copy, distribute and/or modify this document
29 under the terms of the GNU Free Documentation License, Version 1.2 or
30 any later version published by the Free Software Foundation; with no
31 Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
32 Texts. A copy of the license is included in the section entitled ``GNU
33 Free Documentation License''.
34 @end quotation
35 @end copying
36
37 @c
38 @titlepage
39 @title URL Programmer's Manual
40 @subtitle First Edition, URL Version 2.0
41 @author William M. Perry @email{wmperry@@gnu.org}
42 @author David Love @email{fx@@gnu.org}
43 @page
44 @vskip 0pt plus 1filll
45 @insertcopying
46 @end titlepage
47
48 @page
49 @node Top
50 @top URL
51
52
53 @menu
54 * Getting Started:: Preparing your program to use URLs.
55 * Retrieving URLs:: How to use this package to retrieve a URL.
56 * Supported URL Types:: Descriptions of URL types currently supported.
57 * Defining New URLs:: How to define a URL loader for a new protocol.
58 * General Facilities:: URLs can be cached, accessed via a gateway
59 and tracked in a history list.
60 * Customization:: Variables you can alter.
61 * GNU Free Documentation License:: The license for this documentation.
62 * Function Index::
63 * Variable Index::
64 * Concept Index::
65 @end menu
66
67 @node Getting Started
68 @chapter Getting Started
69 @cindex URLs, definition
70 @cindex URIs
71
72 @dfn{Uniform Resource Locators} (URLs) are a specific form of
73 @dfn{Uniform Resource Identifiers} (URIs) described in RFC 2396, which
74 updates RFC 1738 and RFC 1808. RFC 2016 defines uniform resource
75 agents.
76
77 URIs have the form @var{scheme}:@var{scheme-specific-part}, where the
78 @var{scheme}s supported by this library are described below.
79 @xref{Supported URL Types}.
80
81 FTP, NFS, HTTP, HTTPS, @code{rlogin}, @code{telnet}, tn3270,
82 IRC and gopher URLs all have the form
83
84 @example
85 @var{scheme}://@r{[}@var{userinfo}@@@r{]}@var{hostname}@r{[}:@var{port}@r{]}@r{[}/@var{path}@r{]}
86 @end example
87 @noindent
88 where @samp{@r{[}} and @samp{@r{]}} delimit optional parts.
89 @var{userinfo} sometimes takes the form @var{username}:@var{password}
90 but you should beware of the security risks of sending cleartext
91 passwords. @var{hostname} may be a domain name or a dotted decimal
92 address. If the @samp{:@var{port}} is omitted then the library will
93 use the `well known' port for that service when accessing URLs. With
94 the possible exception of @code{telnet}, it is rare for ports to be
95 specified, and using a non-standard port may have
96 undesired consequences if a different service is listening on that
97 port (e.g., an HTTP URL specifying the SMTP port can cause mail to be
98 sent). @c , but @xref{Other Variables, url-bad-port-list}.
99 The meaning of the @var{path} component depends on the service.
100
101 @menu
102 * Configuration::
103 * Parsed URLs:: URLs are parsed into vector structures.
104 @end menu
105
106 @node Configuration
107 @section Configuration
108
109 @defvar url-configuration-directory
110 @cindex @file{~/.url}
111 @cindex configuration files
112 The directory in which URL configuration files, the cache etc.,
113 reside. Default @file{~/.url}.
114 @end defvar
115
116 @node Parsed URLs
117 @section Parsed URLs
118 @cindex parsed URLs
119 The library functions typically operate on @dfn{parsed} versions of
120 URLs. These are actually vectors of the form:
121
122 @example
123 [@var{type} @var{user} @var{password} @var{host} @var{port} @var{file} @var{target} @var{attributes} @var{full}]
124 @end example
125
126 @noindent where
127 @table @var
128 @item type
129 is the type of the URL scheme, e.g., @code{http};
130 @item user
131 is the username associated with it, or @code{nil};
132 @item password
133 is the user password associated with it, or @code{nil};
134 @item host
135 is the host name associated with it, or @code{nil};
136 @item port
137 is the port number associated with it, or @code{nil};
138 @item file
139 is the `file' part of it, or @code{nil}. This doesn't necessarily
140 actually refer to a file;
141 @item target
142 is the target part, or @code{nil};
143 @item attributes
144 is the attributes associated with it, or @code{nil};
145 @item full
146 is @code{t} for a fully-specified URL, with a host part indicated by
147 @samp{//} after the scheme part.
148 @end table
149
150 @findex url-type
151 @findex url-user
152 @findex url-password
153 @findex url-host
154 @findex url-port
155 @findex url-file
156 @findex url-target
157 @findex url-attributes
158 @findex url-full
159 @findex url-set-type
160 @findex url-set-user
161 @findex url-set-password
162 @findex url-set-host
163 @findex url-set-port
164 @findex url-set-file
165 @findex url-set-target
166 @findex url-set-attributes
167 @findex url-set-full
168 These attributes have accessors named @code{url-@var{part}}, where
169 @var{part} is the name of one of the elements above, e.g.,
170 @code{url-host}. Similarly, there are setters of the form
171 @code{url-set-@var{part}}.
172
173 There are functions for parsing and unparsing between the string and
174 vector forms.
175
176 @defun url-generic-parse-url url
177 Return a parsed version of the string @var{url}.
178 @end defun
179
180 @defun url-recreate-url url
181 @cindex unparsing URLs
182 Recreates a URL string from the parsed @var{url}.
183 @end defun
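
The parsing functions above can be exercised as in the following
sketch.  The URL is a made-up example, and only accessors documented
in this section are used.

@example
(require 'url-parse)

;; Parse a string into the structured form described above, then
;; read two components and turn the result back into a string.
(let ((parsed (url-generic-parse-url "http://www.example.com/foo/bar")))
  (list (url-type parsed)            ; scheme
        (url-host parsed)            ; host name
        (url-recreate-url parsed)))  ; string form again
@result{} ("http" "www.example.com" "http://www.example.com/foo/bar")
@end example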
184
185 @node Retrieving URLs
186 @chapter Retrieving URLs
187
188 @defun url-retrieve-synchronously url
189 Retrieve @var{url} synchronously and return a buffer containing the
190 data. @var{url} is either a string or a parsed URL structure. Return
191 @code{nil} if there are no data associated with it (the case for dired,
192 info, or mailto URLs that need no further processing).
193 @end defun
194
195 @defun url-retrieve url callback &optional cbargs
196 Retrieve @var{url} asynchronously and call @var{callback} with args
197 @var{cbargs} when finished. The callback is called when the object
198 has been completely retrieved, with the current buffer containing the
199 object and any MIME headers associated with it. @var{url} is either a
200 string or a parsed URL structure. Returns the buffer @var{url} will
201 load into, or @code{nil} if the process has already completed.
202 @end defun
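
As an illustration of the two functions above, the following sketch
fetches a page both ways.  The URL and the name
@code{my-url-show-size} are placeholders invented for this example.
The callback is declared with @code{&rest} because the exact arguments
passed to it may differ between versions of the library; that is an
assumption made for safety, not something this manual specifies.

@example
(require 'url)

;; Synchronous: returns a buffer (or nil) once retrieval has finished.
(let ((buffer (url-retrieve-synchronously "http://www.example.com/")))
  (when buffer
    (with-current-buffer buffer
      (message "Retrieved %d characters" (buffer-size)))
    (kill-buffer buffer)))

;; Asynchronous: the callback runs with the response buffer current.
(defun my-url-show-size (&rest _args)
  "Hypothetical callback reporting the size of the response."
  (message "Retrieved %d characters" (buffer-size)))

(url-retrieve "http://www.example.com/" 'my-url-show-size)
@end example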
203
204 @node Supported URL Types
205 @chapter Supported URL Types
206
207 @menu
208 * http/https:: Hypertext Transfer Protocol.
209 * file/ftp:: Local files and FTP archives.
210 * info:: Emacs `Info' pages.
211 * mailto:: Sending email.
212 * news/nntp/snews:: Usenet news.
213 * rlogin/telnet/tn3270:: Remote host connectivity.
214 * irc:: Internet Relay Chat.
215 * data:: Embedded data URLs.
216 * nfs:: Network File System.
217 @c * finger::
218 @c * gopher::
219 @c * netrek::
220 @c * prospero::
221 * cid:: Content-ID.
222 * about::
223 * ldap:: Lightweight Directory Access Protocol.
224 * imap:: IMAP mailboxes.
225 * man:: Unix man pages.
226 @end menu
227
228 @node http/https
229 @section @code{http} and @code{https}
230
231 The scheme @code{http} is Hypertext Transfer Protocol. The library
232 supports version 1.1, specified in RFC 2616. (This supersedes 1.0,
233 defined in RFC 1945.) HTTP URLs have the following form, where most of
234 the parts are optional:
235 @example
236 http://@var{user}:@var{password}@@@var{host}:@var{port}/@var{path}?@var{searchpart}#@var{fragment}
237 @end example
238 @c The @code{:@var{port}} part is optional, and @var{port} defaults to
239 @c 80. The @code{/@var{path}} part, if present, is a slash-separated
240 @c series elements. The @code{?@var{searchpart}}, if present, is the
241 @c query for a search or the content of a form submission. The
242 @c @code{#fragment} part, if present, is a location in the document.
243
244 The scheme @code{https} is a secure version of @code{http}, with
245 transmission via SSL. It is defined in RFC 2818. Its default port is
246 443. This scheme depends on SSL support in Emacs via the
247 @file{ssl.el} library and is actually implemented by forcing the
248 @code{ssl} gateway method to be used. @xref{Gateways in general}.
249
250 @defopt url-honor-refresh-requests
251 This controls honouring of HTTP @samp{Refresh} headers by which
252 servers can direct clients to reload documents from the same URL or a
253 different one. @code{nil} means they will not be honoured,
254 @code{t} (the default) means they will always be honoured, and
255 otherwise the user will be asked on each request.
256 @end defopt
257
258
259 @menu
260 * Cookies::
261 * HTTP language/coding::
262 * HTTP URL Options::
263 * Dealing with HTTP documents::
264 @end menu
265
266 @node Cookies
267 @subsection Cookies
268
269 @defopt url-cookie-file
270 The file in which cookies are stored, defaulting to @file{cookies} in
271 the directory specified by @code{url-configuration-directory}.
272 @end defopt
273
274 @defopt url-cookie-confirmation
275 Specifies whether confirmation is required to accept cookies.
276 @end defopt
277
278 @defopt url-cookie-multiple-line
279 Specifies whether to put all cookies for the server on one line in the
280 HTTP request to satisfy broken servers like
281 @url{http://www.hotmail.com}.
282 @end defopt
283
284 @defopt url-cookie-trusted-urls
285 A list of regular expressions matching URLs from which to accept
286 cookies always.
287 @end defopt
288
289 @defopt url-cookie-untrusted-urls
290 A list of regular expressions matching URLs from which to reject
291 cookies always.
292 @end defopt
293
294 @defopt url-cookie-save-interval
295 The number of seconds between automatic saves of cookies to disk.
296 Default is one hour.
297 @end defopt
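
A configuration along the following lines might be placed in an init
file.  The host names are placeholders and the values are illustrative
assumptions, not recommended settings.

@example
;; Always accept cookies from one site and always reject another.
(setq url-cookie-trusted-urls '("^https?://www\\.example\\.com/")
      url-cookie-untrusted-urls '("^https?://ads\\.example\\.net/"))

;; Save cookies every ten minutes instead of every hour.
(setq url-cookie-save-interval 600)
@end example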
298
299
300 @node HTTP language/coding
301 @subsection Language and Encoding Preferences
302
303 HTTP allows clients to express preferences for the language and
304 encoding of documents which servers may honour. For each of these
305 variables, the value is a string; it can specify a single choice, or
306 it can be a comma-separated list.
307
308 Normally this list is ordered by descending preference. However, each
309 element can be followed by @samp{;q=@var{priority}} to specify its
310 preference level, a decimal number from 0 to 1; e.g., for
311 @code{url-mime-language-string}, @w{@code{"de, en-gb;q=0.8,
312 en;q=0.7"}}. An element that has no @samp{;q} specification has
313 preference level 1.
314
315 @defopt url-mime-charset-string
316 @cindex character sets
317 @cindex coding systems
318 This variable specifies a preference for character sets when documents
319 can be served in more than one encoding.
320
321 HTTP allows specifying a series of MIME charsets which indicate your
322 preferred character set encodings, e.g., Latin-9 or Big5, and these
323 can be weighted. The default series is generated automatically from
324 the associated MIME types of all defined coding systems, sorted by the
325 coding system priority specified in Emacs. @xref{Recognize Coding, ,
326 Recognizing Coding Systems, emacs, The GNU Emacs Manual}.
327 @end defopt
328
329 @defopt url-mime-language-string
330 @cindex language preferences
331 A string specifying the preferred language when servers can serve
332 files in several languages. Use RFC 1766 abbreviations, e.g.,
333 @samp{en} for English, @samp{de} for German.
334
335 The string can be @code{"*"} to get the first available language (as
336 opposed to the default).
337 @end defopt
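
For instance, a program might bind the variable around a single
request, as in this sketch.  The URL is a placeholder, and whether the
preference is honoured is up to the server.

@example
(require 'url)

;; Prefer German, then British English, then any English.
(let ((url-mime-language-string "de, en-gb;q=0.8, en;q=0.7"))
  (url-retrieve-synchronously "http://www.example.com/"))
@end example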
338
339 @node HTTP URL Options
340 @subsection HTTP URL Options
341
342 HTTP supports an @samp{OPTIONS} method describing things supported by
343 the URL@.
344
345 @defun url-http-options url
346 Returns a property list describing options available for @var{url}. The
347 property list members are:
348
349 @table @code
350 @item methods
351 A list of symbols specifying what HTTP methods the resource
352 supports.
353
354 @item dav
355 @cindex DAV
356 A list of numbers specifying what DAV protocol/schema versions are
357 supported.
358
359 @item dasl
360 @cindex DASL
361 A list of supported DASL search types (string form).
362
363 @item ranges
364 A list of the units available for use in partial document fetches.
365
366 @item p3p
367 @cindex P3P
368 The @dfn{Platform for Privacy Preferences} description for the resource.
369 Currently this is just the raw header contents.
370 @end table
371
372 @end defun
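
The returned property list can be inspected with @code{plist-get}.
The following is only a sketch: the URL is a placeholder, and the call
needs network access to a server that answers @samp{OPTIONS} requests.

@example
(require 'url-http)

(let ((options (url-http-options "http://www.example.com/")))
  ;; A list of symbols naming the methods the resource supports.
  (plist-get options 'methods))
@end example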
373
374 @node Dealing with HTTP documents
375 @subsection Dealing with HTTP documents
376
377 HTTP URLs are retrieved into a buffer containing the HTTP headers
378 followed by the body. Since the headers are quasi-MIME, they may be
379 processed using the MIME library. @xref{Top,, Emacs MIME,
380 emacs-mime, The Emacs MIME Manual}. The URL package provides a
381 function to do this in general:
382
383 @defun url-decode-text-part handle &optional coding
384 This function decodes charset-encoded text in the current buffer. In
385 Emacs, the buffer is expected to be unibyte initially and is set to
386 multibyte after decoding.
387 @var{handle} is the MIME handle of the original part. @var{coding} is an explicit
388 coding to use, overriding what the MIME headers specify.
389 The coding system used for the decoding is returned.
390
391 Note that this function doesn't deal with @samp{http-equiv} charset
392 specifications in HTML @samp{<meta>} elements.
393 @end defun
394
395 @node file/ftp
396 @section file and ftp
397 @cindex files
398 @cindex FTP
399 @cindex File Transfer Protocol
400 @cindex compressed files
401 @cindex dired
402
403 @example
404 ftp://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
405 file://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
406 @end example
407
408 These schemes are defined in RFC 1808.
409 @samp{ftp:} and @samp{file:} are synonymous in this library. They
410 allow reading arbitrary files from hosts. Either @samp{ange-ftp}
411 (Emacs) or @samp{efs} (XEmacs) is used to retrieve them from remote
412 hosts. Local files are accessed directly.
413
414 Compressed files are handled, but support is hard-coded so that
415 @code{jka-compr-compression-info-list} and so on have no effect.
416 Suffixes recognized are @samp{.z}, @samp{.gz}, @samp{.Z} and
417 @samp{.bz2}.
418
419 @defopt url-directory-index-file
420 The filename to look for when indexing a directory, default
421 @samp{"index.html"}. If this file exists, and is readable, then it
422 will be viewed instead of using @code{dired} to view the directory.
423 @end defopt
424
425 @node info
426 @section info
427 @cindex Info
428 @cindex Texinfo
429 @findex Info-goto-node
430
431 @example
432 info:@var{file}#@var{node}
433 @end example
434
435 Info URLs are not officially defined. They invoke
436 @code{Info-goto-node} with argument @samp{(@var{file})@var{node}}.
437 @samp{#@var{node}} is optional, defaulting to @samp{Top}.
438
439 @node mailto
440 @section mailto
441
442 @cindex mailto
443 @cindex email
444 A mailto URL will send an email message to the address in the
445 URL, for example @samp{mailto:foo@@bar.com} would compose a
446 message to @samp{foo@@bar.com}.
447
448 @defopt url-mail-command
449 @vindex mail-user-agent
450 The function called whenever url needs to send mail. This should
451 normally be left to default from @code{mail-user-agent}. @xref{Mail
452 Methods, , Mail-Composition Methods, emacs, The GNU Emacs Manual}.
453 @end defopt
454
455 An @samp{X-Url-From} header field containing the URL of the document
456 that contained the mailto URL is added if that URL is known.
457
458 RFC 2368 extends the definition of mailto URLs in RFC 1738.
459 The form of a mailto URL is
460 @example
461 @samp{mailto:@var{mailbox}[?@var{header}=@var{contents}[&@var{header}=@var{contents}]]}
462 @end example
463 @noindent where an arbitrary number of @var{header}s can be added. If the
464 @var{header} is @samp{body}, then @var{contents} is put in the body;
465 otherwise a @var{header} header field is created with @var{contents}
466 as its contents. Note that the URL library does not consider any
467 headers `dangerous' so you should check them before sending the
468 message.
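
A mailto URL with header fields can be assembled as in the sketch
below.  @code{my-mailto-url} is a hypothetical helper, not part of the
library; it relies on @code{url-hexify-string} from @file{url-util}
to escape characters that may not appear literally in a URL.

@example
(require 'url-util)

(defun my-mailto-url (mailbox subject body)
  "Return a mailto URL for MAILBOX with SUBJECT and BODY (hypothetical)."
  (concat "mailto:" mailbox
          "?subject=" (url-hexify-string subject)
          "&body=" (url-hexify-string body)))

(my-mailto-url "foo@@bar.com" "Hello world" "Hi there")
@result{} "mailto:foo@@bar.com?subject=Hello%20world&body=Hi%20there"
@end example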
469
470 @c Fixme: update
471 Email messages are defined in RFC 822.
472
473 @node news/nntp/snews
474 @section @code{news}, @code{nntp} and @code{snews}
475 @cindex news
476 @cindex network news
477 @cindex usenet
478 @cindex NNTP
479 @cindex snews
480
481 @c draft-gilman-news-url-01
482 The network news URL schemes take the following forms, following RFC
483 1738, except that for compatibility with other clients, host and port
484 fields may be included in news URLs though they are properly only
485 allowed for nntp and snews.
486
487 @table @samp
488 @item news:@var{newsgroup}
489 Retrieves a list of messages in @var{newsgroup};
490 @item news:@var{message-id}
491 Retrieves the message with the given @var{message-id};
492 @item news:*
493 Retrieves a list of all available newsgroups;
494 @item nntp://@var{host}:@var{port}/@var{newsgroup}
495 @itemx nntp://@var{host}:@var{port}/@var{message-id}
496 @itemx nntp://@var{host}:@var{port}/*
497 Similar to the @samp{news} versions.
498 @end table
499
500 @samp{:@var{port}} is optional and defaults to :119.
501
502 @samp{snews} is the same as @samp{nntp} except that the default port
503 is :563.
504 @cindex SSL
505 (It is tunneled through SSL.)
506
507 An @samp{nntp} URL is the same as a news URL, except that the URL may
508 specify an article by its number.
509
510 @defopt url-news-server
511 This variable can be used to override the default news server.
512 Usually this will be set by the Gnus package, which is used to fetch
513 news.
514 @cindex environment variable
515 @vindex NNTPSERVER
516 It may be set from the conventional environment variable
517 @code{NNTPSERVER}.
518 @end defopt
519
520 @node rlogin/telnet/tn3270
521 @section rlogin, telnet and tn3270
522 @cindex rlogin
523 @cindex telnet
524 @cindex tn3270
525 @cindex terminal emulation
526 @findex terminal-emulator
527
528 These URL schemes from RFC 1738 for logon via a terminal emulator have
529 the form
530 @example
531 telnet://@var{user}:@var{password}@@@var{host}:@var{port}
532 @end example
533 but the @code{:@var{password}} component is ignored.
534
535 To handle rlogin, telnet and tn3270 URLs, a @code{rlogin},
536 @code{telnet} or @code{tn3270} (the program names and arguments are
537 hardcoded) session is run in a @code{terminal-emulator} buffer.
538 Well-known ports are used if the URL does not specify a port.
539
540 @node irc
541 @section irc
542 @cindex IRC
543 @cindex Internet Relay Chat
544 @cindex ZEN IRC
545 @cindex ERC
546 @cindex rcirc
547 @c Fixme: reference (was http://www.w3.org/Addressing/draft-mirashi-url-irc-01.txt)
548 @dfn{Internet Relay Chat} (IRC) is handled by handing off the @sc{irc}
549 session to a function named in @code{url-irc-function}.
550
551 @defopt url-irc-function
552 A function to actually open an IRC connection.
553 This function
554 must take five arguments, @var{host}, @var{port}, @var{channel},
555 @var{user} and @var{password}. The @var{channel} argument specifies the
556 channel to join immediately; this can be @code{nil}. By default this is
557 @code{url-irc-rcirc}.
558 @end defopt
559 @defun url-irc-rcirc host port channel user password
560 Processes the arguments and lets @code{rcirc} handle the session.
561 @end defun
562 @defun url-irc-erc host port channel user password
563 Processes the arguments and lets @code{ERC} handle the session.
564 @end defun
565 @defun url-irc-zenirc host port channel user password
566 Processes the arguments and lets @code{zenirc} handle the session.
567 @end defun
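
Selecting a different client, or supplying a handler of your own,
might look like the sketch below.  @code{my-url-irc-describe} is a
hypothetical function written for this example; it only reports what
it was asked to open.

@example
;; Hand IRC URLs to ERC instead of the default rcirc.
(setq url-irc-function 'url-irc-erc)

;; Or supply a function of your own taking the five arguments.
(defun my-url-irc-describe (host port channel user password)
  "Hypothetical handler that reports the requested IRC session."
  (ignore password)
  (message "IRC: host=%s port=%s channel=%s user=%s"
           host port channel user))
@end example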
568
569 @node data
570 @section data
571 @cindex data URLs
572
573 @example
574 data:@r{[}@var{media-type}@r{]}@r{[};@var{base64}@r{]},@var{data}
575 @end example
576
577 Data URLs contain MIME data in the URL itself. They are defined in
578 RFC 2397.
579
580 @var{media-type} is a MIME @samp{Content-Type} string, possibly
581 including parameters. It defaults to
582 @samp{text/plain;charset=US-ASCII}. The @samp{text/plain} can be
583 omitted but the charset parameter supplied. If @samp{;base64} is
584 present, the @var{data} are base64-encoded.
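
As a small illustration, a base64-encoded data URL for a short piece
of text can be built with the standard @code{base64-encode-string}
function.  The text @samp{Hello} is just an example payload.

@example
;; Build a base64-encoded data URL for a short piece of text.
(concat "data:text/plain;base64," (base64-encode-string "Hello"))
@result{} "data:text/plain;base64,SGVsbG8="
@end example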
585
586 @node nfs
587 @section nfs
588 @cindex NFS
589 @cindex Network File System
590 @cindex automounter
591
592 @example
593 nfs://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
594 @end example
595
596 The @samp{nfs:} scheme is defined in RFC 2224. It is similar to
597 @samp{ftp:} except that it points to a file on a remote host that is
598 handled by the automounter on the local host.
599
600 @defvar url-nfs-automounter-directory-spec
601 A string saying how to invoke the NFS automounter. Certain @samp{%}
602 sequences are recognized:
603
604 @table @samp
605 @item %h
606 The hostname of the NFS server;
607 @item %n
608 The port number of the NFS server;
609 @item %u
610 The username to use to authenticate;
611 @item %p
612 The password to use to authenticate;
613 @item %f
614 The filename on the remote server;
615 @item %%
616 A literal @samp{%}.
617 @end table
618
619 Each can be used any number of times.
620 @end defvar
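
As an illustration, a spec of the following kind would direct NFS URLs
to files below @file{/net}, named after the server and the remote file
name.  The @file{/net} mount point is an assumption about the local
automounter configuration.

@example
(setq url-nfs-automounter-directory-spec "file:/net/%h%f")
@end example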
621
622 @node cid
623 @section cid
624 @cindex Content-ID
625
626 RFC 2111
627
628 @node about
629 @section about
630
631 @node ldap
632 @section ldap
633 @cindex LDAP
634 @cindex Lightweight Directory Access Protocol
635
636 The LDAP scheme is defined in RFC 2255.
637
638 @node imap
639 @section imap
640 @cindex IMAP
641
642 RFC 2192
643
644 @node man
645 @section man
646 @cindex @command{man}
647 @cindex Unix man pages
648 @findex man
649
650 @example
651 @samp{man:@var{page-spec}}
652 @end example
653
654 This is a non-standard scheme. @var{page-spec} is passed directly to
655 the Lisp @code{man} function.
656
657 @node Defining New URLs
658 @chapter Defining New URLs
659
660 @menu
661 * Naming conventions::
662 * Required functions::
663 * Optional functions::
664 * Asynchronous fetching::
665 * Supporting file-name-handlers::
666 @end menu
667
668 @node Naming conventions
669 @section Naming conventions
670
671 @node Required functions
672 @section Required functions
673
674 @node Optional functions
675 @section Optional functions
676
677 @node Asynchronous fetching
678 @section Asynchronous fetching
679
680 @node Supporting file-name-handlers
681 @section Supporting file-name-handlers
682
683 @node General Facilities
684 @chapter General Facilities
685
686 @menu
687 * Disk Caching::
688 * Proxies::
689 * Gateways in general::
690 * History::
691 @end menu
692
693 @node Disk Caching
694 @section Disk Caching
695 @cindex Caching
696 @cindex Persistent Cache
697 @cindex Disk Cache
698
699 The disk cache stores retrieved documents locally, whence they can be
700 retrieved more quickly. When requesting a URL that is in the cache,
701 the library checks to see if the page has changed since it was last
702 retrieved from the remote machine. If not, the local copy is used,
703 saving the transmission over the network.
704 @cindex Cleaning the cache
705 @cindex Clearing the cache
706 @cindex Cache cleaning
707 Currently the cache isn't cleared automatically.
708 @c Running the @code{clean-cache} shell script
709 @c fist is recommended, to allow for future cleaning of the cache. This
710 @c shell script will remove all files that have not been accessed since it
711 @c was last run. To keep the cache pared down, it is recommended that this
712 @c script be run from @i{at} or @i{cron} (see the manual pages for
713 @c crontab(5) or at(1) for more information)
714
715 @defopt url-automatic-caching
716 Setting this variable non-@code{nil} causes documents to be cached
717 automatically.
718 @end defopt
719
720 @defopt url-cache-directory
721 This variable specifies the
722 directory to store the cache files. It defaults to sub-directory
723 @file{cache} of @code{url-configuration-directory}.
724 @end defopt
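
Taken together, the two options above could be set as in this sketch.
The directory name is an arbitrary choice made for the example.

@example
;; Cache retrieved documents automatically, in a directory of our choosing.
(setq url-automatic-caching t
      url-cache-directory (expand-file-name "~/.cache/url"))
@end example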
725
726 @c Fixme: function v. option, but neither used.
727 @c @findex url-cache-expired
728 @c @defopt url-cache-expired
729 @c This is a function to decide whether or not a cache entry has expired.
730 @c It takes two times as it parameters and returns non-@code{nil} if the
731 @c second time is ``too old'' when compared with the first time.
732 @c @end defopt
733
734 @defopt url-cache-creation-function
735 The cache relies on a scheme for mapping URLs to files in the cache.
736 This variable names a function which sets the type of cache to use.
737 It takes a URL as argument and returns the absolute file name of the
738 corresponding cache file. The two supplied possibilities are
739 @code{url-cache-create-filename-using-md5} and
740 @code{url-cache-create-filename-human-readable}.
741 @end defopt
742
743 @defun url-cache-create-filename-using-md5 url
744 Creates a cache file name from @var{url} using MD5 hashing.
745 This creates entries with very few cache collisions and is fast.
746 @cindex MD5
747 @smallexample
748 (url-cache-create-filename-using-md5 "http://www.example.com/foo/bar")
749 @result{} "/home/fx/.url/cache/fx/http/com/example/www/b8a35774ad20db71c7c3409a5410e74f"
750 @end smallexample
751 @end defun
752
753 @defun url-cache-create-filename-human-readable url
754 Creates a cache file name from @var{url} more obviously connected to
755 @var{url} than for @code{url-cache-create-filename-using-md5}, but
756 more likely to conflict with other files.
757 @smallexample
758 (url-cache-create-filename-human-readable "http://www.example.com/foo/bar")
759 @result{} "/home/fx/.url/cache/fx/http/com/example/www/foo/bar"
760 @end smallexample
761 @end defun
762
763 @c Fixme: never actually used currently?
764 @c @defopt url-standalone-mode
765 @c @cindex Relying on cache
766 @c @cindex Cache only mode
767 @c @cindex Standalone mode
768 @c If this variable is non-@code{nil}, the library relies solely on the
769 @c cache for fetching documents and avoids checking if they have changed
770 @c on remote servers.
771 @c @end defopt
772
773 @c With a large cache of documents on the local disk, it can be very handy
774 @c when traveling, or any other time the network connection is not active
775 @c (a laptop with a dial-on-demand PPP connection, etc). Emacs/W3 can rely
776 @c solely on its cache, and avoid checking to see if the page has changed
777 @c on the remote server. In the case of a dial-on-demand PPP connection,
778 @c this will keep the phone line free as long as possible, only bringing up
779 @c the PPP connection when asking for a page that is not located in the
780 @c cache. This is very useful for demonstrations as well.
781
782 @node Proxies
783 @section Proxies and Gatewaying
784
785 @c fixme: check/document url-ns stuff
786 @cindex proxy servers
787 @cindex proxies
788 @cindex environment variables
789 @vindex HTTP_PROXY
790 Proxy servers are commonly used to provide gateways through firewalls
791 or as caches serving some more-or-less local network. Each protocol
792 (HTTP, FTP, etc.)@: can have a different gateway server. Proxying is
793 conventionally configured, across many different programs, through
794 environment variables of the form @code{@var{protocol}_proxy}, where
795 @var{protocol} is one of the supported network protocols (@code{http},
796 @code{ftp} etc.). The library recognizes such variables in either
797 upper or lower case. Their values are of one of the forms:
798 @itemize @bullet
799 @item @code{@var{host}:@var{port}}
800 @item A full URL;
801 @item Simply a host name.
802 @end itemize
803
804 @vindex NO_PROXY
805 The @code{NO_PROXY} environment variable specifies URLs that should be
806 excluded from proxying (on servers that should be contacted directly).
807 This should be a comma-separated list of hostnames, domain names, or a
808 mixture of both. Asterisks can be used as wildcards, but other
809 clients may not support that. Domain names may be indicated by a
810 leading dot. For example:
811 @example
812 NO_PROXY="*.aventail.com,home.com,.seanet.com"
813 @end example
814 @noindent says to contact all machines in the @samp{aventail.com} and
815 @samp{seanet.com} domains directly, as well as the machine named
816 @samp{home.com}. If @code{NO_PROXY} isn't defined, @code{no_PROXY}
817 and @code{no_proxy} are also tried, in that order.
818
819 Proxies may also be specified directly in Lisp.
820
821 @defopt url-proxy-services
822 This variable is an alist of URL schemes and proxy servers that
823 gateway them. The items are of the form @w{@code{(@var{scheme}
824 . @var{host}:@var{portnumber})}}, which says that the URL @var{scheme} is
825 gatewayed through @var{portnumber} on the specified @var{host}. An
826 exception is the pseudo scheme @code{"no_proxy"}, which is paired with
827 a regexp matching host names not to be proxied. This variable is
828 initialized from the environment as above.
829
830 @example
831 (setq url-proxy-services
832 '(("http" . "proxy.aventail.com:80")
833 ("no_proxy" . "^.*\\(aventail\\|seanet\\)\\.com")))
834 @end example
835 @end defopt
836
837 @node Gateways in general
838 @section Gateways in General
839 @cindex gateways
840 @cindex firewalls
841
842 The library provides a general gateway layer through which all
843 networking passes. It can both control access to the network and
844 provide access through gateways in firewalls. This may make direct
845 connections in some cases and pass through some sort of gateway in
846 others.@footnote{Proxies (which only operate over HTTP) are
847 implemented using this.} The library's basic function responsible for
848 making connections is @code{url-open-stream}.
849
850 @defun url-open-stream name buffer host service
851 @cindex opening a stream
852 @cindex stream, opening
853 Open a stream to @var{host}, possibly via a gateway. The other
854 arguments are as for @code{open-network-stream}. This will not make a
855 connection if @code{url-gateway-unplugged} is non-@code{nil}.
856 @end defun
857
858 @defvar url-gateway-local-host-regexp
859 This is a regular expression that matches local hosts that do not
860 require the use of a gateway. If @code{nil}, all connections are made
861 through the gateway.
862 @end defvar
863
864 @defvar url-gateway-method
865 This variable controls which gateway method is used. It may be useful
866 to bind it temporarily in some applications. It has values taken from
867 a list of symbols. Possible values are:
868
869 @table @code
870 @item telnet
871 @cindex @command{telnet}
872 Use this method if you must first telnet and log into a gateway host,
873 and then run telnet from that host to connect to outside machines.
874
875 @item rlogin
876 @cindex @command{rlogin}
877 This method is identical to @code{telnet}, but uses @command{rlogin}
878 to log into the remote machine without having to send the username and
879 password over the wire every time.
880
881 @item socks
882 @cindex @sc{socks}
883 Use if the firewall has a @sc{socks} gateway running on it. The
884 @sc{socks} v5 protocol is defined in RFC 1928.
885
886 @c @item ssl
887 @c This probably shouldn't be documented
888 @c Fixme: why not? -- fx
889
890 @item native
891 This method uses Emacs's builtin networking directly. This is the
892 default. It can be used only if there is no firewall blocking access.
893 @end table
894 @end defvar
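
Binding the variable temporarily, as suggested above, might look like
the following sketch.  The URL is a placeholder.

@example
(require 'url)

;; Force a direct connection for this one request only.
(let ((url-gateway-method 'native))
  (url-retrieve-synchronously "http://www.example.com/"))
@end example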
895
896 The following variables control the gateway methods.
897
898 @defopt url-gateway-telnet-host
899 The gateway host to telnet to. Once logged in there, you then telnet
900 out to the hosts you want to connect to.
901 @end defopt
902 @defopt url-gateway-telnet-parameters
903 This should be a list of parameters to pass to the @command{telnet} program.
904 @end defopt
905 @defopt url-gateway-telnet-password-prompt
906 This is a regular expression that matches the password prompt when
907 logging in.
908 @end defopt
909 @defopt url-gateway-telnet-login-prompt
910 This is a regular expression that matches the username prompt when
911 logging in.
912 @end defopt
913 @defopt url-gateway-telnet-user-name
914 The username to log in with.
915 @end defopt
916 @defopt url-gateway-telnet-password
917 The password to send when logging in.
918 @end defopt
919 @defopt url-gateway-prompt-pattern
920 This is a regular expression that matches the shell prompt.
921 @end defopt
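
A configuration for the @code{telnet} method might look roughly like
the following sketch.  The host name, user name and prompt regular
expressions are placeholders chosen for illustration, not defaults of
the library.

@example
;; All values below are placeholders for illustration.
(setq url-gateway-method 'telnet
      url-gateway-telnet-host "gateway.example.org"
      url-gateway-telnet-user-name "jdoe"
      url-gateway-telnet-login-prompt "login: *"
      url-gateway-telnet-password-prompt "password: *")
@end example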
922
923 @defopt url-gateway-rlogin-host
924 Host to @samp{rlogin} to before telnetting out.
925 @end defopt
926 @defopt url-gateway-rlogin-parameters
927 Parameters to pass to @samp{rsh}.
928 @end defopt
929 @defopt url-gateway-rlogin-user-name
930 User name to use when logging in to the gateway.
931 @end defopt
932 @defopt url-gateway-prompt-pattern
933 This is a regular expression that matches the shell prompt.
934 @end defopt
935
936 @defopt socks-server
937 This specifies the default server. It takes the form
938 @w{@code{("Default server" @var{server} @var{port} @var{version})}}
939 where @var{version} can be either 4 or 5.
940 @end defopt
941 @defvar socks-password
942 If this is @code{nil} then you will be asked for the password,
943 otherwise it will be used as the password for authenticating you to
944 the @sc{socks} server.
945 @end defvar
946 @defvar socks-username
947 This is the username to use when authenticating yourself to the
948 @sc{socks} server. By default this is your login name.
949 @end defvar
950 @defvar socks-timeout
951 This controls how long, in seconds, to wait for responses from the
952 @sc{socks} server; it is 5 by default.
953 @end defvar
954 @c fixme: these have been effectively commented-out in the code
955 @c @defopt socks-server-aliases
956 @c This is a list of server aliases. It is a list of aliases of the form
957 @c @var{(alias hostname port version)}.
958 @c @end defopt
959 @c @defopt socks-network-aliases
960 @c This is a list of network aliases. Each entry in the list takes the form
961 @c @var{(alias (network))} where @var{alias} is a string that names the
962 @c @var{network}. The networks can contain a pair (not a dotted pair) of
963 @c @sc{ip} addresses which specify a range of @sc{ip} addresses, an @sc{ip}
964 @c address and a netmask, a domain name or a unique hostname or @sc{ip}
965 @c address.
966 @c @end defopt
967 @c @defopt socks-redirection-rules
968 @c This is a list of redirection rules. Each rule takes the form
969 @c @var{(Destination network Connection type)} where @var{Destination
970 @c network} is a network alias from @code{socks-network-aliases} and
971 @c @var{Connection type} can be @code{nil} in which case a direct
972 @c connection is used, or it can be an alias from
973 @c @code{socks-server-aliases} in which case that server is used as a
974 @c proxy.
975 @c @end defopt
976 @defopt socks-nslookup-program
977 @cindex @command{nslookup}
978 This is the @samp{nslookup} program. It is @code{"nslookup"} by default.
979 @end defopt
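
Putting the @sc{socks} variables together, a setup that routes
connections through a version 5 server could be sketched as follows.
The server name @code{socks.example.org} and port 1080 are
placeholders, and the @code{socks} feature is assumed to be the one
that defines these variables.

@example
(require 'socks)

;; "socks.example.org" and port 1080 are placeholders.
(setq socks-server '("Default server" "socks.example.org" 1080 5))
;; Wait up to 10 seconds for the server instead of the default 5.
(setq socks-timeout 10)
;; Tell the URL library to use the SOCKS gateway method.
(setq url-gateway-method 'socks)
@end example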
980
981 @menu
982 * Suppressing network connections::
983 @end menu
984 @c * Broken hostname resolution::
985
986 @node Suppressing network connections
987 @subsection Suppressing Network Connections
988
989 @cindex network connections, suppressing
990 @cindex suppressing network connections
991 @cindex bugs, HTML
992 @cindex HTML `bugs'
993 In some circumstances it is desirable to suppress making network
994 connections. A typical case is when rendering HTML in a mail user
995 agent, when external URLs should not be activated, particularly to
996 avoid `bugs' which `call home' by fetching single-pixel images and the
997 like. To arrange this, bind the following variable for the duration
998 of such processing.
999
1000 @defvar url-gateway-unplugged
1001 If this variable is non-@code{nil}, new network connections are never
1002 opened by the URL library.
1003 @end defvar
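
Binding the variable for the duration of some processing can be
sketched as below.  The wrapper @code{my-render-without-network} is a
hypothetical name; its argument stands for whatever function the
application uses to render the HTML.

@example
(require 'url-gw)

;; `my-render-without-network' is a hypothetical helper.
(defun my-render-without-network (render-function)
  "Call RENDER-FUNCTION with URL network connections suppressed."
  ;; The binding is undone when the `let' form exits, even if
  ;; RENDER-FUNCTION signals an error.
  (let ((url-gateway-unplugged t))
    (funcall render-function)))
@end example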
1004
1005 @c @node Broken hostname resolution
1006 @c @subsection Broken Hostname Resolution
1007
1008 @c @cindex hostname resolver
1009 @c @cindex resolver, hostname
1010 @c Some C libraries do not include the hostname resolver routines in
1011 @c their static libraries. If Emacs was linked statically, and was not
1012 @c linked with the resolver libraries, it will not be able to get to any
1013 @c machines off the local network. This is characterized by being able
1014 @c to reach someplace with a raw ip number, but not its hostname
1015 @c (@url{http://129.79.254.191/} works, but
1016 @c @url{http://www.cs.indiana.edu/} doesn't). This used to happen on
1017 @c SunOS4 and Ultrix, but is now probably rare. If Emacs can't be
1018 @c rebuilt linked against the resolver library, it can use the external
1019 @c @command{nslookup} program instead.
1020
1021 @c @defopt url-gateway-broken-resolution
1022 @c @cindex @code{nslookup} program
1023 @c @cindex program, @code{nslookup}
1024 @c If non-@code{nil}, this variable says to use the program specified by
1025 @c @code{url-gateway-nslookup-program} to do hostname resolution.
1026 @c @end defopt
1027
1028 @c @defopt url-gateway-nslookup-program
1029 @c The name of the program to do hostname lookup if Emacs can't do it
1030 @c directly. This program should expect a single argument on the command
1031 @c line---the hostname to resolve---and should produce output similar to
1032 @c the standard Unix @command{nslookup} program:
1033 @c @example
1034 @c Name: www.cs.indiana.edu
1035 @c Address: 129.79.254.191
1036 @c @end example
1037 @c @end defopt
1038
1039 @node History
1040 @section History
1041
1042 @findex url-do-setup
1043 The library can maintain a global history list tracking URLs accessed.
1044 URL completion can be done from it. The history mechanism is set up
1045 automatically via @code{url-do-setup} when it is configured to be on.
1046 Note that the size of the history list is currently not limited.
1047
1048 @vindex url-history-hash-table
1049 The history `list' is actually a hash table,
1050 @code{url-history-hash-table}. It contains access times keyed by URL
1051 strings. The times are in the format returned by @code{current-time}.
1052
1053 @defun url-history-update-url url time
1054 This function updates the history table with an entry for @var{url}
1055 accessed at the given @var{time}.
1056 @end defun
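
As a small sketch, the following records an access to a placeholder
URL and then looks the entry up in the hash table.  The URL
@code{http://www.example.org/} is illustrative, and the
@code{url-history} feature is assumed to provide these definitions.

@example
(require 'url-history)

(let ((now (current-time)))
  ;; Record an access to a placeholder URL at the current time.
  (url-history-update-url "http://www.example.org/" now)
  ;; Look the entry up again; the value stored under the URL
  ;; string is the access time that was just recorded.
  (equal (gethash "http://www.example.org/" url-history-hash-table)
         now))
@end example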
1057
1058 @defopt url-history-track
1059 If non-@code{nil}, the library will keep track of all the URLs
1060 accessed. If it is @code{t}, the list is saved to disk at the end of
1061 each Emacs session. The default is @code{nil}.
1062 @end defopt
1063
1064 @defopt url-history-file
1065 The file storing the history list between sessions. It defaults to
1066 @file{history} in @code{url-configuration-directory}.
1067 @end defopt
1068
1069 @defopt url-history-save-interval
1070 @findex url-history-setup-save-timer
1071 The number of seconds between automatic saves of the history list.
1072 The default is one hour. Note that if you change this variable directly,
1073 rather than using Custom, after @code{url-do-setup} has been run, you
1074 need to run the function @code{url-history-setup-save-timer}.
1075 @end defopt
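
The note about setting the variable directly can be illustrated with
the following sketch, which turns on history tracking and changes the
save interval outside of Custom.  The interval of 1800 seconds is an
arbitrary value chosen for the example.

@example
(require 'url-history)

;; Track history and save it to disk between sessions.
(setq url-history-track t)
;; Save every 30 minutes; 1800 is an arbitrary example value.
(setq url-history-save-interval 1800)
;; Needed because the interval was set directly, not via Custom.
(url-history-setup-save-timer)
@end example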
1076
1077 @defun url-history-parse-history &optional fname
1078 Parses the history file @var{fname} (default @code{url-history-file})
1079 and sets up the history list.
1080 @end defun
1081
1082 @defun url-history-save-history &optional fname
1083 Saves the current history to file @var{fname} (default
1084 @code{url-history-file}).
1085 @end defun
1086
1087 @defun url-completion-function string predicate function
1088 You can use this function to do completion of URLs from the history.
1089 @end defun
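
Since its arguments follow the convention for programmed completion,
the function can presumably be handed to @code{completing-read} as
the completion table.  The following is a sketch under that
assumption; the command name @code{my-read-url-from-history} is
hypothetical.

@example
(require 'url-history)

;; `my-read-url-from-history' is a hypothetical helper.
(defun my-read-url-from-history ()
  "Read a URL in the minibuffer, completing from the URL history."
  (completing-read "URL: " 'url-completion-function))
@end example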
1090
1091 @node Customization
1092 @chapter Customization
1093
1094 @section Environment Variables
1095
1096 @cindex environment variables
1097 The following environment variables affect the library's operation at
1098 startup.
1099
1100 @table @code
1101 @item TMPDIR
1102 @vindex TMPDIR
1103 @vindex url-temporary-directory
1104 If this is defined, @code{url-temporary-directory} is initialized from
1105 it.
1106 @end table
1107
1108 @section General User Options
1109
1110 The following user options, settable with Customize, affect the
1111 general operation of the package.
1112
1113 @defopt url-debug
1114 @cindex debugging
1115 Specifies the types of debug messages which are logged to
1116 the @code{*URL-DEBUG*} buffer.
1117 @code{t} means log all messages.
1118 A number means log all messages and show them with @code{message}.
1119 It may also be a list of the types of messages to be logged.
1120 @end defopt
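
For instance, logging could be restricted to a couple of message
types as in the sketch below.  The symbols @code{http} and
@code{retrieval} are assumed here to be among the message types the
library uses; check the Custom interface for the actual choices.

@example
;; Log only the listed message types.  The symbols `http' and
;; `retrieval' are assumptions made for this sketch.
(setq url-debug '(http retrieval))
@end example
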
1121 @defopt url-personal-mail-address
1122 @end defopt
1123 @defopt url-privacy-level
1124 @end defopt
1125 @defopt url-uncompressor-alist
1126 @end defopt
1127 @defopt url-passwd-entry-func
1128 @end defopt
1129 @defopt url-standalone-mode
1130 @end defopt
1131 @defopt url-bad-port-list
1132 @end defopt
1133 @defopt url-max-password-attempts
1134 @end defopt
1135 @defopt url-temporary-directory
1136 @end defopt
1137 @defopt url-show-status
1138 @end defopt
1139 @defopt url-confirmation-func
1140 The function to use for asking yes or no questions. This is normally
1141 either @code{y-or-n-p} or @code{yes-or-no-p}, but could be another
1142 function taking a single argument (the prompt) and returning @code{t}
1143 only if an affirmative answer is given.
1144 @end defopt
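
As a sketch of supplying another function, a non-interactive job
might install one that declines every question without prompting.
The name @code{my-url-always-decline} is hypothetical.

@example
;; `my-url-always-decline' is a hypothetical helper.
(defun my-url-always-decline (prompt)
  "Answer PROMPT negatively without asking the user."
  (ignore prompt)
  nil)

(setq url-confirmation-func 'my-url-always-decline)
@end example
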
1145 @defopt url-gateway-method
1146 @c fixme: describe gatewaying
1147 A symbol specifying the type of gateway support to use for connections
1148 from the local machine. The supported methods are:
1149
1150 @table @code
1151 @item telnet
1152 Run telnet in a subprocess to connect;
1153 @item rlogin
1154 Rlogin to another machine to connect;
1155 @item socks
1156 Connect through a socks server;
1157 @item ssl
1158 Connect with SSL;
1159 @item native
1160 Connect directly.
1161 @end table
1162 @end defopt
1163
1164 @node GNU Free Documentation License
1165 @appendix GNU Free Documentation License
1166 @include doclicense.texi
1167
1168 @node Function Index
1169 @unnumbered Command and Function Index
1170 @printindex fn
1171
1172 @node Variable Index
1173 @unnumbered Variable Index
1174 @printindex vr
1175
1176 @node Concept Index
1177 @unnumbered Concept Index
1178 @printindex cp
1179
1180 @setchapternewpage odd
1181 @contents
1182 @bye
1183
1184 @ignore
1185 arch-tag: c96be356-7e2d-4196-bcda-b13246c5c3f0
1186 @end ignore