1 \input texinfo
2 @setfilename ../../info/url
3 @settitle URL Programmer's Manual
4
5 @documentencoding UTF-8
6
7 @iftex
8 @c @finalout
9 @end iftex
10 @c @setchapternewpage odd
11 @c @smallbook
12
13 @tex
14 \overfullrule=0pt
15 %\global\baselineskip 30pt % for printing in double space
16 @end tex
17 @dircategory Emacs lisp libraries
18 @direntry
19 * URL: (url). URL loading package.
20 @end direntry
21
22 @copying
23 This is the manual for the @code{url} Emacs Lisp library.
24
25 Copyright @copyright{} 1993--1999, 2002, 2004--2014 Free Software
26 Foundation, Inc.
27
28 @quotation
29 Permission is granted to copy, distribute and/or modify this document
30 under the terms of the GNU Free Documentation License, Version 1.3 or
31 any later version published by the Free Software Foundation; with no
32 Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
33 and with the Back-Cover Texts as in (a) below. A copy of the license
34 is included in the section entitled ``GNU Free Documentation License''.
35
36 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
37 modify this GNU manual.''
38 @end quotation
39 @end copying
40
41 @c
42 @titlepage
43 @title URL Programmer's Manual
44 @subtitle First Edition, URL Version 2.0
45 @author William M. Perry @email{wmperry@@gnu.org}
46 @author David Love @email{fx@@gnu.org}
47 @page
48 @vskip 0pt plus 1filll
49 @insertcopying
50 @end titlepage
51
52 @contents
53
54 @node Top
55 @top URL
56
57 @ifnottex
58 @insertcopying
59 @end ifnottex
60
61 @menu
62 * Introduction:: About the @code{url} library.
63 * URI Parsing:: Parsing (and unparsing) URIs.
64 * Retrieving URLs:: How to use this package to retrieve a URL.
65 * Supported URL Types:: Descriptions of URL types currently supported.
66 * General Facilities:: URLs can be cached, accessed via a gateway
67 and tracked in a history list.
68 * Customization:: Variables you can alter.
69 * GNU Free Documentation License:: The license for this documentation.
70 * Function Index::
71 * Variable Index::
72 * Concept Index::
73 @end menu
74
75 @node Introduction
76 @chapter Introduction
77 @cindex URL
78 @cindex URI
79 @cindex uniform resource identifier
80 @cindex uniform resource locator
81
82 A @dfn{Uniform Resource Identifier} (URI) is a specially-formatted
83 name, such as an Internet address, that identifies some name or
84 resource. The format of URIs is described in RFC 3986, which updates
85 and replaces the earlier RFCs 2732, 2396, 1808, and 1738. A
86 @dfn{Uniform Resource Locator} (URL) is an older but still-common
87 term, which basically refers to a URI corresponding to a resource that
88 can be accessed (usually over a network) in a specific way.
89
90 Here are some examples of URIs (taken from RFC 3986):
91
92 @example
93 ftp://ftp.is.co.za/rfc/rfc1808.txt
94 http://www.ietf.org/rfc/rfc2396.txt
95 ldap://[2001:db8::7]/c=GB?objectClass?one
96 mailto:John.Doe@@example.com
97 news:comp.infosystems.www.servers.unix
98 tel:+1-816-555-1212
99 telnet://192.0.2.16:80/
100 urn:oasis:names:specification:docbook:dtd:xml:4.1.2
101 @end example
102
103 This manual describes the @code{url} library, an Emacs Lisp library
104 for parsing URIs and retrieving the resources to which they refer.
105 (The library is so-named for historical reasons; nowadays, the ``URI''
106 terminology is regarded as the more general one, and ``URL'' is
107 technically obsolete despite its widespread vernacular usage.)
108
109 @node URI Parsing
110 @chapter URI Parsing
111
112 A URI consists of several @dfn{components}, each having a different
113 meaning. For example, the URI
114
115 @example
116 http://www.gnu.org/software/emacs/
117 @end example
118
119 @noindent
120 specifies the scheme component @samp{http}, the hostname component
121 @samp{www.gnu.org}, and the path component @samp{/software/emacs/}.
122
123 @cindex parsed URIs
124 The format of URIs is specified by RFC 3986. The @code{url} library
125 provides the Lisp function @code{url-generic-parse-url}, a (mostly)
126 standard-compliant URI parser, as well as the function
127 @code{url-recreate-url}, which converts a parsed URI back into a URI
128 string.
129
130 @defun url-generic-parse-url uri-string
131 This function returns a parsed version of the string @var{uri-string}.
132 @end defun
133
134 @defun url-recreate-url uri-obj
135 @cindex unparsing URLs
136 Given a parsed URI, this function returns the corresponding URI string.
137 @end defun
138
139 @cindex parsed URI
140 The return value of @code{url-generic-parse-url}, and the argument
141 expected by @code{url-recreate-url}, is a @dfn{parsed URI}: a CL
142 structure whose slots hold the various components of the URI@.
143 @xref{Top,the CL Manual,,cl,GNU Emacs Common Lisp Emulation}, for
144 details about CL structures. Most of the other functions in the
145 @code{url} library act on parsed URIs.
146
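As a brief illustration (a sketch; the variable name @code{urlobj} is
chosen for this example only), parsing the URI shown at the start of
this chapter and converting the result back into a string might look
like this:

@example
(require 'url-parse)

(let ((urlobj (url-generic-parse-url
               "http://www.gnu.org/software/emacs/")))
  ;; Collect a few components, then rebuild the URI string.
  (list (url-type urlobj)
        (url-host urlobj)
        (url-filename urlobj)
        (url-recreate-url urlobj)))
     @result{} ("http" "www.gnu.org" "/software/emacs/"
         "http://www.gnu.org/software/emacs/")
@end example
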
147 @menu
148 * Parsed URIs:: Format of parsed URI structures.
149 * URI Encoding:: Non-@acronym{ASCII} characters in URIs.
150 @end menu
151
152 @node Parsed URIs
153 @section Parsed URI structures
154
155 Each parsed URI structure contains the following slots:
156
157 @table @code
158 @item type
159 The URI scheme (a string, e.g., @code{http}). @xref{Supported URL
160 Types}, for a list of schemes that the @code{url} library knows how to
161 process. This slot can also be @code{nil}, if the URI is not fully
162 specified.
163
164 @item user
165 The user name (a string), or @code{nil}.
166
167 @item password
168 The user password (a string), or @code{nil}. The use of this URI
169 component is strongly discouraged; nowadays, passwords are transmitted
170 by other means, not as part of a URI.
171
172 @item host
173 The host name (a string), or @code{nil}. If present, this is
174 typically a domain name or IP address.
175
176 @item port
177 The port number (an integer), or @code{nil}. Omitting this component
178 usually means to use the ``standard'' port associated with the URI
179 scheme.
180
181 @item filename
182 The combination of the ``path'' and ``query'' components of the URI (a
183 string), or @code{nil}. If the query component is present, it is the
184 substring following the first @samp{?} character, and the path
185 component is the substring before the @samp{?}. The meaning of these
186 components is scheme-dependent; they do not necessarily refer to a
187 file on a disk.
188
189 @item target
190 The fragment component (a string), or @code{nil}. The fragment
191 component specifies a ``secondary resource'', such as a section of a
192 webpage.
193
194 @item fullness
195 This is @code{t} if the URI is fully specified, i.e., the
196 hierarchical components of the URI (the hostname and/or username
197 and/or password) are preceded by @samp{//}.
198 @end table
199
200 @findex url-type
201 @findex url-user
202 @findex url-password
203 @findex url-host
204 @findex url-port
205 @findex url-filename
206 @findex url-target
207 @findex url-attributes
208 @findex url-fullness
209 These slots have accessors named @code{url-@var{part}}, where
210 @var{part} is the slot name. For example, the accessor for the
211 @code{host} slot is the function @code{url-host}. The @code{url-port}
212 accessor returns the default port for the URI scheme if the parsed
213 URI's @code{port} slot is @code{nil}.
214
215 The slots can be set using @code{setf}. For example:
216
217 @example
218 (setf (url-port url) 80)
219 @end example
220
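The following sketch reads several slots of one parsed URI; the URI
itself is a made-up example.  Here @code{url-port} reports the default
port of the scheme, because the URI does not specify a port.

@example
(require 'url-parse)

(let ((urlobj (url-generic-parse-url
               "http://example.org/docs/index.html?lang=en#intro")))
  (list (url-port urlobj)       ; no explicit port: default for http
        (url-filename urlobj)   ; path and query together
        (url-target urlobj)     ; fragment
        (url-fullness urlobj))) ; the URI contains //
     @result{} (80 "/docs/index.html?lang=en" "intro" t)
@end example
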
221 @node URI Encoding
222 @section URI Encoding
223
224 @cindex percent encoding
225 The @code{url-generic-parse-url} parser does not obey RFC 3986 in
226 one respect: it allows non-@acronym{ASCII} characters in URI strings.
227
228 Strictly speaking, RFC 3986 compatible URIs may only consist of
229 @acronym{ASCII} characters; non-@acronym{ASCII} characters are
230 represented by converting them to UTF-8 byte sequences, and performing
231 @dfn{percent encoding} on the bytes. For example, the o-umlaut
232 character is converted to the UTF-8 byte sequence @samp{\xC3\xB6},
233 then percent encoded to @samp{%C3%B6}. (Certain ``reserved''
234 @acronym{ASCII} characters must also be percent encoded when they
235 appear in URI components.)
236
237 The function @code{url-encode-url} can be used to convert a URI
238 string containing arbitrary characters to one that is properly
239 percent-encoded in accordance with RFC 3986.
240
241 @defun url-encode-url url-string
242 This function returns a properly URI-encoded version of
243 @var{url-string}. It also performs @dfn{URI normalization},
244 e.g., converting the scheme component to lowercase if it was
245 previously uppercase.
246 @end defun
247
248 To convert between a string containing arbitrary characters and a
249 percent-encoded all-@acronym{ASCII} string, use the functions
250 @code{url-hexify-string} and @code{url-unhex-string}:
251
252 @defun url-hexify-string string &optional allowed-chars
253 This function performs percent-encoding on @var{string}, and returns
254 the result.
255
256 If @var{string} is multibyte, it is first converted to a UTF-8 byte
257 string. Each byte corresponding to an allowed character is left
258 as-is, while all other bytes are converted to a three-character
259 sequence: @samp{%} followed by two upper-case hex digits.
260
261 @vindex url-unreserved-chars
262 @cindex unreserved characters
263 The allowed characters are specified by @var{allowed-chars}. If this
264 argument is @code{nil}, the allowed characters are those specified as
265 @dfn{unreserved characters} by RFC 3986 (see the variable
266 @code{url-unreserved-chars}). Otherwise, @var{allowed-chars} should
267 be a vector whose @var{n}-th element is non-@code{nil} if character
268 @var{n} is allowed.
269 @end defun
270
271 @defun url-unhex-string string &optional allow-newlines
272 This function replaces percent-encoding sequences in @var{string} with
273 their character equivalents, and returns the resulting string.
274
275 If @var{allow-newlines} is non-@code{nil}, it allows the decoding of
276 carriage returns and line feeds, which are normally forbidden in URIs.
277 @end defun
278
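A short sketch of the two conversion functions, using the default set
of allowed characters.  The input strings are arbitrary examples, and
@code{"\u00f6"} is the Lisp read syntax for the o-umlaut character:

@example
(require 'url-util)

(url-hexify-string "a b&c")
     @result{} "a%20b%26c"
(url-unhex-string "a%20b%26c")
     @result{} "a b&c"
;; A non-ASCII character is encoded as its UTF-8 bytes.
(url-hexify-string "\u00f6")
     @result{} "%C3%B6"
@end example
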
279 @node Retrieving URLs
280 @chapter Retrieving URLs
281
282 The @code{url} library defines the following three functions for
283 retrieving the data specified by a URL@. The actual retrieval protocol
284 depends on the URL's URI scheme, and is performed by lower-level
285 scheme-specific functions. (Those lower-level functions are not
286 documented here, and generally should not be called directly.)
287
288 In each of these functions, the @var{url} argument can be either a
289 string or a parsed URL structure. If it is a string, that string is
290 passed through @code{url-encode-url} before using it, to ensure that
291 it is properly URI-encoded (@pxref{URI Encoding}).
292
293 @defun url-retrieve-synchronously url
294 This function synchronously retrieves the data specified by @var{url},
295 and returns a buffer containing the data. The return value is
296 @code{nil} if there is no data associated with the URL (as is the case
297 for @code{dired}, @code{info}, and @code{mailto} URLs).
298 @end defun
299
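For instance, the following sketch fetches a page and returns the
first line of the resulting buffer, which for an HTTP URL should be
the status line of the response.  It needs network access, and the URL
is only an example:

@example
(require 'url)

(let ((buffer (url-retrieve-synchronously "http://www.gnu.org/")))
  ;; The return value is nil if there is no data for the URL.
  (when buffer
    (with-current-buffer buffer
      (goto-char (point-min))
      (buffer-substring (point) (line-end-position)))))
@end example
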
300 @defun url-retrieve url callback &optional cbargs silent no-cookies
301 This function retrieves @var{url} asynchronously, calling the function
302 @var{callback} when the object has been completely retrieved. The
303 return value is the buffer into which the data will be inserted, or
304 @code{nil} if the process has already completed.
305
306 The callback function is called this way:
307
308 @example
309 (apply @var{callback} @var{status} @var{cbargs})
310 @end example
311
312 @noindent
313 where @var{status} is a plist representing what happened during the
314 retrieval, with most recent events first, or an empty list if no
315 events have occurred. Each pair in the plist is one of:
316
317 @table @code
318 @item (:redirect @var{redirected-to})
319 This means that the request was redirected to the URL
320 @var{redirected-to}.
321
322 @item (:error (@var{error-symbol} . @var{data}))
323 This means that an error occurred. If so desired, the error can be
324 signaled with @code{(signal @var{error-symbol} @var{data})}.
325 @end table
326
327 When the callback function is called, the current buffer is the one
328 containing the retrieved data (if any). The buffer also contains any
329 MIME headers associated with the data retrieval.
330
331 If the optional argument @var{silent} is non-@code{nil}, progress
332 messages are suppressed. If the optional argument @var{no-cookies} is
333 non-@code{nil}, cookies are not stored or sent.
334 @end defun
335
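The calling convention above can be sketched as follows.  The callback
name @code{my-url-show-result} and its @var{label} argument are
hypothetical; the label is passed through @var{cbargs}:

@example
(require 'url)

(defun my-url-show-result (status label)
  "Hypothetical callback: report how the retrieval LABEL went."
  (let ((err (plist-get status :error)))
    (if err
        (message "%s failed: %S" label err)
      ;; The current buffer holds the headers and the data.
      (message "%s: retrieved %d characters"
               label (buffer-size)))))

(url-retrieve "http://www.gnu.org/"
              #'my-url-show-result
              '("gnu.org"))
@end example

@noindent
Because @var{status} is a plist, @code{plist-get} is enough to check
for the @code{:error} and @code{:redirect} keys.
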
336 @defun url-queue-retrieve url callback &optional cbargs silent no-cookies
337 This function acts like @code{url-retrieve}, but with limits on the
338 number of concurrently-running network processes. The option
339 @code{url-queue-parallel-processes} controls the number of concurrent
340 processes, and the option @code{url-queue-timeout} sets a timeout in
341 seconds.
342
343 To use this function, you must @code{(require 'url-queue)}.
344 @end defun
345
346 @vindex url-queue-parallel-processes
347 @defopt url-queue-parallel-processes
348 The value of this option is an integer specifying the maximum number
349 of concurrent @code{url-queue-retrieve} network processes. If the
350 number of @code{url-queue-retrieve} calls is larger than this number,
351 later ones are queued until earlier ones are finished.
352 @end defopt
353
354 @vindex url-queue-timeout
355 @defopt url-queue-timeout
356 The value of this option is a number specifying the maximum lifetime
357 of a @code{url-queue-retrieve} network process, once it is started.
358 If a process is not finished by then, it is killed and removed from
359 the queue.
360 @end defopt
361
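As a sketch, the following queues two retrievals while allowing at
most two concurrent processes, each with a ten-second lifetime.  The
values, the URLs, and the callback name @code{my-url-queue-done} are
illustrative assumptions:

@example
(require 'url-queue)

(setq url-queue-parallel-processes 2
      url-queue-timeout 10)

(defun my-url-queue-done (status url)
  "Hypothetical callback: report the STATUS plist for URL."
  (message "Finished %s with status %S" url status))

(dolist (url '("http://www.gnu.org/"
               "http://www.gnu.org/software/emacs/"))
  ;; Pass the URL itself to the callback through CBARGS.
  (url-queue-retrieve url #'my-url-queue-done (list url)))
@end example
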
362 @node Supported URL Types
363 @chapter Supported URL Types
364
365 This chapter describes functions and variables affecting URL retrieval
366 for specific schemes.
367
368 @menu
369 * http/https:: Hypertext Transfer Protocol.
370 * file/ftp:: Local files and FTP archives.
371 * info:: Emacs "Info" pages.
372 * mailto:: Sending email.
373 * news/nntp/snews:: Usenet news.
374 * rlogin/telnet/tn3270:: Remote host connectivity.
375 * irc:: Internet Relay Chat.
376 * data:: Embedded data URLs.
377 * nfs:: Networked File System.
378 * ldap:: Lightweight Directory Access Protocol.
379 * man:: Unix man pages.
380 @end menu
381
382 @node http/https
383 @section @code{http} and @code{https}
384
385 The @code{http} scheme refers to the Hypertext Transfer Protocol. The
386 @code{url} library supports HTTP version 1.1, specified in RFC 2616.
387 Its default port is 80.
388
389 The @code{https} scheme is a secure version of @code{http}, with
390 transmission via SSL@. It is defined in RFC 2818, and its default port
391 is 443. When using @code{https}, the @code{url} library performs SSL
392 encryption via the @code{ssl} library, by forcing the @code{ssl}
393 gateway method to be used. @xref{Gateways in general}.
394
395 @defopt url-honor-refresh-requests
396 If this option is non-@code{nil} (the default), the @code{url} library
397 honors the HTTP @samp{Refresh} header, which is used by servers to
398 direct clients to reload documents from the same URL or a different
399 one. If the value is @code{nil}, the @samp{Refresh} header is
400 ignored; any other value means to ask the user on each request.
401 @end defopt
402
403 @menu
404 * Cookies::
405 * HTTP language/coding::
406 * HTTP URL Options::
407 * Dealing with HTTP documents::
408 @end menu
409
410 @node Cookies
411 @subsection Cookies
412
413 @defopt url-cookie-file
414 The file in which cookies are stored, defaulting to @file{cookies} in
415 the directory specified by @code{url-configuration-directory}.
416 @end defopt
417
418 @defopt url-cookie-confirmation
419 Specifies whether confirmation is required to accept cookies.
420 @end defopt
421
422 @defopt url-cookie-multiple-line
423 Specifies whether to put all cookies for the server on one line in the
424 HTTP request to satisfy broken servers like
425 @url{http://www.hotmail.com}.
426 @end defopt
427
428 @defopt url-cookie-trusted-urls
429 A list of regular expressions matching URLs from which to accept
430 cookies always.
431 @end defopt
432
433 @defopt url-cookie-untrusted-urls
434 A list of regular expressions matching URLs from which to reject
435 cookies always.
436 @end defopt
437
438 @defopt url-cookie-save-interval
439 The number of seconds between automatic saves of cookies to disk.
440 Default is one hour.
441 @end defopt
442
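A possible configuration, shown only as a sketch: the host names in
the regular expressions are placeholders, and the save interval is an
arbitrary choice.

@example
(setq url-cookie-trusted-urls
      '("^https?://www\\.gnu\\.org/")
      url-cookie-untrusted-urls
      '("^https?://ads\\.example\\.com/")
      ;; Save cookies to disk every ten minutes.
      url-cookie-save-interval 600)
@end example
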
443
444 @node HTTP language/coding
445 @subsection Language and Encoding Preferences
446
447 HTTP allows clients to express preferences for the language and
448 encoding of documents which servers may honor. For each of these
449 variables, the value is a string; it can specify a single choice, or
450 it can be a comma-separated list.
451
452 Normally, this list is ordered by descending preference. However, each
453 element can be followed by @samp{;q=@var{priority}} to specify its
454 preference level, a decimal number from 0 to 1; e.g., for
455 @code{url-mime-language-string}, @w{@code{"de, en-gb;q=0.8,
456 en;q=0.7"}}. An element that has no @samp{;q} specification has
457 preference level 1.
458
459 @defopt url-mime-charset-string
460 @cindex character sets
461 @cindex coding systems
462 This variable specifies a preference for character sets when documents
463 can be served in more than one encoding.
464
465 HTTP allows specifying a series of MIME charsets which indicate your
466 preferred character set encodings, e.g., Latin-9 or Big5, and these
467 can be weighted. The default series is generated automatically from
468 the associated MIME types of all defined coding systems, sorted by the
469 coding system priority specified in Emacs. @xref{Recognize Coding, ,
470 Recognizing Coding Systems, emacs, The GNU Emacs Manual}.
471 @end defopt
472
473 @defopt url-mime-language-string
474 @cindex language preferences
475 A string specifying the preferred language when servers can serve
476 files in several languages. Use RFC 1766 abbreviations, e.g.,
477 @samp{en} for English, @samp{de} for German.
478
479 The string can be @code{"*"} to get the first available language (as
480 opposed to the default).
481 @end defopt
482
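For example, a preference could be set for a single request by binding
the variable around the call.  This is a sketch: it assumes that the
request headers are built while the binding is in effect, and the URL
is only an example.

@example
(require 'url)

;; Prefer German, and accept English at a lower priority.
(let ((url-mime-language-string "de, en;q=0.5"))
  (url-retrieve-synchronously "http://www.gnu.org/"))
@end example
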
483 @node HTTP URL Options
484 @subsection HTTP URL Options
485
486 HTTP supports an @samp{OPTIONS} method describing things supported by
487 the URL@.
488
489 @defun url-http-options url
490 Returns a property list describing options available for @var{url}. The
491 property list members are:
492
493 @table @code
494 @item methods
495 A list of symbols specifying what HTTP methods the resource
496 supports.
497
498 @item dav
499 @cindex DAV
500 A list of numbers specifying what DAV protocol/schema versions are
501 supported.
502
503 @item dasl
504 @cindex DASL
505 A list of supported DASL search types (string form).
506
507 @item ranges
508 A list of the units available for use in partial document fetches.
509
510 @item p3p
511 @cindex P3P
512 The @dfn{Platform for Privacy Preferences} description for the resource.
513 Currently this is just the raw header contents.
514 @end table
515
516 @end defun
517
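A sketch of how the returned property list might be inspected.  It
needs network access, the URL is only an example, and the result
depends entirely on what the server reports:

@example
(require 'url-http)

(let ((options (url-http-options "http://www.gnu.org/")))
  ;; Pick out two of the members described above.
  (list (plist-get options 'methods)
        (plist-get options 'ranges)))
@end example
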
518 @node Dealing with HTTP documents
519 @subsection Dealing with HTTP documents
520
521 HTTP URLs are retrieved into a buffer containing the HTTP headers
522 followed by the body. Since the headers are quasi-MIME, they may be
523 processed using the MIME library. @xref{Top,, Emacs MIME,
524 emacs-mime, The Emacs MIME Manual}.
525
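If only the body is wanted, one simple approach is to skip past the
first empty line of the buffer.  The sketch below assumes that this
line marks the end of the headers; it does no MIME processing, and the
URL is only an example.

@example
(require 'url)

(let ((buffer (url-retrieve-synchronously "http://www.gnu.org/")))
  (when buffer
    (with-current-buffer buffer
      (goto-char (point-min))
      ;; Assumption: an empty line ends the header block.
      (when (re-search-forward "^\r?\n" nil t)
        (buffer-substring (point) (point-max))))))
@end example
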
526 @node file/ftp
527 @section file and ftp
528 @cindex files
529 @cindex FTP
530 @cindex File Transfer Protocol
531 @cindex compressed files
532 @cindex dired
533
534 The @code{ftp} and @code{file} schemes are defined in RFC 1738. The
535 @code{url} library treats @samp{ftp:} and @samp{file:} as synonymous.
536 Such URLs have the form
537
538 @example
539 ftp://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
540 file://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
541 @end example
542
543 @noindent
544 If the URL specifies a local file, it is retrieved by reading the file
545 contents in the usual way. If it specifies a remote file, it is
546 retrieved using the Ange-FTP package. @xref{Remote Files,,, emacs,
547 The GNU Emacs Manual}.
548
549 When retrieving a compressed file, it is automatically uncompressed
550 if it has the file suffix @file{.z}, @file{.gz}, @file{.Z},
551 @file{.bz2}, or @file{.xz}. (The list of supported suffixes is
552 hard-coded, and cannot be altered by customizing
553 @code{jka-compr-compression-info-list}.)
554
555 @defopt url-directory-index-file
556 This option specifies the filename to look for when a @code{file} or
557 @code{ftp} URL specifies a directory. The default is
558 @file{index.html}. If this file exists and is readable, it is viewed.
559 Otherwise, Emacs visits the directory using Dired.
560 @end defopt
561
562 @node info
563 @section info
564 @cindex Info
565 @cindex Texinfo
566 @findex Info-goto-node
567
568 The @code{info} scheme is non-standard. Such URLs have the form
569
570 @example
571 info:@var{file}#@var{node}
572 @end example
573
574 @noindent
575 and are retrieved by invoking @code{Info-goto-node} with argument
576 @samp{(@var{file})@var{node}}. If @samp{#@var{node}} is omitted, the
577 @samp{Top} node is opened.
578
579 @node mailto
580 @section mailto
581
582 @cindex mailto
583 @cindex email
584 A @code{mailto} URL specifies an email message to be sent to a given
585 email address. For example, @samp{mailto:foo@@bar.com} specifies
586 sending a message to @samp{foo@@bar.com}. The ``retrieval method''
587 for such URLs is to open a mail composition buffer in which the
588 appropriate content (e.g., the recipient address) has been filled in.
589
590 As defined in RFC 2368, a @code{mailto} URL has the form
591
592 @example
593 @samp{mailto:@var{mailbox}[?@var{header}=@var{contents}[&@var{header}=@var{contents}]]}
594 @end example
595
596 @noindent
597 where an arbitrary number of @var{header}s can be added. If the
598 @var{header} is @samp{body}, then @var{contents} is put in the message
599 body; otherwise, a @var{header} header field is created with
600 @var{contents} as its contents. Note that the @code{url} library does
601 not perform any checking of @var{header} or @var{contents}, so you
602 should check them before sending the message.
603
604 @defopt url-mail-command
605 @vindex mail-user-agent
606 The value of this variable is the function called whenever url needs
607 to send mail. This should normally be left at its default, which is the
608 standard mail-composition command @code{compose-mail}. @xref{Sending
609 Mail,,, emacs, The GNU Emacs Manual}.
610 @end defopt
611
612 If the document containing the @code{mailto} URL itself possessed a
613 known URL, Emacs automatically inserts an @samp{X-Url-From} header
614 field into the mail buffer, specifying that URL.
615
616 @node news/nntp/snews
617 @section @code{news}, @code{nntp} and @code{snews}
618 @cindex news
619 @cindex network news
620 @cindex usenet
621 @cindex NNTP
622 @cindex snews
623
624 The @code{news}, @code{nntp}, and @code{snews} schemes, defined in RFC
625 1738, are used for reading Usenet newsgroups. For compatibility with
626 non-standard-compliant news clients, the @code{url} library allows
627 host and port fields to be included in @code{news} URLs, even though
628 they are properly only allowed for @code{nntp} and @code{snews}.
629
630 @code{news} and @code{nntp} URLs have the following form:
631
632 @table @samp
633 @item news:@var{newsgroup}
634 Retrieves a list of messages in @var{newsgroup};
635 @item news:@var{message-id}
636 Retrieves the message with the given @var{message-id};
637 @item news:*
638 Retrieves a list of all available newsgroups;
639 @item nntp://@var{host}:@var{port}/@var{newsgroup}
640 @itemx nntp://@var{host}:@var{port}/@var{message-id}
641 @itemx nntp://@var{host}:@var{port}/*
642 Similar to the @samp{news} versions.
643 @end table
644
645 The default port for @code{nntp} (and @code{news}) is 119. The
646 difference between an @code{nntp} URL and a @code{news} URL is that an
647 @code{nntp} URL may specify an article by its number. The
648 @samp{snews} scheme is the same as @samp{nntp}, except that it is
649 tunneled through SSL and has default port 563.
650
651 These URLs are retrieved via the Gnus package.
652
653 @cindex environment variable
654 @vindex NNTPSERVER
655 @defopt url-news-server
656 This variable specifies the default news server from which to fetch
657 news, if no server was specified in the URL@. The default value,
658 @code{nil}, means to use the server specified by the standard
659 environment variable @samp{NNTPSERVER}, or @samp{news} if that
660 environment variable is unset.
661 @end defopt
662
663 @node rlogin/telnet/tn3270
664 @section rlogin, telnet and tn3270
665 @cindex rlogin
666 @cindex telnet
667 @cindex tn3270
668 @cindex terminal emulation
669 @findex terminal-emulator
670
671 These URL schemes are defined in RFC 1738, and are used for logging in
672 via a terminal emulator. They have the form
673
674 @example
675 telnet://@var{user}:@var{password}@@@var{host}:@var{port}
676 @end example
677
678 @noindent
679 but the @var{password} component is ignored.
680
681 To handle rlogin, telnet and tn3270 URLs, a @code{rlogin},
682 @code{telnet} or @code{tn3270} (the program names and arguments are
683 hardcoded) session is run in a @code{terminal-emulator} buffer.
684 Well-known ports are used if the URL does not specify a port.
685
686 @node irc
687 @section irc
688 @cindex IRC
689 @cindex Internet Relay Chat
690 @cindex ZEN IRC
691 @cindex ERC
692 @cindex rcirc
693
694 The @code{irc} scheme is defined in the Internet Draft at
695 @url{http://www.w3.org/Addressing/draft-mirashi-url-irc-01.txt} (which
696 was never approved as an RFC). Such URLs have the form
697
698 @example
699 irc://@var{host}:@var{port}/@var{target},@var{needpass}
700 @end example
701
702 @noindent
703 and are retrieved by opening an @acronym{IRC} session using the
704 function specified by @code{url-irc-function}.
705
706 @defopt url-irc-function
707 The value of this option is a function, which is called to open an IRC
708 connection for @code{irc} URLs. This function must take five
709 arguments, @var{host}, @var{port}, @var{channel}, @var{user} and
710 @var{password}. The @var{channel} argument specifies the channel to
711 join immediately, and may be @code{nil}.
712
713 The default is @code{url-irc-rcirc}, which uses the Rcirc package.
714 Other options are @code{url-irc-erc} (which uses ERC) and
715 @code{url-irc-zenirc} (which uses ZenIRC).
716 @end defopt
717
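A custom handler only needs to accept the five arguments listed above.
In the following sketch, the function @code{my-url-irc-handler} is
hypothetical and merely reports what it was asked to open:

@example
(defun my-url-irc-handler (host port channel user password)
  "Hypothetical handler: report the requested IRC connection.
PASSWORD is accepted but deliberately not displayed."
  (message "IRC: host %s, port %s, channel %s, user %s"
           host port channel user))

(setq url-irc-function #'my-url-irc-handler)
@end example
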
718 @node data
719 @section data
720 @cindex data URLs
721
722 The @code{data} scheme, defined in RFC 2397, contains MIME data in
723 the URL itself. Such URLs have the form
724
725 @example
726 data:@r{[}@var{media-type}@r{]}@r{[};@var{base64}@r{]},@var{data}
727 @end example
728
729 @noindent
730 @var{media-type} is a MIME @samp{Content-Type} string, possibly
731 including parameters. It defaults to
732 @samp{text/plain;charset=US-ASCII}. The @samp{text/plain} can be
733 omitted while the charset parameter is still supplied. If @samp{;base64} is
734 present, the @var{data} are base64-encoded.
735
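As an illustration, a base64-encoded @code{data} URL can be assembled
with the standard function @code{base64-encode-string}.  The payload
is an arbitrary example:

@example
(concat "data:text/plain;base64,"
        (base64-encode-string "Hello, world!"))
     @result{} "data:text/plain;base64,SGVsbG8sIHdvcmxkIQ=="
@end example
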
736 @node nfs
737 @section nfs
738 @cindex NFS
739 @cindex Network File System
740 @cindex automounter
741
742 The @code{nfs} scheme, defined in RFC 2224, is similar to @code{ftp}
743 except that it points to a file on a remote host that is handled by an
744 NFS automounter on the local host. Such URLs have the form
745
746 @example
747 nfs://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
748 @end example
749
750 @defvar url-nfs-automounter-directory-spec
751 A string saying how to invoke the NFS automounter. Certain @samp{%}
752 sequences are recognized:
753
754 @table @samp
755 @item %h
756 The hostname of the NFS server;
757 @item %n
758 The port number of the NFS server;
759 @item %u
760 The username to use to authenticate;
761 @item %p
762 The password to use to authenticate;
763 @item %f
764 The filename on the remote server;
765 @item %%
766 A literal @samp{%}.
767 @end table
768
769 Each can be used any number of times.
770 @end defvar
771
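For example, on a system whose automounter makes remote hosts
available under @file{/net} (an assumption about the local setup), the
value might combine the host name and file name like this:

@example
;; %h is the NFS server's host name, %f the remote file name.
(setq url-nfs-automounter-directory-spec "file:/net/%h%f")
@end example
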
772 @node ldap
773 @section ldap
774 @cindex LDAP
775 @cindex Lightweight Directory Access Protocol
776
777 The LDAP scheme is defined in RFC 2255.
778
779 @node man
780 @section man
781 @cindex @command{man}
782 @cindex Unix man pages
783 @findex man
784
785 The @code{man} scheme is a non-standard one. Such URLs have the form
786
787 @example
788 @samp{man:@var{page-spec}}
789 @end example
790
791 @noindent
792 and are retrieved by passing @var{page-spec} to the Lisp function
793 @code{man}.
794
795 @node General Facilities
796 @chapter General Facilities
797
798 @menu
799 * Disk Caching::
800 * Proxies::
801 * Gateways in general::
802 * History::
803 @end menu
804
805 @node Disk Caching
806 @section Disk Caching
807 @cindex Caching
808 @cindex Persistent Cache
809 @cindex Disk Cache
810
811 The disk cache stores retrieved documents locally, whence they can be
812 retrieved more quickly. When requesting a URL that is in the cache,
813 the library checks to see if the page has changed since it was last
814 retrieved from the remote machine. If not, the local copy is used,
815 saving the transmission over the network.
816 @cindex Cleaning the cache
817 @cindex Clearing the cache
818 @cindex Cache cleaning
819 Currently the cache isn't cleared automatically.
820 @c Running the @code{clean-cache} shell script
821 @c fist is recommended, to allow for future cleaning of the cache. This
822 @c shell script will remove all files that have not been accessed since it
823 @c was last run. To keep the cache pared down, it is recommended that this
824 @c script be run from @i{at} or @i{cron} (see the manual pages for
825 @c crontab(5) or at(1) for more information)
826
827 @defopt url-automatic-caching
828 Setting this variable non-@code{nil} causes documents to be cached
829 automatically.
830 @end defopt
831
832 @defopt url-cache-directory
833 This variable specifies the directory in which to store the cache
834 files. It defaults to the subdirectory @file{cache} of
835 @code{url-configuration-directory}.
836 @end defopt
837
838 @defopt url-cache-creation-function
839 The cache relies on a scheme for mapping URLs to files in the cache.
840 This variable names a function which sets the type of cache to use.
841 It takes a URL as argument and returns the absolute file name of the
842 corresponding cache file. The two supplied possibilities are
843 @code{url-cache-create-filename-using-md5} and
844 @code{url-cache-create-filename-human-readable}.
845 @end defopt
846
847 @defun url-cache-create-filename-using-md5 url
848 Creates a cache file name from @var{url} using MD5 hashing.
849 This creates entries with very few cache collisions and is fast.
850 @cindex MD5
851 @smallexample
852 (url-cache-create-filename-using-md5 "http://www.example.com/foo/bar")
853 @result{} "/home/fx/.url/cache/fx/http/com/example/www/b8a35774ad20db71c7c3409a5410e74f"
854 @end smallexample
855 @end defun
856
857 @defun url-cache-create-filename-human-readable url
858 Creates a cache file name from @var{url} more obviously connected to
859 @var{url} than for @code{url-cache-create-filename-using-md5}, but
860 more likely to conflict with other files.
861 @smallexample
862 (url-cache-create-filename-human-readable "http://www.example.com/foo/bar")
863 @result{} "/home/fx/.url/cache/fx/http/com/example/www/foo/bar"
864 @end smallexample
865 @end defun
866
867 @defun url-cache-expired
868 This function returns non-@code{nil} if a cache entry has expired (or is absent).
869 The arguments are a URL and optional expiration delay in seconds
870 (default @code{url-cache-expire-time}).
871 @end defun
872
873 @defopt url-cache-expire-time
874 This variable is the default number of seconds to use for the
875 expire-time argument of the function @code{url-cache-expired}.
876 @end defopt
877
@defun url-fetch-from-cache url
This function returns a buffer containing the data cached for
@var{url}.
881 @end defun
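
The two functions above can be combined to read from the cache only
while an entry is still fresh. This is a sketch, not part of the
library; the name @code{my-url-cached-buffer} and its @var{max-age}
argument are hypothetical.

@example
(require 'url-cache)

(defun my-url-cached-buffer (url &optional max-age)
  "Return a buffer holding the cached data for URL, or nil.
Return nil when the entry is absent or older than MAX-AGE seconds
\(default `url-cache-expire-time')."
  (unless (url-cache-expired url max-age)
    (url-fetch-from-cache url)))
@end example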
882
883 @c Fixme: never actually used currently?
884 @c @defopt url-standalone-mode
885 @c @cindex Relying on cache
886 @c @cindex Cache only mode
887 @c @cindex Standalone mode
888 @c If this variable is non-@code{nil}, the library relies solely on the
889 @c cache for fetching documents and avoids checking if they have changed
890 @c on remote servers.
891 @c @end defopt
892
893 @c With a large cache of documents on the local disk, it can be very handy
894 @c when traveling, or any other time the network connection is not active
895 @c (a laptop with a dial-on-demand PPP connection, etc.). Emacs/W3 can rely
896 @c solely on its cache, and avoid checking to see if the page has changed
897 @c on the remote server. In the case of a dial-on-demand PPP connection,
898 @c this will keep the phone line free as long as possible, only bringing up
899 @c the PPP connection when asking for a page that is not located in the
900 @c cache. This is very useful for demonstrations as well.
901
902 @node Proxies
903 @section Proxies and Gatewaying
904
905 @c fixme: check/document url-ns stuff
906 @cindex proxy servers
907 @cindex proxies
908 @cindex environment variables
909 @vindex HTTP_PROXY
910 Proxy servers are commonly used to provide gateways through firewalls
911 or as caches serving some more-or-less local network. Each protocol
(HTTP, FTP, etc.)@: can have a different gateway server. By
convention, many programs share a proxy configuration given by
environment variables of the form @code{@var{protocol}_proxy}, where
@var{protocol} is one of the supported network protocols (@code{http},
@code{ftp} etc.). The library recognizes such variables in either
upper or lower case. Their values take one of the following forms:
918 @itemize @bullet
@item @code{@var{host}:@var{port}};
@item a full URL;
@item simply a host name.
922 @end itemize
923
924 @vindex NO_PROXY
925 The @code{NO_PROXY} environment variable specifies URLs that should be
926 excluded from proxying (on servers that should be contacted directly).
927 This should be a comma-separated list of hostnames, domain names, or a
928 mixture of both. Asterisks can be used as wildcards, but other
929 clients may not support that. Domain names may be indicated by a
930 leading dot. For example:
931 @example
932 NO_PROXY="*.aventail.com,home.com,.seanet.com"
933 @end example
934 @noindent says to contact all machines in the @samp{aventail.com} and
935 @samp{seanet.com} domains directly, as well as the machine named
936 @samp{home.com}. If @code{NO_PROXY} isn't defined, @code{no_PROXY}
937 and @code{no_proxy} are also tried, in that order.
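
The lookup order just described can be mimicked from Lisp to see which
value would be picked up. This is only an illustration of the order,
not the library's own code.

@example
;; Return the value of the first of the three variables that is set,
;; or nil if none is.
(or (getenv "NO_PROXY")
    (getenv "no_PROXY")
    (getenv "no_proxy"))
@end example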
938
939 Proxies may also be specified directly in Lisp.
940
941 @defopt url-proxy-services
942 This variable is an alist of URL schemes and proxy servers that
gateway them. The items are of the form @w{@code{(@var{scheme}
. @var{host}:@var{portnumber})}}, which says that the URL @var{scheme}
is gatewayed through @var{portnumber} on the specified @var{host}. An
946 exception is the pseudo scheme @code{"no_proxy"}, which is paired with
947 a regexp matching host names not to be proxied. This variable is
948 initialized from the environment as above.
949
950 @example
(setq url-proxy-services
      '(("http"     . "proxy.aventail.com:80")
        ("no_proxy" . "^.*\\(aventail\\|seanet\\)\\.com")))
954 @end example
955 @end defopt
956
957 @node Gateways in general
958 @section Gateways in General
959 @cindex gateways
960 @cindex firewalls
961
962 The library provides a general gateway layer through which all
963 networking passes. It can both control access to the network and
provide access through gateways in firewalls. It may make direct
connections in some cases and pass through some sort of gateway in
966 others.@footnote{Proxies (which only operate over HTTP) are
967 implemented using this.} The library's basic function responsible for
968 making connections is @code{url-open-stream}.
969
970 @defun url-open-stream name buffer host service
971 @cindex opening a stream
972 @cindex stream, opening
973 Open a stream to @var{host}, possibly via a gateway. The other
974 arguments are as for @code{open-network-stream}. This will not make a
975 connection if @code{url-gateway-unplugged} is non-@code{nil}.
976 @end defun
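
A call might look like the following sketch. The process name, buffer
name, host, and port are placeholders; the call attempts a real
connection, so it needs network access.

@example
(require 'url-gw)

;; Open a connection to port 80 of www.example.com, honoring the
;; configured gateway method.  Output goes to the named buffer.
(url-open-stream "example" (get-buffer-create " *example-stream*")
                 "www.example.com" 80)
@end example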
977
978 @defvar url-gateway-local-host-regexp
979 This is a regular expression that matches local hosts that do not
980 require the use of a gateway. If @code{nil}, all connections are made
981 through the gateway.
982 @end defvar
983
984 @defvar url-gateway-method
This variable controls which gateway method is used. It may be useful
to bind it temporarily in some applications. Its value is a symbol;
the possible values are:
988
989 @table @code
990 @item telnet
991 @cindex @command{telnet}
992 Use this method if you must first telnet and log into a gateway host,
993 and then run telnet from that host to connect to outside machines.
994
995 @item rlogin
996 @cindex @command{rlogin}
997 This method is identical to @code{telnet}, but uses @command{rlogin}
998 to log into the remote machine without having to send the username and
999 password over the wire every time.
1000
1001 @item socks
1002 @cindex @sc{socks}
1003 Use if the firewall has a @sc{socks} gateway running on it. The
1004 @sc{socks} v5 protocol is defined in RFC 1928.
1005
1006 @c @item ssl
1007 @c This probably shouldn't be documented
1008 @c Fixme: why not? -- fx
1009
1010 @item native
1011 This method uses Emacs's builtin networking directly. This is the
1012 default. It can be used only if there is no firewall blocking access.
1013 @end table
1014 @end defvar
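
For instance, an application that must bypass any configured gateway
for a single request could bind the variable around the call. This
sketch assumes that direct access to the host is possible; the URL is
a placeholder.

@example
(require 'url)

;; Use Emacs's own networking for this retrieval only.
(let ((url-gateway-method 'native))
  (url-retrieve-synchronously "http://www.example.com/"))
@end example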
1015
1016 The following variables control the gateway methods.
1017
1018 @defopt url-gateway-telnet-host
1019 The gateway host to telnet to. Once logged in there, you then telnet
1020 out to the hosts you want to connect to.
1021 @end defopt
1022 @defopt url-gateway-telnet-parameters
1023 This should be a list of parameters to pass to the @command{telnet} program.
1024 @end defopt
1025 @defopt url-gateway-telnet-password-prompt
1026 This is a regular expression that matches the password prompt when
1027 logging in.
1028 @end defopt
1029 @defopt url-gateway-telnet-login-prompt
1030 This is a regular expression that matches the username prompt when
1031 logging in.
1032 @end defopt
1033 @defopt url-gateway-telnet-user-name
1034 The username to log in with.
1035 @end defopt
1036 @defopt url-gateway-telnet-password
1037 The password to send when logging in.
1038 @end defopt
1039 @defopt url-gateway-prompt-pattern
1040 This is a regular expression that matches the shell prompt.
1041 @end defopt
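
A telnet gateway setup might look like the following sketch. The host
name, user name, and prompt regexps are placeholders and must be
adapted to what the actual gateway prints.

@example
(setq url-gateway-method 'telnet
      url-gateway-telnet-host "gateway.example.com"
      url-gateway-telnet-user-name "jdoe"
      ;; Hypothetical patterns; match them to the gateway's prompts.
      url-gateway-telnet-login-prompt "login: *$"
      url-gateway-telnet-password-prompt "password: *$"
      url-gateway-prompt-pattern "[$%>] *$")
@end example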
1042
1043 @defopt url-gateway-rlogin-host
1044 Host to @samp{rlogin} to before telnetting out.
1045 @end defopt
1046 @defopt url-gateway-rlogin-parameters
1047 Parameters to pass to @samp{rsh}.
1048 @end defopt
1049 @defopt url-gateway-rlogin-user-name
1050 User name to use when logging in to the gateway.
1051 @end defopt
1052 @defopt url-gateway-prompt-pattern
1053 This is a regular expression that matches the shell prompt.
1054 @end defopt
1055
1056 @defopt socks-server
This specifies the default server; it takes the form
1058 @w{@code{("Default server" @var{server} @var{port} @var{version})}}
1059 where @var{version} can be either 4 or 5.
1060 @end defopt
1061 @defvar socks-password
1062 If this is @code{nil} then you will be asked for the password,
1063 otherwise it will be used as the password for authenticating you to
1064 the @sc{socks} server.
1065 @end defvar
1066 @defvar socks-username
1067 This is the username to use when authenticating yourself to the
1068 @sc{socks} server. By default this is your login name.
1069 @end defvar
1070 @defvar socks-timeout
1071 This controls how long, in seconds, to wait for responses from the
1072 @sc{socks} server; it is 5 by default.
1073 @end defvar
1074 @c fixme: these have been effectively commented-out in the code
1075 @c @defopt socks-server-aliases
1076 @c This a list of server aliases. It is a list of aliases of the form
1077 @c @var{(alias hostname port version)}.
1078 @c @end defopt
1079 @c @defopt socks-network-aliases
1080 @c This a list of network aliases. Each entry in the list takes the form
1081 @c @var{(alias (network))} where @var{alias} is a string that names the
1082 @c @var{network}. The networks can contain a pair (not a dotted pair) of
1083 @c @sc{ip} addresses which specify a range of @sc{ip} addresses, an @sc{ip}
1084 @c address and a netmask, a domain name or a unique hostname or @sc{ip}
1085 @c address.
1086 @c @end defopt
1087 @c @defopt socks-redirection-rules
1088 @c This a list of redirection rules. Each rule take the form
1089 @c @var{(Destination network Connection type)} where @var{Destination
1090 @c network} is a network alias from @code{socks-network-aliases} and
1091 @c @var{Connection type} can be @code{nil} in which case a direct
1092 @c connection is used, or it can be an alias from
1093 @c @code{socks-server-aliases} in which case that server is used as a
1094 @c proxy.
1095 @c @end defopt
1096 @defopt socks-nslookup-program
1097 @cindex @command{nslookup}
This is the @samp{nslookup} program. It is @code{"nslookup"} by default.
1099 @end defopt
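
Putting the @sc{socks} settings together, a configuration could look
like this sketch. The server name, port, and timeout are placeholders.

@example
(require 'socks)

(setq url-gateway-method 'socks
      ;; ("Default server" SERVER PORT VERSION)
      socks-server '("Default server" "socks.example.com" 1080 5)
      ;; Wait up to 10 seconds for the server to answer.
      socks-timeout 10)
@end example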
1100
1101 @menu
1102 * Suppressing network connections::
1103 @end menu
1104 @c * Broken hostname resolution::
1105
1106 @node Suppressing network connections
1107 @subsection Suppressing Network Connections
1108
1109 @cindex network connections, suppressing
1110 @cindex suppressing network connections
1111 @cindex bugs, HTML
1112 @cindex HTML `bugs'
1113 In some circumstances it is desirable to suppress making network
1114 connections. A typical case is when rendering HTML in a mail user
1115 agent, when external URLs should not be activated, particularly to
avoid ``bugs'' which ``call home'' by fetching single-pixel images and the
1117 like. To arrange this, bind the following variable for the duration
1118 of such processing.
1119
1120 @defvar url-gateway-unplugged
If this variable is non-@code{nil}, new network connections are never
1122 opened by the URL library.
1123 @end defvar
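
A sketch of such a binding follows. The wrapper name
@code{my-call-without-network} is hypothetical; the function to call
(for example, an HTML renderer) is supplied by the application.

@example
(require 'url)

(defun my-call-without-network (function &rest args)
  "Call FUNCTION with ARGS while the URL library is unplugged."
  (let ((url-gateway-unplugged t))
    (apply function args)))
@end example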
1124
1125 @c @node Broken hostname resolution
1126 @c @subsection Broken Hostname Resolution
1127
1128 @c @cindex hostname resolver
1129 @c @cindex resolver, hostname
1130 @c Some C libraries do not include the hostname resolver routines in
1131 @c their static libraries. If Emacs was linked statically, and was not
1132 @c linked with the resolver libraries, it will not be able to get to any
1133 @c machines off the local network. This is characterized by being able
1134 @c to reach someplace with a raw ip number, but not its hostname
1135 @c (@url{http://129.79.254.191/} works, but
1136 @c @url{http://www.cs.indiana.edu/} doesn't). This used to happen on
1137 @c SunOS4 and Ultrix, but is now probably now rare. If Emacs can't be
1138 @c rebuilt linked against the resolver library, it can use the external
1139 @c @command{nslookup} program instead.
1140
1141 @c @defopt url-gateway-broken-resolution
1142 @c @cindex @code{nslookup} program
1143 @c @cindex program, @code{nslookup}
1144 @c If non-@code{nil}, this variable says to use the program specified by
1145 @c @code{url-gateway-nslookup-program} program to do hostname resolution.
1146 @c @end defopt
1147
1148 @c @defopt url-gateway-nslookup-program
1149 @c The name of the program to do hostname lookup if Emacs can't do it
1150 @c directly. This program should expect a single argument on the command
1151 @c line---the hostname to resolve---and should produce output similar to
1152 @c the standard Unix @command{nslookup} program:
1153 @c @example
1154 @c Name: www.cs.indiana.edu
1155 @c Address: 129.79.254.191
1156 @c @end example
1157 @c @end defopt
1158
1159 @node History
1160 @section History
1161
1162 @findex url-do-setup
1163 The library can maintain a global history list tracking URLs accessed.
1164 URL completion can be done from it. The history mechanism is set up
1165 automatically via @code{url-do-setup} when it is configured to be on.
1166 Note that the size of the history list is currently not limited.
1167
1168 @vindex url-history-hash-table
1169 The history ``list'' is actually a hash table,
1170 @code{url-history-hash-table}. It contains access times keyed by URL
1171 strings. The times are in the format returned by @code{current-time}.
1172
1173 @defun url-history-update-url url time
1174 This function updates the history table with an entry for @var{url}
1175 accessed at the given @var{time}.
1176 @end defun
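
For example, the following sketch records an access and then looks up
its time in the table. The URL is a placeholder.

@example
(require 'url-history)

;; Record that the URL was accessed now.
(url-history-update-url "http://www.example.com/" (current-time))

;; Retrieve the stored access time for the same URL.
(gethash "http://www.example.com/" url-history-hash-table)
@end example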
1177
1178 @defopt url-history-track
1179 If non-@code{nil}, the library will keep track of all the URLs
1180 accessed. If it is @code{t}, the list is saved to disk at the end of
1181 each Emacs session. The default is @code{nil}.
1182 @end defopt
1183
1184 @defopt url-history-file
1185 The file storing the history list between sessions. It defaults to
1186 @file{history} in @code{url-configuration-directory}.
1187 @end defopt
1188
1189 @defopt url-history-save-interval
1190 @findex url-history-setup-save-timer
1191 The number of seconds between automatic saves of the history list.
1192 Default is one hour. Note that if you change this variable directly,
1193 rather than using Custom, after @code{url-do-setup} has been run, you
1194 need to run the function @code{url-history-setup-save-timer}.
1195 @end defopt
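
Changing the interval from Lisp could be done as in this sketch; the
ten-minute value is an arbitrary choice.

@example
(require 'url-history)

(setq url-history-track t
      url-history-save-interval 600)  ; seconds

;; Needed only if `url-do-setup' has already run.
(url-history-setup-save-timer)
@end example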
1196
1197 @defun url-history-parse-history &optional fname
1198 Parses the history file @var{fname} (default @code{url-history-file})
1199 and sets up the history list.
1200 @end defun
1201
1202 @defun url-history-save-history &optional fname
1203 Saves the current history to file @var{fname} (default
1204 @code{url-history-file}).
1205 @end defun
1206
1207 @defun url-completion-function string predicate function
1208 You can use this function to do completion of URLs from the history.
1209 @end defun
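
One plausible use, sketched here, is to pass the function as the
completion table to @code{completing-read}. Whether this suits a given
application is an assumption to verify.

@example
(require 'url-history)

;; Prompt for a URL, completing against the history.
(completing-read "URL: " 'url-completion-function)
@end example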
1210
1211 @node Customization
1212 @chapter Customization
1213
1214 @cindex environment variables
1215 The following environment variables affect the @code{url} library's
1216 operation at startup.
1217
1218 @table @code
1219 @item TMPDIR
1220 @vindex TMPDIR
1221 @vindex url-temporary-directory
If this is defined, @code{url-temporary-directory} is initialized from
1223 it.
1224 @end table
1225
The following user options affect the general operation of the
@code{url} library.
1228
1229 @defopt url-configuration-directory
1230 @cindex configuration files
1231 The value of this variable specifies the name of the directory where
1232 the @code{url} library stores its various configuration files, cache
1233 files, etc.
1234
1235 The default value specifies a subdirectory named @file{url/} in the
1236 standard Emacs user data directory specified by the variable
1237 @code{user-emacs-directory} (normally @file{~/.emacs.d}). However,
1238 the old default was @file{~/.url}, and this directory is used instead
1239 if it exists.
1240 @end defopt
1241
1242 @defopt url-debug
1243 @cindex debugging
1244 Specifies the types of debug messages which are logged to
1245 the @code{*URL-DEBUG*} buffer.
1246 @code{t} means log all messages.
1247 A number means log all messages and show them with @code{message}.
1248 It may also be a list of the types of messages to be logged.
1249 @end defopt
1250 @defopt url-personal-mail-address
1251 @end defopt
1252 @defopt url-privacy-level
1253 @end defopt
1254 @defopt url-uncompressor-alist
1255 @end defopt
1256 @defopt url-passwd-entry-func
1257 @end defopt
1258 @defopt url-standalone-mode
1259 @end defopt
1260 @defopt url-bad-port-list
1261 @end defopt
1262 @defopt url-max-password-attempts
1263 @end defopt
1264 @defopt url-temporary-directory
1265 @end defopt
1266 @defopt url-show-status
1267 @end defopt
1268 @defopt url-confirmation-func
The function to use for asking yes or no questions. This is normally
1270 either @code{y-or-n-p} or @code{yes-or-no-p}, but could be another
1271 function taking a single argument (the prompt) and returning @code{t}
1272 only if an affirmative answer is given.
1273 @end defopt
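
A custom confirmation function only has to honor that contract. In
this sketch the name @code{my-url-confirm} is hypothetical, and
refusing every request in batch mode is an illustrative policy.

@example
(defun my-url-confirm (prompt)
  "Ask PROMPT with `y-or-n-p', but answer no in batch mode."
  (and (not noninteractive)
       (y-or-n-p prompt)))

(setq url-confirmation-func 'my-url-confirm)
@end example
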
1274 @defopt url-gateway-method
1275 @c fixme: describe gatewaying
1276 A symbol specifying the type of gateway support to use for connections
1277 from the local machine. The supported methods are:
1278
1279 @table @code
1280 @item telnet
1281 Run telnet in a subprocess to connect;
1282 @item rlogin
1283 Rlogin to another machine to connect;
1284 @item socks
1285 Connect through a socks server;
1286 @item ssl
1287 Connect with SSL;
1288 @item native
1289 Connect directly.
1290 @end table
1291 @end defopt
1292
1293 @node GNU Free Documentation License
1294 @appendix GNU Free Documentation License
1295 @include doclicense.texi
1296
1297 @node Function Index
1298 @unnumbered Command and Function Index
1299 @printindex fn
1300
1301 @node Variable Index
1302 @unnumbered Variable Index
1303 @printindex vr
1304
1305 @node Concept Index
1306 @unnumbered Concept Index
1307 @printindex cp
1308
1309 @bye