1 \input texinfo
2 @setfilename ../../info/url
3 @settitle URL Programmer's Manual
4
5 @iftex
6 @c @finalout
7 @end iftex
8 @c @setchapternewpage odd
9 @c @smallbook
10
11 @tex
12 \overfullrule=0pt
13 %\global\baselineskip 30pt % for printing in double space
14 @end tex
15 @dircategory Emacs lisp libraries
16 @direntry
17 * URL: (url). URL loading package.
18 @end direntry
19
20 @copying
21 This is the manual for the @code{url} Emacs Lisp library.
22
23 Copyright @copyright{} 1993-1999, 2002, 2004-2012 Free Software Foundation, Inc.
24
25 @quotation
26 Permission is granted to copy, distribute and/or modify this document
27 under the terms of the GNU Free Documentation License, Version 1.3 or
28 any later version published by the Free Software Foundation; with no
29 Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
30 and with the Back-Cover Texts as in (a) below. A copy of the license
31 is included in the section entitled ``GNU Free Documentation License''.
32
33 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
34 modify this GNU manual.''
35 @end quotation
36 @end copying
37
38 @c
39 @titlepage
40 @title URL Programmer's Manual
41 @subtitle First Edition, URL Version 2.0
42 @author William M. Perry @email{wmperry@@gnu.org}
43 @author David Love @email{fx@@gnu.org}
44 @page
45 @vskip 0pt plus 1filll
46 @insertcopying
47 @end titlepage
48
49 @contents
50
51 @node Top
52 @top URL
53
54 @ifnottex
55 @insertcopying
56 @end ifnottex
57
58 @menu
59 * Introduction:: About the @code{url} library.
60 * URI Parsing:: Parsing (and unparsing) URIs.
61 * Retrieving URLs:: How to use this package to retrieve a URL.
62 * Supported URL Types:: Descriptions of URL types currently supported.
63 * General Facilities:: URLs can be cached, accessed via a gateway
64 and tracked in a history list.
65 * Customization:: Variables you can alter.
66 * GNU Free Documentation License:: The license for this documentation.
67 * Function Index::
68 * Variable Index::
69 * Concept Index::
70 @end menu
71
72 @node Introduction
73 @chapter Introduction
74 @cindex URL
75 @cindex URI
76 @cindex uniform resource identifier
77 @cindex uniform resource locator
78
79 A @dfn{Uniform Resource Identifier} (URI) is a specially-formatted
80 name, such as an Internet address, that identifies some name or
81 resource. The format of URIs is described in RFC 3986, which updates
82 and replaces the earlier RFCs 2732, 2396, 1808, and 1738. A
83 @dfn{Uniform Resource Locator} (URL) is an older but still-common
84 term, which basically refers to a URI corresponding to a resource that
85 can be accessed (usually over a network) in a specific way.
86
87 Here are some examples of URIs (taken from RFC 3986):
88
89 @example
90 ftp://ftp.is.co.za/rfc/rfc1808.txt
91 http://www.ietf.org/rfc/rfc2396.txt
92 ldap://[2001:db8::7]/c=GB?objectClass?one
93 mailto:John.Doe@@example.com
94 news:comp.infosystems.www.servers.unix
95 tel:+1-816-555-1212
96 telnet://192.0.2.16:80/
97 urn:oasis:names:specification:docbook:dtd:xml:4.1.2
98 @end example
99
100 This manual describes the @code{url} library, an Emacs Lisp library
101 for parsing URIs and retrieving the resources to which they refer.
102 (The library is so-named for historical reasons; nowadays, the ``URI''
103 terminology is regarded as the more general one, and ``URL'' is
104 technically obsolete despite its widespread vernacular usage.)
105
106 @node URI Parsing
107 @chapter URI Parsing
108
109 A URI consists of several @dfn{components}, each having a different
110 meaning. For example, the URI
111
112 @example
113 http://www.gnu.org/software/emacs/
114 @end example
115
116 @noindent
117 specifies the scheme component @samp{http}, the hostname component
118 @samp{www.gnu.org}, and the path component @samp{/software/emacs/}.
119
120 @cindex parsed URIs
121 The format of URIs is specified by RFC 3986. The @code{url} library
122 provides the Lisp function @code{url-generic-parse-url}, a (mostly)
standard-compliant URI parser, as well as the function
124 @code{url-recreate-url}, which converts a parsed URI back into a URI
125 string.
126
127 @defun url-generic-parse-url uri-string
128 This function returns a parsed version of the string @var{uri-string}.
129 @end defun
130
131 @defun url-recreate-url uri-obj
132 @cindex unparsing URLs
133 Given a parsed URI, this function returns the corresponding URI string.
134 @end defun
135
136 @cindex parsed URI
137 The return value of @code{url-generic-parse-url}, and the argument
138 expected by @code{url-recreate-url}, is a @dfn{parsed URI}: a CL
139 structure whose slots hold the various components of the URI@.
140 @xref{top,the CL Manual,,cl,GNU Emacs Common Lisp Emulation}, for
141 details about CL structures. Most of the other functions in the
142 @code{url} library act on parsed URIs.
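
As a minimal sketch of the round trip described above, the following
parses the URI used earlier in this chapter, reads a few slots through
the accessors described in the next section, and converts the
structure back into a string.  The result shown assumes the parser
behaves as documented in this manual.

@example
(require 'url-parse)

(let ((parsed (url-generic-parse-url
               "http://www.gnu.org/software/emacs/")))
  ;; Read a few slots, then turn the structure back into a string.
  (list (url-type parsed)
        (url-host parsed)
        (url-filename parsed)
        (url-recreate-url parsed)))
     @result{} ("http" "www.gnu.org" "/software/emacs/"
         "http://www.gnu.org/software/emacs/")
@end example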
143
144 @menu
145 * Parsed URIs:: Format of parsed URI structures.
146 * URI Encoding:: Non-@acronym{ASCII} characters in URIs.
147 @end menu
148
149 @node Parsed URIs
150 @section Parsed URI structures
151
152 Each parsed URI structure contains the following slots:
153
154 @table @code
155 @item type
156 The URI scheme (a string, e.g., @code{http}). @xref{Supported URL
157 Types}, for a list of schemes that the @code{url} library knows how to
158 process. This slot can also be @code{nil}, if the URI is not fully
159 specified.
160
161 @item user
162 The user name (a string), or @code{nil}.
163
164 @item password
165 The user password (a string), or @code{nil}. The use of this URI
166 component is strongly discouraged; nowadays, passwords are transmitted
167 by other means, not as part of a URI.
168
169 @item host
170 The host name (a string), or @code{nil}. If present, this is
171 typically a domain name or IP address.
172
173 @item port
174 The port number (an integer), or @code{nil}. Omitting this component
175 usually means to use the ``standard'' port associated with the URI
176 scheme.
177
178 @item filename
179 The combination of the ``path'' and ``query'' components of the URI (a
180 string), or @code{nil}. If the query component is present, it is the
181 substring following the first @samp{?} character, and the path
182 component is the substring before the @samp{?}. The meaning of these
183 components is scheme-dependent; they do not necessarily refer to a
184 file on a disk.
185
186 @item target
187 The fragment component (a string), or @code{nil}. The fragment
188 component specifies a ``secondary resource'', such as a section of a
189 webpage.
190
191 @item fullness
192 This is @code{t} if the URI is fully specified, i.e., the
193 hierarchical components of the URI (the hostname and/or username
194 and/or password) are preceded by @samp{//}.
195 @end table
196
197 @findex url-type
198 @findex url-user
199 @findex url-password
200 @findex url-host
201 @findex url-port
202 @findex url-filename
203 @findex url-target
204 @findex url-attributes
205 @findex url-fullness
206 These slots have accessors named @code{url-@var{part}}, where
207 @var{part} is the slot name. For example, the accessor for the
208 @code{host} slot is the function @code{url-host}. The @code{url-port}
209 accessor returns the default port for the URI scheme if the parsed
URI's @code{port} slot is @code{nil}.
211
212 The slots can be set using @code{setf}. For example:
213
214 @example
215 (setf (url-port url) 80)
216 @end example
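
The following sketch puts the accessor and @code{setf} behavior
together.  It assumes that the default port for @code{http} is 80, as
stated later in this manual, and that @code{url-recreate-url} writes
out a port that differs from the scheme's default; the port number
8080 is an arbitrary value chosen for illustration.

@example
(require 'url-parse)

(let ((url (url-generic-parse-url
            "http://www.gnu.org/software/emacs/")))
  (list
   ;; The port slot is nil here, so the scheme default is returned.
   (url-port url)
   ;; After setting the slot, the accessor returns the new value.
   (progn (setf (url-port url) 8080)
          (url-port url))
   ;; The explicit port now appears in the recreated string.
   (url-recreate-url url)))
     @result{} (80 8080 "http://www.gnu.org:8080/software/emacs/")
@end example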
217
218 @node URI Encoding
219 @section URI Encoding
220
221 @cindex percent encoding
222 The @code{url-generic-parse-url} parser does not obey RFC 3986 in
223 one respect: it allows non-@acronym{ASCII} characters in URI strings.
224
225 Strictly speaking, RFC 3986 compatible URIs may only consist of
226 @acronym{ASCII} characters; non-@acronym{ASCII} characters are
227 represented by converting them to UTF-8 byte sequences, and performing
228 @dfn{percent encoding} on the bytes. For example, the o-umlaut
character is converted to the UTF-8 byte sequence @samp{\xC3\xB6},
then percent encoded to @samp{%C3%B6}.  (Certain ``reserved''
231 @acronym{ASCII} characters must also be percent encoded when they
232 appear in URI components.)
233
234 The function @code{url-encode-url} can be used to convert a URI
235 string containing arbitrary characters to one that is properly
236 percent-encoded in accordance with RFC 3986.
237
238 @defun url-encode-url url-string
This function returns a properly URI-encoded version of
240 @var{url-string}. It also performs @dfn{URI normalization},
241 e.g., converting the scheme component to lowercase if it was
242 previously uppercase.
243 @end defun
244
245 To convert between a string containing arbitrary characters and a
246 percent-encoded all-@acronym{ASCII} string, use the functions
247 @code{url-hexify-string} and @code{url-unhex-string}:
248
249 @defun url-hexify-string string &optional allowed-chars
250 This function performs percent-encoding on @var{string}, and returns
251 the result.
252
253 If @var{string} is multibyte, it is first converted to a UTF-8 byte
254 string. Each byte corresponding to an allowed character is left
255 as-is, while all other bytes are converted to a three-character
256 sequence: @samp{%} followed by two upper-case hex digits.
257
258 @vindex url-unreserved-chars
259 @cindex unreserved characters
260 The allowed characters are specified by @var{allowed-chars}. If this
261 argument is @code{nil}, the allowed characters are those specified as
262 @dfn{unreserved characters} by RFC 3986 (see the variable
263 @code{url-unreserved-chars}). Otherwise, @var{allowed-chars} should
264 be a vector whose @var{n}-th element is non-@code{nil} if character
265 @var{n} is allowed.
266 @end defun
267
268 @defun url-unhex-string string &optional allow-newlines
269 This function replaces percent-encoding sequences in @var{string} with
270 their character equivalents, and returns the resulting string.
271
272 If @var{allow-newlines} is non-@code{nil}, it allows the decoding of
273 carriage returns and line feeds, which are normally forbidden in URIs.
274 @end defun
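
The following short sketch illustrates the three functions above.  The
input strings are arbitrary examples.  Note the assumption in the last
form: @code{url-unhex-string} is taken to return the decoded bytes
rather than decoded text, so the result is passed through
@code{decode-coding-string} to recover non-@acronym{ASCII} characters.

@example
(require 'url-util)

;; Encode only what needs encoding in a complete URI string.
(url-encode-url "http://www.gnu.org/a b")
     @result{} "http://www.gnu.org/a%20b"

;; Space, slash and o-umlaut are not unreserved characters,
;; so each of their bytes is percent-encoded.
(url-hexify-string "foo bar/\u00f6")
     @result{} "foo%20bar%2F%C3%B6"

;; Undo the encoding, then decode the bytes as UTF-8.
(equal (decode-coding-string
        (url-unhex-string "foo%20bar%2F%C3%B6") 'utf-8)
       "foo bar/\u00f6")
     @result{} t
@end example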
275
276 @node Retrieving URLs
277 @chapter Retrieving URLs
278
279 The @code{url} library defines the following three functions for
280 retrieving the data specified by a URL@. The actual retrieval protocol
281 depends on the URL's URI scheme, and is performed by lower-level
282 scheme-specific functions. (Those lower-level functions are not
283 documented here, and generally should not be called directly.)
284
285 In each of these functions, the @var{url} argument can be either a
286 string or a parsed URL structure. If it is a string, that string is
287 passed through @code{url-encode-url} before using it, to ensure that
288 it is properly URI-encoded (@pxref{URI Encoding}).
289
290 @defun url-retrieve-synchronously url
291 This function synchronously retrieves the data specified by @var{url},
292 and returns a buffer containing the data. The return value is
293 @code{nil} if there is no data associated with the URL (as is the case
294 for @code{dired}, @code{info}, and @code{mailto} URLs).
295 @end defun
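
As an illustration, the following sketch wraps
@code{url-retrieve-synchronously} in a helper that returns only the
document body.  The name @code{my-url-fetch-body} is hypothetical and
not part of the @code{url} library.  The sketch assumes an HTTP URL,
for which the buffer holds the headers, a blank line, and then the
body.

@example
(require 'url)

(defun my-url-fetch-body (url)
  "Return the body of URL as a string, or nil if there is no data.
Hypothetical helper, not part of the url library."
  (let ((buffer (url-retrieve-synchronously url)))
    (when buffer
      (unwind-protect
          (with-current-buffer buffer
            (goto-char (point-min))
            ;; Skip the headers, which end at the first blank line.
            (re-search-forward "^\r?\n" nil 'move)
            (buffer-substring-no-properties (point) (point-max)))
        ;; Do not leave the retrieval buffer behind.
        (kill-buffer buffer)))))
@end example

@noindent
A call such as @code{(my-url-fetch-body "http://www.gnu.org/")} blocks
until the retrieval finishes.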
296
297 @defun url-retrieve url callback &optional cbargs silent no-cookies
298 This function retrieves @var{url} asynchronously, calling the function
299 @var{callback} when the object has been completely retrieved. The
300 return value is the buffer into which the data will be inserted, or
301 @code{nil} if the process has already completed.
302
303 The callback function is called this way:
304
305 @example
306 (apply @var{callback} @var{status} @var{cbargs})
307 @end example
308
309 @noindent
310 where @var{status} is a plist representing what happened during the
311 retrieval, with most recent events first, or an empty list if no
312 events have occurred. Each pair in the plist is one of:
313
314 @table @code
315 @item (:redirect @var{redirected-to})
316 This means that the request was redirected to the URL
317 @var{redirected-to}.
318
319 @item (:error (@var{error-symbol} . @var{data}))
320 This means that an error occurred. If so desired, the error can be
321 signaled with @code{(signal @var{error-symbol} @var{data})}.
322 @end table
323
324 When the callback function is called, the current buffer is the one
325 containing the retrieved data (if any). The buffer also contains any
326 MIME headers associated with the data retrieval.
327
328 If the optional argument @var{silent} is non-@code{nil}, progress
329 messages are suppressed. If the optional argument @var{no-cookies} is
330 non-@code{nil}, cookies are not stored or sent.
331 @end defun
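
The following sketch shows one way a callback might inspect
@var{status}.  The function name @code{my-url-report} and its extra
@var{label} argument are hypothetical, chosen only to show how
@var{cbargs} are passed through.

@example
(require 'url)

(defun my-url-report (status label)
  "Report the outcome of a retrieval described by STATUS.
LABEL is an extra argument supplied through CBARGS."
  (let ((err (plist-get status :error)))
    (prog1 (if err
               (message "%s: failed with %S" label err)
             (message "%s: retrieved %d characters"
                      label (buffer-size)))
      ;; The retrieval buffer is current here; discard it when done.
      (kill-buffer (current-buffer)))))
@end example

@noindent
It could be used as follows, with progress messages suppressed:

@example
(url-retrieve "http://www.gnu.org/" #'my-url-report '("gnu.org") t)
@end example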
332
333 @defun url-queue-retrieve url callback &optional cbargs silent no-cookies
334 This function acts like @code{url-retrieve}, but with limits on the
335 number of concurrently-running network processes. The option
336 @code{url-queue-parallel-processes} controls the number of concurrent
337 processes, and the option @code{url-queue-timeout} sets a timeout in
338 seconds.
339
340 To use this function, you must @code{(require 'url-queue)}.
341 @end defun
342
343 @vindex url-queue-parallel-processes
344 @defopt url-queue-parallel-processes
345 The value of this option is an integer specifying the maximum number
346 of concurrent @code{url-queue-retrieve} network processes. If the
347 number of @code{url-queue-retrieve} calls is larger than this number,
later ones are queued until earlier ones are finished.
349 @end defopt
350
351 @vindex url-queue-timeout
352 @defopt url-queue-timeout
353 The value of this option is a number specifying the maximum lifetime
354 of a @code{url-queue-retrieve} network process, once it is started.
355 If a process is not finished by then, it is killed and removed from
356 the queue.
357 @end defopt
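
A minimal sketch of queued retrieval follows.  The helper name
@code{my-url-queue-all} and the values 2 and 10 are arbitrary choices
for illustration, not recommended settings.

@example
(require 'url-queue)

(setq url-queue-parallel-processes 2   ; at most two fetches at once
      url-queue-timeout 10)            ; kill a fetch after 10 seconds

(defun my-url-queue-all (urls callback)
  "Queue every URL in URLS, calling CALLBACK as each one completes.
CALLBACK receives the status plist, as with `url-retrieve'."
  (dolist (url urls)
    ;; No extra callback arguments; suppress progress messages.
    (url-queue-retrieve url callback nil t)))
@end example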
358
359 @node Supported URL Types
360 @chapter Supported URL Types
361
362 This chapter describes functions and variables affecting URL retrieval
363 for specific schemes.
364
365 @menu
366 * http/https:: Hypertext Transfer Protocol.
367 * file/ftp:: Local files and FTP archives.
368 * info:: Emacs "Info" pages.
369 * mailto:: Sending email.
370 * news/nntp/snews:: Usenet news.
371 * rlogin/telnet/tn3270:: Remote host connectivity.
372 * irc:: Internet Relay Chat.
373 * data:: Embedded data URLs.
* nfs:: Network File System.
* ldap:: Lightweight Directory Access Protocol.
376 * man:: Unix man pages.
377 @end menu
378
379 @node http/https
380 @section @code{http} and @code{https}
381
382 The @code{http} scheme refers to the Hypertext Transfer Protocol. The
383 @code{url} library supports HTTP version 1.1, specified in RFC 2616.
384 Its default port is 80.
385
386 The @code{https} scheme is a secure version of @code{http}, with
transmission via SSL@.  It is defined in RFC 2818, and its default port
388 is 443. When using @code{https}, the @code{url} library performs SSL
389 encryption via the @code{ssl} library, by forcing the @code{ssl}
390 gateway method to be used. @xref{Gateways in general}.
391
392 @defopt url-honor-refresh-requests
If this option is @code{t} (the default), the @code{url} library
honors the HTTP @samp{Refresh} header, which is used by servers to
direct clients to reload documents from the same URL or a different
one.  If the value is @code{nil}, the @samp{Refresh} header is
ignored; any other value means to ask the user on each request.
398 @end defopt
399
400 @menu
401 * Cookies::
402 * HTTP language/coding::
403 * HTTP URL Options::
404 * Dealing with HTTP documents::
405 @end menu
406
407 @node Cookies
408 @subsection Cookies
409
410 @defopt url-cookie-file
411 The file in which cookies are stored, defaulting to @file{cookies} in
412 the directory specified by @code{url-configuration-directory}.
413 @end defopt
414
415 @defopt url-cookie-confirmation
Specifies whether confirmation is required to accept cookies.
417 @end defopt
418
419 @defopt url-cookie-multiple-line
420 Specifies whether to put all cookies for the server on one line in the
421 HTTP request to satisfy broken servers like
422 @url{http://www.hotmail.com}.
423 @end defopt
424
425 @defopt url-cookie-trusted-urls
426 A list of regular expressions matching URLs from which to accept
427 cookies always.
428 @end defopt
429
430 @defopt url-cookie-untrusted-urls
431 A list of regular expressions matching URLs from which to reject
432 cookies always.
433 @end defopt
434
435 @defopt url-cookie-save-interval
436 The number of seconds between automatic saves of cookies to disk.
437 Default is one hour.
438 @end defopt
439
440
441 @node HTTP language/coding
442 @subsection Language and Encoding Preferences
443
444 HTTP allows clients to express preferences for the language and
445 encoding of documents which servers may honor. For each of these
446 variables, the value is a string; it can specify a single choice, or
447 it can be a comma-separated list.
448
449 Normally, this list is ordered by descending preference. However, each
450 element can be followed by @samp{;q=@var{priority}} to specify its
451 preference level, a decimal number from 0 to 1; e.g., for
452 @code{url-mime-language-string}, @w{@code{"de, en-gb;q=0.8,
453 en;q=0.7"}}. An element that has no @samp{;q} specification has
454 preference level 1.
455
456 @defopt url-mime-charset-string
457 @cindex character sets
458 @cindex coding systems
459 This variable specifies a preference for character sets when documents
460 can be served in more than one encoding.
461
462 HTTP allows specifying a series of MIME charsets which indicate your
463 preferred character set encodings, e.g., Latin-9 or Big5, and these
464 can be weighted. The default series is generated automatically from
465 the associated MIME types of all defined coding systems, sorted by the
466 coding system priority specified in Emacs. @xref{Recognize Coding, ,
467 Recognizing Coding Systems, emacs, The GNU Emacs Manual}.
468 @end defopt
469
470 @defopt url-mime-language-string
471 @cindex language preferences
472 A string specifying the preferred language when servers can serve
473 files in several languages. Use RFC 1766 abbreviations, e.g.,
474 @samp{en} for English, @samp{de} for German.
475
476 The string can be @code{"*"} to get the first available language (as
477 opposed to the default).
478 @end defopt
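
For example, the weighted list shown earlier in this section could be
installed as follows.  This is only a sketch; the particular languages
and weights are illustrative.

@example
;; Prefer German, then British English, then any English.
(setq url-mime-language-string "de, en-gb;q=0.8, en;q=0.7")
@end example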
479
480 @node HTTP URL Options
481 @subsection HTTP URL Options
482
483 HTTP supports an @samp{OPTIONS} method describing things supported by
484 the URL@.
485
486 @defun url-http-options url
487 Returns a property list describing options available for URL@. The
488 property list members are:
489
490 @table @code
491 @item methods
492 A list of symbols specifying what HTTP methods the resource
493 supports.
494
495 @item dav
496 @cindex DAV
497 A list of numbers specifying what DAV protocol/schema versions are
498 supported.
499
500 @item dasl
501 @cindex DASL
A list of supported DASL search types (string form).
503
504 @item ranges
505 A list of the units available for use in partial document fetches.
506
507 @item p3p
508 @cindex P3P
The @dfn{Platform for Privacy Preferences} description for the resource.
510 Currently this is just the raw header contents.
511 @end table
512
513 @end defun
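
As a sketch of how the returned property list might be used, the
hypothetical helper below checks whether a resource advertises a given
method.  It assumes that the method symbols are spelled as they appear
in the server's response (e.g., @code{GET}); check the actual return
value before relying on this.

@example
(require 'url-http)

(defun my-url-supports-method-p (url method)
  "Return non-nil if URL advertises METHOD, a symbol such as `GET'.
Hypothetical helper, not part of the url library."
  (memq method (plist-get (url-http-options url) 'methods)))
@end example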
514
515 @node Dealing with HTTP documents
516 @subsection Dealing with HTTP documents
517
518 HTTP URLs are retrieved into a buffer containing the HTTP headers
519 followed by the body. Since the headers are quasi-MIME, they may be
520 processed using the MIME library. @xref{Top,, Emacs MIME,
521 emacs-mime, The Emacs MIME Manual}.
522
523 @node file/ftp
524 @section file and ftp
525 @cindex files
526 @cindex FTP
527 @cindex File Transfer Protocol
528 @cindex compressed files
529 @cindex dired
530
The @code{ftp} and @code{file} schemes are defined in RFC 1738.  The
532 @code{url} library treats @samp{ftp:} and @samp{file:} as synonymous.
533 Such URLs have the form
534
535 @example
536 ftp://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
537 file://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
538 @end example
539
540 @noindent
541 If the URL specifies a local file, it is retrieved by reading the file
542 contents in the usual way. If it specifies a remote file, it is
543 retrieved using the Ange-FTP package. @xref{Remote Files,,, emacs,
544 The GNU Emacs Manual}.
545
546 When retrieving a compressed file, it is automatically uncompressed
547 if it has the file suffix @file{.z}, @file{.gz}, @file{.Z},
548 @file{.bz2}, or @file{.xz}. (The list of supported suffixes is
549 hard-coded, and cannot be altered by customizing
550 @code{jka-compr-compression-info-list}.)
551
552 @defopt url-directory-index-file
553 This option specifies the filename to look for when a @code{file} or
554 @code{ftp} URL specifies a directory. The default is
555 @file{index.html}. If this file exists and is readable, it is viewed.
556 Otherwise, Emacs visits the directory using Dired.
557 @end defopt
558
559 @node info
560 @section info
561 @cindex Info
562 @cindex Texinfo
563 @findex Info-goto-node
564
565 The @code{info} scheme is non-standard. Such URLs have the form
566
567 @example
568 info:@var{file}#@var{node}
569 @end example
570
571 @noindent
572 and are retrieved by invoking @code{Info-goto-node} with argument
573 @samp{(@var{file})@var{node}}. If @samp{#@var{node}} is omitted, the
574 @samp{Top} node is opened.
575
576 @node mailto
577 @section mailto
578
579 @cindex mailto
580 @cindex email
581 A @code{mailto} URL specifies an email message to be sent to a given
582 email address. For example, @samp{mailto:foo@@bar.com} specifies
583 sending a message to @samp{foo@@bar.com}. The ``retrieval method''
584 for such URLs is to open a mail composition buffer in which the
585 appropriate content (e.g., the recipient address) has been filled in.
586
587 As defined in RFC 2368, a @code{mailto} URL has the form
588
589 @example
590 @samp{mailto:@var{mailbox}[?@var{header}=@var{contents}[&@var{header}=@var{contents}]]}
591 @end example
592
593 @noindent
594 where an arbitrary number of @var{header}s can be added. If the
595 @var{header} is @samp{body}, then @var{contents} is put in the message
596 body; otherwise, a @var{header} header field is created with
597 @var{contents} as its contents. Note that the @code{url} library does
598 not perform any checking of @var{header} or @var{contents}, so you
599 should check them before sending the message.
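
As an illustration of the form above, the following sketch builds a
@code{mailto} URL with a subject and a body, percent-encoding each
value with @code{url-hexify-string} (@pxref{URI Encoding}).  The
address and texts are placeholders.

@example
(require 'url-util)

(concat "mailto:foo@@bar.com"
        ;; Header values must be percent-encoded.
        "?subject=" (url-hexify-string "Hello there")
        "&body="    (url-hexify-string "First line"))
     @result{} "mailto:foo@@bar.com?subject=Hello%20there&body=First%20line"
@end example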
600
601 @defopt url-mail-command
602 @vindex mail-user-agent
603 The value of this variable is the function called whenever url needs
to send mail.  This should normally be left at its default, which is the
605 standard mail-composition command @code{compose-mail}. @xref{Sending
606 Mail,,, emacs, The GNU Emacs Manual}.
607 @end defopt
608
609 If the document containing the @code{mailto} URL itself possessed a
610 known URL, Emacs automatically inserts an @samp{X-Url-From} header
611 field into the mail buffer, specifying that URL.
612
613 @node news/nntp/snews
614 @section @code{news}, @code{nntp} and @code{snews}
615 @cindex news
616 @cindex network news
617 @cindex usenet
618 @cindex NNTP
619 @cindex snews
620
621 The @code{news}, @code{nntp}, and @code{snews} schemes, defined in RFC
622 1738, are used for reading Usenet newsgroups. For compatibility with
623 non-standard-compliant news clients, the @code{url} library allows
624 host and port fields to be included in @code{news} URLs, even though
625 they are properly only allowed for @code{nntp} and @code{snews}.
626
627 @code{news} and @code{nntp} URLs have the following form:
628
629 @table @samp
630 @item news:@var{newsgroup}
631 Retrieves a list of messages in @var{newsgroup};
632 @item news:@var{message-id}
633 Retrieves the message with the given @var{message-id};
634 @item news:*
635 Retrieves a list of all available newsgroups;
636 @item nntp://@var{host}:@var{port}/@var{newsgroup}
637 @itemx nntp://@var{host}:@var{port}/@var{message-id}
638 @itemx nntp://@var{host}:@var{port}/*
639 Similar to the @samp{news} versions.
640 @end table
641
642 The default port for @code{nntp} (and @code{news}) is 119. The
643 difference between an @code{nntp} URL and a @code{news} URL is that an
@code{nntp} URL may specify an article by its number.  The
645 @samp{snews} scheme is the same as @samp{nntp}, except that it is
646 tunneled through SSL and has default port 563.
647
648 These URLs are retrieved via the Gnus package.
649
650 @cindex environment variable
651 @vindex NNTPSERVER
652 @defopt url-news-server
653 This variable specifies the default news server from which to fetch
654 news, if no server was specified in the URL@. The default value,
655 @code{nil}, means to use the server specified by the standard
656 environment variable @samp{NNTPSERVER}, or @samp{news} if that
657 environment variable is unset.
658 @end defopt
659
660 @node rlogin/telnet/tn3270
661 @section rlogin, telnet and tn3270
662 @cindex rlogin
663 @cindex telnet
664 @cindex tn3270
665 @cindex terminal emulation
666 @findex terminal-emulator
667
668 These URL schemes are defined in RFC 1738, and are used for logging in
669 via a terminal emulator. They have the form
670
671 @example
672 telnet://@var{user}:@var{password}@@@var{host}:@var{port}
673 @end example
674
675 @noindent
676 but the @var{password} component is ignored.
677
678 To handle rlogin, telnet and tn3270 URLs, a @code{rlogin},
679 @code{telnet} or @code{tn3270} (the program names and arguments are
680 hardcoded) session is run in a @code{terminal-emulator} buffer.
681 Well-known ports are used if the URL does not specify a port.
682
683 @node irc
684 @section irc
685 @cindex IRC
686 @cindex Internet Relay Chat
687 @cindex ZEN IRC
688 @cindex ERC
689 @cindex rcirc
690
691 The @code{irc} scheme is defined in the Internet Draft at
692 @url{http://www.w3.org/Addressing/draft-mirashi-url-irc-01.txt} (which
693 was never approved as an RFC). Such URLs have the form
694
695 @example
696 irc://@var{host}:@var{port}/@var{target},@var{needpass}
697 @end example
698
699 @noindent
700 and are retrieved by opening an @acronym{IRC} session using the
701 function specified by @code{url-irc-function}.
702
703 @defopt url-irc-function
704 The value of this option is a function, which is called to open an IRC
705 connection for @code{irc} URLs. This function must take five
706 arguments, @var{host}, @var{port}, @var{channel}, @var{user} and
707 @var{password}. The @var{channel} argument specifies the channel to
708 join immediately, and may be @code{nil}.
709
710 The default is @code{url-irc-rcirc}, which uses the Rcirc package.
711 Other options are @code{url-irc-erc} (which uses ERC) and
712 @code{url-irc-zenirc} (which uses ZenIRC).
713 @end defopt
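
To illustrate the five-argument calling convention, here is a sketch
of a custom handler that merely reports what it was asked to open.
The name @code{my-url-irc-announce} is hypothetical; a real handler
would start an IRC client instead.

@example
(defun my-url-irc-announce (host port channel user password)
  "Report an IRC connection request instead of opening one.
PASSWORD is accepted but deliberately not displayed."
  (ignore password)
  (message "IRC request: %s:%s, channel %s, user %s"
           host port (or channel "(none)") (or user "(none)")))

(setq url-irc-function #'my-url-irc-announce)
@end example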
714
715 @node data
716 @section data
717 @cindex data URLs
718
719 The @code{data} scheme, defined in RFC 2397, contains MIME data in
720 the URL itself. Such URLs have the form
721
722 @example
723 data:@r{[}@var{media-type}@r{]}@r{[};@var{base64}@r{]},@var{data}
724 @end example
725
726 @noindent
727 @var{media-type} is a MIME @samp{Content-Type} string, possibly
728 including parameters. It defaults to
729 @samp{text/plain;charset=US-ASCII}. The @samp{text/plain} can be
730 omitted but the charset parameter supplied. If @samp{;base64} is
731 present, the @var{data} are base64-encoded.
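
For instance, the URL in the sketch below carries the five characters
@samp{Hello} as base64-encoded plain text.  The code decodes the
payload by hand purely to illustrate the format; it is not a
description of how the @code{url} library processes such URLs.

@example
(let ((url "data:text/plain;base64,SGVsbG8="))
  ;; Everything after the first comma is the payload.
  (base64-decode-string
   (substring url (1+ (string-match "," url)))))
     @result{} "Hello"
@end example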
732
733 @node nfs
734 @section nfs
735 @cindex NFS
736 @cindex Network File System
737 @cindex automounter
738
739 The @code{nfs} scheme, defined in RFC 2224, is similar to @code{ftp}
740 except that it points to a file on a remote host that is handled by an
741 NFS automounter on the local host. Such URLs have the form
742
743 @example
744 nfs://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
745 @end example
746
@defvar url-nfs-automounter-directory-spec
A string saying how to invoke the NFS automounter.  Certain @samp{%}
sequences are recognized:

@table @samp
@item %h
The hostname of the NFS server;
@item %n
The port number of the NFS server;
@item %u
The username to use to authenticate;
@item %p
The password to use to authenticate;
@item %f
The filename on the remote server;
@item %%
A literal @samp{%}.
@end table

Each can be used any number of times.
@end defvar
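
As a sketch only, the following setting assumes an automounter that
exposes each remote host under @file{/net}, and that the file name
substituted for @samp{%f} begins with a slash.  Both are assumptions
about the local system, so adjust the value to match it.

@example
;; Map nfs://HOST/FILE to the automounted path /net/HOST/FILE.
(setq url-nfs-automounter-directory-spec "file:/net/%h%f")
@end example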
768
769 @node ldap
770 @section ldap
771 @cindex LDAP
772 @cindex Lightweight Directory Access Protocol
773
774 The LDAP scheme is defined in RFC 2255.
775
776 @node man
777 @section man
778 @cindex @command{man}
779 @cindex Unix man pages
780 @findex man
781
782 The @code{man} scheme is a non-standard one. Such URLs have the form
783
784 @example
785 @samp{man:@var{page-spec}}
786 @end example
787
788 @noindent
789 and are retrieved by passing @var{page-spec} to the Lisp function
790 @code{man}.
791
792 @node General Facilities
793 @chapter General Facilities
794
795 @menu
796 * Disk Caching::
797 * Proxies::
798 * Gateways in general::
799 * History::
800 @end menu
801
802 @node Disk Caching
803 @section Disk Caching
804 @cindex Caching
805 @cindex Persistent Cache
806 @cindex Disk Cache
807
808 The disk cache stores retrieved documents locally, whence they can be
809 retrieved more quickly. When requesting a URL that is in the cache,
810 the library checks to see if the page has changed since it was last
811 retrieved from the remote machine. If not, the local copy is used,
812 saving the transmission over the network.
813 @cindex Cleaning the cache
814 @cindex Clearing the cache
815 @cindex Cache cleaning
816 Currently the cache isn't cleared automatically.
817 @c Running the @code{clean-cache} shell script
818 @c fist is recommended, to allow for future cleaning of the cache. This
819 @c shell script will remove all files that have not been accessed since it
820 @c was last run. To keep the cache pared down, it is recommended that this
821 @c script be run from @i{at} or @i{cron} (see the manual pages for
822 @c crontab(5) or at(1) for more information)
823
824 @defopt url-automatic-caching
825 Setting this variable non-@code{nil} causes documents to be cached
826 automatically.
827 @end defopt
828
829 @defopt url-cache-directory
This variable specifies the directory in which to store the cache
files.  It defaults to the subdirectory @file{cache} of
@code{url-configuration-directory}.
833 @end defopt
834
835 @defopt url-cache-creation-function
836 The cache relies on a scheme for mapping URLs to files in the cache.
837 This variable names a function which sets the type of cache to use.
838 It takes a URL as argument and returns the absolute file name of the
839 corresponding cache file. The two supplied possibilities are
840 @code{url-cache-create-filename-using-md5} and
841 @code{url-cache-create-filename-human-readable}.
842 @end defopt
843
844 @defun url-cache-create-filename-using-md5 url
845 Creates a cache file name from @var{url} using MD5 hashing.
This creates entries with very few cache collisions and is fast.
847 @cindex MD5
848 @smallexample
849 (url-cache-create-filename-using-md5 "http://www.example.com/foo/bar")
850 @result{} "/home/fx/.url/cache/fx/http/com/example/www/b8a35774ad20db71c7c3409a5410e74f"
851 @end smallexample
852 @end defun
853
854 @defun url-cache-create-filename-human-readable url
855 Creates a cache file name from @var{url} more obviously connected to
856 @var{url} than for @code{url-cache-create-filename-using-md5}, but
857 more likely to conflict with other files.
858 @smallexample
859 (url-cache-create-filename-human-readable "http://www.example.com/foo/bar")
860 @result{} "/home/fx/.url/cache/fx/http/com/example/www/foo/bar"
861 @end smallexample
862 @end defun
863
@defun url-cache-expired url &optional expire-time
This function returns non-@code{nil} if the cache entry for @var{url}
has expired (or is absent).  The optional argument @var{expire-time}
is the expiration delay in seconds; it defaults to the value of
@code{url-cache-expire-time}.
868 @end defun
869
870 @defopt url-cache-expire-time
871 This variable is the default number of seconds to use for the
872 expire-time argument of the function @code{url-cache-expired}.
873 @end defopt
874
875 @defun url-fetch-from-cache url
876 This function takes @var{url} as its argument and returns a buffer
877 containing the data cached for that URL.
878 @end defun
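
The two functions above can be combined to consult the cache before
going to the network.  The following is only a sketch: the helper name
@code{my-url-cached-buffer} is hypothetical and not part of the
library.

@example
(require 'url-cache)

;; Hypothetical helper, not provided by the url library.
(defun my-url-cached-buffer (url)
  "Return a buffer of cached data for URL, or nil if expired or absent."
  (unless (url-cache-expired url)
    (url-fetch-from-cache url)))
@end example

A caller would fall back to a normal retrieval when this returns
@code{nil}.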
879
880 @c Fixme: never actually used currently?
881 @c @defopt url-standalone-mode
882 @c @cindex Relying on cache
883 @c @cindex Cache only mode
884 @c @cindex Standalone mode
885 @c If this variable is non-@code{nil}, the library relies solely on the
886 @c cache for fetching documents and avoids checking if they have changed
887 @c on remote servers.
888 @c @end defopt
889
890 @c With a large cache of documents on the local disk, it can be very handy
891 @c when traveling, or any other time the network connection is not active
892 @c (a laptop with a dial-on-demand PPP connection, etc). Emacs/W3 can rely
893 @c solely on its cache, and avoid checking to see if the page has changed
894 @c on the remote server. In the case of a dial-on-demand PPP connection,
895 @c this will keep the phone line free as long as possible, only bringing up
896 @c the PPP connection when asking for a page that is not located in the
897 @c cache. This is very useful for demonstrations as well.
898
899 @node Proxies
900 @section Proxies and Gatewaying
901
902 @c fixme: check/document url-ns stuff
903 @cindex proxy servers
904 @cindex proxies
905 @cindex environment variables
906 @vindex HTTP_PROXY
907 Proxy servers are commonly used to provide gateways through firewalls
908 or as caches serving some more-or-less local network. Each protocol
909 (HTTP, FTP, etc.)@: can have a different gateway server.  By convention,
910 proxying is configured in the same way across different programs, through
911 environment variables of the form @code{@var{protocol}_proxy}, where
912 @var{protocol} is one of the supported network protocols (@code{http},
913 @code{ftp} etc.). The library recognizes such variables in either
914 upper or lower case. Their values are of one of the forms:
915 @itemize @bullet
916 @item @code{@var{host}:@var{port}};
917 @item a full URL;
918 @item simply a host name.
919 @end itemize
920
921 @vindex NO_PROXY
922 The @code{NO_PROXY} environment variable specifies URLs that should be
923 excluded from proxying (on servers that should be contacted directly).
924 This should be a comma-separated list of hostnames, domain names, or a
925 mixture of both. Asterisks can be used as wildcards, but other
926 clients may not support that. Domain names may be indicated by a
927 leading dot. For example:
928 @example
929 NO_PROXY="*.aventail.com,home.com,.seanet.com"
930 @end example
931 @noindent says to contact all machines in the @samp{aventail.com} and
932 @samp{seanet.com} domains directly, as well as the machine named
933 @samp{home.com}. If @code{NO_PROXY} isn't defined, @code{no_PROXY}
934 and @code{no_proxy} are also tried, in that order.
935
936 Proxies may also be specified directly in Lisp.
937
938 @defopt url-proxy-services
939 This variable is an alist of URL schemes and proxy servers that
940 gateway them.  Each item is of the form @w{@code{(@var{scheme}
941 . @var{host}:@var{portnumber})}}, which says that the URL @var{scheme} is
942 gatewayed through @var{portnumber} on the specified @var{host}.  An
943 exception is the pseudo scheme @code{"no_proxy"}, which is paired with
944 a regexp matching host names not to be proxied. This variable is
945 initialized from the environment as above.
946
947 @example
948 (setq url-proxy-services
949 '(("http" . "proxy.aventail.com:80")
950 ("no_proxy" . "^.*\\(aventail\\|seanet\\)\\.com")))
951 @end example
952 @end defopt
953
954 @node Gateways in general
955 @section Gateways in General
956 @cindex gateways
957 @cindex firewalls
958
959 The library provides a general gateway layer through which all
960 networking passes. It can both control access to the network and
961 provide access through gateways in firewalls. This may make direct
962 connections in some cases and pass through some sort of gateway in
963 others.@footnote{Proxies (which only operate over HTTP) are
964 implemented using this.} The library's basic function responsible for
965 making connections is @code{url-open-stream}.
966
967 @defun url-open-stream name buffer host service
968 @cindex opening a stream
969 @cindex stream, opening
970 Open a stream to @var{host}, possibly via a gateway. The other
971 arguments are as for @code{open-network-stream}. This will not make a
972 connection if @code{url-gateway-unplugged} is non-@code{nil}.
973 @end defun
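
As a rough sketch of calling @code{url-open-stream} directly (the host
name, buffer name and request below are illustrative assumptions, and
most applications would use the higher-level retrieval functions
instead):

@example
(require 'url-gw)

(let* ((buf (generate-new-buffer " *url-stream-demo*"))
       ;; Honors `url-gateway-method' and `url-gateway-unplugged'.
       (proc (url-open-stream "url-stream-demo" buf
                              "www.example.com" 80)))
  (when (processp proc)
    ;; Send a minimal HTTP request; the reply arrives in BUF.
    (process-send-string
     proc "HEAD / HTTP/1.0\r\nHost: www.example.com\r\n\r\n")))
@end example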
974
975 @defvar url-gateway-local-host-regexp
976 This is a regular expression that matches local hosts that do not
977 require the use of a gateway. If @code{nil}, all connections are made
978 through the gateway.
979 @end defvar
980
981 @defvar url-gateway-method
982 This variable controls which gateway method is used. It may be useful
983 to bind it temporarily in some applications. It has values taken from
984 a list of symbols. Possible values are:
985
986 @table @code
987 @item telnet
988 @cindex @command{telnet}
989 Use this method if you must first telnet and log into a gateway host,
990 and then run telnet from that host to connect to outside machines.
991
992 @item rlogin
993 @cindex @command{rlogin}
994 This method is identical to @code{telnet}, but uses @command{rlogin}
995 to log into the remote machine without having to send the username and
996 password over the wire every time.
997
998 @item socks
999 @cindex @sc{socks}
1000 Use if the firewall has a @sc{socks} gateway running on it. The
1001 @sc{socks} v5 protocol is defined in RFC 1928.
1002
1003 @c @item ssl
1004 @c This probably shouldn't be documented
1005 @c Fixme: why not? -- fx
1006
1007 @item native
1008 This method uses Emacs's built-in networking directly.  This is the
1009 default.  It can be used only if there is no firewall blocking access.
1010 @end table
1011 @end defvar
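
Binding the variable temporarily, as suggested above, might look like
the following sketch, which forces a direct connection for a single
retrieval (the URL is a placeholder):

@example
(require 'url)

;; The binding is in effect only for the body of the `let'.
(let ((url-gateway-method 'native))
  (url-retrieve-synchronously "http://www.example.com/"))
@end example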
1012
1013 The following variables control the gateway methods.
1014
1015 @defopt url-gateway-telnet-host
1016 The gateway host to telnet to. Once logged in there, you then telnet
1017 out to the hosts you want to connect to.
1018 @end defopt
1019 @defopt url-gateway-telnet-parameters
1020 This should be a list of parameters to pass to the @command{telnet} program.
1021 @end defopt
1022 @defopt url-gateway-telnet-password-prompt
1023 This is a regular expression that matches the password prompt when
1024 logging in.
1025 @end defopt
1026 @defopt url-gateway-telnet-login-prompt
1027 This is a regular expression that matches the username prompt when
1028 logging in.
1029 @end defopt
1030 @defopt url-gateway-telnet-user-name
1031 The username to log in with.
1032 @end defopt
1033 @defopt url-gateway-telnet-password
1034 The password to send when logging in.
1035 @end defopt
1036 @defopt url-gateway-prompt-pattern
1037 This is a regular expression that matches the shell prompt.
1038 @end defopt
1039
1040 @defopt url-gateway-rlogin-host
1041 Host to @samp{rlogin} to before telnetting out.
1042 @end defopt
1043 @defopt url-gateway-rlogin-parameters
1044 Parameters to pass to @samp{rsh}.
1045 @end defopt
1046 @defopt url-gateway-rlogin-user-name
1047 User name to use when logging in to the gateway.
1048 @end defopt
1049 @defopt url-gateway-prompt-pattern
1050 This is a regular expression that matches the shell prompt.
1051 @end defopt
1052
1053 @defopt socks-server
1054 This specifies the default server; it takes the form
1055 @w{@code{("Default server" @var{server} @var{port} @var{version})}}
1056 where @var{version} can be either 4 or 5.
1057 @end defopt
1058 @defvar socks-password
1059 If this is @code{nil} then you will be asked for the password,
1060 otherwise it will be used as the password for authenticating you to
1061 the @sc{socks} server.
1062 @end defvar
1063 @defvar socks-username
1064 This is the username to use when authenticating yourself to the
1065 @sc{socks} server. By default this is your login name.
1066 @end defvar
1067 @defvar socks-timeout
1068 This controls how long, in seconds, to wait for responses from the
1069 @sc{socks} server; it is 5 by default.
1070 @end defvar
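
For instance, a @sc{socks} v5 gateway could be selected along these
lines.  This is a sketch only: the server name and port are
placeholders, not defaults of the library.

@example
(setq url-gateway-method 'socks
      ;; Form: ("Default server" SERVER PORT VERSION)
      socks-server '("Default server" "socks.example.com" 1080 5))
@end example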
1071 @c fixme: these have been effectively commented-out in the code
1072 @c @defopt socks-server-aliases
1073 @c This a list of server aliases. It is a list of aliases of the form
1074 @c @var{(alias hostname port version)}.
1075 @c @end defopt
1076 @c @defopt socks-network-aliases
1077 @c This a list of network aliases. Each entry in the list takes the form
1078 @c @var{(alias (network))} where @var{alias} is a string that names the
1079 @c @var{network}. The networks can contain a pair (not a dotted pair) of
1080 @c @sc{ip} addresses which specify a range of @sc{ip} addresses, an @sc{ip}
1081 @c address and a netmask, a domain name or a unique hostname or @sc{ip}
1082 @c address.
1083 @c @end defopt
1084 @c @defopt socks-redirection-rules
1085 @c This a list of redirection rules. Each rule take the form
1086 @c @var{(Destination network Connection type)} where @var{Destination
1087 @c network} is a network alias from @code{socks-network-aliases} and
1088 @c @var{Connection type} can be @code{nil} in which case a direct
1089 @c connection is used, or it can be an alias from
1090 @c @code{socks-server-aliases} in which case that server is used as a
1091 @c proxy.
1092 @c @end defopt
1093 @defopt socks-nslookup-program
1094 @cindex @command{nslookup}
1095 This is the @command{nslookup} program.  It is @code{"nslookup"} by default.
1096 @end defopt
1097
1098 @menu
1099 * Suppressing network connections::
1100 @end menu
1101 @c * Broken hostname resolution::
1102
1103 @node Suppressing network connections
1104 @subsection Suppressing Network Connections
1105
1106 @cindex network connections, suppressing
1107 @cindex suppressing network connections
1108 @cindex bugs, HTML
1109 @cindex HTML `bugs'
1110 In some circumstances it is desirable to suppress making network
1111 connections. A typical case is when rendering HTML in a mail user
1112 agent, when external URLs should not be activated, particularly to
1113 avoid ``bugs'' which ``call home'' by fetching single-pixel images and the
1114 like. To arrange this, bind the following variable for the duration
1115 of such processing.
1116
1117 @defvar url-gateway-unplugged
1118 If this variable is non-@code{nil}, new network connections are never
1119 opened by the URL library.
1120 @end defvar
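
A mail user agent might wrap its rendering step roughly as follows.
This is a sketch: @code{my-render-without-network} and its
@var{render-function} argument are hypothetical names, not part of the
library.

@example
(require 'url-gw)

;; Hypothetical wrapper, not provided by the url library.
(defun my-render-without-network (render-function)
  "Call RENDER-FUNCTION with new URL connections suppressed."
  (let ((url-gateway-unplugged t))
    (funcall render-function)))
@end example

Because the variable is bound with @code{let}, networking is restored
as soon as the rendering function returns.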
1121
1122 @c @node Broken hostname resolution
1123 @c @subsection Broken Hostname Resolution
1124
1125 @c @cindex hostname resolver
1126 @c @cindex resolver, hostname
1127 @c Some C libraries do not include the hostname resolver routines in
1128 @c their static libraries. If Emacs was linked statically, and was not
1129 @c linked with the resolver libraries, it will not be able to get to any
1130 @c machines off the local network. This is characterized by being able
1131 @c to reach someplace with a raw ip number, but not its hostname
1132 @c (@url{http://129.79.254.191/} works, but
1133 @c @url{http://www.cs.indiana.edu/} doesn't). This used to happen on
1134 @c SunOS4 and Ultrix, but is now probably now rare. If Emacs can't be
1135 @c rebuilt linked against the resolver library, it can use the external
1136 @c @command{nslookup} program instead.
1137
1138 @c @defopt url-gateway-broken-resolution
1139 @c @cindex @code{nslookup} program
1140 @c @cindex program, @code{nslookup}
1141 @c If non-@code{nil}, this variable says to use the program specified by
1142 @c @code{url-gateway-nslookup-program} program to do hostname resolution.
1143 @c @end defopt
1144
1145 @c @defopt url-gateway-nslookup-program
1146 @c The name of the program to do hostname lookup if Emacs can't do it
1147 @c directly. This program should expect a single argument on the command
1148 @c line---the hostname to resolve---and should produce output similar to
1149 @c the standard Unix @command{nslookup} program:
1150 @c @example
1151 @c Name: www.cs.indiana.edu
1152 @c Address: 129.79.254.191
1153 @c @end example
1154 @c @end defopt
1155
1156 @node History
1157 @section History
1158
1159 @findex url-do-setup
1160 The library can maintain a global history list tracking URLs accessed.
1161 URL completion can be done from it. The history mechanism is set up
1162 automatically via @code{url-do-setup} when it is configured to be on.
1163 Note that the size of the history list is currently not limited.
1164
1165 @vindex url-history-hash-table
1166 The history ``list'' is actually a hash table,
1167 @code{url-history-hash-table}. It contains access times keyed by URL
1168 strings. The times are in the format returned by @code{current-time}.
1169
1170 @defun url-history-update-url url time
1171 This function updates the history table with an entry for @var{url}
1172 accessed at the given @var{time}.
1173 @end defun
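
As an illustration (a sketch using a placeholder URL), an entry can be
recorded and then looked up directly in the hash table:

@example
(require 'url-history)

;; Record an access at the current time.
(url-history-update-url "http://www.example.com/" (current-time))

;; Look up the stored access time for the same URL.
(gethash "http://www.example.com/" url-history-hash-table)
@end example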
1174
1175 @defopt url-history-track
1176 If non-@code{nil}, the library will keep track of all the URLs
1177 accessed. If it is @code{t}, the list is saved to disk at the end of
1178 each Emacs session. The default is @code{nil}.
1179 @end defopt
1180
1181 @defopt url-history-file
1182 The file storing the history list between sessions. It defaults to
1183 @file{history} in @code{url-configuration-directory}.
1184 @end defopt
1185
1186 @defopt url-history-save-interval
1187 @findex url-history-setup-save-timer
1188 The number of seconds between automatic saves of the history list.
1189 Default is one hour. Note that if you change this variable directly,
1190 rather than using Custom, after @code{url-do-setup} has been run, you
1191 need to run the function @code{url-history-setup-save-timer}.
1192 @end defopt
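
For example, when setting these options from Lisp rather than through
Custom, the timer can be re-armed explicitly.  This is a sketch; the
ten-minute interval is an arbitrary choice.

@example
(require 'url-history)

(setq url-history-track t
      url-history-save-interval 600)  ; seconds
;; Needed if `url-do-setup' has already run.
(url-history-setup-save-timer)
@end example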
1193
1194 @defun url-history-parse-history &optional fname
1195 Parses the history file @var{fname} (default @code{url-history-file})
1196 and sets up the history list.
1197 @end defun
1198
1199 @defun url-history-save-history &optional fname
1200 Saves the current history to file @var{fname} (default
1201 @code{url-history-file}).
1202 @end defun
1203
1204 @defun url-completion-function string predicate function
1205 You can use this function to do completion of URLs from the history.
1206 @end defun
1207
1208 @node Customization
1209 @chapter Customization
1210
1211 @cindex environment variables
1212 The following environment variables affect the @code{url} library's
1213 operation at startup.
1214
1215 @table @code
1216 @item TMPDIR
1217 @vindex TMPDIR
1218 @vindex url-temporary-directory
1219 If this is defined, @code{url-temporary-directory} is initialized from
1220 it.
1221 @end table
1222
1223 The following user options affect the general operation of the
1224 @code{url} library.
1225
1226 @defopt url-configuration-directory
1227 @cindex configuration files
1228 The value of this variable specifies the name of the directory where
1229 the @code{url} library stores its various configuration files, cache
1230 files, etc.
1231
1232 The default value specifies a subdirectory named @file{url/} in the
1233 standard Emacs user data directory specified by the variable
1234 @code{user-emacs-directory} (normally @file{~/.emacs.d}). However,
1235 the old default was @file{~/.url}, and this directory is used instead
1236 if it exists.
1237 @end defopt
1238
1239 @defopt url-debug
1240 @cindex debugging
1241 Specifies the types of debug messages which are logged to
1242 the @code{*URL-DEBUG*} buffer.
1243 @code{t} means log all messages.
1244 A number means log all messages and show them with @code{message}.
1245 It may also be a list of the types of messages to be logged.
1246 @end defopt
1247 @defopt url-personal-mail-address
1248 @end defopt
1249 @defopt url-privacy-level
1250 @end defopt
1251 @defopt url-uncompressor-alist
1252 @end defopt
1253 @defopt url-passwd-entry-func
1254 @end defopt
1255 @defopt url-standalone-mode
1256 @end defopt
1257 @defopt url-bad-port-list
1258 @end defopt
1259 @defopt url-max-password-attempts
1260 @end defopt
1261 @defopt url-temporary-directory
1262 @end defopt
1263 @defopt url-show-status
1264 @end defopt
1265 @defopt url-confirmation-func
1266 The function to use for asking yes or no questions.  This is normally
1267 either @code{y-or-n-p} or @code{yes-or-no-p}, but could be another
1268 function taking a single argument (the prompt) and returning @code{t}
1269 only if an affirmative answer is given.
1270 @end defopt
1271 @defopt url-gateway-method
1272 @c fixme: describe gatewaying
1273 A symbol specifying the type of gateway support to use for connections
1274 from the local machine. The supported methods are:
1275
1276 @table @code
1277 @item telnet
1278 Run telnet in a subprocess to connect;
1279 @item rlogin
1280 Rlogin to another machine to connect;
1281 @item socks
1282 Connect through a socks server;
1283 @item ssl
1284 Connect with SSL;
1285 @item native
1286 Connect directly.
1287 @end table
1288 @end defopt
1289
1290 @node GNU Free Documentation License
1291 @appendix GNU Free Documentation License
1292 @include doclicense.texi
1293
1294 @node Function Index
1295 @unnumbered Command and Function Index
1296 @printindex fn
1297
1298 @node Variable Index
1299 @unnumbered Variable Index
1300 @printindex vr
1301
1302 @node Concept Index
1303 @unnumbered Concept Index
1304 @printindex cp
1305
1306 @bye