\input texinfo
@setfilename ../../info/url
@settitle URL Programmer's Manual

@iftex
@c @finalout
@end iftex
@c @setchapternewpage odd
@c @smallbook

@tex
\overfullrule=0pt
%\global\baselineskip 30pt      % for printing in double space
@end tex
@dircategory Emacs lisp libraries
@direntry
* URL: (url).                   URL loading package.
@end direntry

@copying
This is the manual for the @code{url} Emacs Lisp library.

Copyright @copyright{} 1993--1999, 2002, 2004--2012 Free Software
Foundation, Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
and with the Back-Cover Texts as in (a) below.  A copy of the license
is included in the section entitled ``GNU Free Documentation License''.

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual.''
@end quotation
@end copying

@c
@titlepage
@title URL Programmer's Manual
@subtitle First Edition, URL Version 2.0
@author William M. Perry @email{wmperry@@gnu.org}
@author David Love @email{fx@@gnu.org}
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents

@node Top
@top URL

@ifnottex
@insertcopying
@end ifnottex

@menu
* Introduction::                About the @code{url} library.
* URI Parsing::                 Parsing (and unparsing) URIs.
* Retrieving URLs::             How to use this package to retrieve a URL.
* Supported URL Types::         Descriptions of URL types currently supported.
* General Facilities::          URLs can be cached, accessed via a gateway
                                  and tracked in a history list.
* Customization::               Variables you can alter.
* GNU Free Documentation License:: The license for this documentation.
* Function Index::
* Variable Index::
* Concept Index::
@end menu

@node Introduction
@chapter Introduction
@cindex URL
@cindex URI
@cindex uniform resource identifier
@cindex uniform resource locator

A @dfn{Uniform Resource Identifier} (URI) is a specially formatted
name, such as an Internet address, that identifies some name or
resource.  The format of URIs is described in RFC 3986, which updates
and replaces the earlier RFCs 2732, 2396, 1808, and 1738.  A
@dfn{Uniform Resource Locator} (URL) is an older but still-common
term, which basically refers to a URI corresponding to a resource that
can be accessed (usually over a network) in a specific way.

  Here are some examples of URIs (taken from RFC 3986):

@example
ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB?objectClass?one
mailto:John.Doe@@example.com
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
telnet://192.0.2.16:80/
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
@end example

  This manual describes the @code{url} library, an Emacs Lisp library
for parsing URIs and retrieving the resources to which they refer.
(The library is so named for historical reasons; nowadays, the ``URI''
terminology is regarded as the more general one, and ``URL'' is
technically obsolete despite its widespread vernacular usage.)

@node URI Parsing
@chapter URI Parsing

  A URI consists of several @dfn{components}, each having a different
meaning.  For example, the URI

@example
http://www.gnu.org/software/emacs/
@end example

@noindent
specifies the scheme component @samp{http}, the hostname component
@samp{www.gnu.org}, and the path component @samp{/software/emacs/}.

@cindex parsed URIs
  The format of URIs is specified by RFC 3986.  The @code{url} library
provides the Lisp function @code{url-generic-parse-url}, a (mostly)
standard-compliant URI parser, as well as the function
@code{url-recreate-url}, which converts a parsed URI back into a URI
string.

@defun url-generic-parse-url uri-string
This function returns a parsed version of the string @var{uri-string}.
@end defun

@defun url-recreate-url uri-obj
@cindex unparsing URLs
Given a parsed URI, this function returns the corresponding URI string.
@end defun

@cindex parsed URI
  The return value of @code{url-generic-parse-url}, and the argument
expected by @code{url-recreate-url}, is a @dfn{parsed URI}: a CL
structure whose slots hold the various components of the URI@.
@xref{top,the CL Manual,,cl,GNU Emacs Common Lisp Emulation}, for
details about CL structures.  Most of the other functions in the
@code{url} library act on parsed URIs.
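
  As a small illustration, the sketch below parses the example URI
from the beginning of this chapter, reads a few components with the
slot accessors (@pxref{Parsed URIs}), and turns the structure back
into a string.  It assumes that loading @code{url-parse} is enough to
make both functions available; the result shown is the one we would
expect for this input.

@example
(require 'url-parse)

;; Parse the string, read back individual components, and
;; reassemble the original URI string.
(let ((uri (url-generic-parse-url "http://www.gnu.org/software/emacs/")))
  (list (url-type uri)          ; scheme
        (url-host uri)          ; host name
        (url-filename uri)      ; path (plus query, if any)
        (url-recreate-url uri)))
;; @result{} ("http" "www.gnu.org" "/software/emacs/"
;;     "http://www.gnu.org/software/emacs/")
@end example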

@menu
* Parsed URIs::                 Format of parsed URI structures.
* URI Encoding::                Non-@acronym{ASCII} characters in URIs.
@end menu

@node Parsed URIs
@section Parsed URI structures

  Each parsed URI structure contains the following slots:

@table @code
@item type
The URI scheme (a string, e.g., @code{http}).  @xref{Supported URL
Types}, for a list of schemes that the @code{url} library knows how to
process.  This slot can also be @code{nil}, if the URI is not fully
specified.

@item user
The user name (a string), or @code{nil}.

@item password
The user password (a string), or @code{nil}.  The use of this URI
component is strongly discouraged; nowadays, passwords are transmitted
by other means, not as part of a URI.

@item host
The host name (a string), or @code{nil}.  If present, this is
typically a domain name or IP address.

@item port
The port number (an integer), or @code{nil}.  Omitting this component
usually means to use the ``standard'' port associated with the URI
scheme.

@item filename
The combination of the ``path'' and ``query'' components of the URI (a
string), or @code{nil}.  If the query component is present, it is the
substring following the first @samp{?} character, and the path
component is the substring before the @samp{?}.  The meaning of these
components is scheme-dependent; they do not necessarily refer to a
file on a disk.

@item target
The fragment component (a string), or @code{nil}.  The fragment
component specifies a ``secondary resource'', such as a section of a
webpage.

@item fullness
This is @code{t} if the URI is fully specified, i.e., the
hierarchical components of the URI (the hostname and/or username
and/or password) are preceded by @samp{//}.
@end table

@findex url-type
@findex url-user
@findex url-password
@findex url-host
@findex url-port
@findex url-filename
@findex url-target
@findex url-attributes
@findex url-fullness
These slots have accessors named @code{url-@var{part}}, where
@var{part} is the slot name.  For example, the accessor for the
@code{host} slot is the function @code{url-host}.  The @code{url-port}
accessor returns the default port for the URI scheme if the parsed
URI's @code{port} slot is @code{nil}.

  The slots can be set using @code{setf}.  For example:

@example
(setf (url-port url) 80)
@end example
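
  The sketch below uses a made-up URI in which almost every component
is present, to show which slot each part ends up in and how a modified
structure is turned back into a string.  The values in the comments
are the ones we would expect from the slot descriptions above.

@example
(require 'url-parse)

(let ((u (url-generic-parse-url
          "https://alice@@example.com:8443/search?q=emacs#results")))
  (list (url-type u)        ;; @result{} "https"
        (url-user u)        ;; @result{} "alice"
        (url-password u)    ;; @result{} nil
        (url-host u)        ;; @result{} "example.com"
        (url-port u)        ;; @result{} 8443
        (url-filename u)    ;; @result{} "/search?q=emacs"
        (url-target u)      ;; @result{} "results"
        (url-fullness u)))  ;; @result{} t

;; Drop the user name and the fragment, then rebuild the string.
(let ((u (url-generic-parse-url
          "https://alice@@example.com:8443/search?q=emacs#results")))
  (setf (url-user u) nil
        (url-target u) nil)
  (url-recreate-url u))
;; @result{} "https://example.com:8443/search?q=emacs"
@end example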

@node URI Encoding
@section URI Encoding

@cindex percent encoding
  The @code{url-generic-parse-url} parser does not obey RFC 3986 in
one respect: it allows non-@acronym{ASCII} characters in URI strings.

  Strictly speaking, RFC 3986-compatible URIs may only consist of
@acronym{ASCII} characters; non-@acronym{ASCII} characters are
represented by converting them to UTF-8 byte sequences, and performing
@dfn{percent encoding} on the bytes.  For example, the o-umlaut
character is converted to the UTF-8 byte sequence @samp{\xC3\xB6},
then percent-encoded to @samp{%C3%B6}.  (Certain ``reserved''
@acronym{ASCII} characters must also be percent-encoded when they
appear in URI components.)

  The function @code{url-encode-url} can be used to convert a URI
string containing arbitrary characters to one that is properly
percent-encoded in accordance with RFC 3986.

@defun url-encode-url url-string
This function returns a properly URI-encoded version of
@var{url-string}.  It also performs @dfn{URI normalization},
e.g., converting the scheme component to lowercase if it was
previously uppercase.
@end defun
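
  For instance, the call below should both lowercase the scheme and
percent-encode the non-@acronym{ASCII} character in the path.  The URL
is made up, and the e-acute is written with the Lisp escape
@samp{\u00e9} so that the example itself stays @acronym{ASCII}-only.

@example
(require 'url-util)

;; The scheme is downcased; the e-acute becomes the UTF-8 bytes
;; C3 A9, which are then percent-encoded.
(url-encode-url "HTTP://example.com/caf\u00e9")
;; @result{} "http://example.com/caf%C3%A9"
@end example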

  To convert between a string containing arbitrary characters and a
percent-encoded all-@acronym{ASCII} string, use the functions
@code{url-hexify-string} and @code{url-unhex-string}:

@defun url-hexify-string string &optional allowed-chars
This function performs percent-encoding on @var{string}, and returns
the result.

If @var{string} is multibyte, it is first converted to a UTF-8 byte
string.  Each byte corresponding to an allowed character is left
as-is, while all other bytes are converted to a three-character
sequence: @samp{%} followed by two uppercase hexadecimal digits.

@vindex url-unreserved-chars
@cindex unreserved characters
The allowed characters are specified by @var{allowed-chars}.  If this
argument is @code{nil}, the allowed characters are those specified as
@dfn{unreserved characters} by RFC 3986 (see the variable
@code{url-unreserved-chars}).  Otherwise, @var{allowed-chars} should
be a vector whose @var{n}-th element is non-@code{nil} if character
@var{n} is allowed.
@end defun
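
  The following sketch calls @code{url-hexify-string} both ways.  The
first call uses the default set of unreserved characters; the second
builds its own @var{allowed-chars} vector (the variable name
@code{allowed} and the choice of characters are purely illustrative),
so that only lowercase @acronym{ASCII} letters and @samp{/} are left
as-is.  Non-@acronym{ASCII} characters are written as Lisp @samp{\u}
escapes.

@example
(require 'url-util)

;; Default: only RFC 3986 unreserved characters are kept.
(url-hexify-string "Gr\u00fc\u00dfe aus K\u00f6ln")
;; @result{} "Gr%C3%BC%C3%9Fe%20aus%20K%C3%B6ln"

;; Custom vector: element N is non-nil if character N is allowed.
(let ((allowed (make-vector 128 nil)))
  (dolist (c (append "abcdefghijklmnopqrstuvwxyz/" nil))
    (aset allowed c t))
  (url-hexify-string "docs/Read Me" allowed))
;; @result{} "docs/%52ead%20%4De"
@end example

@noindent
In the second result, @samp{R} (code 82, hex 52), the space and
@samp{M} (code 77, hex 4D) are encoded because the vector does not
allow them.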

@defun url-unhex-string string &optional allow-newlines
This function replaces percent-encoding sequences in @var{string} with
their character equivalents, and returns the resulting string.

If @var{allow-newlines} is non-@code{nil}, it allows the decoding of
carriage returns and line feeds, which are normally forbidden in URIs.
@end defun
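
  As far as we can tell, @code{url-unhex-string} produces one raw byte
per @samp{%@var{xx}} sequence and does not decode UTF-8 by itself, so
text that was encoded from multibyte characters needs an explicit
decoding step.  A minimal sketch, assuming the encoded bytes are
UTF-8:

@example
(require 'url-util)

;; Undo the percent-encoding, then decode the UTF-8 bytes.
(equal (decode-coding-string
        (url-unhex-string "Gr%C3%BC%C3%9Fe%20aus%20K%C3%B6ln")
        'utf-8)
       "Gr\u00fc\u00dfe aus K\u00f6ln")
;; @result{} t
@end example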

@node Retrieving URLs
@chapter Retrieving URLs

  The @code{url} library defines the following three functions for
retrieving the data specified by a URL@.  The actual retrieval protocol
depends on the URL's URI scheme, and is performed by lower-level
scheme-specific functions.  (Those lower-level functions are not
documented here, and generally should not be called directly.)

  In each of these functions, the @var{url} argument can be either a
string or a parsed URL structure.  If it is a string, that string is
passed through @code{url-encode-url} before it is used, to ensure that
it is properly URI-encoded (@pxref{URI Encoding}).

@defun url-retrieve-synchronously url
This function synchronously retrieves the data specified by @var{url},
and returns a buffer containing the data.  The return value is
@code{nil} if there is no data associated with the URL (as is the case
for @code{dired}, @code{info}, and @code{mailto} URLs).
@end defun
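
  A minimal sketch of synchronous retrieval follows.  The function
name @code{my-url-body} is hypothetical, and the sketch assumes an
HTTP-style response whose headers are separated from the body by an
empty line.  Because the returned buffer is not killed automatically,
the sketch kills it once the body has been copied out; it does no
charset decoding of its own.

@example
(require 'url)

(defun my-url-body (url)
  "Return the body of URL as a string, or nil if there is none."
  (let ((buffer (url-retrieve-synchronously url)))
    (when buffer
      (unwind-protect
          (with-current-buffer buffer
            (goto-char (point-min))
            ;; The headers end at the first empty line.
            (when (re-search-forward "^\r?$" nil t)
              (forward-line 1)
              (buffer-substring (point) (point-max))))
        ;; Free the retrieval buffer in all cases.
        (kill-buffer buffer)))))
@end example

@noindent
For example, @code{(my-url-body "http://www.gnu.org/")} would return
the HTML text of that page, without its headers.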

@defun url-retrieve url callback &optional cbargs silent no-cookies
This function retrieves @var{url} asynchronously, calling the function
@var{callback} when the object has been completely retrieved.  The
return value is the buffer into which the data will be inserted, or
@code{nil} if the process has already completed.

The callback function is called this way:

@example
(apply @var{callback} @var{status} @var{cbargs})
@end example

@noindent
where @var{status} is a plist representing what happened during the
retrieval, with the most recent events first, or an empty list if no
events have occurred.  Each pair in the plist is one of:

@table @code
@item (:redirect @var{redirected-to})
This means that the request was redirected to the URL
@var{redirected-to}.

@item (:error (@var{error-symbol} . @var{data}))
This means that an error occurred.  If so desired, the error can be
signaled with @code{(signal @var{error-symbol} @var{data})}.
@end table

When the callback function is called, the current buffer is the one
containing the retrieved data (if any).  The buffer also contains any
MIME headers associated with the data retrieval.

If the optional argument @var{silent} is non-@code{nil}, progress
messages are suppressed.  If the optional argument @var{no-cookies} is
non-@code{nil}, cookies are not stored or sent.
@end defun
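
  The following sketch shows a callback that follows the calling
convention just described: it receives @var{status} first, followed
by the elements of @var{cbargs} (here a single label string).  The
callback name @code{my-url-report} and the label are hypothetical.

@example
(require 'url)

(defun my-url-report (status label)
  "Report the outcome of an asynchronous retrieval.
STATUS is the status plist; LABEL is the only element of CBARGS."
  (let ((err (plist-get status :error))
        (redirect (plist-get status :redirect)))
    (if err
        (message "%s: failed with %S" label err)
      (when redirect
        (message "%s: redirected to %s" label redirect))
      (message "%s: %d characters, headers included"
               label (buffer-size))))
  ;; The callback runs in the retrieval buffer; free it when done.
  (kill-buffer (current-buffer)))

(url-retrieve "http://www.gnu.org/" #'my-url-report '("GNU"))
@end example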

@defun url-queue-retrieve url callback &optional cbargs silent no-cookies
This function acts like @code{url-retrieve}, but with limits on the
number of concurrently running network processes.  The option
@code{url-queue-parallel-processes} controls the number of concurrent
processes, and the option @code{url-queue-timeout} sets a timeout in
seconds.

To use this function, you must @code{(require 'url-queue)}.
@end defun

@vindex url-queue-parallel-processes
@defopt url-queue-parallel-processes
The value of this option is an integer specifying the maximum number
of concurrent @code{url-queue-retrieve} network processes.  If the
number of @code{url-queue-retrieve} calls is larger than this number,
later ones are queued until earlier ones are finished.
@end defopt

@vindex url-queue-timeout
@defopt url-queue-timeout
The value of this option is a number specifying the maximum lifetime,
in seconds, of a @code{url-queue-retrieve} network process, once it is
started.  If a process is not finished by then, it is killed and
removed from the queue.
@end defopt
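
  For example, a batch of retrievals might be throttled as sketched
below.  The limits chosen here (two processes, a 30-second timeout),
the URLs and the callback are illustrative values, not
recommendations.

@example
(require 'url-queue)

;; At most two concurrent processes, each allowed 30 seconds.
(setq url-queue-parallel-processes 2
      url-queue-timeout 30)

(dolist (url '("http://www.gnu.org/"
               "http://www.fsf.org/"
               "http://savannah.gnu.org/"))
  ;; The third request waits until one of the first two finishes.
  (url-queue-retrieve url
                      (lambda (status url)
                        (message "%s: %s" url
                                 (if (plist-get status :error)
                                     "error"
                                   "done"))
                        (kill-buffer (current-buffer)))
                      (list url)    ; CBARGS: passed after STATUS
                      t))           ; SILENT
@end example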

@node Supported URL Types
@chapter Supported URL Types

This chapter describes functions and variables affecting URL retrieval
for specific schemes.

@menu
* http/https::                  Hypertext Transfer Protocol.
* file/ftp::                    Local files and FTP archives.
* info::                        Emacs "Info" pages.
* mailto::                      Sending email.
* news/nntp/snews::             Usenet news.
* rlogin/telnet/tn3270::        Remote host connectivity.
* irc::                         Internet Relay Chat.
* data::                        Embedded data URLs.
* nfs::                         Network File System.
* ldap::                        Lightweight Directory Access Protocol.
* man::                         Unix man pages.
@end menu

@node http/https
@section @code{http} and @code{https}

The @code{http} scheme refers to the Hypertext Transfer Protocol.  The
@code{url} library supports HTTP version 1.1, specified in RFC 2616.
Its default port is 80.

  The @code{https} scheme is a secure version of @code{http}, with
transmission via SSL@.  It is defined in RFC 2818, and its default port
is 443.  When using @code{https}, the @code{url} library performs SSL
encryption via the @code{ssl} library, by forcing the @code{ssl}
gateway method to be used.  @xref{Gateways in general}.

@defopt url-honor-refresh-requests
If this option is @code{t} (the default), the @code{url} library
honors the HTTP @samp{Refresh} header, which is used by servers to
direct clients to reload documents from the same URL or a different
one.  If the value is @code{nil}, the @samp{Refresh} header is
ignored; any other value means to ask the user on each request.
@end defopt

@menu
* Cookies::
* HTTP language/coding::
* HTTP URL Options::
* Dealing with HTTP documents::
@end menu

@node Cookies
@subsection Cookies

@defopt url-cookie-file
The file in which cookies are stored, defaulting to @file{cookies} in
the directory specified by @code{url-configuration-directory}.
@end defopt

@defopt url-cookie-confirmation
Specifies whether confirmation is required to accept cookies.
@end defopt

@defopt url-cookie-multiple-line
Specifies whether to put all cookies for the server on one line in the
HTTP request to satisfy broken servers like
@url{http://www.hotmail.com}.
@end defopt

@defopt url-cookie-trusted-urls
A list of regular expressions matching URLs from which cookies are
always accepted.
@end defopt

@defopt url-cookie-untrusted-urls
A list of regular expressions matching URLs from which cookies are
always rejected.
@end defopt

@defopt url-cookie-save-interval
The number of seconds between automatic saves of cookies to disk.
The default is one hour.
@end defopt

@node HTTP language/coding
@subsection Language and Encoding Preferences

HTTP allows clients to express preferences for the language and
encoding of documents that servers may honor.  For each of these
variables, the value is a string; it can specify a single choice, or
it can be a comma-separated list.

Normally, this list is ordered by descending preference.  However, each
element can be followed by @samp{;q=@var{priority}} to specify its
preference level, a decimal number from 0 to 1; e.g., for
@code{url-mime-language-string}, @w{@code{"de, en-gb;q=0.8,
en;q=0.7"}}.  An element that has no @samp{;q} specification has
preference level 1.

@defopt url-mime-charset-string
@cindex character sets
@cindex coding systems
This variable specifies a preference for character sets when documents
can be served in more than one encoding.

HTTP allows specifying a series of MIME charsets which indicate your
preferred character set encodings, e.g., Latin-9 or Big5, and these
can be weighted.  The default series is generated automatically from
the associated MIME types of all defined coding systems, sorted by the
coding system priority specified in Emacs.  @xref{Recognize Coding, ,
Recognizing Coding Systems, emacs, The GNU Emacs Manual}.
@end defopt

@defopt url-mime-language-string
@cindex language preferences
A string specifying the preferred language when servers can serve
files in several languages.  Use RFC 1766 abbreviations, e.g.,
@samp{en} for English, @samp{de} for German.

The string can be @code{"*"} to get the first available language (as
opposed to the default).
@end defopt
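
  As a configuration sketch, the weighted language list used as an
example above could be set as follows; the charset string is a
hypothetical preference written in the same @samp{;q=} notation.

@example
;; Prefer German, then British English, then any English.
(setq url-mime-language-string "de, en-gb;q=0.8, en;q=0.7")

;; Prefer UTF-8; accept Latin-9 with a lower weight.
(setq url-mime-charset-string "utf-8, iso-8859-15;q=0.5")
@end example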

@node HTTP URL Options
@subsection HTTP URL Options

HTTP supports an @samp{OPTIONS} method describing things supported by
the URL@.

@defun url-http-options url
This function returns a property list describing the options available
for @var{url}.  The property list members are:

@table @code
@item methods
A list of symbols specifying what HTTP methods the resource
supports.

@item dav
@cindex DAV
A list of numbers specifying what DAV protocol/schema versions are
supported.

@item dasl
@cindex DASL
A list of the DASL search types supported (in string form).

@item ranges
A list of the units available for use in partial document fetches.

@item p3p
@cindex P3P
The @dfn{Platform for Privacy Preferences} description for the
resource.  Currently this is just the raw header contents.
@end table

@end defun
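
  A sketch of checking whether a resource accepts a particular
method, assuming the server answers the @samp{OPTIONS} request.  The
function name @code{my-url-supports-method-p} is hypothetical.
Because the @code{methods} entry holds symbols, the method is given as
a symbol such as @code{PUT}.

@example
(require 'url-http)

(defun my-url-supports-method-p (url method)
  "Return non-nil if the server reports METHOD (a symbol) for URL."
  (let ((options (url-http-options url)))
    ;; The plist keys are plain symbols, not keywords.
    (memq method (plist-get options 'methods))))

;; Example call:
;; (my-url-supports-method-p "http://www.example.com/" 'PUT)
@end example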

@node Dealing with HTTP documents
@subsection Dealing with HTTP documents

HTTP URLs are retrieved into a buffer containing the HTTP headers
followed by the body.  Since the headers are quasi-MIME, they may be
processed using the MIME library.  @xref{Top,, Emacs MIME,
emacs-mime, The Emacs MIME Manual}.
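
  One way to do this, in the style of the file-name handlers in
@file{url-handlers.el}, is to let @code{mm-dissect-buffer} split the
headers from the body.  The function name @code{my-url-fetch-part} is
hypothetical, and the sketch assumes a single-part response.

@example
(require 'url)
(require 'mm-decode)

(defun my-url-fetch-part (url)
  "Return a cons (MEDIA-TYPE . BODY) for the document at URL."
  (let ((buffer (url-retrieve-synchronously url)))
    (unless buffer
      (error "No data for %s" url))
    (let ((handle (with-current-buffer buffer
                    ;; Parse the MIME-style headers leniently.
                    (mm-dissect-buffer t))))
      (unwind-protect
          (cons (mm-handle-media-type handle)
                (with-current-buffer (mm-handle-buffer handle)
                  (buffer-string)))
        ;; Free both the MIME part and the retrieval buffer.
        (mm-destroy-parts handle)
        (kill-buffer buffer)))))
@end example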

@node file/ftp
@section file and ftp
@cindex files
@cindex FTP
@cindex File Transfer Protocol
@cindex compressed files
@cindex dired

The @code{ftp} and @code{file} schemes are defined in RFC 1738.  The
@code{url} library treats @samp{ftp:} and @samp{file:} as synonymous.
Such URLs have the form

@example
ftp://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
file://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
@end example

@noindent
If the URL specifies a local file, it is retrieved by reading the file
contents in the usual way.  If it specifies a remote file, it is
retrieved using the Ange-FTP package.  @xref{Remote Files,,, emacs,
The GNU Emacs Manual}.

  When retrieving a compressed file, it is automatically uncompressed
if it has the file suffix @file{.z}, @file{.gz}, @file{.Z},
@file{.bz2}, or @file{.xz}.  (The list of supported suffixes is
hard-coded, and cannot be altered by customizing
@code{jka-compr-compression-info-list}.)

@defopt url-directory-index-file
This option specifies the filename to look for when a @code{file} or
@code{ftp} URL specifies a directory.  The default is
@file{index.html}.  If this file exists and is readable, it is viewed.
Otherwise, Emacs visits the directory using Dired.
@end defopt

@node info
@section info
@cindex Info
@cindex Texinfo
@findex Info-goto-node

The @code{info} scheme is non-standard.  Such URLs have the form

@example
info:@var{file}#@var{node}
@end example

@noindent
and are retrieved by invoking @code{Info-goto-node} with argument
@samp{(@var{file})@var{node}}.  If @samp{#@var{node}} is omitted, the
@samp{Top} node is opened.

@node mailto
@section mailto

@cindex mailto
@cindex email
A @code{mailto} URL specifies an email message to be sent to a given
email address.  For example, @samp{mailto:foo@@bar.com} specifies
sending a message to @samp{foo@@bar.com}.  The ``retrieval method''
for such URLs is to open a mail composition buffer in which the
appropriate content (e.g., the recipient address) has been filled in.

  As defined in RFC 2368, a @code{mailto} URL has the form

@example
@samp{mailto:@var{mailbox}[?@var{header}=@var{contents}[&@var{header}=@var{contents}]]}
@end example

@noindent
where an arbitrary number of @var{header}s can be added.  If the
@var{header} is @samp{body}, then @var{contents} is put in the message
body; otherwise, a @var{header} header field is created with
@var{contents} as its contents.  Note that the @code{url} library does
not perform any checking of @var{header} or @var{contents}, so you
should check them before sending the message.
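
  For example, the call below should open a mail composition buffer
addressed to @samp{foo@@bar.com} (the placeholder address used above),
with a @samp{Subject} header field and a message body filled in from
the URL.  Nothing is sent until you send the message from that buffer
yourself.

@example
(require 'url)

;; Opens a composition buffer; the return value is nil, since
;; mailto URLs have no data to retrieve.
(url-retrieve-synchronously
 "mailto:foo@@bar.com?subject=Greetings&body=Hello")
@end example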

@defopt url-mail-command
@vindex mail-user-agent
The value of this variable is the function called whenever @code{url}
needs to send mail.  This should normally be left at its default,
which is the standard mail-composition command @code{compose-mail}.
@xref{Sending Mail,,, emacs, The GNU Emacs Manual}.
@end defopt

  If the document containing the @code{mailto} URL itself possessed a
known URL, Emacs automatically inserts an @samp{X-Url-From} header
field into the mail buffer, specifying that URL.

@node news/nntp/snews
@section @code{news}, @code{nntp} and @code{snews}
@cindex news
@cindex network news
@cindex usenet
@cindex NNTP
@cindex snews

The @code{news}, @code{nntp}, and @code{snews} schemes, defined in RFC
1738, are used for reading Usenet newsgroups.  For compatibility with
non-standard-compliant news clients, the @code{url} library allows
host and port fields to be included in @code{news} URLs, even though
they are properly only allowed for @code{nntp} and @code{snews}.

  @code{news} and @code{nntp} URLs have the following form:

@table @samp
@item news:@var{newsgroup}
Retrieves a list of messages in @var{newsgroup};
@item news:@var{message-id}
Retrieves the message with the given @var{message-id};
@item news:*
Retrieves a list of all available newsgroups;
@item nntp://@var{host}:@var{port}/@var{newsgroup}
@itemx nntp://@var{host}:@var{port}/@var{message-id}
@itemx nntp://@var{host}:@var{port}/*
Similar to the @samp{news} versions.
@end table

  The default port for @code{nntp} (and @code{news}) is 119.  The
difference between an @code{nntp} URL and a @code{news} URL is that an
@code{nntp} URL may specify an article by its number.  The
@samp{snews} scheme is the same as @samp{nntp}, except that it is
tunneled through SSL and has default port 563.

  These URLs are retrieved via the Gnus package.

@cindex environment variable
@vindex NNTPSERVER
@defopt url-news-server
This variable specifies the default news server from which to fetch
news, if no server was specified in the URL@.  The default value,
@code{nil}, means to use the server specified by the standard
environment variable @samp{NNTPSERVER}, or @samp{news} if that
environment variable is unset.
@end defopt

@node rlogin/telnet/tn3270
@section rlogin, telnet and tn3270
@cindex rlogin
@cindex telnet
@cindex tn3270
@cindex terminal emulation
@findex terminal-emulator

These URL schemes are defined in RFC 1738, and are used for logging in
via a terminal emulator.  They have the form

@example
telnet://@var{user}:@var{password}@@@var{host}:@var{port}
@end example

@noindent
but the @var{password} component is ignored.

To handle rlogin, telnet and tn3270 URLs, an @code{rlogin},
@code{telnet} or @code{tn3270} (the program names and arguments are
hard-coded) session is run in a @code{terminal-emulator} buffer.
Well-known ports are used if the URL does not specify a port.

@node irc
@section irc
@cindex IRC
@cindex Internet Relay Chat
@cindex ZEN IRC
@cindex ERC
@cindex rcirc

  The @code{irc} scheme is defined in the Internet Draft at
@url{http://www.w3.org/Addressing/draft-mirashi-url-irc-01.txt} (which
was never approved as an RFC).  Such URLs have the form

@example
irc://@var{host}:@var{port}/@var{target},@var{needpass}
@end example

@noindent
and are retrieved by opening an @acronym{IRC} session using the
function specified by @code{url-irc-function}.

@defopt url-irc-function
The value of this option is a function, which is called to open an IRC
connection for @code{irc} URLs.  This function must take five
arguments, @var{host}, @var{port}, @var{channel}, @var{user} and
@var{password}.  The @var{channel} argument specifies the channel to
join immediately, and may be @code{nil}.

The default is @code{url-irc-rcirc}, which uses the Rcirc package.
Other options are @code{url-irc-erc} (which uses ERC) and
@code{url-irc-zenirc} (which uses ZenIRC).
@end defopt
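
  As an illustration of this calling convention, the sketch below
defines a hypothetical wrapper, @code{my-url-irc}, that logs the
connection and then delegates to the default @code{url-irc-rcirc},
passing all five arguments through unchanged.

@example
(require 'url-irc)

(defun my-url-irc (host port channel user password)
  "Log an irc URL, then open it with rcirc."
  (message "IRC link: %s:%s %s" host port (or channel "(no channel)"))
  (url-irc-rcirc host port channel user password))

(setq url-irc-function #'my-url-irc)
@end example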

@node data
@section data
@cindex data URLs

  The @code{data} scheme, defined in RFC 2397, contains MIME data in
the URL itself.  Such URLs have the form

@example
data:@r{[}@var{media-type}@r{]}@r{[};@var{base64}@r{]},@var{data}
@end example

@noindent
where @var{media-type} is a MIME @samp{Content-Type} string, possibly
including parameters.  It defaults to
@samp{text/plain;charset=US-ASCII}.  The @samp{text/plain} part can be
omitted while still supplying a charset parameter.  If @samp{;base64}
is present, the @var{data} are base64-encoded.
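
  Because the payload travels inside the URL, building a @code{data}
URL amounts to string concatenation.  The sketch below encodes the
text @samp{Hello, World} with base64 and checks that decoding the part
after the first comma gives the text back; it only illustrates the
format and does not retrieve anything.

@example
(let* ((text "Hello, World")
       (url (concat "data:text/plain;charset=US-ASCII;base64,"
                    (base64-encode-string text)))
       ;; Everything after the first comma is the data part.
       (data (substring url (1+ (string-match "," url)))))
  (list url (base64-decode-string data)))
;; @result{} ("data:text/plain;charset=US-ASCII;base64,SGVsbG8sIFdvcmxk"
;;     "Hello, World")
@end example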

@node nfs
@section nfs
@cindex NFS
@cindex Network File System
@cindex automounter

The @code{nfs} scheme, defined in RFC 2224, is similar to @code{ftp}
except that it points to a file on a remote host that is handled by an
NFS automounter on the local host.  Such URLs have the form

@example
nfs://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
@end example

@defvar url-nfs-automounter-directory-spec
A string saying how to invoke the NFS automounter.  Certain @samp{%}
sequences are recognized:

@table @samp
@item %h
The hostname of the NFS server;
@item %n
The port number of the NFS server;
@item %u
The username to use to authenticate;
@item %p
The password to use to authenticate;
@item %f
The filename on the remote server;
@item %%
A literal @samp{%}.
@end table

Each can be used any number of times.
@end defvar
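
  As a sketch, assuming the local automounter makes remote hosts
visible under @file{/net} (an assumption about the local system, not
something the @code{url} library sets up), the spec below would turn
the host and file name of an @code{nfs} URL into a local @code{file}
URL by substituting them for @samp{%h} and @samp{%f}.

@example
;; Map nfs://HOST/FILE onto the local automount tree under /net.
(setq url-nfs-automounter-directory-spec "file:/net/%h%f")

;; With this setting, a URL such as
;;   nfs://fileserver/export/README
;; would be looked up as
;;   file:/net/fileserver/export/README
@end example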
769
4009494e
GM
770@node ldap
771@section ldap
772@cindex LDAP
773@cindex Lightweight Directory Access Protocol
774
775The LDAP scheme is defined in RFC 2255.
776
4009494e
GM
777@node man
778@section man
779@cindex @command{man}
780@cindex Unix man pages
781@findex man
782
5b637222
CY
783The @code{man} scheme is a non-standard one. Such URLs have the form
784
4009494e
GM
785@example
786@samp{man:@var{page-spec}}
787@end example
788
5b637222
CY
789@noindent
790and are retrieved by passing @var{page-spec} to the Lisp function
791@code{man}.
4009494e
GM
792
793@node General Facilities
794@chapter General Facilities
795
796@menu
797* Disk Caching::
798* Proxies::
799* Gateways in general::
800* History::
801@end menu
802
803@node Disk Caching
804@section Disk Caching
805@cindex Caching
806@cindex Persistent Cache
807@cindex Disk Cache
808
809The disk cache stores retrieved documents locally, whence they can be
810retrieved more quickly. When requesting a URL that is in the cache,
811the library checks to see if the page has changed since it was last
812retrieved from the remote machine. If not, the local copy is used,
813saving the transmission over the network.
814@cindex Cleaning the cache
815@cindex Clearing the cache
816@cindex Cache cleaning
817Currently the cache isn't cleared automatically.
818@c Running the @code{clean-cache} shell script
819@c fist is recommended, to allow for future cleaning of the cache. This
820@c shell script will remove all files that have not been accessed since it
821@c was last run. To keep the cache pared down, it is recommended that this
822@c script be run from @i{at} or @i{cron} (see the manual pages for
823@c crontab(5) or at(1) for more information)
824
825@defopt url-automatic-caching
826Setting this variable non-@code{nil} causes documents to be cached
827automatically.
828@end defopt
829
830@defopt url-cache-directory
831This variable specifies the
832directory to store the cache files. It defaults to sub-directory
833@file{cache} of @code{url-configuration-directory}.
834@end defopt

@defopt url-cache-creation-function
The cache relies on a scheme for mapping URLs to files in the cache.
This variable names a function which sets the type of cache to use.
It takes a URL as argument and returns the absolute file name of the
corresponding cache file. The two functions supplied for this purpose
are @code{url-cache-create-filename-using-md5} and
@code{url-cache-create-filename-human-readable}.
@end defopt
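
For example, to name cache files after the URL itself instead of its MD5 hash, point the variable at the other supplied function. The @code{funcall} below calls whichever function the variable names. As described above, that maps a URL to the absolute name of its cache file. The URL is only an example.

@example
(require 'url-cache)

;; Name cache files after the URL itself rather than its MD5 hash.
(setq url-cache-creation-function
      #'url-cache-create-filename-human-readable)

;; Map a URL to the absolute name of its cache file.
(funcall url-cache-creation-function "http://www.example.com/foo/bar")
@end example

With this setting, the result ends in @file{http/com/example/www/foo/bar} below the cache directory, as in the example for @code{url-cache-create-filename-human-readable}.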

@defun url-cache-create-filename-using-md5 url
Creates a cache file name from @var{url} using MD5 hashing.
This creates entries with very few cache collisions and is fast.
@cindex MD5
@smallexample
(url-cache-create-filename-using-md5 "http://www.example.com/foo/bar")
     @result{} "/home/fx/.url/cache/fx/http/com/example/www/b8a35774ad20db71c7c3409a5410e74f"
@end smallexample
@end defun
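
The example above suggests that the last component of the file name is the MD5 checksum of the URL string. The sketch below checks that with Emacs's built-in @code{md5} function. It relies on that observation rather than on a documented guarantee.

@example
(require 'url-cache)

;; Compare the final component of the cache file name with the
;; MD5 checksum of the URL string itself.
(let* ((url "http://www.example.com/foo/bar")
       (file (url-cache-create-filename-using-md5 url)))
  (equal (file-name-nondirectory file) (md5 url)))
@end example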

@defun url-cache-create-filename-human-readable url
Creates a cache file name from @var{url} that is more obviously
connected to @var{url} than the one made by
@code{url-cache-create-filename-using-md5}, but more likely to
conflict with other files.
@smallexample
(url-cache-create-filename-human-readable "http://www.example.com/foo/bar")
     @result{} "/home/fx/.url/cache/fx/http/com/example/www/foo/bar"
@end smallexample
@end defun

@defun url-cache-expired url &optional expire-time
This function returns non-@code{nil} if the cache entry for @var{url}
has expired (or is absent). The optional argument @var{expire-time} is
the expiration delay in seconds; it defaults to
@code{url-cache-expire-time}.
@end defun

@defopt url-cache-expire-time
This variable is the default number of seconds to use for the
@var{expire-time} argument of the function @code{url-cache-expired}.
@end defopt

@defun url-fetch-from-cache url
This function takes a URL as its argument and returns a buffer
containing the data cached for that URL.
@end defun
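
These functions combine naturally: check whether a cache entry is still fresh before reading it. The helper @code{my-url-cached-buffer} below is hypothetical. It is a minimal sketch that assumes the caller falls back to a network retrieval when it returns @code{nil}.

@example
(require 'url-cache)

(defun my-url-cached-buffer (url &optional max-age)
  "Return a buffer holding the cached data for URL, or nil.
Return nil if the cache entry is absent or older than MAX-AGE
seconds (default `url-cache-expire-time').  Hypothetical helper."
  (unless (url-cache-expired url max-age)
    (url-fetch-from-cache url)))
@end example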

@c Fixme: never actually used currently?
@c @defopt url-standalone-mode
@c @cindex Relying on cache
@c @cindex Cache only mode
@c @cindex Standalone mode
@c If this variable is non-@code{nil}, the library relies solely on the
@c cache for fetching documents and avoids checking if they have changed
@c on remote servers.
@c @end defopt

@c With a large cache of documents on the local disk, it can be very handy
@c when traveling, or any other time the network connection is not active
@c (a laptop with a dial-on-demand PPP connection, etc.). Emacs/W3 can rely
@c solely on its cache, and avoid checking to see if the page has changed
@c on the remote server. In the case of a dial-on-demand PPP connection,
@c this will keep the phone line free as long as possible, only bringing up
@c the PPP connection when asking for a page that is not located in the
@c cache. This is very useful for demonstrations as well.

@node Proxies
@section Proxies and Gatewaying

@c fixme: check/document url-ns stuff
@cindex proxy servers
@cindex proxies
@cindex environment variables
@vindex HTTP_PROXY
Proxy servers are commonly used to provide gateways through firewalls
or as caches serving some more-or-less local network. Each protocol
(HTTP, FTP, etc.)@: can have a different gateway server. By a
convention shared among many programs, proxying is configured through
environment variables of the form @code{@var{protocol}_proxy}, where
@var{protocol} is one of the supported network protocols (@code{http},
@code{ftp}, etc.). The library recognizes such variables in either
upper or lower case. Their values take one of the following forms:
@itemize @bullet
@item @code{@var{host}:@var{port}};
@item a full URL;
@item simply a host name.
@end itemize

@vindex NO_PROXY
The @code{NO_PROXY} environment variable specifies URLs that should be
excluded from proxying (that is, servers that should be contacted
directly). This should be a comma-separated list of host names, domain
names, or a mixture of both. Asterisks can be used as wildcards, but
other clients may not support that. Domain names may be indicated by a
leading dot. For example:
@example
NO_PROXY="*.aventail.com,home.com,.seanet.com"
@end example
@noindent
says to contact all machines in the @samp{aventail.com} and
@samp{seanet.com} domains directly, as well as the machine named
@samp{home.com}. If @code{NO_PROXY} isn't defined, @code{no_PROXY}
and @code{no_proxy} are also tried, in that order.
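
The matching rules just described can be sketched in Lisp. The function @code{my-no-proxy-match-p} below is hypothetical and written only to illustrate the format of the variable. It is not part of the @code{url} library, and other programs may interpret wildcards differently.

@example
(require 'cl-lib)

(defun my-no-proxy-match-p (host no-proxy)
  "Return non-nil if HOST should bypass the proxy under NO-PROXY.
NO-PROXY is a string in the format of the NO_PROXY variable.
Hypothetical helper, not part of the url library."
  (let ((case-fold-search t))
    (cl-some
     (lambda (entry)
       (cond
        ;; "*.example.com": an asterisk matches any characters.
        ((string-match-p "\\*" entry)
         (string-match-p
          (concat "\\`"
                  (mapconcat #'regexp-quote (split-string entry "\\*") ".*")
                  "\\'")
          host))
        ;; ".example.com": any host in that domain.
        ((string-prefix-p "." entry)
         (string-suffix-p entry host t))
        ;; Anything else names a single host exactly.
        (t (string= (downcase entry) (downcase host)))))
     (split-string no-proxy "," t "[ \t]+"))))
@end example

With the example value above, @samp{www.aventail.com}, @samp{mail.seanet.com} and @samp{home.com} match, while @samp{www.home.com} and @samp{www.gnu.org} do not.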

Proxies may also be specified directly in Lisp.

@defopt url-proxy-services
This variable is an alist of URL schemes and proxy servers that
gateway them. Each item has the form @w{@code{(@var{scheme}
. @var{host}:@var{portnumber})}}, which says that the URL @var{scheme}
is gatewayed through @var{portnumber} on the specified @var{host}. An
exception is the pseudo-scheme @code{"no_proxy"}, which is paired with
a regexp matching host names not to be proxied. This variable is
initialized from the environment as above.

@example
(setq url-proxy-services
      '(("http"     . "proxy.aventail.com:80")
        ("no_proxy" . "^.*\\(aventail\\|seanet\\)\\.com")))
@end example
@end defopt
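
The following hypothetical function, @code{my-proxy-for}, illustrates how such an alist is read according to the description above. It returns the @samp{@var{host}:@var{portnumber}} string for a scheme. It returns @code{nil} when the host matches the @code{"no_proxy"} regexp or when no proxy is configured. It is a sketch, not the library's internal lookup.

@example
(require 'url-vars)

(defun my-proxy-for (scheme host)
  "Return the proxy for SCHEME as a string, or nil for a direct connection.
HOST is the server being contacted.  Hypothetical helper that reads
`url-proxy-services' as described in this manual."
  (let ((no-proxy (cdr (assoc "no_proxy" url-proxy-services))))
    (unless (and no-proxy (string-match-p no-proxy host))
      (cdr (assoc scheme url-proxy-services)))))
@end example

With the setting shown above, @code{(my-proxy-for "http" "www.gnu.org")} returns @code{"proxy.aventail.com:80"}, and @code{(my-proxy-for "http" "www.aventail.com")} returns @code{nil}.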

@node Gateways in general
@section Gateways in General
@cindex gateways
@cindex firewalls

The library provides a general gateway layer through which all
networking passes. It can both control access to the network and
provide access through gateways in firewalls. The layer may make
direct connections in some cases and go through a gateway in
others.@footnote{Proxies (which only operate over HTTP) are
implemented using this.} The library's basic function responsible for
making connections is @code{url-open-stream}.

@defun url-open-stream name buffer host service
@cindex opening a stream
@cindex stream, opening
Open a stream to @var{host}, possibly via a gateway. The other
arguments are as for @code{open-network-stream}. This will not make a
connection if @code{url-gateway-unplugged} is non-@code{nil}.
@end defun
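
For instance, with @code{url-gateway-unplugged} bound to @code{t}, the call below returns @code{nil} without contacting the example host. This is a minimal sketch. When a connection is actually made, the return value is a process object instead.

@example
(require 'url-gateway)

;; No connection is attempted while the library is unplugged.
(let ((url-gateway-unplugged t))
  (url-open-stream "example" nil "www.example.com" 80))
@end example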

@defvar url-gateway-local-host-regexp
This is a regular expression that matches local hosts that do not
require the use of a gateway. If @code{nil}, all connections are made
through the gateway.
@end defvar

@defvar url-gateway-method
This variable controls which gateway method is used. It may be useful
to bind it temporarily in some applications. Its value is one of the
following symbols:

@table @code
@item telnet
@cindex @command{telnet}
Use this method if you must first telnet and log into a gateway host,
and then run telnet from that host to connect to outside machines.

@item rlogin
@cindex @command{rlogin}
This method is identical to @code{telnet}, but uses @command{rlogin}
to log into the remote machine without having to send the username and
password over the wire every time.

@item socks
@cindex @sc{socks}
Use this method if the firewall has a @sc{socks} gateway running on
it. The @sc{socks} v5 protocol is defined in RFC 1928.

@c @item ssl
@c This probably shouldn't be documented
@c Fixme: why not? -- fx

@item native
This method uses Emacs's built-in networking directly. This is the
default. It can be used only if there is no firewall blocking access.
@end table
@end defvar
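
The following is a minimal configuration sketch. The regexp below, which uses example host names, lets local hosts bypass the gateway. The hypothetical function @code{my-fetch-directly} binds @code{url-gateway-method} temporarily, as suggested above, so that a single retrieval uses Emacs's built-in networking.

@example
(require 'url)
(require 'url-gateway)

;; Contact "localhost" and hosts under intranet.example.com without
;; a gateway.  The domain is only an example.
(setq url-gateway-local-host-regexp
      "\\(\\`localhost\\'\\|\\.intranet\\.example\\.com\\'\\)")

(defun my-fetch-directly (url)
  "Retrieve URL synchronously using Emacs's built-in networking.
Hypothetical helper; returns the buffer from `url-retrieve-synchronously'."
  (let ((url-gateway-method 'native))
    (url-retrieve-synchronously url)))
@end example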

The following variables control the gateway methods.

@defopt url-gateway-telnet-host
The gateway host to telnet to. Once logged in there, you then telnet
out to the hosts you want to connect to.
@end defopt
@defopt url-gateway-telnet-parameters
This should be a list of parameters to pass to the @command{telnet} program.
@end defopt
@defopt url-gateway-telnet-password-prompt
This is a regular expression that matches the password prompt when
logging in.
@end defopt
@defopt url-gateway-telnet-login-prompt
This is a regular expression that matches the username prompt when
logging in.
@end defopt
@defopt url-gateway-telnet-user-name
The username to log in with.
@end defopt
@defopt url-gateway-telnet-password
The password to send when logging in.
@end defopt
@defopt url-gateway-prompt-pattern
This is a regular expression that matches the shell prompt.
@end defopt
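
A configuration sketch for the telnet method follows. The host name, user name and prompt patterns are placeholders. Adjust the regexps to match whatever your gateway actually prints.

@example
(require 'url-gateway)

;; Placeholder values: replace them with your gateway's details.
(setq url-gateway-method 'telnet
      url-gateway-telnet-host "gateway.example.com"
      url-gateway-telnet-user-name "jrandom"
      url-gateway-telnet-login-prompt "^\\(login\\|Username\\): *"
      url-gateway-telnet-password-prompt "^Password: *"
      url-gateway-prompt-pattern "[$#%>] *$")
@end example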

@defopt url-gateway-rlogin-host
Host to @command{rlogin} to before telnetting out.
@end defopt
@defopt url-gateway-rlogin-parameters
Parameters to pass to @command{rsh}.
@end defopt
@defopt url-gateway-rlogin-user-name
User name to use when logging in to the gateway.
@end defopt
@defopt url-gateway-prompt-pattern
This is a regular expression that matches the shell prompt.
@end defopt
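
Similarly, here is a sketch for the rlogin method, again with placeholder host and user names. It leaves @code{url-gateway-rlogin-parameters} at its default.

@example
(require 'url-gateway)

;; Placeholder values: replace them with your gateway's details.
(setq url-gateway-method 'rlogin
      url-gateway-rlogin-host "gateway.example.com"
      url-gateway-rlogin-user-name "jrandom")
@end example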

@defopt socks-server
This specifies the default server. It takes the form
@w{@code{("Default server" @var{server} @var{port} @var{version})}}
where @var{version} can be either 4 or 5.
@end defopt
@defvar socks-password
If this is @code{nil}, you will be asked for the password; otherwise,
it is used as the password for authenticating you to the @sc{socks}
server.
@end defvar
@defvar socks-username
This is the username to use when authenticating yourself to the
@sc{socks} server. By default this is your login name.
@end defvar
@defvar socks-timeout
This controls how long, in seconds, to wait for responses from the
@sc{socks} server; it is 5 by default.
@end defvar
@c fixme: these have been effectively commented-out in the code
@c @defopt socks-server-aliases
@c This is a list of server aliases. It is a list of aliases of the form
@c @var{(alias hostname port version)}.
@c @end defopt
@c @defopt socks-network-aliases
@c This is a list of network aliases. Each entry in the list takes the form
@c @var{(alias (network))} where @var{alias} is a string that names the
@c @var{network}. The networks can contain a pair (not a dotted pair) of
@c @sc{ip} addresses which specify a range of @sc{ip} addresses, an @sc{ip}
@c address and a netmask, a domain name or a unique hostname or @sc{ip}
@c address.
@c @end defopt
@c @defopt socks-redirection-rules
@c This is a list of redirection rules. Each rule takes the form
@c @var{(Destination network Connection type)} where @var{Destination
@c network} is a network alias from @code{socks-network-aliases} and
@c @var{Connection type} can be @code{nil} in which case a direct
@c connection is used, or it can be an alias from
@c @code{socks-server-aliases} in which case that server is used as a
@c proxy.
@c @end defopt
@defopt socks-nslookup-program
@cindex @command{nslookup}
This is the name of the @command{nslookup} program. It is
@code{"nslookup"} by default.
@end defopt
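
The following sketch sets up @sc{socks}, assuming a version 5 server at the placeholder host @samp{socks.example.com} on port 1080, the port conventionally used for @sc{socks}. Because @code{socks-password} is left at @code{nil}, you are asked for the password when it is needed, as described above.

@example
(require 'socks)

(setq url-gateway-method 'socks
      ;; ("Default server" SERVER PORT VERSION); the host is a placeholder.
      socks-server '("Default server" "socks.example.com" 1080 5)
      socks-timeout 10)
@end example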

@menu
* Suppressing network connections::
@end menu
@c * Broken hostname resolution::

@node Suppressing network connections
@subsection Suppressing Network Connections

@cindex network connections, suppressing
@cindex suppressing network connections
@cindex bugs, HTML
@cindex HTML `bugs'
In some circumstances it is desirable to suppress making network
connections. A typical case is when rendering HTML in a mail user
agent, when external URLs should not be activated, particularly to
avoid ``bugs'' which ``call home'' by fetching single-pixel images and
the like. To arrange this, bind the following variable for the
duration of such processing.

@defvar url-gateway-unplugged
If this variable is non-@code{nil}, new network connections are never
opened by the URL library.
@end defvar
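
A small wrapper makes this binding convenient. The macro @code{my-with-url-unplugged} is hypothetical. It only binds @code{url-gateway-unplugged} around its body, so anything the body does through the URL library runs without opening new connections.

@example
(require 'url-gateway)

(defmacro my-with-url-unplugged (&rest body)
  "Evaluate BODY with new URL library connections suppressed.
Hypothetical convenience macro."
  (declare (indent 0))
  (list 'let '((url-gateway-unplugged t))
        (cons 'progn body)))
@end example

A mail client could, for instance, wrap the call that renders an HTML message body in this macro.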

@c @node Broken hostname resolution
@c @subsection Broken Hostname Resolution

@c @cindex hostname resolver
@c @cindex resolver, hostname
@c Some C libraries do not include the hostname resolver routines in
@c their static libraries. If Emacs was linked statically, and was not
@c linked with the resolver libraries, it will not be able to get to any
@c machines off the local network. This is characterized by being able
@c to reach someplace with a raw IP number, but not its hostname
@c (@url{http://129.79.254.191/} works, but
@c @url{http://www.cs.indiana.edu/} doesn't). This used to happen on
@c SunOS4 and Ultrix, but is probably now rare. If Emacs can't be
@c rebuilt linked against the resolver library, it can use the external
@c @command{nslookup} program instead.

@c @defopt url-gateway-broken-resolution
@c @cindex @code{nslookup} program
@c @cindex program, @code{nslookup}
@c If non-@code{nil}, this variable says to use the program specified by
@c @code{url-gateway-nslookup-program} to do hostname resolution.
@c @end defopt

@c @defopt url-gateway-nslookup-program
@c The name of the program to do hostname lookup if Emacs can't do it
@c directly. This program should expect a single argument on the command
@c line---the hostname to resolve---and should produce output similar to
@c the standard Unix @command{nslookup} program:
@c @example
@c Name: www.cs.indiana.edu
@c Address: 129.79.254.191
@c @end example
@c @end defopt

@node History
@section History

@findex url-do-setup
The library can maintain a global history list tracking URLs accessed.
URL completion can be done from it. The history mechanism is set up
automatically via @code{url-do-setup} when it is configured to be on.
Note that the size of the history list is currently not limited.

@vindex url-history-hash-table
The history ``list'' is actually a hash table,
@code{url-history-hash-table}. It contains access times keyed by URL
strings. The times are in the format returned by @code{current-time}.

@defun url-history-update-url url time
This function updates the history table with an entry for @var{url}
accessed at the given @var{time}.
@end defun
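
For example, you can record an access and then look it up again directly in the hash table. The URL is only an example.

@example
(require 'url-history)

(let ((now (current-time)))
  ;; Record that the URL was accessed now.
  (url-history-update-url "http://www.gnu.org/software/emacs/" now)
  ;; The table maps the URL string to the recorded time.
  (equal (gethash "http://www.gnu.org/software/emacs/" url-history-hash-table)
         now))
@end example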

@defopt url-history-track
If non-@code{nil}, the library will keep track of all the URLs
accessed. If it is @code{t}, the list is saved to disk at the end of
each Emacs session. The default is @code{nil}.
@end defopt

@defopt url-history-file
The file storing the history list between sessions. It defaults to
@file{history} in @code{url-configuration-directory}.
@end defopt

@defopt url-history-save-interval
@findex url-history-setup-save-timer
The number of seconds between automatic saves of the history list.
The default is one hour. Note that if you change this variable
directly, rather than using Custom, after @code{url-do-setup} has been
run, you need to run the function @code{url-history-setup-save-timer}.
@end defopt
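
The following configuration sketch keeps the history, saves it at the end of each session, and also saves it every half hour instead of every hour. Because the interval is changed with @code{setq} rather than through Custom, the save timer is restarted afterwards, as described above. The 1800-second value is only an example.

@example
(require 'url-history)

;; Track history and also save it at the end of each session.
(setq url-history-track t)
;; Save every 30 minutes instead of the default hour.
(setq url-history-save-interval 1800)
;; Restart the save timer so the new interval takes effect.
(url-history-setup-save-timer)
@end example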

@defun url-history-parse-history &optional fname
Parses the history file @var{fname} (default @code{url-history-file})
and sets up the history list.
@end defun

@defun url-history-save-history &optional fname
Saves the current history to file @var{fname} (default
@code{url-history-file}).
@end defun
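
These two functions are meant to be used as a pair. The sketch below saves the history to a temporary file, so the real @code{url-history-file} is left alone. It then clears the in-memory table and reads the file back. The temporary file and the URL are only examples.

@example
(require 'url-history)

(let ((file (make-temp-file "url-history")))
  (url-history-update-url "http://www.gnu.org/" (current-time))
  ;; Write the table to FILE, forget it, then reload it from FILE.
  (url-history-save-history file)
  (clrhash url-history-hash-table)
  (url-history-parse-history file)
  (gethash "http://www.gnu.org/" url-history-hash-table))
@end example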

@defun url-completion-function string predicate function
You can use this function to do completion of URLs from the history.
@end defun
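
The argument list (a string, a predicate and a completion action) suggests that @code{url-completion-function} follows the calling convention of programmed completion. This minimal sketch assumes that it does, so the function can be passed wherever a completion table is expected. It first adds an example URL to the history.

@example
(require 'url-history)

(url-history-update-url "http://www.gnu.org/software/emacs/" (current-time))

;; List the history entries that start with a prefix.
(all-completions "http://www.gnu" #'url-completion-function)

;; Interactively, offer completion on visited URLs:
;; (completing-read "URL: " #'url-completion-function)
@end example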

@node Customization
@chapter Customization

@cindex environment variables
The following environment variables affect the @code{url} library's
operation at startup.

@table @code
@item TMPDIR
@vindex TMPDIR
@vindex url-temporary-directory
If this is defined, @code{url-temporary-directory} is initialized from
it.
@end table

The following user options affect the general operation of the
@code{url} library.

@defopt url-configuration-directory
@cindex configuration files
The value of this variable specifies the name of the directory where
the @code{url} library stores its various configuration files, cache
files, etc.

The default value specifies a subdirectory named @file{url/} in the
standard Emacs user data directory specified by the variable
@code{user-emacs-directory} (normally @file{~/.emacs.d}). However,
the old default was @file{~/.url}, and this directory is used instead
if it exists.
@end defopt

@defopt url-debug
@cindex debugging
Specifies the types of debug messages which are logged to the
@code{*URL-DEBUG*} buffer. @code{t} means log all messages. A number
means log all messages and show them with @code{message}. It may also
be a list of the types of messages to be logged.
@end defopt
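
For example, to log only the messages of one type, set the option to a list. Library code writes its messages through a function that is also named @code{url-debug}. Calling it directly, as below, is for illustration only, and the message type @code{http} is an assumption about the types in use.

@example
(require 'url-util)

;; Log only messages of type `http' to the *URL-DEBUG* buffer.
(setq url-debug '(http))

;; Library code logs through the function of the same name; this
;; message is recorded, while one of another type would not be.
(url-debug 'http "fetching %s" "http://www.gnu.org/")
@end example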
@defopt url-personal-mail-address
@end defopt
@defopt url-privacy-level
@end defopt
@defopt url-uncompressor-alist
@end defopt
@defopt url-passwd-entry-func
@end defopt
@defopt url-standalone-mode
@end defopt
@defopt url-bad-port-list
@end defopt
@defopt url-max-password-attempts
@end defopt
@defopt url-temporary-directory
@end defopt
@defopt url-show-status
@end defopt
@defopt url-confirmation-func
The function to use for asking yes-or-no questions. This is normally
either @code{y-or-n-p} or @code{yes-or-no-p}, but could be another
function taking a single argument (the prompt) and returning @code{t}
only if an affirmative answer is given.
@end defopt
@defopt url-gateway-method
@c fixme: describe gatewaying
A symbol specifying the type of gateway support to use for connections
from the local machine. The supported methods are:

@table @code
@item telnet
Run telnet in a subprocess to connect;
@item rlogin
Rlogin to another machine to connect;
@item socks
Connect through a @sc{socks} server;
@item ssl
Connect with SSL;
@item native
Connect directly.
@end table
@end defopt

@node GNU Free Documentation License
@appendix GNU Free Documentation License
@include doclicense.texi

@node Function Index
@unnumbered Command and Function Index
@printindex fn

@node Variable Index
@unnumbered Variable Index
@printindex vr

@node Concept Index
@unnumbered Concept Index
@printindex cp

@bye