Imported Upstream version 2.23.05
The Webalizer - A log file analysis program -- DNS information

The Webalizer has the ability to perform reverse DNS lookups, and
fully supports both IPv4 and IPv6 addressing schemes. This document
attempts to explain how it works, and some things that you should be
aware of when using the DNS lookup features.

Note: The reverse DNS feature may be enabled or disabled at compile
      time. DNS lookup code is enabled by default. You can run The
      Webalizer using the '-vV' command line options to determine what
      options are enabled in the version you are using.


How it works
------------

DNS lookups are made against a DNS cache file containing IP addresses
and resolved names. If an IP address is not found in the cache file,
it is left as an IP address. For lookups to take place, a cache file
MUST be specified when the Webalizer is run, using either the '-D'
command line switch or the "DNSCache" configuration file keyword. If
no cache file is specified, no DNS lookups are attempted. The cache
file can be created in three different ways:

1) You can have the Webalizer pre-process the specified log file at
   run-time, creating the cache file before processing the log file
   normally. This is done by setting the number of DNS child
   processes to run, using either the '-N' command line switch or
   the "DNSChildren" configuration keyword. This causes the
   Webalizer to spawn the specified number of processes, which are
   used to do reverse DNS lookups. Generally, a larger number of
   processes results in faster resolution of the log; however, a
   value that is set too high may degrade overall system performance.
   A setting between 5 and 20 should be acceptable, and there is a
   maximum limit of 100. If used, a cache filename MUST also be
   specified, using either the '-D' command line switch or the
   "DNSCache" configuration keyword. Using this method, normal
   processing continues only after all IP addresses have been
   processed and the cache file has been created or updated.

2) You can pre-process the log file as a standalone process, creating
   the cache file that will be used later by the Webalizer. This is
   done by running the Webalizer under the name 'webazolver' (i.e. the
   name 'webazolver' is a symbolic link to 'webalizer') and specifying
   the cache filename (either with '-D' or DNSCache). If the number
   of child processes is not given, the default of 5 is used. In
   this mode, the log is read and processed, creating a DNS cache
   file or updating an existing one, and the program then exits
   without any further processing.

3) You can use The Webalizer (DNS) Cache file Manager program 'wcmgr'
   to create and manipulate a cache file. A blank cache file can be
   created and populated later, or data for the cache file can be
   imported using tab delimited text files. See the wcmgr(1) man
   page for usage information.

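The lookup rule described at the start of this section can be sketched
in a few lines of Python. This is only an illustration of the rule, not
the Webalizer's own code: the function name display_host and the use of
a plain dictionary in place of the on-disk cache file are assumptions
made for the example.

```python
from typing import Mapping, Optional


def display_host(ip: str, cache: Optional[Mapping[str, str]]) -> str:
    """Return the name to report for `ip`.

    `cache` stands in for the DNS cache file (hypothetical: a mapping of
    IP address -> resolved name).  Passing None models a run without
    '-D' / DNSCache, in which case no lookup is attempted at all.
    """
    if cache is None:
        return ip
    # An address that is missing from the cache is left as an IP address.
    return cache.get(ip, ip)
```

A cached address is reported by name, while an uncached one (or any
address when no cache is given) is reported unchanged.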

Run-time DNS cache file creation/update
---------------------------------------

The creation/update of a DNS cache file at run-time occurs as follows:

1) The log file is read, creating a list of all IP addresses that are
   not already cached (or are cached but expired) and need to be
   resolved. Addresses are expired based on the TTL value specified
   using the 'CacheTTL' configuration option, or after 7 days (the
   default) if no TTL is specified.

2) The specified number of child processes are forked, and are used
   to perform DNS lookups.

3) Each IP address is given, one at a time, to the next available child
   process until all IP addresses have been processed. Each child
   updates the cache file when a result is returned. The result may be
   either a resolved name or a failed lookup, in which case the address
   is left unresolved. Unresolved addresses are not normally cached,
   but can be if enabled using the 'CacheIPs' configuration file
   keyword.

4) Once all IP addresses have been processed and the cache file updated,
   the Webalizer processes the log normally. Each record it finds
   that has an unresolved IP address is looked up in the cache file
   to see if a hostname is available (i.e. was previously found).

Because there may be a significant amount of time between building the
initial list of unresolved IP addresses and normal processing, the
Webalizer should not be run against live log files (i.e. a log file that
is actively being written to by a server); otherwise, there may be
additional records present that were not resolved.

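Step 1 above (deciding which addresses still need a lookup) can be
sketched as follows. This is a minimal Python illustration under stated
assumptions, not the program's actual code: the cache is modelled as a
dictionary of IP address -> (timestamp, name), and the names
addresses_to_resolve and CacheEntry are made up for the example. The
7 day default and the meaning of a zero timestamp (a permanent entry)
are taken from this document.

```python
import time
from typing import Dict, Iterable, List, Optional, Tuple

SECONDS_PER_DAY = 86400

# Hypothetical in-memory cache entry: (timestamp, resolved name).
CacheEntry = Tuple[float, str]


def addresses_to_resolve(
    log_ips: Iterable[str],
    cache: Dict[str, CacheEntry],
    ttl_days: int = 7,
    now: Optional[float] = None,
) -> List[str]:
    """Return each distinct IP that is missing from the cache or expired."""
    if now is None:
        now = time.time()
    max_age = ttl_days * SECONDS_PER_DAY
    pending: List[str] = []
    seen = set()
    for ip in log_ips:
        if ip in seen:
            continue  # every address is queued at most once
        seen.add(ip)
        entry = cache.get(ip)
        if entry is None:
            pending.append(ip)  # never cached
        elif entry[0] != 0 and now - entry[0] > max_age:
            pending.append(ip)  # cached, but older than the TTL
        # A zero timestamp marks a permanent entry and is never expired.
    return pending
```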

Stand-Alone DNS cache file creation/update
------------------------------------------

The creation/update of the DNS cache file, when run in stand-alone mode,
occurs as follows:

1) The log file is read, creating a list of all IP addresses that are
   not already cached (or are cached but expired) and need to be
   resolved.

2) The specified number of child processes are forked, and are used
   to perform DNS lookups. If the number of processes was not specified,
   the default of 5 is used.

3) Each IP address is given, one at a time, to the next available child
   process until all IP addresses have been processed. Each child
   updates the cache file when a result is returned.

4) Once all IP addresses have been processed and the cache file updated,
   the program terminates without any further processing.

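Steps 2 and 3 above (a fixed number of workers, each taking the next
pending address) can be sketched in Python. Note the substitution: the
Webalizer forks child processes, while this sketch uses a thread pool
from the standard library to keep the example short. The function names
reverse_lookup and resolve_all are hypothetical, and the result is
returned as a dictionary rather than written to a cache file. The
default of 5 workers and the limit of 100 come from this document.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """Reverse-resolve one address; None means the lookup failed."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return None


def resolve_all(
    ips: Iterable[str],
    children: int = 5,
    resolver: Callable[[str], Optional[str]] = reverse_lookup,
) -> Dict[str, Optional[str]]:
    """Hand each address to the next free worker and collect the results."""
    if not 1 <= children <= 100:
        raise ValueError("children must be between 1 and 100")
    ip_list = list(ips)
    with ThreadPoolExecutor(max_workers=children) as pool:
        # map() feeds addresses to idle workers and keeps results in order.
        names = list(pool.map(resolver, ip_list))
    return dict(zip(ip_list, names))
```

The resolver is a parameter so that the dispatch logic can be tried
without network access, for example by passing the get method of a
dictionary of known names.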

Larger sites may prefer to use a stand-alone process to create the DNS
cache file, and then run the Webalizer using that cache file. This
allows a single cache file to be used for many virtual hosts, and reduces
the processing needed if many sites are being processed. The Webalizer
can be used in stand-alone mode by running it as 'webazolver'. When
run in this fashion, it only creates the cache file and then exits
without any further processing. A cache filename MUST be specified;
however, unlike when running the Webalizer normally, the number of child
processes does not have to be given (it defaults to 5). All normal
configuration and command line options are recognized, although many
of them are simply ignored. This allows the use of a standard
configuration file for both normal use and stand-alone use.


Examples:
---------

webalizer -c test.conf -N 10 -D dns_cache.db /var/log/my_www_log

   This will use the configuration file 'test.conf' to obtain normal
   configuration options such as hostname and output directory. It
   will then either create or update the file 'dns_cache.db' in the
   default output directory (using 10 child processes) based on the
   IP addresses it finds in the log /var/log/my_www_log, and then
   process that log file normally.


webalizer -o out -D dns_cache.db /var/log/my_www_log

   This will process the log file /var/log/my_www_log, resolving IP
   addresses from the cache file 'dns_cache.db' found in the output
   directory "out". The cache file must be present, as it will
   not be created with this command.


for i in /var/log/*/access_log; do
   webazolver -N 20 -D /var/lib/dns_cache.db "$i"
done

   The above is an example of how to run through multiple log files
   creating a single DNS cache file. This might typically be used on
   a larger site that has many virtual hosts, all keeping their log
   files in separate directories. It will process each access_log it
   finds in /var/log/* and create a cache file (/var/lib/dns_cache.db).
   This cache file can then be used to process the logs normally
   with the Webalizer in a read-only fashion (see next example).


for i in /etc/webalizer/*.conf; do webalizer -c "$i" -D /etc/cache.db; done

   This will process each configuration file found in /etc/webalizer,
   using the DNS cache file /etc/cache.db. This will also typically be
   used on a larger site with multiple hosts. Each configuration file
   will specify a site specific log file, hostname, output directory, etc.
   The cache file used will typically be created using a command similar
   to the one in the previous example.

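The two loops above form a two-phase workflow: resolve everything into
one shared cache, then produce reports from that cache. For sites that
drive this from a script, the same command lines can be assembled in
Python. This is only a sketch: the function names are hypothetical, and
the only switches used ('-N', '-D' and '-c') are the ones shown in the
examples above.

```python
from typing import Iterable, List


def resolver_commands(
    logs: Iterable[str], cache: str, children: int = 20
) -> List[List[str]]:
    """One 'webazolver' invocation per log file, all sharing one cache."""
    return [
        ["webazolver", "-N", str(children), "-D", cache, log] for log in logs
    ]


def report_commands(configs: Iterable[str], cache: str) -> List[List[str]]:
    """One 'webalizer' invocation per configuration file, reusing the cache."""
    return [["webalizer", "-c", conf, "-D", cache] for conf in configs]
```

Each returned list can be passed to subprocess.run(cmd, check=True).
Keeping the arguments as a list means that file names containing spaces
need no extra quoting.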

Cache File Maintenance
----------------------

The Webalizer DNS cache files generally require very little or no
special attention. There are times, though, when some maintenance
is required, such as occasional purging of very old cache entries.
The Webalizer never removes a record once it has been inserted into
the cache. If a record expires based on its timestamp, the next time
that address is seen in a log, its name is looked up again and the
timestamp is updated. However, there will always be addresses that
are never seen again, which will cause the cache files to continue
to grow in size over time. On extremely busy sites or sites that
attract many one-time visitors, the cache file may grow extremely
large, yet only contain a small number of valid entries. Using
The Webalizer (DNS) Cache file Manager ('wcmgr'), cache files can
be purged, removing expired entries and shrinking the file size.
A TTL (time to live) value can be specified, so the length of time
an entry remains in the cache can be varied depending on individual
site requirements. In addition to purging cache files, 'wcmgr' can
also be used to list cache file contents, import/export cache data,
look up/add/delete individual entries and gather overall statistics
regarding the cache file (number of records, number expired, etc.).

To purge a cache file using 'wcmgr', an example command would be:

wcmgr -p31 /path/to/dns.cache

This would purge the 'dns.cache' cache file of any records that are
over 31 days old, and would reclaim the space that those records
were using in the file. If you would like to see the records that
get purged, adding the command line option '-v' (verbose) will cause
the program to print each entry and its age as it is removed.
You can also use 'wcmgr' to display statistics on cache files
to aid in determining when a cache file should be purged. See the
'wcmgr' man page (wcmgr.1) for additional information on the various
options available.

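The effect of a purge can be sketched in Python. This illustrates the
idea only and is not how 'wcmgr' is implemented: the cache is modelled
as a dictionary of IP address -> (timestamp, name), the function name
purge is made up for the example, and the sketch assumes that entries
with a zero timestamp (described elsewhere in this document as
permanent) are kept.

```python
from typing import Dict, List, Tuple

SECONDS_PER_DAY = 86400

# Hypothetical in-memory cache entry: (timestamp, resolved name).
CacheEntry = Tuple[float, str]


def purge(
    cache: Dict[str, CacheEntry], max_age_days: int, now: float
) -> Tuple[Dict[str, CacheEntry], List[Tuple[str, str, int]]]:
    """Split `cache` into kept entries and (ip, name, age_in_days) removals."""
    kept: Dict[str, CacheEntry] = {}
    removed: List[Tuple[str, str, int]] = []
    for ip, (stamp, name) in cache.items():
        age_seconds = now - stamp
        # Assumption: zero-timestamp (permanent) entries survive a purge.
        if stamp != 0 and age_seconds > max_age_days * SECONDS_PER_DAY:
            removed.append((ip, name, int(age_seconds // SECONDS_PER_DAY)))
        else:
            kept[ip] = (stamp, name)
    return kept, removed
```

The list of removals carries each entry's age in days, which is the
kind of information the verbose option described above prints.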

Stupid Cache Tricks
-------------------

The DNS cache files used by The Webalizer allow for efficient IP address
to name translations. Resolved names are normally generated by using an
existing DNS name server to query the address, either locally or over
the Internet. However, using The Webalizer (DNS) Cache file Manager,
almost any IP address to name translation can be included in the cache.
One such example would be mapping local network addresses to real
names, even though those addresses may not have real DNS entries on the
network (or may be 'local' addresses prohibited from use on the Internet).
A simple tab delimited text file can be created and imported into a cache
for use by The Webalizer, which will then use it to convert the local
IP addresses to real names. Additional configuration options for The
Webalizer can then be used as they normally would be. For example,
consider a small business with 10 computers and a DSL router to the
Internet. Each machine on the local network would use a private IP
address that would not be resolved using an external (public) DNS server,
so it would always be reported by The Webalizer as 'unknown/unresolved'.
A simple cache file could be created to map those unresolved addresses
into more meaningful names, which could then be further processed by the
Webalizer. An example might look something like:

# Local machines
192.168.123.254  0  0  gw.widgetsareus.lan
192.168.123.253  0  0  mail.widgetsareus.lan
192.168.123.250  0  0  sales.widgetsareus.lan
192.168.123.240  0  0  service.widgetsareus.lan
192.168.123.237  0  0  mgr.widgetsareus.lan
192.168.123.235  0  0  support1.widgetsareus.lan
192.168.123.234  0  0  support2.widgetsareus.lan
192.168.123.232  0  0  pres.widgetsareus.lan
192.168.123.230  0  0  vp.widgetsareus.lan
192.168.123.225  0  0  reception.widgetsareus.lan
192.168.123.224  0  0  finance.widgetsareus.lan
127.0.0.1        0  1  127.0.0.1


There are a few things here that should be noted. The first
is that the timestamps (the first zero on each line above) are set to
zero. This tells The Webalizer that these cached entries are to
be considered 'permanent', and should never be expired (infinite
TTL or time to live). The second thing to note is that the resolved
names use a non-standard TLD (top level domain) of '.lan'.
The Webalizer will map this special TLD to mean "Local Network" in
its reports, which allows local traffic to be grouped separately
from normal Internet traffic. Lastly, you may notice that the
last line of the file contains an entry with the same IP address
where a name should be. This entry will prevent the Webalizer
from ever trying to look up 127.0.0.1, which is the 'localhost'
address, when it is found in a log. The second number after the IP
address (1) tells the Webalizer that it is an unresolved entry, not
a resolved hostname (i.e. it has no name). Entries such as this one
can be used to reduce DNS lookups on addresses that are known not to
resolve.

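An import file in the layout shown above can be generated rather than
typed by hand. The following Python sketch builds the tab delimited
text from a list of (address, name) pairs. The function name
import_lines is hypothetical, and the column meanings (address,
timestamp, unresolved flag, name) are the ones described in this
section.

```python
from typing import Iterable, Tuple


def import_lines(
    hosts: Iterable[Tuple[str, str]], unresolved: Iterable[str] = ()
) -> str:
    """Build tab delimited text: address, timestamp, unresolved flag, name."""
    lines = ["# Local machines"]
    for ip, name in hosts:
        # Timestamp 0 = permanent entry; flag 0 = resolved name.
        lines.append("\t".join((ip, "0", "0", name)))
    for ip in unresolved:
        # Flag 1 = unresolved; the address is repeated in the name column.
        lines.append("\t".join((ip, "0", "1", ip)))
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and imported with 'wcmgr';
see the wcmgr(1) man page for the import option.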

Considerations
--------------

Processing of live log files is discouraged, because log records written
between the time of DNS resolution and normal processing will not have
been resolved.

If you are using STDIN for the input stream (log file) and have run-time
DNS cache file creation/update enabled, the program will exit after the
cache file has been created/updated and no output will be produced. If
you must use STDIN for the input log, you will need to process the stream
twice, once to create/update the cache file, and again to produce the
reports. The reason for this is that stream inputs from STDIN cannot
be 'rewound' to the beginning like files can, so they must be given twice.

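When the log is only available as a stream, one way to make the data
available twice is to save it to a temporary file first and then name
that file on the command line. The Python sketch below shows the
spooling step only. The function name spool_to_file and the file name
prefix are assumptions made for the example, and nothing here is part
of the Webalizer itself.

```python
import shutil
import tempfile
from typing import BinaryIO


def spool_to_file(stream: BinaryIO) -> str:
    """Copy a stream that cannot be rewound to a temporary file.

    Returns the path of the copy.  The caller is responsible for
    deleting the file once both passes have finished.
    """
    with tempfile.NamedTemporaryFile(
        prefix="weblog_", suffix=".log", delete=False
    ) as tmp:
        shutil.copyfileobj(stream, tmp)
        return tmp.name
```

Called as spool_to_file(sys.stdin.buffer), this returns a path that can
be used like any other log file, so cache creation and report
generation can both read the same data.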

Cached DNS addresses have a default TTL (time to live) of 7 days. This
may be changed using the 'CacheTTL' configuration file keyword to any
value from 1 to 100 (days). You may also specify, using the 'CacheIPs'
configuration file keyword, whether unresolved addresses should be
stored in the DNS cache. Normally, unresolved IP addresses are NOT
saved in the cache and are looked up each time the program is run.

There is an absolute maximum of 100 child processes that may be created;
however, the actual number of children should be significantly less than
the maximum. Typical usage should be between 5 and 20.

Special thanks to Henning P. Schmiedehausen <hps@tanstaafl.de> for the
original dns-resolver code he submitted, which was the basis for this
implementation. Also thanks to Jose Carlos Medeiros for the initial IPv6
support code.