Commit | Line | Data |
---|---|---|
e015f748 CE |
1 | The Webalizer - A log file analysis program -- DNS information |
2 | ||
3 | The Webalizer has the ability to perform reverse DNS lookups, and | |
4 | fully supports both IPv4 and IPv6 addressing schemes. This document | |
5 | attempts to explain how it works, and some things that you should be | |
6 | aware of when using the DNS lookup features. | |
7 | ||
8 | Note: The Reverse DNS feature may be enabled or disabled at compile | |
9 | time. DNS lookup code is enabled by default. You can run The | |
10 | Webalizer using the '-vV' command line options to determine what | |
11 | options are enabled in the version you are using. | |
12 | ||
13 | ||
14 | How it works | |
15 | ------------ | |
16 | ||
17 | DNS lookups are made against a DNS cache file containing IP addresses | |
18 | and resolved names. If the IP address is not found in the cache file, | |
19 | it will be left as an IP address. In order for this to happen, a | |
20 | cache file MUST be specified when the Webalizer is run, either using | |
21 | the '-D' command line switch, or a "DNSCache" configuration file | |
22 | keyword. If no cache file is specified, no attempts to perform DNS | |
23 | lookups will be done. The cache file can be made three different ways. | |
24 | ||
25 | 1) You can have the Webalizer pre-process the specified log file at | |
26 | run-time, creating the cache file before processing the log file | |
27 | normally. This is done by setting the number of DNS Children | |
28 | processes to run, either by using the '-N' command line switch or | |
29 | the "DNSChildren" configuration keyword. This will cause the | |
30 | Webalizer to spawn the specified number of processes which will | |
31 | be used to do reverse DNS lookups. Generally, a larger number | |
32 | of processes will result in faster resolution of the log; however, | |
33 | too high a setting may cause overall system degradation. A setting | |
34 | of between 5 and 20 should be acceptable, and there is a maximum | |
35 | limit of 100. If used, a cache filename MUST be specified also, | |
36 | using either the '-D' command line switch, or the "DNSCache" | |
37 | configuration keyword. Using this method, normal processing will | |
38 | continue only after all IP addresses have been processed, and the | |
39 | cache file is created/updated. | |
40 | ||
41 | 2) You can pre-process the log file as a standalone process, creating | |
42 | the cache file that will be used later by the Webalizer. This is | |
43 | done by running the Webalizer with a name of 'webazolver' (ie: the | |
44 | name 'webazolver' is a symbolic link to 'webalizer') and specifying | |
45 | the cache filename (either with '-D' or DNSCache). If the number | |
46 | of child processes is not given, the default of 5 will be used. In | |
47 | this mode, the log will be read and processed, creating a DNS cache | |
48 | file or updating an existing one, and the program will then exit | |
49 | without any further processing. | |
50 | ||
51 | 3) You can use The Webalizer (DNS) Cache file Manager program 'wcmgr' | |
52 | to create and manipulate a cache file. A blank cache file can be | |
53 | created which would be later populated, or data for the cache file | |
54 | can be imported using tab delimited text files. See the wcmgr(1) | |
55 | man page for usage information. | |
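As an illustration of the first method, the same settings can be kept in a configuration file rather than given on the command line. The sketch below writes a small fragment to a scratch file. The file name 'dns_example.conf', the cache name and the child count of 10 are illustrative choices, and the "Keyword value" layout is an assumption based on the usual Webalizer configuration style, so check it against the sample configuration file for your version.

```shell
#!/bin/sh
# Write an example configuration fragment.
# The file name below is hypothetical, chosen only for this sketch.
conf=dns_example.conf

cat > "$conf" <<'EOF'
# Cache file used for DNS lookups (equivalent to the '-D' switch)
DNSCache     dns_cache.db
# Number of child processes used to resolve addresses (equivalent to '-N')
DNSChildren  10
EOF

echo "wrote $conf"
```

With both keywords present, a normal run using that configuration file would first update the cache and then process the log, as described in method 1.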
56 | ||
57 | ||
58 | Run-time DNS cache file creation/update | |
59 | --------------------------------------- | |
60 | ||
61 | The creation/update of a DNS cache file at run-time occurs as follows: | |
62 | ||
63 | 1) The log file is read, creating a list of all IP addresses that are | |
64 | not already cached (or cached but expired) and need to be resolved. | |
65 | Addresses are expired based on the TTL value specified using the | |
66 | 'CacheTTL' configuration option or after 7 days (default) if no TTL | |
67 | is specified. | |
68 | ||
69 | 2) The specified number of child processes are forked, and are used | |
70 | to perform DNS lookups. | |
71 | ||
72 | 3) Each IP address is given, one at a time, to the next available child | |
73 | process until all IP addresses have been processed. Each child will | |
74 | update the cache file when a result is returned. This may be either | |
75 | a resolved name or a failed lookup, in which case the address will be | |
76 | left unresolved. Unresolved addresses are not normally cached, but | |
77 | can be, if enabled using the 'CacheIPs' configuration file keyword. | |
78 | ||
79 | 4) Once all IP addresses have been processed and the cache file updated, | |
80 | the Webalizer will process the log normally. Each record it finds | |
81 | that has an unresolved IP address will be looked up in the cache file | |
82 | to see if a hostname is available (ie: was previously found). | |
83 | ||
84 | Because there may be a significant amount of time between the initial | |
85 | unresolved IP list and normal processing, the Webalizer should not be | |
86 | run against live log files (ie: a log file that is actively being written | |
87 | to by a server), otherwise there may be additional records present that | |
88 | were not resolved. | |
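The expiry rule from step 1 can be sketched as a small shell function. This is only an illustration of the rule as this document describes it, not the program's actual code: the function name 'is_expired' is hypothetical, the strict "older than TTL" comparison is an assumption, and the zero-timestamp case follows the 'permanent' entries described under "Stupid Cache Tricks".

```shell
#!/bin/sh
# is_expired TIMESTAMP TTL_DAYS NOW
# Exit status 0 (true) means the cached entry should be looked up again.
# Hypothetical helper; it mirrors the rule described in the text only.
is_expired() {
    ts=$1; ttl_days=$2; now=$3
    # A zero timestamp marks a permanent entry that never expires.
    [ "$ts" -eq 0 ] && return 1
    age=$(( now - ts ))
    # Expired when the entry is older than the TTL (days -> seconds).
    [ "$age" -gt $(( ttl_days * 86400 )) ]
}

now=$(date +%s)
if is_expired $(( now - 8 * 86400 )) 7 "$now"; then
    echo "8 day old entry with a 7 day TTL: expired, resolve again"
fi
if ! is_expired 0 7 "$now"; then
    echo "zero timestamp: permanent, keep"
fi
```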
89 | ||
90 | ||
91 | Stand-Alone DNS cache file creation/update | |
92 | ------------------------------------------ | |
93 | ||
94 | The creation/update of the DNS cache file, when run in stand-alone mode, | |
95 | occurs as follows: | |
96 | ||
97 | 1) The log file is read, creating a list of all IP addresses that are | |
98 | not already cached (or cached but expired) and need to be resolved. | |
99 | ||
100 | 2) The specified number of child processes are forked, and are used | |
101 | to perform DNS lookups. If the number of processes was not specified, | |
102 | the default of 5 will be used. | |
103 | ||
104 | 3) Each IP address is given, one at a time, to the next available child | |
105 | process until all IP addresses have been processed. Each child will | |
106 | update the cache file when a result is returned. | |
107 | ||
108 | 4) Once all IP addresses have been processed and the cache file updated, | |
109 | the program will terminate without any further processing. | |
110 | ||
111 | ||
112 | Larger sites may prefer to use a stand-alone process to create the DNS | |
113 | cache file, and then run the Webalizer against the cache file. This | |
114 | allows a single cache file to be used for many virtual hosts, and reduces | |
115 | the processing needed if many sites are being processed. The Webalizer | |
116 | can be used in stand alone mode by running it as 'webazolver'. When | |
117 | run in this fashion, it will only create the cache file and then exit | |
118 | without any further processing. A cache filename MUST be specified, | |
119 | however unlike when running the Webalizer normally, the number of child | |
120 | processes does not have to be given (will default to 5). All normal | |
121 | configuration and command line options are recognized, however, many | |
122 | of them will simply be ignored. This allows the use of a standard | |
123 | configuration file for both normal use and stand alone use. | |
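If the 'webazolver' name is not already present on a system, it can be created as a symbolic link to the main binary. The following is a minimal sketch: the path '/usr/bin/webalizer' and the choice of the current directory for the link are assumptions, and many installations may already provide the link.

```shell
#!/bin/sh
# Path to the installed binary; /usr/bin/webalizer is an assumption.
WEBALIZER_BIN=${WEBALIZER_BIN:-/usr/bin/webalizer}
# Directory that will hold the link; the current directory is used
# here purely for illustration.
LINK_DIR=${LINK_DIR:-.}

# -f replaces an existing link, so the script can be re-run safely.
ln -sf "$WEBALIZER_BIN" "$LINK_DIR/webazolver"

# Show where the link points.
ls -l "$LINK_DIR/webazolver"
```

When invoked through that link, the program runs in stand-alone mode as described above.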
124 | ||
125 | ||
126 | Examples: | |
127 | --------- | |
128 | ||
129 | webalizer -c test.conf -N 10 -D dns_cache.db /var/log/my_www_log | |
130 | ||
131 | This will use the configuration file 'test.conf' to obtain normal | |
132 | configuration options such as hostname and output directory. It | |
133 | will then either create or update the file 'dns_cache.db' in the | |
134 | default output directory (using 10 child processes) based on the | |
135 | IP addresses it finds in the log /var/log/my_www_log, and then | |
136 | process that log file normally. | |
137 | ||
138 | ||
139 | webalizer -o out -D dns_cache.db /var/log/my_www_log | |
140 | ||
141 | This will process the log file /var/log/my_www_log, resolving IP | |
142 | addresses from the cache file 'dns_cache.db' found in the default | |
143 | output directory "out". The cache file must be present as it will | |
144 | not be created with this command. | |
145 | ||
146 | ||
147 | for i in /var/log/*/access_log; do | |
148 | webazolver -N 20 -D /var/lib/dns_cache.db "$i" | |
149 | done | |
150 | ||
151 | The above is an example of how to run through multiple log files | |
152 | creating a single DNS cache file. This might typically be used on | |
153 | a larger site that has many virtual hosts, each keeping its log | |
154 | files in a separate directory. It will process each access_log it | |
155 | finds in /var/log/* and create a cache file (/var/lib/dns_cache.db). | |
156 | This cache file can then be used to process the logs normally | |
157 | with the Webalizer in a read-only fashion (see next example). | |
158 | ||
159 | ||
160 | for i in /etc/webalizer/*.conf; do webalizer -c "$i" -D /etc/cache.db; done | |
161 | ||
162 | This will process each configuration file found in /etc/webalizer, | |
163 | using the DNS cache file /etc/cache.db. This will also typically be | |
164 | used on a larger site with multiple hosts. Each configuration file | |
165 | will specify a site specific log file, hostname, output directory, etc. | |
166 | The cache file used will typically be created using a command similar | |
167 | to the one previous to this example. | |
168 | ||
169 | ||
170 | Cache File Maintenance | |
171 | ---------------------- | |
172 | ||
173 | The Webalizer DNS cache files generally require very little or no | |
174 | special attention. There are times though when some maintenance | |
175 | is required, such as occasional purging of very old cache entries. | |
176 | The Webalizer never removes a record once it's inserted into the | |
177 | cache. If a record expires based on its timestamp, the next time | |
178 | that address is seen in a log, its name is looked up again and the | |
179 | timestamp is updated. However, there will always be addresses that | |
180 | are never seen again, which will cause the cache files to continue | |
181 | to grow in size over time. On extremely busy sites or sites that | |
182 | attract many one time visitors, the cache file may grow extremely | |
183 | large, yet only contain a small number of valid entries. Using | |
184 | The Webalizer (DNS) Cache file Manager ('wcmgr'), cache files can | |
185 | be purged, removing expired entries and shrinking the file size. | |
186 | A TTL (time to live) value can be specified, so the length of time | |
187 | an entry remains in the cache can be varied depending on individual | |
188 | site requirements. In addition to purging cache files, 'wcmgr' can | |
189 | also be used to list cache file contents, import/export cache data, | |
190 | lookup/add/delete individual entries and gather overall statistics | |
191 | regarding the cache file (number of records, number expired, etc.). | |
192 | ||
193 | To purge a cache file using 'wcmgr', an example command would be: | |
194 | ||
195 | wcmgr -p31 /path/to/dns.cache | |
196 | ||
197 | This would purge the 'dns.cache' cache file of any records that are | |
198 | over 31 days old, and would reclaim the space that those records | |
199 | were using in the file. If you would like to see the records that | |
200 | get purged, adding the command line option '-v' (verbose) will cause | |
201 | the program to print each entry and its age as they are removed. | |
202 | You can also use 'wcmgr' to display statistics on cache files | |
203 | to aid in determining when a cache file should be purged. See the | |
204 | 'wcmgr' man page (wcmgr.1) for additional information on the various | |
205 | options available. | |
206 | ||
207 | ||
208 | Stupid Cache Tricks | |
209 | ------------------- | |
210 | ||
211 | The DNS cache files used by The Webalizer allow for efficient IP address | |
212 | to name translations. Resolved names are normally generated by using an | |
213 | existing DNS name server to query the address, either locally or over | |
214 | the Internet. However, using The Webalizer (DNS) Cache file Manager, | |
215 | almost any IP address to Name translation can be included in the cache. | |
216 | One such example would be for mapping local network addresses to real | |
217 | names, even though those addresses may not have real DNS entries on the | |
218 | network (or may be 'local' addresses prohibited from use on the Internet). | |
219 | A simple tab delimited text file can be created and imported into a cache | |
220 | for use by The Webalizer, which will then be used to convert the local | |
221 | IP addresses to real names. Additional configuration options for The | |
222 | Webalizer can then be used as would be normally. For example, consider | |
223 | a small business with 10 computers and a DSL router to the Internet. | |
224 | Each machine on the local network would use a private IP address that | |
225 | would not be resolved using an external (public) DNS server, so would | |
226 | always be reported by The Webalizer as 'unknown/unresolved'. A simple | |
227 | cache file could be created to map those unresolved addresses into more | |
228 | meaningful names, which could then be further processed by the Webalizer. | |
229 | An example might look something like: | |
230 | ||
231 | # Local machines | |
232 | 192.168.123.254 0 0 gw.widgetsareus.lan | |
233 | 192.168.123.253 0 0 mail.widgetsareus.lan | |
234 | 192.168.123.250 0 0 sales.widgetsareus.lan | |
235 | 192.168.123.240 0 0 service.widgetsareus.lan | |
236 | 192.168.123.237 0 0 mgr.widgetsareus.lan | |
237 | 192.168.123.235 0 0 support1.widgetsareus.lan | |
238 | 192.168.123.234 0 0 support2.widgetsareus.lan | |
239 | 192.168.123.232 0 0 pres.widgetsareus.lan | |
240 | 192.168.123.230 0 0 vp.widgetsareus.lan | |
241 | 192.168.123.225 0 0 reception.widgetsareus.lan | |
242 | 192.168.123.224 0 0 finance.widgetsareus.lan | |
243 | 127.0.0.1 0 1 127.0.0.1 | |
244 | ||
245 | ||
246 | There are a couple of things here that should be noted. The first | |
247 | is that the timestamps (first zero on each line above) are set to | |
248 | zero. This tells The Webalizer that these cached entries are to | |
249 | be considered 'permanent', and should never be expired (infinite | |
250 | TTL or time to live). The second thing to note is that the resolved | |
251 | names are using a non-standard TLD (top level domain) of '.lan'. | |
252 | The Webalizer will map this special TLD to mean "Local Network" in | |
253 | its reports, which allows local traffic to be grouped separately | |
254 | from normal Internet traffic. Lastly, you may notice that the | |
255 | last line of the file contains an entry with the same IP address | |
256 | where a name should be. This entry will prevent the Webalizer | |
257 | from ever trying to look up 127.0.0.1, which is the 'localhost' | |
258 | address, when it is found in a log. The second number after the IP | |
259 | address (1) tells the Webalizer that it is an unresolved entry, not | |
260 | a resolved hostname (ie: has no name). Entries such as this one can | |
261 | be used to reduce DNS lookups on addresses that are known not to | |
262 | resolve. | |
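Because the import file must be tab delimited, it can be safer to generate it than to type it by hand, since editors often turn tabs into spaces. The sketch below writes a few of the entries above using real tab characters. The output file name 'local_hosts.txt' and the helper 'add_entry' are hypothetical, and the import step itself is left out; see the wcmgr(1) man page for the option to use.

```shell
#!/bin/sh
# Build a tab delimited import file for the cache manager.
# The output file name is hypothetical.
out=local_hosts.txt

# add_entry IP TIMESTAMP FLAG NAME
# Prints one record with the four fields separated by tabs.
add_entry() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

{
    echo "# Local machines"
    # Timestamp 0 = permanent entry, flag 0 = resolved name.
    add_entry 192.168.123.254 0 0 gw.widgetsareus.lan
    add_entry 192.168.123.253 0 0 mail.widgetsareus.lan
    # Flag 1 = unresolved entry, so the address is never looked up.
    add_entry 127.0.0.1 0 1 127.0.0.1
} > "$out"

echo "wrote $out"
```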
263 | ||
264 | ||
265 | Considerations | |
266 | -------------- | |
267 | ||
268 | Processing of live log files is discouraged, as any log records written | |
269 | between the time of DNS resolution and normal processing will not have | |
270 | been resolved. | |
271 | ||
272 | If you are using STDIN for the input stream (log file) and have run-time | |
273 | DNS cache file creation/update enabled, the program will exit after the | |
274 | cache file has been created/updated and no output will be produced. If | |
275 | you must use STDIN for the input log, you will need to process the stream | |
276 | twice, once to create/update the cache file, and again to produce the | |
277 | reports. The reason for this is that stream inputs from STDIN cannot | |
278 | be 'rewound' to the beginning like files can, so must be given twice. | |
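The two-pass approach for STDIN input might be wrapped in a small helper such as the one below. This is a sketch under stated assumptions: the function name 'two_pass' is hypothetical, and both programs are assumed to read the log from STDIN when no log file name is given on the command line. The command that produces the stream is passed as arguments so that it can be run twice.

```shell
#!/bin/sh
# Command names can be overridden from the environment.
WEBAZOLVER=${WEBAZOLVER:-webazolver}
WEBALIZER=${WEBALIZER:-webalizer}

# two_pass CACHE COMMAND [ARGS...]
# Runs COMMAND twice: the first stream updates the DNS cache file,
# the second stream is processed normally to produce the reports.
two_pass() {
    cache=$1; shift
    # Pass 1: create/update the cache only, then exit.
    "$@" | "$WEBAZOLVER" -D "$cache" || return 1
    # Pass 2: normal processing, resolving names from the cache.
    "$@" | "$WEBALIZER" -D "$cache"
}

# Example (assumes a compressed log at this path; not run here):
#   two_pass /var/lib/dns_cache.db zcat /var/log/access_log.gz
```

Passing the stream-producing command, rather than the stream itself, is what allows the input to be supplied a second time.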
279 | ||
280 | Cached DNS addresses have a default TTL (time to live) of 7 days. This | |
281 | may now be changed using the CacheTTL config file keyword to any value | |
282 | from 1 to 100 (days). You may also now specify if unresolved addresses | |
283 | should be stored in the DNS cache. Normally, unresolved IP addresses | |
284 | are NOT saved in the cache and are looked up each time the program is | |
285 | run. | |
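For illustration, both settings could be placed in a configuration file as shown below. The file name 'cache_example.conf' and the TTL of 30 days are arbitrary choices, and the 'yes' value for CacheIPs is an assumption about the keyword's syntax, so check it against the sample configuration file for your version.

```shell
#!/bin/sh
# Write an example fragment for the cache related keywords.
# The file name below is hypothetical.
conf=cache_example.conf

cat > "$conf" <<'EOF'
# Expire cached entries after 30 days instead of the default 7
CacheTTL   30
# Also store addresses that failed to resolve (value syntax assumed)
CacheIPs   yes
EOF

echo "wrote $conf"
```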
286 | ||
287 | There is an absolute maximum of 100 child processes that may be created, | |
288 | however the actual number of children should be significantly less than | |
289 | the maximum. Typical usage should be between 5 and 20. | |
290 | ||
291 | Special thanks to Henning P. Schmiedehausen <hps@tanstaafl.de> for the | |
292 | original dns-resolver code he submitted, which was the basis for this | |
293 | implementation. Also thanks to Jose Carlos Medeiros for the initial IPv6 | |
294 | support code. | |
295 |