| 1 | The Webalizer - A web server log file analysis tool |
| 2 | Copyright 1997-2011 by Bradford L. Barrett |
| 3 | |
| 4 | Distributed under the GNU GPL. See the files "COPYING" and |
| 5 | "Copyright" supplied with the distribution for additional info. |
| 6 | |
| 7 | |
| 8 | What is The Webalizer? |
| 9 | ---------------------- |
| 10 | |
| 11 | The Webalizer is a web server log file analysis program which produces |
| 12 | usage statistics in HTML format for viewing with a browser. The results |
| 13 | are presented in both columnar and graphical format, which facilitates |
| 14 | interpretation. Yearly, monthly, daily and hourly usage statistics are |
| 15 | presented, along with the ability to display usage by site, URL, referrer, |
| 16 | user agent (browser), search string, entry/exit page, username and country |
| 17 | (some information is only available if supported and present in the log |
| 18 | files being processed). Processed data may also be exported into most |
| 19 | database and spreadsheet programs that support tab delimited data formats. |
| 20 | |
| 21 | The Webalizer supports CLF (common log format) log files, as well as |
| 22 | Combined log formats as defined by NCSA and others, and variations |
| 23 | of these which it attempts to handle intelligently. In addition, The |
| 24 | Webalizer supports wu-ftpd xferlog (FTP) formatted logs, squid proxy logs |
| 25 | and W3C extended format logs. |
| 26 | |
| 27 | Gzip compressed logs may be used as input directly. Any log filename |
| 28 | that ends with a '.gz' extension will be assumed to be in gzip format and |
| 29 | uncompressed on the fly as it is being read. The Webalizer now also has |
| 30 | the ability to handle BZip2 compressed logs, if enabled at compile time. |
| 31 | Similar to gzipped logs, any log filename that ends with a '.bz2' will be |
| 32 | assumed to be in bzip2 format and uncompressed on the fly as it is being |
| 33 | read. |
| 34 | |
| 35 | For sites that do not enable hostname lookups (DNS resolution) on their |
| 36 | web servers (and have only IP addresses in their logs), The Webalizer |
| 37 | provides its own internal DNS lookup capability as well as geolocation |
| 38 | services (GeoDB). The optional GeoIP library from MaxMind Inc. is also |
| 39 | supported and may be used instead of the native GeoDB database. |
| 40 | |
| 41 | A utility program, "The Webalizer (DNS) Cache file Manager", or 'wcmgr' |
| 42 | is also provided which allows the creation and manipulation of the DNS |
| 43 | cache files used and produced by the webalizer. See the file DNS.README |
| 44 | for additional information regarding DNS support. |
| 45 | |
| 46 | This documentation applies to The Webalizer Version 2.21 |
| 47 | |
| 48 | Running the Webalizer |
| 49 | --------------------- |
| 50 | |
| 51 | The Webalizer was designed to be run from a Unix command line prompt or |
| 52 | as a cron job. There are several command line options which will modify |
| 53 | the results it produces, and configuration files can be used as well. |
| 54 | The format of the command line is: |
| 55 | |
| 56 | webalizer [options ...] [log-file] |
| 57 | |
| 58 | Where 'options' can be one or more of the supported command line |
| 59 | switches described below. 'log-file' is the name of the log file |
| 60 | to process (see below for more detailed information). If a dash |
| 61 | ("-") is specified for the log-file name, STDIN will be used. |
| 62 | |
| 63 | |
| 64 | Once executed, the general flow of the program follows: |
| 65 | |
| 66 | o A default configuration file is scanned for. A file named |
| 67 | 'webalizer.conf' is searched for in the current directory, and if |
| 68 | found, its configuration data is parsed. If the file is not |
| 69 | present in the current directory, the file '/etc/webalizer.conf' |
| 70 | is searched for and, if found, is used instead. |
| 71 | |
| 72 | o Any command line arguments given to the program are parsed. This |
| 73 | may include the specification of a configuration file, which is |
| 74 | processed at the time it is encountered. |
| 75 | |
| 76 | o If a log file was specified, it is opened and made ready for |
| 77 | processing. If no log file was given, or the filename '-' is |
| 78 | specified on the command line, STDIN is used for input. |
| 79 | |
| 80 | o If an output directory was specified, the program does a 'chdir' to |
| 81 | that directory in preparation for generating output. If no output |
| 82 | directory was given, the current directory is used. |
| 83 | |
| 84 | o If a non-zero number of DNS Children processes were specified, they |
| 85 | will be started, and the specified log file will be processed, |
| 86 | either creating or updating the specified DNS cache file. |
| 87 | |
| 88 | o If no hostname was given, the program attempts to get the hostname |
| 89 | using a uname system call. If that fails, 'localhost' is used. |
| 90 | |
| 91 | o A history file is searched for. This file keeps previous month |
| 92 | totals used on the main index.html page. The default file is |
| 93 | named 'webalizer.hist', kept in the specified output directory, |
| 94 | however may be changed using the "HistoryName" configuration file |
| 95 | keyword. |
| 96 | |
| 97 | o If incremental processing was specified, a data file is searched for |
| 98 | and loaded if found, containing the 'internal state' data of the |
| 99 | program at the end of a previous run. The default file is named |
| 100 | 'webalizer.current', kept in the specified output directory, however |
| 101 | may be changed using the "IncrementalName" configuration file keyword. |
| 102 | |
| 103 | o Main processing begins on the log file. If the log spans multiple |
| 104 | months, a separate HTML document is created for each month. |
| 105 | |
| 106 | o After main processing, the main 'index.html' page is created, which |
| 107 | has totals by month and links to each months HTML document. |
| 108 | |
| 109 | o A new history file is saved to disk, which includes totals generated |
| 110 | by The Webalizer during the current run. |
| 111 | |
| 112 | o If incremental processing was specified, a data file is written that |
| 113 | contains the 'internal state' data at the end of this run. |
| 114 | |
| 115 | |
| 116 | Incremental Processing |
| 117 | ---------------------- |
| 118 | |
| 119 | Version 1.2x of The Webalizer adds incremental run capability. Simply |
| 120 | put, this allows processing large log files by breaking them up into |
| 121 | smaller pieces, and processing these pieces instead. What this means |
| 122 | in real terms is that you can now rotate your log files as often as you |
| 123 | want, and still be able to produce monthly usage statistics without the |
| 124 | loss of any detail. This is accomplished by saving and restoring all |
| 125 | relevant internal data to a disk file between runs. Doing so allows the |
| 126 | program to 'start where it left off' so to speak, and allows the |
| 127 | preservation of detail from one run to the next. |
| 128 | |
| 129 | Some special precautions need to be taken when using the incremental |
| 130 | run capability of The Webalizer. Configuration options should not be |
| 131 | changed between runs, as that could cause corruption of the internal |
| 132 | stored data. For example, changing the MangleAgents level will cause |
| 133 | different representations of user agents to be stored, producing invalid |
| 134 | results in the user agents section of the report. If you need to change |
| 135 | configuration options, do it at the end of the month after normal |
| 136 | processing of the previous month and before processing the current month. |
| 137 | You may also want to delete the 'webalizer.current' file as well (or |
| 138 | whatever name was specified using the "IncrementalName" configuration |
| 139 | option). |
| 140 | |
| 141 | The Webalizer also attempts to prevent data duplication by keeping |
| 142 | track of the timestamp of the last record processed. This timestamp |
| 143 | is then compared to current records being processed, and any records |
| 144 | that were logged previous to that timestamp are ignored. This, in |
| 145 | theory, should allow you to re-process logs that have already been |
| 146 | processed, or process logs that contain a mix of processed/not yet |
| 147 | processed records, and not produce duplication of statistics. The |
| 148 | only time this may break is if you have duplicate timestamps in two |
| 149 | separate log files... any records in the second log file that do have |
| 150 | the same timestamp as the last record in the previous log file processed, |
| 151 | will be discarded as if they had already been processed. There are |
| 152 | lots of ways to prevent this however, for example, stopping the web |
| 153 | server before rotating logs will prevent this situation. This setup |
| 154 | also necessitates that you always process logs in chronological order, |
| 155 | otherwise data loss will occur as a result of the timestamp compare. |
| 156 | |
| 157 | |
| 158 | Output Produced |
| 159 | --------------- |
| 160 | |
| 161 | The Webalizer produces several reports (html) and graphics for each |
| 162 | month processed. In addition, a summary page is generated for the |
| 163 | current and previous months (up to 12), a history file is created |
| 164 | and if incremental mode is used, the current month's processed data. |
| 165 | The exact location and names of these files can be changed using |
| 166 | configuration files and command line options. The files produced, |
| 167 | (default names) are: |
| 168 | |
| 169 | index.html - Main summary page (extension may be changed) |
| 170 | usage.png - Yearly graph displayed on the main index page |
| 171 | usage_YYYYMM.html - Monthly summary page (extension may be changed) |
| 172 | usage_YYYYMM.png - Monthly usage graph for specified month/year |
| 173 | daily_usage_YYYYMM.png - Daily usage graph for specified month/year |
| 174 | hourly_usage_YYYYMM.png - Hourly usage graph for specified month/year |
| 175 | site_YYYYMM.html - All sites listing (if enabled) |
| 176 | url_YYYYMM.html - All urls listing (if enabled) |
| 177 | ref_YYYYMM.html - All referrers listing (if enabled) |
| 178 | agent_YYYYMM.html - All user agents listing (if enabled) |
| 179 | search_YYYYMM.html - All search strings listing (if enabled) |
| 180 | webalizer.hist - Previous month history (may be changed) |
| 181 | webalizer.current - Incremental Data (may be changed) |
| 182 | site_YYYYMM.tab - tab delimited sites file |
| 183 | url_YYYYMM.tab - tab delimited urls file |
| 184 | ref_YYYYMM.tab - tab delimited referrers file |
| 185 | agent_YYYYMM.tab - tab delimited user agents file |
| 186 | user_YYYYMM.tab - tab delimited usernames file |
| 187 | search_YYYYMM.tab - tab delimited search string file |
| 188 | |
| 189 | The yearly (index) report shows statistics for a 12 month period, and |
| 190 | links to each month. The monthly report has detailed statistics for |
| 191 | that month with additional links to any URLs and referrers found. |
| 192 | The various totals shown are explained below. |
| 193 | |
| 194 | Hits |
| 195 | |
| 196 | Any request made to the server which is logged, is considered a 'hit'. |
| 197 | The requests can be for anything... html pages, graphic images, audio |
| 198 | files, CGI scripts, etc... Each valid line in the server log is |
| 199 | counted as a hit. This number represents the total number of requests |
| 200 | that were made to the server during the specified report period. |
| 201 | |
| 202 | Files |
| 203 | |
| 204 | Some requests made to the server, require that the server then send |
| 205 | something back to the requesting client, such as a html page or graphic |
| 206 | image. When this happens, it is considered a 'file' and the files |
| 207 | total is incremented. The relationship between 'hits' and 'files' can |
| 208 | be thought of as 'incoming requests' and 'outgoing responses'. |
| 209 | |
| 210 | Pages |
| 211 | |
| 212 | Pages are, well, pages! Generally, any HTML document, or anything |
| 213 | that generates an HTML document, would be considered a page. This |
| 214 | does not include the other stuff that goes into a document, such as |
| 215 | graphic images, audio clips, etc... This number represents the number |
| 216 | of 'pages' requested only, and does not include the other 'stuff' that |
| 217 | is in the page. What actually constitutes a 'page' can vary from |
| 218 | server to server. The default action is to treat anything with the |
| 219 | extension '.htm', '.html' or '.cgi' as a page. A lot of sites will |
| 220 | probably define other extensions, such as '.phtml', '.php3' and '.pl' |
| 221 | as pages as well. Some people consider this number as the number of |
| 222 | 'pure' hits... I'm not sure if I totally agree with that viewpoint. |
| 223 | Some other programs (and people :) refer to this as 'Pageviews'. |
| 224 | |
| 225 | Sites |
| 226 | |
| 227 | Each request made to the server comes from a unique 'site', which can |
| 228 | be referenced by a name or ultimately, an IP address. The 'sites' |
| 229 | number shows how many unique IP addresses made requests to the server |
| 230 | during the reporting time period. This DOES NOT mean the number of |
| 231 | unique individual users (real people) that visited, which is impossible |
| 232 | to determine using just logs and the HTTP protocol (however, this |
| 233 | number might be about as close as you will get). |
| 234 | |
| 235 | Visits |
| 236 | |
| 237 | Whenever a request is made to the server from a given IP address |
| 238 | (site), the amount of time since a previous request by the address |
| 239 | is calculated (if any). If the time difference is greater than a |
| 240 | pre-configured 'visit timeout' value (or has never made a request before), |
| 241 | it is considered a 'new visit', and this total is incremented (both |
| 242 | for the site, and the IP address). The default timeout value is 30 |
| 243 | minutes (can be changed), so if a user visits your site at 1:00 in |
| 244 | the afternoon, and then returns at 3:00, two visits would be registered. |
| 245 | Note: in the 'Top Sites' table, the visits total should be discounted |
| 246 | on 'Grouped' records, and thought of as the "Minimum number of visits" |
| 247 | that came from that grouping instead. Note: Visits only occur on |
| 248 | PageType requests, that is, for any request whose URL is one of the |
| 249 | 'page' types defined with the PageType and PagePrefix option, and not |
| 250 | excluded by the OmitPage option. Due to the limitation of the HTTP |
| 251 | protocol, log rotations and other factors, this number should not be |
| 252 | taken as absolutely accurate, rather, it should be considered a pretty |
| 253 | close "guess". |
| 254 | |
| 255 | KBytes |
| 256 | |
| 257 | The KBytes (kilobytes) value shows the amount of data, in KB, that |
| 258 | was sent out by the server during the specified reporting period. This |
| 259 | value is generated directly from the log file, so it is up to the |
| 260 | web server to produce accurate numbers in the logs (some web servers |
| 261 | do stupid things when it comes to reporting the number of bytes). In |
| 262 | general, this should be a fairly accurate representation of the amount |
| 263 | of outgoing traffic the server had, regardless of the web servers |
| 264 | reporting quirks. |
| 265 | |
| 266 | Note: A kilobyte is 1024 bytes, not 1000 :) |
| 267 | |
| 268 | Top Entry and Exit Pages |
| 269 | |
| 270 | The Top Entry and Exit tables give a rough estimate of what URLs |
| 271 | are used to enter your site, and what the last pages viewed are. |
| 272 | Because of limitations in the HTTP protocol, log rotations, etc... |
| 273 | this number should be considered a good "rough guess" of the actual |
| 274 | numbers, however will give a good indication of the overall trend in |
| 275 | where users come into, and exit, your site. |
| 276 | |
| 277 | |
| 278 | Command Line Options |
| 279 | -------------------- |
| 280 | |
| 281 | The Webalizer supports many different configuration options that will |
| 282 | alter the way the program behaves and generates output. Most of these |
| 283 | can be specified on the command line, while some can only be specified |
| 284 | in a configuration file. The command line options are listed below, |
| 285 | with references to the corresponding configuration file keywords. |
| 286 | |
| 287 | -------------------------------------------------------------------------- |
| 288 | |
| 289 | General Options |
| 290 | --------------- |
| 291 | |
| 292 | -h Display all available command line options and exit program. |
| 293 | |
| 294 | -v Be Verbose. This will cause the program to print additional |
| 295 | information at run time. It is the same as specifying |
| 296 | "Quiet no", "ReallyQuiet no" and "Debug yes" config options. |
| 297 | |
| 298 | -V Display the program version and exit. Additional program |
| 299 | specific information will be displayed if 'verbose' mode is |
| 300 | also used (e.g. '-vV'), which can be useful when submitting |
| 301 | bug reports. |
| 302 | |
| 303 | -d Display additional 'debugging' information for errors and |
| 304 | warnings produced during processing. This normally would |
| 305 | not be used except to determine why you are getting all those |
| 306 | errors and wanted to see the actual data. Normally The |
| 307 | Webalizer will just tell you it found an error, not the |
| 308 | actual data. This option will display the data as well. |
| 309 | Config file keyword: Debug |
| 310 | |
| 311 | -F Specify the log file type to process. Normally, the |
| 312 | Webalizer expects to find a valid CLF or Combined format |
| 313 | we server log file. This option allows you to process |
| 314 | wu-ftpd xferlogs, squid and W3C formatted web logs as well. |
| 315 | Values can be either 'clf', 'ftp', 'squid' or 'w3c' with |
| 316 | 'clf' being the default. Only the first character needs |
| 317 | to be specified (eg: -Fs will process a squid log). |
| 318 | Config file keyword: LogType |
| 319 | |
| 320 | -f Fold out of sequence log records back into analysis, by |
| 321 | treating them as if they were the same date/time as the |
| 322 | last good record. Normally, out of sequence log records |
| 323 | are ignored. If you run apache, don't worry about this. |
| 324 | Config file keyword: FoldSeqErr |
| 325 | |
| 326 | -i Ignore history file. USE WITH CAUTION. This causes The |
| 327 | Webalizer to ignore any existing history file produced from |
| 328 | previous runs and generate its output from scratch. The |
| 329 | effect will be as if The Webalizer is being run for the |
| 330 | first time and any previous statistics will be lost (although |
| 331 | the HTML documents, if any, will not be deleted) on the main |
| 332 | index.html (yearly) web page. |
| 333 | Config file keyword: IgnoreHist |
| 334 | |
| 335 | -b Ignore incremental data file. USE WITH CAUTION. This causes |
| 336 | The Webalizer to ignore any existing incremental (state) data |
| 337 | file produced by previous runs. By ignoring the incremental |
| 338 | data file, all previous processing for the current month will |
| 339 | be lost, and those logs must be re-processed. |
| 340 | Config file keyword: IgnoreState |
| 341 | |
| 342 | -p Preserve state (incremental processing). This allows the |
| 343 | processing of partial logs in increments. At the end of |
| 344 | the program, all relevant internal data is saved, so that |
| 345 | it may be restored the next time the program is run. This |
| 346 | allows sites that must rotate their logs more than once a |
| 347 | month to still be able to use The Webalizer, and not worry |
| 348 | about having to gather and feed an entire months logs to |
| 349 | the program at the end of the month. See the section on |
| 350 | "Incremental Processing" below for additional information. |
| 351 | The default is to not perform incremental processing. Use |
| 352 | this command line option to enable the feature. |
| 353 | Config file keyword: Incremental |
| 354 | |
| 355 | -q Quiet mode. Normally, The Webalizer will produce various |
| 356 | messages while it runs letting you know what its doing. |
| 357 | This option will suppress those messages. It should be |
| 358 | noted that this WILL NOT suppress errors and warnings, which |
| 359 | are output to STDERR. |
| 360 | Config file keyword: Quiet |
| 361 | |
| 362 | -Q ReallyQuiet mode. This allows suppression of _all_ messages |
| 363 | generated by The Webalizer, including warnings and errors. |
| 364 | Useful when The Webalizer is run as a cron job. |
| 365 | Config file keyword: ReallyQuiet |
| 366 | |
| 367 | -T Display timing information. The Webalizer keeps track of the |
| 368 | time it begins and ends processing, and normally displays the |
| 369 | total processing time at the end of each run. If quiet mode |
| 370 | (-q or 'Quiet yes' in configuration file) is specified, this |
| 371 | information is not displayed. This option forces the display |
| 372 | of timing totals if quiet mode has been specified, otherwise |
| 373 | it is redundant and will have no effect. |
| 374 | Config file keyword: TimeMe |
| 375 | |
| 376 | -c file This option specifies a configuration file to use. Configuration |
| 377 | files allow greater control over how The Webalizer behaves, and |
| 378 | there are several ways to use them. As of version 0.98, The |
| 379 | Webalizer searches for a default configuration file in the |
| 380 | current directory named "webalizer.conf", and if not found, |
| 381 | will search in the /etc/ directory for a file of the same name. |
| 382 | In addition, you may specify a configuration file to use with |
| 383 | this command line option. |
| 384 | |
| 385 | -n name This option specifies the hostname for the reports generated. |
| 386 | The hostname is used in the title of all reports, and is also |
| 387 | prepended to URLs in the reports. This allows The Webalizer |
| 388 | to be run on log files for 'virtual' web servers or web servers |
| 389 | that are different than the machine the reports are located on, |
| 390 | and still allows clicking on the URLs to go to the proper |
| 391 | location. If a hostname is not specified, either on the |
| 392 | command line or in a configuration file, The Webalizer attempts |
| 393 | to determine the hostname using a 'uname' system call. If this |
| 394 | fails, "localhost" will be used as the hostname. |
| 395 | Config file keyword: HostName |
| 396 | |
| 397 | -o dir This options specifies the output directory for the reports. |
| 398 | If not specified here or in a configuration file, the current |
| 399 | default directory will be used for output. |
| 400 | Config file keyword: OutputDir |
| 401 | |
| 402 | -x name This option allows the generated pages to have an extension |
| 403 | other than '.html', which is the default. Do not include the |
| 404 | leading period ('.') when you specify the extension. |
| 405 | Config file keyword: HTMLExtension |
| 406 | |
| 407 | -P name Specify the file extensions for 'pages'. Pages (sometimes |
| 408 | called 'PageViews') are normally html documents and CGI |
| 409 | scripts that display the whole page, not just parts of it. |
| 410 | Some system will need to define a few more, such as 'phtml', |
| 411 | 'php3' or 'pl' in order to have them counted as well. The |
| 412 | default is 'htm*' and 'cgi' for web logs and 'txt' for ftp. |
| 413 | Config file keyword: PageType |
| 414 | |
| 415 | -O name Specify URLs which are not counted as 'pages'. Requests |
| 416 | matching one of these URLs will not be counted as a page, even |
| 417 | if they have an extension matching one of the PageTypes defined |
| 418 | above or have no extension at all. |
| 419 | Config file keyword: OmitPage |
| 420 | |
| 421 | -t name This option specifies the title string for all reports. This |
| 422 | string is used, in conjunction with the hostname (if not blank) |
| 423 | to produce the actual title. If not specified, the default of |
| 424 | "Usage Statistics for" will be used. |
| 425 | Config file keyword: ReportTitle |
| 426 | |
| 427 | -Y Suppress Country graph. Normally, The Webalizer produces |
| 428 | country statistics in both Graph and Columnar forms. This |
| 429 | option will suppress the Country Graph from being generated. |
| 430 | Config file keyword: CountryGraph |
| 431 | |
| 432 | -G Suppress hourly graph. Normally, The Webalizer produces |
| 433 | hourly statistics in both Graph and Columnar forms. This |
| 434 | option will suppress the Hourly Graph only from being generated. |
| 435 | Config file keyword: HourlyGraph |
| 436 | |
| 437 | -H Suppress Hourly statistics. Normally, The Webalizer produces |
| 438 | hourly statistics in both Graph and Columnar forms. This |
| 439 | option will suppress the Hourly Statistics table only from |
| 440 | being generated. |
| 441 | Config file keyword: HourlyStats |
| 442 | |
| 443 | -K num Specify how many months should be displayed in the main index |
| 444 | (yearly summary) table. Default is 12 months. Can be set to |
| 445 | anything between 12 and 120 months (1 to 10 years). |
| 446 | Config file keyword: IndexMonths |
| 447 | |
| 448 | -k num Specify how many months should be displayed in the main index |
| 449 | (yearly summary) graph. Default is 12 months. Can be set to |
| 450 | anything between 12 and 72 months (1 to 6 years). |
| 451 | Config file keyword: GraphMonths |
| 452 | |
| 453 | -L Disable Graph Legends. The color coded legends displayed on |
| 454 | the in-line graphs can be disabled with this option. The |
| 455 | default is to display the legends. |
| 456 | Config file keyword: GraphLegend |
| 457 | |
| 458 | -l num Graph Lines. Specify the number of background reference |
| 459 | lines displayed on the in-line graphics produced. The default |
| 460 | is 2 lines, however can range anywhere from zero ('0') for |
| 461 | no lines, up to 20 lines (looks funny!). |
| 462 | Config file keyword: GraphLines |
| 463 | |
| 464 | -P name Page type. This is the extension of files you consider to |
| 465 | be pages for Pages calculations (sometimes called 'pageviews'). |
| 466 | The default is 'htm*' and 'cgi' (plus whatever HTMLExtension |
| 467 | you specified if it is different). Don't use a period! |
| 468 | |
| 469 | -m num Specify a 'visit timeout'. Visits are calculated by looking at |
| 470 | the time difference between the current and last request made |
| 471 | by a specific host. If the difference is greater that the |
| 472 | visit timeout value, the request is considered a new visit. |
| 473 | This value is specified in number of seconds. The default |
| 474 | is 30 minutes (1800). |
| 475 | Config file keyword: VisitTimeout |
| 476 | |
| 477 | -M num Mangle user agent names. Normally, The Webalizer will keep |
| 478 | track of the user agent field verbatim. Unfortunately, there are |
| 479 | a ton of different names that user agents go by, and the field |
| 480 | also reports other items such as machine type and OS used. For |
| 481 | Example, Netscape 4.03 running on Windows 95 will report a |
| 482 | different string than Netscape 4.03 running on Windows NT, so even |
| 483 | though they are the same browser type, they will be considered |
| 484 | as two totally different browsers by The Webalizer. For that |
| 485 | matter, Netscape 4.0 running on Windows NT will report different |
| 486 | names if one is run on an Alpha and the other on an Intel |
| 487 | processor! Internet Exploder is even worse, as it reports itself |
| 488 | as if it were Netscape and you have to search the given string a |
| 489 | little deeper to discover that it is really MSIE! In order to |
| 490 | consolidate generic browser types, this option will cause The |
| 491 | Webalizer to 'mangle' the user agent field, attempting to |
| 492 | consolidate generic browser types. There are 6 levels that can be |
| 493 | specified, each producing different levels of detail. Level 5 |
| 494 | displays only the browser name (MSIE or Mozilla) and the major |
| 495 | version number. Level 4 will also display the minor version |
| 496 | number (single decimal place). Level 3 will display the minor |
| 497 | version number to two decimal places. Level 2 will add any |
| 498 | sub-level designation (such as Mozilla/3.01Gold or MSIE 3.0b). |
| 499 | Level 1 will also attempt to add the system type. The default |
| 500 | Level 0 will disable name mangling and leave the user agent |
| 501 | field unmodified, producing the greatest amount of detail. |
| 502 | Configuration file keyword: MangleAgents |
| 503 | |
| 504 | -g num This option allows you to specify the level of domains name |
| 505 | grouping to be performed. The numeric value represents the |
| 506 | level of grouping, and can be thought of as the 'number of |
| 507 | dots' to be displayed. The default value of 0 disables any |
| 508 | domain name grouping. |
| 509 | Configuration file keyword: GroupDomains |
| 510 | |
| 511 | -D name This allows the specification of a DNS Cache file name. This |
| 512 | filename MUST be specified if you have dns lookups enabled |
| 513 | (using the -N command line switch or DNSChildren configuration |
| 514 | keyword). The filename is relative to the default output |
| 515 | directory if an absolute path is not specified (ie: starts |
| 516 | with a leading '/'). This option is only available if DNS |
| 517 | support was enabled at compile time, otherwise an 'Invalid |
| 518 | Keyword' error will be generated. See the DNS.README file |
| 519 | for additional information regarding DNS lookups. |
| 520 | Configuration file keyword: DNSCache |
| 521 | |
| 522 | -N num Number of DNS child processes to use for reverse DNS lookups. |
| 523 | If specified, a DNSCache name MUST be specified also. If you |
| 524 | do not wish a DNS cache file to be generated, specify a value |
| 525 | of zero ('0') to disable it. This does not prevent using an |
| 526 | existing cache file, only the generation of one at run time. |
| 527 | See the DNS.README file for additional information. |
| 528 | Configuration file keyword: DNSChildren |
| 529 | |
| 530 | -j Enable native GeoDB geolocation services. |
| 531 | Configuration file keyword: GeoDB |
| 532 | |
| 533 | -J name Specify an alternate GeoDB database filename to use. This |
| 534 | shouldn't normally be needed. If used, the filename 'name' |
| 535 | is relative to the output directory being used unless an |
| 536 | absolute path is specified (ie: starts with a leading '/'). |
| 537 | Configuration file keyword: GeoDBDatabase |
| 538 | |
| 539 | -w Enable GeoIP support if it is available. |
| 540 | Configuration file keyword: GeoIP |
| 541 | |
| 542 | -W name Specify an alternate GeoIP database filename to use. This |
| 543 | shouldn't normally be needed. If used, the filename 'name' |
| 544 | is relative to the specified output directory unless an |
| 545 | absolute name is given (ie: starts with a leading '/'). |
| 546 | Configuration file keyword: GeoIPDatabase |
| 547 | |
| 548 | -z name Specify location of the country flag graphics and enable |
| 549 | their display in the top country table. The directory name |
| 550 | is relative to the output directory unless an absolute path |
| 551 | is specified (ie: starts with a leading '/'). |
| 552 | Configuration file keyword: FlagDir |
| 553 | |
| 554 | Hide Options |
| 555 | ------------ |
| 556 | |
| 557 | The following options take a string argument to use as a comparison |
| 558 | for matching. Except for the IndexAlias option, the string argument |
| 559 | can be plain text, or plain text that either starts or ends with the |
| 560 | wildcard character '*'. |
| 561 | |
| 562 | For Example: |
| 563 | |
| 564 | Given the string "yourmama/was/here", the arguments "was", "*here" and |
| 565 | "your*" will all produce a match. |
| 566 | |
| 567 | |
| 568 | -a name This option allows hiding of user agents (browsers) from the |
| 569 | "Top User Agents" table in the report. This option really |
| 570 | isn't too useful as there are a zillion different names that |
| 571 | current browsers go by, depending where they were obtained, |
| 572 | however you might have some particular user agents that hit |
| 573 | your site a lot that you would like to exclude from the list. |
| 574 | You must have a web server that includes user agents in its |
| 575 | log files for this option to be of any use. In addition, it |
| 576 | is also useless if you disable the user agent table in the |
| 577 | report (see the -A command line option or "TopAgents" |
| 578 | configuration file keyword). You can specify as many of these |
| 579 | as you want on the command line. The wildcard character '*' |
| 580 | can be used either in front of or at the end of the string. |
| 581 | (ie: Mozilla/4.0* would match anything that starts with the |
| 582 | string "Mozilla/4.0"). |
| 583 | Config file keyword: HideAgent |
| 584 | |
| 585 | -r name This option allows hiding of referrers from the "Top Referrer" |
| 586 | table in the report. Referrers are URLs, either on your own |
| 587 | local site or a remote site, that referred the user to a URL |
| 588 | on your web server. This option is normally used to hide |
| 589 | your own server from the table, as your own pages are usually |
| 590 | the top referrers to your own pages (well, you get the idea). |
| 591 | You must have a web server that includes referrer information |
| 592 | in the log files for this option to be of any use. In addition, |
| 593 | it is also useless if you disable the referrers table in the |
| 594 | report (see the -R command line option or "TopReferrers" |
| 595 | configuration file keyword). You can specify as many of these |
| 596 | as you like on the command line. |
| 597 | Config file keyword: HideReferrer |
| 598 | |
| 599 | -s name This option allows hiding of sites from the "Top Sites" table |
| 600 | in the report. Normally, you will only want to hide your own |
| 601 | domain name from the report, as it usually is one of the top |
| 602 | sites to visit your web server. This option is of no use if |
| 603 | you disable the top sites table in the report (see the -S |
| 604 | command line option or "TopSites" configuration file option). |
| 605 | Config file keyword: HideSite |
| 606 | |
| 607 | -X This causes all individual sites to be hidden, which results |
| 608 | in only grouped sites to be displayed on the report. |
| 609 | Config file keyword: HideAllSites |
| 610 | |
| 611 | -u name This option allows hiding of URLs from the "Top URLs" table |
| 612 | in the report. Normally, this option is used to hide images, |
| 613 | audio files and other objects your web server dishes out that |
| 614 | would otherwise clutter up the table. This option is of no |
| 615 | use if you disable the top URLs table in the report (see the |
| 616 | -U command line option or "TopURLs" configuration file keyword). |
| 617 | Config file keyword: HideURL |
| 618 | |
| 619 | -I name This option allows you to specify additional index.html aliases. |
| 620 | The Webalizer usually strips the string 'index.*' from URLs |
| 621 | before processing (unless disabled using the 'DefaultIndex' |
| 622 | config option), which has the effect of turning a URL such |
| 623 | as /somedir/index.html into just /somedir/ which is really the |
| 624 | same URL and should be treated as such. This option allows you |
| 625 | to specify _additional_ strings that are to be treated the same |
| 626 | way. Use with care, improper use could cause unexpected results. |
| 627 | For example, if you specify the alias string of 'home', a URL |
| 628 | such as /somedir/homepages/brad/home.html would be converted |
| 629 | into just /somedir/ which probably isn't what was intended. |
| 630 | This option is useful if your web server uses a different default |
| 631 | index page other than the standard 'index.html' or 'index.htm', |
| 632 | such as 'home.html' or 'homepage.html'. The string specified |
| 633 | is searched for _anywhere_ in the URL, so "home.htm" would |
| 634 | turn both "/somedir/home.htm" and "/somedir/home.html" into |
| 635 | just "/somedir/". Wildcards are _not_ allowed on this one. |
| 636 | Config file keyword: IndexAlias |
| 637 | |
| 638 | Table Size Options |
| 639 | ------------------ |
| 640 | |
| 641 | -e num This option specifies the number of entries to display in the |
| 642 | "Top Entry Pages" table. To disable the table, use a value of |
| 643 | zero (0). |
| 644 | Config file keyword: TopEntry |
| 645 | |
| 646 | -E num This option specifies the number of entries to display in the |
| 647 | "Top Exit Pages" table. To disable the table, use a value of |
| 648 | zero (0). |
| 649 | Config file keyword: TopExit |
| 650 | |
| 651 | -A num This option specifies the number of entries to display in the |
| 652 | "Top User Agents" table. To disable the table, use a value of |
| 653 | zero (0). |
| 654 | Config file keyword: TopAgents |
| 655 | |
| 656 | -C num This option specifies the number of entries to display in the |
| 657 | "Top Countries" table. To disable the table, use a value of |
| 658 | zero (0). |
| 659 | Config file keyword: TopCountries |
| 660 | |
| 661 | -R num This option specifies the number of entries to display in the |
| 662 | "Top Referrers" table. To disable the table, use a value of |
| 663 | zero (0). |
| 664 | Config file keyword: TopReferrers |
| 665 | |
| 666 | -S num This option specifies the number of entries to display in the |
| 667 | "Top Sites" table. To disable the table, use a value of |
| 668 | zero (0). |
| 669 | Config file keyword: TopSites |
| 670 | |
| 671 | -U num This option specifies the number of entries to display in the |
| 672 | "Top URLs" table. To disable the table, use a value of |
| 673 | zero (0). |
| 674 | Config file keyword: TopURLs |
| 675 | |
| 676 | -------------------------------------------------------------------------- |
| 677 | |
| 678 | |
| 679 | CONFIGURATION FILES |
| 680 | ------------------- |
| 681 | |
| 682 | The Webalizer allows configuration files to be used in order to simplify |
| 683 | life for all. There are several ways that configuration files are accessed |
| 684 | by the Webalizer. When The Webalizer first executes, it looks for a |
| 685 | default configuration file named "webalizer.conf" in the current directory, |
| 686 | and if not found there, will look for "/etc/webalizer.conf". In addition, |
| 687 | configuration files may be specified on the command line with the '-c' |
| 688 | option. There are lots of different ways you can combine the use of |
| 689 | configuration files and command line options to produce various results. |
| 690 | The Webalizer always looks for and reads configuration options from a |
| 691 | default configuration file before doing anything else. Because of this, |
| 692 | you can override options found in the default file by use of additional |
| 693 | configuration files specified on the command line or command line options |
| 694 | themselves. If you specify a configuration file on the command line, you |
| 695 | can override options in it by additional command line options which follow. |
| 696 | For example, most users will most likely want to create the default file |
| 697 | /etc/webalizer.conf and place options in it to specify the hostname, log |
| 698 | file, table options, etc... At the end of the month when a different log |
| 699 | file is to be used (the end of month log), you can run The Webalizer as |
| 700 | usual, but put the different filename on the end of the command line, which |
| 701 | will override the log file specified in the configuration file. It should |
| 702 | be noted that you cannot override some configuration file options by the |
| 703 | use of command line arguments. For example, if you specify "Quiet yes" in |
| 704 | a configuration file, you cannot override this with a command line argument, |
| 705 | as the command line option only _enables_ the feature (-q option). |
| 706 | |
| 707 | The configuration files are standard ASCII text files that may be created |
| 708 | or edited using any standard editor. Blank lines and lines that begin |
| 709 | with a pound sign ('#') are ignored. Any other lines are considered to |
| 710 | be configuration lines, and have the form "Keyword Value", where the |
| 711 | 'Keyword' is one of the currently available configuration keywords defined |
| 712 | below, and 'Value' is the value to assign to that particular option. Any |
| 713 | text found after the keyword up to the end of the line is considered the |
| 714 | keyword's value, so you should not include anything after the actual value |
| 715 | on the line that is not actually part of the value being assigned. The |
| 716 | file "sample.conf" provided with the distribution contains lots of useful |
| 717 | documentation and examples as well. It should be noted that you do not |
| 718 | have to use any configuration files at all, in which case, default values |
| 719 | will be used (which should be sufficient for most sites). |
| 720 | |
| 721 | -------------------------------------------------------------------------- |
| 722 | |
| 723 | General Configuration Keywords |
| 724 | ------------------------------ |
| 725 | |
| 726 | LogFile This defines the log file to use. It should be a fully qualified |
| 727 | name (ie: contain the path), but relative names will work as |
| 728 | well. If not specified, the logfile defaults to STDIN. |
| 729 | |
| 730 | LogType This specified the log file type being used. Normally, The |
| 731 | Webalizer processes web logs in either CLF or Combined format. |
| 732 | You may also process wu-ftpd xferlog formatted logs, squid |
| 733 | proxy logs or W3C formatted web logs by setting the appropriate |
| 734 | type using this keyword. Values may be either 'clf', 'ftp', |
| 735 | 'squid' or 'w3c'. Ensure that you specify the proper file type, |
| 736 | otherwise you will be presented with a long stream of 'invalid |
| 737 | record' messages when the Webalizer is run ;) |
| 738 | Command line argument: -F |
| 739 | |
| 740 | OutputDir This defines the output directory to use for the reports. If |
| 741 | it is not specified, the current directory is used. |
| 742 | Command line argument: -o |
| 743 | |
| 744 | HistoryName Allows specification of a history path/filename if desired. |
| 745 | The default is to use the file named 'webalizer.hist', kept |
| 746 | in the normal output directory (OutputDir above). Any name |
| 747 | specified is relative to the normal output directory unless |
| 748 | an absolute path name is given (ie: starts with a '/'). |
| 749 | |
| 750 | ReportTitle This specifies the title to use for the generated reports. |
| 751 | It is used in conjunction with the hostname (unless blank) |
| 752 | to produce the final report titles. If not defined, the |
| 753 | default of "Usage Statistics for" is used. |
| 754 | Command line argument: -t |
| 755 | |
| 756 | HostName This defines the hostname. The hostname is used in the |
| 757 | report title as well as being prepended to URLs in the |
| 758 | "Top URLs" table. This allows The Webalizer to be run |
| 759 | on "virtual" web servers, or servers that do not reside |
| 760 | on the local machine, and allows clicking on the URL to |
| 761 | go to the right place. If not specified, The Webalizer |
| 762 | attempts to get the hostname via a 'uname' system call, |
| 763 | and if that fails, will default to "localhost". |
| 764 | Command line argument: -n |
| 765 | |
| 766 | UseHTTPS Causes the links in the 'Top URLs' table to use 'https://' |
| 767 | instead of the default 'http://' prefix. Not much use if |
| 768 | you run a mix of secure/insecure servers on your machine. |
| 769 | Only useful if you run the analysis on a secure servers |
| 770 | logs, and want the links in the table to work properly. |
| 771 | |
| 772 | HTAccess Enables the creation of a default .htaccess file in the |
| 773 | output directory. If enabled, the file will be created |
| 774 | (with a single "DirectoryIndex" directive), unless one |
| 775 | already exists. The default is 'no', which disables the |
| 776 | creation of any .htaccess files. |
| 777 | |
| 778 | Quiet This allows you to enable or disable informational messages |
| 779 | while it is running. The values for this keyword can be |
| 780 | either 'yes' or 'no'. Using "Quiet yes" will suppress these |
| 781 | messages, while "Quiet no" will enable them. The default |
| 782 | is 'no' if not specified, which will allow The Webalizer |
| 783 | to display informational messages. It should be noted that |
| 784 | this option has no effect on Warning or Error messages that |
| 785 | may be generated, as they go to STDERR. |
| 786 | Command line argument: -q |
| 787 | |
| 788 | ReallyQuiet This allows all generated output to be suppressed, including |
| 789 | warning and error messages. The values for this keyword |
| 790 | can be either 'yes' or 'no', with 'no' being the default. |
| 791 | Command line argument: -Q |
| 792 | |
| 793 | TimeMe This allows you to display timing information regardless of |
| 794 | any "quiet mode" specified. Useful only if you did in fact |
| 795 | tell the webalizer to be quiet either by using the -q command |
| 796 | line option or the "Quiet" keyword, otherwise timing stats |
| 797 | are normally displayed anyway. Values may be either 'yes' |
| 798 | or 'no', with the default being 'no'. |
| 799 | Command line argument: -T |
| 800 | |
| 801 | GMTTime This keyword allows timestamps to be displayed in GMT (UTC) |
| 802 | time instead of local time. Normally The Webalizer will |
| 803 | display timestamps in the time-zone of the local machine |
| 804 | (ie: PST or EDT). This keyword allows you to specify the |
| 805 | display of timestamps in GMT (UTC) time instead. Values |
| 806 | may be either 'yes' or 'no'. Default is 'no'. |
| 807 | |
| 808 | Debug This tells The Webalizer to display additional information |
| 809 | when it encounters Warnings or Errors. Normally, The |
| 810 | Webalizer will just tell you it found a bad record or |
| 811 | field. This option will enable the display of the actual |
| 812 | data that produced the Warning or Error as well. Useful |
| 813 | only if you start getting lots of Warnings or Errors and |
| 814 | want to determine the cause. Values may be either 'yes' |
| 815 | or 'no', with the default being 'no'. |
| 816 | Command line argument: -d |
| 817 | |
| 818 | IgnoreHist This suppresses the reading of a history file. USE WITH |
| 819 | EXTREME CAUTION as the history file is how The Webalizer |
| 820 | keeps track of previous months. The effect of this option |
| 821 | is as if The Webalizer was being run for the very first |
| 822 | time, and any previous data is discarded. Values may be |
| 823 | either 'yes' or 'no', with the default being 'no'. |
| 824 | Command line argument: -i |
| 825 | |
| 826 | IgnoreState This suppresses the reading of an existing incremental |
| 827 | data file. USE WITH EXTREME CAUTION! By ignoring an |
| 828 | existing incremental data file, all previous processing |
| 829 | for the current month will be lost, and those logs must |
| 830 | be re-processed. Values may be 'yes' or 'no', with the |
| 831 | default being 'no'. |
| 832 | Command line argument: -b |
| 833 | |
| 834 | FoldSeqErr Allows log records that are out of sequence to be folded |
| 835 | back into the analysis, by treating them as if they had |
| 836 | the same date/time as the last good record. Normally, |
| 837 | out of sequence log records are simply ignored. If you |
| 838 | run apache, don't worry about this. |
| 839 | |
| 840 | VisitTimeout Set the 'visit timeout' value. Visits are determined by |
| 841 | looking at the time difference between the current and last |
| 842 | request made by a specific site. If the difference in time |
| 843 | is greater than the visit timeout value, the request is |
| 844 | considered a new visit. The value is in number of seconds, |
| 845 | and defaults to 30 minutes (1800). |
| 846 | Command line argument: -m |
| 847 | |
| 848 | PageType Allows you to define the 'page' type extension. Normally, |
| 849 | people consider HTML and CGI scripts as 'pages'. This |
| 850 | option allows you to specify what extensions you consider |
| 851 | a page. Default is 'htm*' and 'cgi' for web logs, and |
| 852 | 'txt' for ftp logs. |
| 853 | Command line argument: -P |
| 854 | |
| 855 | PagePrefix Allows all requests with a specified prefix to be considered |
| 856 | as 'pages'. If you want everything under /documents to be |
| 857 | treated as pages no matter what their extension is. Also |
| 858 | useful if you have cgi-scripts with PATH_INFO. |
| 859 | |
| 860 | OmitPage Allows specified URLs to not be counted as pages under any |
| 861 | circumstance, even if they have an extension matching a |
| 862 | PageType or PagePrefix as defined above. |
| 863 | |
| 864 | GraphLegend Enable/disable the display of color coded legends on the |
| 865 | produced graphs. Default is 'yes', to display them. |
| 866 | Command line argument: -L |
| 867 | |
| 868 | GraphLines Specify the number of background reference lines to display |
| 869 | on produced graphs. The default is 2. To disable the use |
| 870 | of background lines, use zero ('0'). |
| 871 | Command line argument: -l |
| 872 | |
| 873 | IndexMonths Specify the number of months to display in the main index |
| 874 | (yearly summary) table. Default is 12 months. Can be set |
| 875 | to anything between 12 and 120 months (1 to 10 years). |
| 876 | Command line argument: -K |
| 877 | |
| 878 | YearHeaders Enable/disable the display of year headers in the main index |
| 879 | (yearly summary) table. If enabled, year headers will be |
| 880 | shown when the table is displaying more than 16 months worth |
| 881 | of data. Values can be 'yes' or 'no'. Default is 'yes'. |
| 882 | |
| 883 | GraphMonths Specify the number of months to display in the main index |
| 884 | (yearly summary) graph. Default is 12 months. Can be set |
| 885 | to anything between 12 and 72 months (1 to 6 years). |
| 886 | Command line argument: -k |
| 887 | |
| 888 | CountryGraph This keyword is used to either enable or disable the creation |
| 889 | and display of the Country Usage graph. Values may be either |
| 890 | 'yes' or 'no', with the default being 'yes'. |
| 891 | Command line argument: -Y |
| 892 | |
| 893 | CountryFlags Enables or disables the display of flags in the top country |
| 894 | table. If enabled, the default directory 'flags' directly |
| 895 | under the output directory will be used unless a different |
| 896 | path is specified with the 'FlagDir' option below. |
| 897 | Command line argument: -zflags |
| 898 | |
| 899 | FlagDir Specifies the location of flag graphics. If not specified, |
| 900 | the default is in the 'flags' directory directly under the |
| 901 | output directory being used for the reports. If specified, |
| 902 | the display of flags will be enabled by default. |
| 903 | Command line argument: -z |
| 904 | |
| 905 | DailyGraph This keyword is used to either enable or disable the creation |
| 906 | and display of the Daily Usage graph. Values may be either |
| 907 | 'yes' or 'no', with the default being 'yes'. |
| 908 | |
| 909 | DailyStats This keyword is used to either enable or disable the creation |
| 910 | and display of the Daily Usage statistics table. Values may |
| 911 | be either 'yes' or 'no', with the default being 'yes'. |
| 912 | |
| 913 | HourlyGraph This keyword is used to either enable or disable the creation |
| 914 | and display of the Hourly Usage graph. Values may be either |
| 915 | 'yes' or 'no', with the default being 'yes'. |
| 916 | Command line argument: -G |
| 917 | |
| 918 | HourlyStats This keyword is used to either enable or disable the creation |
| 919 | and display of the Hourly Usage statistics table. Values may |
| 920 | be either 'yes' or 'no', with the default being 'yes'. |
| 921 | Command line argument: -H |
| 922 | |
| 923 | IndexAlias This allows additional 'index.html' aliases to be defined. |
| 924 | Normally, The Webalizer scans for and strips the string |
| 925 | "index." from URLs before processing them (unless disabled |
| 926 | using the DefaultIndex config option below). This turns a |
| 927 | URL such as /somedir/index.html into just /somedir/ which |
| 928 | is really the same URL. This keyword allows _additional_ |
| 929 | names to be treated in the same fashion for sites that use |
| 930 | different default names, such as "home.html". The string |
| 931 | is scanned for anywhere in the URL, so care should be used |
| 932 | if and when you define additional aliases. For example, |
| 933 | if you were to use an alias such as 'home', the URL |
| 934 | /somedir/homepages/brad/home.html would be turned into just |
| 935 | /somedir/ which probably isn't the intended result. Instead, |
| 936 | you should have specified 'home.htm' which would correctly |
| 937 | turn the URL into /somedir/homepages/brad/ like intended. |
| 938 | It should also be noted that specified aliases are scanned |
| 939 | for in EVERY log record... A bunch of aliases will noticeably |
| 940 | degrade performance as each record has to be scanned for |
| 941 | every alias defined. You don't have to specify 'index.' as |
| 942 | it is always the default (unless disabled with the config |
| 943 | option "DefaultIndex" described below). |
| 944 | Command line argument: -I |
| 945 | |
| 946 | DefaultIndex This option is used to enable/disable the use of "index." as |
| 947 | a default index name to be stripped from the end of a URL. |
| 948 | Most sites should not need to use this option, however some |
| 949 | may find it useful, particularly those whose default index |
| 950 | file name is something different, or those sites that use |
| 951 | 'index.php' or similar URLs to generate dynamic content. |
| 952 | This option does not effect any of the names that may be |
| 953 | defined using the IndexAlias option, and those names will |
| 954 | still function as described. Values may be 'yes' or 'no', |
| 955 | with 'yes' being the default. |
| 956 | |
| 957 | MangleAgents The MangleAgents keyword specifies the level of user agent |
| 958 | name mangling, if any. There are 6 levels that may be specified, |
| 959 | each producing a different level of detail displayed. Level 5 |
| 960 | displays only the browser name (MSIE or Mozilla) and the major |
| 961 | version number. Level 4 adds the minor version (single |
| 962 | decimal place). Level 3 adds the minor version to two decimal |
| 963 | places. Level 2 will also add any sub-level designation |
| 964 | (such as Mozilla/3.01Gold or MSIE 3.0b). Level 1 will also |
| 965 | attempt to add the system type. The default level 0 will |
| 966 | leave the user agent field unmodified and produces the |
| 967 | greatest amount of detail. |
| 968 | Command line argument: -M |
| 969 | |
| 970 | SearchEngine This keyword allows specification of search engines and |
| 971 | their query strings. Search strings are obtained from |
| 972 | the referrer field in the record, and in order to work |
| 973 | properly, the Webalizer needs to know what query strings |
| 974 | different search engines use. The SearchEngine allows |
| 975 | you to specify the search engine and its query string |
| 976 | to parse the search string from. The line is formatted |
| 977 | as: "SearchEngine engine-string query-string" where |
| 978 | 'engine-string' is a substring for matching the search |
| 979 | engine with, such as "yahoo.com" or "altavista". The |
| 980 | 'query-string' is the unique query string that is added |
| 981 | to the URL for the search engine, such as "search=" or |
| 982 | "MT=" with the actual search strings appended to the |
| 983 | end. There is no command line option for this keyword. |
| 984 | |
| 985 | SearchCaseI The SearchCaseI option specifies if search strings should |
| 986 | be lowercased (case insensitive) or not. Since most |
| 987 | search engines use case insensitive searches (ie: a |
| 988 | search for "Hello" is the same as "HELLO" or "hello"), |
| 989 | converting to lowercase will improve keyword accuracy, |
| 990 | which is the default. If desired, case sensitivity can |
| 991 | be forced with this option. The value can be 'yes' or |
| 992 | 'no', with 'yes' (case insensitive) being the default. |
| 993 | |
| 994 | Incremental This allows incremental processing to be enabled or disabled. |
| 995 | Incremental processing allows processing partial logs without |
| 996 | the loss of detail data from previous runs in the same month. |
| 997 | This feature saves the 'internal state' of the program so that |
| 998 | it may be restored in following runs. See the section above |
| 999 | titled "Incremental Processing" for additional information. |
| 1000 | The value may be 'yes' or 'no', with the default being 'no'. |
| 1001 | Command line argument: -p |
| 1002 | |
| 1003 | IncrementalName |
| 1004 | Allows specification of the incremental data filename if |
| 1005 | desired. Normally, the file named "webalizer.current' is |
| 1006 | used, kept in the standard output directory. If specified, |
| 1007 | filenames are relative to the standard output directory, |
| 1008 | unless an absolute name is given (ie: starts with '/'). |
| 1009 | |
| 1010 | StripCGI Determines if CGI variables should be stripped from the |
| 1011 | end of URLs or not. Normally, these variables are removed |
| 1012 | from URLs to improve accuracy, however some sites may wish |
| 1013 | to keep them preserved (particularly on highly dynamic |
| 1014 | sites). Values may be either 'yes' or 'no', with 'yes' |
| 1015 | being the default. |
| 1016 | |
| 1017 | TrimSquidURL Allows squid log URLs to be reduced in granularity by |
| 1018 | truncating them after a specified number of '/' path |
| 1019 | separators after the http:// portion. A value of 1 will |
| 1020 | cause all URLs to be summarized by domain only. The |
| 1021 | default value is zero (0), which leaves URLs unmodified. |
| 1022 | |
| 1023 | DNSCache Specifies the DNS cache filename. This name is relative |
| 1024 | to the default output directory unless an absolute name |
| 1025 | is given (ie: starts with '/'). See the DNS.README file |
| 1026 | for additional information. |
| 1027 | Command line argument: -D |
| 1028 | |
| 1029 | DNSChildren The number of DNS children processes to run in order to |
| 1030 | create/update the DNS cache file. If specified, the DNS |
| 1031 | cache filename must also be specified (see above). Use |
| 1032 | a value of zero ('0') to disable. See the DNS.README |
| 1033 | file for additional information. |
| 1034 | Command line argument: -N |
| 1035 | |
| 1036 | CacheIPs Specifies if unresolved addresses should also be cached |
| 1037 | in the DNS database. If enabled, unresolved IP addresses |
| 1038 | will be stored along with resolved addresses. This may |
| 1039 | be useful on some sites that have lots of unresolved IPs |
| 1040 | visiting so they are not looked up each time the program |
| 1041 | is run. Values may be 'yes' or 'no'. Default is 'no'. |
| 1042 | |
| 1043 | CacheTTL Specifies the Time To Live (TTL) value for cached DNS |
| 1044 | entries in days. Default value is 7 (1 week). Can be |
| 1045 | any value between 1 and 100. |
| 1046 | |
| 1047 | GeoDB Controls the use of the native GeoDB geolocation services |
| 1048 | provided by The Webalizer. Values may be 'yes' or 'no' |
| 1049 | with 'no' being the default. |
| 1050 | Command line argument: -j |
| 1051 | |
| 1052 | GeoDBDatabase Specifies and alternate GeoDB database filename to use. |
| 1053 | This is relative to the output directory being used unless |
| 1054 | an absolute path is given (ie: starts with a '/'). |
| 1055 | Command line argument: -J |
| 1056 | |
| 1057 | GeoIP Controls the use of GeoIP geolocation services. If The |
| 1058 | Webalizer was compiled with GeoIP support, it is used by |
| 1059 | default. Values may be 'yes' or 'no'. Default is 'yes'. |
| 1060 | Command line argument: -w |
| 1061 | |
| 1062 | GeoIPDatabase Specifies an alternate GeoIP database filename to use. |
| 1063 | This name is relative to the default output directory |
| 1064 | unless an absolute name is given (ie: starts with '/'). |
| 1065 | Command line argument: -W |
| 1066 | |
| 1067 | |
| 1068 | Top Table Keywords |
| 1069 | ------------------ |
| 1070 | |
| 1071 | TopAgents This allows you to specify how many "Top" user agents are |
| 1072 | displayed in the "Top User Agents" table. The default |
| 1073 | is 15. If you do not want to display user agent statistics, |
| 1074 | specify a value of zero (0). The display of user agents |
| 1075 | will only work if your web server includes this information |
| 1076 | in its log file (ie: a combined log format file). |
| 1077 | Command line argument: -A |
| 1078 | |
| 1079 | AllAgents Will cause a separate HTML page to be generated for all |
| 1080 | normally visible User Agents. A link will be added to |
| 1081 | the bottom of the "Top User Agents" table if enabled. |
| 1082 | Value can be either 'yes' or 'no', with 'no' being the |
| 1083 | default. |
| 1084 | |
| 1085 | TopCountries This allows you to specify how many "Top" countries are |
| 1086 | displayed in the "Top Countries" table. The default is |
| 1087 | 30. If you want to disable the countries table, specify |
| 1088 | a value of zero (0). |
| 1089 | Command line argument: -C |
| 1090 | |
| 1091 | TopReferrers This allows you to specify how many "Top" referrers are |
| 1092 | displayed in the "Top Referrers" table. The default is |
| 1093 | 30. If you want to disable the referrers table, specify |
| 1094 | a value of zero (0). The display of referrer information |
| 1095 | will only work if your web server includes this information |
| 1096 | in its log file (ie: a combined log format file). |
| 1097 | Command line argument: -R |
| 1098 | |
| 1099 | AllReferrers Will cause a separate HTML page to be generated for all |
| 1100 | normally visible Referrers. A link will be added to the |
| 1101 | "Top Referrers" table if enabled. Value can be either |
| 1102 | 'yes' or 'no', with 'no' being the default. |
| 1103 | |
| 1104 | TopSites This allows you to specify how many "Top" sites are |
| 1105 | displayed in the "Top Sites" table. The default is 30. |
| 1106 | If you want to disable the sites table, specify a value |
| 1107 | of zero (0). |
| 1108 | Command line argument: -S |
| 1109 | |
| 1110 | TopKSites Identical to TopSites, except for the 'by KByte' table. |
| 1111 | Default is 10. No command line switch for this one. |
| 1112 | |
| 1113 | AllSites Will cause a separate HTML page to be generated for all |
| 1114 | normally visible Sites. A link will be added to the |
| 1115 | bottom of the "Top Sites" table if enabled. Value can |
| 1116 | be either 'yes' or 'no', with 'no' being the default. |
| 1117 | |
| 1118 | TopURLs This allows you to specify how many "Top" URLs are |
| 1119 | displayed in the "Top URLs" table. The default is 30. |
| 1120 | If you want to disable the URLs table, specify a value |
| 1121 | of zero (0). |
| 1122 | Command line argument: -U |
| 1123 | |
| 1124 | TopKURLs Identical to TopURLs, except for the 'by KByte' table. |
| 1125 | Default is 10. No command line switch for this one. |
| 1126 | |
| 1127 | AllURLs Will cause a separate HTML page to be generated for all |
| 1128 | normally visible URLs. A link will be added to the |
| 1129 | bottom of the "Top URLs" table if enabled. Value can |
| 1130 | be either 'yes' or 'no', with 'no' being the default. |
| 1131 | |
| 1132 | TopEntry Allows you to specify how many "Top Entry Pages" are |
| 1133 | displayed in the table. The default is 10. If you |
| 1134 | want to disable the table, specify a value of zero (0). |
| 1135 | Command line argument: -e |
| 1136 | |
| 1137 | TopExit Allows you to specify how many "Top Exit Pages" are |
| 1138 | displayed in the table. The default is 10. If you |
| 1139 | want to disable the table, specify a value of zero (0). |
| 1140 | Command line argument: -E |
| 1141 | |
| 1142 | TopSearch Allows you to specify how many "Top Search Strings" are |
| 1143 | displayed in the table. The default is 20. If you |
| 1144 | want to disable the table, specify a value of zero (0). |
| 1145 | Only works if using combined log format (ie: contains |
| 1146 | referrer information). |
| 1147 | |
| 1148 | TopUsers This allows you to specify how many "Top" usernames are |
| 1149 | displayed in the "Top Usernames" table. Usernames are |
| 1150 | only available if you use http authentication on your |
| 1151 | web server, or when processing wu-ftpd xferlogs. The |
| 1152 | default value is 20. If you want to disable the Username |
| 1153 | table, specify a value of zero (0). |
| 1154 | |
| 1155 | AllUsers Will cause a separate HTML page to be generated for all |
| 1156 | normally visible usernames. A link will be added to the |
| 1157 | bottom of the "Top Usernames" table if enabled. Value |
| 1158 | can be either 'yes' or 'no', with 'no' being the default. |
| 1159 | |
| 1160 | AllSearchStr Will create a separate HTML page to be generated for all |
| 1161 | normally visible Search Strings. A link will be added |
| 1162 | to the bottom of the "Top Search Strings" table if |
| 1163 | enabled. Value can be either 'yes' or 'no', with 'no' |
| 1164 | being the default. |
| 1165 | |
| 1166 | |
| 1167 | Hide Object Keywords |
| 1168 | -------------------- |
| 1169 | |
| 1170 | These keywords allow you to hide user agents, referrers, sites, URLs |
| 1171 | and usernames from the various "Top" tables. The value for these keywords |
| 1172 | are the same as those used in their command line counterparts. You |
| 1173 | can specify as many of these as you want without limit. Refer to the |
| 1174 | section above on "Command Line Options" for a description of the string |
| 1175 | formatting used as the value. Values cannot exceed 80 characters in |
| 1176 | length. |
| 1177 | |
| 1178 | HideAgent This allows specified user agents to be hidden from the |
| 1179 | "Top User Agents" table. Not very useful, since there |
| 1180 | a zillion different names by which browsers go by today, |
| 1181 | but could be useful if there is a particular user agent |
| 1182 | (ie: robots, spiders, real-audio, etc..) that hits your |
| 1183 | site frequently enough to make it into the top user agent |
| 1184 | listing. This keyword is useless if 1) your log file does |
| 1185 | not provide user agent information or 2) you disable the |
| 1186 | user agent table. |
| 1187 | Command line argument: -a |
| 1188 | |
| 1189 | HideReferrer This allows you to hide specified referrers from the |
| 1190 | "Top Referrers" table. Normally, you would only specify |
| 1191 | your own web server to be hidden, as it is usually the |
| 1192 | top generator of references to your own pages. Of course, |
| 1193 | this keyword is useless if 1) your log file does not include |
| 1194 | referrer information or 2) you disable the top referrers |
| 1195 | table. |
| 1196 | Command line argument: -r |
| 1197 | |
| 1198 | HideSite This allows you to hide specified sites from the "Top |
| 1199 | Sites" table. Normally, you would only specify your own |
| 1200 | web server or other local machines to be hidden, as they |
| 1201 | are usually the highest hitters of your web site, especially |
| 1202 | if you have their browsers home page pointing to it. |
| 1203 | Command line argument: -s |
| 1204 | |
| 1205 | HideAllSites This allows hiding all individual sites from the display, |
| 1206 | which can be useful when a lot of groupings are being |
| 1207 | used (since grouped records cannot be hidden). It is |
| 1208 | particularly useful in conjunction with the GroupDomain |
| 1209 | feature, however can be useful in other situations as well. |
| 1210 | Value can be either 'yes' or 'no', with 'no' the default. |
| 1211 | Command line argument: -X |
| 1212 | |
| 1213 | HideURL This allows you to hide URLs from the "Top URLs" table. |
| 1214 | Normally, this is used to hide items such as graphic files, |
| 1215 | audio files or other 'non-html' files that are transferred |
| 1216 | to the visiting user. |
| 1217 | Command line argument: -u |
| 1218 | |
| 1219 | HideUser This allows you to hide Usernames from the "Top Usernames" |
| 1220 | table. Usernames are only available if you use http based |
| 1221 | authentication on your web server. |
| 1222 | |
| 1223 | |
| 1224 | Group Object Keywords |
| 1225 | --------------------- |
| 1226 | |
| 1227 | The Group* keywords allow object grouping based on Site, URL, Referrer, |
| 1228 | User Agent and Usernames. Combined with the Hide* keywords, you can |
| 1229 | customize exactly what will be displayed in the 'Top' tables. For example, |
| 1230 | to only display totals for a particular directory, use a GroupURL and |
| 1231 | HideURL with the same value (ie: '/help/*'). Group processing is only |
| 1232 | done after the individual record has been fully processed, so name mangling |
| 1233 | and site total updates have already been performed. Because of this, groups |
| 1234 | are not counted in the main site total (as that would cause duplication). |
| 1235 | Groups can be displayed in bold and shaded as well. Grouped records are |
| 1236 | not, by default, hidden from the report. This allows you to display a |
| 1237 | grouped total, while still being able to see the individual records, even |
| 1238 | if they are part of the group. If you want to hide the detail records, |
| 1239 | follow the Group* directive with a Hide* one using the same value. There |
| 1240 | are no command line switches for these keywords. The Group* keywords also |
| 1241 | accept an optional label to be displayed instead of the actual value used. |
| 1242 | This label should be separated from the value by at least one whitespace |
| 1243 | character, such as a space or tab character. If the match string contains |
| 1244 | whitespace (spaces or tabs), the string should be quoted, using either |
| 1245 | single or double quotes. See the sample configuration file for examples. |
| 1246 | |
| 1247 | GroupReferrer Allows grouping Referrers. Can be handy for some of the |
| 1248 | major search engines that have multiple host names a |
| 1249 | referral could come from. |
| 1250 | |
| 1251 | GroupURL This keyword allows grouping URLs. Useful for grouping |
| 1252 | complete directory trees. |
| 1253 | |
| 1254 | GroupSite This keywords allows grouping Sites. Most used for |
| 1255 | grouping top level domains and unresolved IP address |
| 1256 | for local dial-ups, etc... |
| 1257 | |
| 1258 | GroupAgent Groups User Agents. A handy example of how you could use |
| 1259 | this one is to use "Mozilla" and "MSIE" as the values for |
| 1260 | GroupAgent and HideAgent keywords. Make sure you put the |
| 1261 | "MSIE" one first. |
| 1262 | |
| 1263 | GroupDomains Allows automatic grouping of domains. The numeric value |
| 1264 | represents the level of grouping, and can be thought of |
| 1265 | as 'the number of dots' to display. A 1 will display |
| 1266 | second level domains only (xxx.xxx), a 2 will display |
| 1267 | third level domains (xxx.xxx.xxx) etc... The default |
| 1268 | value of 0 disables any domain grouping. |
| 1269 | Command line argument: -g |
| 1270 | |
| 1271 | GroupUser Allows grouping of usernames. Combined with a group |
| 1272 | name, this can be handy for displaying statistics on |
| 1273 | a particular group of users without displaying their |
| 1274 | real usernames. |
| 1275 | |
| 1276 | GroupShading Allows shading of table rows for groups. Value can be |
| 1277 | 'yes' or 'no', with the default being 'yes'. |
| 1278 | |
| 1279 | GroupHighlight Allows bolding of table rows for groups. Value can be |
| 1280 | 'yes' or 'no', with the default being 'yes'. |
| 1281 | |
| 1282 | |
| 1283 | Ignore/Include Object Keywords |
| 1284 | ---------------------- |
| 1285 | |
| 1286 | These keywords allow you to completely ignore log records when generating |
| 1287 | statistics, or to force their inclusion regardless of ignore criteria. |
| 1288 | Records can be ignored or included based on site, URL, user agent, referrer |
| 1289 | and username. Be aware that by choosing to ignore records, the accuracy of |
| 1290 | the generated statistics become skewed, making it impossible to produce |
| 1291 | an accurate representation of load on the web server. These keywords |
| 1292 | behave identical to the Hide* keywords above, where the value can have |
| 1293 | a leading or trailing wildcard '*'. These keywords, like the Hide* ones, |
| 1294 | have an absolute limit of 80 characters for their values. These keywords |
| 1295 | do not have any command line switch counterparts, so they may only be |
| 1296 | specified in a configuration file. It should also be pointed out that |
| 1297 | using the Ignore/Include combination to selectively exclude an entire |
| 1298 | site while including a particular 'chunk' is _extremely_ inefficient, |
| 1299 | and should be avoided. Try grep'ing the records into a separate file |
| 1300 | and process it instead. |
| 1301 | |
| 1302 | IgnoreSite This allows specified sites to be completely ignored from |
| 1303 | the generated statistics. |
| 1304 | |
| 1305 | IgnoreURL This allows specified URLs to be completely ignored from |
| 1306 | the generated statistics. One use for this keyword would |
| 1307 | be to ignore all hits to a 'temporary' directory where |
| 1308 | development work is being done, but is not accessible to |
| 1309 | the outside world. |
| 1310 | |
| 1311 | IgnoreReferrer This allows records to be ignored based on the referrer |
| 1312 | field. |
| 1313 | |
| 1314 | IgnoreAgent This allows specified User Agent records to be completely |
| 1315 | ignored from the statistics. Maybe useful if you really |
| 1316 | don't want to see all those hits from MSIE :) |
| 1317 | |
| 1318 | IgnoreUser This allows specified username records to be completely |
| 1319 | ignored from the statistics. Usernames can only be used |
| 1320 | if you use http authentication on your server. |
| 1321 | |
| 1322 | IncludeSite Force the record to be processed based on hostname. This |
| 1323 | takes precedence over the Ignore* keywords. |
| 1324 | |
| 1325 | IncludeURL Force the record to be processed based on URL. This takes |
| 1326 | precedence over the Ignore* keywords. |
| 1327 | |
| 1328 | IncludeReferrer Force the record to be processed based on referrer. |
| 1329 | This takes precedence over the Ignore* keywords. |
| 1330 | |
| 1331 | IncludeAgent Force the record to be processed based on user agent. |
| 1332 | This takes precedence over the Ignore* keywords. |
| 1333 | |
| 1334 | IncludeUser Force the record to be processed based on username. |
| 1335 | Usernames are only available if you use http based |
| 1336 | authentication on your server. This takes precedence over |
| 1337 | the Ignore* keywords. |
| 1338 | |
| 1339 | |
| 1340 | Dump Object Keywords |
| 1341 | -------------------- |
| 1342 | |
| 1343 | The Dump* Keywords allow text files to be generated that can then be used |
| 1344 | for import into most database, spreadsheet and other external programs. |
| 1345 | The file is a standard tab delimited text file, meaning that each column |
| 1346 | is separated by a tab (0x09) character. A header record may be included |
| 1347 | if required, using the 'DumpHeader' keyword. Since these files contain |
| 1348 | all records that have been processed, including normally hidden records, |
| 1349 | an alternate location for the files can be specified using the 'DumpPath' |
| 1350 | keyword, otherwise they will be located in the default output directory. |
| 1351 | |
| 1352 | DumpPath Specifies an alternate location for the dump files. The |
| 1353 | default output location will be used otherwise. The value |
| 1354 | is the path portion to use, and normally should be an |
| 1355 | absolute path (ie: has a leading '/' character), however |
| 1356 | relative path names can be used as well, and will be |
| 1357 | relative to the output directory location. |
| 1358 | |
| 1359 | DumpExtension Allows the dump filename extensions to be specified. The |
| 1360 | default extension is "tab", however may be changed with |
| 1361 | this option. |
| 1362 | |
| 1363 | DumpHeader Allows a header record to be written as the first record |
| 1364 | of the file. Value can be either 'yes' or 'no', with |
| 1365 | the default being 'no'. |
| 1366 | |
| 1367 | DumpSites Dump tab delimited sites file. Value can be either 'yes' |
| 1368 | or 'no', with the default being 'no'. The filename used |
| 1369 | is site_YYYYMM.tab (YYYY=year, MM=month). |
| 1370 | |
| 1371 | DumpURLs Dump tab delimited url file. Value can be either 'yes' or |
| 1372 | 'no', with the default being 'no'. The filename used is |
| 1373 | url_YYYYMM.tab (YYYY=year, MM=month). |
| 1374 | |
| 1375 | DumpReferrers Dump tab delimited referrer file. Value can be either |
| 1376 | 'yes' or 'no', with the default being 'no'. Filename |
| 1377 | used is ref_YYYYMM.tab (YYYY=year, MM=month). Referrer |
| 1378 | information is only available if present in the log |
| 1379 | file (ie: combined web server log). |
| 1380 | |
| 1381 | DumpAgents Dump tab delimited user agent file. Value can be either |
| 1382 | 'yes' or 'no', with the default being 'no'. Filename |
| 1383 | used is agent_YYYYMM.tab (YYYY=year, MM=month). User |
| 1384 | agent information is only available if present in the |
| 1385 | log file (ie: combined web server log). |
| 1386 | |
| 1387 | DumpUsers Dump tab delimited username file. Value can be either |
| 1388 | 'yes' or 'no', with the default being 'no'. Filename |
| 1389 | used is user_YYYYMM.tab (YYYY=year, MM=month). The |
| 1390 | username data is only available if processing a wu-ftpd |
| 1391 | xferlog or http authentication is used on the web server |
| 1392 | and that information is present in the log. |
| 1393 | |
| 1394 | DumpSearchStr Dump tab delimited search string file. Value can be |
| 1395 | either 'yes' or 'no', with the default being 'no'. |
| 1396 | Filename used is search_YYYYMM.tab (YYYY=year, MM=month). |
| 1397 | the search string data is only available if referrer |
| 1398 | information is present in the log being processed and |
| 1399 | recognized search engines were found and processed. |
| 1400 | |
| 1401 | |
| 1402 | |
| 1403 | HTML Generation Keywords |
| 1404 | ------------------------ |
| 1405 | |
| 1406 | These keywords allow you to customize the HTML code that The Webalizer |
| 1407 | produces, such as adding a corporate logo or links to other web pages. |
| 1408 | You can specify as many of these keywords as you like, and they will be |
| 1409 | used in the order that they are found in the file. Values cannot exceed |
| 1410 | 80 characters in length, so you may have to break long lines up into two |
| 1411 | or more lines. There are no command line counterparts to these keywords. |
| 1412 | |
| 1413 | HTMLExtension Allows generated pages to use something other than the |
| 1414 | default 'html' extension for the filenames. Do not |
| 1415 | include the leading period ('.') when you specify the |
| 1416 | extension. |
| 1417 | Command line argument: -x |
| 1418 | |
| 1419 | HTMLPre Allows code to be inserted at the very beginning of the |
| 1420 | HTML files. Defaults to the standard HTML 3.2 DOCTYPE |
| 1421 | record. Be careful not to include any HTML here, as it |
| 1422 | is inserted _before_ the <HTML> tag in the file. Use it |
| 1423 | for server-side scripting capabilities, such as php3, to |
| 1424 | insert scripting files and other directives. |
| 1425 | |
| 1426 | HTMLHead Allows you to insert HTML code between the <HEAD></HEAD> |
| 1427 | block. There is no default. Useful for adding scripts |
| 1428 | to the HTML page, such as Javascript or php3, or even |
| 1429 | just for adding a few META tags to the document. |
| 1430 | |
| 1431 | HTMLBody This keyword defines HTML code to be placed immediately |
| 1432 | after the <HEAD> section of the report, just before the |
| 1433 | title and "summary period/generated on" lines. If used, |
| 1434 | the first HTMLHead line MUST include a <BODY> tag. Put |
| 1435 | whatever else you want in subsequent lines, but keep in |
| 1436 | mind the placement of this code in relation to the title |
| 1437 | and other aspects of the web page. Some typical uses |
| 1438 | are to change the page colors and possibly add a corporate |
| 1439 | logo (graphic) in the top right. If not specified, a |
| 1440 | default <BODY> tag is used that defines page color, text |
| 1441 | color and link colors (see "sample.conf" file for example). |
| 1442 | |
| 1443 | HTMLPost This keyword defines HTML code that is placed after the |
| 1444 | title and "summary period/generated on" lines, just before |
| 1445 | the initial horizontal rule <HR> tag. Normally this keyword |
| 1446 | isn't needed, but is provided in case you included a large |
| 1447 | graphic or some other weird formatting tag in the HTMLHead |
| 1448 | section that needs to be cleaned up or terminated before the |
| 1449 | main report section. |
| 1450 | |
| 1451 | HTMLTail This keyword defines HTML code that is placed at the bottom |
| 1452 | right side of the report. It is inserted in a <TABLE> section |
| 1453 | between table data <TD>..</TD> tags, and is top and right |
| 1454 | aligned within the table. Normally this keyword is used to |
| 1455 | provide a link back to your home page or insert a small |
| 1456 | graphic at the bottom right of the page. |
| 1457 | |
| 1458 | HTMLEnd This allows insertion of closing code, at the very end of |
| 1459 | the page. The default is to put the closing </BODY> and |
| 1460 | </HTML> tags. If specified, you _must_ specify these tags |
| 1461 | yourself. |
| 1462 | |
| 1463 | LinkReferrer This specifies if the referrers listed in the top referrer |
| 1464 | table should be displayed as plain text, or as a link to the |
| 1465 | referrer. Values can be either 'yes' or 'no', with 'no' |
| 1466 | being the default. |
| 1467 | |
| 1468 | |
| 1469 | Graph Color Commands |
| 1470 | -------------------- |
| 1471 | |
| 1472 | These keywords allow altering the colors used in the various graphs |
| 1473 | produced by the Webalizer. The value is specified as a standard HTML |
| 1474 | RGB hexdecimal color string, without the leading '#' character. The |
| 1475 | value is case insensitive. If not specified, the default color shown |
| 1476 | will be used. |
| 1477 | |
| 1478 | ColorHit Color used for 'Hits'. Default is '00805C' (green) |
| 1479 | |
| 1480 | ColorFile Color used for 'Files'. Default is '0040FF' (blue) |
| 1481 | |
| 1482 | ColorSite Color used for 'Sites'. Default is 'FF8000' (orange) |
| 1483 | |
| 1484 | ColorKbyte Color used for 'KBytes'. Default is 'FF0000' (red) |
| 1485 | |
| 1486 | ColorPage Color used for 'Pages'. Default is '00E0FF' (cyan) |
| 1487 | |
| 1488 | ColorVisit Color used for 'Visits'. Default is 'FFFF00' (yellow) |
| 1489 | |
| 1490 | ColorMisc Color used for miscellaneous titles in various 'Top' |
| 1491 | tables (not graphs). Default is '00E0FF' (cyan) |
| 1492 | |
| 1493 | PieColor1 Pie Chart color #1. Default is '800080' (purple) |
| 1494 | |
| 1495 | PieColor2 Pie Chart color #2. Default is '80FFC0' (lt. green) |
| 1496 | |
| 1497 | PieColor3 Pie Chart color #3. Default is 'FF00FF' (lt. purple) |
| 1498 | |
| 1499 | PieColor4 Pie Chart color #4. Default is 'FFC080' (tan) |
| 1500 | |
| 1501 | |
| 1502 | -------------------------------------------------------------------------- |
| 1503 | |
| 1504 | |
| 1505 | Notes on Web Log Files |
| 1506 | ---------------------- |
| 1507 | |
| 1508 | The Webalizer supports CLF log formats, which should work for just |
| 1509 | about everyone. If you want User Agent or Referrer information, you |
| 1510 | need to make sure your web server supplies this information in its |
| 1511 | log file, and in a format that the Webalizer can understand. While |
| 1512 | The Webalizer will try to handle many of the subtle variations in |
| 1513 | log formats, some will not work at all. Most web servers output |
| 1514 | CLF format logs by default. For Apache, in order to produce the |
| 1515 | proper log format, add the following to the httpd.conf file: |
| 1516 | |
| 1517 | LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"" |
| 1518 | |
| 1519 | This instructs the Apache web server to produce a 'combined' log |
| 1520 | that includes the referrer and user agent information on the end of |
| 1521 | each record, enclosed in quotes (This is the standard recommended |
| 1522 | by both Apache and NCSA). Netscape and other web servers have |
| 1523 | similar capabilities to alter their log formats. (note: the above |
| 1524 | works for apache servers up to V1.2. V1.3 and higher now have additional |
| 1525 | ways to specify log formats... refer to included documentation). |
| 1526 | |
| 1527 | Notes on FTP Log Files |
| 1528 | ---------------------- |
| 1529 | |
| 1530 | The Webalizer supports ftp logs produced by wu-ftpd, proftpd and others, |
| 1531 | as a standard 'xferlog'. To process an ftp log, you must either use the |
| 1532 | -Ff command line option or have "LogType ftp" in your configuration file. |
| 1533 | It is recommended that you create a separate configuration file for ftp |
| 1534 | analysis, since the values used for your web server will most likely not |
| 1535 | be suited for ftp log analysis (ie: page types, hostname, etc.. should |
| 1536 | be different). |
| 1537 | |
| 1538 | Because of the difference in web and ftp logs, there are a few limitations: |
| 1539 | |
| 1540 | o Because there is no concept of a 'response code' in ftp world, response |
| 1541 | codes are restricted to either 200 (OK) or 206 (Partial Content), based |
| 1542 | on the completion status found in xferlog (for wu-ftpd, 'i'=incomplete |
| 1543 | and will generate a 206, 'c'=complete and will generate a 200). If your |
| 1544 | ftp server doesn't supply the completion status, all requests will be |
| 1545 | assigned a response code of 200. This allows the usage graph to display |
| 1546 | all transfer requests (hits), and how many of those completed in success |
| 1547 | (files - ie: 200 response codes). |
| 1548 | |
| 1549 | o Page totals won't accurately reflect reality, since there isn't really |
| 1550 | the concept of a 'page' in regards to ftp services. I have found that |
| 1551 | setting the PageType value to "README", "FIRST", etc... seems to work |
| 1552 | fairly well however, and will give a pretty good indication of how |
| 1553 | many 'non-binary' files were requested. Of course, the content of your |
| 1554 | ftp site will be different, so your results may vary. |
| 1555 | |
| 1556 | o Visit totals also won't accurately reflect reality, since visits are |
| 1557 | triggered on PageType requests (see above). What you usually wind up |
| 1558 | with is visits=sites in most cases. |
| 1559 | |
| 1560 | o Entry/Exit pages will not be calculated for ftp logs. |
| 1561 | |
| 1562 | o For obvious reasons, referrers and user agents are not supported. |
| 1563 | |
| 1564 | o You _cannot_ analyze both web and ftp logs at the same time.. they must |
| 1565 | be done separately in different runs. |
| 1566 | |
| 1567 | |
| 1568 | Notes on Referrers |
| 1569 | ------------------ |
| 1570 | |
| 1571 | Referrers are weird critters... They take many shapes and forms, which makes |
| 1572 | it much harder to analyze than a typical URL, which at least has some |
| 1573 | standardization. What is contained in the referrer field of your log |
| 1574 | files varies depending on many factors, such as what site did the referral, |
| 1575 | what type of system it comes from and how the actual referral was generated. |
| 1576 | Why is this? Well, because a user can get to your site in many ways... They |
| 1577 | may have your site bookmarked in their browser, they may simply type your |
| 1578 | sites URL field in their browser, they could have clicked on a link on some |
| 1579 | remote web page or they may have found your site from one of the many search |
| 1580 | engines and site indexes found on the web. The Webalizer attempts to deal |
| 1581 | with all this variation in an intelligent way by doing certain things to |
| 1582 | the referrer string which makes it easier to analyze. Of course, if your |
| 1583 | web server doesn't provide referrer information, you probably don't really |
| 1584 | care and are asking yourself why you are reading this section... |
| 1585 | |
| 1586 | Most referrers will take the form of "http://somesite.com/somepage.html", |
| 1587 | which is what you will get if the user clicks on a link somewhere on the |
| 1588 | web in order to get to your site. Some will be a variation of this, and |
| 1589 | look something like "file:/some/such/sillyname", which is a reference from |
| 1590 | a HTML document on the users local machine. Several variations of this can |
| 1591 | be used, depending on what type of system the user has, if he/she is on |
| 1592 | a local network, the type of network, etc... To complicate things even |
| 1593 | more, dynamic HTML documents and HTML documents that are generated by |
| 1594 | CGI scripts or external programs produce lots of extra information which |
| 1595 | is tacked on to the end of the referrer string in an almost infinite number |
| 1596 | of ways. If the user just typed your URL into their browser or clicked on |
| 1597 | a bookmark, there won't be any information in the referrer field and will |
| 1598 | take the form "-". |
| 1599 | |
| 1600 | In order to handle all these variations, The Webalizer parses the referrer |
| 1601 | field in a certain way. First, if the referrer string begins with "http", |
| 1602 | it assumes it is a normal referral and converts the "http://" and following |
| 1603 | hostname to lowercase in order to simplify hiding if desired. For example, |
| 1604 | the referrer "HTTP://WWW.MyHost.Com/This/Is/A/HTML/Document.html" will become |
| 1605 | "http://www.myhost.com/This/Is/A/HTML/Document.html". Notice that only the |
| 1606 | "http://" and hostname are converted to lower case... The rest of the |
| 1607 | referrer field is left alone. This follows standard convention, as the |
| 1608 | actual method (HTTP) and hostname are always case insensitive, while the |
| 1609 | document name portion is case sensitive. |
| 1610 | |
| 1611 | Referrers that came from search engines, dynamic HTML documents, CGI |
| 1612 | scripts and other external programs usually tack on additional information |
| 1613 | that it used to create the page. A common example of this can be found |
| 1614 | in referrals that come from search engines and site indexes common on the |
| 1615 | web. Sometimes, these referrers URLs can be several hundred characters |
| 1616 | long and include all the information that the user typed in to search for |
| 1617 | your site. The Webalizer deals with this type of referrer by stripping |
| 1618 | off all the query information, which starts with a question mark '?'. |
| 1619 | The Referrer "http://search.yahoo.com/search?p=usa%26global%26link" will |
| 1620 | be converted to just "http://search.yahoo.com/search". |
| 1621 | |
| 1622 | When a user comes to your site by using one of their bookmarks or by |
| 1623 | typing in your URL directly into their browser, the referrer field is |
| 1624 | blank, and looks like "-". Most sites will get more of these referrals |
| 1625 | than any other type. The Webalizer converts this type of referral into |
| 1626 | the string "- (Direct Request)". This is done in order to make it easier |
| 1627 | to hide via a command line option or configuration file option. This is |
| 1628 | because the character "-" is a valid character elsewhere in a referrer |
| 1629 | field, and if not turned into something unique, could not be hidden without |
| 1630 | possibly hiding other referrers that shouldn't be. |
| 1631 | |
| 1632 | |
| 1633 | Notes on Character Escaping |
| 1634 | --------------------------- |
| 1635 | |
| 1636 | The HTTP protocol defines certain ways that URLs can look and behave. To |
| 1637 | some extent, referrer fields follow most of the same conventions. Character |
| 1638 | escaping is a technique by which non-printable or other non-ASCII (and even |
| 1639 | some ASCII) characters can be used in a URL. This is done by placing the |
| 1640 | Hexadecimal value of the character in the URL, preceded by a percent sign '%'. |
| 1641 | Since Hex values are made up of ASCII characters, any character can be |
| 1642 | escaped to ensure only printable ASCII characters are present in the URL. |
| 1643 | Some systems take this concept to the extreme and escape all sorts of stuff, |
| 1644 | even characters that don't need to be escaped. To deal with this, The |
| 1645 | Webalizer will un-escape URLs and referrers before being processed. For |
| 1646 | Example, the URL "/www.webalizer.org/%7Efoo/bar.html" is the same URL as |
| 1647 | "/www.webalizer.org/~foo/bar.html", a very common form of a URL to access |
| 1648 | users web pages. If the URLs were not un-escaped, they would be treated as |
| 1649 | two separate documents, even though they are really one and the same. |
| 1650 | |
| 1651 | |
| 1652 | Search String Analysis |
| 1653 | ---------------------- |
| 1654 | |
| 1655 | The Webalizer will do a minimal analysis on referrer strings that |
| 1656 | it finds, looking for well known search string patterns. Most of |
| 1657 | the major search engines are supported, such as Yahoo!, Altavista, |
| 1658 | Lycos, etc... Unfortunately, search engines are always changing |
| 1659 | their internal/CGI query formats, new search engines are coming on |
| 1660 | line every day, and the ability to detect _all_ search strings is |
| 1661 | nearly impossible. However, it should be accurate enough to give |
| 1662 | a good indication of what users were searching for when they stumbled |
| 1663 | across your site. Note: as of version 1.31, search engines can now |
| 1664 | be specified within a configuration file. See the sample.conf file |
| 1665 | for examples of how to specify additional search engines. |
| 1666 | |
| 1667 | |
| 1668 | |
| 1669 | Notes on Visits/Entry/Exit Figures |
| 1670 | ---------------------------------- |
| 1671 | |
| 1672 | The majority of data analyzed and reported on by The Webalizer is |
| 1673 | as accurate and correct as possible based on the input log file. |
| 1674 | However, due to the limitation of the HTTP protocol, the use of |
| 1675 | firewalls, proxy servers, multi-user systems, the rotation of your |
| 1676 | log files, and a myriad of other conditions, some of these numbers |
| 1677 | cannot, without absolute accuracy, be calculated. In particular, |
| 1678 | Visits, Entry Pages and Exit Pages are suspect to random errors |
| 1679 | due to the above and other conditions. The reason for this is |
| 1680 | twofold, 1) Log files are finite in size and time interval, and |
| 1681 | 2) There is no way to distinguish multiple individual users apart |
| 1682 | given only an IP address. Because log files are finite, they have |
| 1683 | a beginning and ending, which can be represented as a fixed time |
| 1684 | period. There is no way of knowing what happened previous to this |
| 1685 | time period, nor is it possible to predict future events based on |
| 1686 | it. Also, because it is impossible to distinguish individual users |
| 1687 | apart, multiple users that have the same IP address all appear to |
| 1688 | be a single user, and are treated as such. This is most common where |
| 1689 | corporate users sit behind a proxy/firewall to the outside world, |
| 1690 | and all requests appear to come from the same location (the address |
| 1691 | of the proxy/firewall itself). Dynamic IP assignment (used with |
| 1692 | dial-up Internet accounts) also present a problem, since the same |
| 1693 | user will appear as to come from multiple places. |
| 1694 | |
| 1695 | For example, suppose two users visit your server from XYZ company, |
| 1696 | which has their network connected to the Internet by a proxy server |
| 1697 | 'fw.xyz.com'. All requests from the network look as though they |
| 1698 | originated from 'fw.xyz.com', even though they were really initiated |
| 1699 | from two separate users on different PCs. The Webalizer would |
| 1700 | see these requests as from the same location, and would record only |
| 1701 | 1 visit, when in reality, there were two. Because entry and exit |
| 1702 | pages are calculated in conjunction with visits, this situation |
| 1703 | would also only record 1 entry and 1 exit page, when in reality, |
| 1704 | there should be 2. |
| 1705 | |
| 1706 | As another example, say a single user at XYZ company is surfing |
| 1707 | around your website.. They arrive at 11:52pm the last day of |
| 1708 | the month, and continue surfing until 12:30am, which is now a |
| 1709 | new day (in a new month). Since a common practice is to rotate |
| 1710 | (save then clear) the server logs at the end of the month, you |
| 1711 | now have the users visit logged in two different files (current |
| 1712 | and previous months). Because of this (and the fact that the |
| 1713 | Webalizer clears history between months), the first page the |
| 1714 | user requests after midnight will be counted as an entry page. |
| 1715 | This is unavoidable, since it is the first request seen by that |
| 1716 | particular IP address in the new month. |
| 1717 | |
| 1718 | For the most part, the numbers shown for visits, entry and exit |
| 1719 | pages are pretty good 'guesses', even though they may not be 100% |
| 1720 | accurate. They do provide a good indication of overall trends, |
| 1721 | and shouldn't be that far off from the real numbers to count much. |
| 1722 | You should probably consider them as the 'minimum' amount possible, |
| 1723 | since the actual (real) values should always be equal or greater |
| 1724 | in all cases. |
| 1725 | |
| 1726 | |
| 1727 | Exporting Webalizer Data |
| 1728 | ------------------------ |
| 1729 | |
| 1730 | The Webalizer now has the ability to dump all object tables to tab |
| 1731 | delimited ASCII text files, which can then be imported into most |
| 1732 | popular database and spreadsheet programs. The files are not normally |
| 1733 | produced, as on some sites they could become quite large, and are only |
| 1734 | enabled by the use of the Dump* configuration keywords. The filename |
| 1735 | extensions default to '.tab' however may be changed using the |
| 1736 | 'DumpExtension' keyword. Since this data contains all items, even |
| 1737 | those normally hidden, it may not be desirable to have them located |
| 1738 | in the output directory where they may be visible to normal web users.. |
| 1739 | For this reason, the 'DumpPath' configuration keyword is available, |
| 1740 | and allows the placement of these files somewhere outside the normal |
| 1741 | web server document tree. An optional 'header' record may be written |
| 1742 | to these files as well, and is useful when the data is to be imported |
| 1743 | into a spreadsheet.. databases will not normally need the header. If |
| 1744 | enabled, the header is simply the column names as the first record of |
| 1745 | the file, tab separated. |
| 1746 | |
| 1747 | |
| 1748 | Log files and The Webalizer |
| 1749 | --------------------------- |
| 1750 | |
| 1751 | Most sites will choose to have The Webalizer run from cron at specified |
| 1752 | intervals. Care should be taken to ensure that data is not lost as a |
| 1753 | result of log file rotations. A suggested practice is to rotate your |
| 1754 | web server logs at the end of each month as close to midnight as possible, |
| 1755 | then have The Webalizer process the 'end of month' log file before running |
| 1756 | statistics on the new, current log. On our systems, a shell script called |
| 1757 | 'rotate_logs' is run at midnight, the end of each month. This script file |
| 1758 | looks like: |
| 1759 | |
| 1760 | ------------------------- file: rotate_logs ------------------------------ |
| 1761 | #!/bin/sh |
| 1762 | |
| 1763 | # halt the server |
| 1764 | kill `cat /var/lib/httpd/logs/httpd.pid` |
| 1765 | |
| 1766 | # define backup names |
| 1767 | OLD_ACCESS_LOG=/var/lib/httpd/logs/old/access_log.`date +%y%m%d-%H%M%S` |
| 1768 | OLD_ERROR_LOG=/var/lib/httpd/logs/old/error_log.`date +%y%m%d-%H%M%S` |
| 1769 | |
| 1770 | # make end of month copy for analyzer |
| 1771 | cp /var/lib/httpd/logs/access_log /var/lib/httpd/logs/access_log.backup |
| 1772 | |
| 1773 | # move files to archive directory |
| 1774 | mv /var/lib/httpd/logs/access_log `echo $OLD_ACCESS_LOG` |
| 1775 | mv /var/lib/httpd/logs/error_log `echo $OLD_ERROR_LOG` |
| 1776 | |
| 1777 | # restart web server |
| 1778 | /usr/sbin/httpd |
| 1779 | |
| 1780 | # compress the archived files |
| 1781 | /bin/gzip $OLD_ACCESS_LOG |
| 1782 | /bin/gzip $OLD_ERROR_LOG |
| 1783 | ------------------------- end of file ------------------------------------ |
| 1784 | |
| 1785 | This script first stops the web server using a 'kill' command. Apache |
| 1786 | keeps the PID of the server in the file httpd.pid, so we use it as the |
| 1787 | argument for the kill. Next, it defines some names for the backup files, |
| 1788 | which are basically the name of the files with the date and time appended |
| 1789 | to the end of them. It then makes a copy of the log file, appended with |
| 1790 | '.backup' in the log directory, moves the current log files to an archive |
| 1791 | directory (/var/lib/httpd/logs/old) and restarts the server. This setup |
| 1792 | allows the web server to be down for the minimum amount of time needed, |
| 1793 | which is important for busy sites. If you don't want to stop the server, |
| 1794 | you can remove the initial 'kill' command, and replace the '/usr/sbin/httpd' |
| 1795 | line with "kill -1 `cat /var/lib/httpd/logs/httpd.pid`" command instead, |
| 1796 | On most web servers, this will cause a restart of the server and create |
| 1797 | the new log files in the process... |
| 1798 | |
| 1799 | At this point, we have made copies of the previous months logs, the web |
| 1800 | server is going about its business as usual, and we have all the time in |
| 1801 | the world to do any other additional processing we want. The last two |
| 1802 | lines of the script compress the archived logs using the GNU zip program |
| 1803 | (gzip). Remember, we still have a copy of the log which we can now run |
| 1804 | The Webalizer on without having to do any further processing. |
| 1805 | |
| 1806 | Next, we define two crontab entries. The first runs the above 'rotate_logs' |
| 1807 | script at midnight at the end of the month. The second runs The Webalizer |
| 1808 | on the '.backup' log file created above at 5 minutes after midnight. This |
| 1809 | gives other end of month processing jobs a chance to run so we don't bog |
| 1810 | the system down too much. If you have lots of end of month stuff going on, |
| 1811 | you can change the timing to suit your needs. The crontab entries look |
| 1812 | something like: |
| 1813 | |
| 1814 | ------------------------- crontab entries -------------------------------- |
| 1815 | # Rotate web server logs and run monthly analysis |
| 1816 | 0 0 1 * * /usr/local/adm/rotate_logs |
| 1817 | 5 0 1 * * /usr/bin/webalizer -Q /var/lib/httpd/logs/access_log.backup |
| 1818 | ------------------------- end of crontab --------------------------------- |
| 1819 | |
| 1820 | As you can see, the log rotations occur at midnight, and the analysis |
| 1821 | is done at 5 minutes after. Once you verify that The Webalizer ran |
| 1822 | successfully, the access_log.backup file can be deleted as it isn't |
| 1823 | needed any more. If you need to re-run the analysis, you still have |
| 1824 | the compressed archive copy that the shell script created. In order |
| 1825 | for the above analysis to work properly, you should have already |
| 1826 | created an /etc/webalizer.conf configuration file suitable for your |
| 1827 | site, or otherwise specify configuration options or a configuration |
| 1828 | file on the crontab command line above. |
| 1829 | |
| 1830 | If you want The Webalizer to be run more often than once a month, you |
| 1831 | can specify additional crontab entries to do this as well. Care should |
| 1832 | be taken however to ensure that The Webalizer is not running when the |
| 1833 | end of month processing above occurs, or unpredictable results may |
| 1834 | happen (such as an inability to rotate the logs due to a file lock). |
| 1835 | The easiest way is to run it on the half hour with a crontab entry like: |
| 1836 | |
| 1837 | 30 * * * * /usr/bin/webalizer |
| 1838 | |
| 1839 | |
| 1840 | Reverse DNS Lookups |
| 1841 | ------------------- |
| 1842 | |
| 1843 | The Webalizer fully supports both IPv4 and IPv6 DNS lookups, and |
| 1844 | maintains a cache of those lookups to reduce processing the same |
| 1845 | addresses in subsequent runs. The cache file can be created at |
| 1846 | run-time, or may be created before running the webalizer using either |
| 1847 | the stand alone 'webazolver' program, or The Webalizer (DNS) Cache |
| 1848 | file Manager program 'wcmgr'. In order to perform reverse lookups, |
| 1849 | a DNS Cache file must be specified, either on the command line or in |
| 1850 | a configuration file. In order to create/update the cache file at |
| 1851 | run-time, the number of DNS Children must also be specified, and can |
| 1852 | be anything between 1 and 100. This specifies the number of child |
| 1853 | processes to be forked, each of which will perform network DNS |
| 1854 | queries in order to lookup up the addresses and update the cache. |
| 1855 | Cached entries that are older than a specified TTL (time to live) |
| 1856 | will be expired, and if encountered again in a log, will be looked |
| 1857 | up at that time in order to 'freshen' them (verify the name is still |
| 1858 | the same and update its timestamp). The default TTL is 7 days, however |
| 1859 | may be set to anything between 1 and 100 days. Using the 'wcmgr' |
| 1860 | program, entries may also be marked as 'permanent', in which case |
| 1861 | they will persist (with an infinite TTL) in the cache until manually |
| 1862 | removed. See the file DNS.README for additional information. |
| 1863 | |
| 1864 | |
| 1865 | Geolocation Lookups |
| 1866 | ------------------- |
| 1867 | |
| 1868 | The Webalizer has the ability to perform geolocation lookups on IP |
| 1869 | addresses using either it's own internal GeoDB database or optionally |
| 1870 | the GeoIP database from MaxMind, Inc. (www.maxmind.com). If used, |
| 1871 | unresolved addresses will be searched for in the database and it's |
| 1872 | country of origin will be returned if found. This actually produces |
| 1873 | more accurate Country information than DNS lookups, since the DNS |
| 1874 | address space has additional gcTLDs that do not necessarily map to |
| 1875 | a specific country (such as '.net' and '.com'). It is possible to |
| 1876 | use both DNS lookups and geolocation lookups at the same time, which |
| 1877 | will cause any addresses that could not be resolved using DNS lookups |
| 1878 | to then be looked up in the database, greatly reducing the number of |
| 1879 | 'Unknown/Unresolved' entries in the generated reports. The native |
| 1880 | GeoDB geolocation database provided by The Webalizer fully supports |
| 1881 | IPv4 and IPv6 lookups, is updated regularly, and is the preferred |
| 1882 | geolocation method for use with The Webalizer. The most current |
| 1883 | version of the database can be obtained from our ftp site. |
| 1884 | |
| 1885 | |
| 1886 | Language Support |
| 1887 | ---------------- |
| 1888 | |
| 1889 | Version 1.0x of The Webalizer added language support. This |
| 1890 | support is only provided at compile time in the form of an |
| 1891 | include file containing all the strings used by The Webalizer. |
| 1892 | The source distribution contains all language files that were |
| 1893 | available at the time, with English being the default as |
| 1894 | that is the only human language I speak fluently, and me |
| 1895 | Espanol es muy malo. Several people have already indicated |
| 1896 | the desire to do translations into various languages, and as |
| 1897 | I receive the language files, will make them available via |
| 1898 | ftp at ftp://ftp.mrunix.net/pub/webalizer/lang. Unless there |
| 1899 | happens to be a binary distribution in the language you need, |
| 1900 | you will need to grab the source distribution and compile the |
| 1901 | program yourself. See the file INSTALL that comes in the source |
| 1902 | distribution for information on how to use a language other than |
| 1903 | English. |
| 1904 | |
| 1905 | It should also be noted that the GD graphics library, used to |
| 1906 | produce the in-line graphics in the output HTML, doesn't |
| 1907 | support extended character sets, so if you are translating |
| 1908 | the language file, you will no doubt encounter this problem. |
| 1909 | |
| 1910 | New: You can now specify the language to use when you are building |
| 1911 | program from source, using the configure script. Just add |
| 1912 | --with-language=language_name , where 'language_name' is the |
| 1913 | name of a valid language file in the /lang/ directory. For |
| 1914 | example, --with-language=french will build using French as |
| 1915 | the default language. You should consult the INSTALL file |
| 1916 | for additional information on building the program from source. |
| 1917 | |
| 1918 | |
| 1919 | Known Issues |
| 1920 | ------------ |
| 1921 | |
| 1922 | o Memory Usage. The Webalizer makes liberal use of memory for internal |
| 1923 | data structures during analysis. Lack of real physical memory will |
| 1924 | noticeably degrade performance by doing lots of swapping between memory |
| 1925 | and disk. One user who had a rather large log file noticed that The |
| 1926 | Webalizer took over 7 hours to run with only 16 Meg of memory. Once |
| 1927 | memory was increased, the time was reduced to a few minutes. |
| 1928 | |
| 1929 | |
| 1930 | o Performance. The Hide*, Group*, Ignore*, Include* and IndexAlias |
| 1931 | configuration options can cause a performance decrease if lots of |
| 1932 | them are used. The reason for this is that every log record must |
| 1933 | be scanned for each item in each list. For example, if you are |
| 1934 | Hiding 20 objects, Grouping 20 more, and Ignoring 5, each record |
| 1935 | is scanned, at most, 46 times (20+20+5 + an IndexAlias scan). |
| 1936 | On really large log files, this can have a profound impact. It |
| 1937 | is recommended that you use the least amount of these configuration |
| 1938 | options that you can, as it will greatly improve performance. |
| 1939 | |
| 1940 | |
| 1941 | Final Notes |
| 1942 | ----------- |
| 1943 | |
| 1944 | A lot of time and effort went into making The Webalizer, and to ensure that |
| 1945 | the results are as accurate as possible. If you find any abnormalities or |
| 1946 | inconsistent results, bugs, errors, omissions or anything else that doesn't |
| 1947 | look right, please let me know so I can investigate the problem or correct |
| 1948 | the error. This goes for the minimal documentation as well. Suggestions |
| 1949 | for future versions are also welcome and appreciated. |