=head1 NAME

afsmonitor - Monitors File Servers and Cache Managers

=head1 SYNOPSIS

=for html
<div class="synopsis">

B<afsmonitor> [B<initcmd>] [B<-config> <I<configuration file>>]
 S<<< [B<-frequency> <I<poll frequency, in seconds>>] >>>
 S<<< [B<-output> <I<storage file name>>] >>> [B<-detailed>]
 S<<< [B<-debug> <I<debug output file>>] >>>
 S<<< [B<-fshosts> <I<list of file servers to monitor>>+] >>>
 S<<< [B<-cmhosts> <I<list of cache managers to monitor>>+] >>>
 S<<< [B<-buffers> <I<number of buffer slots>>] >>> [B<-version>] [B<-help>]

B<afsmonitor> [B<i>] [B<-co> <I<configuration file>>]
 S<<< [B<-fr> <I<poll frequency, in seconds>>] >>>
 S<<< [B<-o> <I<storage file name>>] >>> [B<-det>]
 S<<< [B<-deb> <I<debug output file>>] >>>
 S<<< [B<-fs> <I<list of file servers to monitor>>+] >>>
 S<<< [B<-cm> <I<list of cache managers to monitor>>+] >>>
 S<<< [B<-b> <I<number of buffer slots>>] >>> [B<-version>] [B<-h>]

=for html
</div>

=head1 DESCRIPTION

The B<afsmonitor> command initializes a program that gathers and displays
statistics about specified File Server and Cache Manager operations. It
allows the issuer to monitor, from a single location, a wide range of File
Server and Cache Manager operations on any number of machines in both
local and foreign cells.

There are 271 available File Server statistics and 571 available Cache
Manager statistics, listed in the appendix about B<afsmonitor> statistics
in the I<OpenAFS Administration Guide>. By default, the command displays
all of the relevant statistics for the file server machines named by the
B<-fshosts> argument and the client machines named by the B<-cmhosts>
argument. To limit the display to only the statistics of interest, list
them in the configuration file specified by the B<-config> argument. In
addition, use the configuration file for the following purposes:

=over 4

=item *

To set threshold values for any monitored statistic. When the value of a
statistic exceeds the threshold, the B<afsmonitor> command displays it in
reverse video. There are no default threshold values.

=item *

To invoke a program or script automatically when a statistic exceeds its
threshold. The AFS distribution does not include any such scripts.

=item *

To list the file server and client machines to monitor, instead of using
the B<-fshosts> and B<-cmhosts> arguments.

=back

For a description of the configuration file, see L<afsmonitor(5)>.

=head1 CAUTIONS

The following software must be accessible to a machine where the
B<afsmonitor> program is running:

=over 4

=item *

The AFS xstat libraries, which the B<afsmonitor> program uses to gather data.

=item *

The curses graphics package, which most UNIX distributions provide as a
standard utility.

=back

The B<afsmonitor> screens format successfully both on so-called dumb
terminals and in windowing systems that emulate terminals. For the output
to look its best, the display environment needs to support reverse video
and cursor addressing. Set the TERM environment variable to the correct
terminal type, or to a value that has characteristics similar to the
actual terminal type. The display window or terminal must be at least 80
columns wide and 12 lines long.

The B<afsmonitor> program must run in the foreground, and in its own
separate, dedicated window or terminal. The window or terminal is
unavailable for any other activity as long as the B<afsmonitor> program is
running. Any number of instances of the B<afsmonitor> program can run on a
single machine, as long as each instance runs in its own dedicated window
or terminal. Note that it can take up to three minutes to start an
additional instance.

=head1 OPTIONS

=over 4

=item B<initcmd>

Accommodates the command's use of the AFS command parser, and is optional.

=item B<-config> <I<file>>

Names the configuration file that lists the machines to monitor, the
statistics to display, and threshold values, if any. A partial pathname is
interpreted relative to the current working directory. Provide this
argument if providing neither the B<-fshosts> argument nor the
B<-cmhosts> argument; it cannot be combined with either of them. For
instructions on creating this file, see the preceding B<DESCRIPTION>
section, and the section on the B<afsmonitor> program in the I<OpenAFS
Administration Guide>.

=item B<-frequency> <I<poll frequency>>

Specifies in seconds how often the B<afsmonitor> program probes the File
Servers and Cache Managers. Valid values range from C<1> to C<86400>
(which is 24 hours); the default value is C<60>. This frequency applies to
both File Servers and Cache Managers, but the B<afsmonitor> program
initiates the two types of probes, and processes their results,
separately. The actual interval between probes to a host is the probe
frequency plus the time required for all hosts to respond.

=item B<-output> <I<file>>

Names the file to which the B<afsmonitor> program writes all of the
statistics that it collects. By default, no output file is created. See
the section on the B<afsmonitor> command in the I<OpenAFS Administration
Guide> for information on this file.

=item B<-detailed>

Formats the information in the output file named by the B<-output>
argument in a maximally readable format. Provide the B<-output> argument
along with this one.

=item B<-fshosts> <I<host>>+

Names one or more machines from which to gather File Server
statistics. For each machine, provide either a fully qualified host name,
or an unambiguous abbreviation (the ability to resolve an abbreviation
depends on the state of the cell's name service at the time the command is
issued). This argument can be combined with the B<-cmhosts> argument, but
not with the B<-config> argument.

=item B<-cmhosts> <I<host>>+

Names one or more machines from which to gather Cache Manager
statistics. For each machine, provide either a fully qualified host name,
or an unambiguous abbreviation (the ability to resolve an abbreviation
depends on the state of the cell's name service at the time the command is
issued). This argument can be combined with the B<-fshosts> argument, but
not with the B<-config> argument.

=item B<-buffers> <I<slots>>

Is nonoperational and provided to accommodate potential future
enhancements to the program.

=item B<-debug> <I<debug output file>>

Turns on debugging output, and writes debugging information to the specified
file.

=item B<-help>

Prints the online help for this command. All other valid options are
ignored.

=item B<-version>

Prints the program version and then exits. All other valid options
are ignored.

=back
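
The B<-frequency> rules described above (an integer from C<1> to C<86400>, a default of C<60>, and an effective interval stretched by host response time) can be sketched in C as follows. This is a minimal illustration, not code from the B<afsmonitor> source: the helpers C<parse_frequency()> and C<probe_interval()> are hypothetical, and the caller is assumed to measure how long the previous round of probes took to be answered.

```c

    #include <assert.h>
    #include <errno.h>
    #include <stdlib.h>

    /* Documented bounds and default for -frequency, in seconds. */
    #define AFSMON_FREQ_MIN     1L
    #define AFSMON_FREQ_MAX     86400L  /* 24 hours */
    #define AFSMON_FREQ_DEFAULT 60L

    /* Hypothetical helper: validate a -frequency argument.
     * Returns the default when arg is NULL (option not given), the value
     * when arg is a complete integer in range, and -1 otherwise. */
    long parse_frequency(const char *arg)
    {
        char *end;
        long val;

        if (arg == NULL)
            return AFSMON_FREQ_DEFAULT;
        errno = 0;
        val = strtol(arg, &end, 10);
        if (errno != 0 || end == arg || *end != '\0')
            return -1;                  /* empty, overflow, or trailing junk */
        if (val < AFSMON_FREQ_MIN || val > AFSMON_FREQ_MAX)
            return -1;                  /* outside 1..86400 */
        return val;
    }

    /* Hypothetical helper: approximate seconds between two probes of the
     * same host, i.e. the poll frequency plus the time (measured by the
     * caller) that all hosts took to respond to the previous round. */
    long probe_interval(long freq_secs, long response_secs)
    {
        assert(freq_secs >= AFSMON_FREQ_MIN && response_secs >= 0);
        return freq_secs + response_secs;
    }

```

For example, C<parse_frequency("120")> returns C<120>, C<parse_frequency("0")> returns C<-1>, and C<probe_interval(60, 5)> returns C<65>, matching the rule that the actual interval is the frequency plus the response time.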

=head1 OUTPUT

The B<afsmonitor> program displays its data on three screens:

=over 4

=item System Overview

This screen appears automatically when the B<afsmonitor> program
initializes. It summarizes separately for File Servers and Cache Managers
the number of machines being monitored and how many of them have I<alerts>
(statistics that have exceeded their thresholds). It then lists the
hostname and number of alerts for each machine being monitored, indicating
if appropriate that a process failed to respond to the last probe.

=item File Servers

This screen displays File Server statistics for each file server machine
being monitored. It highlights statistics that have exceeded their
thresholds, and identifies machines that failed to respond to the last
probe.

=item Cache Managers

This screen displays Cache Manager statistics for each client machine
being monitored. It highlights statistics that have exceeded their
thresholds, and identifies machines that failed to respond to the last
probe.

=back

Fields at the corners of every screen display the following information:

=over 4

=item *

In the top left corner, the program name and version number.

=item *

In the top right corner, the screen name, current and total page numbers,
and current and total column numbers. The page number (for example,
C<p. 1 of 3>) indicates the index of the current page and the total number
of (vertical) pages over which data is displayed. The column number (for
example, C<c. 1 of 235>) indicates the index of the current leftmost
column and the total number of columns in which data appears. (The symbol
C<<<< >>> >>>> indicates that there is additional data to the right; the
symbol C<<<< <<< >>>> indicates that there is additional data to the
left.)

=item *

In the bottom left corner, a list of the available commands. Enter the
first letter in the command name to run that command. Only the currently
possible options appear; for example, if there is only one page of data,
the C<next> and C<prev> commands, which scroll the screen down and up
respectively, do not appear. For descriptions of the commands, see the
following section about navigating the display screens.

=item *

In the bottom right corner, the C<probes> field reports how many times the
program has probed File Servers (C<fs>), Cache Managers (C<cm>), or
both. The counts for File Servers and Cache Managers can differ. The
C<freq> field reports how often the program sends probes.

=back

=head2 Navigating the afsmonitor Display Screens

As noted, the lower left-hand corner of every display screen lists the
names of the commands currently available for moving to alternate screens,
which either are of a different type or display more statistics or
machines of the current type. To execute a command, press the lowercase
version of the first letter in its name. Some commands also have an
uppercase version that has a somewhat different effect, as indicated in
the following list.

=over 4

=item C<cm>

Switches to the C<Cache Managers> screen. Available only on the C<System
Overview> and C<File Servers> screens.

=item C<fs>

Switches to the C<File Servers> screen. Available only on the C<System
Overview> and the C<Cache Managers> screens.

=item C<left>

Scrolls horizontally to the left, to access the data columns situated to
the left of the current set. Available when the C<<<< <<< >>>> symbol
appears at the top left of the screen. Press uppercase C<L> to scroll
horizontally all the way to the left (to display the first set of data
columns).

=item C<next>

Scrolls down vertically to the next page of machine names. Available when
there are two or more pages of machines and the final page is not
currently displayed. Press uppercase C<N> to scroll to the final page.

=item C<oview>

Switches to the C<System Overview> screen. Available only on the C<Cache
Managers> and C<File Servers> screens.

=item C<prev>

Scrolls up vertically to the previous page of machine names. Available
when there are two or more pages of machines and the first page is not
currently displayed. Press uppercase C<P> to scroll to the first page.

=item C<right>

Scrolls horizontally to the right, to access the data columns situated to
the right of the current set. This command is available when the
C<<<< >>> >>>> symbol appears at the upper right of the screen. Press
uppercase C<R> to scroll horizontally all the way to the right (to display
the final set of data columns).

=back
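
The vertical paging commands amount to moving a clamped page index, the same index reported in the C<p. 1 of 3> field. The C sketch below is hypothetical (C<page_after_key()> is an invented name, not part of B<afsmonitor>) and assumes that uppercase C<P> jumps to the first page as the counterpart of uppercase C<N>.

```c

    #include <assert.h>

    /* Hypothetical sketch of the vertical paging keys.
     * cur is the 1-based page on display and total is the page count
     * shown as "p. cur of total". Returns the page to show next. */
    int page_after_key(char key, int cur, int total)
    {
        assert(total >= 1 && cur >= 1 && cur <= total);
        switch (key) {
        case 'n':                       /* next: one page down, if any */
            return cur < total ? cur + 1 : cur;
        case 'N':                       /* jump to the final page */
            return total;
        case 'p':                       /* prev: one page up, if any */
            return cur > 1 ? cur - 1 : cur;
        case 'P':                       /* assumed: jump to the first page */
            return 1;
        default:                        /* not a paging key */
            return cur;
        }
    }

```

On page 3 of 3, for instance, pressing C<p> yields page 2, while C<n> leaves the display on page 3; that boundary is why C<next> is offered only when the final page is not displayed.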

=head2 The System Overview Screen

The C<System Overview> screen appears automatically as the B<afsmonitor>
program initializes. This screen displays the status of as many File
Server and Cache Manager processes as can fit in the current window;
scroll down to access additional information.

The information on this screen is split into File Server information on
the left and Cache Manager information on the right. The header for each
grouping reports two pieces of information:

=over 4

=item *

The number of machines on which the program is monitoring the indicated
process.

=item *

The number of alerts and the number of machines affected by them (an
I<alert> means that a statistic has exceeded its threshold or a process
failed to respond to the last probe).

=back

A list of the machines being monitored follows. If there are any alerts on
a machine, the number of them appears in square brackets to the left of
the hostname. If a process failed to respond to the last probe, the
letters C<PF> (probe failure) appear in square brackets to the left of the
hostname.
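
The header totals and per-host markers described above can be modeled in a few lines of C. This is a hypothetical sketch, not the program's own code: the C<struct host_status> type and both function names are invented, the exact spacing of a screen line is assumed, and counting a probe failure as a single alert is an assumption this page does not settle.

```c

    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical per-host state behind one line of the overview. */
    struct host_status {
        const char *name;
        int alerts;        /* statistics over their thresholds */
        int probe_failed;  /* nonzero if the last probe got no answer */
    };

    /* Header totals: number of alerts and number of machines affected.
     * Assumption: a probe failure counts as exactly one alert. */
    void overview_totals(const struct host_status *h, int n,
                         int *total_alerts, int *hosts_affected)
    {
        int i;

        assert(total_alerts != NULL && hosts_affected != NULL);
        *total_alerts = 0;
        *hosts_affected = 0;
        for (i = 0; i < n; i++) {
            int a = h[i].probe_failed ? 1 : h[i].alerts;

            *total_alerts += a;
            if (a > 0)
                (*hosts_affected)++;
        }
    }

    /* One host line: "[PF] name", "[count] name", or just "name".
     * Returns the snprintf() result, i.e. the length of the full line. */
    int overview_line(const struct host_status *h, char *out, size_t cap)
    {
        if (h->probe_failed)
            return snprintf(out, cap, "[PF] %s", h->name);
        if (h->alerts > 0)
            return snprintf(out, cap, "[%d] %s", h->alerts, h->name);
        return snprintf(out, cap, "%s", h->name);
    }

```

With three hypothetical File Servers C<fs1>, C<fs2> (three alerts) and C<fs3> (missed the last probe), C<overview_totals()> reports 4 alerts on 2 machines, and the affected hosts are rendered as C<[3] fs2> and C<[PF] fs3>.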

=head2 The File Servers Screen

The C<File Servers> screen displays the values collected at the most
recent probe for File Server statistics.

A summary line at the top of the screen (just below the standard program
version and screen title blocks) specifies the number of monitored File
Servers, the number of alerts, and the number of machines affected by the
alerts.

The first column always displays the hostnames of the machines running the
monitored File Servers.

To the right of the hostname column appear as many columns of statistics
as can fit within the current width of the display screen or window; each
column requires space for 10 characters. The name of the statistic appears
at the top of each column. If the File Server on a machine did not respond
to the most recent probe, a pair of dashes (C<-->) appears in each
column. If a value exceeds its configured threshold, it is highlighted in
reverse video. If a value is too large to fit into the allotted column
width, it overflows into the next row in the same column.
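
Because each statistic column is 10 characters wide, the number of columns on screen follows from simple arithmetic, and the same numbers decide whether the C<<<< <<< >>>> and C<<<< >>> >>>> markers appear. In the hypothetical sketch below, the width reserved for the hostname column is a parameter, since this page does not state it, and the function names are invented for illustration.

```c

    #include <assert.h>
    #include <stddef.h>

    #define STAT_COL_WIDTH 10  /* each statistic column needs 10 characters */

    /* Hypothetical: statistic columns that fit beside the hostname column.
     * hostcol_width is an assumed parameter, not a documented value. */
    int visible_stat_columns(int screen_width, int hostcol_width)
    {
        int avail = screen_width - hostcol_width;

        assert(screen_width > 0 && hostcol_width >= 0);
        return avail > 0 ? avail / STAT_COL_WIDTH : 0;
    }

    /* Hypothetical: given the 1-based index of the leftmost visible column
     * (the "c. first of total" field), report whether more data lies to
     * the left ("<<<") or to the right (">>>"). */
    void scroll_markers(int first_col, int visible, int total_cols,
                        int *more_left, int *more_right)
    {
        assert(more_left != NULL && more_right != NULL);
        *more_left = first_col > 1;
        *more_right = first_col + visible - 1 < total_cols;
    }

```

On an 80-column window with an assumed 20-character hostname column, C<visible_stat_columns(80, 20)> gives 6 statistic columns; at C<c. 1 of 235> only the right-hand marker would be shown.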

=head2 The Cache Managers Screen

The C<Cache Managers> screen displays the values collected at the most
recent probe for Cache Manager statistics.

A summary line at the top of the screen (just below the standard program
version and screen title blocks) specifies the number of monitored Cache
Managers, the number of alerts, and the number of machines affected by the
alerts.

The first column always displays the hostnames of the machines running the
monitored Cache Managers.

To the right of the hostname column appear as many columns of statistics
as can fit within the current width of the display screen or window; each
column requires space for 10 characters. The name of the statistic appears
at the top of each column. If the Cache Manager on a machine did not
respond to the most recent probe, a pair of dashes (C<-->) appears in each
column. If a value exceeds its configured threshold, it is highlighted in
reverse video. If a value is too large to fit into the allotted column
width, it overflows into the next row in the same column.

=head2 Writing to an Output File

Include the B<-output> argument to name the file into which the
B<afsmonitor> program writes all of the statistics it collects. The
output file can be useful for tracking performance over long periods of
time, and enables the administrator to apply post-processing techniques
that reveal system trends. The AFS distribution does not include any
post-processing programs.

The output file is in ASCII format and records the same information as the
C<File Servers> and C<Cache Managers> display screens. Each line in the
file uses the following format to record the time at which the
B<afsmonitor> program gathered the indicated statistic from the Cache
Manager (C<CM>) or File Server (C<FS>) running on the machine called
I<host_name>. If a probe failed, the error code C<-1> appears in the
I<statistic> field.

    <time> <host_name> CM|FS <statistic>

If the administrator usually reviews the output file manually, rather than
using it as input to an automated analysis program or script, including
the B<-detailed> flag formats the data in a more easily readable form.
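
Since the AFS distribution ships no post-processing tools, a reader that splits each output-file line into its four documented fields is a natural first step. The following C sketch is hypothetical: the record type and function names are invented, the field sizes are arbitrary, and because this page does not define the layout of I<time>, the parser assumes only that the host name immediately precedes the first C<CM> or C<FS> token that follows at least one time token (so a time stamp containing spaces is still accepted). It targets the plain line format, not output written with B<-detailed>.

```c

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    #define MAX_TOKENS 128  /* tokens beyond this are ignored */

    /* Hypothetical record for one "<time> <host_name> CM|FS <statistic>" line. */
    struct afsmon_rec {
        char time[64];   /* all tokens before the host name */
        char host[256];
        char kind[3];    /* "CM" or "FS" */
        char stat[512];  /* rest of the line; "-1" marks a failed probe */
    };

    /* Append tok to dst, separated by one space, truncating if needed. */
    static void append_tok(char *dst, size_t cap, const char *tok)
    {
        size_t len = strlen(dst);

        snprintf(dst + len, cap - len, "%s%s", len ? " " : "", tok);
    }

    /* Split one line into its fields. Returns 0 on success, -1 if no
     * time/host/CM|FS/statistic sequence can be found. */
    int parse_afsmon_line(const char *line, struct afsmon_rec *rec)
    {
        char buf[1024];
        char *tok[MAX_TOKENS];
        int n = 0, k, i;

        assert(line != NULL && rec != NULL);
        if (strlen(line) >= sizeof(buf))
            return -1;
        strcpy(buf, line);
        for (char *t = strtok(buf, " \t\r\n"); t != NULL && n < MAX_TOKENS;
             t = strtok(NULL, " \t\r\n"))
            tok[n++] = t;

        /* k >= 2 leaves room for a time token and a host before CM|FS,
         * and k < n - 1 requires at least one statistic token after it. */
        for (k = 2; k < n - 1; k++)
            if (strcmp(tok[k], "CM") == 0 || strcmp(tok[k], "FS") == 0)
                break;
        if (k >= n - 1)
            return -1;

        rec->time[0] = rec->stat[0] = '\0';
        for (i = 0; i < k - 1; i++)
            append_tok(rec->time, sizeof(rec->time), tok[i]);
        snprintf(rec->host, sizeof(rec->host), "%s", tok[k - 1]);
        snprintf(rec->kind, sizeof(rec->kind), "%s", tok[k]);
        for (i = k + 1; i < n; i++)
            append_tok(rec->stat, sizeof(rec->stat), tok[i]);
        return 0;
    }

    /* A failed probe is recorded as the error code -1 in the statistic field. */
    int is_probe_failure(const struct afsmon_rec *rec)
    {
        return strcmp(rec->stat, "-1") == 0;
    }

```

For example, the hypothetical line C<1024 fs1.example.com FS -1> parses into host C<fs1.example.com>, type C<FS>, and a statistic of C<-1>, so C<is_probe_failure()> reports a failed probe.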

=head1 EXAMPLES

For examples of commands, display screens, and configuration files, see
the section about the B<afsmonitor> program in the I<OpenAFS
Administration Guide>.

=head1 PRIVILEGE REQUIRED

None

=head1 SEE ALSO

L<afsmonitor(5)>,
L<fstrace(8)>,
L<scout(1)>

The I<OpenAFS Administration Guide> at
L<http://docs.openafs.org/AdminGuide/>.

=head1 COPYRIGHT

IBM Corporation 2000. <http://www.ibm.com/> All Rights Reserved.

This documentation is covered by the IBM Public License Version 1.0. It was
converted from HTML to POD by software written by Chas Williams and Russ
Allbery, based on work by Alf Wachsmann and Elizabeth Cassell.