=head1 NAME

afsmonitor - Monitors File Servers and Cache Managers

=head1 SYNOPSIS

=for html
<div class="synopsis">

B<afsmonitor> [B<initcmd>] [B<-config> <I<configuration file>>]
S<<< [B<-frequency> <I<poll frequency, in seconds>>] >>>
S<<< [B<-output> <I<storage file name>>] >>> [B<-detailed>]
S<<< [B<-debug> <I<debug output file>>] >>>
S<<< [B<-fshosts> <I<list of file servers to monitor>>+] >>>
S<<< [B<-cmhosts> <I<list of cache managers to monitor>>+] >>>
S<<< [B<-buffers> <I<number of buffer slots>>] >>> [B<-version>] [B<-help>]

B<afsmonitor> [B<i>] [B<-co> <I<configuration file>>]
S<<< [B<-fr> <I<poll frequency, in seconds>>] >>>
S<<< [B<-o> <I<storage file name>>] >>> [B<-det>]
S<<< [B<-deb> <I<debug output file>>] >>>
S<<< [B<-fs> <I<list of file servers to monitor>>+] >>>
S<<< [B<-cm> <I<list of cache managers to monitor>>+] >>>
S<<< [B<-b> <I<number of buffer slots>>] >>> [B<-version>] [B<-h>]

=for html
</div>

=head1 DESCRIPTION

The B<afsmonitor> command initializes a program that gathers and displays
statistics about specified File Server and Cache Manager operations. It
allows the issuer to monitor, from a single location, a wide range of File
Server and Cache Manager operations on any number of machines in both
local and foreign cells.

There are 271 available File Server statistics and 571 available Cache
Manager statistics, listed in the appendix about B<afsmonitor> statistics
in the I<OpenAFS Administration Guide>. By default, the command displays
all of the relevant statistics for the file server machines named by the
B<-fshosts> argument and the client machines named by the B<-cmhosts>
argument. To limit the display to only the statistics of interest, list
them in the configuration file specified by the B<-config> argument. In
addition, use the configuration file for the following purposes:

=over 4

=item *

To set threshold values for any monitored statistic. When the value of a
statistic exceeds the threshold, the B<afsmonitor> command displays it in
reverse video. There are no default threshold values.

=item *

To invoke a program or script automatically when a statistic exceeds its
threshold. The AFS distribution does not include any such scripts.

=item *

To list the file server and client machines to monitor, instead of using
the B<-fshosts> and B<-cmhosts> arguments.

=back

For a description of the configuration file, see L<afsmonitor(5)>.

=head1 CAUTIONS

The following software must be accessible to a machine where the
B<afsmonitor> program is running:

=over 4

=item *

The AFS xstat libraries, which the B<afsmonitor> program uses to gather data.

=item *

The curses graphics package, which most UNIX distributions provide as a
standard utility.

=back

The B<afsmonitor> screens format successfully both on so-called dumb
terminals and in windowing systems that emulate terminals. For the output
to look its best, the display environment needs to support reverse video
and cursor addressing. Set the TERM environment variable to the correct
terminal type, or to a value that has characteristics similar to the
actual terminal type. The display window or terminal must be at least 80
columns wide and 12 lines long.

The B<afsmonitor> program must run in the foreground, and in its own
separate, dedicated window or terminal. The window or terminal is
unavailable for any other activity as long as the B<afsmonitor> program is
running. Any number of instances of the B<afsmonitor> program can run on a
single machine, as long as each instance runs in its own dedicated window
or terminal. Note that it can take up to three minutes to start an
additional instance.

=head1 OPTIONS

=over 4

=item B<initcmd>

Accommodates the command's use of the AFS command parser, and is optional.

=item B<-config> <I<file>>

Names the configuration file which lists the machines to monitor,
statistics to display, and threshold values, if any. A partial pathname is
interpreted relative to the current working directory. Provide this
argument when neither the B<-fshosts> argument nor the B<-cmhosts>
argument is provided. For instructions on creating this file, see the
preceding B<DESCRIPTION> section, and the section on the B<afsmonitor>
program in the I<OpenAFS Administration Guide>.

=item B<-frequency> <I<poll frequency>>

Specifies in seconds how often the B<afsmonitor> program probes the File
Servers and Cache Managers. Valid values range from C<1> to C<86400>
(which is 24 hours); the default value is C<60>. This frequency applies to
both File Servers and Cache Managers, but the B<afsmonitor> program
initiates the two types of probes, and processes their results,
separately. The actual interval between probes to a host is the probe
frequency plus the time required for all hosts to respond.

=item B<-output> <I<file>>

Names the file to which the B<afsmonitor> program writes all of the
statistics that it collects. By default, no output file is created. See
the section on the B<afsmonitor> command in the I<OpenAFS Administration
Guide> for information on this file.

=item B<-detailed>

Formats the information in the output file named by the B<-output>
argument in a maximally readable format. Provide the B<-output> argument
along with this one.

=item B<-fshosts> <I<host>>+

Names one or more machines from which to gather File Server
statistics. For each machine, provide either a fully qualified host name,
or an unambiguous abbreviation (the ability to resolve an abbreviation
depends on the state of the cell's name service at the time the command is
issued). This argument can be combined with the B<-cmhosts> argument, but
not with the B<-config> argument.

=item B<-cmhosts> <I<host>>+

Names one or more machines from which to gather Cache Manager
statistics. For each machine, provide either a fully qualified host name,
or an unambiguous abbreviation (the ability to resolve an abbreviation
depends on the state of the cell's name service at the time the command is
issued). This argument can be combined with the B<-fshosts> argument, but
not with the B<-config> argument.

=item B<-buffers> <I<slots>>

Is nonoperational and provided to accommodate potential future
enhancements to the program.

=item B<-debug> <I<debug output file>>

Turns on debugging output, and writes debugging information to the specified
file.

=item B<-help>

Prints the online help for this command. All other valid options are
ignored.

=item B<-version>

Prints the program version and then exits. All other valid options
are ignored.

=back
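
The option rules above can be checked before B<afsmonitor> is started. The
following Python function is a minimal, hypothetical sketch and is not part
of OpenAFS: the names C<check_options> and C<effective_interval> are
illustrative only. It encodes just the rules stated in this section (the
C<1> to C<86400> range for B<-frequency>, B<-detailed> requiring
B<-output>, and B<-fshosts> or B<-cmhosts> not being combined with
B<-config>), and it assumes that some source of hosts to monitor is always
required.

```python
"""Hypothetical pre-flight check for an afsmonitor argument set.

Illustrates the option rules from the manual page; it is not part of
OpenAFS and does not run afsmonitor itself.
"""

MIN_FREQUENCY = 1        # seconds
MAX_FREQUENCY = 86400    # seconds (24 hours)
DEFAULT_FREQUENCY = 60   # seconds


def check_options(config=None, fshosts=(), cmhosts=(),
                  frequency=DEFAULT_FREQUENCY, output=None, detailed=False):
    """Return a list of problems; an empty list means the set looks valid."""
    problems = []
    has_hosts = bool(fshosts) or bool(cmhosts)
    if config and has_hosts:
        problems.append("-fshosts/-cmhosts cannot be combined with -config")
    if not config and not has_hosts:
        # Assumption: some source of hosts to monitor is required.
        problems.append("provide -config, or -fshosts and/or -cmhosts")
    if not MIN_FREQUENCY <= frequency <= MAX_FREQUENCY:
        problems.append(
            f"-frequency must be {MIN_FREQUENCY}-{MAX_FREQUENCY}, got {frequency}")
    if detailed and not output:
        problems.append("-detailed requires -output")
    return problems


def effective_interval(frequency, response_seconds):
    """Actual time between probes to a host: the probe frequency plus the
    time it took for all hosts to respond."""
    return frequency + response_seconds
```

For example, C<check_options(fshosts=["fs1.example.com"], detailed=True)>
reports only that B<-detailed> requires B<-output>, and with the default
frequency of 60 seconds and hosts that take 2.5 seconds to respond,
C<effective_interval(60, 2.5)> gives 62.5 seconds between probes.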

=head1 OUTPUT

The B<afsmonitor> program displays its data on three screens:

=over 4

=item System Overview

This screen appears automatically when the B<afsmonitor> program
initializes. It summarizes separately for File Servers and Cache Managers
the number of machines being monitored and how many of them have I<alerts>
(statistics that have exceeded their thresholds). It then lists the
hostname and number of alerts for each machine being monitored, indicating,
if appropriate, that a process failed to respond to the last probe.

=item File Servers

This screen displays File Server statistics for each file server machine
being monitored. It highlights statistics that have exceeded their
thresholds, and identifies machines that failed to respond to the last
probe.

=item Cache Managers

This screen displays Cache Manager statistics for each client machine
being monitored. It highlights statistics that have exceeded their
thresholds, and identifies machines that failed to respond to the last
probe.

=back

Fields at the corners of every screen display the following information:

=over 4

=item *

In the top left corner, the program name and version number.

=item *

In the top right corner, the screen name, current and total page numbers,
and current and total column numbers. The page number (for example, C<p. 1
of 3>) indicates the index of the current page and the total number of
(vertical) pages over which data is displayed. The column number (for
example, C<c. 1 of 235>) indicates the index of the current leftmost
column and the total number of columns in which data appears. (The symbol
C<<<< >>> >>>> indicates that there is additional data to the right; the
symbol C<<<< <<< >>>> indicates that there is additional data to the
left.)

=item *

In the bottom left corner, a list of the available commands. Enter the
first letter in the command name to run that command. Only the currently
possible options appear; for example, if there is only one page of data,
the C<next> and C<prev> commands, which scroll the screen down and up
respectively, do not appear. For descriptions of the commands, see the
following section about navigating the display screens.

=item *

In the bottom right corner, the C<probes> field reports how many times the
program has probed File Servers (C<fs>), Cache Managers (C<cm>), or
both. The counts for File Servers and Cache Managers can differ. The
C<freq> field reports how often the program sends probes.

=back

=head2 Navigating the afsmonitor Display Screens

As noted, the lower left corner of every display screen lists the names
of the commands currently available for moving to alternate screens,
which can show either a different type of data or more statistics or
machines of the current type. To execute a command, press the lowercase
version of the first letter in its name. Some commands also have an
uppercase version that has a somewhat different effect, as indicated in
the following list.

=over 4

=item C<cm>

Switches to the C<Cache Managers> screen. Available only on the C<System
Overview> and C<File Servers> screens.

=item C<fs>

Switches to the C<File Servers> screen. Available only on the C<System
Overview> and the C<Cache Managers> screens.

=item C<left>

Scrolls horizontally to the left, to access the data columns situated to
the left of the current set. Available when the C<<<< <<< >>>> symbol
appears at the top left of the screen. Press uppercase C<L> to scroll
horizontally all the way to the left (to display the first set of data
columns).

=item C<next>

Scrolls down vertically to the next page of machine names. Available when
there are two or more pages of machines and the final page is not
currently displayed. Press uppercase C<N> to scroll to the final page.

=item C<oview>

Switches to the C<System Overview> screen. Available only on the C<Cache
Managers> and C<File Servers> screens.

=item C<prev>

Scrolls up vertically to the previous page of machine names. Available
when there are two or more pages of machines and the first page is not
currently displayed. Press uppercase C<P> to scroll to the first page.

=item C<right>

Scrolls horizontally to the right, to access the data columns situated to
the right of the current set. This command is available when the C<<<< >>>
>>>> symbol appears at the upper right of the screen. Press uppercase C<R>
to scroll horizontally all the way to the right (to display the final set
of data columns).

=back
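
The availability rules in the list above can be modeled as a small
function. This is a hypothetical Python sketch, not the program's actual
implementation: the function name, its parameters, and the representation
of screens as strings are assumptions, and it models only the conditions
stated in this section (the uppercase variants are not modeled).

```python
# Hypothetical model of which navigation commands afsmonitor lists on a
# screen; not taken from the OpenAFS source.

OVERVIEW = "System Overview"
FILE_SERVERS = "File Servers"
CACHE_MANAGERS = "Cache Managers"

# Each command is run by pressing the lowercase first letter of its name.
KEY_FOR_COMMAND = {name: name[0] for name in
                   ("cm", "fs", "left", "next", "oview", "prev", "right")}


def available_commands(screen, page, total_pages,
                       first_column=1, total_columns=0, visible_columns=0):
    """Return the command names listed for the given display state.

    page and first_column are 1-based, as in the "p. 1 of 3" and
    "c. 1 of 235" indicators.
    """
    commands = []
    if screen in (OVERVIEW, FILE_SERVERS):
        commands.append("cm")
    if screen in (OVERVIEW, CACHE_MANAGERS):
        commands.append("fs")
    if first_column > 1:
        commands.append("left")            # data exists to the left
    if page < total_pages:
        commands.append("next")            # final page not displayed
    if screen != OVERVIEW:
        commands.append("oview")
    if page > 1:
        commands.append("prev")            # first page not displayed
    if first_column + visible_columns - 1 < total_columns:
        commands.append("right")           # data exists to the right
    return commands
```

For example, on page 1 of 3 of the C<File Servers> screen, showing columns
1 through 7 of 235, the sketch returns C<cm>, C<next>, C<oview>, and
C<right>.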

=head2 The System Overview Screen

The C<System Overview> screen appears automatically as the B<afsmonitor>
program initializes. This screen displays the status of as many File
Server and Cache Manager processes as can fit in the current window;
scroll down to access additional information.

The information on this screen is split into File Server information on
the left and Cache Manager information on the right. The header for each
grouping reports two pieces of information:

=over 4

=item *

The number of machines on which the program is monitoring the indicated
process.

=item *

The number of alerts and the number of machines affected by them (an
I<alert> means that a statistic has exceeded its threshold or a process
failed to respond to the last probe).

=back

A list of the machines being monitored follows. If there are any alerts on
a machine, the number of them appears in square brackets to the left of
the hostname. If a process failed to respond to the last probe, the
letters C<PF> (probe failure) appear in square brackets to the left of the
hostname.
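
The alert bookkeeping described above can be sketched as follows. This is
a hypothetical Python illustration, not OpenAFS code: C<HostSample>, the
threshold dictionary, and the exact label spacing are assumptions. It also
assumes that a probe failure counts as one alert on the affected machine,
and that an unresponsive machine is labeled C<[PF]> instead of with a
count.

```python
# Hypothetical sketch of System Overview alert bookkeeping; names and
# formatting are illustrative, not taken from afsmonitor's source.
from dataclasses import dataclass, field


@dataclass
class HostSample:
    host: str
    probe_ok: bool                                # False: no response
    values: dict = field(default_factory=dict)    # statistic -> value


def exceeded(sample, thresholds):
    """Names of statistics whose value exceeds the configured threshold."""
    if not sample.probe_ok:
        return []
    return [name for name, limit in thresholds.items()
            if name in sample.values and sample.values[name] > limit]


def alert_count(sample, thresholds):
    # Assumption: a probe failure is counted as a single alert.
    return 1 if not sample.probe_ok else len(exceeded(sample, thresholds))


def overview_label(sample, thresholds):
    """Hostname, prefixed by [PF] or by the alert count in brackets."""
    if not sample.probe_ok:
        return f"[PF] {sample.host}"
    count = alert_count(sample, thresholds)
    return f"[{count}] {sample.host}" if count else sample.host


def group_header(samples, thresholds):
    """(machines monitored, total alerts, machines affected by alerts)."""
    counts = [alert_count(s, thresholds) for s in samples]
    return len(samples), sum(counts), sum(1 for c in counts if c)
```

Because there are no default thresholds, an empty threshold dictionary
produces no statistic alerts; only probe failures are then reported.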

=head2 The File Servers Screen

The C<File Servers> screen displays the values collected at the most
recent probe for File Server statistics.

A summary line at the top of the screen (just below the standard program
version and screen title blocks) specifies the number of monitored File
Servers, the number of alerts, and the number of machines affected by the
alerts.

The first column always displays the hostnames of the machines running the
monitored File Servers.

To the right of the hostname column appear as many columns of statistics
as can fit within the current width of the display screen or window; each
column requires space for 10 characters. The name of the statistic appears
at the top of each column. If the File Server on a machine did not respond
to the most recent probe, a pair of dashes (C<-->) appears in each
column. If a value exceeds its configured threshold, it is highlighted in
reverse video. If a value is too large to fit into the allotted column
width, it overflows into the next row in the same column.
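
The column arithmetic above can be illustrated with a short sketch. It is
a hypothetical Python illustration, not the program's layout code: the
10-character column width comes from the text, but the hostname column
width passed in is an assumption, as is modeling the overflow by splitting
an oversized value into consecutive 10-character pieces.

```python
# Hypothetical illustration of the File Servers / Cache Managers screen
# layout rules; the hostname column width is an assumed parameter.

COLUMN_WIDTH = 10  # each statistic column needs space for 10 characters


def visible_stat_columns(screen_width, hostname_width):
    """How many statistic columns fit to the right of the hostname column."""
    return max(0, (screen_width - hostname_width) // COLUMN_WIDTH)


def cell_rows(value, probe_ok=True, width=COLUMN_WIDTH):
    """Rows a value occupies in its column: "--" for a failed probe,
    otherwise the value split into width-sized pieces (overflow rows)."""
    if not probe_ok:
        return ["--"]
    text = str(value)
    return [text[i:i + width] for i in range(0, len(text), width)] or [""]


def column_indicator(first_column, total_columns):
    """Top-right column indicator, for example "c. 1 of 235"."""
    return f"c. {first_column} of {total_columns}"
```

On the minimum 80-column window, with an assumed 20-character hostname
column, C<visible_stat_columns(80, 20)> yields six statistic columns.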

=head2 The Cache Managers Screen

The C<Cache Managers> screen displays the values collected at the most
recent probe for Cache Manager statistics.

A summary line at the top of the screen (just below the standard program
version and screen title blocks) specifies the number of monitored Cache
Managers, the number of alerts, and the number of machines affected by the
alerts.

The first column always displays the hostnames of the machines running the
monitored Cache Managers.

To the right of the hostname column appear as many columns of statistics
as can fit within the current width of the display screen or window; each
column requires space for 10 characters. The name of the statistic appears
at the top of each column. If the Cache Manager on a machine did not
respond to the most recent probe, a pair of dashes (C<-->) appears in each
column. If a value exceeds its configured threshold, it is highlighted in
reverse video. If a value is too large to fit into the allotted column
width, it overflows into the next row in the same column.

=head2 Writing to an Output File

Include the B<-output> argument to name the file into which the
B<afsmonitor> program writes all of the statistics it collects. The
output file can be useful for tracking performance over long periods of
time, and enables the administrator to apply post-processing techniques
that reveal system trends. The AFS distribution does not include any
post-processing programs.

The output file is in ASCII format and records the same information as the
C<File Servers> and C<Cache Managers> display screens. Each line in the
file records, in the following format, the time at which the
B<afsmonitor> program gathered the indicated statistic from the Cache
Manager (C<CM>) or File Server (C<FS>) running on the machine called
I<host_name>. If a probe failed, the error code C<-1> appears in the
I<statistic> field.

    <time> <host_name> CM|FS <statistic>

If the administrator usually reviews the output file manually, rather than
using it as input to an automated analysis program or script, including
the B<-detailed> flag formats the data in a more easily readable form.
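
Because each output line follows the format shown above, a post-processing
script can split lines on the C<CM> or C<FS> marker. The following Python
sketch is hypothetical and makes assumptions beyond this page: that fields
are whitespace-separated, that the I<time> field may itself contain spaces
(so the host is taken as the token immediately before the marker), and
that lines without a marker can simply be skipped.

```python
# Hypothetical parser for afsmonitor output lines of the form
#   <time> <host_name> CM|FS <statistic>
# Whitespace separation and a possibly multi-word <time> are assumptions.
from collections import Counter
from typing import NamedTuple, Optional


class Record(NamedTuple):
    time: str
    host: str
    kind: str          # "CM" (Cache Manager) or "FS" (File Server)
    statistic: str
    probe_failed: bool


def parse_line(line: str) -> Optional[Record]:
    tokens = line.split()
    for i, token in enumerate(tokens):
        # Require a time and a host before the marker, and a statistic after.
        if token in ("CM", "FS") and i >= 2 and i + 1 < len(tokens):
            statistic = " ".join(tokens[i + 1:])
            return Record(time=" ".join(tokens[:i - 1]), host=tokens[i - 1],
                          kind=token, statistic=statistic,
                          probe_failed=(statistic == "-1"))
    return None  # not a record line


def probe_failures(lines):
    """Count failed probes (statistic field -1) per (host, kind)."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and record.probe_failed:
            counts[(record.host, record.kind)] += 1
    return counts
```

Counting probe failures per host over a long run is one simple way to spot
machines that repeatedly miss probes.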

=head1 EXAMPLES

For examples of commands, display screens, and configuration files, see
the section about the B<afsmonitor> program in the I<OpenAFS
Administration Guide>.

=head1 PRIVILEGE REQUIRED

None

=head1 SEE ALSO

L<afsmonitor(5)>,
L<fstrace(8)>,
L<scout(1)>

The I<OpenAFS Administration Guide> at
L<http://docs.openafs.org/AdminGuide/>.

=head1 COPYRIGHT

IBM Corporation 2000. <http://www.ibm.com/> All Rights Reserved.

This documentation is covered by the IBM Public License Version 1.0. It was
converted from HTML to POD by software written by Chas Williams and Russ
Allbery, based on work by Alf Wachsmann and Elizabeth Cassell.