1 <?xml version="1.0" encoding="UTF-8"?>
2 <chapter id="HDRWQ323">
3 <title>Monitoring and Auditing AFS Performance</title>
4
5 <para>
6 <indexterm>
7 <primary>scout program</primary>
8 </indexterm>
9
10 <indexterm>
11 <primary>monitoring</primary>
12
13 <secondary>file server processes with scout</secondary>
14 </indexterm>
15
16 <indexterm>
17 <primary>monitoring</primary>
18
19 <secondary>file server processes with afsmonitor</secondary>
20 </indexterm>
21
22 <indexterm>
23 <primary>monitoring</primary>
24
25 <secondary>Cache Manager processes with afsmonitor</secondary>
26 </indexterm>
27
28 <indexterm>
29 <primary>monitoring</primary>
30
31 <secondary>Cache Manager performance</secondary>
32 </indexterm>
33
34 <indexterm>
35 <primary>Cache Manager</primary>
36
37 <secondary>monitoring performance</secondary>
38 </indexterm>
39
40 <indexterm>
41 <primary>client machine</primary>
42
43 <secondary>monitoring performance</secondary>
44 </indexterm>
45
46 <indexterm>
47 <primary>file system</primary>
48
49 <secondary>monitoring activity</secondary>
50 </indexterm>
51
52 AFS comes with three main monitoring tools: <itemizedlist>
53 <listitem>
54 <para>The <emphasis role="bold">scout</emphasis> program, which monitors and gathers statistics on File Server
55 performance.</para>
56 </listitem>
57
58 <listitem>
59 <para>The <emphasis role="bold">fstrace</emphasis> command suite, which traces Cache Manager operations in detail.</para>
60 </listitem>
61
62 <listitem>
63 <para>The <emphasis role="bold">afsmonitor</emphasis> program, which monitors and gathers statistics on both the File Server
64 and the Cache Manager.</para>
65 </listitem>
66 </itemizedlist></para>
67
68 <para>AFS also provides a tool for auditing AFS events on file server machines running AIX.</para>
69
70 <sect1 id="HDRWQ324">
71 <title>Summary of Instructions</title>
72
73 <para>This chapter explains how to perform the following tasks by using the indicated commands:</para>
74
75 <informaltable frame="none">
76 <tgroup cols="2">
77 <colspec colwidth="70*" />
78
79 <colspec colwidth="30*" />
80
81 <tbody>
82 <row>
83 <entry>Initialize the <emphasis role="bold">scout</emphasis> program</entry>
84
85 <entry><emphasis role="bold">scout</emphasis></entry>
86 </row>
87
88 <row>
89 <entry>Display information about a trace log</entry>
90
91 <entry><emphasis role="bold">fstrace lslog</emphasis></entry>
92 </row>
93
94 <row>
95 <entry>Display information about an event set</entry>
96
97 <entry><emphasis role="bold">fstrace lsset</emphasis></entry>
98 </row>
99
100 <row>
101 <entry>Change the size of a trace log</entry>
102
103 <entry><emphasis role="bold">fstrace setlog</emphasis></entry>
104 </row>
105
106 <row>
107 <entry>Set the state of an event set</entry>
108
109 <entry><emphasis role="bold">fstrace setset</emphasis></entry>
110 </row>
111
112 <row>
113 <entry>Dump contents of a trace log</entry>
114
115 <entry><emphasis role="bold">fstrace dump</emphasis></entry>
116 </row>
117
118 <row>
119 <entry>Clear a trace log</entry>
120
121 <entry><emphasis role="bold">fstrace clear</emphasis></entry>
122 </row>
123
124 <row>
125 <entry>Initialize the <emphasis role="bold">afsmonitor</emphasis> program</entry>
126
127 <entry><emphasis role="bold">afsmonitor</emphasis></entry>
128 </row>
129 </tbody>
130 </tgroup>
131 </informaltable>
132 </sect1>
133
134 <sect1 id="HDRWQ326">
135 <title>Using the scout Program</title>
136
137 <indexterm>
138 <primary>scout program</primary>
139
140 <secondary>features summarized</secondary>
141 </indexterm>
142
143 <para>The <emphasis role="bold">scout</emphasis> program monitors the status of the File Server process running on file server
144 machines. It periodically collects statistics from a specified set of File Server processes, displays them in a graphical
145 format, and alerts you if any of the statistics exceed a configurable threshold.</para>
146
147 <para>More specifically, the <emphasis role="bold">scout</emphasis> program includes the following features. <itemizedlist>
148 <listitem>
149 <para>You can monitor, from a single location, the File Server process on any number of server machines from the local and
150 foreign cells. The number is limited only by the size of the display window, which must be large enough to display the
151 statistics.</para>
152 </listitem>
153
154 <listitem>
155 <para>You can set a threshold for many of the statistics. When the value of a statistic exceeds the threshold, the
156 <emphasis role="bold">scout</emphasis> program highlights it (displays it in reverse video) to draw your attention to it.
157 If the value goes back under the threshold, the highlighting is deactivated. You control the thresholds, so highlighting
158 reflects what you consider to be a noteworthy situation. See <link linkend="HDRWQ332">Highlighting Significant
159 Statistics</link>.</para>
160 </listitem>
161
162 <listitem>
163 <para>The <emphasis role="bold">scout</emphasis> program alerts you to File Server process, machine, and network outages
164 by highlighting the name of each machine that does not respond to its probe, enabling you to respond more quickly.</para>
165 </listitem>
166
167 <listitem>
168 <para>You can set how often the <emphasis role="bold">scout</emphasis> program collects statistics from the File Server
169 processes.</para>
170 </listitem>
171 </itemizedlist></para>
172
173 <sect2 id="HDRWQ327">
174 <title>System Requirements</title>
175
176 <indexterm>
177 <primary>scout program</primary>
178
179 <secondary>requirements</secondary>
180 </indexterm>
181
182 <indexterm>
183 <primary>requirements</primary>
184
185 <secondary>scout program</secondary>
186 </indexterm>
187
188 <indexterm>
189 <primary>curses graphics utility</primary>
190
191 <secondary>scout program requirements</secondary>
192 </indexterm>
193
194 <indexterm>
195 <primary>scout program</primary>
196
197 <secondary>setting terminal type</secondary>
198 </indexterm>
199
200 <indexterm>
201 <primary>setting</primary>
202
203 <secondary>terminal type for scout</secondary>
204 </indexterm>
205
206 <indexterm>
207 <primary>terminal type</primary>
208
209 <secondary>setting for scout program</secondary>
210 </indexterm>
211
212 <indexterm>
213 <primary>dumb terminal</primary>
214
215 <secondary>use in scout program</secondary>
216 </indexterm>
217
218 <para>The <emphasis role="bold">scout</emphasis> program runs on any AFS client machine that has access to the <emphasis
219 role="bold">curses</emphasis> graphics package, which most UNIX distributions include as a standard utility. It can run on
220 both dumb terminals and under windowing systems that emulate terminals, but the output looks best on machines that support
221 reverse video and cursor addressing. For best results, set the TERM environment variable to the correct terminal type, or one
222 with characteristics similar to the actual ones. For machines running AIX, the recommended TERM setting is <emphasis
223 role="bold">vt100</emphasis>, assuming the terminal is similar to that. For other operating systems, the wider range of
224 acceptable values includes <emphasis role="bold">xterm</emphasis>, <emphasis role="bold">xterms</emphasis>, <emphasis
225 role="bold">vt100</emphasis>, <emphasis role="bold">vt200</emphasis>, and <emphasis role="bold">wyse85</emphasis>.</para>
226
227 <indexterm>
228 <primary>privilege</primary>
229
230 <secondary>required for scout program</secondary>
231 </indexterm>
232
233 <para>No privilege is required to run the <emphasis role="bold">scout</emphasis> program, so any user who can access the
234 directory where its binary resides (the <emphasis role="bold">/usr/afsws/bin</emphasis> directory in the conventional
235 configuration) can use it. The program's probes for collecting statistics do not impose a significant burden on the File
236 Server process, but you can restrict its use by placing the binary file in a directory with a more restrictive access control
237 list (ACL).</para>
238
239 <para>Multiple instances of the <emphasis role="bold">scout</emphasis> program can run on a single client machine, each over
240 its own dedicated connection (in its own window). It must run in the foreground, so the window in which it runs does not
241 accept further input except for an interrupt signal.</para>
242
243 <para>You can also run the <emphasis role="bold">scout</emphasis> program on several machines and view its output on a single
244 machine, by opening telnet connections to the other machines from the central one and initializing the program in each remote
245 window. In this case, you can include the <emphasis role="bold">-host</emphasis> flag to the <emphasis
246 role="bold">scout</emphasis> command to make the name of each remote machine appear in the <emphasis>banner line</emphasis> at
247 the top of the window displaying its output. See <link linkend="HDRWQ330">The Banner Line</link>.</para>
248 </sect2>
249
250 <sect2 id="HDRWQ328">
251 <title>Using the -basename argument to Specify a Domain Name</title>
252
253 <indexterm>
254 <primary>scout program</primary>
255
256 <secondary>basename</secondary>
257 </indexterm>
258
259 <indexterm>
260 <primary>basenames in scout program</primary>
261 </indexterm>
262
263 <para>As previously mentioned, the <emphasis role="bold">scout</emphasis> program can monitor the File Server process on any
264 number of file server machines. If all of the machines belong to the same cell, then their hostnames probably all have the
265 same domain name suffix, such as <emphasis role="bold">example.com</emphasis> in the Example Corporation cell. In this case, you can
266 use the <emphasis role="bold">-basename</emphasis> argument to the <emphasis role="bold">scout</emphasis> command, which has
267 several advantages: <itemizedlist>
268 <listitem>
269 <para>You can omit the domain name suffix as you enter each file server machine's name on the command line. The
270 <emphasis role="bold">scout</emphasis> program automatically appends the domain name to each machine's name, resulting
271 in a fully-qualified hostname. You can omit the domain name suffix even when you don't include the <emphasis
272 role="bold">-basename</emphasis> argument, but in that case correct resolution of the name depends on the state of your
273 cell's naming service at the time of connection.</para>
274 </listitem>
275
276 <listitem>
277 <para>The machine names are more likely to fit in the appropriate column of the display without having to be truncated
278 (for more on truncating names in the display column, see <link linkend="HDRWQ331">The Statistics Display
279 Region</link>).</para>
280 </listitem>
281
282 <listitem>
283 <para>The domain name appears in the banner line at the top of the display window to indicate the name of the cell you
284 are monitoring.</para>
285 </listitem>
286 </itemizedlist></para>
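<para>As a rough illustration of the name expansion described in the preceding list, the following shell sketch joins each short machine name to a domain suffix with a period. The function name <computeroutput>expand_names</computeroutput> and the machine names <emphasis role="bold">fs1</emphasis> and <emphasis role="bold">fs2</emphasis> are hypothetical, and the sketch is not the <emphasis role="bold">scout</emphasis> program's own code.</para>

<programlisting>
```shell
# Sketch of the expansion attributed to -basename: join each short
# machine name to the domain suffix with a period.
# fs1, fs2 and example.com are placeholder values.
expand_names() {
    basename_suffix=$1
    shift
    for short_name in "$@"; do
        printf '%s.%s\n' "$short_name" "$basename_suffix"
    done
}

expand_names example.com fs1 fs2

# Under this reading, the two command lines below name the same servers:
#   scout -server fs1 fs2 -basename example.com
#   scout -server fs1.example.com fs2.example.com
```
</programlisting>

<para>The first form also keeps the names short enough to fit the machine name column and puts the domain name in the banner line.</para>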
287 </sect2>
288
289 <sect2 id="HDRWQ329">
290 <title>The Layout of the scout Display</title>
291
292 <indexterm>
293 <primary>scout program</primary>
294
295 <secondary>display layout</secondary>
296 </indexterm>
297
298 <indexterm>
299 <primary>display layout in scout program window</primary>
300 </indexterm>
301
302 <para>The <emphasis role="bold">scout</emphasis> program can display statistics either in a dedicated window or on a plain
303 screen if a windowing environment is not available. For best results, use a window or screen that can print in reverse video
304 and do cursor addressing.</para>
305
306 <para>The <emphasis role="bold">scout</emphasis> program screen has three main regions: the <emphasis>banner line</emphasis>,
307 the <emphasis>statistics display region</emphasis> and the <emphasis>probe/message</emphasis> line. This section describes
308 their contents, and graphic examples appear in <link linkend="HDRWQ336">Example Commands and Displays</link>.</para>
309
310 <sect3 id="HDRWQ330">
311 <title>The Banner Line</title>
312
313 <indexterm>
314 <primary>scout program</primary>
315
316 <secondary>banner line</secondary>
317 </indexterm>
318
319 <indexterm>
320 <primary>banner line on the scout program screen</primary>
321 </indexterm>
322
323 <para>By default, the string <computeroutput>scout</computeroutput> appears in the banner line at the top of the window or
324 screen, to indicate that the <emphasis role="bold">scout</emphasis> program is running. You can display two additional types
325 of information by including the appropriate option on the command line: <itemizedlist>
326 <listitem>
327 <para>Include the <emphasis role="bold">-host</emphasis> flag to display the local machine's name in the banner line.
328 This is particularly useful when you are running the <emphasis role="bold">scout</emphasis> program on several
329 machines but displaying the results on a single machine.</para>
330
331 <para>For example, the following banner line appears when you run the <emphasis role="bold">scout</emphasis> program
332 on the machine <emphasis role="bold">client1.example.com</emphasis> and use the <emphasis role="bold">-host</emphasis>
333 flag:</para>
334
335 <programlisting>
336 [client1.example.com] scout
337 </programlisting>
338 </listitem>
339
340 <listitem>
341 <para>Include the <emphasis role="bold">-basename</emphasis> argument to display the specified cell domain name in the
342 banner line. For further discussion, see <link linkend="HDRWQ328">Using the -basename argument to Specify a Domain
343 Name</link>.</para>
344
345 <para>For example, if you specify a value of <emphasis role="bold">example.com</emphasis> for the <emphasis
346 role="bold">-basename</emphasis> argument, the banner line reads:</para>
347
348 <programlisting>
349 scout for example.com
350 </programlisting>
351 </listitem>
352 </itemizedlist></para>
353 </sect3>
354
355 <sect3 id="HDRWQ331">
356 <title>The Statistics Display Region</title>
357
358 <indexterm>
359 <primary>scout program</primary>
360
361 <secondary>statistics displayed</secondary>
362 </indexterm>
363
364 <indexterm>
365 <primary>statistics display by scout program</primary>
366 </indexterm>
367
368 <para>The statistics display region occupies most of the window and is divided into six columns. The following list
369 describes them as they appear from left to right in the window. <variablelist>
370 <varlistentry>
371 <term><computeroutput>Conn</computeroutput></term>
372
373 <listitem>
374 <indexterm>
375 <primary>Conn statistic from scout program</primary>
376 </indexterm>
377
378 <para>Displays the number of RPC connections open between the File Server process and client machines. This number
379 normally equals or exceeds the number in the fourth (<computeroutput>Ws</computeroutput>) column. It can exceed the
380 number in that column because each user on the machine can have more than one connection open at once, and one
381 client machine can handle several users.</para>
382 </listitem>
383 </varlistentry>
384
385 <varlistentry>
386 <term><computeroutput>Fetch</computeroutput></term>
387
388 <listitem>
389 <indexterm>
390 <primary>Fetch statistic from scout program</primary>
391 </indexterm>
392
393 <para>Displays the number of fetch-type RPCs (fetch data, fetch access list, and fetch status) that the File Server
394 process has received from client machines since it started. It resets to zero when the File Server process
395 restarts.</para>
396 </listitem>
397 </varlistentry>
398
399 <varlistentry>
400 <term><computeroutput>Store</computeroutput></term>
401
402 <listitem>
403 <indexterm>
404 <primary>Store statistic from scout program</primary>
405 </indexterm>
406
407 <para>Displays the number of store-type RPCs (store data, store access list, and store status) that the File Server
408 process has received from client machines since it started. It resets to zero when the File Server process
409 restarts.</para>
410 </listitem>
411 </varlistentry>
412
413 <varlistentry>
414 <term><computeroutput>Ws</computeroutput></term>
415
416 <listitem>
417 <indexterm>
418 <primary>active</primary>
419
420 <secondary>clients statistic from scout program</secondary>
421 </indexterm>
422
423 <indexterm>
424 <primary>client machines statistic from scout program</primary>
425 </indexterm>
426
427 <indexterm>
428 <primary>Ws statistic from scout program</primary>
429 </indexterm>
430
431 <para>Displays the number of client machines (workstations) that have communicated with the File Server process
432 within the last 15 minutes (such machines are termed <emphasis>active</emphasis>). This number is likely to be
433 smaller than the number in the <computeroutput>Conn</computeroutput> column because a single client machine can
434 have several connections open to one File Server process.</para>
435 </listitem>
436 </varlistentry>
437
438 <varlistentry>
439 <term><emphasis role="bold">[Unlabeled column]</emphasis></term>
440
441 <listitem>
442 <para>Displays the name of the file server machine on which the File Server process is running. It is 12 characters
443 wide. Longer names are truncated and an asterisk (<computeroutput>*</computeroutput>) appears as the last character
444 in the name. If all machines have the same domain name suffix, you can use the <emphasis
445 role="bold">-basename</emphasis> argument to decrease the need for truncation; see <link linkend="HDRWQ328">Using
446 the -basename argument to Specify a Domain Name</link>.</para>
447 </listitem>
448 </varlistentry>
449
450 <varlistentry>
451 <term><computeroutput>Disk attn</computeroutput></term>
452
453 <listitem>
454 <indexterm>
455 <primary>disk partition</primary>
456
457 <secondary>monitoring usage of</secondary>
458 </indexterm>
459
460 <indexterm>
461 <primary>monitoring</primary>
462
463 <secondary>disk usage with scout program</secondary>
464 </indexterm>
465
466 <indexterm>
467 <primary>scout program</primary>
468
469 <secondary>monitoring disk usage</secondary>
470 </indexterm>
471
472 <indexterm>
473 <primary>Disk attn statistic from scout program</primary>
474 </indexterm>
475
476 <para>Displays the number of kilobyte blocks available on up to 26 of the file server machine's AFS server
477 (<emphasis role="bold">/vicep</emphasis>) partitions. The display for each partition has the following format:
478 <programlisting>
479 partition_letter:free_blocks
480 </programlisting></para>
481
482 <para>For example, <computeroutput>a:8949</computeroutput> indicates that partition <emphasis
483 role="bold">/vicepa</emphasis> has 8,949 KB free. If the window is not wide enough for all partition entries to
484 appear on a single line, the <emphasis role="bold">scout</emphasis> program automatically stacks the partition
485 entries into subcolumns within the sixth column.</para>
486
487 <para>The label on the <computeroutput>Disk attn</computeroutput> column indicates the threshold value at which
488 entries in the column become highlighted. By default, the <emphasis role="bold">scout</emphasis> program highlights
489 a partition that is over 95% full, in which case the label is as follows:</para>
490
491 <programlisting>
492 Disk attn: &gt; 95% used
493 </programlisting>
494
495 <para>For more on this threshold and its effect on highlighting, see <link linkend="HDRWQ332">Highlighting
496 Significant Statistics</link>.</para>
497 </listitem>
498 </varlistentry>
499 </variablelist></para>
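<para>If you copy entries from the <computeroutput>Disk attn</computeroutput> column into a script, the <computeroutput>partition_letter:free_blocks</computeroutput> format can be split with ordinary shell parameter expansion. The following is a minimal sketch; the function name <computeroutput>describe_entry</computeroutput> is hypothetical, and the sample entry is the one used in the column description.</para>

<programlisting>
```shell
# Split a Disk attn entry of the form partition_letter:free_blocks.
describe_entry() {
    entry=$1
    letter=${entry%%:*}    # text before the colon, for example "a"
    free_kb=${entry#*:}    # text after the colon, for example "8949"
    echo "/vicep$letter has $free_kb KB free"
}

describe_entry a:8949
```
</programlisting>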
500
501 <para>For all columns except the fifth (file server machine name), you can use the <emphasis
502 role="bold">-attention</emphasis> argument to set a threshold value above which the <emphasis role="bold">scout</emphasis>
503 program highlights the statistic. By default, only values in the fifth and sixth columns ever become highlighted. For
504 instructions on using the <emphasis role="bold">-attention</emphasis> argument, see <link linkend="HDRWQ332">Highlighting
505 Significant Statistics</link>.</para>
506 </sect3>
507
508 <sect3 id="Header_368">
509 <title>The Probe Reporting Line</title>
510
511 <indexterm>
512 <primary>scout program</primary>
513
514 <secondary>probe reporting line</secondary>
515 </indexterm>
516
517 <indexterm>
518 <primary>message line in scout program display</primary>
519 </indexterm>
520
521 <para>The bottom line of the display indicates how many times the <emphasis role="bold">scout</emphasis> program has probed
522 the File Server processes for statistics. The statistics gathered in the latest probe appear in the statistics display
523 region. By default, the <emphasis role="bold">scout</emphasis> program probes the File Servers every 60 seconds, but you can
524 use the <emphasis role="bold">-frequency</emphasis> argument to specify a different probe frequency.</para>
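<para>To get a feel for the effect of a given <emphasis role="bold">-frequency</emphasis> value, you can work out how many probes each File Server process receives per hour. The sketch below does only that arithmetic; the function name <computeroutput>probes_per_hour</computeroutput> and the value 30 are illustrative assumptions, not recommendations.</para>

<programlisting>
```shell
# Probes sent to each File Server process in one hour for a given
# -frequency value (the number of seconds between probes).
probes_per_hour() {
    echo $((3600 / $1))
}

probes_per_hour 60    # the default interval
probes_per_hour 30    # a placeholder shorter interval

# A matching command line would look like:
#   scout -server fs1.example.com -frequency 30
```
</programlisting>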
525 </sect3>
526 </sect2>
527
528 <sect2 id="HDRWQ332">
529 <title>Highlighting Significant Statistics</title>
530
531 <indexterm>
532 <primary>scout program</primary>
533
534 <secondary>highlighting in</secondary>
535 </indexterm>
536
537 <indexterm>
538 <primary>highlighting statistics in scout display</primary>
539
540 <secondary>use of reverse video</secondary>
541 </indexterm>
542
543 <indexterm>
544 <primary>scout program</primary>
545
546 <secondary>reverse video</secondary>
547 </indexterm>
548
549 <indexterm>
550 <primary>reverse video</primary>
551
552 <secondary>use in scout program display</secondary>
553 </indexterm>
554
555 <para>To draw your attention to a statistic that currently exceeds a threshold value, the <emphasis
556 role="bold">scout</emphasis> program displays it in reverse video (highlights it). You can set the threshold value for most
557 statistics, and so determine which values are worthy of special attention and which are normal.</para>
558
559 <sect3 id="HDRWQ333">
560 <title>Highlighting Server Outages</title>
561
562 <indexterm>
563 <primary>outages</primary>
564
565 <secondary>monitoring with scout program</secondary>
566 </indexterm>
567
568 <indexterm>
569 <primary>scout program</primary>
570
571 <secondary>outages, monitoring</secondary>
572 </indexterm>
573
574 <indexterm>
575 <primary>monitoring</primary>
576
577 <secondary>outages with scout program</secondary>
578 </indexterm>
579
580 <indexterm>
581 <primary>File Server</primary>
582
583 <secondary>monitoring with scout program</secondary>
584 </indexterm>
585
586 <indexterm>
587 <primary>file server machine</primary>
588
589 <secondary>monitoring outages of</secondary>
590 </indexterm>
591
592 <para>The only column in which you cannot control highlighting is the fifth, which identifies the file server machine for
593 which statistics are displayed in the other columns. The <emphasis role="bold">scout</emphasis> program uses highlighting in
594 this column to indicate that the File Server process on a machine fails to respond to its probe, and automatically blanks
595 out the other columns. Failure to respond to the probe can indicate a File Server process, file server machine, or network
596 outage, so the highlighting draws your attention to a situation that is probably interrupting service to users.</para>
597
598 <para>When the File Server process once again responds to the probes, its name appears normally and statistics reappear in
599 the other columns. If all machine names become highlighted at once, a possible network outage has disrupted the connection
600 between the file server machines and the client machine running the <emphasis role="bold">scout</emphasis> program.</para>
601 </sect3>
602
603 <sect3 id="Header_371">
604 <title>Highlighting for Extreme Statistic Values</title>
605
606 <para>To set the threshold value for one or more of the five statistics-displaying columns, use the <emphasis
607 role="bold">-attention</emphasis> argument. The threshold value applies to all File Server processes you are monitoring (you
608 cannot set different thresholds for different machines). For details, see the syntax description in <link
609 linkend="HDRWQ335">To start the scout program</link>.</para>
610
611 <para>It is not possible to change the threshold values for a running <emphasis role="bold">scout</emphasis> program. Stop
612 the current program and start a new one. Also, the <emphasis role="bold">scout</emphasis> program does not retain threshold
613 values across restarts, so you must specify all thresholds every time you start the program.</para>
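<para>Because thresholds must be supplied again at every start, one option is to keep them in a small wrapper script. The following sketch is only an illustration: the variable <computeroutput>ATTENTION</computeroutput>, the function <computeroutput>start_scout</computeroutput>, the threshold numbers, and the machine names are all hypothetical. It prints the command line rather than running it.</para>

<programlisting>
```shell
# Site-chosen thresholds kept in one place; the numbers are placeholders.
ATTENTION="conn 100 ws 50"

start_scout() {
    # ATTENTION is intentionally unquoted so that it splits into words.
    # "echo" makes this a dry run; remove it to start scout for real.
    echo scout -server "$@" -attention $ATTENTION
}

start_scout fs1.example.com fs2.example.com
```
</programlisting>

<para>To change a threshold, edit the variable, stop the running program, and start it again through the wrapper.</para>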
614 </sect3>
615 </sect2>
616
617 <sect2 id="HDRWQ334">
618 <title>Resizing the scout Display</title>
619
620 <indexterm>
621 <primary>scout program</primary>
622
623 <secondary>display, resizing</secondary>
624 </indexterm>
625
626 <indexterm>
627 <primary>window</primary>
628
629 <secondary>resizing scout display</secondary>
630 </indexterm>
631
632 <indexterm>
633 <primary>resizing</primary>
634
635 <secondary>scout display</secondary>
636 </indexterm>
637
638 <para>Do not resize the display window while the <emphasis role="bold">scout</emphasis> program is running. Increasing the
639 size does no harm, but the <emphasis role="bold">scout</emphasis> program does not necessarily adjust to the new dimensions.
640 Decreasing the display's width can disturb column alignment, making the display harder to read. With any type of resizing, the
641 <emphasis role="bold">scout</emphasis> program does not adjust the display in any way until it displays the results of the
642 next probe.</para>
643
644 <para>To resize the display effectively, stop the <emphasis role="bold">scout</emphasis> program, resize the window and then
645 restart the program. Even in this case, the <emphasis role="bold">scout</emphasis> program's response depends on the accuracy
646 of the information it receives from the display environment. Testing during development has shown that the display environment
647 does not reliably provide information about window resizing. If you use the X windowing system, issuing the following sequence
648 of commands before starting the <emphasis role="bold">scout</emphasis> program (or placing them in the shell initialization
649 file) sometimes makes it adjust properly to resizing.</para>
650
651 <programlisting>
652 % <emphasis role="bold">set noglob</emphasis>
653 % <emphasis role="bold">eval `/usr/bin/X11/resize`</emphasis>
654 % <emphasis role="bold">unset noglob</emphasis>
655 </programlisting>
656
657 <indexterm>
658 <primary>starting</primary>
659
660 <secondary>scout program</secondary>
661 </indexterm>
662
663 <indexterm>
664 <primary>scout program</primary>
665
666 <secondary>starting</secondary>
667 </indexterm>
668
669 <indexterm>
670 <primary>initializing</primary>
671
672 <secondary>scout program</secondary>
673 </indexterm>
674
675 <indexterm>
676 <primary>scout program</primary>
677
678 <secondary>command syntax</secondary>
679 </indexterm>
680
681 <indexterm>
682 <primary>commands</primary>
683
684 <secondary>scout</secondary>
685 </indexterm>
686 </sect2>
687
688 <sect2 id="HDRWQ335">
689 <title>To start the scout program</title>
690
691 <orderedlist>
692 <listitem>
693 <para>Open a dedicated command shell. If necessary, adjust it to the appropriate size.</para>
694 </listitem>
695
696 <listitem>
697 <para>Issue the <emphasis role="bold">scout</emphasis> command to start the program. <programlisting>
698 % <emphasis role="bold">scout</emphasis> [<emphasis role="bold">initcmd</emphasis>] <emphasis role="bold">-server</emphasis> &lt;<replaceable>FileServer name(s) to monitor</replaceable>&gt;+ \
699 [<emphasis role="bold">-basename</emphasis> &lt;<replaceable>base server name</replaceable>&gt;] \
700 [<emphasis role="bold">-frequency</emphasis> &lt;<replaceable>poll frequency, in seconds</replaceable>&gt;] [<emphasis
701 role="bold">-host</emphasis>] \
702 [<emphasis role="bold">-attention</emphasis> &lt;<replaceable>specify attention (highlighting) level</replaceable>&gt;+] \
703 [<emphasis role="bold">-debug</emphasis> &lt;<replaceable>turn debugging output on to the named file</replaceable>&gt;]
704 </programlisting></para>
705
706 <para>where <variablelist>
707 <varlistentry>
708 <term><emphasis role="bold">initcmd</emphasis></term>
709
710 <listitem>
711 <para>Is an optional string that accommodates the command's use of the AFS command parser. It can be omitted and
712 ignored.</para>
713 </listitem>
714 </varlistentry>
715
716 <varlistentry>
717 <term><emphasis role="bold">-server</emphasis></term>
718
719 <listitem>
720 <para>Identifies each File Server process to monitor, by naming the file server machine it is running on. Provide
721 fully-qualified hostnames unless the <emphasis role="bold">-basename</emphasis> argument is used. In that case,
722 specify only the initial part of each machine name, omitting the domain name suffix common to all the machine
723 names.</para>
724 </listitem>
725 </varlistentry>
726
727 <varlistentry>
728 <term><emphasis role="bold">-basename</emphasis></term>
729
730 <listitem>
731 <para>Specifies the domain name suffix common to all of the file server machines named by the <emphasis
732 role="bold">-server</emphasis> argument. For discussion of this argument's effects, see <link
733 linkend="HDRWQ328">Using the -basename argument to Specify a Domain Name</link>.</para>
734
735 <para>Do not include the period that separates the domain suffix from the initial part of the machine name, but do
736 include any periods that occur within the suffix itself. (For example, in the Example Corporation cell, the proper
737 value is <emphasis role="bold">example.com</emphasis>, not <emphasis role="bold">.example.com</emphasis>.)</para>
738 </listitem>
739 </varlistentry>
740
741 <varlistentry>
742 <term><emphasis role="bold">-frequency</emphasis></term>
743
744 <listitem>
745 <para>Sets the interval, in seconds, between the <emphasis role="bold">scout</emphasis> program's probes to File
746 Server processes. Specify an integer greater than 0 (zero). The default is 60 seconds.</para>
747 </listitem>
748 </varlistentry>
749
750 <varlistentry>
751 <term><emphasis role="bold">-host</emphasis></term>
752
753 <listitem>
754 <para>Displays the name of the machine that is running the <emphasis role="bold">scout</emphasis> program in the
755 display window's banner line. By default, no machine name is displayed.</para>
756 </listitem>
757 </varlistentry>
758
759 <varlistentry>
760 <term><emphasis role="bold">-attention</emphasis></term>
761
762 <listitem>
763 <para>Defines the threshold value at which to highlight one or more statistics. You can provide the pairs of
764 statistic and threshold in any order, separating each pair and the parts of each pair with one or more spaces. The
765 following list defines the syntax for each statistic.<variablelist>
766 <indexterm>
767 <primary>scout program</primary>
768
769 <secondary>attention levels, setting</secondary>
770 </indexterm>
771
772 <indexterm>
773 <primary>highlighting statistics in scout display</primary>
774
775 <secondary>setting thresholds</secondary>
776 </indexterm>
777
778 <indexterm>
779 <primary>thresholds for statistics in scout display</primary>
780
781 <secondary>setting</secondary>
782 </indexterm>
783
784 <varlistentry>
785 <term><emphasis role="bold">conn connections</emphasis></term>
786
787 <listitem>
788 <para>Highlights the value in the <computeroutput>Conn</computeroutput> (first) column when the number of
789 connections that the File Server has open to client machines exceeds the connections value. The
790 highlighting deactivates when the value goes back below the threshold. There is no default
791 threshold.</para>
792 </listitem>
793 </varlistentry>
794
795 <varlistentry>
796 <term><emphasis role="bold">fetch fetch_RPCs</emphasis></term>
797
798 <listitem>
799 <para>Highlights the value in the <computeroutput>Fetch</computeroutput> (second) column when the number
800 of fetch RPCs that clients have made to the File Server process exceeds the fetch_RPCs value. The
801 highlighting deactivates only when the File Server process restarts, at which time the value returns to
802 zero. There is no default threshold.</para>
803 </listitem>
804 </varlistentry>
805
806 <varlistentry>
807 <term><emphasis role="bold">store store_RPCs</emphasis></term>
808
809 <listitem>
810 <para>Highlights the value in the <computeroutput>Store</computeroutput> (third) column when the number of
811 store RPCs that clients have made to the File Server process exceeds the store_RPCs value. The
812 highlighting deactivates only when the File Server process restarts, at which time the value returns to
813 zero. There is no default threshold.</para>
814 </listitem>
815 </varlistentry>
816
817 <varlistentry>
818 <term><emphasis role="bold">ws active_clients</emphasis></term>
819
820 <listitem>
821 <para>Highlights the value in the <computeroutput>Ws</computeroutput> (fourth) column when the number of
822 active client machines (those that have contacted the File Server in the last 15 minutes) exceeds the
823 active_clients value. The highlighting deactivates when the value goes back below the threshold. There is
824 no default threshold.</para>
825 </listitem>
826 </varlistentry>
827
828 <varlistentry>
829 <term><emphasis role="bold">disk percent_full % or disk min_blocks</emphasis></term>
830
831 <listitem>
832 <para>Highlights the value for a partition in the <computeroutput>Disk attn</computeroutput> (sixth)
833 column when either the amount of disk space used exceeds the percentage indicated by the percent_full
834 value, or the number of free KB blocks is less than the min_blocks value. The highlighting deactivates
835 when the value goes back below the percent_full threshold or above the min_blocks threshold.</para>
836
837 <para>The value you specify appears in the header of the sixth column following the string
838 <computeroutput>Disk attn</computeroutput>. The default threshold is 95% full.</para>
839
840 <para>Acceptable values for percent_full are the integers in the range <emphasis
841 role="bold">0</emphasis> (zero) to <emphasis role="bold">99</emphasis>, and you must include the percent
842 sign to distinguish this statistic from a min_blocks value.</para>
843 </listitem>
844 </varlistentry>
845 </variablelist></para>
846
847 <para>The following example sets the threshold for the <computeroutput>Conn</computeroutput> column to 100, for
848 the <computeroutput>Ws</computeroutput> column to 50, and for the <computeroutput>Disk attn</computeroutput>
849 column to 75%. There is no threshold for the <computeroutput>Fetch</computeroutput> and
850 <computeroutput>Store</computeroutput> columns.</para>
851
852 <para><emphasis role="bold">-attention conn 100 ws 50 disk 75%</emphasis></para>
853
854 <para>The following example has the same effect as the previous one except that it sets the threshold for the Disk
855 attn column to 5000 free KB blocks:</para>
856
857 <para><emphasis role="bold">-attention disk 5000 ws 50 conn 100</emphasis></para>
858 </listitem>
859 </varlistentry>
860
861 <varlistentry>
862 <term><emphasis role="bold">-debug</emphasis></term>
863
864 <listitem>
865 <para>Enables debugging output and directs it into the specified file. Partial pathnames are interpreted relative
866 to the current working directory. By default, no debugging output is produced.</para>
867 </listitem>
868 </varlistentry>
869 </variablelist></para>
870 </listitem>
871 </orderedlist>
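<para>The pairing rules for the <emphasis role="bold">-attention</emphasis> argument can be sketched in a few lines of Python. This is only an illustrative model of the syntax described above, not code from the <emphasis role="bold">scout</emphasis> program; the function names parse_attention and disk_highlighted are hypothetical.</para>

<programlisting><![CDATA[
```python
STATISTICS = ("conn", "fetch", "store", "ws", "disk")


def parse_attention(words):
    """Turn the words after -attention into a table of thresholds."""
    if len(words) % 2 != 0:
        raise ValueError("each statistic needs exactly one threshold value")
    thresholds = {}
    # Pairs may come in any order: (words[0], words[1]), (words[2], words[3]), ...
    for name, value in zip(words[0::2], words[1::2]):
        if name not in STATISTICS:
            raise ValueError("unknown statistic: " + name)
        if name == "disk" and value.endswith("%"):
            # A trailing percent sign selects the percent_full form.
            percent = int(value[:-1])
            if not 0 <= percent <= 99:
                raise ValueError("percent_full must be between 0 and 99")
            thresholds[name] = ("percent_full", percent)
        elif name == "disk":
            # No percent sign: the value is a minimum number of free KB blocks.
            thresholds[name] = ("min_blocks", int(value))
        else:
            thresholds[name] = int(value)
    return thresholds


def disk_highlighted(threshold, percent_used, free_kb):
    """Apply a disk threshold to one partition's current figures."""
    kind, limit = threshold
    if kind == "percent_full":
        return percent_used > limit
    return free_kb < limit
```
]]></programlisting>

<para>In this model, a partition that is 96% full with 10000 free KB blocks is highlighted under the default threshold of 95% but not under a threshold of 5000 free blocks.</para>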
872 </sect2>
873
874 <sect2 id="Header_374">
875 <title>To stop the scout program</title>
876
877 <indexterm>
878 <primary>scout program</primary>
879
880 <secondary>stopping</secondary>
881 </indexterm>
882
883 <orderedlist>
884 <listitem>
885 <para>Enter <emphasis role="bold">Ctrl-c</emphasis> in the display window. This is the proper interrupt signal even if the
886 general interrupt signal in your environment is different.</para>
887 </listitem>
888 </orderedlist>
889 </sect2>
890
891 <sect2 id="HDRWQ336">
892 <title>Example Commands and Displays</title>
893
894 <indexterm>
895 <primary>scout program</primary>
896
897 <secondary>examples (command and display)</secondary>
898 </indexterm>
899
900 <indexterm>
901 <primary>examples</primary>
902
903 <secondary>scout program display</secondary>
904 </indexterm>
905
906 <para>This section presents examples of the <emphasis role="bold">scout</emphasis> program, combining different arguments and
907 illustrating the screen displays that result.</para>
908
909 <para>In the first example, an administrator in the Example Corporation issues the <emphasis role="bold">scout</emphasis> command
910 without providing any optional arguments or flags. She includes the <emphasis role="bold">-server</emphasis> argument because
911 she is providing multiple machine names. She chooses to specify only the initial part of each machine's name even though she has
912 not used the <emphasis role="bold">-basename</emphasis> argument, relying on the cell's name service to obtain the
913 fully-qualified name that the <emphasis role="bold">scout</emphasis> program requires for establishing a connection.</para>
914
915 <programlisting>
916 % <emphasis role="bold">scout -server fs1 fs2</emphasis>
917 </programlisting>
918
919 <para><link linkend="FIGWQ337">Figure 2</link> depicts the resulting display. Notice first that the machine names in the fifth
920 (unlabeled) column appear in the format the administrator used on the command line. Now consider the second line in the
921 display region, where the machine name <computeroutput>fs2</computeroutput> appears in the fifth column. The
922 <computeroutput>Conn</computeroutput> and <computeroutput>Ws</computeroutput> columns together show that machine <emphasis
923 role="bold">fs2</emphasis> has 144 RPC connections open to 44 client machines, demonstrating that multiple connections per
924 client machine are possible. The <computeroutput>Fetch</computeroutput> column shows that client machines have made 2,734,278
925 fetch RPCs to machine <emphasis role="bold">fs2</emphasis> since the File Server process last started and the
926 <computeroutput>Store</computeroutput> column shows that they have made 34,066 store RPCs.</para>
927
928 <para>Six partition entries appear in the <computeroutput>Disk attn</computeroutput> column, marked
929 <computeroutput>a</computeroutput> through <computeroutput>f</computeroutput> (for <emphasis role="bold">/vicepa</emphasis>
930 through <emphasis role="bold">/vicepf</emphasis>). They appear on three lines in two subcolumns because of the width of the
931 window; if the window is wider, there are more subcolumns. Four of the partition entries (<computeroutput>a</computeroutput>,
932 <computeroutput>c</computeroutput>, <computeroutput>d</computeroutput>, and <computeroutput>e</computeroutput>) appear in
933 reverse video to indicate that they are more than 95% full (the threshold value that appears in the <computeroutput>Disk
934 attn</computeroutput> header).</para>
935
936 <figure id="FIGWQ337" label="2">
937 <title>First example scout display</title>
938
939 <mediaobject>
940 <imageobject>
941 <imagedata fileref="scout1.png" scale="50" />
942 </imageobject>
943 </mediaobject>
944 </figure>
945
947
948 <para>In the second example, the administrator uses more of the <emphasis role="bold">scout</emphasis> program's optional
949 arguments. <itemizedlist>
950 <listitem>
951 <para>She provides the machine names in the same form as in Example 1, but this time she also uses the <emphasis
952 role="bold">-basename</emphasis> argument to specify their domain name suffix, <emphasis role="bold">example.com</emphasis>.
953 This implies that the <emphasis role="bold">scout</emphasis> program does not need the name service to expand the names
954 to fully-qualified hostnames, but the name service still converts the hostnames to IP addresses.</para>
955 </listitem>
956
957 <listitem>
958 <para>She uses the <emphasis role="bold">-host</emphasis> flag to display in the banner line the name of the client
959 machine where the <emphasis role="bold">scout</emphasis> program is running.</para>
960 </listitem>
961
962 <listitem>
963 <para>She uses the <emphasis role="bold">-frequency</emphasis> argument to change the probing frequency from its
964 default of once per minute to once every five seconds.</para>
965 </listitem>
966
967 <listitem>
968 <para>She uses the <emphasis role="bold">-attention</emphasis> argument to change the highlighting threshold for
969 partitions to a 5000 KB minimum rather than the default of 95% full.</para>
970 </listitem>
971 </itemizedlist></para>
972
973 <programlisting>
974 % <emphasis role="bold">scout -server fs1 fs2 -basename example.com -host -frequency 5 -attention disk 5000</emphasis>
975 </programlisting>
976
977 <para>The use of optional arguments results in several differences between <link linkend="FIGWQ338">Figure 3</link> and <link
978 linkend="FIGWQ337">Figure 2</link>. First, because the <emphasis role="bold">-host</emphasis> flag is included, the banner
979 line displays the name of the machine running the <emphasis role="bold">scout</emphasis> process as
980 <computeroutput>[client52]</computeroutput> along with the basename <computeroutput>example.com</computeroutput> specified with
981 the <emphasis role="bold">-basename</emphasis> argument.</para>
982
983 <para>Another difference is that two rather than four of machine <emphasis role="bold">fs2</emphasis>'s partitions appear in
984 reverse video, even though their values are almost the same as in <link linkend="FIGWQ337">Figure 2</link>. This is because
985 the administrator changed the highlight threshold to a 5000 block minimum, as also reflected in the <computeroutput>Disk
986 attn</computeroutput> column's header. And while machine <emphasis role="bold">fs2</emphasis>'s partitions <emphasis
987 role="bold">/vicepa</emphasis> and <emphasis role="bold">/vicepd</emphasis> are still 95% full, they have more than 5000 free
988 blocks left; partitions <emphasis role="bold">/vicepc</emphasis> and <emphasis role="bold">/vicepe</emphasis> are highlighted
989 because they have fewer than 5000 blocks free.</para>
990
991 <para>Note also the result of changing the probe frequency, reflected in the probe reporting line at the bottom left corner of
992 the display. Both this example and the previous one represent a time lapse of one minute after the administrator issues the
993 <emphasis role="bold">scout</emphasis> command. In this example, however, the <emphasis role="bold">scout</emphasis> program
994 has probed the File Server processes 12 times as opposed to once.</para>
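<para>The probe counts follow from simple division. The sketch below is a simplified model that assumes one probe completes at the end of each interval; the function name probes_completed is hypothetical.</para>

<programlisting><![CDATA[
```python
def probes_completed(elapsed_seconds, frequency_seconds=60):
    """Count the probes finished after elapsed_seconds.

    frequency_seconds mirrors the -frequency argument; 60 is the
    default of one probe per minute.
    """
    return elapsed_seconds // frequency_seconds
```
]]></programlisting>

<para>After one minute, the default interval gives one probe and a five-second interval gives 12, matching the two displays.</para>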
995
996 <figure id="FIGWQ338" label="3">
997 <title>Second example scout display</title>
998
999 <mediaobject>
1000 <imageobject>
1001 <imagedata fileref="scout2.png" scale="50" />
1002 </imageobject>
1003 </mediaobject>
1004 </figure>
1005
1007
1008 <para>In <link linkend="FIGWQ339">Figure 4</link>, an administrator in the State University cell monitors three of that cell's
1009 file server machines. He uses the <emphasis role="bold">-basename</emphasis> argument to specify the <emphasis
1010 role="bold">stateu.edu</emphasis> domain name.</para>
1011
1012 <programlisting>
1013 % <emphasis role="bold">scout -server server2 server3 server4 -basename stateu.edu</emphasis>
1014 </programlisting>
1015
1016 <figure id="FIGWQ339" label="4">
1017 <title>Third example scout display</title>
1018
1019 <mediaobject>
1020 <imageobject>
1021 <imagedata fileref="scout3.png" scale="50" />
1022 </imageobject>
1023 </mediaobject>
1024 </figure>
1025
1027
1028 <para><link linkend="FIGWQ340">Figure 5</link> illustrates three of the <emphasis role="bold">scout</emphasis> program's
1029 features. First, you can monitor file server machines from different cells in a single display: <emphasis
1030 role="bold">fs1.abc.com</emphasis>, <emphasis role="bold">server3.stateu.edu</emphasis>, and <emphasis
1031 role="bold">sv7.def.com</emphasis>. Because the machines belong to different cells, it is not possible to provide the
1032 <emphasis role="bold">-basename</emphasis> argument.</para>
1033
1034 <para>Second, it illustrates how the display must truncate machine names that do not fit in the fifth column, using an
1035 asterisk at the end of the name to show that it is shortened.</para>
1036
1037 <para>Third, it illustrates what happens when the <emphasis role="bold">scout</emphasis> process cannot reach a File Server
1038 process, in this case the one on the machine <emphasis role="bold">sv7.def.com</emphasis>: it highlights the machine name and
1039 blanks out the values in the other columns.</para>
1040
1041 <figure id="FIGWQ340" label="5">
1042 <title>Fourth example scout display</title>
1043
1044 <mediaobject>
1045 <imageobject>
1046 <imagedata fileref="scout4.png" scale="50" />
1047 </imageobject>
1048 </mediaobject>
1049 </figure>
1050 </sect2>
1051 </sect1>
1052
1053 <sect1 id="HDRWQ341">
1054 <title>Using the fstrace Command Suite</title>
1055
1056 <para>This section describes the <emphasis role="bold">fstrace</emphasis> commands that system administrators employ to trace
1057 Cache Manager activity for debugging purposes. It assumes the reader is familiar with the Cache Manager concepts described in
1058 <link linkend="HDRWQ387">Administering Client Machines and the Cache Manager</link>.</para>
1059
1060 <para>The <emphasis role="bold">fstrace</emphasis> command suite monitors the internal activity of the Cache Manager and enables
1061 you to record, or trace, its operations in detail. The operations, which are termed <emphasis>events</emphasis>, comprise the
1062 <emphasis role="bold">cm</emphasis> <emphasis>event set</emphasis>. Examples of <emphasis role="bold">cm</emphasis> events are
1063 fetching files and looking up information for a listing of files and subdirectories using the UNIX <emphasis
1064 role="bold">ls</emphasis> command.</para>
1065
1066 <para>Following are the <emphasis role="bold">fstrace</emphasis> commands and their respective functions: <itemizedlist>
1067 <listitem>
1068 <para>The <emphasis role="bold">fstrace apropos</emphasis> command provides a short description of commands.</para>
1069 </listitem>
1070
1071 <listitem>
1072 <para>The <emphasis role="bold">fstrace clear</emphasis> command clears the trace log.</para>
1073 </listitem>
1074
1075 <listitem>
1076 <para>The <emphasis role="bold">fstrace dump</emphasis> command dumps the contents of the trace log.</para>
1077 </listitem>
1078
1079 <listitem>
1080 <para>The <emphasis role="bold">fstrace help</emphasis> command provides a description and syntax for commands.</para>
1081 </listitem>
1082
1083 <listitem>
1084 <para>The <emphasis role="bold">fstrace lslog</emphasis> command lists information about the trace log.</para>
1085 </listitem>
1086
1087 <listitem>
1088 <para>The <emphasis role="bold">fstrace lsset</emphasis> command lists information about the event set.</para>
1089 </listitem>
1090
1091 <listitem>
1092 <para>The <emphasis role="bold">fstrace setlog</emphasis> command changes the size of the trace log.</para>
1093 </listitem>
1094
1095 <listitem>
1096 <para>The <emphasis role="bold">fstrace setset</emphasis> command sets the state of the event set.</para>
1097 </listitem>
1098 </itemizedlist></para>
1099
1100 <sect2 id="HDRWQ342">
1101 <title>About the fstrace Command Suite</title>
1102
1103 <para>The <emphasis role="bold">fstrace</emphasis> command suite replaces and greatly expands the functionality formerly
1104 provided by the <emphasis role="bold">fs debug</emphasis> command. Its intended use is to aid in diagnosis of specific Cache
1105 Manager problems, such as client machine hangs, cache consistency problems, clock synchronization errors, and failures to
1106 access a volume or AFS file. Therefore, it is best not to keep <emphasis role="bold">fstrace</emphasis> logging enabled at all
1107 times, unlike the logging for AFS server processes.</para>
1108
1109 <para>Most of the messages in the trace log correspond to low-level Cache Manager operations. It is likely that only personnel
1110 familiar with the AFS source code can interpret them. If you have an AFS source license, you can attempt to interpret the
1111 trace yourself, or work with the AFS Product Support group to resolve the underlying problems. If you do not have an AFS
1112 source license, it is probably more efficient to contact the AFS Product Support group immediately in case of problems. They
1113 can instruct you to activate <emphasis role="bold">fstrace</emphasis> tracing if appropriate.</para>
1114
1115 <para>The log can grow in size very quickly; this can use valuable disk space if you are writing to a file in the local file
1116 space. Additionally, if the size of the log becomes too large, it can become difficult to parse the results for pertinent
1117 information.</para>
1118
1119 <indexterm>
1120 <primary>cmfx trace log (fstrace)</primary>
1121 </indexterm>
1122
1123 <indexterm>
1124 <primary>trace log from (fstrace)</primary>
1125
1126 <secondary>cmfx</secondary>
1127 </indexterm>
1128
1129 <para>When AFS tracing is enabled, each time a <emphasis role="bold">cm</emphasis> event occurs, a message is written to the
1130 trace log, <emphasis role="bold">cmfx</emphasis>. To diagnose a problem, read the output of the trace log and analyze the
1131 operations executed by the Cache Manager. The default size of the trace log is 60 KB, but you can increase or decrease
1132 it.</para>
1133
1134 <indexterm>
1135 <primary>cm event set (fstrace)</primary>
1136 </indexterm>
1137
1138 <indexterm>
1139 <primary>event set (fstrace)</primary>
1140
1141 <secondary>cm</secondary>
1142 </indexterm>
1143
1144 <para>To use the <emphasis role="bold">fstrace</emphasis> command suite, you must first enable tracing and reserve, or
1145 allocate, space for the trace log with the <emphasis role="bold">fstrace setset</emphasis> command. With this command, you can
1146 set the <emphasis role="bold">cm</emphasis> event set to one of three states to enable or disable tracing for the event set
1147 and to allocate or deallocate space for the trace log in the kernel: <variablelist>
1148 <indexterm>
1149 <primary>active</primary>
1150
1151 <secondary>state of fstrace event set</secondary>
1152 </indexterm>
1153
1154 <indexterm>
1155 <primary>inactive (state of fstrace event set)</primary>
1156 </indexterm>
1157
1158 <indexterm>
1159 <primary>dormant (state of fstrace event set)</primary>
1160 </indexterm>
1161
1162 <varlistentry>
1163 <term><computeroutput>active</computeroutput></term>
1164
1165 <listitem>
1166 <para>Enables tracing for the event set and allocates space for the trace log.</para>
1167 </listitem>
1168 </varlistentry>
1169
1170 <varlistentry>
1171 <term><computeroutput>inactive</computeroutput></term>
1172
1173 <listitem>
1174 <para>Temporarily disables tracing for the event set; however, the space occupied by the log to which it
1175 sends data remains allocated.</para>
1176 </listitem>
1177 </varlistentry>
1178
1179 <varlistentry>
1180 <term><computeroutput>dormant</computeroutput></term>
1181
1182 <listitem>
1183 <para>Disables tracing for the event set; furthermore, the event set releases the space occupied by the log to which
1184 it sends data. When the <emphasis role="bold">cm</emphasis> event set that sends data to the <emphasis
1185 role="bold">cmfx</emphasis> trace log is in this state, the space allocated for that log is freed or
1186 deallocated.</para>
1187 </listitem>
1188 </varlistentry>
1189 </variablelist></para>
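<para>The three states differ only in whether tracing is enabled and whether the log space stays allocated. The following Python sketch tabulates that; it is a reading aid built from the definitions above, and the names EventSetState, EFFECTS, and describe are hypothetical.</para>

<programlisting><![CDATA[
```python
from enum import Enum


class EventSetState(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    DORMANT = "dormant"


# state -> (tracing enabled, trace log space allocated)
EFFECTS = {
    EventSetState.ACTIVE: (True, True),
    EventSetState.INACTIVE: (False, True),
    EventSetState.DORMANT: (False, False),
}


def describe(state):
    """Summarize what a state means for tracing and for the log space."""
    tracing, allocated = EFFECTS[state]
    return "tracing %s, log space %s" % (
        "enabled" if tracing else "disabled",
        "allocated" if allocated else "released",
    )
```
]]></programlisting>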
1190
1191 <indexterm>
1192 <primary>persistent fstrace event set or trace log</primary>
1193 </indexterm>
1194
1195 <indexterm>
1196 <primary>trace log (fstrace)</primary>
1197
1198 <secondary>persistence</secondary>
1199 </indexterm>
1200
1201 <indexterm>
1202 <primary>event set (fstrace)</primary>
1203
1204 <secondary>persistence</secondary>
1205 </indexterm>
1206
1207 <para>Both event sets and trace logs can be designated as <emphasis>persistent</emphasis>, which prevents accidental resetting
1208 of an event set's state or clearing of a trace log. The designation is made as the kernel is compiled and cannot be
1209 changed.</para>
1210
1211 <para>If an event set such as <emphasis role="bold">cm</emphasis> is persistent, you can change its state only by including
1212 the <emphasis role="bold">-set</emphasis> argument to the <emphasis role="bold">fstrace setset</emphasis> command. (That is,
1213 you cannot change its state along with the state of all other event sets by issuing the <emphasis role="bold">fstrace
1214 setset</emphasis> command with no arguments.) Similarly, if a trace log such as <emphasis role="bold">cmfx</emphasis> is
1215 persistent, you can clear it only by including either the <emphasis role="bold">-set</emphasis> or <emphasis
1216 role="bold">-log</emphasis> argument to the <emphasis role="bold">fstrace clear</emphasis> command. (You cannot clear it along
1217 with all other trace logs by issuing the <emphasis role="bold">fstrace clear</emphasis> command with no arguments.)</para>
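<para>The effect of persistence on the <emphasis role="bold">fstrace setset</emphasis> command can be modeled as a selection rule. The sketch below is an assumption-laden illustration, not the command's implementation; the function name sets_affected and the event set name other are hypothetical.</para>

<programlisting><![CDATA[
```python
def sets_affected(all_sets, persistent, named=None):
    """Return the event sets whose state one setset invocation changes.

    named holds the sets given with -set; None or an empty list
    models issuing the command with no -set argument.
    """
    if named:
        # Naming a set explicitly changes it even if it is persistent.
        return [s for s in all_sets if s in named]
    # With no -set argument, persistent sets are left alone.
    return [s for s in all_sets if s not in persistent]
```
]]></programlisting>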
1218
1219 <para>When a problem occurs, set the <emphasis role="bold">cm</emphasis> event set to active using the <emphasis
1220 role="bold">fstrace setset</emphasis> command. When tracing is enabled on a busy AFS client, the volume of events being
1221 recorded is significant; therefore, when you are diagnosing problems, restrict AFS activity as much as possible to minimize
1222 the amount of extraneous tracing in the log. Because tracing can have a negative impact on system performance, leave <emphasis
1223 role="bold">cm</emphasis> tracing in the dormant state when you are not diagnosing problems.</para>
1224
1225 <para>If a problem is reproducible, clear the <emphasis role="bold">cmfx</emphasis> trace log with the <emphasis
1226 role="bold">fstrace clear</emphasis> command and reproduce the problem. If the problem is not easily reproduced, keep the
1227 state of the event set active until the problem recurs.</para>
1228
1229 <para>To view the contents of the trace log and analyze the <emphasis role="bold">cm</emphasis> events, use the <emphasis
1230 role="bold">fstrace dump</emphasis> command to copy the content lines of the trace log to standard output (stdout) or to a
1231 file.</para>
1232
1233 <note>
1234 <para>If a particular command or process is causing problems, determine its process id (PID). Search the output of the
1235 <emphasis role="bold">fstrace dump</emphasis> command for the PID to find only those lines associated with the
1236 problem.</para>
1237 </note>
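<para>Searching a saved dump for one PID can be scripted. The Python sketch below assumes that each trace line contains a token of the form pid N; the exact format of the dump output is not shown in this chapter, so adjust the pattern to match what you see. The function name lines_for_pid is hypothetical.</para>

<programlisting><![CDATA[
```python
import re


def lines_for_pid(dump_lines, pid):
    """Keep only the trace lines that mention the given process ID."""
    # Word boundaries stop pid 123 from also matching pid 1234.
    pattern = re.compile(r"\bpid %d\b" % pid)
    return [line for line in dump_lines if pattern.search(line)]
```
]]></programlisting>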
1238 </sect2>
1239
1240 <sect2 id="HDRWQ343">
1241 <title>Requirements for Using the fstrace Command Suite</title>
1242
1243 <indexterm>
1244 <primary>privilege</primary>
1245
1246 <secondary>required for fstrace commands</secondary>
1247 </indexterm>
1248
1249 <indexterm>
1250 <primary>fstrace commands</primary>
1251
1252 <secondary>privilege requirements</secondary>
1253 </indexterm>
1254
1255 <para>Except for the <emphasis role="bold">fstrace help</emphasis> and <emphasis role="bold">fstrace apropos</emphasis>
1256 commands, which require no privilege, issuing the <emphasis role="bold">fstrace</emphasis> commands requires that the issuer
1257 be logged in as the local superuser <emphasis role="bold">root</emphasis> on the local client machine. Before issuing an
1258 <emphasis role="bold">fstrace</emphasis> command, verify that you have the necessary privilege.</para>
1259
1260 <para>The Cache Manager catalog must be in place so that logging can occur. The <emphasis role="bold">fstrace</emphasis>
1261 command suite uses the standard UNIX catalog utilities. The default location is <emphasis
1262 role="bold">/usr/vice/etc/C/afszcm.cat</emphasis>. To use a different location, place the file there and set the
1263 NLSPATH and LANG environment variables so that the catalog utilities can find it.</para>
1264 </sect2>
1265
1266 <sect2 id="Header_379">
1267 <title>Using fstrace Commands Effectively</title>
1268
1269 <para>To use <emphasis role="bold">fstrace</emphasis> commands most effectively, configure them as indicated: <itemizedlist>
1270 <listitem>
1271 <para>Store the <emphasis role="bold">fstrace</emphasis> binary in a local disk directory.</para>
1272 </listitem>
1273
1274 <listitem>
1275 <para>When you dump the <emphasis role="bold">fstrace</emphasis> log to a file, direct it to one on the local
1276 disk.</para>
1277 </listitem>
1278
1279 <listitem>
1280 <para>The trace can grow large in just a few minutes. Before attempting to dump the log to a local file, verify that you
1281 have enough room. Be particularly careful if you are using disk quotas on partitions in the local file system.</para>
1282 </listitem>
1283
1284 <listitem>
1285 <para>Attempt to limit Cache Manager activity on the AFS client machine other than the problem operation. This reduces
1286 the amount of extraneous data in the trace.</para>
1287 </listitem>
1288
1289 <listitem>
1290 <para>Activate the <emphasis role="bold">fstrace</emphasis> log for the shortest possible period of time. If possible,
1291 activate the trace immediately before performing the problem operation, deactivate it as soon as the operation
1292 completes, and dump the trace log to a file immediately.</para>
1293 </listitem>
1294
1295 <listitem>
1296 <para>If possible, obtain the UNIX process ID (PID) of the command or program that initiates the problematic operation. This
1297 enables the person analyzing the trace log to search it for messages associated with the PID.</para>
1298 </listitem>
1299 </itemizedlist></para>
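<para>The recommendations above amount to a short, fixed sequence of commands. The Python sketch below only builds that sequence as data so that it can be reviewed before anything runs as <emphasis role="bold">root</emphasis>. It assumes that <emphasis role="bold">fstrace dump</emphasis> with no arguments writes the log to standard output, which is then redirected to a local file; the function name trace_session and the paths in the usage note are hypothetical.</para>

<programlisting><![CDATA[
```python
import shlex


def trace_session(problem_command, dump_path):
    """Return the commands, in order, for one short tracing session."""
    return [
        # Enable tracing just before the problem operation.
        ["fstrace", "setset", "-set", "cm", "-active"],
        # Start from an empty log so the dump holds only relevant events.
        ["fstrace", "clear", "-set", "cm"],
        list(problem_command),
        # inactive stops tracing but keeps the log allocated for dumping.
        ["fstrace", "setset", "-set", "cm", "-inactive"],
        # Dump at once, to a file on the local disk.
        ["sh", "-c", "fstrace dump > " + shlex.quote(dump_path)],
    ]
```
]]></programlisting>

<para>For example, trace_session(["ls", "/afs/example.com"], "/tmp/cmfx.trace") yields five commands. The event set is left inactive rather than dormant because the dormant state releases the log space.</para>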
1300 </sect2>
1301
1302 <sect2 id="HDRWQ344">
1303 <title>Activating the Trace Log</title>
1304
1305 <para>To start Cache Manager tracing on an AFS client machine, you must first configure the following: <itemizedlist>
1306 <listitem>
1307 <para>The <emphasis role="bold">cmfx</emphasis> kernel trace log using the <emphasis role="bold">fstrace
1308 setlog</emphasis> command</para>
1309 </listitem>
1310
1311 <listitem>
1312 <para>The <emphasis role="bold">cm</emphasis> event set using the <emphasis role="bold">fstrace setset</emphasis>
1313 command</para>
1314 </listitem>
1315 </itemizedlist></para>
1316
1317 <para>The <emphasis role="bold">fstrace setlog</emphasis> command sets the size of the <emphasis role="bold">cmfx</emphasis>
1318 kernel trace log in kilobytes. The trace log occupies 60 kilobytes of kernel memory by default. If the trace log already exists, it
1319 is cleared when this command is issued and a new log of the given size is created. Otherwise, a new log of the desired size is
1320 created.</para>
1321
1322 <para>The <emphasis role="bold">fstrace setset</emphasis> command sets the state of the <emphasis role="bold">cm</emphasis>
1323 kernel event set. The state of the <emphasis role="bold">cm</emphasis> event set determines whether information on the events
1324 in that event set is logged.</para>
1325
1326 <para>After establishing kernel tracing on the AFS client machine, you can check the state of the event set and the size of
1327 the kernel buffer allocated for the trace log. To display information about the state of the <emphasis
1328 role="bold">cm</emphasis> event set, issue the <emphasis role="bold">fstrace lsset</emphasis> command. To display information
1329 about the <emphasis role="bold">cmfx</emphasis> trace log, use the <emphasis role="bold">fstrace lslog</emphasis> command. See
1330 the instructions in <link linkend="HDRWQ346">Displaying the State of a Trace Log or Event Set</link>.</para>
1331
1332 <indexterm>
1333 <primary>fstrace commands</primary>
1334
1335 <secondary>setlog</secondary>
1336 </indexterm>
1337
1338 <indexterm>
1339 <primary>commands</primary>
1340
1341 <secondary>fstrace setlog</secondary>
1342 </indexterm>
1343
1344 <indexterm>
1345 <primary>trace log (fstrace)</primary>
1346
1347 <secondary>configuring</secondary>
1348 </indexterm>
1349
1350 <indexterm>
1351 <primary>configuring</primary>
1352
1353 <secondary>trace log (fstrace)</secondary>
1354 </indexterm>
1355 </sect2>
1356
1357 <sect2 id="Header_381">
1358 <title>To configure the trace log</title>
1359
1360 <orderedlist>
1361 <listitem>
1362 <para>Become the local superuser <emphasis role="bold">root</emphasis> on the machine, if you are not already, by issuing
1363 the <emphasis role="bold">su</emphasis> command. <programlisting>
1364 % <emphasis role="bold">su root</emphasis>
1365 Password: &lt;<replaceable>root_password</replaceable>&gt;
1366 </programlisting></para>
1367 </listitem>
1368
1369 <listitem>
1370 <para>Issue the <emphasis role="bold">fstrace setlog</emphasis> command to set the size of the <emphasis
1371 role="bold">cmfx</emphasis> kernel trace log. <programlisting>
1372 # <emphasis role="bold">fstrace setlog</emphasis> [<emphasis role="bold">-log</emphasis> &lt;<replaceable>log_name</replaceable>&gt;+] <emphasis
1373 role="bold">-buffersize</emphasis> &lt;<replaceable>1-kilobyte_units</replaceable>&gt;
1374 </programlisting></para>
1375 </listitem>
1376 </orderedlist>
1377
1378 <para>The following example sets the size of the <emphasis role="bold">cmfx</emphasis> trace log to 80 KB.</para>
1379
1380 <programlisting>
1381 # <emphasis role="bold">fstrace setlog -log cmfx -buffersize 80</emphasis>
1382 </programlisting>
1383
1384 <indexterm>
1385 <primary>fstrace commands</primary>
1386
1387 <secondary>setset</secondary>
1388 </indexterm>
1389
1390 <indexterm>
1391 <primary>commands</primary>
1392
1393 <secondary>fstrace setset</secondary>
1394 </indexterm>
1395
1396 <indexterm>
1397 <primary>event set (fstrace)</primary>
1398
1399 <secondary>setting</secondary>
1400 </indexterm>
1401
1402 <indexterm>
1403 <primary>setting</primary>
1404
1405 <secondary>event set (fstrace)</secondary>
1406 </indexterm>
1407 </sect2>
1408
1409 <sect2 id="HDRWQ345">
1410 <title>To set the event set</title>
1411
1412 <orderedlist>
1413 <listitem>
1414 <para>Become the local superuser <emphasis role="bold">root</emphasis> on the machine, if you are not already, by issuing
1415 the <emphasis role="bold">su</emphasis> command. <programlisting>
1416 % <emphasis role="bold">su root</emphasis>
1417 Password: &lt;<replaceable>root_password</replaceable>&gt;
1418 </programlisting></para>
1419 </listitem>
1420
1421 <listitem>
1422 <para>Issue the <emphasis role="bold">fstrace setset</emphasis> command to set the state of event sets. <programlisting>
1423 # <emphasis role="bold">fstrace setset</emphasis> [<emphasis role="bold">-set</emphasis> &lt;<replaceable>set_name</replaceable>&gt;+] [<emphasis
1424 role="bold">-active</emphasis>] [<emphasis role="bold">-inactive</emphasis>] \
1425 [<emphasis role="bold">-dormant</emphasis>]
1426 </programlisting></para>
1427 </listitem>
1428 </orderedlist>
1429
1430 <para>The following example activates the <emphasis role="bold">cm</emphasis> event set.</para>
1431
1432 <programlisting>
1433 # <emphasis role="bold">fstrace setset cm -active</emphasis>
1434 </programlisting>
1435 </sect2>
1436
1437 <sect2 id="HDRWQ346">
1438 <title>Displaying the State of a Trace Log or Event Set</title>
1439
1440 <para>An event set must be in the <emphasis>active state</emphasis> for its events to be recorded in the trace log. To display an event set's
1441 state, use the <emphasis role="bold">fstrace lsset</emphasis> command. To set its state, issue the <emphasis
1442 role="bold">fstrace setset</emphasis> command as described in <link linkend="HDRWQ345">To set the event set</link>.</para>
1443
1444 <para>To display size and allocation information for the trace log, issue the <emphasis role="bold">fstrace
1445 lslog</emphasis> command with the <emphasis role="bold">-long</emphasis> argument.</para>
1446
1447 <indexterm>
1448 <primary>fstrace commands</primary>
1449
1450 <secondary>lsset</secondary>
1451 </indexterm>
1452
1453 <indexterm>
1454 <primary>commands</primary>
1455
1456 <secondary>fstrace lsset</secondary>
1457 </indexterm>
1458
1459 <indexterm>
1460 <primary>event set (fstrace)</primary>
1461
1462 <secondary>displaying state</secondary>
1463 </indexterm>
1464
1465 <indexterm>
1466 <primary>displaying</primary>
1467
1468 <secondary>state of event set (fstrace)</secondary>
1469 </indexterm>
1470 </sect2>
1471
1472 <sect2 id="Header_384">
1473 <title>To display the state of an event set</title>
1474
1475 <orderedlist>
1476 <listitem>
1477 <para>Become the local superuser <emphasis role="bold">root</emphasis> on the machine, if you are not already, by issuing
1478 the <emphasis role="bold">su</emphasis> command. <programlisting>
1479 % <emphasis role="bold">su root</emphasis>
1480 Password: &lt;<replaceable>root_password</replaceable>&gt;
1481 </programlisting></para>
1482 </listitem>
1483
1484 <listitem>
1485 <para>Issue the <emphasis role="bold">fstrace lsset</emphasis> command to display the available event set and its state.
1486 <programlisting>
1487 # <emphasis role="bold">fstrace lsset</emphasis> [<emphasis role="bold">-set</emphasis> &lt;<replaceable>set_name</replaceable>&gt;+]
1488 </programlisting></para>
1489 </listitem>
1490 </orderedlist>
1491
1492 <para>The following example displays the event set and its state on the local machine.</para>
1493
1494 <programlisting>
1495 # <emphasis role="bold">fstrace lsset cm</emphasis>
1496 Available sets:
1497 cm active
1498 </programlisting>
1499
<para>The output from this command lists the event set and its state. The three possible states for the <emphasis
role="bold">cm</emphasis> event set are: <variablelist>
1502 <varlistentry>
1503 <term><emphasis role="bold">active</emphasis></term>
1504
1505 <listitem>
1506 <para>Tracing is enabled.</para>
1507 </listitem>
1508 </varlistentry>
1509
1510 <varlistentry>
1511 <term><emphasis role="bold">inactive</emphasis></term>
1512
1513 <listitem>
1514 <para>Tracing is disabled, but space is still allocated for the corresponding trace log (<emphasis
1515 role="bold">cmfx</emphasis>).</para>
1516 </listitem>
1517 </varlistentry>
1518
1519 <varlistentry>
1520 <term><emphasis role="bold">dormant</emphasis></term>
1521
1522 <listitem>
<para>Tracing is disabled, and space is no longer allocated for the corresponding trace log (<emphasis
role="bold">cmfx</emphasis>).</para>
1525 </listitem>
1526 </varlistentry>
1527 </variablelist></para>
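
<para>If a script needs to check the state of an event set, the output shown above can be parsed mechanically. The following
Python sketch is illustrative only and is not part of AFS. The function name <computeroutput>parse_lsset</computeroutput> is
hypothetical, and the code assumes the output layout matches the example in this section: a header line ending in a colon,
followed by one line per event set.</para>

<programlisting>
```python
def parse_lsset(output: str) -> dict[str, str]:
    """Map each event set name to its reported state.

    Assumes the layout shown in this section: an "Available sets:" header
    followed by one "name state" line per event set.
    """
    states = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line, which ends in a colon.
        if not line or line.endswith(":"):
            continue
        # Everything after the first space is treated as the state text.
        name, _, state = line.partition(" ")
        states[name] = state.strip()
    return states


sample = "Available sets:\ncm active\n"
print(parse_lsset(sample))
```
</programlisting>

<para>The state is kept as free text rather than matched against a fixed list, so that any variation in how a state is
printed is passed through unchanged.</para>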
1528
1529 <indexterm>
1530 <primary>fstrace commands</primary>
1531
1532 <secondary>lslog</secondary>
1533 </indexterm>
1534
1535 <indexterm>
1536 <primary>commands</primary>
1537
1538 <secondary>fstrace lslog</secondary>
1539 </indexterm>
1540
1541 <indexterm>
1542 <primary>trace log (fstrace)</primary>
1543
1544 <secondary>displaying state</secondary>
1545 </indexterm>
1546
1547 <indexterm>
1548 <primary>displaying</primary>
1549
1550 <secondary>state of trace log (fstrace)</secondary>
1551 </indexterm>
1552 </sect2>
1553
1554 <sect2 id="Header_385">
1555 <title>To display the log size</title>
1556
1557 <orderedlist>
1558 <listitem>
1559 <para>Become the local superuser <emphasis role="bold">root</emphasis> on the machine, if you are not already, by issuing
1560 the <emphasis role="bold">su</emphasis> command. <programlisting>
1561 % <emphasis role="bold">su root</emphasis>
1562 Password: &lt;<replaceable>root_password</replaceable>&gt;
1563 </programlisting></para>
1564 </listitem>
1565
1566 <listitem>
1567 <para>Issue the <emphasis role="bold">fstrace lslog</emphasis> command to display information about the kernel trace log.
1568 <programlisting>
1569 # <emphasis role="bold">fstrace lslog</emphasis> [<emphasis role="bold">-set</emphasis> &lt;<replaceable>set_name</replaceable>&gt;+] [<emphasis
1570 role="bold">-log</emphasis> &lt;<replaceable>log_name</replaceable>&gt;] [<emphasis role="bold">-long</emphasis>]
1571 </programlisting></para>
1572 </listitem>
1573 </orderedlist>
1574
1575 <para>The following example uses the <emphasis role="bold">-long</emphasis> flag to display additional information about the
1576 <emphasis role="bold">cmfx</emphasis> trace log.</para>
1577
1578 <programlisting>
1579 # <emphasis role="bold">fstrace lslog cmfx -long</emphasis>
1580 Available logs:
1581 cmfx : 60 kbytes (allocated)
1582 </programlisting>
1583
1584 <para>The output from this command lists information on the trace log. When issued without the <emphasis
1585 role="bold">-long</emphasis> flag, the <emphasis role="bold">fstrace lslog</emphasis> command lists only the name of the log.
1586 When issued with the <emphasis role="bold">-long</emphasis> flag, the <emphasis role="bold">fstrace lslog</emphasis> command
1587 lists the log, the size of the log in kilobytes, and the allocation state of the log.</para>
1588
1589 <para>There are two allocation states for the kernel trace log: <variablelist>
1590 <varlistentry>
1591 <term><computeroutput>allocated</computeroutput></term>
1592
1593 <listitem>
1594 <para>Space is reserved for the log in the kernel. This indicates that the event set that writes to this log is either
1595 <emphasis>active</emphasis> (tracing is enabled for the event set) or <emphasis>inactive</emphasis> (tracing is
1596 temporarily disabled for the event set); however, the event set continues to reserve space occupied by the log to
1597 which it sends data.</para>
1598 </listitem>
1599 </varlistentry>
1600
1601 <varlistentry>
1602 <term><computeroutput>unallocated</computeroutput></term>
1603
1604 <listitem>
1605 <para>Space is not reserved for the log in the kernel. This indicates that the event set that writes to this log is
1606 <emphasis>dormant</emphasis> (tracing is disabled for the event set); furthermore, the event set releases the space
1607 occupied by the log to which it sends data.</para>
1608 </listitem>
1609 </varlistentry>
1610 </variablelist></para>
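
<para>The size and allocation state reported by the <emphasis role="bold">-long</emphasis> flag can also be extracted by a
script. The following Python sketch is an illustration under stated assumptions, not an AFS utility: the name
<computeroutput>parse_lslog_long</computeroutput> and the regular expression are hypothetical, and the pattern is based
only on the sample output line shown in this section.</para>

<programlisting>
```python
import re

# Matches lines such as "cmfx : 60 kbytes (allocated)".
# The pattern is an assumption derived from the sample output only.
LOG_LINE = re.compile(r"^(\S+)\s*:\s*(\d+)\s+kbytes\s+\((\w+)\)$")


def parse_lslog_long(output: str) -> dict[str, tuple[int, str]]:
    """Map each log name to (size in kilobytes, allocation state)."""
    logs = {}
    for line in output.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            name, kbytes, state = match.groups()
            logs[name] = (int(kbytes), state)
    return logs


sample = "Available logs:\ncmfx : 60 kbytes (allocated)\n"
print(parse_lslog_long(sample))
```
</programlisting>

<para>An <computeroutput>unallocated</computeroutput> result for a log implies, per the descriptions above, that the event
set writing to it is dormant.</para>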
1611 </sect2>
1612
1613 <sect2 id="HDRWQ347">
1614 <title>Dumping and Clearing the Trace Log</title>
1615
<para>After the Cache Manager operation you want to trace is complete, use the <emphasis role="bold">fstrace dump</emphasis>
command to dump the trace log to the standard output stream or to the file named by the <emphasis role="bold">-file</emphasis>
argument. To dump the trace log continuously instead, use the <emphasis role="bold">-follow</emphasis> argument (combine it with
the <emphasis role="bold">-file</emphasis> argument if desired). To halt continuous dumping, send an interrupt signal, for
example by pressing &lt;<emphasis role="bold">Ctrl-c</emphasis>&gt;.</para>
1621
1622 <para>To clear a trace log when you no longer need the data in it, issue the <emphasis role="bold">fstrace clear</emphasis>
1623 command. (The <emphasis role="bold">fstrace setlog</emphasis> command also clears an existing trace log automatically when you
1624 use it to change the log's size.)</para>
1625
1626 <indexterm>
1627 <primary>fstrace commands</primary>
1628
1629 <secondary>dump</secondary>
1630 </indexterm>
1631
1632 <indexterm>
1633 <primary>commands</primary>
1634
1635 <secondary>fstrace dump</secondary>
1636 </indexterm>
1637
1638 <indexterm>
1639 <primary>trace log (fstrace)</primary>
1640
1641 <secondary>dumping</secondary>
1642 </indexterm>
1643
1644 <indexterm>
1645 <primary>displaying</primary>
1646
1647 <secondary>contents of trace log (fstrace)</secondary>
1648 </indexterm>
1649
1650 <indexterm>
1651 <primary>dumping</primary>
1652
1653 <secondary>trace log contents (fstrace)</secondary>
1654 </indexterm>
1655 </sect2>
1656
1657 <sect2 id="Header_387">
1658 <title>To dump the contents of a trace log</title>
1659
1660 <orderedlist>
1661 <listitem>
1662 <para>Become the local superuser <emphasis role="bold">root</emphasis> on the machine, if you are not already, by issuing
1663 the <emphasis role="bold">su</emphasis> command. <programlisting>
1664 % <emphasis role="bold">su root</emphasis>
1665 Password: &lt;<replaceable>root_password</replaceable>&gt;
1666 </programlisting></para>
1667 </listitem>
1668
1669 <listitem>
1670 <para>Issue the <emphasis role="bold">fstrace dump</emphasis> command to dump trace logs. <programlisting>
1671 # <emphasis role="bold">fstrace dump</emphasis> [<emphasis role="bold">-set</emphasis> &lt;<replaceable>set_name</replaceable>&gt;+] [<emphasis
1672 role="bold">-follow</emphasis> &lt;<replaceable>log_name</replaceable>&gt;] \
1673 [<emphasis role="bold">-file</emphasis> &lt;<replaceable>output_filename</replaceable>&gt;] \
1674 [<emphasis role="bold">-sleep</emphasis> &lt;<replaceable>seconds_between_reads</replaceable>&gt;]
1675 </programlisting></para>
1676 </listitem>
1677 </orderedlist>
1678
1679 <para>At the beginning of the output of each dump is a header specifying the date and time at which the dump began. The number
1680 of logs being dumped is also displayed if the <emphasis role="bold">-follow</emphasis> argument is not specified. The header
1681 appears as follows:</para>
1682
1683 <programlisting>
1684 AFS Trace Dump --
1685 Date: date time
1686 Found n logs.
1687 </programlisting>
1688
1689 <para>where <emphasis>date</emphasis> is the starting date of the trace log dump, <emphasis>time</emphasis> is the starting
1690 time of the trace log dump, and <emphasis>n</emphasis> specifies the number of logs found by the <emphasis role="bold">fstrace
1691 dump</emphasis> command.</para>
1692
<para>The following is an example of a trace log dump header:</para>
1694
1695 <programlisting>
1696 AFS Trace Dump --
1697 Date: Fri Apr 16 10:44:38 1999
1698 Found 1 logs.
1699 </programlisting>
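
<para>When dumps are archived or compared, it can be useful to read the header fields programmatically. The following Python
sketch assumes the header layout shown above. The function name <computeroutput>parse_dump_header</computeroutput> is
hypothetical, and the date format string is an assumption based on the sample header; it also assumes an English-language
locale for the day and month names.</para>

<programlisting>
```python
from datetime import datetime
from typing import Optional


def parse_dump_header(text: str) -> tuple[Optional[datetime], Optional[int]]:
    """Return (dump start time, number of logs) from a dump header.

    Either value is None if the corresponding header line is absent, as
    the log count can be when the dump was taken with -follow.
    """
    started = None
    log_count = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Date:"):
            stamp = line[len("Date:"):].strip()
            # Format assumed from the sample: "Fri Apr 16 10:44:38 1999".
            started = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        elif line.startswith("Found ") and line.endswith(" logs."):
            log_count = int(line.split()[1])
    return started, log_count


header = "AFS Trace Dump --\nDate: Fri Apr 16 10:44:38 1999\nFound 1 logs.\n"
print(parse_dump_header(header))
```
</programlisting>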
1700
<para>The contents of the log follow the header and consist of messages written to the log from an active event set. Each
message contains the following three components: <itemizedlist>
1703 <listitem>
1704 <para>The timestamp associated with the message (number of seconds from an arbitrary start point)</para>
1705 </listitem>
1706
1707 <listitem>
1708 <para>The process ID or thread ID associated with the message</para>
1709 </listitem>
1710
1711 <listitem>
1712 <para>The message itself</para>
1713 </listitem>
1714 </itemizedlist></para>
1715
1716 <para>A trace log message is formatted as follows:</para>
1717
1718 <programlisting>
1719 time timestamp, pid pid:event message
1720 </programlisting>
1721
<para>where <emphasis>timestamp</emphasis> is the number of seconds from an arbitrary start point, <emphasis>pid</emphasis> is
the process ID number associated with the Cache Manager event, and <emphasis>event message</emphasis> is the Cache Manager
event, which corresponds to a function in the AFS source code.</para>
1725
1726 <para>The following is an example of a dumped trace log message:</para>
1727
1728 <programlisting>
1729 time 749.641274, pid 3002:Returning code 2 from 19
1730 </programlisting>
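
<para>The message format described above lends itself to simple parsing when you want to filter a dump by process ID or
time range. The following Python sketch is illustrative only. The names <computeroutput>TraceMessage</computeroutput> and
<computeroutput>parse_trace_message</computeroutput> are hypothetical, and the regular expression is an assumption based on
the format and sample message shown in this section.</para>

<programlisting>
```python
import re
from typing import NamedTuple, Optional

# Pattern assumed from the documented format "time timestamp, pid pid:event message".
MESSAGE = re.compile(r"^time (\d+\.\d+), pid (\d+):\s*(.*)$")


class TraceMessage(NamedTuple):
    timestamp: float  # seconds from an arbitrary start point
    pid: int          # process ID associated with the message
    text: str         # the event message itself


def parse_trace_message(line: str) -> Optional[TraceMessage]:
    """Parse one dumped message, or return None if the line does not match."""
    match = MESSAGE.match(line.strip())
    if match is None:
        return None
    timestamp, pid, text = match.groups()
    return TraceMessage(float(timestamp), int(pid), text)


print(parse_trace_message("time 749.641274, pid 3002:Returning code 2 from 19"))
```
</programlisting>

<para>Lines that do not follow the format, such as header lines, yield <computeroutput>None</computeroutput> and can be
skipped by the caller.</para>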
1731
1732 <para>For the messages in the trace log to be most readable, the Cache Manager catalog file needs to be installed on the local
1733 disk of the client machine; the conventional location is <emphasis role="bold">/usr/vice/etc/C/afszcm.cat</emphasis>. Log
1734 messages that begin with the string <computeroutput>raw op</computeroutput>, like the following, indicate that the catalog is
1735 not installed.</para>
1736
1737 <programlisting>
1738 raw op 232c, time 511.916288, pid 0
1739 p0:Fri Apr 16 10:36:31 1999
1740 </programlisting>
1741
1742 <para>Every 1024 seconds, a current time message is written to each log. This message has the following format:</para>
1743
1744 <programlisting>
1745 time timestamp, pid pid: Current time: unix_time
1746 </programlisting>
1747
<para>where <emphasis>timestamp</emphasis> is the number of seconds from an arbitrary start point, <emphasis>pid</emphasis> is
the process ID number, and <emphasis>unix_time</emphasis> is the time in standard UNIX format (measured from January 1,
1970).</para>
1750
1751 <para>The current time message can be used to determine the actual time associated with each log message. Determine the actual
1752 time as follows: <orderedlist>
1753 <listitem>
1754 <para>Locate the log message whose actual time you want to determine.</para>
1755 </listitem>
1756
1757 <listitem>
1758 <para>Search backward through the dump record until you come to a current time message.</para>
1759 </listitem>
1760
1761 <listitem>
1762 <para>If the current time message's <emphasis>timestamp</emphasis> is smaller than the log message's
1763 <emphasis>timestamp</emphasis>, subtract the former from the latter. If the current time message's
1764 <emphasis>timestamp</emphasis> is larger than the log message's <emphasis>timestamp</emphasis>, add 1024 to the latter
1765 and subtract the former from the result.</para>
1766 </listitem>
1767
1768 <listitem>
1769 <para>Add the resulting number to the current time message's <emphasis>unix_time</emphasis> to determine the log
1770 message's actual time.</para>
1771 </listitem>
1772 </orderedlist></para>
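
<para>The arithmetic in the preceding steps can be sketched in a few lines of Python. This is an illustration, not an AFS
tool: the function name <computeroutput>actual_time</computeroutput> is hypothetical, and the sketch assumes that the
<emphasis>unix_time</emphasis> value has already been converted to a number of seconds since January 1, 1970.</para>

<programlisting>
```python
INTERVAL = 1024  # seconds between current time messages, as stated in the text


def actual_time(message_ts: float, current_ts: float, current_unix: float) -> float:
    """Return the actual time of a log message, in seconds since the epoch.

    message_ts   -- timestamp of the log message of interest
    current_ts   -- timestamp of the preceding current time message
    current_unix -- unix_time of that current time message, in seconds
    """
    if message_ts >= current_ts:
        # Step 3, first case: subtract the current time message's timestamp.
        elapsed = message_ts - current_ts
    else:
        # Step 3, second case: add 1024 to the log message's timestamp first.
        elapsed = message_ts + INTERVAL - current_ts
    # Step 4: add the difference to the current time message's unix_time.
    return current_unix + elapsed


print(actual_time(40.0, 32.0, 1000.0))
print(actual_time(10.0, 1000.0, 5000.0))
```
</programlisting>

<para>In the first call the message follows the current time message by 8 seconds. In the second call the log message's
timestamp is the smaller of the two, so 1024 is added before subtracting, which gives a difference of 34 seconds.</para>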
1773
1774 <para>Because log data is stored in a finite, circular buffer, some of the data can be overwritten before being read. If this
1775 happens, the following message appears at the appropriate place in the dump:</para>
1776
1777 <programlisting>
1778 Log wrapped; data missing.
1779 </programlisting>
1780
1781 <note>
1782 <para>If this message appears in the middle of a dump, which can happen under a heavy work load, it indicates that not all
1783 of the log data is being written to the log or some data is being overwritten. Increasing the size of the log with the
1784 <emphasis role="bold">fstrace setlog</emphasis> command can alleviate this problem.</para>
1785 </note>
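
<para>Both conditions described in this section, a missing catalog file and a wrapped log, leave recognizable markers in
the dump. The following Python sketch counts them so that a long dump file can be checked quickly. The function name
<computeroutput>scan_dump</computeroutput> is hypothetical, and the matching rules are assumptions based only on the sample
lines shown above.</para>

<programlisting>
```python
def scan_dump(text: str) -> dict[str, int]:
    """Count markers that suggest a dump is incomplete or hard to read."""
    counts = {"raw_op": 0, "log_wrapped": 0}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("raw op"):
            # Message was not translated; the catalog file may be missing.
            counts["raw_op"] += 1
        elif line == "Log wrapped; data missing.":
            # Some data was overwritten in the circular buffer before being read.
            counts["log_wrapped"] += 1
    return counts


sample = "\n".join([
    "raw op 232c, time 511.916288, pid 0",
    "Log wrapped; data missing.",
    "time 749.641274, pid 3002:Returning code 2 from 19",
])
print(scan_dump(sample))
```
</programlisting>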
1786
1787 <indexterm>
1788 <primary>fstrace commands</primary>
1789
1790 <secondary>clear</secondary>
1791 </indexterm>
1792
1793 <indexterm>
1794 <primary>commands</primary>
1795
1796 <secondary>fstrace clear</secondary>
1797 </indexterm>
1798
1799 <indexterm>
1800 <primary>trace log (fstrace)</primary>
1801
1802 <secondary>clearing contents</secondary>
1803 </indexterm>
1804
1805 <indexterm>
1806 <primary>clearing</primary>
1807
1808 <secondary>contents of trace log (fstrace)</secondary>
1809 </indexterm>
1810
1811 <indexterm>
1812 <primary>removing</primary>
1813
1814 <secondary>trace log contents (fstrace)</secondary>
1815 </indexterm>
1816 </sect2>
1817
1818 <sect2 id="Header_388">
1819 <title>To clear the contents of a trace log</title>
1820
1821 <orderedlist>
1822 <listitem>
1823 <para>Become the local superuser <emphasis role="bold">root</emphasis> on the machine, if you are not already, by issuing
1824 the <emphasis role="bold">su</emphasis> command. <programlisting>
1825 % <emphasis role="bold">su root</emphasis>
1826 Password: &lt;<replaceable>root_password</replaceable>&gt;
1827 </programlisting></para>
1828 </listitem>
1829
1830 <listitem>
1831 <para>Issue the <emphasis role="bold">fstrace clear</emphasis> command to clear logs by log name or by event set.
1832 <programlisting>
1833 # <emphasis role="bold">fstrace clear</emphasis> [<emphasis role="bold">-set</emphasis> &lt;<replaceable>set_name</replaceable>&gt;+] [<emphasis
1834 role="bold">-log</emphasis> &lt;<replaceable>log_name</replaceable>&gt;+]
1835 </programlisting></para>
1836 </listitem>
1837 </orderedlist>
1838
1839 <para>The following example clears the <emphasis role="bold">cmfx</emphasis> log used by the <emphasis
1840 role="bold">cm</emphasis> event set on the local machine.</para>
1841
1842 <programlisting>
1843 # <emphasis role="bold">fstrace clear cm</emphasis>
1844 </programlisting>
1845
1846 <para>The following example also clears the <emphasis role="bold">cmfx</emphasis> log on the local machine.</para>
1847
1848 <programlisting>
1849 # <emphasis role="bold">fstrace clear cmfx</emphasis>
1850 </programlisting>
1851
1852 <indexterm>
1853 <primary>fstrace commands</primary>
1854
1855 <secondary>example of use</secondary>
1856 </indexterm>
1857 </sect2>
1858
1859 <sect2 id="HDRWQ348">
1860 <title>Examples of fstrace Commands</title>
1861
1862 <para>This section contains an extensive example of the use of the <emphasis role="bold">fstrace</emphasis> command suite,
1863 which is useful for gathering a detailed trace of Cache Manager activity when you are working with AFS Product Support to
1864 diagnose a problem. The Product Support representative can guide you in choosing appropriate parameter settings for the
1865 trace.</para>
1866
<para>Before starting the kernel trace log, try to isolate the Cache Manager on the AFS client machine that is experiencing
the problem accessing the file. If necessary, instruct users to move to another machine so as to minimize the Cache Manager
activity on this machine. To minimize the amount of unrelated AFS activity recorded in the trace log, both the <emphasis
role="bold">fstrace</emphasis> binary and the dump file must reside on the local disk, not in AFS. You must be logged in as
the local superuser <emphasis role="bold">root</emphasis> to issue <emphasis role="bold">fstrace</emphasis> commands.</para>
1872
1873 <para>Before starting a kernel trace, issue the <emphasis role="bold">fstrace lsset</emphasis> command to check the state of
1874 the <emphasis role="bold">cm</emphasis> event set.</para>
1875
1876 <programlisting>
1877 # <emphasis role="bold">fstrace lsset cm</emphasis>
1878 </programlisting>
1879
1880 <para>If tracing has not been enabled previously or if tracing has been turned off on the client machine, the following output
1881 is displayed:</para>
1882
1883 <programlisting>
1884 Available sets:
1885 cm inactive
1886 </programlisting>
1887
1888 <para>If tracing has been turned off and kernel memory is not allocated for the trace log on the client machine, the following
1889 output is displayed:</para>
1890
1891 <programlisting>
1892 Available sets:
1893 cm inactive (dormant)
1894 </programlisting>
1895
1896 <para>If the current state of the <emphasis role="bold">cm</emphasis> event set is <computeroutput>inactive</computeroutput>
1897 or <computeroutput>inactive (dormant)</computeroutput>, turn on kernel tracing by issuing the <emphasis role="bold">fstrace
1898 setset</emphasis> command with the <emphasis role="bold">-active</emphasis> flag.</para>
1899
1900 <programlisting>
1901 # <emphasis role="bold">fstrace setset cm -active</emphasis>
1902 </programlisting>
1903
1904 <para>If tracing is enabled currently on the client machine, the following output is displayed:</para>
1905
1906 <programlisting>
1907 Available sets:
1908 cm active
1909 </programlisting>
1910
<para>If tracing is enabled currently, you do not need to use the <emphasis role="bold">fstrace setset</emphasis> command.
Instead, issue the <emphasis role="bold">fstrace clear</emphasis> command to clear the contents of any existing trace log,
removing prior traces that are not related to the current problem.</para>
1914
1915 <programlisting>
1916 # <emphasis role="bold">fstrace clear cm</emphasis>
1917 </programlisting>
1918
<para>After checking on the state of the event set, issue the <emphasis role="bold">fstrace lslog</emphasis> command with the
<emphasis role="bold">-long</emphasis> flag to check the current state and size of the kernel trace log.</para>
1921
1922 <programlisting>
1923 # <emphasis role="bold">fstrace lslog cmfx -long</emphasis>
1924 </programlisting>
1925
1926 <para>If tracing has not been enabled previously or the <emphasis role="bold">cm</emphasis> event set was set to
1927 <computeroutput>active</computeroutput> or <computeroutput>inactive</computeroutput> previously, output similar to the
1928 following is displayed:</para>
1929
1930 <programlisting>
1931 Available logs:
1932 cmfx : 60 kbytes (allocated)
1933 </programlisting>
1934
<para>The <emphasis role="bold">fstrace</emphasis> tracing utility allocates 60 kilobytes of memory to the trace log by
default. You can increase or decrease the amount of memory allocated to the kernel trace log by setting it with the <emphasis
role="bold">fstrace setlog</emphasis> command. The number specified with the <emphasis role="bold">-buffersize</emphasis>
argument represents the number of kilobytes allocated to the kernel trace log. To increase the size of the kernel trace
log to 100 kilobytes, issue the following command.</para>
1940
1941 <programlisting>
# <emphasis role="bold">fstrace setlog</emphasis> cmfx <emphasis role="bold">-buffersize</emphasis> 100
1943 </programlisting>
1944
1945 <para>After ensuring that the kernel trace log is configured for your needs, you can set up a file into which you can dump the
1946 kernel trace log. For example, create a dump file with the name <emphasis role="bold">cmfx.dump.file.1</emphasis> using the
1947 following <emphasis role="bold">fstrace dump</emphasis> command. Issue the command as a continuous process by adding the
1948 <emphasis role="bold">-follow</emphasis> and <emphasis role="bold">-sleep</emphasis> arguments. Setting the <emphasis
1949 role="bold">-sleep</emphasis> argument to <emphasis>10</emphasis> dumps output from the kernel trace log to the file every 10
1950 seconds.</para>
1951
1952 <programlisting>
1953 # <emphasis role="bold">fstrace dump -follow</emphasis> cmfx <emphasis role="bold">-file</emphasis> cmfx.dump.file.1 <emphasis
1954 role="bold">-sleep</emphasis> 10
1955 AFS Trace Dump -
1956 Date: Fri Apr 16 10:54:57 1999
1957 Found 1 logs.
1958 time 32.965783, pid 0: Fri Apr 16 10:45:52 1999
1959 time 32.965783, pid 33657: Close 0x5c39ed8 flags 0x20
1960 time 32.965897, pid 33657: Gn_close vp 0x5c39ed8 flags 0x20 (returns
1961 0x0)
1962 time 35.159854, pid 10891: Breaking callback for 5bd95e4 states 1024
1963 (volume 0)
1964 time 35.407081, pid 10891: Breaking callback for 5c0fadc states 1024
1965 (volume 0)
1966 . .
1967 . .
1968 . .
1969 time 71.440456, pid 33658: Lookup adp 0x5bbdcf0 name g3oCKs fid (756
1970 4fb7e:588d240.2ff978a8.6)
1971 time 71.440569, pid 33658: Returning code 2 from 19
1972 time 71.440619, pid 33658: Gn_lookup vp 0x5bbdcf0 name g3oCKs (returns
1973 0x2)
1974 time 71.464989, pid 38267: Gn_open vp 0x5bbd000 flags 0x0 (returns 0x
1975 0)
1976 AFS Trace Dump - Completed
1977 </programlisting>
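
<para>The decision made in this example (activate the event set if it is not active, otherwise clear the existing log, and
then start a continuous dump) can be summarized as a short Python sketch. It only builds the command lines already shown in
this section and does not run them. The function name <computeroutput>plan_trace_session</computeroutput> and its parameters
are hypothetical.</para>

<programlisting>
```python
def plan_trace_session(set_state: str, dump_file: str,
                       sleep_seconds: int = 10) -> list[list[str]]:
    """Return the fstrace command lines for one trace session.

    set_state is the state reported by "fstrace lsset cm".
    """
    commands = []
    if set_state in ("inactive", "inactive (dormant)", "dormant"):
        # Tracing is off: turn it on.
        commands.append(["fstrace", "setset", "cm", "-active"])
    elif set_state == "active":
        # Tracing is already on: discard traces unrelated to the problem.
        commands.append(["fstrace", "clear", "cm"])
    else:
        raise ValueError(f"unexpected event set state: {set_state!r}")
    # Check the size and allocation state of the kernel trace log.
    commands.append(["fstrace", "lslog", "cmfx", "-long"])
    # Dump continuously to a file on the local disk.
    commands.append(["fstrace", "dump", "-follow", "cmfx",
                     "-file", dump_file, "-sleep", str(sleep_seconds)])
    return commands


for command in plan_trace_session("inactive", "cmfx.dump.file.1"):
    print(" ".join(command))
```
</programlisting>

<para>Returning argument lists rather than running the commands keeps the sketch safe to experiment with. To execute them,
you would need to be logged in as the local superuser <emphasis role="bold">root</emphasis>, as noted above.</para>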
1978 </sect2>
1979 </sect1>
1980
1981 <sect1 id="HDRWQ349">
1982 <title>Using the afsmonitor Program</title>
1983
1984 <indexterm>
1985 <primary>afsmonitor program</primary>
1986
1987 <secondary>features summarized</secondary>
1988 </indexterm>
1989
1990 <para>The <emphasis role="bold">afsmonitor</emphasis> program enables you to monitor the status and performance of specified
1991 File Server and Cache Manager processes by gathering statistical information. Among its other uses, the <emphasis
1992 role="bold">afsmonitor</emphasis> program can be used to fine-tune Cache Manager configuration and load balance File
1993 Servers.</para>
1994
1995 <para>The <emphasis role="bold">afsmonitor</emphasis> program enables you to perform the following tasks. <itemizedlist>
1996 <listitem>
1997 <para>Monitor any number of File Server and Cache Manager processes on any number of machines (in both local and foreign
1998 cells) from a single location.</para>
1999 </listitem>
2000
2001 <listitem>
2002 <para>Set threshold values for any monitored statistic. When the value of a statistic exceeds the threshold, the <emphasis
2003 role="bold">afsmonitor</emphasis> program highlights it to draw your attention. You can set threshold levels that apply to
2004 every machine or only some.</para>
2005 </listitem>
2006
2007 <listitem>
2008 <para>Invoke programs or scripts automatically when a statistic exceeds its threshold.</para>
2009 </listitem>
2010 </itemizedlist></para>
2011
2012 <sect2 id="HDRWQ350">
2013 <title>Requirements for running the afsmonitor program</title>
2014
2015 <indexterm>
2016 <primary>afsmonitor program</primary>
2017
2018 <secondary>requirements for running</secondary>
2019 </indexterm>
2020
2021 <para>The following software must be accessible to a machine where the <emphasis role="bold">afsmonitor</emphasis> program is
2022 running: <itemizedlist>
2023 <listitem>
2024 <para>The AFS <emphasis role="bold">xstat</emphasis> libraries, which the <emphasis role="bold">afsmonitor</emphasis>
2025 program uses to gather data</para>
2026 </listitem>
2027
2028 <listitem>
2029 <para>The <emphasis role="bold">curses</emphasis> graphics package, which most UNIX distributions provide as a standard
2030 utility</para>
2031 </listitem>
2032 </itemizedlist></para>
2033
2034 <indexterm>
2035 <primary>curses graphics utility</primary>
2036
2037 <secondary>afsmonitor program</secondary>
2038 </indexterm>
2039
2040 <indexterm>
2041 <primary>xstat as requirement for running afsmonitor</primary>
2042 </indexterm>
2043
2044 <para>The <emphasis role="bold">afsmonitor</emphasis> screens format successfully both on so-called dumb terminals and in
windowing systems that emulate terminals. For the output to look its best, the display environment needs to support reverse
2046 video and cursor addressing. Set the TERM environment variable to the correct terminal type, or to a value that has
2047 characteristics similar to the actual terminal type. The display window or terminal must be at least 80 columns wide and 12
2048 lines long.</para>
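
<para>Before starting the program, you can confirm that the display meets the minimum size stated above. The following
Python sketch is one way to do so and is not part of AFS. The function name
<computeroutput>terminal_problems</computeroutput> is hypothetical, and the check that the TERM variable is non-empty is a
simplification, because it cannot verify that the value describes the actual terminal.</para>

<programlisting>
```python
import os
import shutil

MIN_COLUMNS = 80  # minimum width stated in the text
MIN_LINES = 12    # minimum height stated in the text


def terminal_problems(columns: int, lines: int, term: str) -> list[str]:
    """Return a list of reasons the display may be unsuitable (empty if none)."""
    problems = []
    if MIN_COLUMNS > columns:
        problems.append(f"only {columns} columns; at least {MIN_COLUMNS} needed")
    if MIN_LINES > lines:
        problems.append(f"only {lines} lines; at least {MIN_LINES} needed")
    if not term:
        problems.append("TERM environment variable is not set")
    return problems


# Query the current terminal; get_terminal_size falls back to a default
# size when the real size cannot be determined.
size = shutil.get_terminal_size()
print(terminal_problems(size.columns, size.lines, os.environ.get("TERM", "")))
```
</programlisting>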
2049
2050 <indexterm>
2051 <primary>afsmonitor program</primary>
2052
2053 <secondary>setting terminal type</secondary>
2054 </indexterm>
2055
2056 <indexterm>
2057 <primary>terminal type</primary>
2058
2059 <secondary>setting for afsmonitor</secondary>
2060 </indexterm>
2061
2062 <indexterm>
2063 <primary>dumb terminal</primary>
2064
2065 <secondary>use with afsmonitor</secondary>
2066 </indexterm>
2067
2068 <para>The <emphasis role="bold">afsmonitor</emphasis> program must run in the foreground, and in its own separate, dedicated
2069 window or terminal. The window or terminal is unavailable for any other activity as long as the <emphasis
2070 role="bold">afsmonitor</emphasis> program is running. Any number of instances of the <emphasis
2071 role="bold">afsmonitor</emphasis> program can run on a single machine, as long as each instance runs in its own dedicated
2072 window or terminal. Note that it can take up to three minutes to start an additional instance.</para>
2073
2074 <indexterm>
2075 <primary>privilege</primary>
2076
2077 <secondary>required for afsmonitor program</secondary>
2078 </indexterm>
2079
2080 <para>No privilege is required to run the <emphasis role="bold">afsmonitor</emphasis> program. By convention, it is installed
2081 in the <emphasis role="bold">/usr/afsws/bin</emphasis> directory, and anyone who can access the directory can monitor File
2082 Servers and Cache Managers. The probes through which the <emphasis role="bold">afsmonitor</emphasis> program collects
2083 statistics do not constitute a significant burden on the File Server or Cache Manager unless hundreds of people are running
2084 the program. If you wish to restrict its use, place the binary file in a directory available only to authorized users.</para>
2085 </sect2>
2086
2087 <sect2 id="Header_392">
2088 <title>The afsmonitor Output Screens</title>
2089
2090 <indexterm>
2091 <primary>afsmonitor program</primary>
2092
2093 <secondary>screen layout</secondary>
2094 </indexterm>
2095
2096 <para>The <emphasis role="bold">afsmonitor</emphasis> program displays its data on three screens: <itemizedlist>
2097 <listitem>
2098 <para><computeroutput>System Overview</computeroutput>: This screen appears automatically when the <emphasis
2099 role="bold">afsmonitor</emphasis> program initializes. It summarizes separately for File Servers and Cache Managers the
2100 number of machines being monitored and how many of them have <emphasis>alerts</emphasis> (statistics that have exceeded
2101 their thresholds). It then lists the hostname and number of alerts for each machine being monitored, indicating if
2102 appropriate that a process failed to respond to the last probe.</para>
2103 </listitem>
2104
2105 <listitem>
<para><computeroutput>File Servers</computeroutput>: This screen displays File Server statistics for each file server
2107 machine being monitored. It highlights statistics that have exceeded their thresholds, and identifies machines that
2108 failed to respond to the last probe.</para>
2109 </listitem>
2110
2111 <listitem>
2112 <para><computeroutput>Cache Managers</computeroutput>: This screen displays Cache Manager statistics for each client
2113 machine being monitored. It highlights statistics that have exceeded their thresholds, and identifies machines that
2114 failed to respond to the last probe.</para>
2115 </listitem>
2116 </itemizedlist></para>
2117
2118 <para>Fields at the corners of every screen display the following information: <itemizedlist>
2119 <listitem>
2120 <para>In the top left corner, the program name and version number.</para>
2121 </listitem>
2122
2123 <listitem>
2124 <para>In the top right corner, the screen name, current and total page numbers, and current and total column numbers.
2125 The page number (for example, <computeroutput>p. 1 of 3</computeroutput>) indicates the index of the current page and
2126 the total number of (vertical) pages over which data is displayed. The column number (for example, <computeroutput>c. 1
2127 of 235</computeroutput>) indicates the index of the current leftmost column and the total number of columns in which
2128 data appears. (The symbol <computeroutput>&gt;&gt;&gt;</computeroutput> indicates that there is additional data to the
2129 right; the symbol <computeroutput>&lt;&lt;&lt;</computeroutput> indicates that there is additional data to the
2130 left.)</para>
2131 </listitem>
2132
2133 <listitem>
2134 <para>In the bottom left corner, a list of the available commands. Enter the first letter in the command name to run
2135 that command. Only the currently possible options appear; for example, if there is only one page of data, the
<computeroutput>next</computeroutput> and <computeroutput>prev</computeroutput> commands, which scroll the screen down and
up respectively, do not appear. For descriptions of the commands, see the following section about navigating the
2138 display screens.</para>
2139 </listitem>
2140
2141 <listitem>
2142 <para>In the bottom right corner, the <computeroutput>probes</computeroutput> field reports how many times the program
2143 has probed File Servers (<computeroutput>fs</computeroutput>), Cache Managers (<computeroutput>cm</computeroutput>), or
2144 both. The counts for File Servers and Cache Managers can differ. The <computeroutput>freq</computeroutput> field reports
2145 how often the program sends probes.</para>
2146 </listitem>
2147 </itemizedlist></para>
2148
2149 <para><emphasis role="bold">Navigating the afsmonitor Display Screens</emphasis></para>
2150
<para>As noted, the lower left-hand corner of every display screen lists the commands currently available for
moving to alternate screens, which can either be of a different type or display more statistics or machines of the current type.
To execute a command, press the lowercase version of the first letter in its name. Some commands also have an uppercase
version that has a somewhat different effect, as indicated in the following list. <variablelist>
2155 <varlistentry>
2156 <term><computeroutput>cm</computeroutput></term>
2157
2158 <listitem>
2159 <para>Switches to the <computeroutput>Cache Managers</computeroutput> screen. Available only on the
2160 <computeroutput>System Overview</computeroutput> and <computeroutput>File Servers</computeroutput> screens.</para>
2161 </listitem>
2162 </varlistentry>
2163
2164 <varlistentry>
2165 <term><computeroutput>fs</computeroutput></term>
2166
2167 <listitem>
2168 <para>Switches to the <computeroutput>File Servers</computeroutput> screen. Available only on the
2169 <computeroutput>System Overview</computeroutput> and the <computeroutput>Cache Managers</computeroutput>
2170 screens.</para>
2171 </listitem>
2172 </varlistentry>
2173
2174 <varlistentry>
2175 <term><computeroutput>left</computeroutput></term>
2176
2177 <listitem>
2178 <para>Scrolls horizontally to the left, to access the data columns situated to the left of the current set. Available
2179 when the <computeroutput>&lt;&lt;&lt;</computeroutput> symbol appears at the top left of the screen. Press uppercase
2180 <emphasis role="bold">L</emphasis> to scroll horizontally all the way to the left (to display the first set of data
2181 columns).</para>
2182 </listitem>
2183 </varlistentry>
2184
2185 <varlistentry>
2186 <term><computeroutput>next</computeroutput></term>
2187
2188 <listitem>
2189 <para>Scrolls down vertically to the next page of machine names. Available when there are two or more pages of
2190 machines and the final page is not currently displayed. Press uppercase <emphasis role="bold">N</emphasis> to scroll
2191 to the final page.</para>
2192 </listitem>
2193 </varlistentry>
2194
2195 <varlistentry>
2196 <term><computeroutput>oview</computeroutput></term>
2197
2198 <listitem>
2199 <para>Switches to the <computeroutput>System Overview</computeroutput> screen. Available only on the
2200 <computeroutput>Cache Managers</computeroutput> and <computeroutput>File Servers</computeroutput> screens.</para>
2201 </listitem>
2202 </varlistentry>
2203
2204 <varlistentry>
2205 <term><computeroutput>prev</computeroutput></term>
2206
2207 <listitem>
2208 <para>Scrolls up vertically to the previous page of machine names. Available when there are two or more pages of
2209 machines and the first page is not currently displayed. Press uppercase <emphasis role="bold">P</emphasis> to scroll
2210 to the first page.</para>
2211 </listitem>
2212 </varlistentry>
2213
2214 <varlistentry>
2215 <term><computeroutput>right</computeroutput></term>
2216
2217 <listitem>
2218 <para>Scrolls horizontally to the right, to access the data columns situated to the right of the current set. This
2219 command is available when the <computeroutput>&gt;&gt;&gt;</computeroutput> symbol appears at the upper right of the
2220 screen. Press uppercase <emphasis role="bold">R</emphasis> to scroll horizontally all the way to the right (to display
2221 the final set of data columns).</para>
2222 </listitem>
2223 </varlistentry>
2224 </variablelist></para>
2225 </sect2>
2226
2227 <sect2 id="Header_393">
2228 <title>The System Overview Screen</title>
2229
2230 <para>The <computeroutput>System Overview</computeroutput> screen appears automatically as the <emphasis
2231 role="bold">afsmonitor</emphasis> program initializes. This screen displays the status of as many File Server and Cache
2232 Manager processes as can fit in the current window; scroll down to access additional information.</para>
2233
2234 <para>The information on this screen is split into File Server information on the left and Cache Manager information on the
2235 right. The header for each grouping reports two pieces of information: <itemizedlist>
2236 <listitem>
2237 <para>The number of machines on which the program is monitoring the indicated process</para>
2238 </listitem>
2239
2240 <listitem>
2241 <para>The number of alerts and the number of machines affected by them (an <emphasis>alert</emphasis> means that a
2242 statistic has exceeded its threshold or a process failed to respond to the last probe)</para>
2243 </listitem>
2244 </itemizedlist></para>
2245
2246 <para>A list of the machines being monitored follows. If there are any alerts on a machine, the number of them appears in
2247 square brackets to the left of the hostname. If a process failed to respond to the last probe, the letters
2248 <computeroutput>PF</computeroutput> (probe failure) appear in square brackets to the left of the hostname.</para>
2249
2250 <para>The following graphic is an example <computeroutput>System Overview</computeroutput> screen. The <emphasis
2251 role="bold">afsmonitor</emphasis> program is monitoring six File Servers and seven Cache Managers. The File Server process on
2252 host <emphasis role="bold">fs1.example.com</emphasis> and the Cache Manager on host <emphasis role="bold">cli33.example.com</emphasis>
2253 are each marked <computeroutput>[ 1]</computeroutput> to indicate that one threshold value is exceeded. The
2254 <computeroutput>[PF]</computeroutput> marker on host <emphasis role="bold">fs6.example.com</emphasis> indicates that its File
2255 Server process did not respond to the last probe.</para>
2256
2257 <figure id="Figure_6" label="6">
2258 <title>The afsmonitor System Overview Screen</title>
2259
2260 <mediaobject>
2261 <imageobject>
2262 <imagedata fileref="overview.png" scale="50" />
2263 </imageobject>
2264 </mediaobject>
2265 </figure>
2266
2268 </sect2>
2269
2270 <sect2 id="Header_394">
2271 <title>The File Servers Screen</title>
2272
2273 <para>The <computeroutput>File Servers</computeroutput> screen displays the values collected at the most recent probe for File
2274 Server statistics.</para>
2275
2276 <para>A summary line at the top of the screen (just below the standard program version and screen title blocks) specifies the
2277 number of monitored File Servers, the number of alerts, and the number of machines affected by the alerts.</para>
2278
2279 <para>The first column always displays the hostnames of the machines running the monitored File Servers.</para>
2280
2281 <para>To the right of the hostname column appear as many columns of statistics as can fit within the current width of the
2282 display screen or window; each column requires space for 10 characters. The name of the statistic appears at the top of each
2283 column. If the File Server on a machine did not respond to the most recent probe, a pair of dashes
2284 (<computeroutput>--</computeroutput>) appears in each column. If a value exceeds its configured threshold, it is highlighted
2285 in reverse video. If a value is too large to fit into the allotted column width, it overflows into the next row in the same
2286 column.</para>
2287
2288 <para>For a list of the available File Server statistics, see <link linkend="HDRWQ617">Appendix C, The afsmonitor Program
2289 Statistics</link>.</para>
2290
2291 <para>The following graphic depicts the <computeroutput>File Servers</computeroutput> screen that follows the System Overview
2292 Screen example previously discussed; however, one additional server probe has been completed. In this example, the File Server
2293 process on <emphasis role="bold">fs1</emphasis> has exceeded the configured threshold for the number of performance calls
2294 received (the <emphasis role="bold">numPerfCalls</emphasis> statistic), and that field appears in reverse video. Host
2295 <emphasis role="bold">fs6</emphasis> did not respond to Probe 10, so dashes appear in all fields.</para>
2296
2297 <figure id="Figure_7" label="7">
2298 <title>The afsmonitor File Servers Screen</title>
2299
2300 <mediaobject>
2301 <imageobject>
2302 <imagedata fileref="fserver1.png" scale="50" />
2303 </imageobject>
2304 </mediaobject>
2305 </figure>
2306
2308
2309 <para>Both the File Servers and Cache Managers screens (the latter is discussed in the following section) can display hundreds of
2310 columns of data and are therefore designed to scroll left and right. In the preceding graphic, the screen displays the leftmost
2311 set of columns, and the screen title block shows that column 1 of 235 is displayed. The appearance of the
2312 <computeroutput>&gt;&gt;&gt;</computeroutput> symbol in the upper right hand corner of the screen and the <emphasis
2313 role="bold">right</emphasis> command in the command block indicate that additional data is available by scrolling right. (For
2314 information on the available statistics, see <link linkend="HDRWQ617">Appendix C, The afsmonitor Program
2315 Statistics</link>.)</para>
2316
2317 <para>If the <emphasis role="bold">right</emphasis> command is executed, the screen looks something like the following
2318 example. Note that the horizontal scroll symbols now point both to the left (<computeroutput>&lt;&lt;&lt;</computeroutput>)
2319 and to the right (<computeroutput>&gt;&gt;&gt;</computeroutput>) and both the <emphasis role="bold">left</emphasis> and
2320 <emphasis role="bold">right</emphasis> commands appear, indicating that additional data is available by scrolling both left
2321 and right.</para>
2322
2323 <figure id="Figure_8" label="8">
2324 <title>The afsmonitor File Servers Screen Shifted One Page to the Right</title>
2325
2326 <mediaobject>
2327 <imageobject>
2328 <imagedata fileref="fserver2.png" scale="50" />
2329 </imageobject>
2330 </mediaobject>
2331 </figure>
2332
2334 </sect2>
2335
2336 <sect2 id="Header_395">
2337 <title>The Cache Managers Screen</title>
2338
2339 <para>The <computeroutput>Cache Managers</computeroutput> screen displays the values collected at the most recent probe for
2340 Cache Manager statistics.</para>
2341
2342 <para>A summary line at the top of the screen (just below the standard program version and screen title blocks) specifies the
2343 number of monitored Cache Managers, the number of alerts, and the number of machines affected by the alerts.</para>
2344
2345 <para>The first column always displays the hostnames of the machines running the monitored Cache Managers.</para>
2346
2347 <para>To the right of the hostname column appear as many columns of statistics as can fit within the current width of the
2348 display screen or window; each column requires space for 10 characters. The name of the statistic appears at the top of each
2349 column. If the Cache Manager on a machine did not respond to the most recent probe, a pair of dashes
2350 (<computeroutput>--</computeroutput>) appears in each column. If a value exceeds its configured threshold, it is highlighted
2351 in reverse video. If a value is too large to fit into the allotted column width, it overflows into the next row in the same
2352 column.</para>
2353
2354 <para>For a list of the available Cache Manager statistics, see <link linkend="HDRWQ617">Appendix C, The afsmonitor Program
2355 Statistics</link>.</para>
2356
2357 <para>The following graphic depicts a Cache Managers screen that follows the System Overview Screen previously discussed. In
2358 the example, the Cache Manager process on host <emphasis role="bold">cli33</emphasis> has exceeded the configured threshold
2359 for the number of cells it can contact (the <emphasis role="bold">numCellsContacted</emphasis> statistic), so that field
2360 appears in reverse video.</para>
2361
2362 <figure id="Figure_9" label="9">
2363 <title>The afsmonitor Cache Managers Screen</title>
2364
2365 <mediaobject>
2366 <imageobject>
2367 <imagedata fileref="cachmgr.png" scale="50" />
2368 </imageobject>
2369 </mediaobject>
2370 </figure>
2371
2373 </sect2>
2374 </sect1>
2375
2376 <sect1 id="HDRWQ351">
2377 <title>Configuring the afsmonitor Program</title>
2378
2379 <indexterm>
2380 <primary>afsmonitor program</primary>
2381
2382 <secondary>creating configuration files for</secondary>
2383 </indexterm>
2384
2385 <indexterm>
2386 <primary>configuring</primary>
2387
2388 <secondary>afsmonitor program</secondary>
2389 </indexterm>
2390
2391 <para>To customize the <emphasis role="bold">afsmonitor</emphasis> program, create an ASCII-format configuration file and use
2392 the <emphasis role="bold">-config</emphasis> argument to name it. You can specify the following in the configuration file:
2393 <itemizedlist>
2394 <listitem>
2395 <para>The File Servers, Cache Managers, or both to monitor.</para>
2396 </listitem>
2397
2398 <listitem>
2399 <para>The statistics to display. By default, the display includes 271 statistics for File Servers and 570 statistics for
2400 Cache Managers. For information on the available statistics, see <link linkend="HDRWQ617">Appendix C, The afsmonitor
2401 Program Statistics</link>.</para>
2402 </listitem>
2403
2404 <listitem>
2405 <para>The threshold values to set for statistics and a script or program to execute if a threshold is exceeded. By
2406 default, no threshold values are defined and no scripts or programs are executed.</para>
2407 </listitem>
2408 </itemizedlist></para>
2409
2410 <para>The following list describes the instructions that can appear in the configuration file: <variablelist>
2411 <varlistentry>
2412 <term><computeroutput>cm</computeroutput> <replaceable>hostname</replaceable></term>
2413
2414 <listitem>
2415 <para>Names a client machine for which to display Cache Manager statistics. The order of <emphasis
2416 role="bold">cm</emphasis> lines in the file determines the order in which client machines appear from top to bottom on
2417 the <computeroutput>System Overview</computeroutput> and <computeroutput>Cache Managers</computeroutput> output
2418 screens.</para>
2419 </listitem>
2420 </varlistentry>
2421
2422 <varlistentry>
2423 <term><computeroutput>fs</computeroutput> <replaceable>hostname</replaceable></term>
2424
2425 <listitem>
2426 <para>Names a file server machine for which to display File Server statistics. The order of <emphasis
2427 role="bold">fs</emphasis> lines in the file determines the order in which file server machines appear from top to bottom
2428 on the <computeroutput>System Overview</computeroutput> and <computeroutput>File Servers</computeroutput> output
2429 screens.</para>
2430 </listitem>
2431 </varlistentry>
2432
2433 <varlistentry>
2434 <term><computeroutput>thresh fs | cm <replaceable>field_name</replaceable> <replaceable>thresh_val</replaceable>
2435 [<replaceable>cmd_to_run</replaceable>] [<replaceable>arg1</replaceable>] . . .
2436 [<replaceable>argn</replaceable>]</computeroutput></term>
2437
2438 <listitem>
2439 <para>Assigns the threshold value thresh_val to the statistic field_name, for either a File Server statistic (<emphasis
2440 role="bold">fs</emphasis>) or a Cache Manager statistic (<emphasis role="bold">cm</emphasis>). The optional
2441 cmd_to_run field names a binary or script to execute each time the value of the statistic changes from being below
2442 thresh_val to being at or above thresh_val. A change between two values that both exceed thresh_val does not retrigger
2443 the binary or script. The optional arg1 through argn fields are additional values that the <emphasis
2444 role="bold">afsmonitor</emphasis> program passes as arguments to the cmd_to_run command. If any of them includes one
2445 or more spaces, enclose the entire field in double quotes.</para>
2446
2447 <para>When it runs the command, the <emphasis role="bold">afsmonitor</emphasis> program passes it a set of parameters. The
2448 parameters <emphasis role="bold">fs</emphasis>, <emphasis role="bold">cm</emphasis>, field_name, thresh_val, and arg1
2449 through argn correspond to the values with the same name on the <emphasis role="bold">thresh</emphasis> line. The host_name
2450 parameter identifies the file server or client machine where the statistic has crossed the threshold, and the actual_val
2451 parameter is the actual value of field_name that equals or exceeds the threshold value.</para>
2452
2453 <para>Use the <emphasis role="bold">thresh</emphasis> line to set either a global threshold, which applies to all file
2454 server machines listed on <emphasis role="bold">fs</emphasis> lines or client machines listed on <emphasis
2455 role="bold">cm</emphasis> lines in the configuration file, or a machine-specific threshold, which applies to only one
2456 file server or client machine. <itemizedlist>
2457 <listitem>
2458 <para>To set a global threshold, place the <emphasis role="bold">thresh</emphasis> line before any of the
2459 <emphasis role="bold">fs</emphasis> or <emphasis role="bold">cm</emphasis> lines in the file.</para>
2460 </listitem>
2461
2462 <listitem>
2463 <para>To set a machine-specific threshold, place the <emphasis role="bold">thresh</emphasis> line below the
2464 corresponding <emphasis role="bold">fs</emphasis> or <emphasis role="bold">cm</emphasis> line, and above any other
2465 <emphasis role="bold">fs</emphasis> or <emphasis role="bold">cm</emphasis> lines. A machine-specific threshold
2466 value always overrides the corresponding global threshold, if set. Do not place a <emphasis role="bold">thresh
2467 fs</emphasis> line directly after a <emphasis role="bold">cm</emphasis> line or a <emphasis role="bold">thresh
2468 cm</emphasis> line directly after a <emphasis role="bold">fs</emphasis> line.</para>
2469 </listitem>
2470 </itemizedlist></para>
2471 </listitem>
2472 </varlistentry>
2473
2474 <varlistentry>
2475 <term><computeroutput>show fs | cm <replaceable>field/group/section</replaceable></computeroutput></term>
2476
2477 <listitem>
2478 <para>Specifies which individual statistic, group of statistics, or section of statistics to display on the
2479 <computeroutput>File Servers</computeroutput> screen (<emphasis role="bold">fs</emphasis>) or <computeroutput>Cache
2480 Managers</computeroutput> screen (<emphasis role="bold">cm</emphasis>) and the order in which to display them. The
2481 appendix of <emphasis role="bold">afsmonitor</emphasis> statistics in the <emphasis>OpenAFS Administration
2482 Guide</emphasis> specifies the group and section to which each statistic belongs. Include as many <emphasis
2483 role="bold">show</emphasis> lines as necessary to customize the screen display as desired, and place them anywhere in
2484 the file. The top-to-bottom order of the <emphasis role="bold">show</emphasis> lines in the configuration file
2485 determines the left-to-right order in which the statistics appear on the corresponding screen.</para>
2486
2487 <para>If there are no <emphasis role="bold">show</emphasis> lines in the configuration file, then the screens display
2488 all statistics for both Cache Managers and File Servers. Similarly, if there are no <emphasis role="bold">show
2489 fs</emphasis> lines, the <computeroutput>File Servers</computeroutput> screen displays all file server statistics, and
2490 if there are no <emphasis role="bold">show cm</emphasis> lines, the <computeroutput>Cache Managers</computeroutput>
2491 screen displays all client statistics.</para>
2492 </listitem>
2493 </varlistentry>
2494
2495 <varlistentry>
2496 <term><emphasis role="bold"># comments</emphasis></term>
2497
2498 <listitem>
2499 <para>Precedes a line of text that the <emphasis role="bold">afsmonitor</emphasis> program ignores because of the
2500 initial number sign (<emphasis role="bold">#</emphasis>), which must appear in the very first column of the line.</para>
2501 </listitem>
2502 </varlistentry>
2503 </variablelist></para>
2504
2505 <para>For a list of the values that can appear in the field/group/section field of a <emphasis role="bold">show</emphasis>
2506 instruction, see <link linkend="HDRWQ617">Appendix C, The afsmonitor Program Statistics</link>.</para>
2507
2508 <para>The following example illustrates a possible configuration file:</para>
2509
2510 <programlisting>
2511 thresh cm dlocalAccesses 1000000
2512 thresh cm dremoteAccesses 500000 handleDRemote
2513 thresh fs rx_maxRtt_Usec 1000
2514 cm client5
2515 cm client33
2516 cm client14
2517 cm client2
2518 thresh cm dlocalAccesses 2000000
2519 thresh cm vcacheMisses 10000
2520 fs fs3
2521 fs fs9
2522 fs fs5
2523 fs fs10
2524 show cm numCellsContacted
2525 show cm dlocalAccesses
2526 show cm dremoteAccesses
2527 show cm vcacheMisses
2528 show cm Auth_Stats_group
2529 </programlisting>
2530
2531 <para>Since the first three <emphasis role="bold">thresh</emphasis> instructions appear before any <emphasis
2532 role="bold">fs</emphasis> or <emphasis role="bold">cm</emphasis> instructions, they set global threshold values: <itemizedlist>
2533 <listitem>
2534 <para>All Cache Manager processes in this file use <emphasis role="bold">1000000</emphasis> as the threshold for the
2535 <emphasis role="bold">dlocalAccesses</emphasis> statistic (except for the machine <emphasis role="bold">client2</emphasis>,
2536 which uses an overriding value of <emphasis role="bold">2000000</emphasis>).</para>
2537 </listitem>
2538
2539 <listitem>
2540 <para>All Cache Manager processes in this file use <emphasis role="bold">500000</emphasis> as the threshold value for the
2541 <emphasis role="bold">dremoteAccesses</emphasis> statistic; if that value is exceeded, the script <emphasis
2542 role="bold">handleDRemote</emphasis> is invoked.</para>
2543 </listitem>
2544
2545 <listitem>
2546 <para>All File Server processes in this file use <emphasis role="bold">1000</emphasis> as the threshold value for the
2547 <emphasis role="bold">rx_maxRtt_Usec</emphasis> statistic.</para>
2548 </listitem>
2549 </itemizedlist></para>
2550
2551 <para>The four <emphasis role="bold">cm</emphasis> instructions monitor the Cache Manager on the machines <emphasis
2552 role="bold">client5</emphasis>, <emphasis role="bold">client33</emphasis>, <emphasis role="bold">client14</emphasis>, and
2553 <emphasis role="bold">client2</emphasis>. The first three use all of the global threshold values.</para>
2554
2555 <para>The Cache Manager on <emphasis role="bold">client2</emphasis> uses the global threshold value for the <emphasis
2556 role="bold">dremoteAccesses</emphasis> statistic, but a different one for the <emphasis role="bold">dlocalAccesses</emphasis>
2557 statistic. Furthermore, <emphasis role="bold">client2</emphasis> is the only Cache Manager that uses the threshold set for the
2558 <emphasis role="bold">vcacheMisses</emphasis> statistic.</para>
2559
2560 <para>The <emphasis role="bold">fs</emphasis> instructions monitor the File Server on the machines <emphasis
2561 role="bold">fs3</emphasis>, <emphasis role="bold">fs9</emphasis>, <emphasis role="bold">fs5</emphasis>, and <emphasis
2562 role="bold">fs10</emphasis>. They all use the global threshold for the <emphasis role="bold">rx_maxRtt_Usec</emphasis>
2563 statistic.</para>
2564
2565 <para>Because there are no <emphasis role="bold">show fs</emphasis> instructions, the File Servers screen displays all File
2566 Server statistics. The Cache Managers screen displays only the statistics named in <emphasis role="bold">show cm</emphasis>
2567 instructions, ordering them from left to right. The <emphasis role="bold">Auth_Stats_group</emphasis> includes several
2568 statistics, all of which are displayed (<emphasis role="bold">curr_PAGs</emphasis>, <emphasis
2569 role="bold">curr_Records</emphasis>, <emphasis role="bold">curr_AuthRecords</emphasis>, <emphasis
2570 role="bold">curr_UnauthRecords</emphasis>, <emphasis role="bold">curr_MaxRecordsInPAG</emphasis>, <emphasis
2571 role="bold">curr_LongestChain</emphasis>, <emphasis role="bold">PAGCreations</emphasis>, <emphasis
2572 role="bold">TicketUpdates</emphasis>, <emphasis role="bold">HWM_PAGS</emphasis>, <emphasis role="bold">HWM_Records</emphasis>,
2573 <emphasis role="bold">HWM_MaxRecordsInPAG</emphasis>, and <emphasis role="bold">HWM_LongestChain</emphasis>).</para>
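<para>As a further, minimal sketch, the following fragment attaches a command to a machine-specific threshold. The pathname
<emphasis role="bold">/usr/local/bin/notify_admin</emphasis> and its quoted argument are hypothetical and serve only as an
illustration; the machine and statistic names are taken from the preceding example.</para>

<programlisting>
# Watch Rx round-trip time on one file server only
fs fs3
thresh fs rx_maxRtt_Usec 1000 /usr/local/bin/notify_admin "Rx round-trip time is high"
show fs rx_maxRtt_Usec
</programlisting>

<para>Because the <emphasis role="bold">thresh</emphasis> line directly follows the <emphasis role="bold">fs fs3</emphasis>
line, the threshold applies only to <emphasis role="bold">fs3</emphasis>. The final argument contains spaces, so it is enclosed
in double quotes, and the comment line begins with the number sign in the first column.</para>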
2574 </sect1>
2575
2576 <sect1 id="HDRWQ352">
2577 <title>Writing afsmonitor Statistics to a File</title>
2578
2579 <indexterm>
2580 <primary>afsmonitor program</primary>
2581
2582 <secondary>creating an output file</secondary>
2583 </indexterm>
2584
2585 <para>All of the statistical information collected and displayed by the <emphasis role="bold">afsmonitor</emphasis> program can
2586 be preserved by writing it to an output file. You can create an output file by using the <emphasis
2587 role="bold">-output</emphasis> argument when you start the <emphasis role="bold">afsmonitor</emphasis> process. You can use
2588 the output file to track process performance over long periods of time and to apply post-processing techniques to further
2589 analyze system trends.</para>
2590
2591 <para>The <emphasis role="bold">afsmonitor</emphasis> program output file is a simple ASCII file that records the information
2592 reported by the File Servers and Cache Managers screens. The output file has the following format:</para>
2593
2594 <programlisting>
2595 time host_name <emphasis role="bold">CM</emphasis>|<emphasis role="bold">FS</emphasis> list_of_measured_values
2596 </programlisting>
2597
2598 <para>Each entry specifies the <emphasis>time</emphasis> at which the <emphasis>list_of_measured_values</emphasis> was gathered
2599 from the Cache Manager (<emphasis role="bold">CM</emphasis>) or File Server (<emphasis role="bold">FS</emphasis>) process housed on
2600 <emphasis>host_name</emphasis>. On those occasions when a probe fails, the value <computeroutput>-1</computeroutput> is reported instead of the
2601 <emphasis>list_of_measured_values</emphasis>.</para>
2602
2603 <para>This file format provides several advantages: <itemizedlist>
2604 <listitem>
2605 <para>It can be viewed using a standard editor. If you intend to view this file frequently, use the <emphasis
2606 role="bold">-detailed</emphasis> flag with the <emphasis role="bold">-output</emphasis> argument. It formats the output
2607 file in a way that is easier to read.</para>
2608 </listitem>
2609
2610 <listitem>
2611 <para>It can be passed through filters to extract desired information using the standard set of UNIX tools.</para>
2612 </listitem>
2613
2614 <listitem>
2615 <para>It is suitable for long term storage of the <emphasis role="bold">afsmonitor</emphasis> program output.</para>
2616 </listitem>
2617 </itemizedlist></para>
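
<para>As a sketch of such filtering, assume an output file at the hypothetical pathname <emphasis
role="bold">/tmp/afsmon.out</emphasis>, written without the <emphasis role="bold">-detailed</emphasis> flag. The following
commands display the entries recorded for one host and then count them. They rely only on the host name appearing in each
entry, not on any other detail of the layout.</para>

<programlisting>
% <emphasis role="bold">grep fs1.example.com /tmp/afsmon.out</emphasis>
% <emphasis role="bold">grep -c fs1.example.com /tmp/afsmon.out</emphasis>
</programlisting>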
2618
2619 <indexterm>
2620 <primary>afsmonitor program</primary>
2621
2622 <secondary>command syntax</secondary>
2623 </indexterm>
2624
2625 <indexterm>
2626 <primary>commands</primary>
2627
2628 <secondary>afsmonitor</secondary>
2629 </indexterm>
2630 </sect1>
2631
2632 <sect1 id="Header_398">
2633 <title>To start the afsmonitor Program</title>
2634
2635 <orderedlist>
2636 <listitem>
2637 <para>Open a separate command shell window or use a dedicated terminal for each instance of the <emphasis
2638 role="bold">afsmonitor</emphasis> program. This window or terminal must be devoted to the exclusive use of the <emphasis
2639 role="bold">afsmonitor</emphasis> process because the command cannot be run in the background.</para>
2640 </listitem>
2641
2642 <listitem>
2643 <para>Initialize the <emphasis role="bold">afsmonitor</emphasis> program. The message <computeroutput>afsmonitor Collecting
2644 Statistics...</computeroutput>, followed by the appearance of the <computeroutput>System Overview</computeroutput> screen,
2645 confirms a successful start. <programlisting>
2646 % <emphasis role="bold">afsmonitor</emphasis> [<emphasis role="bold">initcmd</emphasis>] [<emphasis role="bold">-config</emphasis> &lt;<replaceable>configuration file</replaceable>&gt;] \
2647 [<emphasis role="bold">-frequency</emphasis> &lt;<replaceable>poll frequency, in seconds</replaceable>&gt;] \
2648 [<emphasis role="bold">-output</emphasis> &lt;<replaceable>storage file name</replaceable>&gt;] [<emphasis
2649 role="bold">-detailed</emphasis>] \
2650 [<emphasis role="bold">-debug</emphasis> &lt;<replaceable>turn debugging output on to the named file</replaceable>&gt;] \
2651 [<emphasis role="bold">-fshosts</emphasis> &lt;<replaceable>list of file servers to monitor</replaceable>&gt;+] \
2652 [<emphasis role="bold">-cmhosts</emphasis> &lt;<replaceable>list of cache managers to monitor</replaceable>&gt;+]
2653 afsmonitor Collecting Statistics...
2654 </programlisting></para>
2655
2656 <para>where <variablelist>
2657 <varlistentry>
2658 <term><emphasis role="bold">initcmd</emphasis></term>
2659
2660 <listitem>
2661 <para>Is an optional string that accommodates the command's use of the AFS command parser. It can be omitted and
2662 ignored.</para>
2663 </listitem>
2664 </varlistentry>
2665
2666 <varlistentry>
2667 <term><emphasis role="bold">-config</emphasis></term>
2668
2669 <listitem>
2670 <para>Specifies the pathname of an <emphasis role="bold">afsmonitor</emphasis> configuration file, which lists the
2671 machines and statistics to monitor. Partial pathnames are interpreted relative to the current working directory.
2672 Provide either this argument or one or both of the <emphasis role="bold">-fshosts</emphasis> and <emphasis
2673 role="bold">-cmhosts</emphasis> arguments. You must use a configuration file to set thresholds or customize the
2674 screen display. For instructions on creating the configuration file, see <link linkend="HDRWQ351">Configuring the
2675 afsmonitor Program</link>.</para>
2676 </listitem>
2677 </varlistentry>
2678
2679 <varlistentry>
2680 <term><emphasis role="bold">-frequency</emphasis></term>
2681
2682 <listitem>
2683 <para>Specifies how often to probe the File Server and Cache Manager processes, as a number of seconds. Acceptable
2684 values range from <emphasis role="bold">1</emphasis> to <emphasis role="bold">86400</emphasis>; the default value
2685 is <emphasis role="bold">60</emphasis>. This frequency applies to both File Server and Cache Manager probes;
2686 however, File Server and Cache Manager probes are initiated and processed independently of each other. The actual
2687 interval between probes to a host is the probe frequency plus the time needed by all hosts to respond to the
2688 probe.</para>
2689 </listitem>
2690 </varlistentry>
2691
2692 <varlistentry>
2693 <term><emphasis role="bold">-output</emphasis></term>
2694
2695 <listitem>
2696 <para>Specifies the name of an output file to which to write all of the statistical data. By default, no output file
2697 is created. For information on this file, see <link linkend="HDRWQ352">Writing afsmonitor Statistics to a
2698 File</link>.</para>
2699 </listitem>
2700 </varlistentry>
2701
2702 <varlistentry>
2703 <term><emphasis role="bold">-detailed</emphasis></term>
2704
2705 <listitem>
2706 <para>Formats the output file named by the <emphasis role="bold">-output</emphasis> argument to be more easily
2707 readable. The <emphasis role="bold">-output</emphasis> argument must be provided along with this flag.</para>
2708 </listitem>
2709 </varlistentry>
2710
2711 <varlistentry>
2712 <term><emphasis role="bold">-fshosts</emphasis></term>
2713
2714 <listitem>
2715 <para>Identifies each File Server process to monitor by specifying the host it is running on. You can identify a
2716 host using either its complete Internet-style host name or an abbreviation acceptable to the cell's naming service.
2717 Combine this argument with the <emphasis role="bold">-cmhosts</emphasis> argument if you wish, but not with the <emphasis
2718 role="bold">-config</emphasis> argument.</para>
2719 </listitem>
2720 </varlistentry>
2721
2722 <varlistentry>
2723 <term><emphasis role="bold">-cmhosts</emphasis></term>
2724
2725 <listitem>
2726 <para>Identifies each Cache Manager process to monitor by specifying the host it is running on. You can identify a
2727 host using either its complete Internet-style host name or an abbreviation acceptable to the cell's naming service.
2728 Combine this argument with the <emphasis role="bold">-fshosts</emphasis> argument if you wish, but not with the <emphasis
2729 role="bold">-config</emphasis> argument.</para>
2730 </listitem>
2731 </varlistentry>
2732 </variablelist></para>
2733 </listitem>
2734 </orderedlist>
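
<para>The following invocations are illustrative sketches only. The file names <emphasis role="bold">afsmon.conf</emphasis>
and <emphasis role="bold">afsmon.out</emphasis> are hypothetical, and the host names are those used in the screen examples
earlier in this chapter. The first command monitors two File Servers and one Cache Manager named on the command line, probing
every 30 seconds. The second reads the machines and thresholds from a configuration file and writes the statistics to an output
file in the more readable format.</para>

<programlisting>
% <emphasis role="bold">afsmonitor -fshosts fs1.example.com fs6.example.com -cmhosts cli33.example.com -frequency 30</emphasis>
% <emphasis role="bold">afsmonitor -config afsmon.conf -output afsmon.out -detailed</emphasis>
</programlisting>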
2735 </sect1>
2736
2737 <sect1 id="Header_399">
2738 <title>To stop the afsmonitor Program</title>
2739
2740 <indexterm>
2741 <primary>afsmonitor program</primary>
2742
2743 <secondary>stopping</secondary>
2744 </indexterm>
2745
2746 <para>To exit an <emphasis role="bold">afsmonitor</emphasis> program session, enter the &lt;<emphasis
2747 role="bold">Ctrl-c</emphasis>&gt; interrupt signal or an uppercase <emphasis role="bold">Q</emphasis>.</para>
2748 </sect1>
2749
2750 <sect1 id="HDRWQ353">
2751 <title>The xstat Data Collection Facility</title>
2752
2753 <indexterm>
2754 <primary>xstat data collection facility</primary>
2755 </indexterm>
2756
2757 <indexterm>
2758 <primary>xstat data collection facility</primary>
2759
2760 <secondary>libxstat_fs.a library</secondary>
2761 </indexterm>
2762
2763 <indexterm>
2764 <primary>xstat data collection facility</primary>
2765
2766 <secondary>libxstat_cm.a library</secondary>
2767 </indexterm>
2768
2769 <indexterm>
2770 <primary>data collection</primary>
2771
2772 <secondary>with xstat data collection facility</secondary>
2773 </indexterm>
2774
2775 <indexterm>
2776 <primary>collecting</primary>
2777
2778 <secondary>data with xstat data collection facility</secondary>
2779 </indexterm>
2780
2781 <indexterm>
2782 <primary>File Server</primary>
2783
2784 <secondary>collecting data with xstat data collection facility</secondary>
2785 </indexterm>
2786
2787 <indexterm>
2788 <primary>Cache Manager</primary>
2789
2790 <secondary>collecting data with xstat data collection facility</secondary>
2791 </indexterm>
2792
2793 <indexterm>
2794 <primary>File Server</primary>
2795
2796 <secondary>xstat data collection facility libraries</secondary>
2797 </indexterm>
2798
2799 <indexterm>
2800 <primary>Cache Manager</primary>
2801
2802 <secondary>xstat data collection facility libraries</secondary>
2803 </indexterm>
2804
2805 <indexterm>
2806 <primary>libxstat_fs.a library</primary>
2807 </indexterm>
2808
2809 <indexterm>
2810 <primary>libxstat_cm.a library</primary>
2811 </indexterm>
2812
<para>The <emphasis role="bold">afsmonitor</emphasis> program uses the <emphasis role="bold">xstat</emphasis> data collection
facility to gather and calculate the data that it then uses to perform its function. You can also use the <emphasis
role="bold">xstat</emphasis> facility to create your own data display programs. If you do, keep the following requirements in
mind. The File Server treats any program that calls its RPC routines as a Cache Manager; therefore, any program that calls the
File Server interface directly must export the Cache Manager's callback interface. The calling program must be able to emulate
the necessary callback state and must respond to periodic keep-alive messages from the File Server. In addition, the calling
program must be able to gather the collected data.</para>
2820
2821 <para>The <emphasis role="bold">xstat</emphasis> facility consists of two C language libraries available to user-level
2822 applications: <itemizedlist>
2823 <listitem>
2824 <para><emphasis role="bold">/usr/afsws/lib/afs/libxstat_fs.a</emphasis> exports calls that gather information from one or
2825 more running File Server processes.</para>
2826 </listitem>
2827
2828 <listitem>
2829 <para><emphasis role="bold">/usr/afsws/lib/afs/libxstat_cm.a</emphasis> exports calls that collect information from one or
2830 more running Cache Managers.</para>
2831 </listitem>
2832 </itemizedlist></para>
2833
2834 <para>The libraries allow the caller to register <itemizedlist>
2835 <listitem>
2836 <para>A set of File Servers or Cache Managers to be examined.</para>
2837 </listitem>
2838
2839 <listitem>
2840 <para>The frequency with which the File Servers or Cache Managers are to be probed for data.</para>
2841 </listitem>
2842
2843 <listitem>
2844 <para>A user-specified routine to be called each time data is collected.</para>
2845 </listitem>
2846 </itemizedlist></para>
2847
2848 <para>The libraries handle all of the lightweight processes, callback interactions, and timing issues associated with the data
2849 collection. The user needs only to process the data as it arrives.</para>
2850
2851 <sect2 id="Header_401">
2852 <title>The libxstat Libraries</title>
2853
2854 <indexterm>
2855 <primary>libxstat_fs.a library</primary>
2856
2857 <secondary>routines</secondary>
2858 </indexterm>
2859
2860 <indexterm>
2861 <primary>libxstat_cm.a library</primary>
2862
2863 <secondary>routines</secondary>
2864 </indexterm>
2865
2866 <para>The <emphasis role="bold">libxstat_fs.a</emphasis> and <emphasis role="bold">libxstat_cm.a</emphasis> libraries handle
2867 the callback requirements and other complications associated with the collection of data from File Servers and Cache Managers.
2868 The user provides only the means of accumulating the desired data. Each <emphasis role="bold">xstat</emphasis> library
2869 implements three routines: <itemizedlist>
2870 <listitem>
2871 <para>Initialization (<emphasis role="bold">xstat_fs_Init</emphasis> and <emphasis role="bold">xstat_cm_Init</emphasis>)
2872 arranges the periodic collection and handling of data.</para>
2873 </listitem>
2874
2875 <listitem>
2876 <para>Immediate probe (<emphasis role="bold">xstat_fs_ForceProbeNow</emphasis> and <emphasis
2877 role="bold">xstat_cm_ForceProbeNow</emphasis>) forces the immediate collection of data, after which collection returns
2878 to its normal probe schedule.</para>
2879 </listitem>
2880
2881 <listitem>
2882 <para>Cleanup (<emphasis role="bold">xstat_fs_Cleanup</emphasis> and <emphasis role="bold">xstat_cm_Cleanup</emphasis>)
2883 terminates all connections and removes all traces of the data collection from memory.</para>
2884 </listitem>
2885 </itemizedlist></para>
2886
2887 <indexterm>
2888 <primary>File Server</primary>
2889
2890 <secondary>xstat data collections</secondary>
2891 </indexterm>
2892
2893 <indexterm>
2894 <primary>Cache Manager</primary>
2895
2896 <secondary>xstat data collections</secondary>
2897 </indexterm>
2898
2899 <indexterm>
2900 <primary>xstat data collection facility</primary>
2901
2902 <secondary>data collections</secondary>
2903 </indexterm>
2904
2905 <indexterm>
2906 <primary>libxstat_fs.a library</primary>
2907
2908 <secondary>data collections</secondary>
2909 </indexterm>
2910
2911 <indexterm>
2912 <primary>libxstat_cm.a library</primary>
2913
2914 <secondary>data collections</secondary>
2915 </indexterm>
2916
2917 <para>The File Server and Cache Manager each define data collections that clients can fetch. A data collection is simply a
2918 related set of numbers that can be collected as a unit. For example, the File Server and Cache Manager each define profiling
2919 and performance data collections. The profiling collections maintain counts of the number of times internal functions are
2920 called within servers, allowing bottleneck analysis to be performed. The performance collections record, among other things,
2921 internal disk I/O statistics for a File Server and cache effectiveness figures for a Cache Manager, allowing for performance
2922 analysis.</para>
2923
2924 <indexterm>
2925 <primary>xstat data collection facility</primary>
2926
2927 <secondary>obtaining more information</secondary>
2928 </indexterm>
2929
2930 <indexterm>
2931 <primary>libxstat_fs.a library</primary>
2932
2933 <secondary>obtaining more information</secondary>
2934 </indexterm>
2935
2936 <indexterm>
2937 <primary>libxstat_cm.a library</primary>
2938
2939 <secondary>obtaining more information</secondary>
2940 </indexterm>
2941
<para>For a copy of the detailed specification, which provides additional usage information about the <emphasis
role="bold">xstat</emphasis> facility, its libraries, and the routines in the libraries, contact AFS Product Support.</para>
2944 </sect2>
2945
2946 <sect2 id="Header_402">
2947 <title>Example xstat Commands</title>
2948
2949 <indexterm>
2950 <primary>xstat data collection facility</primary>
2951
2952 <secondary>example commands</secondary>
2953 </indexterm>
2954
2955 <indexterm>
2956 <primary>libxstat_fs.a library</primary>
2957
2958 <secondary>example command using</secondary>
2959 </indexterm>
2960
2961 <indexterm>
2962 <primary>libxstat_cm.a library</primary>
2963
2964 <secondary>example command using</secondary>
2965 </indexterm>
2966
2967 <indexterm>
2968 <primary>File Server</primary>
2969
2970 <secondary>xstat example commands</secondary>
2971 </indexterm>
2972
2973 <indexterm>
2974 <primary>Cache Manager</primary>
2975
2976 <secondary>xstat example commands</secondary>
2977 </indexterm>
2978
2979 <para>AFS comes with two low-level, example commands: <emphasis role="bold">xstat_fs_test</emphasis> and <emphasis
2980 role="bold">xstat_cm_test</emphasis>. The commands allow you to experiment with the <emphasis role="bold">xstat</emphasis>
2981 facility. They gather information and display the available data collections for a File Server or Cache Manager. They are
2982 intended merely to provide examples of the types of data that can be collected via <emphasis role="bold">xstat</emphasis>;
2983 they are not intended for use in the actual collection of data.</para>
2984
2985 <indexterm>
2986 <primary>commands</primary>
2987
2988 <secondary>xstat_fs_test</secondary>
2989 </indexterm>
2990
2991 <indexterm>
2992 <primary>libxstat_fs.a library</primary>
2993
2994 <secondary>xstat_fs_test example command</secondary>
2995 </indexterm>
2996
2997 <indexterm>
2998 <primary>File Server</primary>
2999
3000 <secondary>xstat_fs_test example command</secondary>
3001 </indexterm>
3002
3003 <indexterm>
3004 <primary>xstat data collection facility</primary>
3005
3006 <secondary>xstat_fs_test example command</secondary>
3007 </indexterm>
3008
3009 <sect3 id="Header_403">
3010 <title>To use the example xstat_fs_test command</title>
3011
3012 <orderedlist>
3013 <listitem>
3014 <para>Issue the example <emphasis role="bold">xstat_fs_test</emphasis> command to test the routines in the <emphasis
3015 role="bold">libxstat_fs.a</emphasis> library and display the data collections associated with the File Server process.
3016 The command executes in the foreground. <programlisting>
3017 % <emphasis role="bold">xstat_fs_test</emphasis> [<emphasis role="bold">initcmd</emphasis>] \
3018 <emphasis role="bold">-fsname</emphasis> &lt;<replaceable>File Server name(s) to monitor</replaceable>&gt;+ \
3019 <emphasis role="bold">-collID</emphasis> &lt;<replaceable>Collection(s) to fetch</replaceable>&gt;+ [<emphasis
3020 role="bold">-onceonly</emphasis>] \
3021 [<emphasis role="bold">-frequency</emphasis> &lt;<replaceable>poll frequency, in seconds</replaceable>&gt;] \
3022 [<emphasis role="bold">-period</emphasis> &lt;<replaceable>data collection time, in minutes</replaceable>&gt;] [<emphasis
3023 role="bold">-debug</emphasis>]
3024 </programlisting></para>
3025
3026 <para>where <variablelist>
3027 <varlistentry>
3028 <term><emphasis role="bold">xstat_fs_test</emphasis></term>
3029
3030 <listitem>
3031 <para>Must be typed in full.</para>
3032 </listitem>
3033 </varlistentry>
3034
3035 <varlistentry>
3036 <term><emphasis role="bold">initcmd</emphasis></term>
3037
3038 <listitem>
<para>Is an optional string that accommodates the command's use of the AFS command parser. You can omit
it.</para>
3041 </listitem>
3042 </varlistentry>
3043
3044 <varlistentry>
3045 <term><emphasis role="bold">-fsname</emphasis></term>
3046
3047 <listitem>
3048 <para>Is the Internet host name of each file server machine on which to monitor the File Server process.</para>
3049 </listitem>
3050 </varlistentry>
3051
3052 <varlistentry>
3053 <term><emphasis role="bold">-collID</emphasis></term>
3054
3055 <listitem>
3056 <para>Specifies each data collection to return. The indicated data collection defines the type and amount of
3057 data the command is to gather about the File Server. Data is returned in the form of a predefined data structure
3058 (refer to the specification documents referenced previously for more information about the data
3059 structures).</para>
3060
3061 <para>There are two acceptable values: <itemizedlist>
3062 <listitem>
3063 <para><emphasis role="bold">1</emphasis> reports various internal performance statistics related to the
3064 File Server (for example, vnode cache entries and <emphasis role="bold">Rx</emphasis> protocol
3065 activity).</para>
3066 </listitem>
3067
3068 <listitem>
3069 <para><emphasis role="bold">2</emphasis> reports all of the internal performance statistics provided by
3070 the <emphasis role="bold">1</emphasis> setting, plus some additional, detailed performance figures about
3071 the File Server (for example, minimum, maximum, and cumulative statistics regarding File Server RPCs, how
3072 long they take to complete, and how many succeed).</para>
3073 </listitem>
3074 </itemizedlist></para>
3075 </listitem>
3076 </varlistentry>
3077
3078 <varlistentry>
3079 <term><emphasis role="bold">-onceonly</emphasis></term>
3080
3081 <listitem>
3082 <para>Directs the command to gather statistics just one time. Omit this option to have the command continue to
3083 probe the File Server for statistics every 30 seconds. If you omit this option, you can use the &lt;<emphasis
3084 role="bold">Ctrl-c</emphasis>&gt; interrupt signal to halt the command at any time.</para>
3085 </listitem>
3086 </varlistentry>
3087
3088 <varlistentry>
3089 <term><emphasis role="bold">-frequency</emphasis></term>
3090
3091 <listitem>
<para>Sets the interval, in seconds, between the probes that the program sends to the File Server. If you omit this
argument, the default is 30 seconds.</para>
3094 </listitem>
3095 </varlistentry>
3096
3097 <varlistentry>
3098 <term><emphasis role="bold">-period</emphasis></term>
3099
3100 <listitem>
3101 <para>Sets how long the utility runs before exiting, as a number of minutes. If you omit this argument, the
3102 default is 10 minutes.</para>
3103 </listitem>
3104 </varlistentry>
3105
3106 <varlistentry>
3107 <term><emphasis role="bold">-debug</emphasis></term>
3108
3109 <listitem>
3110 <para>Displays additional information as the command runs.</para>
3111 </listitem>
3112 </varlistentry>
3113 </variablelist></para>
3114 </listitem>
3115 </orderedlist>
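<para>The syntax above can be tried out with a short script such as the following sketch. It only prints the command it would
run unless the <emphasis role="bold">RUN</emphasis> variable is set to 1. The host name <emphasis
role="bold">fs1.example.com</emphasis>, the script variables, and the probe-count estimate (run time divided by probe
interval) are assumptions made for illustration; they are not part of AFS.</para>

<programlisting>
```shell
#!/bin/sh
# Hypothetical host name; substitute a file server machine from your cell.
FS_HOST="fs1.example.com"
COLL_ID=2        # 2 = detailed performance figures, as described for -collID
FREQUENCY=60     # seconds between probes (the documented default is 30)
PERIOD=5         # minutes to run before exiting (the documented default is 10)

# Rough estimate of how many probes the run makes: run time / probe interval.
PROBES=$(( PERIOD * 60 / FREQUENCY ))

CMD="xstat_fs_test -fsname $FS_HOST -collID $COLL_ID -frequency $FREQUENCY -period $PERIOD"

echo "would run: $CMD"
echo "expected probes: about $PROBES"

# Set RUN=1 in the environment to execute the command instead of only printing it.
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```
</programlisting>

<para>Printing the command first makes it easy to check the argument values before any probes are sent to a production File
Server.</para>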
3116
3117 <indexterm>
3118 <primary>commands</primary>
3119
3120 <secondary>xstat_cm_test</secondary>
3121 </indexterm>
3122
3123 <indexterm>
3124 <primary>libxstat_cm.a library</primary>
3125
3126 <secondary>xstat_cm_test example command</secondary>
3127 </indexterm>
3128
3129 <indexterm>
3130 <primary>Cache Manager</primary>
3131
3132 <secondary>xstat_cm_test example command</secondary>
3133 </indexterm>
3134
3135 <indexterm>
3136 <primary>xstat data collection facility</primary>
3137
3138 <secondary>xstat_cm_test example command</secondary>
3139 </indexterm>
3140 </sect3>
3141
3142 <sect3 id="Header_404">
3143 <title>To use the example xstat_cm_test command</title>
3144
3145 <orderedlist>
3146 <listitem>
3147 <para>Issue the example <emphasis role="bold">xstat_cm_test</emphasis> command to test the routines in the <emphasis
3148 role="bold">libxstat_cm.a</emphasis> library and display the data collections associated with the Cache Manager. The
3149 command executes in the foreground. <programlisting>
3150 % <emphasis role="bold">xstat_cm_test</emphasis> [<emphasis role="bold">initcmd</emphasis>] \
3151 <emphasis role="bold">-cmname</emphasis> &lt;<replaceable>Cache Manager name(s) to monitor</replaceable>&gt;+ \
3152 <emphasis role="bold">-collID</emphasis> &lt;<replaceable>Collection(s) to fetch</replaceable>&gt;+ \
3153 [<emphasis role="bold">-onceonly</emphasis>] [<emphasis role="bold">-frequency</emphasis> &lt;<replaceable>poll frequency, in seconds</replaceable>&gt;] \
3154 [<emphasis role="bold">-period</emphasis> &lt;<replaceable>data collection time, in minutes</replaceable>&gt;] [<emphasis
3155 role="bold">-debug</emphasis>]
3156 </programlisting></para>
3157
3158 <para>where <variablelist>
3159 <varlistentry>
3160 <term><emphasis role="bold">xstat_cm_test</emphasis></term>
3161
3162 <listitem>
3163 <para>Must be typed in full.</para>
3164 </listitem>
3165 </varlistentry>
3166
3167 <varlistentry>
3168 <term><emphasis role="bold">initcmd</emphasis></term>
3169
3170 <listitem>
<para>Is an optional string that accommodates the command's use of the AFS command parser. You can omit
it.</para>
3173 </listitem>
3174 </varlistentry>
3175
3176 <varlistentry>
3177 <term><emphasis role="bold">-cmname</emphasis></term>
3178
3179 <listitem>
3180 <para>Is the host name of each client machine on which to monitor the Cache Manager.</para>
3181 </listitem>
3182 </varlistentry>
3183
3184 <varlistentry>
3185 <term><emphasis role="bold">-collID</emphasis></term>
3186
3187 <listitem>
3188 <para>Specifies each data collection to return. The indicated data collection defines the type and amount of
3189 data the command is to gather about the Cache Manager. Data is returned in the form of a predefined data
3190 structure (refer to the specification documents referenced previously for more information about the data
3191 structures).</para>
3192
<para>There are three acceptable values: <itemizedlist>
3194 <listitem>
<para><emphasis role="bold">0</emphasis> provides profiling information about the number of times
different internal Cache Manager routines were called since the Cache Manager was started.</para>
3197 </listitem>
3198
3199 <listitem>
<para><emphasis role="bold">1</emphasis> reports various internal performance statistics related to the
Cache Manager (for example, statistics about how effectively the cache is being used and the quantity of
intracell and intercell data access).</para>
3203 </listitem>
3204
3205 <listitem>
3206 <para><emphasis role="bold">2</emphasis> reports all of the internal performance statistics provided by
3207 the <emphasis role="bold">1</emphasis> setting, plus some additional, detailed performance figures about
3208 the Cache Manager (for example, statistics about the number of RPCs sent by the Cache Manager and how long
3209 they take to complete; and statistics regarding things such as authentication, access, and PAG information
3210 associated with data access).</para>
3211 </listitem>
3212 </itemizedlist></para>
3213 </listitem>
3214 </varlistentry>
3215
3216 <varlistentry>
3217 <term><emphasis role="bold">-onceonly</emphasis></term>
3218
3219 <listitem>
3220 <para>Directs the command to gather statistics just one time. Omit this option to have the command continue to
3221 probe the Cache Manager for statistics every 30 seconds. If you omit this option, you can use the &lt;<emphasis
3222 role="bold">Ctrl-c</emphasis>&gt; interrupt signal to halt the command at any time.</para>
3223 </listitem>
3224 </varlistentry>
3225
3226 <varlistentry>
3227 <term><emphasis role="bold">-frequency</emphasis></term>
3228
3229 <listitem>
<para>Sets the interval, in seconds, between the probes that the program sends to the Cache Manager. If you omit this
argument, the default is 30 seconds.</para>
3232 </listitem>
3233 </varlistentry>
3234
3235 <varlistentry>
3236 <term><emphasis role="bold">-period</emphasis></term>
3237
3238 <listitem>
3239 <para>Sets how long the utility runs before exiting, as a number of minutes. If you omit this argument, the
3240 default is 10 minutes.</para>
3241 </listitem>
3242 </varlistentry>
3243
3244 <varlistentry>
3245 <term><emphasis role="bold">-debug</emphasis></term>
3246
3247 <listitem>
3248 <para>Displays additional information as the command runs.</para>
3249 </listitem>
3250 </varlistentry>
3251 </variablelist></para>
3252 </listitem>
3253 </orderedlist>
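<para>Because the <emphasis role="bold">-cmname</emphasis> and <emphasis role="bold">-collID</emphasis> arguments each accept
more than one value, a single invocation can cover several clients and collections. The following sketch assembles such a
one-time probe and prints it; the client host names and the <emphasis role="bold">RUN</emphasis> variable are hypothetical
and exist only for this example.</para>

<programlisting>
```shell
#!/bin/sh
# Hypothetical client machines; substitute hosts from your own cell.
CM_HOSTS="client1.example.com client2.example.com"
# 0 = profiling counts, 1 = internal performance statistics (see -collID above).
COLL_IDS="0 1"

# -onceonly gathers statistics a single time and then exits.
CMD="xstat_cm_test -cmname $CM_HOSTS -collID $COLL_IDS -onceonly"

echo "would run: $CMD"

# Set RUN=1 in the environment to execute the command instead of only printing it.
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```
</programlisting>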
3254 </sect3>
3255 </sect2>
3256 </sect1>
3257
3258 <sect1 id="HDRWQ354">
3259 <title>Auditing AFS Events on AIX File Servers</title>
3260
3261 <indexterm>
3262 <primary>AFS</primary>
3263
3264 <secondary>auditing events on AIX server machines</secondary>
3265 </indexterm>
3266
3267 <indexterm>
3268 <primary>AIX</primary>
3269
3270 <secondary>auditing AFS events</secondary>
3271
3272 <tertiary>about</tertiary>
3273 </indexterm>
3274
3275 <indexterm>
3276 <primary>auditing AFS events on AIX server machines</primary>
3277 </indexterm>
3278
3279 <indexterm>
3280 <primary>events</primary>
3281
3282 <secondary>auditing AFS on AIX server machines</secondary>
3283 </indexterm>
3284
3285 <para>You can audit AFS events on AIX File Servers using an AFS mechanism that transfers audit information from AFS to the AIX
3286 auditing system. The following general classes of AFS events can be audited. For a complete list of specific AFS audit events,
3287 see <link linkend="HDRWQ620">Appendix D, AIX Audit Events</link>. <itemizedlist>
3288 <listitem>
3289 <para>Authentication and Identification Events</para>
3290 </listitem>
3291
3292 <listitem>
3293 <para>Security Events</para>
3294 </listitem>
3295
3296 <listitem>
3297 <para>Privilege Required Events</para>
3298 </listitem>
3299
3300 <listitem>
3301 <para>Object Creation and Deletion Events</para>
3302 </listitem>
3303
3304 <listitem>
3305 <para>Attribute Modification Events</para>
3306 </listitem>
3307
3308 <listitem>
3309 <para>Process Control Events</para>
3310 </listitem>
3311 </itemizedlist></para>
3312
3313 <note>
3314 <para>This section assumes familiarity with the AIX auditing system. For more information, see the <emphasis>AIX System
3315 Management Guide</emphasis> for the version of AIX you are using.</para>
3316 </note>
3317
3318 <sect2 id="Header_406">
3319 <title>Configuring AFS Auditing on AIX File Servers</title>
3320
<para>The directory <emphasis role="bold">/usr/afs/local/audit</emphasis> contains three files that hold the information
needed to configure AIX File Servers to audit AFS events: <itemizedlist>
3323 <listitem>
<para>The <emphasis role="bold">events.sample</emphasis> file contains information on auditable AFS events. The contents
of this file must be integrated into the corresponding AIX events file (<emphasis
role="bold">/etc/security/audit/events</emphasis>).</para>
3327 </listitem>
3328
3329 <listitem>
3330 <para>The <emphasis role="bold">config.sample</emphasis> file defines the six classes of AFS audit events and the events
3331 that make up each class. It also defines the classes of AFS audit events to audit for the File Server, which runs as the
3332 local superuser <emphasis role="bold">root</emphasis>. The contents of this file must be integrated into the
3333 corresponding AIX config file (<emphasis role="bold">/etc/security/audit/config</emphasis>).</para>
3334 </listitem>
3335
3336 <listitem>
<para>The <emphasis role="bold">objects.sample</emphasis> file contains a list of information about audited files. Audit
only files in the local file space. The contents of this file must be integrated into the corresponding AIX
objects file (<emphasis role="bold">/etc/security/audit/objects</emphasis>).</para>
3340 </listitem>
3341 </itemizedlist></para>
3342
3343 <para>Once you have properly configured these files to include the AFS-relevant information, use the AIX auditing system to
3344 start up and shut down the auditing.</para>
3345 </sect2>
3346
3347 <sect2 id="Header_407">
3348 <title>To enable AFS auditing</title>
3349
3350 <orderedlist>
3351 <listitem>
3352 <para>Create the following string in the file <emphasis role="bold">/usr/afs/local/Audit</emphasis> on each File Server on
3353 which you plan to audit AFS events: <programlisting><emphasis role="bold">AFS_AUDIT_AllEvents</emphasis></programlisting></para>
3354 </listitem>
3355
3356 <listitem>
3357 <para>Issue the <emphasis role="bold">bos restart</emphasis> command (with the <emphasis role="bold">-all</emphasis> flag)
3358 to stop and restart all server processes on each File Server. For instructions on using this command, see <link
3359 linkend="HDRWQ170">Stopping and Immediately Restarting Processes</link>.</para>
3360 </listitem>
3361 </orderedlist>
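<para>The two steps above might be scripted along the following lines. This is a minimal sketch: the function names and the
host name <emphasis role="bold">fs1.example.com</emphasis> are hypothetical, and the script only prints the <emphasis
role="bold">bos restart</emphasis> command rather than running it, so that the restart remains a deliberate manual
step.</para>

<programlisting>
```shell
#!/bin/sh
# Hypothetical helper: write the enabling string into the Audit file named in $1.
enable_afs_audit() {
    printf '%s\n' "AFS_AUDIT_AllEvents" > "$1"
}

# Hypothetical helper: print the restart command for the machine named in $1.
restart_command() {
    echo "bos restart -server $1 -all"
}

# Example (run on the file server machine itself):
#   enable_afs_audit /usr/afs/local/Audit
echo "then run: $(restart_command fs1.example.com)"
```
</programlisting>

<para>Taking the file path as an argument lets you rehearse the first step against a scratch file before touching <emphasis
role="bold">/usr/afs/local/Audit</emphasis>.</para>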
3362 </sect2>
3363
3364 <sect2 id="Header_408">
3365 <title>To disable AFS auditing</title>
3366
3367 <orderedlist>
3368 <listitem>
<para>Remove the contents of the file <emphasis role="bold">/usr/afs/local/Audit</emphasis> on each File Server on which
you no longer want to audit AFS events.</para>
3371 </listitem>
3372
3373 <listitem>
3374 <para>Issue the <emphasis role="bold">bos restart</emphasis> command (with the <emphasis role="bold">-all</emphasis> flag)
3375 to stop and restart all server processes on each File Server. For instructions on using this command, see <link
3376 linkend="HDRWQ170">Stopping and Immediately Restarting Processes</link>.</para>
3377 </listitem>
3378 </orderedlist>
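<para>Removing the contents of the file, as the first step describes, can be done by truncating it in place. The following
sketch shows one way to do that; the function name is hypothetical, and the server processes still need to be restarted as
described in the second step.</para>

<programlisting>
```shell
#!/bin/sh
# Hypothetical helper: empty the Audit file named in $1 without deleting it.
disable_afs_audit() {
    : > "$1"
}

# Example (run on the file server machine itself):
#   disable_afs_audit /usr/afs/local/Audit
```
</programlisting>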
3379 </sect2>
3380 </sect1>
3381 </chapter>