Import Upstream version 1.8.5
=head1 NAME

udebug - Reports Ubik process status for a database server process

=head1 SYNOPSIS

=for html
<div class="synopsis">

B<udebug> S<<< B<-server> <I<server machine>> >>> S<<< [B<-port> <I<IP port>>] >>>
    [B<-long>] [B<-help>]

B<udebug> S<<< B<-s> <I<server machine>> >>> S<<< [B<-p> <I<IP port>>] >>> [B<-l>] [B<-h>]

=for html
</div>

=head1 DESCRIPTION

The B<udebug> command displays the status of the lightweight Ubik process
for the database server process identified by the B<-port> argument that
is running on the database server machine named by the B<-server>
argument. The output identifies the machines where peer database server
processes are running, which of them is the synchronization site (Ubik
coordinator), and the status of the connections between them.

=head1 OPTIONS

=over 4

=item B<-server> <I<server machine>>

Names the database server machine that is running the process for which to
display status information. Provide the machine's IP address in dotted
decimal format, its fully qualified host name (for example,
B<fs1.example.com>), or the shortest abbreviated form of its host name that
distinguishes it from other machines. Successful use of an abbreviated
form depends on the availability of a name resolution service (such as the
Domain Name Service or a local host table) at the time the command is
issued.

=item B<-port> <I<IP port>>

Identifies the database server process for which to display status
information, either by its process name or port number. Provide one of the
following values:

=over 4

=item

B<buserver> or 7021 for the Backup Server

=item

B<kaserver> or 7004 for the Authentication Server

=item

B<ptserver> or 7002 for the Protection Server

=item

B<vlserver> or 7003 for the Volume Location Server

=back

=item B<-long>

Reports additional information about each peer of the machine named by the
B<-server> argument. The information appears by default if that machine
is the synchronization site.

=item B<-help>

Prints the online help for this command. All other valid options are
ignored.

=back

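As an illustration only, the process names and port numbers accepted by the
B<-port> argument can be kept in a small lookup table. The following Python
sketch is not part of OpenAFS: the names C<UBIK_PORTS> and C<resolve_port>
are hypothetical, and rejecting ports that are not listed above is a choice
made for this example, not documented B<udebug> behavior.

```python
# Process names and port numbers listed for the -port argument above.
UBIK_PORTS = {
    "buserver": 7021,  # Backup Server
    "kaserver": 7004,  # Authentication Server
    "ptserver": 7002,  # Protection Server
    "vlserver": 7003,  # Volume Location Server
}


def resolve_port(value):
    """Return the port number for a process name or a numeric port string."""
    if value in UBIK_PORTS:
        return UBIK_PORTS[value]
    port = int(value)  # raises ValueError if value is not a number
    if port not in UBIK_PORTS.values():
        # Assumption of this sketch: only the documented ports are accepted.
        raise ValueError(f"{value} is not one of the documented ports")
    return port


print(resolve_port("vlserver"))
print(resolve_port("7004"))
```

A wrapper script could call C<resolve_port> before building the B<udebug>
command line, so that a mistyped process name fails early.
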
=head1 OUTPUT

Several of the messages in the output provide basic status information
about the Ubik process on the machine specified by the B<-server>
argument, and the remaining messages are useful mostly for debugging
purposes.

To check basic Ubik status, issue the command for each database server
machine in turn. In the output for each, one of the following messages
appears in the top third of the output.

   I am sync site . . . (<#_sites> servers)

   I am not sync site

For the synchronization site, the following message indicates that all
sites have the same version of the database, which implies that Ubik is
functioning correctly. See the description of the C<Recovery state>
message later in this section for the meaning of values other than C<1f>.

   Recovery state 1f

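The basic check described above can be automated by scanning captured
B<udebug> output for these messages. The following Python sketch assumes the
output has already been collected as text; the function names C<ubik_role>
and C<fully_synchronized> are hypothetical and not part of OpenAFS.

```python
def ubik_role(output):
    """Classify captured udebug output as 'sync', 'secondary', or 'unknown'."""
    for line in output.splitlines():
        text = line.strip()
        # Test the negative form first: both messages contain "sync site".
        if text.startswith("I am not sync site"):
            return "secondary"
        if text.startswith("I am sync site"):
            return "sync"
    return "unknown"


def fully_synchronized(output):
    """Return True if the output contains the 'Recovery state 1f' message."""
    return any(line.strip() == "Recovery state 1f"
               for line in output.splitlines())


sample = """I am sync site until 58 secs from now (3 servers)
Recovery state 1f"""
print(ubik_role(sample), fully_synchronized(sample))
```

Running the check against the output from every database server machine
should report one C<sync> site and C<secondary> for the rest.
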
For correct Ubik operation, the database server machine clocks must agree
on the time. The following messages, which are the second and third lines
in the output, report the current date and time according to the database
server machine's clock and the clock on the machine where the B<udebug>
command is issued.

   Host's <IP_addr> time is <dbserver_date/time>
   Local time is <local_date/time> (time differential <skew> secs)

The <skew> is the difference between the database server machine clock and
the local clock. Its absolute value is not vital for Ubik functioning, but
a difference of more than a few seconds between the <skew> values for the
database server machines indicates that their clocks are not synchronized
and Ubik performance is possibly hampered.

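Because what matters is the difference between the <skew> values rather than
any single value, a comparison across servers is a natural thing to script.
The following Python sketch extracts the <skew> from the C<Local time is>
line and reports the spread; the names C<parse_skew> and C<skew_spread> are
hypothetical, and the input lines are assumed to come from B<udebug> runs
issued from the same local machine.

```python
import re

# Matches the "(time differential <skew> secs)" part of the message.
SKEW_RE = re.compile(r"time differential (-?\d+) secs")


def parse_skew(line):
    """Return the <skew> value, in seconds, from a 'Local time is' line."""
    match = SKEW_RE.search(line)
    if match is None:
        raise ValueError("no time differential found")
    return int(match.group(1))


def skew_spread(skews):
    """Return the largest difference between any two skew values."""
    return max(skews) - min(skews)


lines = [
    "Local time is Wed Oct 27 09:49:52 1999 (time differential 2 secs)",
    "Local time is Wed Oct 27 09:50:08 1999 (time differential -247 secs)",
]
print(skew_spread([parse_skew(line) for line in lines]))
```

A spread of more than a few seconds suggests that the database server
machine clocks disagree with each other, even when each individual <skew>
only reflects the local machine's clock.
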
The following is a description of all messages in the output. As noted, it
is useful mostly for debugging and most meaningful to someone who
understands Ubik's implementation.

The output begins with the following messages. The first message reports
the IP addresses that are configured with the operating system on the
machine specified by the B<-server> argument. As previously noted, the
second and third messages report the current date and time according to
the clocks on the database server machine and the machine where the
B<udebug> command is issued, respectively. All subsequent timestamps in
the output are expressed in terms of the local clock rather than the
database server machine clock.

   Host's addresses are: <list_of_IP_addrs>
   Host's <IP_addr> time is <dbserver_date/time>
   Local time is <local_date/time> (time differential <skew> secs)

If the <skew> is more than about 10 seconds, the following message
appears. As noted, it does not necessarily indicate Ubik malfunction: it
denotes clock skew between the database server machine and the local
machine, rather than among the database server machines.

   ****clock may be bad

If the B<udebug> command is issued during the coordinator election process
and voting has not yet begun, the following message appears next.

   Last yes vote not cast yet

Otherwise, the output continues with the following messages.

   Last yes vote for <sync_IP_addr> was <last_vote> secs ago (sync site);
   Last vote started <vote_start> secs ago (at <date/time>)
   Local db version is <db_version>

The first message indicates which peer this Ubik process last voted for as
coordinator (it can vote for itself) and how long ago it sent the vote.
The second message indicates how long ago the Ubik coordinator requested
confirming votes from the secondary sites. Usually, the <last_vote> and
<vote_start> values are the same; a difference between them can indicate
clock skew or a slow network connection between the two database server
machines. A small difference is not harmful. The third message reports the
current version number <db_version> of the database maintained by this
Ubik process. It has two fields separated by a period. The field before
the period is based on a timestamp that reflects when the database first
changed after the most recent coordinator election, and the field after
the period indicates the number of changes since the election.

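The two fields of <db_version> can be separated with a few lines of code.
The following Python sketch splits the value at the period; the function
name C<parse_db_version> is hypothetical, and treating the first field as
seconds since the Unix epoch is an assumption of this example rather than
something this page states.

```python
from datetime import datetime, timezone


def parse_db_version(text):
    """Split a <db_version> such as '940902602.674' into its two fields."""
    stamp_text, changes_text = text.split(".")
    return int(stamp_text), int(changes_text)


stamp, changes = parse_db_version("940902602.674")
# Assumption: the first field is a count of seconds since the Unix epoch.
first_change = datetime.fromtimestamp(stamp, tz=timezone.utc)
print(first_change.isoformat(), changes)
```

Comparing the parsed pairs from two sites is a simple way to confirm that
they hold the same database version.
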
The output continues with messages that differ depending on whether the
Ubik process is the coordinator or not.

=over 4

=item *

If there is only one database server machine, it is always the coordinator
(synchronization site), as indicated by the following message.

   I am sync site forever (1 server)

=item *

If there are multiple database sites, and the B<-server> argument names
the coordinator (synchronization site), the output continues with the
following two messages.

   I am sync site until <expiration> secs from now (at <date/time>)
        (<#_sites> servers)
   Recovery state <flags>

The first message (which is reported on one line) reports how much longer
the site remains coordinator even if the next attempt to maintain quorum
fails, and how many sites are participating in the quorum. The I<flags>
field in the second message is a hexadecimal number that indicates the
current state of the quorum. A value of C<1f> indicates complete database
synchronization, whereas a value of C<f> means that the coordinator has
the correct database but cannot contact all secondary sites to determine
if they also have it. Lesser values are acceptable if the B<udebug>
command is issued during coordinator election, but they denote a problem
if they persist. The individual flags have the following meanings:

=over 4

=item 0x1

This machine is the coordinator.

=item 0x2

The coordinator has determined which site has the database with the
highest version number.

=item 0x4

The coordinator has a copy of the database with the highest version
number.

=item 0x8

The database's version number has been updated correctly.

=item 0x10

All sites have the database with the highest version number.

=back

If the B<udebug> command is issued while the coordinator is writing a
change into the database, the following additional message appears.

   I am currently managing write transaction <identifier>

=item *

If the B<-server> argument names a secondary site, the output continues
with the following messages.

   I am not sync site
   Lowest host <lowest_IP_addr> was set <low_time> secs ago
   Sync host <sync_IP_addr> was set <sync_time> secs ago

The <lowest_IP_addr> is the lowest IP address of any peer from which the
Ubik process has received a message recently, whereas the <sync_IP_addr>
is the IP address of the current coordinator. If they differ, the machine
with the lowest IP address is not currently the coordinator. The Ubik
process continues voting for the current coordinator as long as they
remain in contact, which provides for maximum stability. However, in the
event of another coordinator election, this Ubik process votes for the
<lowest_IP_addr> site instead (assuming they are in contact), because it
has a bias to vote in elections for the site with the lowest IP address.

=back

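The <flags> value in the C<Recovery state> message described above is a
bitwise combination of the five flags, so it can be decoded mechanically.
The following Python sketch is an illustration only; the names
C<RECOVERY_FLAGS>, C<decode_recovery_state>, and C<missing_flags> are
hypothetical, and the descriptions are shortened from the list above.

```python
# Flag bits and meanings, as listed for the "Recovery state" message.
RECOVERY_FLAGS = {
    0x1: "this machine is the coordinator",
    0x2: "coordinator knows which site has the highest database version",
    0x4: "coordinator has a copy of the highest database version",
    0x8: "database version number has been updated correctly",
    0x10: "all sites have the highest database version",
}


def decode_recovery_state(flags_text):
    """Return descriptions of the flags set in a hexadecimal <flags> value."""
    value = int(flags_text, 16)
    return [text for bit, text in RECOVERY_FLAGS.items() if value & bit]


def missing_flags(flags_text):
    """Return the flag bits, as hex strings, that are not set."""
    value = int(flags_text, 16)
    return [hex(bit) for bit in RECOVERY_FLAGS if not value & bit]


for state in ("1f", "f"):
    print(state, missing_flags(state))
```

An empty list from C<missing_flags> corresponds to the fully synchronized
state C<1f>; for C<f>, only the C<0x10> flag is reported as missing.
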
For both the synchronization and secondary sites, the output continues
with the following messages. The first message reports the version number
of the database at the synchronization site, which needs to match the
<db_version> reported by the preceding C<Local db version> message. The
second message indicates how many database records are currently locked
for any operation or for writing in particular. The values are nonzero if
the B<udebug> command is issued while an operation is in progress.

   Sync site's db version is <db_version>
   <locked> locked pages, <writes> of them for write

The following messages appear next only if there are any read or write
locks on database records:

   There are read locks held
   There are write locks held

Similarly, one or more of the following messages appear next only if there
are any read or write transactions in progress when the B<udebug> command
is issued:

   There is an active write transaction
   There is at least one active read transaction
   Transaction tid is <tid>

If the machine named by the B<-server> argument is the coordinator, the
next message reports when the current coordinator last updated the
database.

   Last time a new db version was labelled was:
        <last_restart> secs ago (at <date/time>)

If the machine named by the B<-server> argument is the coordinator, the
output concludes with an entry for each secondary site that is
participating in the quorum, in the following format.

   Server (<IP_address>): (db <db_version>)
   last vote rcvd <last_vote> secs ago (at <date/time>),
   last beacon sent <last_beacon> secs ago (at <date/time>),
        last vote was { yes | no }
   dbcurrent={ 0 | 1 }, up={ 0 | 1 } beaconSince={ 0 | 1 }

The first line reports the site's IP address and the version number of the
database it is maintaining. The <last_vote> field reports how long ago the
coordinator received a vote message from the Ubik process at the site, and
the <last_beacon> field how long ago the coordinator last requested a vote
message. If the B<udebug> command is issued during the coordinator
election process and voting has not yet begun, the following messages
appear instead.

   Last vote never rcvd
   Last beacon never sent

On the final line of each entry, the fields have the following meaning:

=over 4

=item *

C<dbcurrent> is C<1> if the site has the database with the highest version
number, C<0> if it does not.

=item *

C<up> is C<1> if the Ubik process at the site is functioning correctly,
C<0> if it is not.

=item *

C<beaconSince> is C<1> if the site has responded to the coordinator's last
request for votes, C<0> if it has not.

=back

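The final line of each peer entry is easy to turn into named values for a
monitoring script. The following Python sketch parses that line into a
dictionary of booleans; the function name C<parse_peer_flags> is
hypothetical, and the line format is assumed to match the one shown above.

```python
import re

# Matches name=0 or name=1 pairs such as "dbcurrent=1" or "beaconSince=0".
FIELD_RE = re.compile(r"(\w+)=([01])")


def parse_peer_flags(line):
    """Return the dbcurrent, up, and beaconSince fields as booleans."""
    return {name: value == "1" for name, value in FIELD_RE.findall(line)}


flags = parse_peer_flags("dbcurrent=1, up=1 beaconSince=1")
# A peer looks healthy here only if every reported field is 1.
print(flags, all(flags.values()))
```

A peer whose entry yields C<False> for any field is the first place to look
when the coordinator does not report C<Recovery state 1f>.
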
Including the B<-long> flag produces peer entries even when the
B<-server> argument names a secondary site, but in that case only the
<IP_address> field is guaranteed to be accurate. For example, the value
in the <db_version> field is usually C<0.0>, because secondary sites do
not poll their peers for this information. The values in the <last_vote>
and <last_beacon> fields indicate when this site last received or
requested a vote as coordinator; they generally indicate the time of the
last coordinator election.

=head1 EXAMPLES

This example checks the status of the Ubik process for the Volume Location
Server on the machine C<afs1>, which is the synchronization site.

   % udebug afs1 vlserver
   Host's addresses are: 192.12.107.33
   Host's 192.12.107.33 time is Wed Oct 27 09:49:50 1999
   Local time is Wed Oct 27 09:49:52 1999 (time differential 2 secs)
   Last yes vote for 192.12.107.33 was 1 secs ago (sync site);
   Last vote started 1 secs ago (at Wed Oct 27 09:49:51 1999)
   Local db version is 940902602.674
   I am sync site until 58 secs from now (at Wed Oct 27 09:50:50 1999) (3 servers)
   Recovery state 1f
   Sync site's db version is 940902602.674
   0 locked pages, 0 of them for write
   Last time a new db version was labelled was:
        129588 secs ago (at Mon Oct 25 21:50:04 1999)

   Server( 192.12.107.35 ): (db 940902602.674)
   last vote rcvd 2 secs ago (at Wed Oct 27 09:49:50 1999),
   last beacon sent 1 secs ago (at Wed Oct 27 09:49:51 1999), last vote was yes
   dbcurrent=1, up=1 beaconSince=1

   Server( 192.12.107.34 ): (db 940902602.674)
   last vote rcvd 2 secs ago (at Wed Oct 27 09:49:50 1999),
   last beacon sent 1 secs ago (at Wed Oct 27 09:49:51 1999), last vote was yes
   dbcurrent=1, up=1 beaconSince=1

This example checks the status of the Authentication Server on the machine
with IP address 192.12.107.34, which is a secondary site. The local clock
is about 4 minutes behind the database server machine's clock.

   % udebug 192.12.107.34 7004
   Host's addresses are: 192.12.107.34
   Host's 192.12.107.34 time is Wed Oct 27 09:54:15 1999
   Local time is Wed Oct 27 09:50:08 1999 (time differential -247 secs)
   ****clock may be bad
   Last yes vote for 192.12.107.33 was 6 secs ago (sync site);
   Last vote started 6 secs ago (at Wed Oct 27 09:50:02 1999)
   Local db version is 940906574.25
   I am not sync site
   Lowest host 192.12.107.33 was set 6 secs ago
   Sync host 192.12.107.33 was set 6 secs ago
   Sync site's db version is 940906574.25
   0 locked pages, 0 of them for write

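=head1 PRIVILEGE REQUIRED

None

=head1 SEE ALSO

L<buserver(8)>,
L<kaserver(8)>,
L<ptserver(8)>,
L<vlserver(8)>

=head1 COPYRIGHT

IBM Corporation 2000. <http://www.ibm.com/> All Rights Reserved.

This documentation is covered by the IBM Public License Version 1.0. It was
converted from HTML to POD by software written by Chas Williams and Russ
Allbery, based on work by Alf Wachsmann and Elizabeth Cassell.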
394IBM Corporation 2000. <http://www.ibm.com/> All Rights Reserved.
395
396This documentation is covered by the IBM Public License Version 1.0. It was
397converted from HTML to POD by software written by Chas Williams and Russ
398Allbery, based on work by Alf Wachsmann and Elizabeth Cassell.