=head1 NAME

udebug - Reports Ubik process status for a database server process

=head1 SYNOPSIS

=for html
<div class="synopsis">

B<udebug> S<<< B<-server> <I<server machine>> >>> S<<< [B<-port> <I<IP port>>] >>>
    [B<-long>] [B<-help>]

B<udebug> S<<< B<-s> <I<server machine>> >>> S<<< [B<-p> <I<IP port>>] >>> [B<-l>] [B<-h>]

=for html
</div>

=head1 DESCRIPTION

The B<udebug> command displays the status of the lightweight Ubik process
for the database server process identified by the B<-port> argument that
is running on the database server machine named by the B<-server>
argument. The output identifies the machines where peer database server
processes are running, which of them is the synchronization site (Ubik
coordinator), and the status of the connections between them.

=head1 OPTIONS

=over 4

=item B<-server> <I<server machine>>

Names the database server machine that is running the process for which to
display status information. Provide the machine's IP address in dotted
decimal format, its fully qualified host name (for example,
B<fs1.example.com>), or the shortest abbreviated form of its host name that
distinguishes it from other machines. Successful use of an abbreviated
form depends on the availability of a name resolution service (such as the
Domain Name Service or a local host table) at the time the command is
issued.

=item B<-port> <I<IP port>>

Identifies the database server process for which to display status
information, either by its process name or port number. Provide one of the
following values:

=over 4

=item

B<buserver> or 7021 for the Backup Server

=item

B<kaserver> or 7004 for the Authentication Server

=item

B<ptserver> or 7002 for the Protection Server

=item

B<vlserver> or 7003 for the Volume Location Server

=back

=item B<-long>

Reports additional information about each peer of the machine named by the
B<-server> argument. The information appears by default if that machine
is the synchronization site.

=item B<-help>

Prints the online help for this command. All other valid options are
ignored.

=back

=head1 OUTPUT

Several of the messages in the output provide basic status information
about the Ubik process on the machine specified by the B<-server>
argument, and the remaining messages are useful mostly for debugging
purposes.

To check basic Ubik status, issue the command for each database server
machine in turn. In the output for each, one of the following messages
appears in the top third of the output.

   I am sync site . . . (<#_sites> servers)

   I am not sync site

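As a rough illustration of the check described above, the following Python
sketch classifies captured B<udebug> output by looking for either message.
The function name C<ubik_role> and the sample text are hypothetical; how the
output is captured is left out.

```python
def ubik_role(output):
    """Classify captured udebug output as 'sync', 'secondary', or None."""
    for line in output.splitlines():
        text = line.strip()
        # Test the longer message first so it is not mistaken for the other.
        if text.startswith("I am not sync site"):
            return "secondary"
        if text.startswith("I am sync site"):
            return "sync"
    return None  # neither message was found


# Hypothetical excerpt of captured output from a secondary site.
sample = """Local db version is 940906574.25
I am not sync site
Lowest host 192.12.107.33 was set 6 secs ago"""

print(ubik_role(sample))
```

Running the same classification over the output for every database server
machine shows which one, if any, currently reports itself as the
synchronization site.
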
For the synchronization site, the following message indicates that all
sites have the same version of the database, which implies that Ubik is
functioning correctly. Values other than C<1f> are explained in the
description of the I<flags> field later in this section.

   Recovery state 1f

For correct Ubik operation, the database server machine clocks must agree
on the time. The following messages, which are the second and third lines
in the output, report the current date and time according to the database
server machine's clock and the clock on the machine where the B<udebug>
command is issued.

   Host's <IP_addr> time is <dbserver_date/time>
   Local time is <local_date/time> (time differential <skew> secs)

The <skew> is the difference between the database server machine clock and
the local clock. Its absolute value is not vital for Ubik functioning, but
a difference of more than a few seconds between the I<skew> values for the
database server machines indicates that their clocks are not synchronized
and that Ubik performance may suffer.

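The comparison described above is between the I<skew> values of the
different database server machines, not between any one value and zero. A
minimal Python sketch, assuming the values were all collected from the same
local machine; the function name, the sample values, and the 5-second
threshold (a stand-in for "more than a few seconds") are assumptions:

```python
def skew_spread(skews):
    """Return the largest difference between any two reported skew values."""
    values = list(skews.values())
    return max(values) - min(values)


# Hypothetical skew values in seconds, one per database server machine,
# each read from the "time differential" field of that machine's output.
skews = {"192.12.107.33": 2, "192.12.107.34": -247, "192.12.107.35": 3}

THRESHOLD_SECS = 5  # assumed value, not taken from udebug itself
spread = skew_spread(skews)
print(f"spread={spread}s", "CHECK CLOCKS" if spread > THRESHOLD_SECS else "ok")
```

A large spread points at the server clocks; a large but uniform skew points
at the local clock instead.
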
Following is a description of all messages in the output. As noted, it is
useful mostly for debugging and most meaningful to someone who understands
Ubik's implementation.

The output begins with the following messages. The first message reports
the IP addresses that are configured with the operating system on the
machine specified by the B<-server> argument. As previously noted, the
second and third messages report the current date and time according to
the clocks on the database server machine and the machine where the
B<udebug> command is issued, respectively. All subsequent timestamps in
the output are expressed in terms of the local clock rather than the
database server machine clock.

   Host's addresses are: <list_of_IP_addrs>
   Host's <IP_addr> time is <dbserver_date/time>
   Local time is <local_date/time> (time differential <skew> secs)

If the <skew> is more than about 10 seconds, the following message
appears. As noted, it does not necessarily indicate Ubik malfunction: it
denotes clock skew between the database server machine and the local
machine, rather than among the database server machines.

   ****clock may be bad

If the B<udebug> command is issued during the coordinator election process
and voting has not yet begun, the following message appears next.

   Last yes vote not cast yet

Otherwise, the output continues with the following messages.

   Last yes vote for <sync_IP_addr> was <last_vote> secs ago (sync site);
   Last vote started <vote_start> secs ago (at <date/time>)
   Local db version is <db_version>

The first message indicates which peer this Ubik process last voted for as
coordinator (it can vote for itself) and how long ago it sent the vote.
The second message indicates how long ago the Ubik coordinator requested
confirming votes from the secondary sites. Usually, the <last_vote> and
<vote_start> values are the same; a difference between them can indicate
clock skew or a slow network connection between the two database server
machines. A small difference is not harmful. The third message reports the
current version number <db_version> of the database maintained by this
Ubik process. It has two fields separated by a period. The field before
the period is based on a timestamp that reflects when the database first
changed after the most recent coordinator election, and the field after
the period indicates the number of changes since the election.

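Because the two fields are separate integers, a <db_version> such as
C<940906574.25> is not a decimal fraction. The following Python sketch
splits the value accordingly. The function name is hypothetical, and
reading the first field as seconds since the Unix epoch is an assumption;
the text above says only that it is based on a timestamp.

```python
from datetime import datetime, timezone


def parse_db_version(text):
    """Split '<timestamp>.<changes>' into two integers."""
    stamp, changes = text.split(".")
    return int(stamp), int(changes)


stamp, changes = parse_db_version("940902602.674")

# Assumption: the first field counts seconds since the Unix epoch.
started = datetime.fromtimestamp(stamp, tz=timezone.utc)
print(started.isoformat(), "changes since election:", changes)
```

Comparing the resulting pairs as tuples orders versions first by timestamp
and then by change count.
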
The output continues with messages that differ depending on whether the
Ubik process is the coordinator or not.

=over 4

=item *

If there is only one database server machine, it is always the coordinator
(synchronization site), as indicated by the following message.

   I am sync site forever (1 server)

=item *

If there are multiple database sites, and the B<-server> argument names
the coordinator (synchronization site), the output continues with the
following two messages.

   I am sync site until <expiration> secs from now (at <date/time>)
       (<#_sites> servers)
   Recovery state <flags>

The first message (which is reported on one line) reports how much longer
the site remains coordinator even if the next attempt to maintain quorum
fails, and how many sites are participating in the quorum. The I<flags>
field in the second message is a hexadecimal number that indicates the
current state of the quorum. A value of C<1f> indicates complete database
synchronization, whereas a value of C<f> means that the coordinator has
the correct database but cannot contact all secondary sites to determine
if they also have it. Lesser values are acceptable if the B<udebug>
command is issued during coordinator election, but they denote a problem
if they persist. The individual flags have the following meanings:

=over 4

=item 0x1

This machine is the coordinator.

=item 0x2

The coordinator has determined which site has the database with the
highest version number.

=item 0x4

The coordinator has a copy of the database with the highest version
number.

=item 0x8

The database's version number has been updated correctly.

=item 0x10

All sites have the database with the highest version number.

=back

If the B<udebug> command is issued while the coordinator is writing a change
into the database, the following additional message appears.

   I am currently managing write transaction <identifier>

=item *

If the B<-server> argument names a secondary site, the output continues
with the following messages.

   I am not sync site
   Lowest host <lowest_IP_addr> was set <low_time> secs ago
   Sync host <sync_IP_addr> was set <sync_time> secs ago

The <lowest_IP_addr> is the lowest IP address of any peer from which the
Ubik process has received a message recently, whereas the <sync_IP_addr>
is the IP address of the current coordinator. If they differ, the machine
with the lowest IP address is not currently the coordinator. The Ubik
process continues voting for the current coordinator as long as they
remain in contact, which provides for maximum stability. However, in the
event of another coordinator election, this Ubik process votes for the
<lowest_IP_addr> site instead (assuming they are in contact), because it
has a bias to vote in elections for the site with the lowest IP address.

=back

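The C<Recovery state> value reported by the coordinator is a bit mask of the
five flags listed above, so C<1f> has every flag set and C<f> lacks only
C<0x10>. The following Python sketch decodes a value into the flags that are
set and those that are missing. The names C<RECOVERY_FLAGS> and
C<decode_recovery_state> are hypothetical, and the descriptions are
shortened from the list above.

```python
# Flag values and meanings as listed for the "Recovery state" message.
RECOVERY_FLAGS = {
    0x1: "this machine is the coordinator",
    0x2: "coordinator knows which site has the highest database version",
    0x4: "coordinator has the database with the highest version",
    0x8: "database version number has been updated correctly",
    0x10: "all sites have the database with the highest version",
}


def decode_recovery_state(hex_text):
    """Return (set_flags, missing_flags) for a hexadecimal flags value."""
    value = int(hex_text, 16)
    set_flags = [d for bit, d in RECOVERY_FLAGS.items() if value & bit]
    missing = [d for bit, d in RECOVERY_FLAGS.items() if not value & bit]
    return set_flags, missing


for state in ("1f", "f"):
    set_flags, missing = decode_recovery_state(state)
    print(state, "missing:", missing or "nothing")
```

Listing the missing flags rather than the raw number makes it easier to see
which step of recovery a persistent lesser value is stuck on.
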
For both the synchronization and secondary sites, the output continues
with the following messages. The first message reports the version number
of the database at the synchronization site, which needs to match the
<db_version> reported by the preceding C<Local db version> message. The
second message indicates how many database pages are currently locked for
any operation, and how many of those are locked for writing. The values
are nonzero if the B<udebug> command is issued while an operation is in
progress.

   Sync site's db version is <db_version>
   <locked> locked pages, <writes> of them for write

The following messages appear next only if there are any read or write
locks on database records:

   There are read locks held
   There are write locks held

Similarly, one or more of the following messages appear next only if there
are any read or write transactions in progress when the B<udebug> command
is issued:

   There is an active write transaction
   There is at least one active read transaction
   Transaction tid is <tid>

If the machine named by the B<-server> argument is the coordinator, the
next message reports when the current coordinator last updated the
database.

   Last time a new db version was labelled was:
       <last_restart> secs ago (at <date/time>)

If the machine named by the B<-server> argument is the coordinator, the
output concludes with an entry for each secondary site that is
participating in the quorum, in the following format.

   Server (<IP_address>): (db <db_version>)
       last vote rcvd <last_vote> secs ago (at <date/time>),
       last beacon sent <last_beacon> secs ago (at <date/time>),
       last vote was { yes | no }
       dbcurrent={ 0 | 1 }, up={ 0 | 1 } beaconSince={ 0 | 1 }

The first line reports the site's IP address and the version number of the
database it is maintaining. The <last_vote> field reports how long ago the
coordinator received a vote message from the Ubik process at the site, and
the <last_beacon> field reports how long ago the coordinator last
requested a vote message. If the B<udebug> command is issued during the
coordinator election process and voting has not yet begun, the following
messages appear instead.

   Last vote never rcvd
   Last beacon never sent

On the final line of each entry, the fields have the following meaning:

=over 4

=item *

C<dbcurrent> is C<1> if the site has the database with the highest version
number, C<0> if it does not.

=item *

C<up> is C<1> if the Ubik process at the site is functioning correctly,
C<0> if it is not.

=item *

C<beaconSince> is C<1> if the site has responded to the coordinator's last
request for votes, C<0> if it has not.

=back

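The three fields on the final line of a peer entry can be read mechanically.
The following Python sketch turns such a line into a mapping of field name
to boolean. The names C<PEER_FIELDS> and C<parse_peer_status> are
hypothetical, and treating "all three are C<1>" as healthy is an assumption
drawn from the field descriptions above.

```python
import re

# Matches each "<name>=<0|1>" pair regardless of the separators between them.
PEER_FIELDS = re.compile(r"(dbcurrent|up|beaconSince)=([01])")


def parse_peer_status(line):
    """Map each field on a peer entry's final line to True or False."""
    return {name: value == "1" for name, value in PEER_FIELDS.findall(line)}


status = parse_peer_status("dbcurrent=1, up=1 beaconSince=1")
healthy = len(status) == 3 and all(status.values())
print(status, "healthy" if healthy else "needs attention")
```
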
Including the B<-long> flag produces peer entries even when the
B<-server> argument names a secondary site, but in that case only the
I<IP_address> field is guaranteed to be accurate. For example, the value
in the <db_version> field is usually C<0.0>, because secondary sites do
not poll their peers for this information. The values in the I<last_vote>
and I<last_beacon> fields indicate when this site last received or
requested a vote as coordinator; they generally indicate the time of the
last coordinator election.

=head1 EXAMPLES

This example checks the status of the Ubik process for the Volume Location
Server on the machine C<afs1>, which is the synchronization site.

   % udebug afs1 vlserver
   Host's addresses are: 192.12.107.33
   Host's 192.12.107.33 time is Wed Oct 27 09:49:50 1999
   Local time is Wed Oct 27 09:49:52 1999 (time differential 2 secs)
   Last yes vote for 192.12.107.33 was 1 secs ago (sync site);
   Last vote started 1 secs ago (at Wed Oct 27 09:49:51 1999)
   Local db version is 940902602.674
   I am sync site until 58 secs from now (at Wed Oct 27 09:50:50 1999) (3 servers)
   Recovery state 1f
   Sync site's db version is 940902602.674
   0 locked pages, 0 of them for write
   Last time a new db version was labelled was:
       129588 secs ago (at Mon Oct 25 21:50:04 1999)

   Server( 192.12.107.35 ): (db 940902602.674)
       last vote rcvd 2 secs ago (at Wed Oct 27 09:49:50 1999),
       last beacon sent 1 secs ago (at Wed Oct 27 09:49:51 1999), last vote was yes
       dbcurrent=1, up=1 beaconSince=1

   Server( 192.12.107.34 ): (db 940902602.674)
       last vote rcvd 2 secs ago (at Wed Oct 27 09:49:50 1999),
       last beacon sent 1 secs ago (at Wed Oct 27 09:49:51 1999), last vote was yes
       dbcurrent=1, up=1 beaconSince=1

This example checks the status of the Authentication Server on the machine
with IP address 192.12.107.34, which is a secondary site. The local clock
is about 4 minutes behind the database server machine's clock.

   % udebug 192.12.107.34 7004
   Host's addresses are: 192.12.107.34
   Host's 192.12.107.34 time is Wed Oct 27 09:54:15 1999
   Local time is Wed Oct 27 09:50:08 1999 (time differential -247 secs)
   ****clock may be bad
   Last yes vote for 192.12.107.33 was 6 secs ago (sync site);
   Last vote started 6 secs ago (at Wed Oct 27 09:50:02 1999)
   Local db version is 940906574.25
   I am not sync site
   Lowest host 192.12.107.33 was set 6 secs ago
   Sync host 192.12.107.33 was set 6 secs ago
   Sync site's db version is 940906574.25
   0 locked pages, 0 of them for write

=head1 PRIVILEGE REQUIRED

None

=head1 SEE ALSO

L<buserver(8)>,
L<kaserver(8)>,
L<ptserver(8)>,
L<vlserver(8)>

=head1 COPYRIGHT

IBM Corporation 2000. <http://www.ibm.com/> All Rights Reserved.

This documentation is covered by the IBM Public License Version 1.0. It was
converted from HTML to POD by software written by Chas Williams and Russ
Allbery, based on work by Alf Wachsmann and Elizabeth Cassell.