Rx Debug
--------

Introduction
============

Rx exposes data collections over UDP for remote debugging and troubleshooting.
This document provides details on the protocol, data formats, and
the data format versions.


Protocol
========

A simple request/response protocol is used to request this information
from an Rx instance. Request and response packets contain an Rx header, but
only a subset of the header fields is used, since the debugging packets are
not part of the Rx RPC protocol.

A client sends an Rx DEBUG (8) packet to the address:port of an active Rx
instance. The request carries an arbitrary request number in the callNumber
field of the Rx header (reused here since DEBUG packets are never used in
RPCs). The payload of the request is a pair of 32-bit integers in network byte
order. The first integer indicates which data collection type is requested.
The second integer indicates which record of that type is requested, for data
types which have multiple records, such as the rx connections and rx peers.
The request packet must have the CLIENT-INITIATED flag set in the Rx header.
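
As a minimal sketch of the request payload described above, the following C
fragment encodes the two integers in network byte order. The names
rxdbg_encode_request and rxdbg_put32 are hypothetical and not part of Rx. The
Rx header (packet type, callNumber, and the CLIENT-INITIATED flag) is not
built here and is assumed to be supplied by the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define RXDBG_REQUEST_LEN 8     /* two 32-bit integers */

/* Store a 32-bit value in network (big-endian) byte order. */
static void rxdbg_put32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/*
 * Encode a DEBUG request payload: the data collection type followed by the
 * record index. Returns the number of bytes written, or 0 if buf is too
 * small. (Hypothetical helper, not an Rx API.)
 */
size_t rxdbg_encode_request(unsigned char *buf, size_t buflen,
                            int32_t type, int32_t index)
{
    if (buflen < RXDBG_REQUEST_LEN)
        return 0;
    rxdbg_put32(buf, (uint32_t)type);
    rxdbg_put32(buf + 4, (uint32_t)index);
    return RXDBG_REQUEST_LEN;
}
```

For example, rxdbg_encode_request(buf, sizeof(buf), 2, 0) would fill buf with
the payload that asks for record 0 of data type 2.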

Rx responds with a single Rx DEBUG (8) packet, the payload of which contains
the data record for the type and index requested. The callNumber in the Rx
header contains the same number as the request, allowing the client to match
responses to requests. The response DEBUG packet does not contain the request
type and index parameters.

The first 32 bits of the response payload, in network byte order, indicate
error conditions:

    * 0xFFFFFFFF (-1)  index is out of range
    * 0xFFFFFFF8 (-8)  unknown request type
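
The error check can be sketched as follows. This follows the description
above literally and should be verified against the Rx implementation being
queried. The enum and function names are hypothetical. Note that a valid
record whose first word happens to equal one of the error values would be
misclassified by this check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum rxdbg_status {
    RXDBG_OK,           /* payload holds a data record */
    RXDBG_BAD_INDEX,    /* 0xFFFFFFFF (-1): index is out of range */
    RXDBG_BAD_TYPE,     /* 0xFFFFFFF8 (-8): unknown request type */
    RXDBG_SHORT         /* payload too short to inspect */
};

/* Read a 32-bit value stored in network (big-endian) byte order. */
static uint32_t rxdbg_get32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Classify a response payload by its first 32 bits. */
enum rxdbg_status rxdbg_check_response(const unsigned char *payload,
                                       size_t len)
{
    if (len < 4)
        return RXDBG_SHORT;
    switch (rxdbg_get32(payload)) {
    case 0xFFFFFFFFu:
        return RXDBG_BAD_INDEX;
    case 0xFFFFFFF8u:
        return RXDBG_BAD_TYPE;
    default:
        return RXDBG_OK;
    }
}
```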


Data Collection Types
=====================

OpenAFS defines 5 types of data collections which may be
requested:

    1  GETSTATS    Basic Rx statistics            (struct rx_debugStats)
    2  GETCONN     Active connections [indexed]   (struct rx_debugConn)
    3  GETALLCONN  All connections [indexed]      (struct rx_debugConn)
    4  RXSTATS     Detailed Rx statistics         (struct rx_statistics)
    5  GETPEER     Rx peer info [indexed]         (struct rx_debugPeer)

The format of the response data for each type is given below. XDR is
not used. All integers are in network byte order.

In a typical exchange, a client will request the basic Rx statistics
(GETSTATS) data first. This contains the data layout version id (detailed in
the next section).

Types GETCONN (2), GETALLCONN (3), and GETPEER (5) are array-like data
collections. The index field is used to retrieve each record, one per packet.
The first record is index 0. The client may request each record in turn,
starting at index zero and incrementing the index by one with each request
packet, until the Rx service returns -1 (out of range). No provisions are made
for locking the data collections between requests, as this is intended only to
be a debugging interface.
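
The record-by-record retrieval described above can be sketched as a loop. The
transport is abstracted behind a hypothetical callback (rxdbg_exchange_fn),
since sending and receiving DEBUG packets is outside the scope of this
sketch. rxdbg_fake_exchange is only a stand-in used to exercise the loop, and
the max_records cap is an assumption added because the collection is not
locked between requests.

```c
#include <stddef.h>
#include <stdint.h>

#define RXDBG_OUT_OF_RANGE 0xFFFFFFFFu  /* -1: index is out of range */

/*
 * Hypothetical transport hook: sends one request for (type, index) and
 * stores the response payload in resp. Returns the payload length, or a
 * negative value on failure.
 */
typedef long (*rxdbg_exchange_fn)(void *ctx, int32_t type, int32_t index,
                                  unsigned char *resp, size_t resplen);

/* Read a 32-bit value stored in network (big-endian) byte order. */
static uint32_t rxdbg_get32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Count the records of an indexed type by requesting index 0, 1, 2, ...
 * until the service answers -1. Stops after max_records requests. Returns
 * the number of records seen, or -1 on a transport failure.
 */
long rxdbg_count_records(rxdbg_exchange_fn exchange, void *ctx,
                         int32_t type, int32_t max_records)
{
    unsigned char resp[1500];
    int32_t index;

    for (index = 0; index < max_records; index++) {
        long n = exchange(ctx, type, index, resp, sizeof(resp));
        if (n < 4)
            return -1;
        if (rxdbg_get32(resp) == RXDBG_OUT_OF_RANGE)
            break;
        /* A real client would decode the record held in resp here. */
    }
    return index;
}

/* Stand-in transport for illustration: ctx points to the record count. */
long rxdbg_fake_exchange(void *ctx, int32_t type, int32_t index,
                         unsigned char *resp, size_t resplen)
{
    int32_t count = *(const int32_t *)ctx;
    size_t i;

    (void)type;
    if (resplen < 4)
        return -1;
    for (i = 0; i < 4; i++)
        resp[i] = (index >= count) ? 0xFF : 0x00;
    return 4;
}
```

A real client would replace rxdbg_fake_exchange with a function that sends
the request over UDP and waits for the response carrying the matching
callNumber.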


Data Collection Versions
========================

Every Rx service has a single-byte debugging version id, which is set at
build time. This version id allows clients to properly interpret the response
data formats for the various data types. The version id is present in the
basic Rx statistics (type 1) response data.

The first usable version is 'L', which was present in early Transarc/IBM AFS.
The first version in OpenAFS was 'Q', and versions after 'Q' are
OpenAFS-specific extensions. The current version for OpenAFS is 'S'.

Historically, the version id has been incremented when a debug data type is
added or changed. The version history is summarized in the following table:

    'L'  - Earliest usable version
         - GETSTATS (1) supported
         - GETCONN (2) supported (with obsolete format rx_debugConn_vL)
         - Added connection object security stats (rx_securityObjectStats)
           to GETCONN (2)
         - Transarc/IBM AFS

    'M'  - Added GETALLCONN (3) data type
         - Added RXSTATS (4) data type
         - Transarc/IBM AFS

    'N'  - Added calls waiting for a thread count (nWaiting) to GETSTATS (1)
         - Transarc/IBM AFS

    'O'  - Added number of idle threads count (idleThreads) to GETSTATS (1)
         - Transarc/IBM AFS

    'P'  - Added cbuf packet allocation failure counts
           (receiveCbufPktAllocFailures and sendCbufPktAllocFailures)
           to RXSTATS (4)
         - Transarc/IBM AFS

    'Q'  - Added GETPEER (5) data type
         - Transarc/IBM AFS
         - OpenAFS 1.0

    (?)  - Added number of busy aborts sent (nBusies) to RXSTATS (4)
         - rxdebug was not changed to display this new count
         - OpenAFS 1.4.0

    'R'  - Added total calls which waited for a thread (nWaited) to GETSTATS (1)
         - OpenAFS 1.5.0 (devel)
         - OpenAFS 1.6.0 (stable)

    'S'  - Added total packets allocated (nPackets) to GETSTATS (1)
         - OpenAFS 1.5.53 (devel)
         - OpenAFS 1.6.0 (stable)
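
A client can use the version id to decide which data types to request. The
following sketch encodes only the "Added ... data type" entries of the table
above. The function names are hypothetical, and the comparison assumes that
version ids are consecutive ASCII letters, as in the table.

```c
/*
 * Return the first version id that supports the given data collection type,
 * or 0 if the type is not one of the five listed in this document.
 */
char rxdbg_type_min_version(int type)
{
    switch (type) {
    case 1:             /* GETSTATS */
    case 2:             /* GETCONN */
        return 'L';
    case 3:             /* GETALLCONN */
    case 4:             /* RXSTATS */
        return 'M';
    case 5:             /* GETPEER */
        return 'Q';
    default:
        return 0;
    }
}

/* Return nonzero if a service with this version id supports the type. */
int rxdbg_type_supported(char version, int type)
{
    char min = rxdbg_type_min_version(type);

    return min != 0 && version >= min;
}
```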


Debug Request Parameters
========================

The payload of DEBUG request packets is two 32-bit integers
in network byte order.

    struct rx_debugIn {
        afs_int32 type;     /* requested type; range 1..5 */
        afs_int32 index;    /* record number: 0 .. n */
    };

The index field should be set to 0 when type is GETSTATS (1) or RXSTATS (4).


GETSTATS (1)
============

GETSTATS returns basic Rx performance statistics and the overall debug
version id.

    struct rx_debugStats {
        afs_int32 nFreePackets;
        afs_int32 packetReclaims;
        afs_int32 callsExecuted;
        char waitingForPackets;
        char usedFDs;
        char version;
        char spare1;
        afs_int32 nWaiting;     /* Version 'N': number of calls waiting for a thread */
        afs_int32 idleThreads;  /* Version 'O': number of server threads that are idle */
        afs_int32 nWaited;      /* Version 'R': total calls waited */
        afs_int32 nPackets;     /* Version 'S': total packets allocated */
        afs_int32 spare2[6];
    };
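
As an illustration of how the version id gates the later fields, the
following sketch decodes a GETSTATS payload by byte offset. The offsets
assume the structure is laid out without padding, which holds for the field
order shown above on common ABIs. The names struct rxdbg_stats and
rxdbg_parse_stats are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-side copy of the fields of interest (hypothetical type). */
struct rxdbg_stats {
    int32_t nFreePackets;
    int32_t packetReclaims;
    int32_t callsExecuted;
    char waitingForPackets;
    char usedFDs;
    char version;
    int32_t nWaiting;       /* left at 0 before version 'N' */
    int32_t idleThreads;    /* left at 0 before version 'O' */
    int32_t nWaited;        /* left at 0 before version 'R' */
    int32_t nPackets;       /* left at 0 before version 'S' */
};

/* Read a 32-bit value stored in network (big-endian) byte order. */
static int32_t rxdbg_get32s(const unsigned char *p)
{
    return (int32_t)(((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* Decode a GETSTATS payload. Returns 0 on success, -1 if too short. */
int rxdbg_parse_stats(const unsigned char *p, size_t len,
                      struct rxdbg_stats *out)
{
    if (len < 16)
        return -1;
    memset(out, 0, sizeof(*out));
    out->nFreePackets = rxdbg_get32s(p + 0);
    out->packetReclaims = rxdbg_get32s(p + 4);
    out->callsExecuted = rxdbg_get32s(p + 8);
    out->waitingForPackets = (char)p[12];
    out->usedFDs = (char)p[13];
    out->version = (char)p[14];     /* p[15] is spare1 */

    /* Later fields are only meaningful from the version that added them. */
    if (out->version >= 'N' && len >= 20)
        out->nWaiting = rxdbg_get32s(p + 16);
    if (out->version >= 'O' && len >= 24)
        out->idleThreads = rxdbg_get32s(p + 20);
    if (out->version >= 'R' && len >= 28)
        out->nWaited = rxdbg_get32s(p + 24);
    if (out->version >= 'S' && len >= 32)
        out->nPackets = rxdbg_get32s(p + 28);
    return 0;
}
```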


GETCONN (2) and GETALLCONN (3)
==============================

GETCONN (2) returns the information record of the active connection at the
given index.

GETALLCONN (3) returns the information record of the connection, active or
not, at the given index. The GETALLCONN (3) data type was added in
version 'M'.

The data format is the same for GETCONN (2) and GETALLCONN (3), and is
as follows:

    struct rx_debugConn {
        afs_uint32 host;
        afs_int32 cid;
        afs_int32 serial;
        afs_int32 callNumber[RX_MAXCALLS];
        afs_int32 error;
        short port;
        char flags;
        char type;
        char securityIndex;
        char sparec[3];     /* force correct alignment */
        char callState[RX_MAXCALLS];
        char callMode[RX_MAXCALLS];
        char callFlags[RX_MAXCALLS];
        char callOther[RX_MAXCALLS];
        /* old style getconn stops here */
        struct rx_securityObjectStats secStats;
        afs_int32 epoch;
        afs_int32 natMTU;
        afs_int32 sparel[9];
    };
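
The leading part of this record can be decoded by byte offset, as in the
sketch below. It covers the fields up to callState only and does not apply to
the obsolete Version 'L' layout. RX_MAXCALLS is assumed to be 4 here; check
the Rx headers for the actual value. The names struct rxdbg_conn and
rxdbg_parse_conn are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define RXDBG_MAXCALLS 4    /* assumed value of RX_MAXCALLS */

/* Byte offsets within the rx_debugConn payload, derived from the layout. */
#define RXDBG_CONN_ERROR     (12 + 4 * RXDBG_MAXCALLS)
#define RXDBG_CONN_PORT      (RXDBG_CONN_ERROR + 4)
#define RXDBG_CONN_SECINDEX  (RXDBG_CONN_PORT + 4)
#define RXDBG_CONN_CALLSTATE (RXDBG_CONN_SECINDEX + 4)  /* skips sparec[3] */
#define RXDBG_CONN_FIXED_LEN (RXDBG_CONN_CALLSTATE + 4 * RXDBG_MAXCALLS)

/* Host-side copy of the leading fields (hypothetical type). */
struct rxdbg_conn {
    uint32_t host;
    int32_t cid;
    int32_t serial;
    int32_t callNumber[RXDBG_MAXCALLS];
    int32_t error;
    uint16_t port;
    unsigned char flags;
    unsigned char type;
    unsigned char securityIndex;
    unsigned char callState[RXDBG_MAXCALLS];
};

/* Read a 32-bit value stored in network (big-endian) byte order. */
static uint32_t rxdbg_get32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the leading fields. Returns 0 on success, -1 if too short. */
int rxdbg_parse_conn(const unsigned char *p, size_t len,
                     struct rxdbg_conn *out)
{
    int i;

    if (len < RXDBG_CONN_FIXED_LEN)
        return -1;
    out->host = rxdbg_get32(p);
    out->cid = (int32_t)rxdbg_get32(p + 4);
    out->serial = (int32_t)rxdbg_get32(p + 8);
    for (i = 0; i < RXDBG_MAXCALLS; i++)
        out->callNumber[i] = (int32_t)rxdbg_get32(p + 12 + 4 * i);
    out->error = (int32_t)rxdbg_get32(p + RXDBG_CONN_ERROR);
    out->port = (uint16_t)((p[RXDBG_CONN_PORT] << 8) |
                           p[RXDBG_CONN_PORT + 1]);
    out->flags = p[RXDBG_CONN_PORT + 2];
    out->type = p[RXDBG_CONN_PORT + 3];
    out->securityIndex = p[RXDBG_CONN_SECINDEX];
    for (i = 0; i < RXDBG_MAXCALLS; i++)
        out->callState[i] = p[RXDBG_CONN_CALLSTATE + i];
    return 0;
}
```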


An obsolete layout, which exhibited a problem with data alignment, was used in
Version 'L'. This is defined as:

    struct rx_debugConn_vL {
        afs_uint32 host;
        afs_int32 cid;
        afs_int32 serial;
        afs_int32 callNumber[RX_MAXCALLS];
        afs_int32 error;
        short port;
        char flags;
        char type;
        char securityIndex;
        char callState[RX_MAXCALLS];
        char callMode[RX_MAXCALLS];
        char callFlags[RX_MAXCALLS];
        char callOther[RX_MAXCALLS];
        /* old style getconn stops here */
        struct rx_securityObjectStats secStats;
        afs_int32 sparel[10];
    };


The layout of the secStats field is as follows:

    struct rx_securityObjectStats {
        char type;          /* 0:unk, 1:null, 2:vab, 3:kad */
        char level;
        char sparec[10];    /* force correct alignment */
        afs_int32 flags;    /* 1=>unalloc, 2=>auth, 4=>expired */
        afs_uint32 expires;
        afs_uint32 packetsReceived;
        afs_uint32 packetsSent;
        afs_uint32 bytesReceived;
        afs_uint32 bytesSent;
        short spares[4];
        afs_int32 sparel[8];
    };


RXSTATS (4)
===========

RXSTATS (4) returns general rx statistics. Every member of the returned
structure is a 32-bit integer in network byte order. This assumes that
sizeof(int) is equal to sizeof(afs_int32).

The RXSTATS (4) data type was added in Version 'M'.

    struct rx_statistics { /* General rx statistics */
        int packetRequests; /* Number of packet allocation requests */
        int receivePktAllocFailures;
        int sendPktAllocFailures;
        int specialPktAllocFailures;
        int socketGreedy; /* Whether SO_GREEDY succeeded */
        int bogusPacketOnRead; /* Number of inappropriately short packets received */
        int bogusHost; /* Host address from bogus packets */
        int noPacketOnRead; /* Number of read packets attempted when there was actually no packet to read off the wire */
        int noPacketBuffersOnRead; /* Number of dropped data packets due to lack of packet buffers */
        int selects; /* Number of selects waiting for packet or timeout */
        int sendSelects; /* Number of selects forced when sending packet */
        int packetsRead[RX_N_PACKET_TYPES]; /* Total number of packets read, per type */
        int dataPacketsRead; /* Number of unique data packets read off the wire */
        int ackPacketsRead; /* Number of ack packets read */
        int dupPacketsRead; /* Number of duplicate data packets read */
        int spuriousPacketsRead; /* Number of inappropriate data packets */
        int packetsSent[RX_N_PACKET_TYPES]; /* Number of rxi_Sends: packets sent over the wire, per type */
        int ackPacketsSent; /* Number of acks sent */
        int pingPacketsSent; /* Total number of ping packets sent */
        int abortPacketsSent; /* Total number of aborts */
        int busyPacketsSent; /* Total number of busies sent */
        int dataPacketsSent; /* Number of unique data packets sent */
        int dataPacketsReSent; /* Number of retransmissions */
        int dataPacketsPushed; /* Number of retransmissions pushed early by a NACK */
        int ignoreAckedPacket; /* Number of packets with acked flag, on rxi_Start */
        struct clock totalRtt; /* Total round trip time measured (use to compute average) */
        struct clock minRtt; /* Minimum round trip time measured */
        struct clock maxRtt; /* Maximum round trip time measured */
        int nRttSamples; /* Total number of round trip samples */
        int nServerConns; /* Total number of server connections */
        int nClientConns; /* Total number of client connections */
        int nPeerStructs; /* Total number of peer structures */
        int nCallStructs; /* Total number of call structures allocated */
        int nFreeCallStructs; /* Total number of previously allocated free call structures */
        int netSendFailures;
        afs_int32 fatalErrors;
        int ignorePacketDally; /* packets dropped because call is in dally state */
        int receiveCbufPktAllocFailures; /* Version 'P': receive cbuf packet alloc failures */
        int sendCbufPktAllocFailures; /* Version 'P': send cbuf packet alloc failures */
        int nBusies; /* Version 'R': number of busy aborts sent */
        int spares[4];
    };
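
Because every member is a 32-bit integer in network byte order, a client can
treat the RXSTATS payload as a flat array of words and convert them in one
pass, as in the sketch below. Mapping word positions back to member names
depends on RX_N_PACKET_TYPES and on the layout of struct clock, neither of
which is specified in this document, so that step is left out. The function
name rxdbg_words_to_host is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a payload of big-endian 32-bit words to host integers. Copies at
 * most max_words words into out and ignores any trailing partial word.
 * Returns the number of words converted.
 */
size_t rxdbg_words_to_host(const unsigned char *payload, size_t len,
                           int32_t *out, size_t max_words)
{
    size_t n = len / 4;
    size_t i;

    if (n > max_words)
        n = max_words;
    for (i = 0; i < n; i++) {
        const unsigned char *p = payload + 4 * i;

        out[i] = (int32_t)(((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                           ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
    }
    return n;
}
```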


GETPEER (5)
===========

GETPEER (5) returns the peer information record for the given index. The
GETPEER (5) data type was added in Version 'Q'.

    struct rx_debugPeer {
        afs_uint32 host;
        u_short port;
        u_short ifMTU;
        afs_uint32 idleWhen;
        short refCount;
        u_char burstSize;
        u_char burst;
        struct clock burstWait;
        afs_int32 rtt;
        afs_int32 rtt_dev;
        struct clock timeout;
        afs_int32 nSent;
        afs_int32 reSends;
        afs_int32 inPacketSkew;
        afs_int32 outPacketSkew;
        afs_int32 rateFlag;
        u_short natMTU;
        u_short maxMTU;
        u_short maxDgramPackets;
        u_short ifDgramPackets;
        u_short MTU;
        u_short cwind;
        u_short nDgramPackets;
        u_short congestSeq;
        afs_hyper_t bytesSent;
        afs_hyper_t bytesReceived;
        afs_int32 sparel[10];
    };