Rx Debug
--------

Introduction
============

Rx provides data collections for remote debugging and troubleshooting using UDP
packets. This document provides details on the protocol, data formats, and
the data format versions.


Protocol
========

A simple request/response protocol is used to request this information
from an Rx instance. Request and response packets contain an Rx header, but
only a subset of the header fields is used, since the debugging packets are
not part of the Rx RPC protocol.

A client sends an Rx DEBUG (8) packet to the address:port of an active Rx
instance. The request carries an arbitrary request number in the callNumber
field of the Rx header (reused here since DEBUG packets are never used in
RPCs). The payload of the request is a pair of 32-bit integers in network byte
order. The first integer indicates which data collection type is requested.
The second integer indicates which record of that type is requested, for data
types which have multiple records, such as the Rx connections and Rx peers.
The request packet must have the CLIENT-INITIATED flag set in the Rx header.

Rx responds with a single Rx DEBUG (8) packet, the payload of which contains
the data record for the type and index requested. The callNumber in the Rx
header of the response is the same as in the request, allowing the client to
match responses to requests. The response DEBUG packet does not contain the
request type and index parameters.
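
As a rough sketch of the exchange described above, the following C fragment
fills in the header fields a debug request uses and checks whether a received
packet answers a given request. This document does not define the Rx header,
so the 28-byte layout, the field offsets, the numeric value of the
CLIENT-INITIATED flag, and the helper names (`rx_debug_build_header`,
`rx_debug_response_matches`) are assumptions made for illustration and should
be checked against the Rx sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed Rx header layout (not defined in this document):
 * epoch(4) cid(4) callNumber(4) seq(4) serial(4)
 * type(1) flags(1) userStatus(1) securityIndex(1) serviceId(2) spare(2) */
#define RX_HEADER_SIZE        28
#define RX_HDR_OFF_CALLNUMBER  8
#define RX_HDR_OFF_TYPE       20
#define RX_HDR_OFF_FLAGS      21
#define RX_PACKET_TYPE_DEBUG   8  /* DEBUG (8), as stated in the text */
#define RX_CLIENT_INITIATED    1  /* assumed flag value */

/* Store v at p in network byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Load a network byte order 32-bit value from p. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Fill hdr with a debug request header; request_id goes in callNumber. */
static void rx_debug_build_header(uint8_t hdr[RX_HEADER_SIZE],
                                  uint32_t request_id)
{
    memset(hdr, 0, RX_HEADER_SIZE);
    put_be32(hdr + RX_HDR_OFF_CALLNUMBER, request_id);
    hdr[RX_HDR_OFF_TYPE] = RX_PACKET_TYPE_DEBUG;
    hdr[RX_HDR_OFF_FLAGS] = RX_CLIENT_INITIATED;
}

/* Return 1 if pkt is a DEBUG packet whose callNumber equals request_id. */
static int rx_debug_response_matches(const uint8_t *pkt, size_t len,
                                     uint32_t request_id)
{
    if (len < RX_HEADER_SIZE)
        return 0;
    if (pkt[RX_HDR_OFF_TYPE] != RX_PACKET_TYPE_DEBUG)
        return 0;
    return get_be32(pkt + RX_HDR_OFF_CALLNUMBER) == request_id;
}
```

A client would append the 8-byte request payload after this header before
sending, and discard any received packet for which the match function
returns 0.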

The first 32 bits of the response payload, in network byte order, indicate
error conditions:

    * 0xFFFFFFFF (-1) index is out of range
    * 0xFFFFFFF8 (-8) unknown request type
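
The check on the first 32 bits can be sketched as follows. The enum and the
function name `rx_debug_classify` are hypothetical; only the two error values
come from the list above.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_DEBUG_ERR_OUT_OF_RANGE 0xFFFFFFFFu /* -1: index is out of range */
#define RX_DEBUG_ERR_UNKNOWN_TYPE 0xFFFFFFF8u /* -8: unknown request type */

enum rx_debug_status {
    RX_DEBUG_STATUS_DATA,          /* payload holds a data record */
    RX_DEBUG_STATUS_OUT_OF_RANGE,
    RX_DEBUG_STATUS_UNKNOWN_TYPE,
    RX_DEBUG_STATUS_SHORT          /* fewer than 4 payload bytes */
};

/* Classify a response payload by its first 32 bits (network byte order). */
static enum rx_debug_status rx_debug_classify(const uint8_t *payload,
                                              size_t len)
{
    uint32_t first;

    if (len < 4)
        return RX_DEBUG_STATUS_SHORT;
    first = ((uint32_t)payload[0] << 24) | ((uint32_t)payload[1] << 16) |
            ((uint32_t)payload[2] << 8) | (uint32_t)payload[3];
    if (first == RX_DEBUG_ERR_OUT_OF_RANGE)
        return RX_DEBUG_STATUS_OUT_OF_RANGE;
    if (first == RX_DEBUG_ERR_UNKNOWN_TYPE)
        return RX_DEBUG_STATUS_UNKNOWN_TYPE;
    return RX_DEBUG_STATUS_DATA;
}
```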


Data Collection Types
=====================

OpenAFS defines 5 types of data collections which may be
requested:

    1  GETSTATS    Basic Rx statistics            (struct rx_debugStats)
    2  GETCONN     Active connections [indexed]   (struct rx_debugConn)
    3  GETALLCONN  All connections [indexed]      (struct rx_debugConn)
    4  RXSTATS     Detailed Rx statistics         (struct rx_statistics)
    5  GETPEER     Rx peer info [indexed]         (struct rx_debugPeer)

The format of the response data for each type is given below. XDR is
not used. All integers are in network byte order.

In a typical exchange, a client requests the basic Rx statistics (type 1)
first. The response contains a data layout version number (detailed in the
next section).

Types GETCONN (2), GETALLCONN (3), and GETPEER (5) are array-like data
collections. The index field is used to retrieve each record, one per packet.
The first record is index 0. The client may request each record in turn,
starting at index 0 and incrementing the index by one on each request, until
the Rx service returns -1 (out of range). No provisions are made for locking
the data collections between requests, so a collection may change while a
client is walking it; this is accepted because the interface is intended only
for debugging.
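
The index walk described above might look like the following sketch. The
transport is abstracted behind a hypothetical callback type,
`rx_debug_fetch_fn`, and `demo_fetch` is an in-memory stand-in for the UDP
exchange so the loop can be exercised without a network. The `max_records`
bound is an added safety measure, not part of the protocol.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transport hook: performs one request/response exchange for
 * (type, index), copies the response payload into buf, and returns its
 * length in bytes, or -1 on failure. */
typedef long (*rx_debug_fetch_fn)(int32_t type, int32_t index,
                                  uint8_t *buf, size_t cap, void *ctx);

/* Walk an indexed collection from index 0 upward. Returns the number of
 * records seen, or -1 on a transport failure or an unknown-type reply.
 * max_records bounds the walk in case out of range is never reported. */
static long rx_debug_walk(int32_t type, rx_debug_fetch_fn fetch, void *ctx,
                          int32_t max_records)
{
    uint8_t buf[2048];
    int32_t index;

    for (index = 0; index < max_records; index++) {
        long n = fetch(type, index, buf, sizeof(buf), ctx);
        uint32_t first;

        if (n < 4)
            return -1;
        first = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
        if (first == 0xFFFFFFFFu)   /* -1: past the last record */
            break;
        if (first == 0xFFFFFFF8u)   /* -8: peer does not know this type */
            return -1;
        /* buf[0..n) now holds record number `index`; decode it here. */
    }
    return index;
}

/* Demonstration only: pretends the peer holds *ctx records, each of which
 * is a single zero word, and answers -1 for any index beyond them. */
static long demo_fetch(int32_t type, int32_t index, uint8_t *buf, size_t cap,
                       void *ctx)
{
    int32_t n_records = *(const int32_t *)ctx;

    (void)type;
    if (cap < 4)
        return -1;
    memset(buf, index < n_records ? 0x00 : 0xFF, 4);
    return 4;
}
```

Because the collections are not locked between requests, the count returned
by such a walk is only a snapshot estimate.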


Data Collection Versions
========================

Every Rx service has a single-byte debugging version id, which is set at
build time. This version id allows clients to properly interpret the response
data formats for the various data types. The version id is present in the
basic Rx statistics (type 1) response data.

The first usable version is 'L', which was present in early Transarc/IBM AFS.
The first version in OpenAFS was 'Q', and versions after 'Q' are
OpenAFS-specific extensions. The current version for OpenAFS is 'S'.

Historically, the version id has been incremented when a debug data type is
added or changed. The version history is summarized in the following list:

    'L' - Earliest usable version
        - GETSTATS (1) supported
        - GETCONN (2) supported (with obsolete format rx_debugConn_vL)
        - Added connection object security stats (rx_securityObjectStats) to GETCONN (2)
        - Transarc/IBM AFS

    'M' - Added GETALLCONN (3) data type
        - Added RXSTATS (4) data type
        - Transarc/IBM AFS

    'N' - Added calls waiting for a thread count (nWaiting) to GETSTATS (1)
        - Transarc/IBM AFS

    'O' - Added number of idle threads count (idleThreads) to GETSTATS (1)
        - Transarc/IBM AFS

    'P' - Added cbuf packet allocation failure counts (receiveCbufPktAllocFailures
          and sendCbufPktAllocFailures) to RXSTATS (4)
        - Transarc/IBM AFS

    'Q' - Added GETPEER (5) data type
        - Transarc/IBM AFS
        - OpenAFS 1.0

    (?) - Added number of busy aborts sent (nBusies) to RXSTATS (4)
        - rxdebug was not changed to display this new count
        - OpenAFS 1.4.0

    'R' - Added total calls which waited for a thread (nWaited) to GETSTATS (1)
        - OpenAFS 1.5.0 (devel)
        - OpenAFS 1.6.0 (stable)

    'S' - Added total packets allocated (nPackets) to GETSTATS (1)
        - OpenAFS 1.5.53 (devel)
        - OpenAFS 1.6.0 (stable)
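
A client can use the version id to decide which request types are worth
sending. The sketch below encodes only the type additions from the history
above; the function name `rx_debug_type_supported` and the enum constants are
hypothetical. It relies on the version ids being uppercase ASCII letters that
compare in alphabetical order.

```c
/* Request type numbers, as listed under "Data Collection Types". */
enum {
    RX_DEBUG_GETSTATS   = 1,
    RX_DEBUG_GETCONN    = 2,
    RX_DEBUG_GETALLCONN = 3,
    RX_DEBUG_RXSTATS    = 4,
    RX_DEBUG_GETPEER    = 5
};

/* Return 1 if a peer reporting `version` should understand `type`. */
static int rx_debug_type_supported(char version, int type)
{
    if (version < 'L')              /* before the earliest usable version */
        return 0;
    switch (type) {
    case RX_DEBUG_GETSTATS:
    case RX_DEBUG_GETCONN:
        return 1;                   /* present since 'L' */
    case RX_DEBUG_GETALLCONN:
    case RX_DEBUG_RXSTATS:
        return version >= 'M';      /* added in 'M' */
    case RX_DEBUG_GETPEER:
        return version >= 'Q';      /* added in 'Q' */
    default:
        return 0;
    }
}
```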



Debug Request Parameters
========================

The payload of a DEBUG request packet is two 32-bit integers
in network byte order.


    struct rx_debugIn {
        afs_int32 type;     /* requested type; range 1..5 */
        afs_int32 index;    /* record number: 0 .. n */
    };

The index field should be set to 0 when type is GETSTATS (1) or RXSTATS (4).
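
Encoding the request payload takes only a few lines. In this sketch the
function name `rx_debug_encode_request` is hypothetical, and forcing the index
to 0 for the non-indexed types is one possible way to follow the rule above.

```c
#include <stdint.h>

#define RX_DEBUG_REQUEST_SIZE 8  /* two 32-bit integers */

/* Store v at p in network byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Write the request payload for (type, index) into out.
 * Returns 0 on success, -1 if type or index is outside the documented range. */
static int rx_debug_encode_request(uint8_t out[RX_DEBUG_REQUEST_SIZE],
                                   int32_t type, int32_t index)
{
    if (type < 1 || type > 5 || index < 0)
        return -1;
    if (type == 1 || type == 4)     /* GETSTATS and RXSTATS are not indexed */
        index = 0;
    put_be32(out, (uint32_t)type);
    put_be32(out + 4, (uint32_t)index);
    return 0;
}
```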



GETSTATS (1)
============

GETSTATS returns basic Rx performance statistics and the overall debug
version id.

    struct rx_debugStats {
        afs_int32 nFreePackets;
        afs_int32 packetReclaims;
        afs_int32 callsExecuted;
        char waitingForPackets;
        char usedFDs;
        char version;
        char spare1;
        afs_int32 nWaiting;       /* Version 'N': number of calls waiting for a thread */
        afs_int32 idleThreads;    /* Version 'O': number of server threads that are idle */
        afs_int32 nWaited;        /* Version 'R': total calls waited */
        afs_int32 nPackets;       /* Version 'S': total packets allocated */
        afs_int32 spare2[6];
    };
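
Decoding this record by byte offset, rather than by casting the buffer to a
struct, keeps the version handling explicit. The sketch below assumes the
wire layout follows the struct with no padding (the fields are naturally
aligned, so offsets 0, 4, 8, 12..15, 16, 20, 24, 28). The type
`struct demo_debug_stats` and the function names are hypothetical; fields
newer than the reported version are left at 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_debug_stats {
    int32_t nFreePackets, packetReclaims, callsExecuted;
    char waitingForPackets, usedFDs, version;
    int32_t nWaiting, idleThreads, nWaited, nPackets;
};

/* Load a signed network byte order 32-bit value from p. */
static int32_t get_be32s(const uint8_t *p)
{
    return (int32_t)(((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* Read a field added in version `since`; yields 0 when the peer is older
 * than that or the payload is too short to contain the field. */
static int32_t versioned_field(const uint8_t *p, size_t len, size_t off,
                               char version, char since)
{
    if (version < since || len < off + 4)
        return 0;
    return get_be32s(p + off);
}

/* Decode a GETSTATS payload. Returns 0 on success, -1 if it is too short. */
static int rx_debug_decode_stats(const uint8_t *p, size_t len,
                                 struct demo_debug_stats *out)
{
    if (len < 16)
        return -1;
    memset(out, 0, sizeof(*out));
    out->nFreePackets = get_be32s(p);
    out->packetReclaims = get_be32s(p + 4);
    out->callsExecuted = get_be32s(p + 8);
    out->waitingForPackets = (char)p[12];
    out->usedFDs = (char)p[13];
    out->version = (char)p[14];
    out->nWaiting = versioned_field(p, len, 16, out->version, 'N');
    out->idleThreads = versioned_field(p, len, 20, out->version, 'O');
    out->nWaited = versioned_field(p, len, 24, out->version, 'R');
    out->nPackets = versioned_field(p, len, 28, out->version, 'S');
    return 0;
}
```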


GETCONN (2) and GETALLCONN (3)
==============================

GETCONN (2) returns an active connection information record for the
given index.

GETALLCONN (3) returns a connection information record, active or not,
for the given index. The GETALLCONN (3) data type was added in
version 'M'.

The data format is the same for GETCONN (2) and GETALLCONN (3), and is
as follows:

    struct rx_debugConn {
        afs_uint32 host;
        afs_int32 cid;
        afs_int32 serial;
        afs_int32 callNumber[RX_MAXCALLS];
        afs_int32 error;
        short port;
        char flags;
        char type;
        char securityIndex;
        char sparec[3];    /* force correct alignment */
        char callState[RX_MAXCALLS];
        char callMode[RX_MAXCALLS];
        char callFlags[RX_MAXCALLS];
        char callOther[RX_MAXCALLS];
        /* old style getconn stops here */
        struct rx_securityObjectStats secStats;
        afs_int32 epoch;
        afs_int32 natMTU;
        afs_int32 sparel[9];
    };


An obsolete layout, which exhibited a problem with data alignment, was used in
version 'L'. It is defined as:

    struct rx_debugConn_vL {
        afs_uint32 host;
        afs_int32 cid;
        afs_int32 serial;
        afs_int32 callNumber[RX_MAXCALLS];
        afs_int32 error;
        short port;
        char flags;
        char type;
        char securityIndex;
        char callState[RX_MAXCALLS];
        char callMode[RX_MAXCALLS];
        char callFlags[RX_MAXCALLS];
        char callOther[RX_MAXCALLS];
        /* old style getconn stops here */
        struct rx_securityObjectStats secStats;
        afs_int32 sparel[10];
    };
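
One plausible reading of the alignment problem can be seen by comparing
member offsets in the two layouts. The sketch below re-declares both structs
with fixed-width types under hypothetical `demo_` names and assumes
RX_MAXCALLS is 4, a value this document does not state. Under that
assumption the four char arrays in the 'L' layout start at offset 37 and end
at offset 53, so the compiler has to insert ABI-dependent padding before
secStats; with sparec[3] they start at 40 and end at 56, which is already
4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_MAXCALLS 4  /* assumed value; not stated in this document */

struct demo_sec_stats {            /* mirrors rx_securityObjectStats */
    char type, level, sparec[10];
    int32_t flags;
    uint32_t expires, packetsReceived, packetsSent, bytesReceived, bytesSent;
    short spares[4];
    int32_t sparel[8];
};

struct demo_conn {                 /* mirrors rx_debugConn */
    uint32_t host;
    int32_t cid, serial, callNumber[RX_MAXCALLS], error;
    short port;
    char flags, type, securityIndex;
    char sparec[3];
    char callState[RX_MAXCALLS], callMode[RX_MAXCALLS];
    char callFlags[RX_MAXCALLS], callOther[RX_MAXCALLS];
    struct demo_sec_stats secStats;
    int32_t epoch, natMTU, sparel[9];
};

struct demo_conn_vL {              /* mirrors rx_debugConn_vL */
    uint32_t host;
    int32_t cid, serial, callNumber[RX_MAXCALLS], error;
    short port;
    char flags, type, securityIndex;
    char callState[RX_MAXCALLS], callMode[RX_MAXCALLS];
    char callFlags[RX_MAXCALLS], callOther[RX_MAXCALLS];
    struct demo_sec_stats secStats;
    int32_t sparel[10];
};

/* Bytes of compiler-inserted padding before secStats in the 'L' layout. */
static size_t demo_vL_padding(void)
{
    return offsetof(struct demo_conn_vL, secStats) -
           (offsetof(struct demo_conn_vL, callOther) + RX_MAXCALLS);
}

/* The same quantity for the current layout. */
static size_t demo_current_padding(void)
{
    return offsetof(struct demo_conn, secStats) -
           (offsetof(struct demo_conn, callOther) + RX_MAXCALLS);
}
```

A nonzero result from `demo_vL_padding()` shows that the 'L' layout depends
on padding the compiler chooses, which is what the explicit sparec[3] member
avoids.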


The layout of the secStats field is as follows:

    struct rx_securityObjectStats {
        char type;          /* 0:unk, 1:null, 2:vab, 3:kad */
        char level;
        char sparec[10];    /* force correct alignment */
        afs_int32 flags;    /* 1=>unalloc, 2=>auth, 4=>expired */
        afs_uint32 expires;
        afs_uint32 packetsReceived;
        afs_uint32 packetsSent;
        afs_uint32 bytesReceived;
        afs_uint32 bytesSent;
        short spares[4];
        afs_int32 sparel[8];
    };
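
The type and flags values from the comments above can be turned into readable
text along these lines. The function names are hypothetical, and treating any
type value other than 1, 2, or 3 as "unk" is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define RX_SEC_FLAG_UNALLOC 1
#define RX_SEC_FLAG_AUTH    2
#define RX_SEC_FLAG_EXPIRED 4

/* Map the security type byte to the short names used in the struct comment. */
static const char *rx_sec_type_name(char type)
{
    switch (type) {
    case 1:  return "null";
    case 2:  return "vab";
    case 3:  return "kad";
    default: return "unk";
    }
}

/* Format "<type> [unalloc] [auth] [expired]" into buf.
 * Returns the value of snprintf (length needed, excluding the terminator). */
static int rx_sec_describe(char type, int32_t flags, char *buf, size_t cap)
{
    return snprintf(buf, cap, "%s%s%s%s",
                    rx_sec_type_name(type),
                    (flags & RX_SEC_FLAG_UNALLOC) ? " unalloc" : "",
                    (flags & RX_SEC_FLAG_AUTH) ? " auth" : "",
                    (flags & RX_SEC_FLAG_EXPIRED) ? " expired" : "");
}
```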



RXSTATS (4)
===========

RXSTATS (4) returns general Rx statistics. Every member of the returned
structure is a 32-bit integer in network byte order. It is assumed that
sizeof(int) is equal to sizeof(afs_int32).

The RXSTATS (4) data type was added in version 'M'.


    struct rx_statistics {      /* General rx statistics */
        int packetRequests;     /* Number of packet allocation requests */
        int receivePktAllocFailures;
        int sendPktAllocFailures;
        int specialPktAllocFailures;
        int socketGreedy;       /* Whether SO_GREEDY succeeded */
        int bogusPacketOnRead;  /* Number of inappropriately short packets received */
        int bogusHost;          /* Host address from bogus packets */
        int noPacketOnRead;     /* Number of read packets attempted when there was actually no packet to read off the wire */
        int noPacketBuffersOnRead;      /* Number of dropped data packets due to lack of packet buffers */
        int selects;            /* Number of selects waiting for packet or timeout */
        int sendSelects;        /* Number of selects forced when sending packet */
        int packetsRead[RX_N_PACKET_TYPES];     /* Total number of packets read, per type */
        int dataPacketsRead;    /* Number of unique data packets read off the wire */
        int ackPacketsRead;     /* Number of ack packets read */
        int dupPacketsRead;     /* Number of duplicate data packets read */
        int spuriousPacketsRead;        /* Number of inappropriate data packets */
        int packetsSent[RX_N_PACKET_TYPES];     /* Number of rxi_Sends: packets sent over the wire, per type */
        int ackPacketsSent;     /* Number of acks sent */
        int pingPacketsSent;    /* Total number of ping packets sent */
        int abortPacketsSent;   /* Total number of aborts */
        int busyPacketsSent;    /* Total number of busy packets sent */
        int dataPacketsSent;    /* Number of unique data packets sent */
        int dataPacketsReSent;  /* Number of retransmissions */
        int dataPacketsPushed;  /* Number of retransmissions pushed early by a NACK */
        int ignoreAckedPacket;  /* Number of packets with acked flag, on rxi_Start */
        struct clock totalRtt;  /* Total round trip time measured (use to compute average) */
        struct clock minRtt;    /* Minimum round trip time measured */
        struct clock maxRtt;    /* Maximum round trip time measured */
        int nRttSamples;        /* Total number of round trip samples */
        int nServerConns;       /* Total number of server connections */
        int nClientConns;       /* Total number of client connections */
        int nPeerStructs;       /* Total number of peer structures */
        int nCallStructs;       /* Total number of call structures allocated */
        int nFreeCallStructs;   /* Total number of previously allocated free call structures */
        int netSendFailures;
        afs_int32 fatalErrors;
        int ignorePacketDally;  /* packets dropped because call is in dally state */
        int receiveCbufPktAllocFailures;        /* Version 'P': receive cbuf packet alloc failures */
        int sendCbufPktAllocFailures;   /* Version 'P': send cbuf packet alloc failures */
        int nBusies;            /* Version 'R': number of busy aborts sent */
        int spares[4];
    };
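
The comment on totalRtt suggests dividing it by nRttSamples to get an average
round trip time. A minimal sketch of that arithmetic follows. It assumes
struct clock holds two 32-bit integers, seconds followed by microseconds,
which this document does not specify; `struct demo_clock` and the function
names are hypothetical, and the inputs are taken to be already converted from
network byte order.

```c
#include <stdint.h>

/* Assumed shape of struct clock: seconds, then microseconds. */
struct demo_clock {
    int32_t sec;
    int32_t usec;
};

/* Convert a clock value to microseconds, using 64 bits to avoid overflow. */
static int64_t demo_clock_to_usec(struct demo_clock c)
{
    return (int64_t)c.sec * 1000000 + c.usec;
}

/* Average round trip time in microseconds, or -1 when there are no samples. */
static int64_t rx_average_rtt_usec(struct demo_clock total_rtt,
                                   int32_t n_samples)
{
    if (n_samples <= 0)
        return -1;
    return demo_clock_to_usec(total_rtt) / n_samples;
}
```

For example, a totalRtt of 2 seconds and 500000 microseconds over 5 samples
gives an average of 500000 microseconds.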


GETPEER (5)
===========

GETPEER (5) returns a peer information record for the given index.

    struct rx_debugPeer {
        afs_uint32 host;
        u_short port;
        u_short ifMTU;
        afs_uint32 idleWhen;
        short refCount;
        u_char burstSize;
        u_char burst;
        struct clock burstWait;
        afs_int32 rtt;
        afs_int32 rtt_dev;
        struct clock timeout;
        afs_int32 nSent;
        afs_int32 reSends;
        afs_int32 inPacketSkew;
        afs_int32 outPacketSkew;
        afs_int32 rateFlag;
        u_short natMTU;
        u_short maxMTU;
        u_short maxDgramPackets;
        u_short ifDgramPackets;
        u_short MTU;
        u_short cwind;
        u_short nDgramPackets;
        u_short congestSeq;
        afs_hyper_t bytesSent;
        afs_hyper_t bytesReceived;
        afs_int32 sparel[10];
    };
