Rx protocol specification draft
Nickolai Zeldovich, kolya@MIT.EDU

Introduction
============

Rx is a client-server RPC protocol, an extended and combined version
of the older R and RFTP protocols. This document describes Rx, but
the details of Rx security protocols (such as Rxkad) are not specified.

Rx communicates via UDP datagrams on a user-specified port. Rx also
provides for multiplexing of Rx services on a single port, via a
16-bit service ID which, much like a port number, identifies a
particular Rx service listening on a given port. Therefore, an Rx
service is identified by a triple of <IP address; UDP port number;
Rx service ID>.

The protocol is connection-oriented -- a client and a server must
first hand-shake and establish a connection before Rx calls can be
made. Said hand-shaking is implicit upon the first request if no
authentication is desired, or can consist of a pair of Challenge
and Response requests in order to establish authentication between
the client and the server.

Protocol Overview
=================

As mentioned above, Rx uses UDP/IP datagrams on a user-specified
port to communicate. An optional user-selectable authentication
and encryption method can be used to achieve desired security.
Each Rx server may provide multiple services, specified by the
Service ID. This allows for service multiplexing, much in the
same way as UDP port numbers allow for multiplexing of UDP
datagrams addressed to the same host.

Each client and server pair that wants to communicate using Rx must
establish an Rx connection, which can be thought of as a context
for all subsequent Rx activity between these two parties. An Rx
connection can only be associated with a single Rx service.

Each Rx connection context contains multiple channels, which are
used for data transmission and actually performing an RPC call.
The channels are independent of each other, allowing multiple
RPC calls to be performed to the same Rx server simultaneously.

An Rx call involves the transmission of call arguments over an Rx
channel to the server and reception of the reply data. For each
Rx call, an available Rx channel must be allocated exclusively to
that call. The channel cannot be used for anything else until the
call completes. After call completion, the channel may be reused
for subsequent Rx calls.

Rx Connections
==============

This section makes many references to fields of an Rx header; see
the ``Packet Formats'' section for specific layout of the Rx header.

The connection epoch is a unique value chosen by Rx on startup and
used by the peer both to identify connections to this host, and
to detect when this host's Rx restarts. An Rx connection between
two hosts is identified by:

    { Epoch, Connection ID, Peer IP, Peer Port },
            if the high bit of the epoch (+) is not set
    { Epoch, Connection ID },
            if the high bit of the epoch (+) is set

This means that if the high epoch bit is set, the recipient of a
packet should accept packets for this Rx connection from any IP
address and port number. Conversely, if the high bit is not set,
the IP and port number must be the same in order for packets to
be properly recognized as being part of the same connection.
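The identification rule above can be sketched in Python as a function that builds a lookup key for a connection table. This is an illustration only, not part of the specification: the function name, the use of a tuple as the key, and the string form of the peer address are assumptions made for the example.

```python
from typing import Tuple

# High bit of the 32-bit epoch field.
EPOCH_HIGH_BIT = 0x80000000


def connection_key(epoch: int, conn_id: int, peer_ip: str, peer_port: int) -> Tuple:
    """Return the tuple that identifies an Rx connection (hypothetical helper).

    conn_id is the Connection ID with the channel bits already masked off.
    """
    if epoch & EPOCH_HIGH_BIT:
        # High bit set: packets are accepted from any IP address and port,
        # so the peer address is not part of the connection's identity.
        return (epoch, conn_id)
    # High bit clear: the peer IP and port must match as well.
    return (epoch, conn_id, peer_ip, peer_port)
```

With the high bit set, two packets carrying the same epoch and connection ID map to the same key even when they arrive from different addresses; with the high bit clear they map to different keys.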

Connection ID is chosen by the client that establishes the connection.
The last two bits of the same 32-bit field are used by Rx to multiplex
between 4 parallel calls on the same connection. Each one of them is
called an Rx channel, and therefore the field is denoted "Channel ID".
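As a small illustration of the split described above, the following Python sketch separates the 32-bit field into its connection and channel parts. It assumes that "the last two bits" means the two low-order bits; the names `CHANNEL_MASK` and `split_connection_field` are made up for this example.

```python
from typing import Tuple

# Two low-order bits select one of the 4 channels (assumed bit position).
CHANNEL_MASK = 0x3


def split_connection_field(field: int) -> Tuple[int, int]:
    """Split the 32-bit field into (connection ID, channel ID)."""
    conn_id = field & ~CHANNEL_MASK & 0xFFFFFFFF
    channel = field & CHANNEL_MASK
    return conn_id, channel
```

For example, the field value 0x1234ABCE yields connection ID 0x1234ABCC and channel 2.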

Call number identifies a particular call within a channel (so there
are four call numbers associated with an Rx connection). Each new
call should start with a higher number than the previous call, and
typically this is just the previous call number + 1. The initial
call number must be non-zero, since call number zero indicates a
connection-only Rx packet (see below). The call number is chosen
by the peer initiating the call. Although only one call can use
a channel at one time, the call number allows peers to distinguish
packets on the same channel that belong to different calls.

The sequence number is similar to the sequence number in TCP, but
it counts packets within a call instead of bytes. Sequence numbers
always start with 1 at the beginning of each call, and are incremented
by 1 for each additional packet sent. Retransmissions in Rx are done
on a packet-by-packet basis, identified by these sequence numbers.

Every outgoing packet associated with a certain connection is stamped
with a serial number in the serial field, and the serial number is
incremented by 1 for every packet sent. This is used by the flow
control mechanisms (described below). The serial number for a
connection should start out with 1 (i.e., the first packet sent
should have a serial number of 1.)

Service ID identifies a particular Rx service running on a given
host/port combination. This is analogous to how UDP port numbers
allow multiplexing packets to a single IP address. Note that once
an Rx connection has been created, the service ID may not be changed;
existing implementations cache the service ID value for a given
connection, and will ignore service ID values in subsequent packets.

The Checksum field allows for an optional packet checksum. A zero
checksum field value means that checksums are not being computed.
An Rx security protocol (identified by the security field, described
below) may choose to use this field to transport some checksum of
the packet that is computed and verified by it (for example, rxkad
uses this field for a cryptographic header checksum). Rx itself
makes no use of the checksum field.

The status field allows for additional user flags to be transported
with each packet. These have no significance to the protocol itself.
These flags are associated with a call rather than an individual
packet.

The security field specifies the type of security in use on this
connection. These values don't have a defined mapping in the Rx
protocol but rather are mapped to specific Rx security types by
the application using Rx.

An Rx security protocol can use the checksum field as described
above, and can also modify the packet payload in any way, for
instance by encrypting the contents or adding headers or trailers
specific to the security protocol (although the end result must
be a properly sized packet that Rx will be able to transmit.)

The "Flags" field consists of a number of single-bit flags with
meanings as follows. The actual bit values are defined below,
in the ``Protocol Constants'' section.

    * CLIENT-INITIATED
        This packet originated from an Rx client (as opposed
        to server). To avoid packet loops, a server should
        always clear the CLIENT-INITIATED flag on any packets
        it sends, and discard incoming packets without the
        CLIENT-INITIATED flag.

    * REQUEST-ACK
        Sender is requesting acknowledgement of this packet,
        via an Ack packet response.

    * LAST-PACKET
        This packet is the last packet in this call from the
        sender.

        NOTE: some older Rx implementations, which do not
        support the trailing packet size fields in Rx Ack
        packets, use the LAST-PACKET flag for computing the
        MTU. In particular, when a DATA packet with the
        REQUEST-ACK flag but without the LAST-PACKET flag
        is received, the MTU is adjusted down to the size
        of that packet.

    * MORE-PACKETS
        More packets are going to be following this one. This
        flag is set on all but the last packet by the sender
        transmitting a list of packets at once, for possible
        optimization at the receiver end.

    * SLOW-START-OK
        In an ack packet, indicates that the sender of this
        packet supports the slow-start mechanism, described
        below under ``Flow Control''.

    * JUMBO-PACKET
        In a data packet, indicates that this packet is part
        of a jumbogram, and is not the last one. See the
        ``Jumbograms'' section below for more details.

Packet Types
============

The "Type" field indicates the contents of this packet. Actual
values are specified in the ``Protocol Constants'' section.
This section describes the simpler packet types, and subsequent
sections cover more complex packet types in more detail.

Certain packet types are connection-only requests (that is, they
are not associated with an RPC call). A connection-only request
is indicated by a zero call number. Valid packet types in a
connection-only context are Abort, Challenge, Response, Debug,
Version, and the parameter exchange packet types. All other
packet types can only be used in the context of a call. Abort
can be used in both a connection and a call context.

The payload of the packet following the header depends on the
type field, as follows:

    * DATA type (Standard data packet)

        The payload of a data packet is simply the Rx payload,
        corresponding to the sequence number and call specified
        in the header. The actual data that is transmitted in
        Rx data packets is described below.

        The receipt of a data packet by a client implicitly
        acknowledges that the server has received and processed
        all the packets that have been transmitted to it as
        part of this call.

    * ACK type (Acknowledgement of received data)

        An acknowledgement packet provides information about
        which packets were or were not received by the peer,
        and other useful parameters. The semantics of these
        packets are described below in the ``Call Layer''
        section.

    * BUSY type (Busy response)

        When a client tries to start a new call on a channel
        which the server still considers active, a busy response
        is returned. The call and channel number in the packet
        header indicate which call is being rejected. This packet
        type has no payload associated with it.

    * ABORT type (Abort packet)

        Indicates that the relevant connection or call (if the
        call number field is non-zero) has encountered an error
        and has been terminated. The payload of the packet has
        a network-byte-order 32-bit user error code.

    * ACKALL type (Acknowledgement of all packets)

        An acknowledge-all packet indicates the obvious: the peer
        wants to acknowledge the receipt of all packets sent to
        it. This could be used, for example, when a connection
        is being closed and the client wants to ensure that no
        retransmissions are attempted after it exits.

        There is no payload associated with an acknowledge-all
        packet.

    * CHALLENGE, RESPONSE types (Challenge request/response)

        The payload of the packet is security-layer-specific
        data, and is used to authenticate an Rx connection.

        Perhaps this should include a reference to some spec
        on rxkad (or rxkad should just be added to this spec.)

    * DEBUG type (Debug packet)

        Rx supports an optional debugging interface; see the
        ``Debugging'' section below for more details.

    * PARAMS types (Parameter exchange)

        These types were assigned in AFS 3.2 but never used for
        anything, and therefore have no protocol significance
        at this time.

    * VERSION type (Get AFS version)

        If a server receives a packet with a type value of 13, and
        the client-initiated flag set, it should respond with a
        65-byte payload containing a string that identifies the
        version of AFS software it is running. The response should
        not have the client-initiated flag set.

        Nothing should respond to a version packet without the
        client-initiated flag, to avoid infinite packet loops.

Call Layer
==========

    The call layer provides a reliable data transport over an
    Rx channel, and is used by the RPC layer to make Rx calls.
    One of the most important pieces of the call layer is the
    Rx acknowledgement packet. The acknowledgement packet is
    used by Rx to determine when retransmissions are needed,
    as well as determining the proper transmission / receiving
    parameters to use (such as the transmit window size and
    jumbogram length, described in more detail below).

    A new call is established by the client simply sending a
    data packet to the server on an available channel. Either
    side can indicate that they have no more data to send by
    setting the LAST-PACKET flag in their last Rx packet. The
    call remains open until the upper layer informs Rx that it
    is done with the call. (The upper layer in this case would
    most likely be the Rx RPC layer.)

    The structure of an Rx acknowledgement packet is described
    in the Packet Formats section. We will refer to particular
    fields of the acknowledgement packet here by names.

    The <Buffer Space> field specifies the number of packets that
    the sender of the acknowledgement is willing to provide for
    receiving packets for this call. The sender, presumably,
    should not send packets beyond the number specified here,
    without receiving further acknowledgement allowing it.

    The <Max Skew> field indicates the maximum packet skew that
    the sender of this packet has seen for this call. If a
    packet is received N packets later than expected (based
    on the packet's serial number, i.e. if the last received
    packet's serial number is N higher than this packet's),
    then it is defined to have a skew of N. This can be used
    to avoid retransmission because of packet reordering.
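One way a receiver might track skew is sketched below in Python. This is an assumption-laden illustration: the class and attribute names are invented, and the sketch uses the highest serial number seen so far as a stand-in for "the last received packet's serial number".

```python
class SkewTracker:
    """Tracks the maximum packet skew observed for one call (hypothetical helper)."""

    def __init__(self) -> None:
        # Serial numbers start at 1, so 0 means "nothing received yet".
        self.highest_serial = 0
        self.max_skew = 0

    def observe(self, serial: int) -> int:
        """Record a received packet's serial number and return its skew."""
        # A packet that arrives after one with a serial number N higher
        # has a skew of N; in-order packets have a skew of 0.
        skew = max(0, self.highest_serial - serial)
        self.highest_serial = max(self.highest_serial, serial)
        self.max_skew = max(self.max_skew, skew)
        return skew
```

If packets arrive with serial numbers 1, 2, 5, 3, the last one has a skew of 2, and `max_skew` would be the value reported in the <Max Skew> field.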

    The <First Sequence> number specifies the sequence number of
    the first packet that is being explicitly acknowledged (either
    positively or negatively) by this packet. All packets with
    sequence numbers smaller than this are implicitly acknowledged.

    The <Reserved> field, previously used to indicate the previous
    received packet, is no longer used. It should be set to zero
    by the sender and not interpreted by the receiver.

    The <Serial Number> field indicates the serial number of the
    packet which has triggered this acknowledgement, or zero if there
    is no such packet (i.e. the ack packet was delayed and should not
    be used for round-trip time computation). The receiver should
    note that any transmitted packets with a serial number less than
    this, which are not acknowledged by this packet, are likely lost
    or reordered. Thus, these packets should be retransmitted, after
    a possible delay to allow for packet reordering (as measured by
    packet skew).

    The trailing fields after the variable-length acknowledgements
    section are not always 32-bit aligned with respect to the packet,
    and aren't always present. (Their presence depends on the Rx
    version of the peer.) The maximum and recommended packet sizes
    are, respectively, the largest possible packet size that the peer
    is willing to accept from us, and the size of the packet they
    would prefer to receive. In absence of these fields, it should
    be assumed that the maximum allowed packet size is 1444 bytes.

    The receive window size indicates the size of the ACK sender's
    receive window, in packets. Its use is described below in
    the "Flow Control" section. If this field is absent, the
    implementation must assume a maximum window size of 15 packets;
    older implementations that do not support this trailing field
    only allow for a window of 15 packets.

    The "Max Packets per Jumbogram" field indicates how many packets
    the ACK sender is willing to receive in a jumbogram (also
    described below). All packets in a jumbogram are always of the
    same size (except the last one), regardless of the maximum and
    recommended packet sizes described above.

    The <Reason> field specifies a particular type of an ack packet.
    Valid reason codes are specified in the ``Packet Formats and
    Protocol Constants'' section; their meanings are as follows:

        REQUESTED
            Acknowledgement was requested. The peer received
            a packet from us with the acknowledgement-requested
            flag set, and is acknowledging it.

        DUPLICATE
            A duplicate packet was received. The duplicate
            packet's serial number is in the <Serial> field.

        OUT-OF-SEQUENCE
            A packet was received out of sequence. The serial
            number of said packet is in the <Serial> field.

        WINDOW-EXCEEDED
            A packet was received but exceeded the current
            receive window, and was dropped.

        NO-SPACE
            A packet was received, but no buffer space was
            available and therefore it was dropped.

        PING
            This is a keep-alive packet, used to verify that
            the peer is still alive. If the REQUEST-ACK flag
            in the Rx packet is set, the recipient of this
            packet should reply with a PING-RESPONSE packet.

        PING-RESPONSE
            This is a response to a keep-alive ack (ping).

        DELAYED
            A delayed acknowledgement, usually because a certain
            amount of time has passed since the receipt of the
            last packet and there are outstanding unacknowledged
            packets. Should not be used for RTT computation.

        OTHER
            Un-delayed general acknowledgement, which does not
            fall in any of the above categories.

    A peer should never delay the transmission of an ack packet
    in response to a received packet unless it sets the delayed
    ack type field. This is because ack packets (except for
    delayed ones) are used for RTT computation by Rx.

    All acknowledgement packets should have the REQUEST-ACK
    flag in the Rx header turned off, except for PING type
    ack packets.

    The <Ack Count> field specifies the number of bytes following
    in the acknowledgements section. Each of those bytes indicates
    the acknowledgement status corresponding to a sequence number
    between firstSequence and firstSequence+ackCount-1 inclusive.
    There can be up to 255 bytes in the acknowledgements section.
    Typically the ack count is the receive window size of the
    ack packet sender, and the individual packet status bytes
    correspond to the packets in the current receive window.
    The values in each of those bytes can be as follows:

        0   Explicit negative acknowledgement: packet with the
            corresponding sequence number has not been received
            or has been dropped.
        1   Explicit acknowledgement: packet with the corresponding
            sequence number has been received but not processed by
            the application yet.

    It's important to note the distinction between packets with
    sequence numbers before firstSequence, between firstSequence
    and firstSequence+ackCount-1, and those with sequence numbers
    of at least firstSequence+ackCount. Those in the first category
    have been passed up to the application level and the sender
    (recipient of this ack) can recycle packets with such sequence
    numbers.

    Packets in the second category are individually acknowledged
    in the acknowledgements section, either as being queued for
    the application or not received. The recipient of the ack
    should keep all packets with sequence numbers in this range,
    but avoid retransmitting the positively acknowledged ones.
    Negatively acknowledged packets should be retransmitted.
    A more detailed explanation of the retransmit strategy is
    given below.

    Packets in the third category are not acknowledged at all,
    and the recipient of the ack should assume no knowledge
    of their state. Since the Rx receive window should not
    exceed the size of an ack packet, the sender shouldn't
    have transmitted any packets in this category anyway.
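The three categories above can be illustrated with a short Python sketch. It assumes the acknowledgements section has already been extracted from the packet as a bytes object; the function names and the returned labels are invented for this example and do not come from any Rx implementation.

```python
from typing import List

ACK_NEGATIVE = 0  # not received, or dropped
ACK_POSITIVE = 1  # received, but not yet processed by the application


def classify(seq: int, first_sequence: int, acks: bytes) -> str:
    """Classify a sequence number against one ack packet (hypothetical helper)."""
    if seq < first_sequence:
        # First category: passed up to the application; safe to recycle.
        return "delivered"
    offset = seq - first_sequence
    if offset < len(acks):
        # Second category: individually acknowledged in the ack section.
        return "queued" if acks[offset] == ACK_POSITIVE else "missing"
    # Third category: the ack says nothing about this packet.
    return "unknown"


def to_retransmit(first_sequence: int, acks: bytes) -> List[int]:
    """Sequence numbers that this ack negatively acknowledges."""
    return [first_sequence + i
            for i, status in enumerate(acks)
            if status == ACK_NEGATIVE]
```

With firstSequence = 5 and status bytes 1, 0, 1, packet 4 is delivered, packets 5 and 7 are queued at the receiver, packet 6 is a retransmission candidate, and nothing is known about packet 8.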

    * Round-trip time computation

        To determine when packet retransmission is necessary, Rx
        computes some statistics about the round-trip time between
        the two hosts: exponentially-decaying averages of the
        round-trip time and the standard deviation thereof. Each
        acknowledgement packet which mentions a specific packet in
        the <Serial> field and is not delayed is used to update the
        round-trip statistics. First, the round-trip time for this
        packet (R) is computed as the difference between the arrival
        time of the ack packet and the time we transmitted the
        packet with the serial number specified in <Serial>.

        Next, the round-trip time average and standard deviation
        values are updated. For instance, this algorithm could
        be used:

            RTTdev = RTTdev * (3/4) + |RTTavg - R| / 4
            RTTavg = RTTavg * (7/8) + R / 8
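The two update equations above translate directly into Python. This sketch assumes all times are in seconds and that the deviation is updated from the old average, following the order in which the equations are listed; how the statistics are initialized before the first sample is not specified here and is left to the caller.

```python
from typing import Tuple


def update_rtt(rtt_avg: float, rtt_dev: float, r: float) -> Tuple[float, float]:
    """Apply one round-trip sample R and return the new (RTTavg, RTTdev)."""
    # RTTdev = RTTdev * (3/4) + |RTTavg - R| / 4, using the old average.
    new_dev = rtt_dev * 3 / 4 + abs(rtt_avg - r) / 4
    # RTTavg = RTTavg * (7/8) + R / 8
    new_avg = rtt_avg * 7 / 8 + r / 8
    return new_avg, new_dev
```

Starting from an average of 100 ms and a deviation of 20 ms, a single 180 ms sample moves the average to 110 ms and the deviation to 35 ms.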

    * Packet retransmission

        In order to support reliable data transport, Rx must retransmit
        packets that are lost in the network. This must not be done
        too early, otherwise we might retransmit a packet whose first
        copy is still in transit, thereby wasting bandwidth.

        Rx computes a retransmit timeout value T, and retransmits any
        packet which hasn't been positively acknowledged since last
        transmission for at least T seconds. This timeout could be
        computed as follows from the round-trip statistics above:

            T = RTTavg + 4 * RTTdev + 0.350

        This allows the packet to be up to 4 deviations late and still
        not be retransmitted. The 350 msec fudge factor is used to
        compensate for bursty networks, though it is likely becoming
        less relevant (and accurate) with time.

        A more clever algorithm could take into account the maximum
        packet skew rate, and improve the retransmission strategy to
        take into account the likelihood that a given packet has
        been reordered, and give it extra time before retransmission.
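The basic timeout rule (without the skew-aware refinement) might look like this in Python. All times are assumed to be in seconds, and the function and parameter names are illustrative only.

```python
def retransmit_timeout(rtt_avg: float, rtt_dev: float, fudge: float = 0.350) -> float:
    """T = RTTavg + 4 * RTTdev + 0.350 (all values in seconds)."""
    return rtt_avg + 4 * rtt_dev + fudge


def needs_retransmit(last_sent: float, now: float, acked: bool, timeout: float) -> bool:
    """True if a packet has gone unacknowledged for at least `timeout` seconds."""
    return (not acked) and (now - last_sent) >= timeout
```

With an average of 110 ms and a deviation of 35 ms, T comes to 0.110 + 0.140 + 0.350 = 0.600 seconds.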

    * Keepalive and Timeout

        The upper layer (either the Rx RPC layer or the application)
        has to specify a timeout, T, to the call layer. If the peer
        is not heard from within T seconds, the call layer declares
        the call to be dead and propagates the error to the upper
        layer.

        In order to determine whether the peer is still alive or not,
        keepalive requests are used. These take the form of ack
        packets with the PING and PING-RESPONSE reasons. When the
        client has not received any response from the server, either
        to the original request or the keepalive requests, in T
        seconds, the call times out.

        The following strategy may be used to determine when to send
        keepalive requests:

            Compute a keepalive timeout, KT = T/6

            If the call was initiated KT seconds ago, or KT
            seconds have passed since the last keepalive
            request transmission, send a keepalive packet.

        This strategy limits the number of transmitted keepalive
        packets to a fixed number in the case of a dead server,
        and proportional to the real timeout in case of a slow
        server. It also allows up to 5 keepalives to be dropped
        before the server is erroneously declared dead.
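The keepalive strategy above can be sketched in Python as two predicates. The names are made up for illustration, times are in seconds, and `last_keepalive` is `None` until the first keepalive has been sent; none of this is prescribed by the protocol.

```python
from typing import Optional


def should_send_keepalive(now: float, call_start: float,
                          last_keepalive: Optional[float], timeout: float) -> bool:
    """True once KT = T/6 seconds have passed since the call began or since
    the last keepalive was sent."""
    kt = timeout / 6
    reference = call_start if last_keepalive is None else last_keepalive
    return now - reference >= kt


def call_timed_out(now: float, last_heard: float, timeout: float) -> bool:
    """True if the peer has not been heard from for T seconds."""
    return now - last_heard >= timeout
```

With T = 60 seconds, a keepalive goes out every 10 seconds, so 5 of them can be lost before the sixth interval elapses and the call is declared dead.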

    * Flow Control

        Every Rx client or server has associated with each Rx call a
        receive and transmit window. These windows indicate the number
        of packets that haven't been fully acknowledged (that is, not
        read by the peer's application) that an Rx sender can have
        outstanding at any time. A sender's transmit window may
        never be greater than its peer's receive window for that call.
        The receive windows are exchanged via the "Receive Window Size"
        parameter in an Ack packet.

        Rx ``sliding windows'' are similar to those used by TCP, except
        they measure packets rather than bytes. Also, in TCP the window
        effectively applies to bytes in flight between the two peers,
        whereas in Rx the window applies to packets between the user
        applications. For example, a transmit window of 8 on a certain
        Rx connection means that at most 8 packets can be transmitted
        and not yet read by the peer's application at any time. The
        sequence number of the first packet that hasn't been read by
        the application is indicated by the First Sequence field of
        an Ack packet.

        The selection of initial window sizes isn't strictly defined
        by the Rx protocol, but here are a few things that one might
        want to consider when choosing initial windows:

            * A useful strategy can be to advertise a small receive
              window until the application starts reading data, and
              advertise a larger window afterwards.

            * The transmit window should be initially a conservative
              small value. Once an Ack packet is received, the peer's
              advertised receive window can be used to choose a better
              transmit window.

        Rx uses the slow start, congestion avoidance, and fast recovery
        algorithms[6]. The algorithms are modified to work in the context
        of Rx packet-based transmission windows, and are described below.

        These algorithms require two additional variables to be maintained
        for each active Rx call: a congestion window, cwind, and a slow
        start threshold, ssthresh.

        Define a "negative ack" as an Ack packet that contains a negative
        acknowledgement followed by a positive one. Similarly, define a
        "positive ack" to be any Ack that is not negative. Upon receiving
        three negative acks for a call in a row since the last congestion
        avoidance attempt (if any), the Rx protocol enters congestion
        avoidance for that Rx call.

        * Slow start, congestion avoidance, and fast recovery algorithms

            First, the congestion window, cwind, is initialized to 1.
            The number of unread transmitted packets is now limited not
            only by the transmission window, but also by the congestion
            window. The latter limit is a little different: Rx may
            send up to cwind packets (by sequence number) past the last
            contiguous positively acknowledged packet. For example,
            if an Ack packet indicates that packets 1, 2 and 8 were
            received, and cwind is 2, Rx may transmit packets 3 and 4.
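The congestion-window limit in the example above can be worked through with a small Python function. It is a sketch under simplifying assumptions: the set of received sequence numbers is taken as given, the separate transmit-window limit is ignored, and the function name is invented.

```python
from typing import List, Set


def sendable(received: Set[int], cwind: int) -> List[int]:
    """Sequence numbers that may be sent under the congestion window alone."""
    # Find the last contiguous positively acknowledged packet;
    # sequence numbers start at 1.
    last_contiguous = 0
    while last_contiguous + 1 in received:
        last_contiguous += 1
    # Up to cwind packets past that point may be sent; packets already
    # received (such as 8 in the example) need not be sent again.
    return [seq
            for seq in range(last_contiguous + 1, last_contiguous + cwind + 1)
            if seq not in received]
```

Calling `sendable({1, 2, 8}, 2)` reproduces the example from the text and allows packets 3 and 4.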

            When congestion occurs (indicated by a negative ack or a
            packet retransmission timeout), Rx enters congestion avoidance
            and fast recovery. The slow-start threshold, ssthresh, is
            set to half of the effective transmission window (minimum of
            cwind and transmit window), but no less than 2 packets.

            If triggered by a negative ack, any negatively acknowledged
            packets should be retransmitted as soon as possible (i.e.
            window-permitting).

            If triggered by a retransmission timeout, the congestion
            window is reset to a single packet.

            When in fast-recovery mode, every additional negative ack
            packet received causes cwind to be increased by one packet.
            A positive ack packet causes cwind to be set to ssthresh,
            and terminates fast recovery. At this point we are back
            to congestion avoidance, since the cwind is half the original
            transmission window.

            When packet acknowledgements are received, the congestion
            window should be increased. If cwind is less than ssthresh,
            cwind should be increased by 1 for each newly acknowledged
            packet. If cwind is at least ssthresh, cwind is increased
            by 1 for each newly received Ack packet.
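The window adjustments described above can be collected into a small state object. The following Python sketch is one possible reading, not a reference implementation: it enters fast recovery after three negative acks in a row (as defined under Flow Control), starts with ssthresh equal to the transmit window (an assumption, since the initial value is not given), and leaves retransmission itself to the caller. All names are invented.

```python
class CongestionState:
    """Per-call cwind/ssthresh bookkeeping (illustrative sketch only)."""

    def __init__(self, tx_window: int) -> None:
        self.tx_window = tx_window
        self.cwind = 1                 # congestion window starts at 1
        self.ssthresh = tx_window      # assumed initial threshold
        self.in_fast_recovery = False
        self.negative_acks_in_row = 0

    def _set_threshold(self) -> None:
        # Half of the effective window, but no less than 2 packets.
        self.ssthresh = max(2, min(self.cwind, self.tx_window) // 2)

    def on_timeout(self) -> None:
        """A retransmission timeout resets cwind to a single packet."""
        self._set_threshold()
        self.cwind = 1
        self.in_fast_recovery = False
        self.negative_acks_in_row = 0

    def on_negative_ack(self) -> None:
        if self.in_fast_recovery:
            # Each additional negative ack grows cwind by one packet.
            self.cwind += 1
            return
        self.negative_acks_in_row += 1
        if self.negative_acks_in_row >= 3:
            self._set_threshold()
            self.in_fast_recovery = True
            self.negative_acks_in_row = 0

    def on_positive_ack(self, newly_acked: int) -> None:
        self.negative_acks_in_row = 0
        if self.in_fast_recovery:
            # A positive ack ends fast recovery at cwind = ssthresh.
            self.cwind = self.ssthresh
            self.in_fast_recovery = False
        elif self.cwind < self.ssthresh:
            # Slow start: grow by 1 per newly acknowledged packet.
            self.cwind += newly_acked
        else:
            # Congestion avoidance: grow by 1 per Ack packet.
            self.cwind += 1
```

A caller would invoke one of the three `on_*` methods per received Ack packet or expired retransmit timer, and combine `cwind` with the transmit window when deciding how many packets to send.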

        The size of the receive window should not grow past the size of
        an Rx ack packet (which can acknowledge up to 255 packets at a
        time.)

Debugging
=========

Rx provides for an optional debugging interface, using the Debug AFS
packet type, allowing remote Rx clients to query an Rx server for
some Rx protocol statistics. Not all implementations are required
to implement this interface. Some parts of this interface may also
be specific to a particular implementation of Rx. In order to prevent
packet loops, a server should only reply to debug packets with the
client-initiated flag set.

The payload of a debug request packet is always the same; both of
the 32-bit quantities are in network byte order:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Debug Type                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Debug Index                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The debug type indicates the kind of debug information being sent
or requested, and determines the format of the rest of the packet.
The debug index allows some debug types to export array-like data,
indexed by this field. The following debug types are defined for
the Transarc implementation:

    0x01    Retrieve basic connection statistics
    0x02    Get information about some connections
    0x03    Get information about all connections
    0x04    Get all Rx stats
    0x05    Get all peers of this server

The index field in the debug packet indicates which element of the
debug information the client wants to access, in cases where there
are multiple entries in question.
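The debug request payload described above is just two 32-bit integers in network byte order, which Python's standard `struct` module can encode directly. The helper names below are invented for this sketch, and it covers only the 8-byte payload, not the Rx header that precedes it.

```python
import struct
from typing import Tuple

# Two unsigned 32-bit fields, network (big-endian) byte order.
DEBUG_REQUEST = struct.Struct("!II")


def pack_debug_request(debug_type: int, index: int) -> bytes:
    """Encode the Debug Type and Debug Index fields."""
    return DEBUG_REQUEST.pack(debug_type, index)


def unpack_debug_request(payload: bytes) -> Tuple[int, int]:
    """Decode (debug type, debug index) from the start of a payload."""
    return DEBUG_REQUEST.unpack_from(payload, 0)
```

A request for basic connection statistics (type 0x01) at index 0 encodes to the bytes 00 00 00 01 00 00 00 00.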

The responses to each of those debug queries contain the following
information:

1. Retrieve basic connection stats

    An array of general statistics about packet allocation,
    server performance, and so on. The first octet in this
    response represents the debug protocol version being used
    by the server. See RX_DEBUGI_VERSION* in rx/rx.h.

2, 3. Get information about connections

    Both of these calls return a struct rx_debugConn (see
    rx/rx.h), indexed by the "index" field.

    The first version of the debug call (type 2) only retrieves
    information about connections which are deemed interesting,
    that is, connections which are active, or about to be
    reaped.

    The end of the list is signaled by a response where the
    connection ID value is 0xFFFFFFFF.

4. Get Rx stats

    This call returns a struct rx_stats to the client in network
    byte order, containing various statistics about the state of
    Rx on the server (see rx/rx.h).

5. Get all Rx peers

    Similar to the connection request above (2, 3) this call
    returns all the Rx peers of the server (in a network-byte-order
    struct rx_debugPeer), indexed by the index field in the request.
    End of list is indicated by a host value of 0xFFFFFFFF. (These
    are the first 4 octets.)

In response to unknown requests, the server returns 0xFFFFFFF8 in the
debug type field.

    XXX The response interface should probably be fixed
        to include a fixed header that indicates whether
        the request was successfully completed.
689
690Jumbograms
691==========
692
693To be able to transmit more data in a single packet, Rx supports
694``jumbograms'', which are single UDP datagrams containing multiple
695sequential Rx DATA packets. In a jumbogram, all packets except the
696last one must be of a fixed maximal size (1412 bytes). Because all
697the packets in the jumbogram are sequential, only one full header
698is needed. Here is what a jumbogram could look like:
699
700 +-----------+---------------+--------------+---------------+
701 | Rx header | 1412 byte pkt | Short header | 1412 byte pkt | ->
702 +-----------+---------------+--------------+---------------+
703
704 +--------------+- -+-----------------------+
705 -> | Short header | ... | <= 1412 byte last pkt |
706 +--------------+- -+-----------------------+
707
708Every Rx packet in a jumbogram except the first one must be preceeded
709by the short Rx header, and all packets except the last one must have
710the Jumbogram Rx flag set in their respective headers. The number of
711packets in a jumbogram may not exceed the peer's advertised Max Packets
712Per Jumbogram value in the Ack packet.
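As a rough sketch of these rules, the helper below splits a byte string
into DATA payloads and groups them into jumbograms.  The function name
and its return shape are made up for illustration; only the 1412-byte
size, the flag value 0x20 (JUMBO-PACKET in the Flags table) and the
per-jumbogram limit come from this document.  Other flags, headers and
checksums are deliberately left out.

```python
JUMBO_PAYLOAD_SIZE = 1412  # fixed size of every packet except the last
JUMBO_PACKET = 0x20        # Jumbogram flag from the Flags table


def plan_jumbograms(data, max_packets=1):
    """Group data into jumbograms; returns lists of (flags, payload).

    max_packets is the peer's advertised Max Packets Per Jumbogram
    value; the default of 1 means no jumbograms are formed.
    """
    if max_packets < 1:
        raise ValueError("max_packets must be at least 1")
    pieces = [data[i:i + JUMBO_PAYLOAD_SIZE]
              for i in range(0, len(data), JUMBO_PAYLOAD_SIZE)]
    jumbograms = []
    for start in range(0, len(pieces), max_packets):
        group = pieces[start:start + max_packets]
        planned = []
        for position, payload in enumerate(group):
            is_last = position == len(group) - 1
            # Every packet except the last one carries the jumbo flag.
            planned.append((0 if is_last else JUMBO_PACKET, payload))
        jumbograms.append(planned)
    return jumbograms
```

Because the data is cut into 1412-byte pieces first, only the final
piece can be short, and it always ends up last in its jumbogram.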

The maximum number of packets per jumbogram should be assumed to be 1
(i.e., no jumbograms) unless explicitly specified otherwise by an Ack
packet.  If an Ack packet is received without the Max Packets per
Jumbogram field, it might indicate that the peer is now running a
version of Rx that does not support jumbograms, and therefore no
jumbograms should be sent until they are explicitly enabled again.
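One way to keep track of this per peer is sketched below.  The class
and method names are hypothetical, and treating an advertised value
below 1 the same as a missing field is an assumption of this sketch,
not something the protocol specifies.

```python
class PeerJumboLimit:
    """Tracks how many packets may be placed in a jumbogram for a peer."""

    def __init__(self):
        # Until an Ack packet says otherwise, assume no jumbograms.
        self.max_packets = 1

    def on_ack(self, advertised):
        """Update the limit from an Ack packet.

        advertised is the Max Packets per Jumbogram value, or None when
        the Ack packet did not carry that field.
        """
        if advertised is None or advertised < 1:
            # Field absent (or unusable): fall back to no jumbograms.
            self.max_packets = 1
        else:
            self.max_packets = advertised

    def allows(self, packet_count):
        """True if a datagram with packet_count packets may be sent."""
        return 1 <= packet_count <= self.max_packets
```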

The short header in a jumbogram has the following makeup:

     0                   1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     Flags     |   Reserved    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           Checksum            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

All the packets in the jumbogram have the same Rx header fields
(from the full Rx header) except for Flags, Checksum, Sequence,
and Serial.  The Flags and Checksum fields for subsequent packets
are taken from the short header preceding that packet in the
jumbogram.  The sequence and serial numbers are assumed to be
consecutive, and are incremented by 1 for each packet, starting
from the first packet in the jumbogram (i.e., the full Rx header).
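The receiving side can be sketched as follows: read the full header,
then walk the datagram, taking Flags and Checksum from each short
header and deriving the sequence and serial numbers.  The function
name and the dict layout are illustrative only.  The sketch assumes
the datagram is a DATA packet, uses the 28-octet full header layout
given under Packet Formats, and wraps the counters at 32 bits, which
is an assumption.

```python
import struct

FULL_HEADER = struct.Struct("!IIIIIBBBBHH")  # 28-octet Rx header
SHORT_HEADER = struct.Struct("!BBH")         # flags, reserved, checksum
JUMBO_PAYLOAD_SIZE = 1412
JUMBO_PACKET = 0x20


def split_jumbogram(datagram):
    """Return one dict per Rx DATA packet carried in the datagram."""
    (_epoch, _cid, _call, seq, serial, _ptype, flags,
     _status, _security, checksum, _service) = FULL_HEADER.unpack_from(
         datagram, 0)
    offset = FULL_HEADER.size
    packets = []
    # A set jumbo flag means another packet follows this one.
    while flags & JUMBO_PACKET:
        end = offset + JUMBO_PAYLOAD_SIZE
        if len(datagram) < end + SHORT_HEADER.size:
            raise ValueError("jumbogram truncated before a short header")
        packets.append({"seq": seq, "serial": serial, "flags": flags,
                        "checksum": checksum,
                        "payload": datagram[offset:end]})
        # Flags and checksum of the next packet come from its short header.
        flags, _reserved, checksum = SHORT_HEADER.unpack_from(datagram, end)
        offset = end + SHORT_HEADER.size
        seq = (seq + 1) & 0xFFFFFFFF
        serial = (serial + 1) & 0xFFFFFFFF
    payload = datagram[offset:]
    if packets and len(payload) > JUMBO_PAYLOAD_SIZE:
        raise ValueError("last packet of a jumbogram exceeds 1412 bytes")
    packets.append({"seq": seq, "serial": serial, "flags": flags,
                    "checksum": checksum, "payload": payload})
    return packets
```

A datagram whose first header lacks the jumbo flag simply yields a
single packet.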

Retransmitted packets should not be sent in a jumbogram.

RPC Layer
=========

This section discusses how an RPC call is made using the Rx protocol.
There are two common ``types'' of Rx calls: simple and streaming.
These mostly reflect a difference in the upper-level API rather than
in the Rx protocol.  A simple Rx call has a fixed number of input
variables and a fixed number of output variables.  A streaming Rx
call, in addition to the above, allows the user to send and receive
arbitrary amounts of data (whose length should be specified as a
fixed-length argument).

In either case, an Rx call consists of two basic stages: the client
sending the data to the server, and the server sending the response
back to the client.  No data can be sent by the client in the
same call after the server has started sending its response.

Each remote function call associated with a particular Rx service
(identified by the IP-port-serviceId triplet, as mentioned above)
is assigned a 32-bit integer opcode number.  To make a simple Rx
call, the caller must transmit the opcode number followed by the
expected arguments for that call over an Rx channel using XDR
encoding.  The callee uses XDR to unmarshal the opcode and input
arguments, performs a function call corresponding to that opcode
and arguments, and then uses XDR to encode the return values back
to the caller.  The caller then uses XDR to receive the output
variables.
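To make the marshalling concrete, here is a minimal sketch of encoding
the request of a simple call by hand.  Only two XDR types are shown
(unsigned integer and string); the helper names, the opcode 42 and its
argument list are invented for illustration and do not correspond to
any real Rx service.

```python
import struct


def xdr_uint(value):
    """XDR unsigned integer: 4 octets, big-endian."""
    return struct.pack("!I", value)


def xdr_string(text):
    """XDR string: 4-octet length, the bytes, zero padding to 4 octets."""
    raw = text.encode("ascii")
    padding = (-len(raw)) % 4
    return xdr_uint(len(raw)) + raw + b"\x00" * padding


def encode_call(opcode, *xdr_args):
    """Caller side: opcode first, then the already-encoded arguments."""
    return xdr_uint(opcode) + b"".join(xdr_args)


def decode_opcode(request):
    """Callee side: peel off the opcode, return it with the remainder."""
    (opcode,) = struct.unpack_from("!I", request, 0)
    return opcode, request[4:]
```

For example, encode_call(42, xdr_uint(7), xdr_string("vol")) produces
16 octets: the opcode, the integer 7, the length 3, and "vol" padded
with one zero octet.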

For streaming calls which send data from the caller to the callee,
the convention is to include the length of the data to be sent as
one of the fixed-length arguments, and send the variable-length
data immediately after the fixed-length portion.  For streaming
calls which receive data, the convention is for the callee to first
reply with a fixed-length field specifying the number of bytes it's
about to send, and then send those bytes.  Upon completion of the
streaming part of the call, the output arguments are sent back to
the caller in fixed-length XDR form, as with simple calls.
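The two conventions might look like this on the wire.  This sketch
assumes the length is a single 32-bit XDR unsigned integer and that
the streamed bytes are sent unpadded; neither detail is fixed by this
document, and the function names and opcode are hypothetical.

```python
import struct


def encode_store_call(opcode, data):
    """Caller to callee: opcode, length argument, then the raw data."""
    return struct.pack("!II", opcode, len(data)) + data


def decode_fetch_reply(reply):
    """Callee to caller: length field, that many bytes, then the rest.

    Returns (data, remainder); the remainder holds the XDR-encoded
    output arguments that follow the streamed bytes.
    """
    (length,) = struct.unpack_from("!I", reply, 0)
    data = reply[4:4 + length]
    if len(data) != length:
        raise ValueError("reply ended before all streamed bytes arrived")
    return data, reply[4 + length:]
```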

Packet Formats and Protocol Constants
=====================================

  * Rx packet

    Every simple Rx packet has an Rx header, of the form below.
    All quantities are in network byte order.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |+|                      Connection Epoch                       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       Connection ID                       | * |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          Call Number                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        Sequence Number                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         Serial Number                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     Type      |     Flags     |    Status     |   Security    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           Checksum            |          Service ID           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Payload ....
    +-+-+-+-+-

    [*] The field marked with * is the Channel ID.  The last
        two bits of the connection ID are used to multiplex
        between 4 parallel calls.

    [+] The bit marked with + is used to indicate that only
        the connection ID should be used to identify this
        connection, and sender host/port should not be used.
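A minimal sketch of decoding this header, including the two marked
fields.  The function name and the dict keys are illustrative only.
Following the diagram, the + bit is taken to be the most significant
bit of the first 32-bit word and the Channel ID the two least
significant bits of the second.

```python
import struct

# Five 32-bit words, four octets, two 16-bit fields: 28 octets in all.
RX_HEADER = struct.Struct("!IIIIIBBBBHH")


def parse_rx_header(packet):
    """Decode the fixed Rx header; the payload is returned untouched."""
    (epoch, cid, call_number, sequence, serial, ptype, flags,
     status, security, checksum, service_id) = RX_HEADER.unpack_from(
         packet, 0)
    return {
        "epoch": epoch,
        # [+] top bit of the epoch word: identify by connection ID only
        "cid_only": bool(epoch & 0x80000000),
        # [*] low two bits select one of 4 parallel calls
        "connection_id": cid & 0xFFFFFFFC,
        "channel": cid & 0x3,
        "call_number": call_number,
        "sequence": sequence,
        "serial": serial,
        "type": ptype,
        "flags": flags,
        "status": status,
        "security": security,
        "checksum": checksum,
        "service_id": service_id,
        "payload": packet[RX_HEADER.size:],
    }
```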
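
    The values for the Flags field are defined as follows:

        0000 0001       CLIENT-INITIATED
        0000 0010       REQUEST-ACK
        0000 0100       LAST-PACKET
        0000 1000       MORE-PACKETS
        0001 0000       - Reserved -
        0010 0000       SLOW-START-OK
        0010 0000       JUMBO-PACKET

    SLOW-START-OK and JUMBO-PACKET share the same bit.  Judging by
    the comments in OpenAFS's rx/rx.h, the former is meaningful in
    Ack packets and the latter in DATA packets.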
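A small sketch of turning the Flags octet into names.  Since the value
0010 0000 is listed twice, the helper picks a name by packet type: it
assumes SLOW-START-OK applies to ACK packets (type 2) and JUMBO-PACKET
to DATA packets (type 1), and reports the raw bit for any other type.
The function name is hypothetical.

```python
PACKET_TYPE_DATA = 1
PACKET_TYPE_ACK = 2

# Bits with a single meaning, in table order.
FLAG_NAMES = {
    0x01: "CLIENT-INITIATED",
    0x02: "REQUEST-ACK",
    0x04: "LAST-PACKET",
    0x08: "MORE-PACKETS",
    0x10: "RESERVED",
}
SHARED_BIT = 0x20  # SLOW-START-OK or JUMBO-PACKET


def flag_names(flags, packet_type):
    """List the names of the flag bits set in one header."""
    names = [name for bit, name in FLAG_NAMES.items() if flags & bit]
    if flags & SHARED_BIT:
        if packet_type == PACKET_TYPE_ACK:
            names.append("SLOW-START-OK")
        elif packet_type == PACKET_TYPE_DATA:
            names.append("JUMBO-PACKET")
        else:
            names.append("0x20")  # meaning not defined for this type
    return names
```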

    Commonly, but not necessarily, the following value mappings
    for the Security field are used:

        0       No security or encryption
        1       bcrypt security, only used in AFS 2.0
        2       "krb4" rxkad
        3       "krb4" rxkad with encryption (sometimes)

    The following packet type values are defined:

        1       DATA            Standard data packet
        2       ACK             Acknowledgement of received data
        3       BUSY            Busy response
        4       ABORT           Abort packet
        5       ACKALL          Acknowledgement of all packets
        6       CHALLENGE       Challenge request
        7       RESPONSE        Challenge response
        8       DEBUG           Debug packet
        9       PARAMS          Exchange of parameters
        10      PARAMS          Exchange of parameters
        11      PARAMS          Exchange of parameters
        12      PARAMS          Exchange of parameters
        13      VERSION         Get AFS version

  * Rx acknowledgement packet

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         Buffer Space          |           Max Skew            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        First Sequence                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           Reserved                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            Serial                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    Reason     |   Ack Count   |    Acknowledgements ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ..

     ...         -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     ... Acks       |   Reserved    |           Reserved            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      Maximum Packet Size                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Recommended Packet Size                    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      Receive Window Size                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                   Max Packets per Jumbogram                   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Note that the trailing fields can have arbitrary alignment,
    determined by the number of individual acks in the packet.
    There are three reserved octets between the variable acks
    section and the start of the trailing fields; they also have
    no particular alignment.

    The valid values for the Reason code are:

        1       REQUESTED
        2       DUPLICATE
        3       OUT-OF-SEQUENCE
        4       WINDOW-EXCEEDED
        5       NO-SPACE
        6       PING
        7       PING-RESPONSE
        8       DELAYED
        9       OTHER
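The variable-length layout of the acknowledgement packet can be
decoded along these lines.  The function name and dict keys are
illustrative only.  Each trailing field is treated as optional and
reported as None when the payload ends early, on the assumption
(consistent with the Jumbograms section) that some peers omit them.
The individual ack octets are returned as raw values.

```python
import struct

# Buffer Space, Max Skew, First Sequence, Reserved, Serial, Reason,
# Ack Count: 18 octets, no padding.
ACK_FIXED = struct.Struct("!HHIIIBB")

REASONS = {1: "REQUESTED", 2: "DUPLICATE", 3: "OUT-OF-SEQUENCE",
           4: "WINDOW-EXCEEDED", 5: "NO-SPACE", 6: "PING",
           7: "PING-RESPONSE", 8: "DELAYED", 9: "OTHER"}

TRAILING_FIELDS = ("max_packet_size", "recommended_packet_size",
                   "receive_window_size", "max_packets_per_jumbogram")


def parse_ack(payload):
    """Decode the payload of an ACK packet (after the Rx header)."""
    (buffer_space, max_skew, first_sequence, _reserved, serial,
     reason, ack_count) = ACK_FIXED.unpack_from(payload, 0)
    acks_end = ACK_FIXED.size + ack_count
    if len(payload) < acks_end:
        raise ValueError("payload shorter than its Ack Count implies")
    ack = {
        "buffer_space": buffer_space,
        "max_skew": max_skew,
        "first_sequence": first_sequence,
        "serial": serial,
        "reason_code": reason,
        "reason": REASONS.get(reason, "UNKNOWN"),
        "acks": list(payload[ACK_FIXED.size:acks_end]),
    }
    # Three reserved octets separate the acks from the trailing fields,
    # so the trailing fields are not necessarily 4-octet aligned.
    offset = acks_end + 3
    for name in TRAILING_FIELDS:
        if len(payload) >= offset + 4:
            (ack[name],) = struct.unpack_from("!I", payload, offset)
        else:
            ack[name] = None
        offset += 4
    return ack
```

A caller could pass max_packets_per_jumbogram straight to whatever
tracks the peer's jumbogram limit, with None meaning the field was
absent.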

Acknowledgements
================

Jeffrey Hutzelman <jhutz@cmu.edu> reviewed an early draft of this
specification, and provided much appreciated feedback on technical
details as well as document structuring.

Love Hornquist-Astrand <lha@stacken.kth.se> made many corrections
to this specification, especially regarding backwards-compatibility
with older Rx implementations.

References
==========

    [1] /afs/sipb.mit.edu/contrib/doc/AFS/hijacking-afs.ps.gz

    [2] OpenAFS: src/rx/

    [3] /afs/sipb.mit.edu/contrib/doc/AFS/ps/rx-spec.ps

    [4] ftp://ftp.stacken.kth.se/pub/arla/prog-afs/shadow/doc/r.vdoc

    [5] ftp://ftp.stacken.kth.se/pub/arla/prog-afs/shadow/doc/rx.mss

    [6] http://web.mit.edu/rfc/rfc2001.txt

$Id: rx-spec,v 1.22 2002/10/20 06:46:00 kolya Exp $