<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc category="info" docName="sp-tcp-mapping-01">
<front>
<title abbrev="TCP mapping for SPs">
TCP Mapping for Scalability Protocols
</title>
<author fullname="Martin Sustrik" initials="M." role="editor"
surname="Sustrik">
<address>
<email>sustrik@250bpm.com</email>
</address>
</author>
<date month="March" year="2014" />
<area>Applications</area>
<workgroup>Internet Engineering Task Force</workgroup>
<keyword>TCP</keyword>
<keyword>SP</keyword>
<abstract>
<t>This document defines the TCP mapping for scalability protocols.
The main purpose of the mapping is to turn the stream of bytes
into a stream of messages. The mapping also performs some
checks during the connection establishment phase.</t>
</abstract>
</front>
<middle>
<section title = "Underlying protocol">
<t>This mapping should be layered directly on top of TCP.</t>
<t>There's no fixed TCP port to use for the communication. Instead, port
numbers are assigned to individual services by the user.</t>
</section>
<section title = "Connection initiation">
<t>As soon as the underlying TCP connection is established, both parties
MUST send the protocol header (described in detail below) immediately.
Both endpoints MUST then wait for the protocol header from the peer
before proceeding.</t>
<t>The goal of this design is to keep connection establishment as
fast as possible by avoiding any additional protocol handshakes,
i.e. network round-trips. Specifically, the protocol headers
can be bundled directly with the last packets of the TCP handshake
and thus have virtually zero performance impact.</t>
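<t>The ordering rule above (send first, then wait) can be sketched as
follows. This is a minimal illustration, not a reference implementation:
socket.socketpair() stands in for an established TCP connection, the
helper names are hypothetical, and the endpoint type value 0 is a
placeholder rather than an assigned value.</t>
<figure>
<artwork><![CDATA[
```python
import socket

# 8-byte protocol header: 0x00 'S' 'P', version 0x00, 16-bit type,
# 16-bit reserved field. The type value 0 is a placeholder used for
# illustration only; it is not an assigned endpoint type.
HEADER = b"\x00SP\x00" + (0).to_bytes(2, "big") + b"\x00\x00"


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before a full header")
        buf.extend(chunk)
    return bytes(buf)


def start_handshake(sock):
    """Send our header right away, without waiting for the peer."""
    sock.sendall(HEADER)


def finish_handshake(sock):
    """Block until the peer's 8-byte header has arrived; return it."""
    return recv_exact(sock, len(HEADER))


# socketpair() stands in for an established TCP connection.
a, b = socket.socketpair()
start_handshake(a)   # both sides send immediately ...
start_handshake(b)
peer_of_a = finish_handshake(a)   # ... and only then wait for the peer
peer_of_b = finish_handshake(b)
a.close()
b.close()
```
]]></artwork>
</figure>
<t>Because neither side waits before sending, the exchange adds no
extra round-trip; each endpoint simply reads eight bytes before
handing the connection to the layer above.</t>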
<t>The protocol header is 8 bytes long and looks like this:</t>
<figure>
<artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      0x00     |      0x53     |      0x50     |    version    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             type              |           reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork>
</figure>
<t>The first four bytes of the protocol header are used to make sure that
the peer's protocol is compatible with the protocol used by the local
endpoint. Keep in mind that this protocol is designed to run on an
arbitrary TCP port, thus the standard compatibility check -- if it runs
on port X and protocol Y is assigned to X by IANA, it speaks protocol Y
-- does not apply. We have to use an alternative mechanism.</t>
<t>The first four bytes of the protocol header MUST be set to 0x00, 0x53,
0x50 and 0x00 respectively. If the protocol header received from the peer
differs, the TCP connection MUST be closed immediately.</t>
<t>The fact that the first byte of the protocol header is binary zero
eliminates any text-based protocols that were accidentally connected
to the endpoint. The subsequent two bytes make the check even more
rigorous. At the same time, they can be used as a debugging hint to
indicate that the connection is supposed to use one of the scalability
protocols -- the ASCII representation of these bytes is 'SP', which can
be easily spotted when capturing network traffic. Finally,
the fourth byte rules out any incompatible versions of this
protocol.</t>
<t>The fifth and sixth bytes of the header form a 16-bit unsigned integer in
network byte order representing the type of SP endpoint on the layer
above. The value SHOULD NOT be interpreted by the mapping; rather,
the interpretation should be delegated to the scalability protocol
above the mapping. For informational purposes, it should be noted that
the field encodes information such as the SP protocol ID, protocol version
and the role of the endpoint within the protocol. Individual values are
assigned by IANA.</t>
<t>Finally, the last two bytes of the protocol header are reserved for
future use and must be set to binary zeroes. If the protocol header
from the peer contains anything other than zeroes in this field, the
implementation MUST close the underlying TCP connection.</t>
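<t>The header rules in this section can be sketched with Python's struct
module. This is a minimal illustration under stated assumptions: the names
build_header, parse_header and HeaderError are hypothetical, and raising
HeaderError stands for "close the TCP connection". The endpoint type is
returned without being interpreted, as the text recommends.</t>
<figure>
<artwork><![CDATA[
```python
import struct

# Layout from the header figure: four fixed bytes, a 16-bit type and
# a 16-bit reserved field. "!" selects network (big-endian) byte order.
HEADER_FORMAT = "!4sHH"
SIGNATURE = b"\x00SP\x00"  # 0x00, 0x53, 0x50, version 0x00


class HeaderError(Exception):
    """Raised when the peer's header requires closing the connection."""


def build_header(endpoint_type):
    """Pack the 8-byte header for the given 16-bit endpoint type."""
    return struct.pack(HEADER_FORMAT, SIGNATURE, endpoint_type, 0)


def parse_header(data):
    """Validate a received header; return its type, uninterpreted."""
    if len(data) != struct.calcsize(HEADER_FORMAT):
        raise HeaderError("header must be exactly 8 bytes")
    signature, endpoint_type, reserved = struct.unpack(HEADER_FORMAT, data)
    if signature != SIGNATURE:
        raise HeaderError("bad signature or version")
    if reserved != 0:
        raise HeaderError("reserved field is not zero")
    return endpoint_type
```
]]></artwork>
</figure>
<t>For example, a text-based client that connects by accident would send
something like "GET / HT" as its first eight bytes; parse_header rejects
it on the signature check because the first byte is not binary zero.</t>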
</section>
<section title = "Message delimitation">
<t>Once the protocol header is accepted, the endpoint can send and receive
messages. A message is an arbitrarily large chunk of binary data. Every
message starts with a 64-bit unsigned integer in network byte order
representing the size, in bytes, of the remaining part of the message.
Thus, the message payload can be from 0 to 2^64-1 bytes long.
The payload of the specified size follows directly after the size
field:</t>
<figure>
<artwork>
+------------+-----------------+
| size (64b) |     payload     |
+------------+-----------------+
</artwork>
</figure>
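<t>The framing shown above can be sketched as a pair of helpers. This is
an illustrative sketch only: encode_message, read_exact and read_message
are hypothetical names, and an in-memory io.BytesIO object stands in for
the TCP byte stream.</t>
<figure>
<artwork><![CDATA[
```python
import io
import struct

SIZE_FORMAT = "!Q"  # 64-bit unsigned integer, network byte order
SIZE_LENGTH = struct.calcsize(SIZE_FORMAT)  # 8 bytes


def encode_message(payload):
    """Prefix the payload with its length as a 64-bit integer."""
    return struct.pack(SIZE_FORMAT, len(payload)) + payload


def read_exact(stream, n):
    """Read exactly n bytes from a binary stream or raise EOFError."""
    data = bytearray()
    while len(data) < n:
        chunk = stream.read(n - len(data))
        if not chunk:
            raise EOFError("stream ended in the middle of a message")
        data.extend(chunk)
    return bytes(data)


def read_message(stream):
    """Read one size-prefixed message and return its payload."""
    (size,) = struct.unpack(SIZE_FORMAT, read_exact(stream, SIZE_LENGTH))
    return read_exact(stream, size)


# Two messages back to back on one byte stream; the second is empty.
wire = io.BytesIO(encode_message(b"hello") + encode_message(b""))
first = read_message(wire)
second = read_message(wire)
```
]]></artwork>
</figure>
<t>Since the size is known before any payload byte arrives, the receiver
can tell where each message ends without scanning the payload, and a
zero-length message is simply a size field of eight zero bytes.</t>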
<t>It may seem that a 64-bit message size is excessive and consumes too much
valuable bandwidth, especially given that most scenarios call for
relatively small messages, on the order of bytes or kilobytes.</t>
<t>A variable-length field may seem like a better solution; however, our
experience is that a variable-length size field doesn't provide any
performance benefit in the real world.</t>
<t>For large messages, the 64 bits used by the field form a negligible
portion of the message and the performance impact is not even
measurable.</t>
<t>For small messages, the overall throughput is heavily CPU-bound, never
I/O-bound. In other words, the CPU processing associated with each
individual message limits the message rate in such a way that the network
bandwidth limit is never reached. In the future we expect this to be
even more so: network bandwidth is going to grow faster than CPU speed.
All in all, some performance improvement could be achieved using
a variable-length size field with huge streams of very small messages
on very slow networks. We consider that scenario to be a corner case
that's almost never seen in the real world.</t>
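<t>To put numbers on the bandwidth argument, the share of each message
taken by the fixed size field can be computed directly. The sketch below
is illustrative arithmetic only: the helper name and the payload sizes
are arbitrary choices, and it counts bytes at the mapping level,
ignoring TCP/IP headers.</t>
<figure>
<artwork><![CDATA[
```python
from fractions import Fraction

SIZE_FIELD_BYTES = 8  # the fixed 64-bit size field


def size_field_overhead(payload_bytes):
    """Fraction of a message's bytes on the wire used by the size field."""
    return Fraction(SIZE_FIELD_BYTES, SIZE_FIELD_BYTES + payload_bytes)


# Arbitrary sample payload sizes: 8 bytes, 1 KiB and 1 MiB.
for payload in (8, 1024, 1024 * 1024):
    share = size_field_overhead(payload)
    print(f"{payload:>8} B payload: size field is {float(share):.4%}")
```
]]></artwork>
</figure>
<t>For an 8-byte payload the size field is half of the message, for a
1 KiB payload it is 1/129 of it, and for a 1 MiB payload it is 1/131073.
Only the first case has a meaningful share, which is the small-message
regime that the text argues is CPU-bound anyway.</t>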
<t>On the other hand, it may be argued that limiting messages to
2^64-1 bytes can prove insufficient in the future. However,
extrapolating the message size growth seen in the past indicates
that a 64-bit size should be sufficient for the expected lifetime of
the protocol (30-50 years).</t>
<t>Finally, it may be argued that chaining an arbitrary number of smaller
data chunks can yield unlimited message size. The downside of this
approach is that the message payload cannot be contiguous on the wire;
it has to be interleaved with chunk headers. That typically requires
one more copy of the data in the receiving part of the stack, which
may be a problem for very large messages.</t>
</section>
<section title = "Note on multiplexing">
<t>Several modern general-purpose protocols built on top of TCP provide
multiplexing capability, i.e. a way to transfer multiple independent
message streams over a single TCP connection. This mapping deliberately
opts to provide no such functionality. Instead, independent message
streams should be implemented as different TCP connections. This
section provides the rationale for the design decision.</t>
<t>First of all, multiplexing is typically added to protocols to avoid
the overhead of establishing additional TCP connections. This need
arises in environments where TCP connections are extremely
short-lived, often used only for a single handshake between the peers.
Scalability protocols, on the other hand, require long-lived
connections, which makes the feature unnecessary.</t>
<t>At the same time, multiplexing on top of TCP, while doable, is inferior
to real multiplexing done using multiple TCP connections.
Specifically, TCP's head-of-line blocking means that a single
lost TCP packet will hinder delivery for all the streams on top of
the connection, not just the one the missing packet belonged to.</t>
<t>Moreover, implementing multiplexing is a non-trivial matter
and results in increased development cost, more bugs and a larger
attack surface.</t>
<t>Finally, for multiplexing to work properly, large messages have to be
split into smaller data chunks interleaved with chunk headers, which
makes the receiving stack less efficient, as already discussed above.</t>
</section>
<section anchor="IANA" title="IANA Considerations">
<t>This memo includes no request to IANA.</t>
</section>
<section anchor="Security" title="Security Considerations">
<t>The mapping isn't intended to provide any security beyond
what TCP does. DoS concerns are addressed within
the specification.</t>
</section>
</middle>
</rfc>