  1. Internet Engineering Task Force M. Sustrik, Ed.
  2. Internet-Draft
  3. Intended status: Informational March 2014
  4. Expires: September 2, 2014
  5. TCP Mapping for Scalability Protocols
  6. sp-tcp-mapping-01
  7. Abstract
  8. This document defines the TCP mapping for scalability protocols. The
  9. main purpose of the mapping is to turn the stream of bytes into a
  10. stream of messages. The mapping also performs some additional
  11. checks during the connection establishment phase.
  12. Status of This Memo
  13. This Internet-Draft is submitted in full conformance with the
  14. provisions of BCP 78 and BCP 79.
  15. Internet-Drafts are working documents of the Internet Engineering
  16. Task Force (IETF). Note that other groups may also distribute
  17. working documents as Internet-Drafts. The list of current Internet-
  18. Drafts is at http://datatracker.ietf.org/drafts/current/.
  19. Internet-Drafts are draft documents valid for a maximum of six months
  20. and may be updated, replaced, or obsoleted by other documents at any
  21. time. It is inappropriate to use Internet-Drafts as reference
  22. material or to cite them other than as "work in progress."
  23. This Internet-Draft will expire on September 2, 2014.
  24. Copyright Notice
  25. Copyright (c) 2014 IETF Trust and the persons identified as the
  26. document authors. All rights reserved.
  27. This document is subject to BCP 78 and the IETF Trust's Legal
  28. Provisions Relating to IETF Documents
  29. (http://trustee.ietf.org/license-info) in effect on the date of
  30. publication of this document. Please review these documents
  31. carefully, as they describe your rights and restrictions with respect
  32. to this document. Code Components extracted from this document must
  33. include Simplified BSD License text as described in Section 4.e of
  34. the Trust Legal Provisions and are provided without warranty as
  35. described in the Simplified BSD License.
  36. Sustrik Expires September 2, 2014 [Page 1]
  37. Internet-Draft TCP mapping for SPs March 2014
  38. 1. Underlying protocol
  39. This mapping should be layered directly on top of TCP.
  40. There's no fixed TCP port to use for the communication. Instead,
  41. port numbers are assigned to individual services by the user.
  42. 2. Connection initiation
  43. As soon as the underlying TCP connection is established, both parties
  44. MUST send the protocol header (described in detail below)
  45. immediately. Both endpoints MUST then wait for the protocol header
  46. from the peer before proceeding.
  47. The goal of this design is to keep connection establishment as fast
  48. as possible by avoiding any additional protocol handshakes, i.e.
  49. network round-trips. Specifically, the protocol headers can be
  50. bundled directly with the last packets of the TCP handshake and thus
  51. have virtually zero performance impact.
  52. The protocol header is 8 bytes long and looks like this:
  53.  0                   1                   2                   3
  54.  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  55. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  56. |     0x00      |     0x53      |     0x50      |    version    |
  57. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  58. |             type              |           reserved            |
  59. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  60. The first four bytes of the protocol header are used to make sure that
  61. the peer's protocol is compatible with the protocol used by the local
  62. endpoint. Keep in mind that this protocol is designed to run on an
  63. arbitrary TCP port, thus the standard compatibility check -- if it
  64. runs on port X and protocol Y is assigned to X by IANA, it speaks
  65. protocol Y -- does not apply. We have to use an alternative
  66. mechanism.
  67. The first four bytes of the protocol header MUST be set to 0x00, 0x53,
  68. 0x50 and 0x00 respectively. If the first four bytes received from the
  69. peer differ, the TCP connection MUST be closed immediately.
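The compatibility check described above can be sketched in a few lines of Python. This is only an illustration, not part of the mapping; the names SP_PREFIX and prefix_is_compatible are hypothetical and chosen for this example.

```python
# The four bytes every peer must send first: 0x00, 'S', 'P', version 0x00.
SP_PREFIX = bytes([0x00, 0x53, 0x50, 0x00])


def prefix_is_compatible(header: bytes) -> bool:
    """Return True if the first four header bytes match the required values."""
    return len(header) >= 4 and header[:4] == SP_PREFIX


# A well-formed header passes the check.
assert prefix_is_compatible(b"\x00SP\x00\x00\x10\x00\x00")
# A text-based peer (here, the start of an HTTP request) already fails on
# the first byte, so the connection would be closed.
assert not prefix_is_compatible(b"GET / HTTP/1.1\r\n")
```

A caller would close the TCP connection whenever the function returns False.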
  70. The fact that the first byte of the protocol header is binary zero
  71. eliminates any text-based protocols that were accidentally connected
  72. to the endpoint. The subsequent two bytes make the check even more
  73. rigorous. At the same time they can be used as a debugging hint to
  74. indicate that the connection is supposed to use one of the
  75. scalability protocols -- the ASCII representation of these bytes is 'SP',
  78. which can be easily spotted when capturing network traffic.
  79. Finally, the fourth byte rules out any incompatible versions of this
  80. protocol.
  81. The fifth and sixth bytes of the header form a 16-bit unsigned integer in
  82. network byte order representing the type of SP endpoint on the layer
  83. above. The value SHOULD NOT be interpreted by the mapping; rather,
  84. the interpretation should be delegated to the scalability protocol
  85. above the mapping. For informational purposes, it should be noted
  86. that the field encodes information such as the SP protocol ID, protocol
  87. version and the role of the endpoint within the protocol. Individual
  88. values are assigned by IANA.
  89. Finally, the last two bytes of the protocol header are reserved for
  90. future use and must be set to binary zeroes. If the protocol header
  91. from the peer contains anything other than zeroes in this field, the
  92. implementation MUST close the underlying TCP connection.
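Putting the header rules together, the following Python sketch builds the 8-byte header, validates a header received from the peer, and shows both endpoints sending their header before reading the peer's. All function names are hypothetical, and the type values 0x0010 and 0x0011 are arbitrary placeholders rather than assigned endpoint types. A local socket pair stands in for an established TCP connection.

```python
import socket
import struct

# Layout: 0x00, 'S', 'P', version, 16-bit type, 16-bit reserved (big-endian).
HEADER_FORMAT = ">BBBBHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes


def build_header(endpoint_type: int) -> bytes:
    """Build the protocol header for the given 16-bit endpoint type."""
    return struct.pack(HEADER_FORMAT, 0x00, 0x53, 0x50, 0x00, endpoint_type, 0)


def parse_header(header: bytes) -> int:
    """Validate a peer header and return its type field uninterpreted."""
    if len(header) != HEADER_SIZE:
        raise ValueError("protocol header must be exactly 8 bytes")
    zero, s, p, version, endpoint_type, reserved = struct.unpack(
        HEADER_FORMAT, header)
    if (zero, s, p, version) != (0x00, 0x53, 0x50, 0x00):
        raise ValueError("incompatible signature or version")
    if reserved != 0:
        raise ValueError("reserved field must be zero")
    return endpoint_type


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes or fail if the peer closes early."""
    data = bytearray()
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        data += chunk
    return bytes(data)


def send_header(sock: socket.socket, endpoint_type: int) -> None:
    """Send the local header immediately after connection establishment."""
    sock.sendall(build_header(endpoint_type))


def receive_header(sock: socket.socket) -> int:
    """Wait for the peer header; close the connection if it is rejected."""
    try:
        return parse_header(recv_exact(sock, HEADER_SIZE))
    except ValueError:
        sock.close()
        raise


# Both sides send first and only then wait, so no extra round-trip is needed.
left, right = socket.socketpair()
send_header(left, 0x0010)   # placeholder type value
send_header(right, 0x0011)  # placeholder type value
assert receive_header(left) == 0x0011
assert receive_header(right) == 0x0010
left.close()
right.close()
```

The type value is returned as-is so that the scalability protocol above the mapping can decide whether the peer is acceptable.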
  93. 3. Message delimitation
  94. Once the protocol header is accepted, the endpoint can send and receive
  95. messages. A message is an arbitrarily large chunk of binary data.
  96. Every message starts with a 64-bit unsigned integer in network byte
  97. order representing the size, in bytes, of the remaining part of the
  98. message. Thus, the message payload can be from 0 to 2^64-1 bytes
  99. long. The payload of the specified size follows directly after the
  100. size field:
  101. +------------+-----------------+
  102. | size (64b) |     payload     |
  103. +------------+-----------------+
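The framing shown above can be illustrated with a short Python sketch that encodes messages and splits a received byte buffer back into payloads. The function names are hypothetical, and the decoder works on an in-memory buffer rather than a socket to keep the example small.

```python
import struct

SIZE_FORMAT = ">Q"  # 64-bit unsigned integer in network byte order
SIZE_LENGTH = struct.calcsize(SIZE_FORMAT)  # 8 bytes


def encode_message(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 64-bit big-endian integer."""
    return struct.pack(SIZE_FORMAT, len(payload)) + payload


def decode_messages(buffer: bytes):
    """Return (complete payloads, unconsumed trailing bytes) for a buffer."""
    messages = []
    offset = 0
    while len(buffer) - offset >= SIZE_LENGTH:
        (size,) = struct.unpack_from(SIZE_FORMAT, buffer, offset)
        start = offset + SIZE_LENGTH
        if len(buffer) - start < size:
            break  # the payload has not fully arrived yet
        messages.append(buffer[start:start + size])
        offset = start + size
    return messages, buffer[offset:]


# Two complete messages (one of them empty) followed by a truncated third.
partial = encode_message(b"world")[:10]
stream = encode_message(b"hello") + encode_message(b"") + partial
messages, rest = decode_messages(stream)
assert messages == [b"hello", b""]
assert rest == partial
```

The unconsumed remainder would be kept and prepended to the next bytes read from the connection.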
  104. It may seem that a 64-bit message size is excessive and consumes too
  105. much valuable bandwidth, especially given that most scenarios call
  106. for relatively small messages, on the order of bytes or kilobytes.
  107. A variable-length field may seem like a better solution; however, our
  108. experience is that a variable-length size field doesn't provide any
  109. performance benefit in the real world.
  110. For large messages, the 64 bits used by the field form a negligible
  111. portion of the message and the performance impact is not even
  112. measurable.
  113. For small messages, the overall throughput is heavily CPU-bound,
  114. never I/O-bound. In other words, CPU processing associated with each
  115. individual message limits the message rate in such a way that the network
  116. bandwidth limit is never reached. In the future we expect it to be
  119. even more so: network bandwidth is going to grow faster than CPU
  120. speed. All in all, some performance improvement could be achieved
  121. using a variable-length size field with huge streams of very small
  122. messages on very slow networks. We consider that scenario to be a
  123. corner case that's almost never seen in the real world.
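To put rough numbers on the bandwidth argument, the following Python sketch computes the fraction of bytes on the wire taken up by the fixed 8-byte size field. It is a back-of-the-envelope illustration only: it ignores TCP/IP header overhead, and the payload sizes are arbitrary examples.

```python
SIZE_FIELD_BYTES = 8  # the 64-bit size field


def size_field_overhead(payload_bytes: int) -> float:
    """Fraction of a framed message occupied by the size field."""
    return SIZE_FIELD_BYTES / (SIZE_FIELD_BYTES + payload_bytes)


# Example payload sizes: 16 bytes, 1 KiB and 1 MiB.
for payload in (16, 1024, 1024 * 1024):
    share = size_field_overhead(payload)
    print(f"{payload:>8} byte payload: {share:.4%} of the bytes sent")
```

For a 16-byte payload the size field accounts for one third of the framed message, while for a 1 MiB payload it is below a thousandth of a percent.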
  124. On the other hand, it may be argued that limiting the messages to
  125. 2^64-1 bytes can prove insufficient in the future. However,
  126. extrapolating the message size growth seen in the past indicates
  127. that a 64-bit size should be sufficient for the expected lifetime of
  128. the protocol (30-50 years).
  129. Finally, it may be argued that chaining an arbitrary number of smaller
  130. data chunks can yield unlimited message size. The downside of this
  131. approach is that the message payload cannot be contiguous on the
  132. wire; it has to be interleaved with chunk headers. That typically
  133. requires one more copy of the data in the receiving part of the stack,
  134. which may be a problem for very large messages.
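Because the payload is contiguous on the wire, a receiver can allocate one buffer of the announced size and read straight into it, with no reassembly step. The Python sketch below illustrates the idea under simplifying assumptions: the function names are hypothetical, a local socket pair stands in for a TCP connection, and a real implementation would cap the size it is willing to allocate.

```python
import socket
import struct


def recv_exact_into(sock: socket.socket, view: memoryview) -> None:
    """Fill the whole view with bytes read from the socket."""
    received = 0
    while received < len(view):
        count = sock.recv_into(view[received:])
        if count == 0:
            raise ConnectionError("peer closed the connection early")
        received += count


def recv_message(sock: socket.socket) -> bytearray:
    """Read one size-prefixed message into a single preallocated buffer."""
    size_field = bytearray(8)
    recv_exact_into(sock, memoryview(size_field))
    (size,) = struct.unpack(">Q", size_field)
    payload = bytearray(size)  # assumption: size was checked against a limit
    recv_exact_into(sock, memoryview(payload))
    return payload


sender, receiver = socket.socketpair()
sender.sendall(struct.pack(">Q", 5) + b"hello")
assert recv_message(receiver) == b"hello"
sender.close()
receiver.close()
```

With chunked framing, the same receiver would have to strip a header before each chunk and stitch the pieces together, which is where the additional copy typically comes from.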
  135. 4. Note on multiplexing
  136. Several modern general-purpose protocols built on top of TCP provide
  137. multiplexing capability, i.e. a way to transfer multiple independent
  138. message streams over a single TCP connection. This mapping
  139. deliberately opts to provide no such functionality. Instead,
  140. independent message streams should be implemented as different TCP
  141. connections. This section provides the rationale for the design
  142. decision.
  143. First of all, multiplexing is typically added to protocols to avoid
  144. the overhead of establishing additional TCP connections. This need
  145. arises in environments where the TCP connections are extremely short-
  146. lived, often used only for a single handshake between the peers.
  147. Scalability protocols, on the other hand, rely on long-lived
  148. connections, which makes the feature unnecessary.
  149. At the same time, multiplexing on top of TCP, while doable, is
  150. inferior to the real multiplexing done using multiple TCP
  151. connections. Specifically, TCP's head-of-line blocking means
  152. that a single lost TCP packet will hinder delivery for all the
  153. streams on top of the connection, not just the one the missing
  154. packet belonged to.
  155. Moreover, implementing multiplexing is a non-trivial matter
  156. and results in increased development cost, more bugs and larger
  157. attack surface.
  160. Finally, for multiplexing to work properly, large messages have to be
  161. split into smaller data chunks interleaved by chunk headers, which
  162. makes the receiving stack less efficient, as already discussed above.
  163. 5. IANA Considerations
  164. This memo includes no request to IANA.
  165. 6. Security Considerations
  166. The mapping isn't intended to provide any security beyond what TCP
  167. itself provides. DoS concerns are addressed within the
  168. specification.
  169. Author's Address
  170. Martin Sustrik (editor)
  171. Email: sustrik@250bpm.com