
Internet Engineering Task Force                          M. Sustrik, Ed.
Internet-Draft
Intended status: Informational                               August 2013
Expires: February 2, 2014

                   Request/Reply Scalability Protocol
                          sp-request-reply-01
Abstract

   This document defines a scalability protocol used for distributing
   processing tasks among an arbitrary number of stateless processing
   nodes and returning the results of the processing.
Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 2, 2014.
Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.
Sustrik                 Expires February 2, 2014                [Page 1]

Internet-Draft              Request/Reply SP                 August 2013
1.  Introduction

   One of the most common problems in distributed applications is how to
   delegate work to another processing node and get the result back to
   the original node.  In other words, the goal is to utilise the CPU
   power of a remote node.

   There's a wide range of RPC systems addressing this problem; however,
   instead of relying on a simple RPC algorithm, we will aim at solving
   a more general version of the problem.  First, we want to issue
   processing requests from multiple clients, not just a single one.
   Second, we want to distribute the tasks to any number of processing
   nodes instead of a single one, so that the processing can be scaled
   up by adding new processing nodes as necessary.
   Solving the generalised problem requires that the algorithm executing
   the task in question -- also known as the "service" -- is stateless.

   To put it simply, a service is called "stateless" when there's no way
   for the user to distinguish whether a request was processed by one
   instance of the service or another one.

   So, for example, a service which accepts two integers and multiplies
   them is stateless.  A request for "2x2" is always going to produce
   "4", no matter which instance of the service has computed it.

   A service that accepts empty requests and produces the number of
   requests processed so far (1, 2, 3 etc.), on the other hand, is not
   stateless.  To prove it, you can run two instances of the service.
   The first reply, no matter which instance produces it, is going to be
   1.  The second reply though is going to be either 2 (if processed by
   the same instance as the first one) or 1 (if processed by the other
   instance).  You can distinguish which instance produced the result.
   Thus, according to the definition, the service is not stateless.
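   The distinction can be illustrated with a small sketch (hypothetical
   code, not part of this specification):

   ```python
   # Stateless: the reply depends only on the request itself.
   def multiply_service(request):
       a, b = request
       return a * b

   # Stateful: the reply depends on how many requests this particular
   # instance has already handled.
   class CounterService:
       def __init__(self):
           self.count = 0

       def handle(self, _request):
           self.count += 1
           return self.count

   # Two instances of the stateless service are indistinguishable:
   assert multiply_service((2, 2)) == 4

   # Two instances of the stateful service are not: the second reply
   # reveals which instance handled it.
   a, b = CounterService(), CounterService()
   first = a.handle(None)         # 1, regardless of instance
   second_same = a.handle(None)   # 2 if the same instance handles it
   second_other = b.handle(None)  # 1 if the other instance handles it
   assert (first, second_same, second_other) == (1, 2, 1)
   ```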
   Despite the name, being "stateless" doesn't mean that the service has
   no state at all.  Rather, it means that the service doesn't retain
   any business-logic-related state in-between processing two subsequent
   requests.  The service is, of course, allowed to have state while
   processing a single request.  It can also have state that is
   unrelated to its business logic, say statistics about the processing
   that are used for administrative purposes and never returned to the
   clients.
   Also note that "stateless" doesn't necessarily mean "fully
   deterministic".  For example, a service that generates random numbers
   is non-deterministic.  However, the client, after receiving a new
   random number, cannot tell which instance has produced it; thus, the
   service can be considered stateless.
   While stateless services are often implemented by passing the entire
   state inside the request, they are not required to do so.  Especially
   when the state is large, passing it around in each request may be
   impractical.  In such cases, it's typically just a reference to the
   state that's passed in the request, such as an ID or a path.  The
   state itself can then be retrieved by the service from a shared
   database, a network file system or a similar storage mechanism.

   Requiring services to be stateless serves a specific purpose.  It
   allows any number of service instances to be used to handle the
   processing load.  After all, the client won't be able to tell the
   difference between replies from instance A and replies from instance
   B.  You can even start new instances on the fly and get away with it.
   The client still won't be able to tell the difference.  In other
   words, statelessness is a prerequisite for making your service
   cluster fully scalable.
   Once it is ensured that the service is stateless, there are several
   topologies for a request/reply system to form.  What follows are the
   most common:

   1.  One client sends a request to one server and gets a reply.  The
       common RPC scenario.

   2.  Many clients send requests to one server and get replies.  The
       classic client/server model.  Think of a database server and
       database clients.  Alternatively, think of a messaging broker and
       messaging clients.

   3.  One client sends requests to many servers and gets replies.  The
       load-balancer model.  Think of HTTP load balancers.

   4.  Many clients send requests to be processed by many servers.  The
       "enterprise service bus" model.  In the simplest case the bus can
       be implemented as a simple hub-and-spokes topology.  In complex
       cases the bus can span multiple physical locations or multiple
       organisations with intermediate nodes at the boundaries
       connecting different parts of the topology.
   In addition to distributing tasks to processing nodes, the
   request/reply model comes with full end-to-end reliability.  The
   reliability guarantee can be defined as follows: as long as the
   client is alive and there's at least one server accessible from the
   client, the task will eventually get processed and the result will be
   delivered back to the client.
   End-to-end reliability is achieved, similarly to TCP, by re-sending
   the request if the client believes the original instance of the
   request has failed.  Typically, a request is believed to have failed
   when no reply is received within a specified time.

   Note that, unlike with TCP, the reliability algorithm is resistant to
   server failure.  Even if a server fails while processing a request,
   the request will be re-sent and eventually processed by a different
   instance of the server.
   As can be seen from the above, one request may be processed multiple
   times.  For example, a reply may be lost on its way back to the
   client.  The client will assume that the request was not processed
   yet, will resend it and thus cause duplicate execution of the task.

   Some applications may want to prevent duplicate execution of tasks.
   It often turns out that hardening such applications to be idempotent
   is relatively easy, as they already possess the tools to do so.  For
   example, a payment processing server already has access to a shared
   database which it can use to verify that the payment with the
   specified ID was not yet processed.
   On the other hand, many applications don't care about the occasional
   duplicate processing of tasks.  Therefore, the request/reply protocol
   does not require the service to be idempotent.  Instead, the
   idempotence issue is left to the user to decide on.

   Finally, it should be noted that this specification discusses several
   features that are of little use in simple topologies and are rather
   aimed at large, geographically or organisationally distributed
   topologies.  Features like channel prioritisation and loop avoidance
   fall into this category.
2.  Underlying protocol

   The request/reply protocol can be run on top of any SP mapping, such
   as, for example, the SP TCP mapping [SPoverTCP].

   Also, given that SP protocols describe the behaviour of an entire
   arbitrarily complex topology rather than of a single node-to-node
   communication, several underlying protocols can be used in parallel.
   For example, a client may send a request via WebSocket; then, on the
   edge of the company network, an intermediary node may retransmit it
   using TCP, etc.
      +---+  WebSocket  +---+    TCP    +---+
      |   |-------------|   |-----------|   |
      +---+             +---+           +---+
                          |               |
      +---+     IPC       |               |  SCTP  +---+   DCCP    +---+
      |   |---------------+               +--------|   |-----------|   |
      +---+                                        +---+           +---+
3.  Overview of the algorithm

   The request/reply protocol defines two different endpoint types: the
   requester or REQ (the client) and the replier or REP (the service).

   A REQ endpoint can be connected only to a REP endpoint.  A REP
   endpoint can be connected only to a REQ endpoint.  If the underlying
   protocol indicates that there's an attempt to create a channel to an
   incompatible endpoint, the channel MUST NOT be used.  In the case of
   the TCP mapping, for example, the underlying TCP connection MUST be
   closed.
   When creating more complex topologies, REQ and REP endpoints are
   paired in the intermediate nodes to form a forwarding component, a
   so-called "device".  A device receives requests from the REP endpoint
   and forwards them to the REQ endpoint.  At the same time it receives
   replies from the REQ endpoint and forwards them to the REP endpoint:

                           --- requests -->

      +-----+   +-----+-----+   +-----+-----+   +-----+
      |     |-->|     |     |-->|     |     |-->|     |
      | REQ |   | REP | REQ |   | REP | REQ |   | REP |
      |     |<--|     |     |<--|     |     |<--|     |
      +-----+   +-----+-----+   +-----+-----+   +-----+

                           <-- replies ---
   Using devices, arbitrarily complex topologies can be built.  The rest
   of this section explains how requests are routed through a topology
   towards processing nodes and how replies are routed back from
   processing nodes to the original clients, as well as how reliability
   is achieved.

   The idea for routing requests is to implement a simple coarse-grained
   scheduling algorithm based on the pushback capabilities of the
   underlying transport.
   The algorithm works by interpreting pushback on a particular channel
   as "the part of the topology accessible through this channel is busy
   at the moment and doesn't accept any more requests."

   Thus, when a node is about to send a request, it can choose to send
   it only to one of the channels that don't report pushback at the
   moment.  To implement approximately fair distribution of the
   workload, the node chooses a channel from that pool using the
   round-robin algorithm.
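   The sending rule can be sketched as follows (a hypothetical model;
   the Channel class and its methods are illustrative, not part of this
   specification):

   ```python
   class Channel:
       """Illustrative stand-in for an outbound channel."""
       def __init__(self, busy=False):
           self.busy = busy
           self.sent = []

       def pushback(self):
           return self.busy

       def send(self, request):
           self.sent.append(request)

   class ReqScheduler:
       """Round-robin over channels that don't report pushback."""
       def __init__(self, channels):
           self.channels = channels
           self.cursor = 0  # round-robin position

       def send(self, request):
           # Try each channel at most once, starting after the last
           # one used, skipping channels that report pushback.
           for _ in range(len(self.channels)):
               ch = self.channels[self.cursor]
               self.cursor = (self.cursor + 1) % len(self.channels)
               if not ch.pushback():
                   ch.send(request)
                   return True
           return False  # every channel busy: report pushback upward

   chans = [Channel(busy=True), Channel(), Channel()]
   sched = ReqScheduler(chans)
   sched.send("req-1")  # skips the busy channel, lands on the second
   sched.send("req-2")  # round-robins on to the third channel
   ```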
   As for delivering replies back to the clients, it should be
   understood that the client may not be directly accessible (say, using
   TCP/IP) from the processing node.  It may be behind a firewall, have
   no static IP address, etc.  Furthermore, the client and the
   processing node may not even speak the same transport protocol --
   imagine a client connecting to the topology using WebSockets and a
   processing node using SCTP.

   Given the above, it becomes obvious that the replies must be routed
   back through the existing topology rather than directly.  In fact,
   the request/reply topology may be thought of as an overlay network on
   top of the underlying transport mechanisms.
   As for routing replies within the request/reply topology, it is
   designed in such a way that each reply contains the whole routing
   path, rather than containing just the address of the destination
   node, as is the case with, for example, TCP/IP.

   The downside of this design is that replies are a little bit longer,
   and that if an intermediate node gets restarted, all the requests
   that were routed through it will fail to complete and will have to be
   resent by the request/reply end-to-end reliability mechanism.

   The upside, on the other hand, is that the nodes in the topology
   don't have to maintain any routing tables besides the simple table of
   adjacent channels along with their IDs.  There's also no need for any
   additional protocols for distributing routing information within the
   topology.
   The most important reason for adopting this design, though, is that
   there's no propagation delay and any node becomes accessible
   immediately after it is started.  Given that some nodes in the
   topology may be extremely short-lived, this is a crucial requirement.
   Imagine a database client that sends a query, reads the result and
   terminates.  It makes no sense to delay the whole process until the
   routing tables are synchronised between the client and the server.
   The algorithm thus works as follows: when a request is routed from
   the client to the processing node, every REP endpoint determines
   which channel it was received from and adds the ID of that channel to
   the request.  Thus, when the request arrives at the ultimate
   processing node, it already contains a full backtrace stack, which in
   turn contains all the info needed to route a message back to the
   original client.

   After processing the request, the processing node attaches the
   backtrace stack from the request to the reply and sends it back to
   the topology.  At that point every REP endpoint can check the
   backtrace and determine which channel it should send the reply to.
   In addition to routing, the request/reply protocol takes care of
   reliability, i.e. it ensures that every request will eventually be
   processed and the reply delivered to the user, even when facing
   failures of processing nodes, intermediate nodes and network
   infrastructure.

   Reliability is achieved by simply re-sending the request if the reply
   is not received within a certain timeframe.  To make that algorithm
   work flawlessly, the client has to be able to filter out any stray
   replies (delayed replies to requests that have already been
   answered).

   The client thus adds a unique request ID to the request.  The ID gets
   copied from the request to the reply by the processing node.  When
   the reply gets back to the client, it can simply check whether the
   request in question is still being processed and, if not, it can
   ignore the reply.
   To implement all the functionality described above, messages (both
   requests and replies) have the following format:

   +-+------------+-+------------+   +-+------------+-------------+
   |0| Channel ID |0| Channel ID |...|1| Request ID |   payload   |
   +-+------------+-+------------+   +-+------------+-------------+

   The payload of the message is preceded by a stack of 32-bit tags.
   The most significant bit of each tag is set to 0, except for the very
   last tag.  That allows the algorithm to find out where the tags end
   and where the message payload begins.

   As for the remaining 31 bits, they are either a request ID (in the
   last tag) or a channel ID (in all the remaining tags).  The first
   channel ID is added and processed by the REP endpoint closest to the
   processing node.  The last channel ID is added and processed by the
   REP endpoint closest to the client.
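   An illustrative encoder/decoder for this tag stack might look as
   follows (a sketch only; network byte order for the 32-bit tags is
   assumed here):

   ```python
   import struct

   def build_message(channel_ids, request_id, payload):
       """Prepend a stack of 32-bit tags to the payload."""
       tags = b"".join(struct.pack(">I", cid & 0x7FFFFFFF)
                       for cid in channel_ids)
       # The final tag carries the request ID with its top bit set to 1.
       tags += struct.pack(">I", 0x80000000 | (request_id & 0x7FFFFFFF))
       return tags + payload

   def parse_message(message):
       """Split a message back into (channel_ids, request_id, payload)."""
       channel_ids, offset = [], 0
       while True:
           (tag,) = struct.unpack_from(">I", message, offset)
           offset += 4
           if tag & 0x80000000:  # top bit 1: last tag, payload follows
               return channel_ids, tag & 0x7FFFFFFF, message[offset:]
           channel_ids.append(tag)

   msg = build_message([446, 299], 823, b"Hello")
   assert parse_message(msg) == ([446, 299], 823, b"Hello")
   ```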
   The following picture shows an example of a request saying "Hello"
   being routed from the client through two intermediate nodes to the
   processing node, and the reply "World" being routed back.  It shows
   what messages are passed over the network at each step of the
   process:

                                     client

                      Hello |                  | World
                            |     +-----+      ^
                            |     | REQ |      |
                            V     +-----+      |
                1|823|Hello |                  | 1|823|World
                            |     +-----+      ^
                            |     | REP |      |
                            |     +-----+      |
                            |     | REQ |      |
                            V     +-----+      |
          0|299|1|823|Hello |                  | 0|299|1|823|World
                            |     +-----+      ^
                            |     | REP |      |
                            |     +-----+      |
                            |     | REQ |      |
                            V     +-----+      |
    0|446|0|299|1|823|Hello |                  | 0|446|0|299|1|823|World
                            |     +-----+      ^
                            |     | REP |      |
                            V     +-----+      |
                      Hello |                  | World

                                    service
4.  Hop-by-hop vs. End-to-end

   All endpoints implement so-called "hop-by-hop" functionality.  It's
   the functionality concerned with sending messages to the immediately
   adjacent components and receiving messages from them.

   In addition to that, the endpoints on the edge of the topology
   implement so-called "end-to-end" functionality that is concerned with
   issues such as, for example, reliability.
                              end to end
             +-----------------------------------------+
             |                                         |
          +-----+   +-----+-----+   +-----+-----+   +-----+
          |     |-->|     |     |-->|     |     |-->|     |
          | REQ |   | REP | REQ |   | REP | REQ |   | REP |
          |     |<--|     |     |<--|     |     |<--|     |
          +-----+   +-----+-----+   +-----+-----+   +-----+
             |         |     |         |     |         |
             +---------+     +---------+     +---------+
             hop by hop      hop by hop      hop by hop
   To make an analogy with the TCP/IP stack, IP provides hop-by-hop
   functionality, i.e. routing of the packets to the adjacent node,
   while TCP implements end-to-end functionality such as resending of
   lost packets.

   As a rule of thumb, raw hop-by-hop endpoints are used to build
   devices (intermediary nodes in the topology), while end-to-end
   endpoints are used directly by the applications.

   To prevent confusion, the specification of the endpoint behaviour
   below will discuss hop-by-hop and end-to-end functionality in
   separate chapters.
5.  Hop-by-hop functionality

5.1.  REQ endpoint

   The REQ endpoint is used by the user to send requests to the
   processing nodes and receive the replies afterwards.

   When the user asks the REQ endpoint to send a request, the endpoint
   should send it to one of the associated outbound channels (TCP
   connections or similar).  The request sent is exactly the message
   supplied by the user.  The REQ socket MUST NOT modify an outgoing
   request in any way.

   If there's no channel to send the request to, the endpoint won't send
   the request and MUST report the backpressure condition to the user.
   For example, with the BSD socket API, backpressure is reported as an
   EAGAIN error.

   If there are associated channels but none of them is available for
   sending, i.e. all of them are already reporting backpressure, the
   endpoint won't send the message and MUST report the backpressure
   condition to the user.
   Backpressure is used as a means to redirect the requests from the
   congested parts of the topology to the parts that are still
   responsive.  It can be thought of as a crude scheduling algorithm.
   Crude as it is, though, it's probably still the best you can get
   without knowing estimates of the execution time for individual tasks,
   the CPU capacity of individual processing nodes, etc.

   Alternatively, backpressure can be thought of as a congestion control
   mechanism.  When all available processing nodes are busy, it slows
   down the client application, i.e. it prevents the user from sending
   any more requests.

   If the channel is not capable of reporting backpressure (e.g. DCCP),
   the endpoint SHOULD consider it as always available for sending a new
   request.  However, such channels should be used with care, as when
   congestion hits they may suck in a lot of requests just to discard
   them silently and thus cause re-transmission storms later on.  The
   implementation of the REQ endpoint MAY choose to prohibit the use of
   such channels altogether.
   When there are multiple channels available for sending the request,
   the endpoint MAY use any prioritisation mechanism to decide which
   channel to send the request to.  For example, it may use classic
   priorities attached to channels and send the message to the channel
   with the highest priority.  That allows for routing algorithms such
   as: "Use local processing nodes if any are available.  Send the
   requests to remote nodes only if there are no local ones available."
   Alternatively, the endpoint may implement weighted priorities ("send
   20% of the requests to node A and 80% to node B").  The endpoint may
   also not implement any prioritisation strategy and treat all channels
   as equal.

   Whatever the case, two rules must apply.

   First, by default the priority settings for all channels MUST be
   equal.  Creating a channel with a different priority MUST be
   triggered by an explicit action by the user.

   Second, if there are several channels with equal priority, the
   endpoint MUST distribute the messages among them in a fair fashion
   using the round-robin algorithm.  The round-robin implementation MUST
   also take care not to become unfair when new channels are added or
   old ones are removed on the fly.
   As for incoming messages, i.e. replies, the REQ endpoint MUST
   fair-queue them.  In other words, if there are replies available on
   several channels, it MUST receive them in a round-robin fashion.  It
   must also take care not to compromise the fairness when new channels
   are added or old ones removed.
   In addition to providing basic fairness, the goal of fair-queueing is
   to prevent DoS attacks where a huge stream of fake replies from one
   channel would be able to block the real replies coming from other
   channels.  Fair queueing ensures that messages from every channel are
   received at approximately the same rate.  That way, a DoS attack can
   slow down the system but it can't entirely block it.

   Incoming replies MUST be handed to the user exactly as they were
   received.  The REQ endpoint MUST NOT modify the replies in any way.
5.2.  REP endpoint

   The REP endpoint is used to receive requests from the clients and
   send replies back to the clients.

   First of all, the REP socket is responsible for assigning unique
   31-bit channel IDs to the individual associated channels.

   The first ID assigned MUST be random.  Each subsequent ID is computed
   by adding 1 to the previous one, with potential overflow to 0.

   The implementation MUST ensure that the random number is different
   each time the endpoint is re-started, the process that contains it is
   restarted, or similar.  So, for example, using a pseudo-random
   generator with a constant seed won't do.

   The goal of the algorithm is to spread the possible channel ID values
   and thus minimise the chance that a reply is routed to an unrelated
   channel, even in the face of intermediate node failures.
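   The ID assignment scheme above can be sketched as follows (an
   illustrative model; the class name is an assumption):

   ```python
   import os
   import struct

   class ChannelIdAllocator:
       """Random 31-bit starting point, then +1 with wraparound to 0."""
       def __init__(self):
           # Seed from the OS entropy pool rather than a fixed seed, so
           # each restart starts from a different point in the ID space.
           (seed,) = struct.unpack(">I", os.urandom(4))
           self.next_id = seed & 0x7FFFFFFF

       def allocate(self):
           cid = self.next_id
           self.next_id = (self.next_id + 1) & 0x7FFFFFFF  # overflow to 0
           return cid

   alloc = ChannelIdAllocator()
   a, b = alloc.allocate(), alloc.allocate()
   assert b == (a + 1) & 0x7FFFFFFF
   assert a < 2**31 and b < 2**31
   ```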
   When receiving a message, the REP endpoint MUST fair-queue among the
   channels available for receiving.  In other words, it should round-
   robin among such channels and receive one request from a channel at a
   time.  It MUST also implement the round-robin algorithm in such a way
   that adding or removing channels doesn't break its fairness.

   In addition to guaranteeing basic fairness in access to computing
   resources, the above algorithm makes it impossible for a malevolent
   or misbehaving client to completely block the processing of requests
   from other clients by issuing a steady stream of requests.
   After getting hold of the request, the REP socket should prepend it
   with a 32-bit value, consisting of 1 bit set to 0 followed by the
   31-bit ID of the channel the request was received from.  The extended
   request is then handed to the user.
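   The prepending step amounts to the following (a sketch; network byte
   order is assumed):

   ```python
   import struct

   def extend_request(request, channel_id):
       """Prepend 0 | 31-bit channel ID as a 32-bit tag to the request.

       Masking with 0x7FFFFFFF keeps the most significant bit at 0, as
       required for channel-ID tags.
       """
       return struct.pack(">I", channel_id & 0x7FFFFFFF) + request

   ext = extend_request(b"Hello", 823)
   assert ext == struct.pack(">I", 823) + b"Hello"
   ```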
   The goal of adding the channel ID to the request is to be able to
   route the reply back to the original channel later on.  Thus, when
   the user sends a reply, the endpoint strips the first 32 bits off and
   uses the value to determine where the reply is to be routed.
   If the reply is shorter than 32 bits, it is malformed and the
   endpoint MUST ignore it.  Also, if the most significant bit of the
   32-bit value isn't set to 0, the reply is malformed and MUST be
   ignored.
   Otherwise, the endpoint checks whether its table of associated
   channels contains the channel with the corresponding ID.  If so, it
   sends the reply (with the first 32 bits stripped off) to that
   channel.  If the channel is not found, the reply MUST be dropped.  If
   the channel is not available for sending, i.e. it is applying
   backpressure, the reply MUST also be dropped.
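   The reply-routing rules above can be sketched like this (a
   hypothetical model; the channel objects are illustrative and network
   byte order is assumed):

   ```python
   import struct

   def route_reply(reply, channels):
       """Route a user-supplied reply; drop it silently when required.

       'channels' maps 31-bit channel IDs to channel objects.
       """
       if len(reply) < 4:
           return False                 # malformed: shorter than 32 bits
       (tag,) = struct.unpack_from(">I", reply)
       if tag & 0x80000000:
           return False                 # malformed: top bit must be 0
       channel = channels.get(tag)
       if channel is None or channel.pushback():
           return False                 # unknown or congested: drop
       channel.send(reply[4:])          # strip the tag and forward
       return True

   class FakeChannel:
       """Illustrative channel that records what it sends."""
       def __init__(self):
           self.sent = []
       def pushback(self):
           return False
       def send(self, msg):
           self.sent.append(msg)

   ch = FakeChannel()
   assert route_reply(struct.pack(">I", 299) + b"World", {299: ch})
   assert ch.sent == [b"World"]
   assert not route_reply(b"\x00\x01", {299: ch})  # too short: dropped
   ```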
   Note that when the reply is unroutable, two things might have
   happened.  Either there was some kind of network disruption, in which
   case the request will be re-sent later on, or the original client has
   failed or been shut down.  In the latter case the request won't be
   resent; however, it doesn't really matter, because there's no one to
   deliver the reply to any more anyway.

   Unlike with requests, there's no pushback applied to replies; they
   are simply dropped.  If the endpoint blocked and waited for the
   channel to become available, all the subsequent replies, possibly
   destined for different, unblocked channels, would be blocked in the
   meantime.  That would allow for a DoS attack simply by firing a lot
   of requests and not receiving the replies.
6.  End-to-end functionality

   End-to-end functionality is built on top of hop-by-hop functionality.
   Thus, an endpoint on the edge of a topology contains all the hop-by-
   hop functionality, but also implements additional functionality of
   its own.  This end-to-end functionality acts basically as a user of
   the underlying hop-by-hop functionality.
6.1.  REQ endpoint

   End-to-end functionality for REQ sockets is concerned with re-sending
   requests in case of failure and with filtering out stray or outdated
   replies.

   To be able to do the latter, the endpoint must tag the requests with
   unique 31-bit request IDs.  The first request ID is picked at random.
   All subsequent request IDs are generated by adding 1 to the last
   request ID, possibly overflowing to 0.
   To improve the robustness of the system, the implementation MUST
   ensure that the random number is different each time the endpoint,
   the process or the machine is restarted.  A pseudo-random generator
   with a fixed seed won't do.

   When the user asks the endpoint to send a message, the endpoint
   prepends a 32-bit value to the message, consisting of a single bit
   set to 1 followed by a 31-bit request ID, and passes it on in the
   standard hop-by-hop way.

   If the hop-by-hop layer reports a pushback condition, the end-to-end
   layer considers the request unsent and MUST report the pushback
   condition to the user.

   If the request is successfully sent, the endpoint stores the request,
   including its request ID, so that it can be resent later on if
   needed.  At the same time it sets up a timer to trigger the re-
   transmission in case the reply is not received within a specified
   timeout.  The user MUST be allowed to specify the timeout interval.
   The default timeout interval must be 60 seconds.
When a reply is received from the underlying hop-by-hop
implementation, the endpoint should strip off the first 32 bits of
the reply to check whether it is a valid reply.

If the reply is shorter than 32 bits, it is malformed and the
endpoint MUST ignore it. If the most significant bit of the 32-bit
value is set to 0, the reply is malformed and MUST be ignored.
Otherwise, the endpoint should check whether the request ID in the
reply matches any of the request IDs of the requests being processed
at the moment. If not, the reply MUST be ignored; it is either a
stray message or a duplicate reply.
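The validation steps above can be sketched as a single parsing
routine. This is an illustrative sketch, not a normative interface;
the function name and signature are hypothetical:

```c
#include <stdint.h>
#include <stddef.h>

/* Parse an incoming reply.  Returns 1 and fills in the request ID
   and the stripped payload if the reply is well-formed; returns 0
   if it is malformed and MUST be ignored.  Matching the request ID
   against the set of requests in progress is left to the caller,
   which silently drops the reply when there is no match. */
int rep_parse(const uint8_t *msg, size_t len, uint32_t *reqid,
              const uint8_t **payload, size_t *payload_len)
{
    if (len < 4)
        return 0;                  /* shorter than 32 bits: malformed */
    uint32_t tag = ((uint32_t)msg[0] << 24) | ((uint32_t)msg[1] << 16) |
                   ((uint32_t)msg[2] << 8)  |  (uint32_t)msg[3];
    if (!(tag & 0x80000000u))
        return 0;                  /* MSB set to 0: malformed */
    *reqid = tag & 0x7fffffffu;    /* strip the flag bit */
    *payload = msg + 4;            /* reply with the prefix stripped */
    *payload_len = len - 4;
    return 1;
}
```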
Please note that the endpoint can support either a single request or
multiple requests being processed in parallel. Which is the case
depends on the API exposed to the user and is not part of this
specification.
If the ID in the reply matches one of the requests in progress, the
reply MUST be passed to the user (with the 32-bit prefix stripped
off). At the same time the stored copy of the original request, as
well as the re-transmission timer, must be deallocated.
Finally, the REQ endpoint MUST make it possible for the user to
cancel a particular request in progress. Technically, cancellation
means deleting the stored copy of the request and cancelling the
associated timer. Thus, if the reply eventually arrives, it will be
discarded by the algorithm above.
Cancellation allows, for example, the user to time out a request.
They can simply post a request and, if there's no answer within a
specific timeframe, cancel it.
6.2. REP endpoint

End-to-end functionality for REP endpoints is concerned with turning
requests into corresponding replies.

When the user asks to receive a request, the endpoint gets the next
request from the hop-by-hop layer and splits it into the traceback
stack and the message payload itself. The traceback stack is stored
and the payload is returned to the user.
The algorithm for splitting the request is as follows: strip 32-bit
tags from the message one by one. Once a tag with the most
significant bit set is encountered, the bottom of the traceback
stack has been reached and the splitting is done. If the end of the
message is reached without finding the bottom of the stack, the
request is malformed and MUST be ignored.
Note that the payload produced by this procedure is the same as the
request payload sent by the original client.
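The splitting algorithm can be sketched as follows; the function name
is hypothetical and the sketch is illustrative, not normative:

```c
#include <stdint.h>
#include <stddef.h>

/* Walk the 32-bit tags at the start of the request until the tag
   with the most significant bit set is found.  On success, returns 1
   and sets *stack_len to the size of the traceback stack in bytes
   (including the final tag); the payload starts at msg + *stack_len.
   Returns 0 for a malformed request that MUST be ignored. */
int req_split(const uint8_t *msg, size_t len, size_t *stack_len)
{
    size_t pos = 0;
    for (;;) {
        if (len - pos < 4)
            return 0;             /* end of message reached: malformed */
        /* Check the most significant bit of this tag. */
        int bottom = (msg[pos] & 0x80) != 0;
        pos += 4;
        if (bottom) {
            *stack_len = pos;     /* bottom of the traceback stack */
            return 1;
        }
    }
}
```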
Once the user processes the request and sends the reply, the endpoint
prepends the reply with the stored traceback stack and sends it on
using the hop-by-hop layer. At that point the stored traceback stack
MUST be deallocated.
Additionally, the REP endpoint MUST support cancelling any request
being processed at the moment. Technically, this means that the
state associated with the request, i.e. the traceback stack stored by
the endpoint, is deleted and the reply to that particular request is
never sent.
The most important use of cancellation is allowing service instances
to ignore malformed requests. If the application-level part of the
request doesn't conform to the application protocol, the service can
simply cancel the request. In such a case the reply is never sent.
Of course, if the application wants to send an application-specific
error message back to the client, it can do so by not cancelling the
request and sending a regular reply instead.
7. Loop avoidance

It may happen that a request/reply topology contains a loop. This
becomes increasingly likely as the topology grows beyond the scope of
a single organisation and multiple administrators are involved in
maintaining it. An unfortunate interaction between two perfectly
legitimate setups can cause a loop to be created.
With no additional guards against loops, it's likely that requests
would be caught inside the loop, rotating there forever, each message
gradually growing in size as new prefixes are added to it by each REP
endpoint on the way. Eventually, a loop can cause congestion and
bring the whole system to a halt.
To deal with the problem, REQ endpoints MUST check the depth of the
traceback stack for every outgoing request and discard any request
where it exceeds a certain threshold. The threshold should be
settable by the user. The suggested default value is 8.
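The depth check can be sketched by counting the channel-ID tags above
the bottom of the traceback stack. This is an illustrative sketch
with a hypothetical function name; the exact definition of "depth" is
left to the implementation:

```c
#include <stdint.h>
#include <stddef.h>

/* Count the 32-bit channel-ID tags above the bottom of the traceback
   stack and report whether the request exceeds the hop threshold
   (suggested default: 8).  Returns 1 if the request should be
   discarded, 0 otherwise.  A message without a stack bottom is
   malformed and is discarded as well. */
int req_too_deep(const uint8_t *msg, size_t len, size_t max_hops)
{
    size_t hops = 0, pos = 0;
    while (len - pos >= 4) {
        if (msg[pos] & 0x80)       /* bottom of the stack reached */
            return hops > max_hops;
        hops++;                    /* one more hop-added channel ID */
        pos += 4;
    }
    return 1;                      /* no stack bottom: malformed */
}
```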
8. IANA Considerations

New SP endpoint types REQ and REP should be registered by IANA. For
now, the value of 16 should be used for REQ endpoints and the value
of 17 for REP endpoints.
9. Security Considerations

The mapping is not intended to provide any additional security to the
underlying protocol. DoS concerns are addressed within the
specification.
10. References

[SPoverTCP]
           Sustrik, M., "TCP mapping for SPs", August 2013.
Author's Address

Martin Sustrik (editor)

Email: sustrik@250bpm.com