<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc category="info" docName="sp-request-reply-01">
<front>
<title abbrev="Request/Reply SP">
Request/Reply Scalability Protocol
</title>
<author fullname="Martin Sustrik" initials="M." role="editor"
surname="Sustrik">
<address>
<email>sustrik@250bpm.com</email>
</address>
</author>
<date month="August" year="2013" />
<area>Applications</area>
<workgroup>Internet Engineering Task Force</workgroup>
<keyword>Request</keyword>
<keyword>Reply</keyword>
<keyword>REQ</keyword>
<keyword>REP</keyword>
<keyword>stateless</keyword>
<keyword>service</keyword>
<keyword>SP</keyword>
<abstract>
<t>This document defines a scalability protocol used for distributing
processing tasks among an arbitrary number of stateless processing nodes
and returning the results of the processing.</t>
</abstract>
</front>
<middle>
<section title = "Introduction">
<t>One of the most common problems in distributed applications is how to
delegate work to another processing node and get the result back to
the original node. In other words, the goal is to utilise the CPU
power of a remote node.</t>
<t>There's a wide range of RPC systems addressing the problem. However,
instead of relying on a simple RPC algorithm, we will aim at solving a
more general version of the problem. First, we want to issue processing
requests from multiple clients, not just a single one. Second, we want
to distribute the tasks to any number of processing nodes instead of a
single one, so that the processing can be scaled up by adding new
processing nodes as necessary.</t>
<t>Solving the generalised problem requires that the algorithm
executing the task in question -- also known as the "service" -- is
stateless.</t>
<t>To put it simply, the service is called "stateless" when there's no
way for the user to distinguish whether a request was processed by
one instance of the service or another one.</t>
<t>So, for example, a service which accepts two integers and multiplies
them is stateless. A request for "2x2" is always going to produce "4",
no matter which instance of the service has computed it.</t>
<t>A service that accepts empty requests and produces the number
of requests processed so far (1, 2, 3 etc.), on the other hand, is
not stateless. To prove that, you can run two instances of the service.
The first reply, no matter which instance produces it, is going to be 1.
The second reply, though, is going to be either 2 (if processed by the
same instance as the first one) or 1 (if processed by the other
instance). You can thus distinguish which instance produced the result
and, according to the definition, the service is not stateless.</t>
<t>Despite the name, being "stateless" doesn't mean that the service has
no state at all. Rather it means that the service doesn't retain any
business-logic-related state in-between processing two subsequent
requests. The service is, of course, allowed to have state while
processing a single request. It can also have state that is unrelated
to its business logic, say statistics about the processing that are
used for administrative purposes and never returned to the clients.</t>
<t>Also note that "stateless" doesn't necessarily mean "fully
deterministic". For example, a service that generates random numbers is
non-deterministic. However, the client, after receiving a new random
number, cannot tell which instance has produced it; thus, the service
can be considered stateless.</t>
<t>While stateless services are often implemented by passing the entire
state inside the request, they are not required to do so. Especially
when the state is large, passing it around in each request may be
impractical. In such cases, it's typically just a reference to the
state that's passed in the request, such as an ID or a path. The state
itself can then be retrieved by the service from a shared database,
a network file system or similar storage mechanism.</t>
<t>Requiring services to be stateless serves a specific purpose.
It allows for using any number of service instances to handle
the processing load. After all, the client won't be able to tell the
difference between replies from instance A and replies from instance B.
You can even start new instances on the fly and get away with it.
The client still won't be able to tell the difference. In other
words, statelessness is a prerequisite to make your service cluster
fully scalable.</t>
<t>Once it is ensured that the service is stateless, there are several
topologies that a request/reply system can form. What follows are
the most common:
<list style = "numbers">
<t>One client sends a request to one server and gets a reply.
The common RPC scenario.</t>
<t>Many clients send requests to one server and get replies. The
classic client/server model. Think of a database server and
database clients. Alternatively, think of a messaging broker and
messaging clients.</t>
<t>One client sends requests to many servers and gets replies.
The load-balancer model. Think of HTTP load balancers.</t>
<t>Many clients send requests to be processed by many servers.
The "enterprise service bus" model. In the simplest case the bus
can be implemented as a simple hub-and-spokes topology. In complex
cases the bus can span multiple physical locations or multiple
organisations with intermediate nodes at the boundaries connecting
different parts of the topology.</t>
</list>
</t>
<t>In addition to distributing tasks to processing nodes, the
request/reply model comes with full end-to-end reliability. The
reliability guarantee can be defined as follows: As long as the client
is alive and there's at least one server accessible from the client,
the task will eventually get processed and the result will be delivered
back to the client.</t>
<t>End-to-end reliability is achieved, similarly to TCP, by re-sending
the request if the client believes the original instance of the request
has failed. Typically, a request is believed to have failed when there's
no reply received within a specified time.</t>
<t>Note that, unlike with TCP, the reliability algorithm is resistant to
a server failure. Even if a server fails while processing a request, the
request will be re-sent and eventually processed by a different
instance of the server.</t>
<t>As can be seen from the above, one request may be processed multiple
times. For example, a reply may be lost on its way back to the client.
The client will assume that the request was not processed yet and will
resend it, thus causing duplicate execution of the task.</t>
<t>Some applications may want to prevent duplicate execution of tasks. It
often turns out that hardening such applications to be idempotent is
relatively easy as they already possess the tools to do so. For
example, a payment processing server already has access to a shared
database which it can use to verify that the payment with the specified
ID was not yet processed.</t>
<t>On the other hand, many applications don't care about occasional
duplicate processing of tasks. Therefore, the request/reply protocol
does not require the service to be idempotent. Instead, the idempotence
issue is left to the user to decide on.</t>
<t>Finally, it should be noted that this specification discusses several
features that are of little use in simple topologies and are rather
aimed at large, geographically or organisationally distributed
topologies. Features like channel prioritisation and loop avoidance
fall into this category.</t>
</section>
  141. <section title = "Underlying protocol">
  142. <t>The request/reply protocol can be run on top of any SP mapping,
  143. such as, for example, <xref target='SPoverTCP'>SP TCPmapping</xref>.
  144. </t>
  145. <t>Also, given that SP protocols describe the behaviour of entire
  146. arbitrarily complex topology rather than of a single node-to-node
  147. communication, several underlying protocols can be used in parallel.
  148. For example, a client may send a request via WebSocket, then, on the
  149. edge of the company network an intermediary node may retransmit it
  150. using TCP etc.</t>
  151. <figure>
  152. <artwork>
  153. +---+ WebSocket +---+ TCP +---+
  154. | |-------------| |-----------| |
  155. +---+ +---+ +---+
  156. | |
  157. +---+ IPC | | SCTP +---+ DCCP +---+
  158. | |---------+ +--------| |-----------| |
  159. +---+ +---+ +---+
  160. </artwork>
  161. </figure>
  162. </section>
<section title = "Overview of the algorithm">
<t>The request/reply protocol defines two different endpoint types:
the requester or REQ (the client) and the replier or REP (the
service).</t>
<t>A REQ endpoint can be connected only to a REP endpoint. A REP
endpoint can be connected only to a REQ endpoint. If the underlying
protocol indicates that there's an attempt to create a channel to an
incompatible endpoint, the channel MUST NOT be used. In the case of
the TCP mapping, for example, the underlying TCP connection MUST
be closed.</t>
<t>When creating more complex topologies, REQ and REP endpoints are
paired in the intermediate nodes to form a forwarding component, a
so-called "device". The device receives requests from the REP endpoint
and forwards them to the REQ endpoint. At the same time it receives
replies from the REQ endpoint and forwards them to the REP
endpoint:</t>
<figure>
<artwork>
           --- requests --&gt;
+-----+   +-----+-----+   +-----+-----+   +-----+
|     |--&gt;|     |     |--&gt;|     |     |--&gt;|     |
| REQ |   | REP | REQ |   | REP | REQ |   | REP |
|     |&lt;--|     |     |&lt;--|     |     |&lt;--|     |
+-----+   +-----+-----+   +-----+-----+   +-----+
           &lt;-- replies ---
</artwork>
</figure>
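<t>To illustrate, the core of such a device can be sketched in a few
lines of Python. The sketch is illustrative only and not part of this
specification; the recv() and send() methods stand in for whatever API
the actual REP and REQ endpoint implementations expose:</t>
<figure>
<artwork>
def device_step(rep, req):
    # Forward one request from the REP side towards the services...
    request = rep.recv()
    if request is not None:
        req.send(request)
    # ...and one reply from the REQ side back towards the clients.
    reply = req.recv()
    if reply is not None:
        rep.send(reply)

def device(rep, req):
    while True:
        device_step(rep, req)
</artwork>
</figure>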
<t>Using devices, arbitrarily complex topologies can be built. The rest
of this section explains how requests are routed through a topology
towards processing nodes and how replies are routed back from
processing nodes to the original clients, as well as how
reliability is achieved.</t>
<t>The idea for routing requests is to implement a simple coarse-grained
scheduling algorithm based on the pushback capabilities of the
underlying transport.</t>
<t>The algorithm works by interpreting pushback on a particular channel
as "the part of the topology accessible through this channel is busy at
the moment and doesn't accept any more requests."</t>
<t>Thus, when a node is about to send a request, it can choose to send
it only to one of the channels that don't report pushback at the
moment. To implement approximately fair distribution of the workload,
the node chooses a channel from that pool using the round-robin
algorithm.</t>
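<t>A minimal sketch of such channel selection in Python follows; it is
illustrative only, with the writable() method standing in for the
transport's pushback indication:</t>
<figure>
<artwork>
def pick_channel(channels, last):
    # Round-robin over the channels, skipping those that report
    # pushback. Returns the chosen channel and its index, or None
    # if all parts of the topology are busy at the moment.
    n = len(channels)
    for step in range(1, n + 1):
        i = (last + step) % n
        if channels[i].writable():
            return channels[i], i
    return None, last
</artwork>
</figure>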
<t>As for delivering replies back to the clients, it should be understood
that the client may not be directly accessible (say using TCP/IP) from
the processing node. It may be behind a firewall, have no static IP
address etc. Furthermore, the client and the processing node may not
even speak the same transport protocol -- imagine a client connecting to
the topology using WebSockets and a processing node via SCTP.</t>
<t>Given the above, it becomes obvious that the replies must be routed
back through the existing topology rather than directly. In fact, a
request/reply topology may be thought of as an overlay network on
top of the underlying transport mechanisms.</t>
<t>As for routing replies within the request/reply topology, the
protocol is designed in such a way that each reply contains the whole
routing path, rather than containing just the address of the
destination node, as is the case with, for example, TCP/IP.</t>
<t>The downside of the design is that replies are a little bit longer
and that if an intermediate node gets restarted, all the requests
that were routed through it will fail to complete and will have to be
resent by the request/reply end-to-end reliability mechanism.</t>
<t>The upside, on the other hand, is that the nodes in the topology don't
have to maintain any routing tables besides the simple table of
adjacent channels along with their IDs. There's also no need for any
additional protocols for distributing routing information within
the topology.</t>
<t>The most important reason for adopting the design, though, is that
there's no propagation delay and any node becomes accessible
immediately after it is started. Given that some nodes in the topology
may be extremely short-lived, this is a crucial requirement. Imagine
a database client that sends a query, reads the result and terminates.
It makes no sense to delay the whole process until the routing tables
are synchronised between the client and the server.</t>
<t>The algorithm thus works as follows: When a request is routed from the
client to the processing node, every REP endpoint determines which
channel it was received from and adds the ID of the channel to the
request. Thus, when the request arrives at the ultimate processing
node, it already contains a full backtrace stack, which in turn contains
all the info needed to route a message back to the original client.</t>
<t>After processing the request, the processing node attaches the
backtrace stack from the request to the reply and sends it back
to the topology. At that point every REP endpoint can check the
traceback and determine which channel it should send the reply to.</t>
<t>In addition to routing, the request/reply protocol takes care of
reliability, i.e. it ensures that every request will be eventually
processed and the reply will be delivered to the user, even when
facing failures of processing nodes, intermediate nodes and network
infrastructure.</t>
<t>Reliability is achieved by simply re-sending the request if the reply
is not received within a certain timeframe. To make that algorithm
work flawlessly, the client has to be able to filter out any stray
replies (delayed replies to requests that a reply has already been
received for).</t>
<t>The client thus adds a unique request ID to the request. The ID gets
copied from the request to the reply by the processing node. When the
reply gets back to the client, it can simply check whether the request
in question is still being processed and, if not, it can ignore
the reply.</t>
<t>To implement all the functionality described above, messages (both
requests and replies) have the following format:</t>
<figure>
<artwork>
+-+------------+-+------------+   +-+------------+-------------+
|0| Channel ID |0| Channel ID |...|1| Request ID |   payload   |
+-+------------+-+------------+   +-+------------+-------------+
</artwork>
</figure>
<t>The payload of the message is preceded by a stack of 32-bit tags. The
most significant bit of each tag is set to 0, except for the very last
tag, where it is set to 1. That allows the algorithm to find out where
the tags end and where the message payload begins.</t>
<t>As for the remaining 31 bits, they are either a request ID (in the
last tag) or a channel ID (in all the remaining tags). The first channel
ID is added and processed by the REP endpoint closest to the processing
node. The last channel ID is added and processed by the REP endpoint
closest to the client.</t>
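<t>A short Python sketch (illustrative only; the helper names are
invented for the example) shows how the tag stack is built as a request
travels from the client towards the service, using the values from the
example pictured below:</t>
<figure>
<artwork>
import struct

MSB = 0x80000000   # set only on the bottom-of-stack (request ID) tag

def make_request(request_id, payload):
    # Added by the client: request ID tag with the MSB set to 1.
    return struct.pack('&gt;I', MSB | (request_id &amp; 0x7fffffff)) + payload

def push_channel_id(message, channel_id):
    # Added by each REP endpoint on the way: channel ID tag, MSB 0.
    return struct.pack('&gt;I', channel_id &amp; 0x7fffffff) + message

msg = make_request(823, b'Hello')   # 1|823|Hello
msg = push_channel_id(msg, 299)     # 0|299|1|823|Hello
msg = push_channel_id(msg, 446)     # 0|446|0|299|1|823|Hello
</artwork>
</figure>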
<t>The following picture shows an example of a request saying "Hello"
being routed from the client through two intermediate nodes to the
processing node and the reply "World" being routed back. It shows
what messages are passed over the network at each step of the
process:</t>
<figure>
<artwork>
                      client
              Hello  |                 World
                     |   +-----+   ^
                     |   | REQ |   |
                     V   +-----+   |
        1|823|Hello  |                 1|823|World
                     |   +-----+   ^
                     |   | REP |   |
                     |   +-----+   |
                     |   | REQ |   |
                     V   +-----+   |
  0|299|1|823|Hello  |                 0|299|1|823|World
                     |   +-----+   ^
                     |   | REP |   |
                     |   +-----+   |
                     |   | REQ |   |
                     V   +-----+   |
0|446|0|299|1|823|Hello |              0|446|0|299|1|823|World
                     |   +-----+   ^
                     |   | REP |   |
                     V   +-----+   |
              Hello  |                 World
                     service
</artwork>
</figure>
</section>
<section title = "Hop-by-hop vs. End-to-end">
<t>All endpoints implement so-called "hop-by-hop" functionality. It's
the functionality concerned with sending messages to the immediately
adjacent components and receiving messages from them.</t>
<t>In addition to that, the endpoints on the edge of the topology
implement so-called "end-to-end" functionality that is concerned
with issues such as, for example, reliability.</t>
<figure>
<artwork>
                   end to end
   +-----------------------------------------+
   |                                         |
+-----+   +-----+-----+   +-----+-----+   +-----+
|     |--&gt;|     |     |--&gt;|     |     |--&gt;|     |
| REQ |   | REP | REQ |   | REP | REQ |   | REP |
|     |&lt;--|     |     |&lt;--|     |     |&lt;--|     |
+-----+   +-----+-----+   +-----+-----+   +-----+
   |         |     |         |     |         |
   +---------+     +---------+     +---------+
   hop by hop      hop by hop      hop by hop
</artwork>
</figure>
<t>To make an analogy with the TCP/IP stack, IP provides hop-by-hop
functionality, i.e. routing of the packets to the adjacent node,
while TCP implements end-to-end functionality such as resending of
lost packets.</t>
<t>As a rule of thumb, raw hop-by-hop endpoints are used to build
devices (intermediary nodes in the topology) while end-to-end
endpoints are used directly by the applications.</t>
<t>To prevent confusion, the specification of the endpoint behaviour
below will discuss hop-by-hop and end-to-end functionality in
separate chapters.</t>
</section>
<section title = "Hop-by-hop functionality">
<section title = "REQ endpoint">
<t>The REQ endpoint is used by the user to send requests to the
processing nodes and receive the replies afterwards.</t>
<t>When the user asks the REQ endpoint to send a request, the endpoint
should send it to one of the associated outbound channels (TCP
connections or similar). The request sent is exactly the message
supplied by the user. The REQ socket MUST NOT modify an outgoing
request in any way.</t>
<t>If there's no channel to send the request to, the endpoint won't send
the request and MUST report the backpressure condition to the user.
For example, with the BSD socket API, backpressure is reported as an
EAGAIN error.</t>
<t>If there are associated channels but none of them is available for
sending, i.e. all of them are already reporting backpressure, the
endpoint won't send the message and MUST report the backpressure
condition to the user.</t>
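<t>For illustration, a client using a BSD-socket-style API might handle
the reported backpressure as follows (a sketch only, assuming send()
fails with EAGAIN as described above):</t>
<figure>
<artwork>
import errno, time

def send_with_retry(endpoint, request, interval=0.1):
    while True:
        try:
            endpoint.send(request)
            return
        except OSError as e:
            if e.errno != errno.EAGAIN:
                raise
            time.sleep(interval)   # topology congested; back off
</artwork>
</figure>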
<t>Backpressure is used as a means to redirect the requests from the
congested parts of the topology to the parts that are still
responsive. It can be thought of as a crude scheduling algorithm.
Crude as it is, though, it's probably still the best you can get
without knowing estimates of execution time for individual tasks,
CPU capacity of individual processing nodes etc.</t>
<t>Alternatively, backpressure can be thought of as a congestion control
mechanism. When all available processing nodes are busy, it slows
down the client application, i.e. it prevents the user from sending
any more requests.</t>
<t>If the channel is not capable of reporting backpressure (e.g. DCCP),
the endpoint SHOULD consider it as always available for sending new
requests. However, such channels should be used with care because when
congestion hits they may suck in a lot of requests just to discard
them silently and thus cause re-transmission storms later on. The
implementation of the REQ endpoint MAY choose to prohibit the use
of such channels altogether.</t>
<t>When there are multiple channels available for sending the request,
the endpoint MAY use any prioritisation mechanism to decide which
channel to send the request to. For example, it may use classic
priorities attached to channels and send the message to the channel
with the highest priority. That allows for routing algorithms such as:
"Use local processing nodes if any are available. Send the requests to
remote nodes only if there are no local ones available." Alternatively,
the endpoint may implement weighted priorities ("send 20% of the
requests to node A and 80% to node B"). The endpoint also may not
implement any prioritisation strategy and treat all channels as
equal.</t>
<t>Whatever the case, two rules must apply.</t>
<t>First, by default the priority settings for all channels MUST be
equal. Creating a channel with a different priority MUST be triggered
by an explicit action by the user.</t>
<t>Second, if there are several channels with equal priority, the
endpoint MUST distribute the messages among them in a fair fashion
using the round-robin algorithm. The round-robin implementation MUST
also take care not to become unfair when new channels are added or old
ones are removed on the fly.</t>
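<t>One possible way to keep the round-robin fair while channels come and
go is to keep the channels in a rotating ring, as in the following
Python sketch (illustrative only; writable() and send() are assumed
channel methods):</t>
<figure>
<artwork>
from collections import deque

class FairSender:
    def __init__(self):
        self.ring = deque()
    def add(self, channel):
        self.ring.append(channel)    # newcomer waits for its turn
    def remove(self, channel):
        self.ring.remove(channel)
    def send(self, msg):
        # Visit each channel at most once, skipping busy ones.
        for _ in range(len(self.ring)):
            ch = self.ring[0]
            self.ring.rotate(-1)     # move it to the back of the ring
            if ch.writable():
                ch.send(msg)
                return True
        return False                 # all busy: report backpressure
</artwork>
</figure>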
<t>As for incoming messages, i.e. replies, the REQ endpoint MUST
fair-queue them. In other words, if there are replies available on
several channels, it MUST receive them in a round-robin fashion. It
must also take care not to compromise the fairness when new channels
are added or old ones removed.</t>
<t>In addition to providing basic fairness, the goal of fair-queueing is
to prevent DoS attacks where a huge stream of fake replies from one
channel would be able to block the real replies coming from different
channels. Fair queueing ensures that messages from every channel are
received at approximately the same rate. That way, a DoS attack can
slow down the system but it can't entirely block it.</t>
<t>Incoming replies MUST be handed to the user exactly as they were
received. The REQ endpoint MUST NOT modify the replies in any way.</t>
</section>
<section title = "REP endpoint">
<t>The REP endpoint is used to receive requests from the clients and send
replies back to the clients.</t>
<t>First of all, the REP socket is responsible for assigning unique
31-bit channel IDs to the individual associated channels.</t>
<t>The first ID assigned MUST be random. The next one is computed by
adding 1 to the previous one, with potential overflow to 0.</t>
<t>The implementation MUST ensure that the random number is different
each time the endpoint is re-started, the process that contains
it is restarted or similar. So, for example, using a pseudo-random
generator with a constant seed won't do.</t>
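<t>A sketch of such an ID assignment in Python, using the operating
system's entropy source as one way to avoid a constant seed:</t>
<figure>
<artwork>
import os

class ChannelIds:
    def __init__(self):
        # Different on every restart of the endpoint or process.
        self.next_id = int.from_bytes(os.urandom(4), 'big') &amp; 0x7fffffff
    def assign(self):
        channel_id = self.next_id
        # Add 1, overflowing from 2^31 - 1 back to 0.
        self.next_id = (self.next_id + 1) &amp; 0x7fffffff
        return channel_id
</artwork>
</figure>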
<t>The goal of the algorithm is to spread the possible channel ID
values and thus minimise the chance that a reply is routed to an
unrelated channel, even in the face of intermediate node
failures.</t>
<t>When receiving a message, the REP endpoint MUST fair-queue among the
channels available for receiving. In other words, it should
round-robin among such channels and receive one request from
a channel at a time. It MUST also implement the round-robin
algorithm in such a way that adding or removing channels doesn't
break its fairness.</t>
<t>In addition to guaranteeing basic fairness in access to computing
resources, the above algorithm makes it impossible for a malevolent
or misbehaving client to completely block the processing of requests
from other clients by issuing a steady stream of requests.</t>
<t>After getting hold of the request, the REP socket should prepend it
with a 32-bit value, consisting of 1 bit set to 0 followed by the
31-bit ID of the channel the request was received from. The extended
request is then handed to the user.</t>
<t>The goal of adding the channel ID to the request is to be able to
route the reply back to the original channel later on. Thus, when
the user sends a reply, the endpoint strips the first 32 bits off and
uses the value to determine where it is to be routed.</t>
<t>If the reply is shorter than 32 bits, it is malformed and
the endpoint MUST ignore it. Also, if the most significant bit of the
32-bit value isn't set to 0, the reply is malformed and MUST
be ignored.</t>
<t>Otherwise, the endpoint checks whether its table of associated
channels contains the channel with the corresponding ID. If so, it
sends the reply (with the first 32 bits stripped off) to that channel.
If the channel is not found, the reply MUST be dropped. If the
channel is not available for sending, i.e. it is applying
backpressure, the reply MUST be dropped.</t>
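<t>The reply-routing rules above condense into a few lines; the sketch
below is illustrative only, with 'channels' being a mapping from
channel ID to an assumed channel object:</t>
<figure>
<artwork>
import struct

def route_reply(reply, channels):
    if len(reply) &lt; 4:
        return                       # malformed: too short
    tag = struct.unpack('&gt;I', reply[:4])[0]
    if tag &amp; 0x80000000:
        return                       # malformed: MSB must be 0
    channel = channels.get(tag)
    if channel is None or not channel.writable():
        return                       # unknown or busy channel: drop
    channel.send(reply[4:])          # forward with the tag stripped
</artwork>
</figure>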
<t>Note that when the reply is unroutable, two things might have
happened. Either there was some kind of network disruption, in which
case the request will be re-sent later on, or the original client
has failed or been shut down. In that case the request won't be
resent; however, it doesn't really matter because there's no one to
deliver the reply to any more anyway.</t>
<t>Unlike with requests, there's no pushback applied to replies; they
are simply dropped. If the endpoint blocked and waited for the channel
to become available, all the subsequent replies, possibly destined for
different, unblocked channels, would be blocked in the meantime. That
would allow for a DoS attack simply by firing a lot of requests and not
receiving the replies.</t>
</section>
</section>
<section title = "End-to-end functionality">
<t>End-to-end functionality is built on top of hop-by-hop functionality.
Thus, an endpoint on the edge of a topology contains all the
hop-by-hop functionality, but also implements additional
functionality of its own. This end-to-end functionality acts
basically as a user of the underlying hop-by-hop functionality.</t>
<section title = "REQ endpoint">
<t>End-to-end functionality for REQ sockets is concerned with re-sending
requests in case of failure and with filtering out stray or
outdated replies.</t>
<t>To be able to do the latter, the endpoint must tag the requests with
unique 31-bit request IDs. The first request ID is picked at random.
All subsequent request IDs are generated by adding 1 to the last
request ID, with possible overflow to 0.</t>
<t>To improve the robustness of the system, the implementation MUST
ensure that the random number is different each time the endpoint, the
process or the machine is restarted. A pseudo-random generator with a
fixed seed won't do.</t>
<t>When the user asks the endpoint to send a message, the endpoint
prepends a 32-bit value to the message, consisting of a single bit set
to 1 followed by a 31-bit request ID, and passes it on in the standard
hop-by-hop way.</t>
<t>If the hop-by-hop layer reports a pushback condition, the end-to-end
layer considers the request unsent and MUST report the pushback
condition to the user.</t>
<t>If the request is successfully sent, the endpoint stores the request,
including its request ID, so that it can be resent later on if
needed. At the same time it sets up a timer to trigger the
re-transmission in case the reply is not received within a specified
timeout. The user MUST be allowed to specify the timeout interval.
The default timeout interval must be 60 seconds.</t>
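<t>The following Python sketch illustrates the storing and re-sending
logic (not normative; endpoint.send() is an assumed hop-by-hop
interface):</t>
<figure>
<artwork>
import struct, threading

MSB = 0x80000000

class Requester:
    def __init__(self, endpoint, timeout=60.0):
        self.endpoint, self.timeout = endpoint, timeout
        self.pending = {}            # request ID -&gt; stored message

    def send_request(self, request_id, payload):
        msg = struct.pack('&gt;I', MSB | request_id) + payload
        self.endpoint.send(msg)      # may report pushback instead
        self.pending[request_id] = msg
        self._arm_timer(request_id)

    def _arm_timer(self, request_id):
        timer = threading.Timer(self.timeout, self._resend,
                                [request_id])
        timer.daemon = True
        timer.start()

    def _resend(self, request_id):
        msg = self.pending.get(request_id)
        if msg is not None:          # still unanswered: try again
            self.endpoint.send(msg)
            self._arm_timer(request_id)
</artwork>
</figure>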
<t>When a reply is received from the underlying hop-by-hop
implementation, the endpoint should strip off the first 32 bits from
the reply to check whether it is a valid reply.</t>
<t>If the reply is shorter than 32 bits, it is malformed and the
endpoint MUST ignore it. If the most significant bit of the 32-bit
value is set to 0, the reply is malformed and MUST be ignored.</t>
<t>Otherwise, the endpoint should check whether the request ID in
the reply matches any of the request IDs of the requests being
processed at the moment. If not, the reply MUST be ignored.
It is either a stray message or a duplicate reply.</t>
<t>Please note that the endpoint can support either a single request or
multiple requests being processed in parallel. Which one is the case
depends on the API exposed to the user and is not part of this
specification.</t>
<t>If the ID in the reply matches one of the requests in progress, the
reply MUST be passed to the user (with the 32-bit prefix stripped
off). At the same time the stored copy of the original request as
well as the re-transmission timer must be deallocated.</t>
<t>Finally, the REQ endpoint MUST make it possible for the user to
cancel a particular request in progress. Technically, that means
deleting the stored copy of the request and cancelling the associated
timer. Thus, once the reply arrives, it will be discarded by the
algorithm above.</t>
<t>Cancellation allows, for example, the user to time out a request.
They can simply post a request and, if there's no answer within a
specific timeframe, cancel it.</t>
</section>
<section title = "REP endpoint">
<t>End-to-end functionality for REP endpoints is concerned with turning
requests into corresponding replies.</t>
<t>When the user asks to receive a request, the endpoint gets the next
request from the hop-by-hop layer and splits it into the traceback
stack and the message payload itself. The traceback stack is stored and
the payload is returned to the user.</t>
<t>The algorithm for splitting the request is as follows: Strip the
32-bit tags from the message one by one. Once the most significant
bit of a tag is set, we've reached the bottom of the traceback
stack and the splitting is done. If the end of the message is reached
without finding the bottom of the stack, the request is malformed and
MUST be ignored.</t>
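<t>The splitting algorithm, expressed as an illustrative Python sketch
(returning None for malformed requests, which are then ignored):</t>
<figure>
<artwork>
import struct

def split_request(request):
    stack, rest = [], request
    while len(rest) &gt;= 4:
        tag, rest = struct.unpack('&gt;I', rest[:4])[0], rest[4:]
        stack.append(tag)
        if tag &amp; 0x80000000:         # bottom of the stack reached
            return stack, rest       # (traceback stack, payload)
    return None                      # malformed: no bottom tag found
</artwork>
</figure>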
<t>Note that the payload produced by this procedure is the same as the
request payload sent by the original client.</t>
<t>Once the user processes the request and sends the reply, the endpoint
prepends the reply with the stored traceback stack and sends it on
using the hop-by-hop layer. At that point the stored traceback stack
MUST be deallocated.</t>
<t>Additionally, the REP endpoint MUST support cancelling any request
being processed at the moment. Technically, it means that the
state associated with the request, i.e. the traceback stack stored
by the endpoint, is deleted and the reply to that particular
request is never sent.</t>
<t>The most important use of cancellation is allowing the service
instances to ignore malformed requests. If the application-level
part of the request doesn't conform to the application protocol,
the service can simply cancel the request. In such a case the reply
is never sent. Of course, if the application wants to send an
application-specific error message back to the client, it can do so
by not cancelling the request and sending a regular reply.</t>
</section>
</section>
<section title = "Loop avoidance">
<t>It may happen that a request/reply topology contains a loop. It
becomes increasingly likely as the topology grows out of the scope of a
single organisation and there are multiple administrators involved
in maintaining it. An unfortunate interaction between two perfectly
legitimate setups can cause a loop to be created.</t>
<t>With no additional guards against loops, it's likely that
requests will be caught inside the loop, rotating there forever,
each message gradually growing in size as new prefixes are added to it
by each REP endpoint on the way. Eventually, a loop can cause
congestion and bring the whole system to a halt.</t>
<t>To deal with the problem, REQ endpoints MUST check the depth of the
traceback stack for every outgoing request and discard any request
where it exceeds a certain threshold. The threshold should be defined
by the user. The default value is suggested to be 8.</t>
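<t>A sketch of the check in Python (illustrative; the threshold would
come from user configuration):</t>
<figure>
<artwork>
MAX_HOPS = 8                         # suggested default threshold

def too_deep(request, max_hops=MAX_HOPS):
    # Count tags up to and including the bottom-of-stack tag.
    depth, offset = 0, 0
    while offset + 4 &lt;= len(request):
        depth += 1
        if request[offset] &amp; 0x80:   # MSB of this 32-bit tag is set
            break
        offset += 4
    return depth &gt; max_hops
</artwork>
</figure>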
</section>
<section anchor="IANA" title="IANA Considerations">
<t>New SP endpoint types REQ and REP should be registered by IANA. For
now, the value of 16 should be used for REQ endpoints and the value of
17 for REP endpoints.</t>
</section>
<section anchor="Security" title="Security Considerations">
<t>The mapping is not intended to provide any additional security to the
underlying protocol. DoS concerns are addressed within
the specification.</t>
</section>
</middle>
<back>
<references>
<reference anchor='SPoverTCP'>
<front>
<title>TCP mapping for SPs</title>
<author initials='M.' surname='Sustrik' fullname='M. Sustrik'/>
<date month='August' year='2013'/>
</front>
<format type='TXT' target='sp-tcp-mapping-01.txt'/>
</reference>
</references>
</back>
</rfc>