- <?xml version="1.0" encoding="US-ASCII"?>
- <!DOCTYPE rfc SYSTEM "rfc2629.dtd">
- <rfc category="info" docName="sp-request-reply-01">
- <front>
- <title abbrev="Request/Reply SP">
- Request/Reply Scalability Protocol
- </title>
- <author fullname="Martin Sustrik" initials="M." role="editor"
- surname="Sustrik">
- <address>
- <email>sustrik@250bpm.com</email>
- </address>
- </author>
- <date month="August" year="2013" />
- <area>Applications</area>
- <workgroup>Internet Engineering Task Force</workgroup>
- <keyword>Request</keyword>
- <keyword>Reply</keyword>
- <keyword>REQ</keyword>
- <keyword>REP</keyword>
- <keyword>stateless</keyword>
- <keyword>service</keyword>
- <keyword>SP</keyword>
- <abstract>
- <t>This document defines a scalability protocol used for distributing
- processing tasks among an arbitrary number of stateless processing
- nodes and returning the results of the processing.</t>
- </abstract>
- </front>
- <middle>
- <section title = "Introduction">
- <t>One of the most common problems in distributed applications is how
- to delegate work to another processing node and get the result back to
- the original node. In other words, the goal is to utilise the CPU
- power of a remote node.</t>
- <t>There's a wide range of RPC systems addressing the problem; however,
- instead of relying on a simple RPC algorithm, we will aim at solving a
- more general version of the problem. First, we want to issue processing
- requests from multiple clients, not just a single one. Second, we want
- to distribute the tasks to any number of processing nodes instead of a
- single one so that the processing can be scaled up by adding new
- processing nodes as necessary.</t>
- <t>Solving the generalised problem requires that the algorithm
- executing the task in question -- also known as "service" -- is
- stateless.</t>
- <t>To put it simply, the service is called "stateless" when there's no
- way for the user to distinguish whether a request was processed by
- one instance of the service or another one.</t>
- <t>So, for example, a service which accepts two integers and multiplies
- them is stateless. A request for "2x2" is always going to produce "4",
- no matter which instance of the service has computed it.</t>
- <t>A service that accepts empty requests and produces the number
- of requests processed so far (1, 2, 3 etc.), on the other hand, is
- not stateless. To prove it you can run two instances of the service.
- The first reply, no matter which instance produces it, is going to
- be 1. The second reply though is going to be either 2 (if processed
- by the same instance as the first one) or 1 (if processed by the
- other instance). You can distinguish which instance produced the
- result. Thus, according to the definition, the service is not
- stateless.</t>
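- <t>A short C sketch can make the distinction concrete; the
- counter_service type below is purely illustrative and not part of
- the protocol:</t>
- <figure>
- <artwork>
- #include <stdio.h>
- 
- /* Each instance keeps its own request count, so the replies
-    reveal which instance has answered. */
- struct counter_service { unsigned count; };
- 
- static unsigned process_request (struct counter_service *self)
- {
-     return ++self->count;   /* business-logic state is retained */
- }
- 
- int main (void)
- {
-     struct counter_service a = {0}, b = {0};
-     printf ("%u\n", process_request (&a));   /* 1 */
-     printf ("%u\n", process_request (&a));   /* 2: same instance */
-     printf ("%u\n", process_request (&b));   /* 1: other instance */
-     return 0;
- }
- </artwork>
- </figure>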
- <t>Despite the name, being "stateless" doesn't mean that the service has
- no state at all. Rather it means that the service doesn't retain any
- business-logic-related state in-between processing two subsequent
- requests. The service is, of course, allowed to have state while
- processing a single request. It can also have state that is unrelated
- to its business logic, say statistics about the processing that are
- used for administrative purposes and never returned to the clients.</t>
- <t>Also note that "stateless" doesn't necessarily mean "fully
- deterministic". For example, a service that generates random numbers is
- non-deterministic. However, the client, after receiving a new random
- number, cannot tell which instance has produced it; thus, the service
- can be considered stateless.</t>
- <t>While stateless services are often implemented by passing the entire
- state inside the request, they are not required to do so. Especially
- when the state is large, passing it around in each request may be
- impractical. In such cases, it's typically just a reference to the
- state that's passed in the request, such as an ID or a path. The state
- itself can then be retrieved by the service from a shared database,
- a network file system or similar storage mechanism.</t>
-
- <t>Requiring services to be stateless serves a specific purpose.
- It allows for using any number of service instances to handle
- the processing load. After all, the client won't be able to tell the
- difference between replies from instance A and replies from instance B.
- You can even start new instances on the fly and get away with it.
- The client still won't be able to tell the difference. In other
- words, statelessness is a prerequisite to make your service cluster
- fully scalable.</t>
- <t>Once it is ensured that the service is stateless, there are several
- topologies for a request/reply system to form. What follows are
- the most common:
- <list style = "numbers">
- <t>One client sends a request to one server and gets a reply.
- The common RPC scenario.</t>
- <t>Many clients send requests to one server and get replies. The
- classic client/server model. Think of a database server and
- database clients. Alternatively think of a messaging broker and
- messaging clients.</t>
- <t>One client sends requests to many servers and gets replies.
- The load-balancer model. Think of HTTP load balancers.</t>
- <t>Many clients send requests to be processed by many servers.
- The "enterprise service bus" model. In the simplest case the bus
- can be implemented as a simple hub-and-spokes topology. In complex
- cases the bus can span multiple physical locations or multiple
- organisations with intermediate nodes at the boundaries connecting
- different parts of the topology.</t>
- </list>
- </t>
- <t>In addition to distributing tasks to processing nodes, the
- request/reply model comes with full end-to-end reliability. The
- reliability guarantee can be defined as follows: As long as the client
- is alive and there's at least one server accessible from the client,
- the task will eventually get processed and the result will be
- delivered back to the client.</t>
- <t>End-to-end reliability is achieved, similarly to TCP, by re-sending
- the request if the client believes the original instance of the request
- has failed. Typically, a request is believed to have failed when no
- reply is received within a specified time.</t>
- <t>Note that, unlike with TCP, the reliability algorithm is resistant
- to a server failure. Even if a server fails while processing a request,
- the request will be re-sent and eventually processed by a different
- instance of the server.</t>
- <t>As can be seen from the above, one request may be processed multiple
- times. For example, a reply may be lost on its way back to the client.
- The client will assume that the request has not been processed yet,
- resend it and thus cause duplicate execution of the task.</t>
- <t>Some applications may want to prevent duplicate execution of tasks. It
- often turns out that hardening such applications to be idempotent is
- relatively easy as they already possess the tools to do so. For
- example, a payment processing server already has access to a shared
- database which it can use to verify that the payment with specified ID
- was not yet processed.</t>
- <t>On the other hand, many applications don't care about occasional
- duplicate processing of tasks. Therefore, the request/reply protocol
- does not require the service to be idempotent. Instead, the idempotence
- issue is left to the user to decide on.</t>
- <t>Finally, it should be noted that this specification discusses several
- features that are of little use in simple topologies and are rather
- aimed at large, geographically or organisationally distributed
- topologies. Features like channel prioritisation and loop avoidance
- fall into this category.</t>
- </section>
- <section title = "Underlying protocol">
- <t>The request/reply protocol can be run on top of any SP mapping,
- such as, for example, <xref target='SPoverTCP'>SP TCPmapping</xref>.
- </t>
- <t>Also, given that SP protocols describe the behaviour of entire
- arbitrarily complex topology rather than of a single node-to-node
- communication, several underlying protocols can be used in parallel.
- For example, a client may send a request via WebSocket, then, on the
- edge of the company network an intermediary node may retransmit it
- using TCP etc.</t>
- <figure>
- <artwork>
- +---+ WebSocket +---+ TCP +---+
- | |-------------| |-----------| |
- +---+ +---+ +---+
- | |
- +---+ IPC | | SCTP +---+ DCCP +---+
- | |---------+ +--------| |-----------| |
- +---+ +---+ +---+
- </artwork>
- </figure>
- </section>
- <section title = "Overview of the algorithm">
- <t>Request/reply protocol defines two different endpoint types:
- The requester or REQ (the client) and the replier or REP (the
- service).</t>
- <t>A REQ endpoint can be connected only to a REP endpoint. A REP
- endpoint can be connected only to a REQ endpoint. If the underlying
- protocol indicates that there's an attempt to create a channel to an
- incompatible endpoint, the channel MUST NOT be used. In the case of
- TCP mapping, for example, the underlying TCP connection MUST
- be closed.</t>
- <t>When creating more complex topologies, REQ and REP endpoints are
- paired in the intermediate nodes to form a forwarding component,
- so called "device". Device receives requests from the REP endpoint
- and forwards them to the REQ endpoint. At the same time it receives
- replies from the REQ endpoint and forwards them to the REP
- endpoint:</t>
- <figure>
- <artwork>
- --- requests -->
- +-----+ +-----+-----+ +-----+-----+ +-----+
- | |-->| | |-->| | |-->| |
- | REQ | | REP | REQ | | REP | REQ | | REP |
- | |<--| | |<--| | |<--| |
- +-----+ +-----+-----+ +-----+-----+ +-----+
- <-- replies ---
- </artwork>
- </figure>
- <t>Using devices, arbitrarily complex topologies can be built. The rest
- of this section explains how requests are routed through a topology
- towards processing nodes and how replies are routed back from
- processing nodes to the original clients, as well as how the
- reliability is achieved.</t>
- <t>The idea for routing requests is to implement a simple coarse-grained
- scheduling algorithm based on pushback capabilities of the underlying
- transport.</t>
- <t>The algorithm works by interpreting pushback on a particular channel
- as "the part of topology accessible through this channel is busy at
- the moment and doesn't accept any more requests."</t>
- <t>Thus, when a node is about to send a request, it can choose to send
- it only to one of the channels that don't report pushback at the
- moment. To implement approximately fair distribution of the workload
- the node chooses a channel from that pool using the round-robin
- algorithm.</t>
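- <t>As a non-normative illustration, the following C sketch shows the
- channel selection; the channel type with its writable() pushback test
- and send() call is a hypothetical stand-in for the underlying
- transport:</t>
- <figure>
- <artwork>
- #include <stddef.h>
- 
- struct channel {
-     int (*writable) (struct channel *self);
-     int (*send) (struct channel *self, const void *buf, size_t len);
- };
- 
- /* Returns 0 on success, -1 when every channel reports pushback. */
- int req_send (struct channel **chans, size_t nchans, size_t *rr,
-     const void *buf, size_t len)
- {
-     for (size_t i = 0; i != nchans; i++) {
-         struct channel *ch = chans [(*rr + i) % nchans];
-         if (!ch->writable (ch))
-             continue;    /* this part of the topology is busy */
-         *rr = (*rr + i + 1) % nchans;  /* advance round-robin */
-         return ch->send (ch, buf, len);
-     }
-     return -1;    /* pushback is reported to the user */
- }
- </artwork>
- </figure>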
- <t>As for delivering replies back to the clients, it should be
- understood that the client may not be directly accessible (say using
- TCP/IP) from the processing node. It may be beyond a firewall, have no
- static IP address etc. Furthermore, the client and the processing node
- may not even speak the same transport protocol -- imagine a client
- connecting to the topology using WebSockets and a processing node
- doing so via SCTP.</t>
- <t>Given the above, it becomes obvious that the replies must be routed
- back through the existing topology rather than directly. In fact, a
- request/reply topology may be thought of as an overlay network on
- top of the underlying transport mechanisms.</t>
- <t>As for routing replies within the request/reply topology, it is
- designed in such a way that each reply contains the whole routing path,
- rather than containing just the address of the destination node, as is
- the case with, for example, TCP/IP.</t>
- <t>The downside of the design is that replies are a little bit longer
- and that if an intermediate node gets restarted, all the requests
- that were routed through it will fail to complete and will have to be
- resent by the request/reply end-to-end reliability mechanism.</t>
- <t>The upside, on the other hand, is that the nodes in the topology
- don't have to maintain any routing tables besides the simple table of
- adjacent channels along with their IDs. There's also no need for any
- additional protocols for distributing routing information within
- the topology.</t>
- <t>The most important reason for adopting the design though is that
- there's no propagation delay and any node becomes accessible
- immediately after it is started. Given that some nodes in the topology
- may be extremely short-lived, this is a crucial requirement. Imagine
- a database client that sends a query, reads the result and terminates.
- It makes no sense to delay the whole process until the routing tables
- are synchronised between the client and the server.</t>
- <t>The algorithm thus works as follows: When a request is routed from
- the client to the processing node, every REP endpoint determines which
- channel it was received from and adds the ID of the channel to the
- request. Thus, when the request arrives at the ultimate processing node
- it already contains a full backtrace stack, which in turn contains
- all the info needed to route a message back to the original client.</t>
- <t>After processing the request, the processing node attaches the
- backtrace stack from the request to the reply and sends it back
- to the topology. At that point every REP endpoint can check the
- traceback and determine which channel it should send the reply to.</t>
- <t>In addition to routing, request/reply protocol takes care of
- reliability, i.e. ensures that every request will be eventually
- processed and the reply will be delivered to the user, even when
- facing failures of processing nodes, intermediate nodes and network
- infrastructure.</t>
- <t>Reliability is achieved by simply re-sending the request if the
- reply is not received within a certain timeframe. To make that
- algorithm work flawlessly, the client has to be able to filter out any
- stray replies (delayed replies for requests that a reply has already
- been received for).</t>
- <t>The client thus adds a unique request ID to the request. The ID gets
- copied from the request to the reply by the processing node. When the
- reply gets back to the client, it can simply check whether the request
- in question is still being processed and, if not, ignore
- the reply.</t>
- <t>To implement all the functionality described above, messages (both
- requests and replies) have the following format:</t>
- <figure>
- <artwork>
- +-+------------+-+------------+ +-+------------+-------------+
- |0| Channel ID |0| Channel ID |...|1| Request ID | payload |
- +-+------------+-+------------+ +-+------------+-------------+
- </artwork>
- </figure>
- <t>Payload of the message is preceded by a stack of 32-bit tags. The most
- significant bit of each tag is set to 0 except for the very last tag.
- That allows the algorithm to find out where the tags end and where
- the message payload begins.</t>
- <t>As for the remaining 31 bits, they are either a request ID (in the
- last tag) or a channel ID (in all the remaining tags). The first
- channel ID is added and processed by the REP endpoint closest to the
- processing node. The last channel ID is added and processed by the REP
- endpoint closest to the client.</t>
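- <t>As a non-normative illustration, the following C sketch walks the
- tag stack to find the request ID and the start of the payload. It
- assumes, for the sake of the example, that tags are transferred in
- network byte order:</t>
- <figure>
- <artwork>
- #include <stdint.h>
- #include <string.h>
- #include <arpa/inet.h>
- 
- /* Returns 0 on success, -1 if the message is malformed. */
- int parse_tags (const uint8_t *msg, size_t len,
-     uint32_t *request_id, size_t *payload_off)
- {
-     size_t off = 0;
-     uint32_t tag;
-     while (off + 4 <= len) {
-         memcpy (&tag, msg + off, 4);
-         tag = ntohl (tag);
-         off += 4;
-         if (tag & 0x80000000u) {       /* MSB set: the last tag */
-             *request_id = tag & 0x7fffffffu;
-             *payload_off = off;
-             return 0;
-         }
-     }
-     return -1;    /* no terminal tag found */
- }
- </artwork>
- </figure>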
- <t>The following picture shows an example of a request saying "Hello"
- being routed from the client through two intermediate nodes to the
- processing node and the reply "World" being routed back. It shows
- what messages are passed over the network at each step of the
- process:</t>
- <figure>
- <artwork>
- client
- Hello | World
- | +-----+ ^
- | | REQ | |
- V +-----+ |
- 1|823|Hello | 1|823|World
- | +-----+ ^
- | | REP | |
- | +-----+ |
- | | REQ | |
- V +-----+ |
- 0|299|1|823|Hello | 0|299|1|823|World
- | +-----+ ^
- | | REP | |
- | +-----+ |
- | | REQ | |
- V +-----+ |
- 0|446|0|299|1|823|Hello | 0|446|0|299|1|823|World
- | +-----+ ^
- | | REP | |
- V +-----+ |
- Hello | World
- service
- </artwork>
- </figure>
- </section>
- <section title = "Hop-by-hop vs. End-to-end">
- <t>All endpoints implement so called "hop-by-hop" functionality. It's
- the functionality concerned with sending messages to the immediately
- adjacent components and receiving messages from them.</t>
- <t>In addition to that, the endpoints on the edge of the topology
- implement so called "end-to-end" functionality that is concerned
- with issues such as, for example, reliability.</t>
- <figure>
- <artwork>
- end to end
- +-----------------------------------------+
- | |
- +-----+ +-----+-----+ +-----+-----+ +-----+
- | |-->| | |-->| | |-->| |
- | REQ | | REP | REQ | | REP | REQ | | REP |
- | |<--| | |<--| | |<--| |
- +-----+ +-----+-----+ +-----+-----+ +-----+
- | | | | | |
- +---------+ +---------+ +---------+
- hop by hop hop by hop hop by hop
- </artwork>
- </figure>
- <t>To make an analogy with the TCP/IP stack, IP provides hop-by-hop
- functionality, i.e. routing of the packets to the adjacent node,
- while TCP implements end-to-end functionality such as resending of
- lost packets.</t>
- <t>As a rule of thumb, raw hop-by-hop endpoints are used to build
- devices (intermediary nodes in the topology) while end-to-end
- endpoints are used directly by the applications.</t>
- <t>To prevent confusion, the specification of the endpoint behaviour
- below will discuss hop-by-hop and end-to-end functionality in
- separate sections.</t>
- </section>
- <section title = "Hop-by-hop functionality">
- <section title = "REQ endpoint">
- <t>The REQ endpoint is used by the user to send requests to the
- processing nodes and receive the replies afterwards.</t>
- <t>When the user asks the REQ endpoint to send a request, the endpoint
- should send it to one of the associated outbound channels (TCP
- connections or similar). The request sent is exactly the message
- supplied by the user. The REQ socket MUST NOT modify an outgoing
- request in any way.</t>
- <t>If there's no channel to send the request to, the endpoint won't
- send the request and MUST report the backpressure condition to the
- user. For example, with the BSD socket API, backpressure is reported
- as the EAGAIN error.</t>
- <t>If there are associated channels but none of them is available for
- sending, i.e. all of them are already reporting backpressure, the
- endpoint won't send the message and MUST report the backpressure
- condition to the user.</t>
- <t>Backpressure is used as a means to redirect the requests from the
- congested parts of the topology to the parts that are still
- responsive. It can be thought of as a crude scheduling algorithm.
- Crude as it may be, it's probably still the best you can get
- without knowing estimates of execution time for individual tasks,
- CPU capacity of individual processing nodes etc.</t>
- <t>Alternatively, backpressure can be thought of as a congestion control
- mechanism. When all available processing nodes are busy, it slows
- down the client application, i.e. it prevents the user from sending
- any more requests.</t>
- <t>If the channel is not capable of reporting backpressure (e.g. DCCP)
- the endpoint SHOULD consider it always available for sending new
- requests. However, such channels should be used with care: when
- congestion hits they may suck in a lot of requests just to discard
- them silently and thus cause re-transmission storms later on. The
- implementation of the REQ endpoint MAY choose to prohibit the use
- of such channels altogether.</t>
- <t>When there are multiple channels available for sending the request,
- the endpoint MAY use any prioritisation mechanism to decide which
- channel to send the request to. For example, it may use classic
- priorities attached to channels and send the message to the channel
- with the highest priority. That allows for routing algorithms such as:
- "Use local processing nodes if any are available. Send the requests to
- remote nodes only if there are no local ones available." Alternatively,
- the endpoint may implement weighted priorities ("send 20% of the
- requests to node A and 80% to node B"). The endpoint also may not
- implement any prioritisation strategy and treat all channels as
- equal.</t>
- <t>Whatever the case, two rules must apply.</t>
- <t>First, by default the priority settings for all channels MUST be
- equal. Creating a channel with different priority MUST be triggered
- by an explicit action by the user.</t>
- <t>Second, if there are several channels with equal priority, the
- endpoint MUST distribute the messages among them in a fair fashion
- using the round-robin algorithm. The round-robin implementation MUST
- also take care not to become unfair when new channels are added or old
- ones are removed on the fly.</t>
- <t>As for incoming messages, i.e. replies, the REQ endpoint MUST
- fair-queue them. In other words, if there are replies available on
- several channels, it MUST receive them in a round-robin fashion. It
- MUST also take care not to compromise the fairness when new channels
- are added or old ones removed.</t>
- <t>In addition to providing basic fairness, the goal of fair-queueing
- is to prevent DoS attacks where a huge stream of fake replies from one
- channel would be able to block the real replies coming from different
- channels. Fair queueing ensures that messages from every channel are
- received at approximately the same rate. That way, a DoS attack can
- slow down the system but it can't entirely block it.</t>
- <t>Incoming replies MUST be handed to the user exactly as they were
- received. The REQ endpoint MUST NOT modify the replies in any way.</t>
- </section>
- <section title = "REP endpoint">
- <t>REP endpoint is used to receive requests from the clients and send
- replies back to the clients.</t>
- <t>First of all, REP socket is responsible for assigning unique 31-bit
- channel IDs to the individual associated channels.</t>
- <t>The first ID assigned MUST be random. Each subsequent ID is computed
- by adding 1 to the previous one, with potential overflow to 0.</t>
- <t>The implementation MUST ensure that the random number is different
- each time the endpoint is re-started, the process that contains
- it is restarted or similar. So, for example, using a pseudo-random
- generator with a constant seed won't do.</t>
- <t>The goal of the algorithm is to spread the possible channel ID
- values and thus minimise the chance that a reply is routed to an
- unrelated channel, even in the face of intermediate node
- failures.</t>
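- <t>A minimal sketch of the ID assignment in C follows; seeding from
- the clock and the process ID stands in for a proper source of
- randomness and is an assumption of the example, not a requirement of
- this specification:</t>
- <figure>
- <artwork>
- #include <stdint.h>
- #include <stdlib.h>
- #include <time.h>
- #include <unistd.h>
- 
- static uint32_t next_channel_id;
- static int seeded;
- 
- uint32_t assign_channel_id (void)
- {
-     if (!seeded) {
-         /* A constant seed would defeat the purpose above. */
-         srandom ((unsigned) time (NULL) ^ (unsigned) getpid ());
-         next_channel_id = (uint32_t) random () & 0x7fffffffu;
-         seeded = 1;
-     }
-     uint32_t id = next_channel_id;
-     next_channel_id = (next_channel_id + 1) & 0x7fffffffu; /* wrap */
-     return id;
- }
- </artwork>
- </figure>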
- <t>When receiving a message, the REP endpoint MUST fair-queue among the
- channels available for receiving. In other words it should
- round-robin among such channels and receive one request from
- a channel at a time. It MUST also implement the round-robin
- algorithm in such a way that adding or removing channels doesn't
- break its fairness.</t>
- <t>In addition to guaranteeing basic fairness in access to computing
- resources, the above algorithm makes it impossible for a malevolent
- or misbehaving client to completely block the processing of requests
- from other clients by issuing a steady stream of requests.</t>
- <t>After getting hold of a request, the REP socket should prepend it
- with a 32-bit value, consisting of 1 bit set to 0 followed by the
- 31-bit ID of the channel the request was received from. The extended
- request is then handed to the user.</t>
- <t>The goal of adding the channel ID to the request is to be able to
- route the reply back to the original channel later on. Thus, when
- the user sends a reply, the endpoint strips the first 32 bits off and
- uses the value to determine where it is to be routed.</t>
- <t>If the reply is shorter than 32 bits, it is malformed and
- the endpoint MUST ignore it. Also, if the most significant bit of the
- 32-bit value isn't set to 0, the reply is malformed and MUST
- be ignored.</t>
- <t>Otherwise, the endpoint checks whether its table of associated
- channels contains the channel with the corresponding ID. If so, it
- sends the reply (with the first 32 bits stripped off) to that channel.
- If the channel is not found, the reply MUST be dropped. If the
- channel is not available for sending, i.e. it is applying
- backpressure, the reply MUST be dropped.</t>
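- <t>The two directions of this hop-by-hop logic can be sketched in C
- as follows; the helper functions are hypothetical and network byte
- order of the tags is assumed for the sake of the example:</t>
- <figure>
- <artwork>
- #include <stdint.h>
- #include <string.h>
- #include <arpa/inet.h>
- 
- /* Prepends 0 | 31-bit channel ID to an incoming request.
-    'out' must have room for len + 4 bytes. */
- void rep_tag_request (uint32_t channel_id, const uint8_t *req,
-     size_t len, uint8_t *out)
- {
-     uint32_t tag = htonl (channel_id & 0x7fffffffu); /* MSB is 0 */
-     memcpy (out, &tag, 4);
-     memcpy (out + 4, req, len);
- }
- 
- /* Extracts the channel ID from an outgoing reply.  Returns the ID,
-    or -1 if the reply is malformed and has to be dropped. */
- long rep_route_reply (const uint8_t *reply, size_t len)
- {
-     uint32_t tag;
-     if (len < 4)
-         return -1;      /* shorter than 32 bits: malformed */
-     memcpy (&tag, reply, 4);
-     tag = ntohl (tag);
-     if (tag & 0x80000000u)
-         return -1;      /* MSB not 0: malformed */
-     return (long) tag;  /* look up the channel; drop if absent */
- }
- </artwork>
- </figure>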
- <t>Note that when the reply is unroutable, two things might have
- happened. Either there was some kind of network disruption, in which
- case the request will be re-sent later on, or the original client
- has failed or been shut down. In the latter case the request won't be
- resent; however, that doesn't really matter because there's no one to
- deliver the reply to any more anyway.</t>
- <t>Unlike requests, there's no pushback applied to the replies; they
- are simply dropped. If the endpoint blocked and waited for the channel
- to become available, all the subsequent replies, possibly destined for
- different unblocked channels, would be blocked in the meantime. That
- would allow for a DoS attack simply by firing a lot of requests and
- not receiving the replies.</t>
- </section>
-
- </section>
- <section title = "End-to-end functionality">
- <t>End-to-end functionality is built on top of hop-by-hop
- functionality. Thus, an endpoint on the edge of a topology contains
- all the hop-by-hop functionality, but also implements additional
- functionality of its own. This end-to-end functionality acts
- basically as a user of the underlying hop-by-hop functionality.</t>
- <section title = "REQ endpoint">
- <t>End-to-end functionality for REQ sockets is concerned with re-sending
- the requests in case of failure and with filtering out stray or
- outdated replies.</t>
- <t>To be able to do the latter, the endpoint must tag the requests with
- unique 31-bit request IDs. The first request ID is picked at random;
- all subsequent request IDs are generated by adding 1 to the previous
- one, possibly overflowing to 0.</t>
- <t>To improve robustness of the system, the implementation MUST ensure
- that the random number is different each time the endpoint, the
- process or the machine is restarted. A pseudo-random generator with
- a fixed seed won't do.</t>
- <t>When the user asks the endpoint to send a message, the endpoint
- prepends a 32-bit value to the message, consisting of a single bit set
- to 1 followed by a 31-bit request ID, and passes it on in the standard
- hop-by-hop way.</t>
- <t>If the hop-by-hop layer reports pushback condition, the end-to-end
- layer considers the request unsent and MUST report pushback condition
- to the user.</t>
- <t>If the request is successfully sent, the endpoint stores the request
- including its request ID, so that it can be resent later on if
- needed. At the same time it sets up a timer to trigger the
- re-transmission in case the reply is not received within a specified
- timeout. The user MUST be allowed to specify the timeout interval.
- The default timeout interval must be 60 seconds.</t>
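- <t>A minimal, non-normative C sketch of this send path follows. The
- pending_request record and its fields are hypothetical bookkeeping,
- not part of the wire format:</t>
- <figure>
- <artwork>
- #include <stdint.h>
- #include <string.h>
- #include <time.h>
- #include <arpa/inet.h>
- 
- #define REQ_RESEND_IVL 60   /* default timeout, in seconds */
- 
- struct pending_request {
-     uint32_t request_id;   /* 31-bit request ID */
-     uint8_t *stored_copy;  /* tagged request kept for re-sending */
-     size_t len;
-     time_t resend_at;      /* when to re-send if no reply arrives */
- };
- 
- /* Prepends 1 | 31-bit request ID to the payload and fills in the
-    bookkeeping record.  'out' must have room for len + 4 bytes. */
- void req_tag_request (uint32_t request_id, const uint8_t *payload,
-     size_t len, uint8_t *out, struct pending_request *rec)
- {
-     uint32_t tag = htonl (0x80000000u | (request_id & 0x7fffffffu));
-     memcpy (out, &tag, 4);
-     memcpy (out + 4, payload, len);
-     rec->request_id = request_id & 0x7fffffffu;
-     rec->stored_copy = out;
-     rec->len = len + 4;
-     rec->resend_at = time (NULL) + REQ_RESEND_IVL;
- }
- </artwork>
- </figure>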
- <t>When a reply is received from the underlying hop-by-hop
- implementation, the endpoint should strip off the first 32 bits of
- the reply to check whether it is a valid reply.</t>
- <t>If the reply is shorter than 32 bits, it is malformed and the
- endpoint MUST ignore it. If the most significant bit of the 32-bit
- value is set to 0, the reply is malformed and MUST be ignored.</t>
- <t>Otherwise, the endpoint should check whether the request ID in
- the reply matches any of the request IDs of the requests being
- processed at the moment. If not, the reply MUST be ignored.
- It is either a stray message or a duplicate reply.</t>
- <t>Please note that the endpoint can support either a single request
- or multiple requests being processed in parallel. Which one is the
- case depends on the API exposed to the user and is not part of this
- specification.</t>
- <t>If the ID in the reply matches one of the requests in progress, the
- reply MUST be passed to the user (with the 32-bit prefix stripped
- off). At the same time the stored copy of the original request as
- well as the re-transmission timer MUST be deallocated.</t>
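- <t>The reply validation described above can be sketched in C as
- follows; find_pending() is a hypothetical lookup into the endpoint's
- table of requests in progress:</t>
- <figure>
- <artwork>
- #include <stdint.h>
- #include <string.h>
- #include <arpa/inet.h>
- 
- struct pending_request;   /* as sketched earlier */
- 
- extern struct pending_request *find_pending (uint32_t request_id);
- 
- /* Returns the matching record, or NULL when the reply is malformed,
-    stray or duplicate and is to be ignored. */
- struct pending_request *req_match_reply (const uint8_t *reply,
-     size_t len)
- {
-     uint32_t tag;
-     if (len < 4)
-         return NULL;            /* malformed: too short */
-     memcpy (&tag, reply, 4);
-     tag = ntohl (tag);
-     if (!(tag & 0x80000000u))
-         return NULL;            /* malformed: MSB not set */
-     return find_pending (tag & 0x7fffffffu);
- }
- </artwork>
- </figure>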
- <t>Finally, REQ endpoint MUST make it possible for the user to cancel
- a particular request in progress. What it means technically is
- deleting the stored copy of the request and cancelling the associated
- timer. Thus, once the reply arrives, it will be discarded by the
- algorithm above.</t>
- <t>The cancellation allows, for example, the user to time out a
- request. They can simply post a request and if there's no answer in a
- specific timeframe, they can cancel it.</t>
- </section>
- <section title = "REP endpoint">
- <t>End-to-end functionality for REP endpoints is concerned with turning
- requests into corresponding replies.</t>
- <t>When the user asks to receive a request, the endpoint gets the next
- request from the hop-by-hop layer and splits it into the traceback
- stack and the message payload itself. The traceback stack is stored
- and the payload is returned to the user.</t>
- <t>The algorithm for splitting the request is as follows: Strip 32-bit
- tags from the message one by one. Once the most significant
- bit of a tag is set, we've reached the bottom of the traceback
- stack and the splitting is done. If the end of the message is reached
- without finding the bottom of the stack, the request is malformed and
- MUST be ignored.</t>
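- <t>A non-normative C sketch of the splitting algorithm follows; as in
- the earlier sketches, network byte order of the tags is assumed:</t>
- <figure>
- <artwork>
- #include <stdint.h>
- #include <string.h>
- #include <arpa/inet.h>
- 
- /* On success stores the traceback stack length (in bytes) and
-    returns 0; the payload then starts at offset *stack_len.
-    Returns -1 for a malformed request, which MUST be ignored. */
- int rep_split_request (const uint8_t *req, size_t len,
-     size_t *stack_len)
- {
-     size_t off = 0;
-     uint32_t tag;
-     while (off + 4 <= len) {
-         memcpy (&tag, req + off, 4);
-         off += 4;
-         if (ntohl (tag) & 0x80000000u) { /* bottom of the stack */
-             *stack_len = off;
-             return 0;
-         }
-     }
-     return -1;   /* end of message reached without the bottom tag */
- }
- </artwork>
- </figure>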
- <t>Note that the payload produced by this procedure is the same as the
- request payload sent by the original client.</t>
- <t>Once the user processes the request and sends the reply, the endpoint
- prepends the reply with the stored traceback stack and sends it on
- using the hop-by-hop layer. At that point the stored traceback stack
- MUST be deallocated.</t>
- <t>Additionally, the REP endpoint MUST support cancelling any request
- being processed at the moment. What it means, technically, is that the
- state associated with the request, i.e. the traceback stack stored
- by the endpoint, is deleted and the reply to that particular
- request is never sent.</t>
- <t>The most important use of cancellation is allowing the service
- instances to ignore malformed requests. If the application-level
- part of the request doesn't conform to the application protocol,
- the service can simply cancel the request. In such a case the reply
- is never sent. Of course, if the application wants to send an
- application-specific error message back to the client it can do so
- by not cancelling the request and sending a regular reply.</t>
- </section>
- </section>
- <section title = "Loop avoidance">
- <t>It may happen that a request/reply topology contains a loop. It
- becomes increasingly likely as the topology grows beyond the scope of
- a single organisation and there are multiple administrators involved
- in maintaining it. An unfortunate interaction between two perfectly
- legitimate setups can cause a loop to be created.</t>
- <t>With no additional guards against the loops, it's likely that
- requests will be caught inside the loop, rotating there forever,
- each message gradually growing in size as new prefixes are added to it
- by each REP endpoint on the way. Eventually, a loop can cause
- congestion and bring the whole system to a halt.</t>
- <t>To deal with the problem, REQ endpoints MUST check the depth of the
- traceback stack for every outgoing request and discard any request
- where it exceeds a certain threshold. The threshold should be defined
- by the user. The default value is suggested to be 8.</t>
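- <t>The check can be sketched in C as follows; the function counts the
- channel ID tags already present on an outgoing request and is, as
- before, only an illustration assuming network byte order:</t>
- <figure>
- <artwork>
- #include <stdint.h>
- #include <string.h>
- #include <arpa/inet.h>
- 
- /* Returns 1 when the request exceeds the hop threshold (or is
-    malformed) and is to be discarded, 0 otherwise. */
- int req_exceeds_hops (const uint8_t *msg, size_t len,
-     unsigned max_hops)
- {
-     size_t off = 0;
-     unsigned hops = 0;
-     uint32_t tag;
-     while (off + 4 <= len) {
-         memcpy (&tag, msg + off, 4);
-         off += 4;
-         if (ntohl (tag) & 0x80000000u)
-             return 0;    /* reached the request ID tag */
-         if (++hops > max_hops)
-             return 1;    /* traceback stack too deep */
-     }
-     return 1;   /* malformed: no bottom tag */
- }
- </artwork>
- </figure>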
- </section>
- <section anchor="IANA" title="IANA Considerations">
- <t>New SP endpoint types REQ and REP should be registered by IANA. For
- now, the value of 16 should be used for REQ endpoints and the value of
- 17 for REP endpoints.</t>
- </section>
- <section anchor="Security" title="Security Considerations">
- <t>The mapping is not intended to provide any additional security to the
- underlying protocol. DoS concerns are addressed within
- the specification.</t>
- </section>
- </middle>
- <back>
- <references>
- <reference anchor='SPoverTCP'>
- <front>
- <title>TCP mapping for SPs</title>
- <author initials='M.' surname='Sustrik' fullname='M. Sustrik'/>
- <date month='August' year='2013'/>
- </front>
- <format type='TXT' target='sp-tcp-mapping-01.txt'/>
- </reference>
- </references>
- </back>
- </rfc>