<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc category="info" docName="sp-surveyor-01">
<front>
<title abbrev="Surveyor/Respondent SP">
Surveyor/Respondent Scalability Protocol
</title>
<author fullname="Garrett D'Amore" initials="G." role="editor"
surname="D'Amore">
<address>
<email>garrett@damore.org</email>
</address>
</author>
<date month="March" year="2015" />
<area>Applications</area>
<workgroup>Internet Engineering Task Force</workgroup>
<keyword>Surveyor</keyword>
<keyword>Respondent</keyword>
<keyword>SURVEYOR</keyword>
<keyword>RESPONDENT</keyword>
<keyword>stateless</keyword>
<keyword>service</keyword>
<keyword>SP</keyword>
<abstract>
<t>This document defines a scalability protocol used for performing
surveys and collecting responses amongst a number of stateless
processing nodes, and returning the results of those
surveys. This protocol can be used for solving such problems
as voting (consensus algorithms), presence detection, and peer
discovery.</t>
</abstract>
</front>
<middle>
<section title = "Introduction">
<t>A fairly common problem in building distributed applications is
peer discovery, that is, how to find one's peers. For example, imagine
an internet chat application where a server wants to determine
the presence of all peers, perhaps including some information such
as their unique social networking handles.</t>
<t>A similar problem involves voting algorithms, where a survey
of all connected peers is required to arrive at a solution to
a problem. This is common with distributed consensus algorithms.</t>
<t>One of the most common problems in distributed applications is how to
delegate work to another processing node and get the result back to
the original node. In other words, the goal is to utilise the CPU
power of a remote node.</t>
<t>It turns out that these problems are very similar. We can assume
that potential participants will register with a central process. Once
that is done, the central process can send out a survey request
to the participants whenever it wants to perform a survey.</t>
<t>Also note that it is reasonable and possible for a participant to
decline to participate (i.e. decline to respond). This can happen
due to loss of network connectivity, or can represent a conscious
decision on the part of the respondent.</t>
<t>A real-world example of this would be asking audience
members to raise their hands if they like the color red. The act of
raising one's hand can be thought of as responding.</t>
<t>As a consequence, taken generally, the surveyor should not infer
anything about parties it does not get a response from. Perhaps the
respondent simply did not hear the question, or perhaps she declines
to self-identify.</t>
<t>This means that surveying should be thought of as a best-effort
service. Applications which need more resilience may repeat
their inquiries. It is common in other networking protocols to
do so periodically, and only "expire" the last response from a peer
after it has missed several successive surveys.</t>
<t>Furthermore, the act of asking a question has to be time bounded.
This is particularly important if multiple surveys are to be issued.
Sufficient time for responses to the first survey to arrive must
pass before starting a new one, unless some other identifying
content is present to distinguish the results of one survey from
another. (Going back to our raised hands, imagine two questions
asked in rapid succession, one asking whether you like the color red,
the other whether you like the color blue. If only one hand is used,
and there is not sufficient time between the questions, it becomes
impossible to distinguish which color is preferred. Of course, if one
uses two hands, one per question, the hand acts as a distinguishing
identifier and we can have two surveys running in parallel.
Fortunately, network protocols usually have more bits available
for conveying this kind of information.)</t>
<t>In all cases the act of surveying and replying can be thought of as
stateless. In other words, a given response should not depend upon
the content of any prior surveys. Ideally, because of the best-effort
nature of this, it is also beneficial if surveying is itself
idempotent, i.e. the act of responding to a survey should not itself
change state on the respondent.</t>
<t>Generally there are a few common scenarios that come up in real-world
situations. Here are some of them.
<list style = "numbers">
<t>One surveyor issues one survey, and then zero, one or many
respondents reply. The surveyor then collects these responses
over a period of time before issuing a new survey.</t>
<t>One surveyor issues multiple surveys, distinguishing which
replies belong to which survey based on some identifying content.
This is similar to ARP, where multiple requests can be
outstanding.</t>
<t>Multiple surveyors issue surveys, but each issues only one at a time.
Respondents reply to each of these as appropriate. For
example, imagine a network with two print clients and a number
of networked printers. Both clients may occasionally want
to inquire about supply levels, and since they don't talk to
each other, the replies may go to either system.</t>
<t>Multiple surveyors issue multiple surveys concurrently.
This is the combination of the second and third cases above.</t>
</list>
</t>
</section>
<section title = "Underlying protocol">
<t>The surveyor/respondent protocol can be run on top of any SP mapping,
such as the <xref target='SPoverTCP'>SP TCP mapping</xref>.
</t>
<t>Also, given that SP protocols describe the behaviour of an entire,
arbitrarily complex topology rather than of a single node-to-node
communication, several underlying protocols can be used in parallel.
For example, a client may send a survey via WebSocket, and then, at the
edge of the company network, an intermediary node may retransmit it
using TCP, etc.</t>
<figure>
<artwork>
+---+  WebSocket  +---+    TCP    +---+
|   |-------------|   |-----------|   |
+---+             +---+           +---+
                    |               |
      +---+   IPC   |               |  SCTP  +---+    DCCP   +---+
      |   |---------+               +--------|   |-----------|   |
      +---+                                  +---+           +---+
</artwork>
</figure>
</section>
<section title = "Overview of the algorithm">
<t>The surveyor/respondent protocol defines two different endpoint types:
the SURVEYOR and the RESPONDENT.</t>
<t>A SURVEYOR endpoint can be connected only to a RESPONDENT endpoint,
and vice versa. If the underlying protocol
indicates that there's an attempt to create a channel to an
incompatible endpoint, the channel MUST NOT be used. In the case of
the TCP mapping, for example, the underlying TCP connection MUST
be closed.</t>
<t>When creating more complex topologies, SURVEYOR and RESPONDENT
endpoints are paired in the intermediate nodes to form a
forwarding component, a so-called "device". The device receives
surveys on its RESPONDENT endpoint and forwards them through its
SURVEYOR endpoint. At the same time, it receives responses on its
SURVEYOR endpoint and forwards them through its RESPONDENT
endpoint:</t>
<figure>
<artwork>
               --- surveys --&gt;
+----------+   +------------+----------+   +------------+
|          |--&gt;|            |          |--&gt;|            |
| SURVEYOR |   | RESPONDENT | SURVEYOR |   | RESPONDENT |
|          |&lt;--|            |          |&lt;--|            |
+----------+   +------------+----------+   +------------+
               &lt;-- responses ---
</artwork>
</figure>
<t>Using devices, arbitrarily complex topologies can be built. The rest
of this section explains how surveys are routed through a topology
towards processing nodes and how responses are routed back from
processing nodes to the original surveyors.</t>
<t>Because the delivery of both surveys and responses is handled on
a best-effort basis, when the transport is faced with pushback, it
is acceptable for the implementation to drop the message.</t>
<t>Applications expecting resilience in the face of such events should
expect to perform multiple surveys over time; a failure to respond
to a survey shall not be taken as a critical fault.</t>
<t>As for delivering replies back to the clients, it should be understood
that the client may not be directly accessible (say using TCP/IP) from
the processing node. It may be behind a firewall, have no static IP
address, etc. Furthermore, the client and the processing node may not
even speak the same transport protocol; imagine a client connecting to
the topology using WebSockets and a processing node using SCTP.</t>
<t>Given the above, it becomes obvious that the replies must be routed
back through the existing topology rather than directly. In fact, a
surveyor/respondent topology may be thought of as an overlay network
on top of the underlying transport mechanisms.</t>
<t>As for routing replies within the surveyor/respondent topology, it
is designed in such a way that each reply contains the whole routing
path, rather than containing just the address of the destination node,
as is the case with, for example, TCP/IP.</t>
<t>The downside of the design is that surveys and responses are a
little bit longer. Also, this design assumes symmetric connectivity
in the underlying transports.</t>
<t>The upside, on the other hand, is that the nodes in the topology don't
have to maintain any routing tables besides the simple table of
adjacent channels along with their IDs. There's also no need for any
additional protocols for distributing routing information within
the topology.</t>
<t>The most important reason for adopting the design, though, is that
there's no propagation delay and any node becomes accessible
immediately after it is started. Given that some nodes in the topology
may be extremely short-lived, this is a crucial requirement. Imagine
a database client that sends a survey, gets a single response, and
can then immediately answer its question. (Think of a simple question
like "is anyone here?" A single reply suffices to answer it.)
It makes no sense to delay the whole process until the routing tables
are synchronised between the client and the server.</t>
<t>The algorithm thus works as follows: When a survey is routed from the
client to the processing node, every RESPONDENT endpoint determines
which channel it was received from and adds the ID of the channel to
the survey. Thus, when the survey arrives at the ultimate respondent,
it already contains a full backtrace stack, which in turn contains
all the information needed to route a message back to the original
surveyor.</t>
<t>After processing the survey, the responding node attaches the
backtrace stack from the survey to the response and sends it back
into the topology. At that point every RESPONDENT endpoint can check
the backtrace stack and determine which channel it should send the
reply to.</t>
<t>In addition to routing, the surveyor/respondent protocol takes care of
matching responses and surveys. That is, it can ensure that a given
response cannot be mismatched to a different survey.</t>
<t>In order to avoid confusion, after the surveyor has received all the
responses it expects (typically when a period of time has passed),
it should discard further stray responses.</t>
<t>The surveyor thus adds a unique survey ID to the survey. The ID gets
copied from the survey to the response by the responding node. When the
response gets back to the surveyor, it can simply check whether the
survey in question is still outstanding and, if not, ignore the
response.</t>
<t>To implement all the functionality described above, messages (both
surveys and responses) have the following format:</t>
<figure>
<artwork>
+-+------------+-+------------+   +-+------------+-------------+
|0| Channel ID |0| Channel ID |...|1| Survey ID  |   payload   |
+-+------------+-+------------+   +-+------------+-------------+
</artwork>
</figure>
<t>The payload of the message is preceded by a stack of 32-bit tags.
The most significant bit of each tag is set to 0 except for the very
last tag, where it is set to 1.
That allows the algorithm to find out where the tags end and where
the message payload begins.</t>
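<t>The following Python sketch, which is illustrative and not part of
this specification, shows how the tag stack could be built and split.
It assumes that each 32-bit tag is transmitted in network (big-endian)
byte order, which this section does not state; the names make_tag and
split_tags are hypothetical.</t>
<figure>
<artwork type="python"><![CDATA[
import struct

END_BIT = 0x80000000  # top bit: set only on the last tag
ID_MASK = 0x7FFFFFFF  # low 31 bits: channel ID or survey ID


def make_tag(value, last):
    """Encode a 31-bit ID as a 32-bit tag (big-endian assumed)."""
    if not 0 <= value <= ID_MASK:
        raise ValueError("ID must fit in 31 bits")
    return struct.pack("!I", (value | END_BIT) if last else value)


def split_tags(message):
    """Return (ids, payload), reading tags until the top bit is set.

    Raises ValueError if the message ends before such a tag.
    """
    ids = []
    offset = 0
    while offset + 4 <= len(message):
        (tag,) = struct.unpack_from("!I", message, offset)
        offset += 4
        ids.append(tag & ID_MASK)
        if tag & END_BIT:
            return ids, message[offset:]
    raise ValueError("malformed: no tag with the top bit set")
]]></artwork>
</figure>
<t>For instance, a message carrying channel IDs 446 and 299 and survey
ID 823 followed by "Hello" splits into the ID list 446, 299, 823 and
the unchanged payload "Hello".</t>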
<t>As for the remaining 31 bits, they hold either the survey ID (in the
last tag) or a channel ID (in all the remaining tags). The first
channel ID is added and processed by the RESPONDENT endpoint closest
to the processing node. The last channel ID is added and processed by
the RESPONDENT endpoint closest to the client.</t>
<t>The following picture shows an example of a survey saying "Hello"
being routed from the client through two intermediate nodes to the
processing node, and the response "World" being routed back. It shows
what messages are passed over the network at each step of the
process:</t>
<figure>
<artwork>
                       client
          Hello          |           World
            |      +------------+      ^
            |      |  SURVEYOR  |      |
            V      +------------+      |
       1|823|Hello                1|823|World
            |      +------------+      ^
            |      | RESPONDENT |      |
            |      +------------+      |
            |      |  SURVEYOR  |      |
            V      +------------+      |
    0|299|1|823|Hello          0|299|1|823|World
            |      +------------+      ^
            |      | RESPONDENT |      |
            |      +------------+      |
            |      |  SURVEYOR  |      |
            V      +------------+      |
 0|446|0|299|1|823|Hello    0|446|0|299|1|823|World
            |      +------------+      ^
            |      | RESPONDENT |      |
            V      +------------+      |
          Hello          |           World
                       server
</artwork>
</figure>
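<t>The routing in this example can be replayed with a short Python
sketch. It is illustrative only: it assumes big-endian tags, models
each device solely by the channel ID its RESPONDENT endpoint assigns
(299 and 446, as in the picture), and omits the tag that the server's
own RESPONDENT endpoint adds and strips locally. The function names
are hypothetical.</t>
<figure>
<artwork type="python"><![CDATA[
import struct

END_BIT = 0x80000000


def tag(value, last=False):
    """Encode a 31-bit ID as a 32-bit tag (big-endian assumed)."""
    return struct.pack("!I", (value | END_BIT) if last else value)


def respondent_receive(message, channel_id):
    """Hop-by-hop RESPONDENT: prepend the inbound channel's ID."""
    return tag(channel_id) + message


def respondent_send(response):
    """Hop-by-hop RESPONDENT: pop the first tag to pick a channel."""
    if len(response) < 4:
        raise ValueError("malformed: shorter than 32 bits")
    (first,) = struct.unpack_from("!I", response)
    if first & END_BIT:
        raise ValueError("malformed: top bit of a channel tag is set")
    return first, response[4:]


def route_example():
    """Replay the picture: devices with channel IDs 299 and 446."""
    down = [tag(823, last=True) + b"Hello"]  # what the client sends
    for channel_id in (299, 446):  # client-side device first
        down.append(respondent_receive(down[-1], channel_id))
    # The server keeps the tag stack and replaces the payload.
    up = [down[-1][:-len(b"Hello")] + b"World"]
    routed_via = []
    for _ in range(2):  # server-side device first
        channel_id, rest = respondent_send(up[-1])
        routed_via.append(channel_id)
        up.append(rest)
    return down, up, routed_via
]]></artwork>
</figure>
<t>In this sketch the response leaves the server-side device on channel
446 and the client-side device on channel 299, and arrives at the
client carrying only the survey ID tag.</t>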
</section>
<section title = "Hop-by-hop vs. End-to-end">
<t>All endpoints implement so-called "hop-by-hop" functionality. It's
the functionality concerned with sending messages to the immediately
adjacent components and receiving messages from them.</t>
<t>To make an analogy with the TCP/IP stack, IP provides hop-by-hop
functionality, i.e. routing of the packets to the adjacent node,
while TCP implements end-to-end functionality such as resending of
lost packets.</t>
<t>As a rule of thumb, raw hop-by-hop endpoints are used to build
devices (intermediary nodes in the topology) while end-to-end
endpoints are used directly by the applications.</t>
<t>To prevent confusion, the specification of the endpoint behaviour
below will discuss hop-by-hop and end-to-end functionality in
separate sections.</t>
</section>
<section title = "Hop-by-hop functionality">
<section title = "SURVEYOR endpoint">
<t>The SURVEYOR endpoint is used by the user to send surveys to the
responding nodes and receive the responses afterwards.</t>
<t>When the user asks the SURVEYOR endpoint to send a survey, the
endpoint should send it to ALL of the associated outbound channels
(TCP connections or similar). The survey sent is exactly the message
supplied by the user. SURVEYOR sockets MUST NOT modify an outgoing
survey in any way.</t>
<t>If there's no channel to send the survey to, the survey is merely
discarded. The endpoint MAY report the backpressure condition to
the user as well.</t>
<t>If there are associated channels but none of them is available for
sending, i.e. all of them are already reporting backpressure, the
endpoint won't send the message and MAY report the backpressure
condition to the user. The actual survey is discarded.</t>
<t>If the channel is not capable of reporting backpressure (e.g. DCCP)
the endpoint SHOULD consider it as always available for sending new
surveys.</t>
<t>When there are multiple channels available for sending the survey,
the endpoint MUST deliver the survey to all of them.</t>
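<t>A minimal Python sketch of this send path follows. It assumes a
hypothetical Channel type whose blocked flag stands in for
backpressure; a channel that cannot report backpressure simply never
sets it. It is illustrative, not a reference implementation.</t>
<figure>
<artwork type="python"><![CDATA[
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Hypothetical outbound channel; 'blocked' models backpressure."""
    name: str
    blocked: bool = False
    sent: list = field(default_factory=list)


def surveyor_send(channels, survey):
    """Deliver an unmodified copy to every channel that can accept it.

    Returns the number of channels reached; 0 means the survey was
    discarded and the caller MAY report backpressure to the user.
    """
    delivered = 0
    for channel in channels:
        if channel.blocked:
            continue  # best effort: skip it, keep trying the rest
        channel.sent.append(survey)  # the same bytes go to every peer
        delivered += 1
    return delivered
]]></artwork>
</figure>
<t>The text does not say whether backpressure should be reported when
only some channels are blocked; in this sketch the caller can tell only
that nothing was delivered, when the return value is 0.</t>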
<t>As for incoming messages, i.e. responses, SURVEYOR endpoints MUST
fair-queue them. In other words, if there are replies available
on several channels, they MUST be received in a round-robin fashion.
Endpoints must also take care not to compromise the fairness when new
channels are added or old ones removed.</t>
<t>In addition to providing basic fairness, the goal of fair-queueing is
to prevent DoS attacks where a huge stream of fake responses from one
channel would be able to block the real replies coming from different
channels. Fair queueing ensures that messages from every channel are
received at approximately the same rate. That way, a DoS attack can
slow down the system but it can't entirely block it.</t>
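<t>One possible way to meet the fair-queueing requirement is sketched
below in Python. The FairQueue class and its methods are hypothetical;
the sketch takes at most one message from a channel per turn and keeps
the rotation order of the other channels intact when a channel is
added or removed.</t>
<figure>
<artwork type="python"><![CDATA[
from collections import deque


class FairQueue:
    """Round-robin receive across inbound channels (sketch)."""

    def __init__(self):
        self._order = deque()  # channel IDs in round-robin order
        self._pending = {}     # channel ID -> deque of messages

    def add_channel(self, cid):
        # A new channel joins at the back; others keep their turn.
        self._pending[cid] = deque()
        self._order.append(cid)

    def remove_channel(self, cid):
        # The relative order of the remaining channels is unchanged.
        del self._pending[cid]
        self._order.remove(cid)

    def deliver(self, cid, message):
        # Called when a message arrives on channel 'cid'.
        self._pending[cid].append(message)

    def recv(self):
        """Return (channel ID, message), or None if all are empty."""
        for _ in range(len(self._order)):
            cid = self._order[0]
            self._order.rotate(-1)  # this channel moves to the back
            if self._pending[cid]:
                return cid, self._pending[cid].popleft()
        return None
]]></artwork>
</figure>
<t>A channel flooding the endpoint therefore gets one message through
per round, interleaved with messages from every other channel. The
same structure could serve RESPONDENT endpoints, which have the same
fair-queueing requirement.</t>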
<t>Incoming responses MUST be handed to the user exactly as they were
received. SURVEYOR endpoints MUST NOT modify the responses in any
way.</t>
</section>
<section title = "RESPONDENT endpoint">
<t>RESPONDENT endpoints are used to receive surveys from the clients
and send responses back to the clients.</t>
<t>First of all, each RESPONDENT socket is responsible for assigning
unique 31-bit channel IDs to the individual associated channels.</t>
<t>The first ID assigned MUST be random. Each subsequent ID is computed
by adding 1 to the previous one, wrapping around to 0 on overflow.</t>
<t>The implementation MUST ensure that the random number is different
each time the endpoint is restarted, the process that contains
it is restarted, or similar. So, for example, using a pseudo-random
generator with a constant seed won't do.</t>
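<t>Channel ID assignment could look like the following Python sketch.
It assumes that Python's secrets module, which draws from the operating
system's randomness source rather than a seeded generator, is an
adequate source for the random starting value. The ChannelIdAllocator
name and its optional start parameter, which exists only for testing,
are hypothetical.</t>
<figure>
<artwork type="python"><![CDATA[
import secrets

ID_MASK = 0x7FFFFFFF  # channel IDs are 31 bits wide


class ChannelIdAllocator:
    """31-bit channel IDs: random first value, then +1 with wrap."""

    def __init__(self, start=None):
        # 'start' is for tests only; normally the first ID is random
        # and differs on every restart.
        if start is None:
            self._next = secrets.randbits(31)
        else:
            self._next = start & ID_MASK

    def allocate(self):
        cid = self._next
        self._next = (self._next + 1) & ID_MASK  # overflow wraps to 0
        return cid
]]></artwork>
</figure>
<t>The sketch does not skip IDs that are still in use after a full
wraparound of the 31-bit space.</t>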
<t>The goal of the algorithm is to spread the possible channel ID
values and thus minimise the chance that a response is routed to an
unrelated channel, even in the face of intermediate node
failures.</t>
<t>When receiving a message, RESPONDENT endpoints MUST fair-queue
among the channels available for receiving. In other words, they
should round-robin among such channels and receive one survey from
a channel at a time. They MUST also implement the round-robin
algorithm in such a way that adding or removing channels doesn't
break the fairness.</t>
<t>In addition to guaranteeing basic fairness in access to computing
resources, the above algorithm makes it impossible for a malevolent
or misbehaving client to completely block the processing of surveys
from other clients by issuing a steady stream of surveys.</t>
<t>After receiving the survey, the RESPONDENT socket should prepend to
it a 32-bit value consisting of 1 bit set to 0 followed by the 31-bit
ID of the channel the survey was received from. The extended survey
is then handed to the user.</t>
<t>The goal of adding the channel ID to the survey is to be able to
route the response back to the original channel later on. Thus, when
the user sends a response, the endpoint strips the first 32 bits off
and uses the value to determine where it is to be routed.</t>
<t>If the response is shorter than 32 bits, it is malformed and
the endpoint MUST ignore it. Also, if the most significant bit of the
32-bit value isn't set to 0, the response is malformed and MUST
be ignored.</t>
<t>Otherwise, the endpoint checks whether its table of associated
channels contains the channel with a corresponding ID. If so, it
sends the response (with the first 32 bits stripped off) to that
channel. If the channel is not found, the response MUST be dropped.
If the channel is not available for sending, i.e. it is applying
backpressure, the response MUST be dropped.</t>
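<t>A Python sketch of this hop-by-hop RESPONDENT behaviour follows.
Channels are modelled by a hypothetical dictionary of outboxes and a
set of blocked channel IDs standing in for backpressure, and
big-endian tags are assumed. A response that cannot be delivered is
dropped immediately rather than waited on.</t>
<figure>
<artwork type="python"><![CDATA[
import struct

END_BIT = 0x80000000


class RespondentHop:
    """Hop-by-hop RESPONDENT routing (illustrative sketch)."""

    def __init__(self):
        self.outboxes = {}    # channel ID -> list of sent messages
        self.blocked = set()  # channel IDs applying backpressure

    def on_survey(self, channel_id, survey):
        # Prepend a tag: one 0 bit, then the 31-bit channel ID.
        return struct.pack("!I", channel_id) + survey

    def send_response(self, response):
        """Route by the first tag; return True if it was sent."""
        if len(response) < 4:
            return False  # malformed: shorter than 32 bits
        (first,) = struct.unpack_from("!I", response)
        if first & END_BIT:
            return False  # malformed: top bit is not 0
        if first not in self.outboxes or first in self.blocked:
            return False  # unknown or blocked channel: drop it
        self.outboxes[first].append(response[4:])
        return True
]]></artwork>
</figure>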
<t>Note that when the response is unroutable, two things might have
happened. Either there was some kind of network disruption, in which
case the survey may be re-sent later on, or the original client
has failed or been shut down. In the latter case the survey won't be
re-sent; however, it doesn't really matter, because there's no one to
deliver the response to any more anyway.</t>
<t>Unlike surveys, responses never have pushback applied to them; they
are simply dropped. If the endpoint blocked and waited for the
channel to become available, all the subsequent replies, possibly
destined for different unblocked channels, would be blocked in the
meantime. That would allow for a DoS attack simply by firing a lot of
surveys and not receiving the responses.</t>
</section>
</section>
<section title = "End-to-end functionality">
<t>End-to-end functionality is built on top of hop-by-hop functionality.
Thus, an endpoint on the edge of a topology contains all the
hop-by-hop functionality, but also implements additional
functionality of its own. This end-to-end functionality acts
basically as a user of the underlying hop-by-hop functionality.</t>
<section title = "SURVEYOR endpoint">
<t>End-to-end functionality for SURVEYOR sockets is concerned with
matching the responses to surveys, and with filtering out stray or
outdated responses.</t>
<t>To be able to do this, the endpoint must tag each survey with a
unique 31-bit survey ID. The first survey ID is picked at random. All
subsequent survey IDs are generated by adding 1 to the last survey
ID, wrapping around to 0 on overflow.</t>
<t>To improve robustness of the system, the implementation MUST ensure
that the random number is different each time the endpoint, the
process or the machine is restarted. A pseudo-random generator with
a fixed seed won't do.</t>
<t>When the user asks the endpoint to send a message, the endpoint
prepends a 32-bit value to the message, consisting of a single bit set
to 1 followed by a 31-bit survey ID, and passes it on in the standard
hop-by-hop way.</t>
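<t>The survey ID handling can be sketched in Python as follows,
assuming big-endian tags and using the secrets module for the random
first ID. SurveyIdGenerator and tag_survey are hypothetical names used
only for illustration.</t>
<figure>
<artwork type="python"><![CDATA[
import secrets
import struct

END_BIT = 0x80000000
ID_MASK = 0x7FFFFFFF


class SurveyIdGenerator:
    """31-bit survey IDs: random first value, then +1 with wrap."""

    def __init__(self):
        self._next = secrets.randbits(31)  # differs on every restart

    def next_id(self):
        sid = self._next
        self._next = (self._next + 1) & ID_MASK  # overflow wraps to 0
        return sid


def tag_survey(survey_id, payload):
    """Prepend a 1 bit and the 31-bit survey ID to the payload."""
    return struct.pack("!I", END_BIT | (survey_id & ID_MASK)) + payload
]]></artwork>
</figure>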
<t>If the hop-by-hop layer reports a pushback condition, the end-to-end
layer considers the survey unsent and MAY report the pushback
condition to the user.</t>
<t>If the survey is successfully sent, the endpoint stores the survey,
including its survey ID, so that it can be resent later on if
needed. At the same time, it sets up a timer for collecting the
responses. The user MUST be allowed to specify the timeout interval.
The default timeout interval must be 60 seconds.</t>
<t>When a response is received from the underlying hop-by-hop
implementation, the endpoint should strip off the first 32 bits of
the response to check whether it is a valid reply.</t>
<t>If the response is shorter than 32 bits, it is malformed and the
endpoint MUST ignore it. If the most significant bit of the 32-bit
value is set to 0, the reply is malformed and MUST be ignored.</t>
<t>Otherwise, the endpoint should check whether the survey ID in
the response matches any of the survey IDs of the surveys being
processed at the moment. If not, the response MUST be ignored.
It is either a stray message or a response that was delayed too
long.</t>
<t>Please note that the endpoint can support either a single survey or
multiple surveys being processed in parallel. Which one is the case
depends on the API exposed to the user and is not part of this
specification.</t>
<t>If the ID in the response matches one of the surveys in progress, the
response MUST be passed to the user (with the 32-bit prefix stripped
off).</t>
<t>A SURVEYOR endpoint MUST make it possible for the user to
cancel a particular survey in progress. Technically, this means
deleting the stored copy of the survey and cancelling the associated
timer. Thus, if a response arrives afterwards, it will be discarded
by the algorithm above.</t>
<t>Finally, when the timeout for a survey expires, the survey
must be cancelled in a manner similar to user-initiated cancellation.
That is, the stored copy of the survey must be deleted, the timer
removed, and any further responses received with the same survey ID
are subsequently discarded.</t>
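<t>The response filtering, timeout and cancellation rules above can be
sketched in Python as follows. This is illustrative only: instead of a
real timer, the hypothetical PendingSurveys class records a deadline
per survey ID and checks it when a response arrives, the injectable
clock parameter exists only to make the sketch testable, and the
stored survey copy is omitted because only the ID is needed for
matching.</t>
<figure>
<artwork type="python"><![CDATA[
import struct
import time

END_BIT = 0x80000000
ID_MASK = 0x7FFFFFFF
DEFAULT_TIMEOUT = 60.0  # seconds, the default given above


class PendingSurveys:
    """Surveys in progress, keyed by survey ID (illustrative sketch)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the sketch can be tested
        self._deadlines = {}  # survey ID -> absolute deadline

    def start(self, survey_id, timeout=DEFAULT_TIMEOUT):
        self._deadlines[survey_id] = self._clock() + timeout

    def cancel(self, survey_id):
        # User cancellation and timeout expiry are handled identically.
        self._deadlines.pop(survey_id, None)

    def accept(self, response):
        """Return the payload of a matching response, or None."""
        if len(response) < 4:
            return None  # malformed: shorter than 32 bits
        (first,) = struct.unpack_from("!I", response)
        if not (first & END_BIT):
            return None  # malformed: top bit is 0
        survey_id = first & ID_MASK
        deadline = self._deadlines.get(survey_id)
        if deadline is None:
            return None  # stray, cancelled or unknown survey
        if self._clock() >= deadline:
            self.cancel(survey_id)  # timer expired: cancel the survey
            return None
        return response[4:]
]]></artwork>
</figure>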
</section>
<section title = "RESPONDENT endpoint">
<t>End-to-end functionality for RESPONDENT endpoints is concerned with
turning surveys into corresponding responses.</t>
<t>When the user asks to receive a survey, the endpoint gets the next
survey from the hop-by-hop layer and splits it into the traceback
stack and the message payload itself. The traceback stack is stored
and the payload is returned to the user.</t>
<t>The algorithm for splitting the survey is as follows: Strip 32-bit
tags from the message one by one. Once the most significant
bit of a tag is set, we've reached the bottom of the traceback
stack and the splitting is done. If the end of the message is reached
without finding the bottom of the stack, the survey is malformed and
MUST be ignored.</t>
<t>Note that the payload produced by this procedure is the same as the
survey payload sent by the original client.</t>
<t>Once the user processes the survey and sends the response, the
endpoint prepends the response with the stored traceback stack and
sends it on using the hop-by-hop layer. At that point the stored
traceback stack MUST be deallocated.</t>
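<t>A Python sketch of the end-to-end RESPONDENT behaviour, handling one
survey at a time, follows. The RespondentEndpoint class is
hypothetical and big-endian tags are assumed; its cancel method
corresponds to the cancellation described next.</t>
<figure>
<artwork type="python"><![CDATA[
import struct

END_BIT = 0x80000000


class RespondentEndpoint:
    """End-to-end RESPONDENT holding one traceback stack (sketch)."""

    def __init__(self):
        self._stack = None  # traceback stack of the current survey

    def receive(self, message):
        """Store the traceback stack and return the payload, or
        return None (survey ignored) if no tag has the top bit set."""
        offset = 0
        while offset + 4 <= len(message):
            (tag,) = struct.unpack_from("!I", message, offset)
            offset += 4
            if tag & END_BIT:  # bottom of the stack reached
                self._stack = message[:offset]
                return message[offset:]
        return None  # malformed survey

    def reply(self, payload):
        """Prepend the stored stack, then release it."""
        if self._stack is None:
            raise RuntimeError("no survey in progress")
        response, self._stack = self._stack + payload, None
        return response  # handed to the hop-by-hop layer

    def cancel(self):
        # Forget the stack so that no response is ever sent.
        self._stack = None
]]></artwork>
</figure>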
<t>Additionally, RESPONDENT endpoints MUST support cancelling any
survey being processed at the moment. Technically, this means that
the state associated with the survey, i.e. the traceback stack
stored by the endpoint, is deleted and a response to that particular
survey is never sent.</t>
<t>The most important use of cancellation is allowing the service
instances to ignore surveys (whether due to malformation or for
other application-specific reasons). In such a case the reply
is never sent. Of course, if the application wants to send an
application-specific error message back to the client, it can do so
by not cancelling the survey and sending a regular response.</t>
</section>
</section>
<section title = "Loop avoidance">
<t>It may happen that a surveyor/respondent topology contains a loop.
This becomes increasingly likely as the topology grows beyond the
scope of a single organisation and there are multiple administrators
involved in maintaining it. Unfortunate interaction between two
perfectly legitimate setups can cause a loop to be created.</t>
<t>With no additional guards against loops, it's likely that
surveys will be caught inside the loop, rotating there forever,
each message gradually growing in size as new prefixes are added to it
by each RESPONDENT endpoint on the way. Eventually, a loop can cause
congestion and bring the whole system to a halt.</t>
<t>To deal with the problem, SURVEYOR endpoints MUST check the depth of
the traceback stack for every outgoing survey and discard any surveys
where it exceeds a certain threshold. The threshold SHOULD be defined
by the user. The default value is suggested to be 8.</t>
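<t>A Python sketch of this check follows. Whether the survey ID tag at
the bottom of the stack counts toward the depth is not stated here;
this sketch counts every tag, including that one, which is an
assumption. Big-endian tags are assumed and the function names are
hypothetical.</t>
<figure>
<artwork type="python"><![CDATA[
import struct

END_BIT = 0x80000000
DEFAULT_MAX_DEPTH = 8  # the suggested default threshold


def traceback_depth(message):
    """Count tags up to and including the one with the top bit set;
    return None if the message has no such tag."""
    depth = 0
    for offset in range(0, len(message) - 3, 4):
        (tag,) = struct.unpack_from("!I", message, offset)
        depth += 1
        if tag & END_BIT:
            return depth
    return None


def should_forward(message, max_depth=DEFAULT_MAX_DEPTH):
    """Check made before sending: drop deep or malformed surveys."""
    depth = traceback_depth(message)
    return depth is not None and depth <= max_depth
]]></artwork>
</figure>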
</section>
<section anchor="IANA" title="IANA Considerations">
<t>New SP endpoint types SURVEYOR and RESPONDENT should be registered by
IANA. For now, the value 98 should be used for SURVEYOR endpoints and
the value 99 for RESPONDENT endpoints. (An earlier, similar protocol
without the backtrace headers used protocol numbers 96 and 97.)</t>
</section>
<section anchor="Security" title="Security Considerations">
<t>The protocol is not intended to provide any additional security to
the underlying transport. DoS concerns are addressed within
the specification.</t>
</section>
</middle>
<back>
<references>
<reference anchor='SPoverTCP'>
<front>
<title>TCP mapping for SPs</title>
<author initials='M.' surname='Sustrik' fullname='M. Sustrik'/>
<date month='August' year='2013'/>
</front>
<format type='TXT' target='sp-tcp-mapping-01.txt'/>
</reference>
</references>
</back>
</rfc>