Error recovery in scalable reliable multicast
Publisher:
  • University of Southern California
  • Computer Science Dept. 200 University Park Los Angeles, CA
  • United States
ISBN: 978-0-591-88638-2
Order Number: AAI9835111
Pages: 125
Abstract

Several protocols have been proposed to provide reliable multicast transport. However, whereas TCP is a generic transport protocol for reliable unicast transmission, most of the proposed multicast solutions are designed for specific applications. A protocol that supports the requirements of one application may not fit the requirements of another. Furthermore, few of them scale well to both the size and dynamics of the network and the multicast group. As a result, people often end up using multiple point-to-point reliable connections instead of taking advantage of more efficient multicast data delivery. Because applications may have different definitions of reliability in a multicast environment, they require different support from their underlying reliable multicast data delivery protocols. We therefore focus on a fundamental framework that provides minimum reliability services and leaves semantics and flexibility to the applications themselves as much as possible.

Generally speaking, a reliable multicast protocol must provide two basic services: an error recovery mechanism to repair lost data, and a congestion control mechanism to regulate traffic flows. Our design target is applications that run over the Internet. Therefore, the approach must scale well with both the network size and the group size; it must support both network topology change and dynamic membership change efficiently; it must be simple and robust; and it must require no special support from IP other than a multicast routing facility.

This thesis starts with the receiver-initiated reliable delivery model in Scalable Reliable Multicast (SRM) and investigates error recovery mechanisms that improve its efficiency and scalability. We investigate, through analysis and simulation, the relationship between the timer setting parameters and error recovery performance in SRM. The performance metrics are error recovery delay and duplicates per loss. We propose an algorithm in which the waiting period is proportional to a member's neighborhood size, and in which a member's estimate of its neighborhood size is based on observations of current performance.
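The timer idea can be illustrated with a minimal sketch. In the original SRM design, a member that detects a loss draws its request timer uniformly from [C1·d, (C1+C2)·d], where d is its estimated delay to the source. The sketch below assumes the width of that interval is scaled by an estimated neighborhood size, and that the estimate is nudged up or down according to the number of duplicate requests observed. The class name, the `step` and `target_dups` parameters, and the adjustment rule are hypothetical; the abstract does not specify the thesis's actual algorithm.

```python
import random


class AdaptiveRequestTimer:
    """Illustrative request timer; not the algorithm from the thesis."""

    def __init__(self, dist_to_source, c1=2.0, est_neighbors=1.0,
                 target_dups=1.0, step=0.5, min_est=1.0, max_est=64.0):
        self.dist = dist_to_source          # estimated delay to the source (seconds)
        self.c1 = c1                        # fixed offset multiplier, as in SRM's C1
        self.est_neighbors = est_neighbors  # assumed neighborhood-size estimate
        self.target_dups = target_dups      # tolerated duplicates per loss (assumption)
        self.step = step                    # adjustment step (assumption)
        self.min_est = min_est
        self.max_est = max_est

    def next_delay(self, rng=random):
        """Draw a waiting period whose spread grows with the neighborhood estimate."""
        low = self.c1 * self.dist
        width = self.est_neighbors * self.dist
        return rng.uniform(low, low + width)

    def observe(self, duplicates):
        """Update the estimate from the duplicates seen in one recovery round."""
        if duplicates > self.target_dups:
            # Too many duplicates: act as if more neighbors compete, spread timers wider.
            self.est_neighbors = min(self.max_est, self.est_neighbors + self.step)
        else:
            # Few duplicates: shrink the spread to reduce recovery delay.
            self.est_neighbors = max(self.min_est, self.est_neighbors - self.step)
```

The sketch exposes the trade-off between the two metrics named above: a wider interval suppresses duplicates at the cost of longer recovery delay, and a narrower one does the reverse.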

To further improve the scalability of SRM, one must localize the scope of error recovery traffic. We present two approaches to local recovery: hop-based scope control and use of local recovery groups. The first approach uses hop count to limit the distribution of requests and replies, whereas the second approach confines error recovery traffic using separately addressed local recovery groups. The local recovery groups and hop count settings are automatically created and dynamically adjusted based on observed loss patterns.
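As a rough illustration of hop-based scope control, the sketch below computes which nodes a request reaches under a given hop limit and the smallest hop limit that reaches a member still holding the data. The topology, the node names, and the functions `hops_from`, `reached`, and `smallest_scope` are all hypothetical; the thesis adjusts hop counts from observed loss patterns, which this sketch does not model.

```python
from collections import deque


def hops_from(adj, start):
    """Breadth-first search: hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def reached(adj, start, max_hops):
    """Nodes that would see a request sent with the given hop limit."""
    return {n for n, d in hops_from(adj, start).items() if d <= max_hops}


def smallest_scope(adj, requester, holders):
    """Smallest hop limit that reaches at least one node holding the data."""
    dist = hops_from(adj, requester)
    candidates = [dist[h] for h in holders if h in dist]
    return min(candidates) if candidates else None


# Hypothetical tree: source S, routers R1 and R2, receivers A, B, C.
# A loss on the R1-R2 link leaves A and B without the packet; S and C have it.
adj = {
    "S": ["R1"],
    "R1": ["S", "R2", "C"],
    "R2": ["R1", "A", "B"],
    "A": ["R2"],
    "B": ["R2"],
    "C": ["R1"],
}
scope = smallest_scope(adj, "A", holders={"S", "C"})
```

In this example a request from A with a hop limit below `scope` stays within the subtree that shares the loss, so no member can answer it, while any larger limit spreads recovery traffic further than necessary.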

These techniques offer significant enhancement to SRM performance, as shown by the simulation results; they not only improve protocol scalability, but also provide efficient support for dynamic multicast sessions.

Contributors
  • Cornell Tech
  • University of Southern California
