Research article
Free access

DCell: a scalable and fault-tolerant network structure for data centers

Published: 17 August 2008
DOI: 10.1145/1402958.1402968

Abstract

A fundamental challenge in data center networking is how to efficiently interconnect an exponentially increasing number of servers. This paper presents DCell, a novel network structure that has many desirable features for data center networking. DCell is a recursively defined structure, in which a high-level DCell is constructed from many low-level DCells and DCells at the same level are fully connected with one another. The number of servers a DCell can interconnect scales doubly exponentially as the node degree increases. DCell is fault-tolerant since it has no single point of failure and its distributed fault-tolerant routing protocol performs near-shortest-path routing even in the presence of severe link or node failures. DCell also provides higher network capacity than the traditional tree-based structure for various types of services. Furthermore, DCell can be incrementally expanded, and a partial DCell provides the same appealing features. Results from theoretical analysis, simulations, and experiments show that DCell is a viable interconnection structure for data centers.
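The recursive construction and the doubly exponential growth can be illustrated with a small script. This is a sketch under stated assumptions, not the authors' code: it assumes a DCell_0 is n servers on one mini-switch, and that every server of a DCell_{l-1} gets exactly one level-l link, so a DCell_l holds g_l = t_{l-1} + 1 sub-DCells (enough for each pair of sub-DCells to share one link) and t_l = g_l * t_{l-1} servers. Because t_l = t_{l-1}^2 + t_{l-1}, the server count roughly squares at each level, giving (n + 1/2)^(2^k) - 1/2 < t_k < (n + 1)^(2^k) - 1 for k >= 1, while each server needs only k + 1 ports. The function names `dcell_sizes`, `uid_to_digits` and `build_links`, the server naming (a_k, ..., a_0) and the pairing rule in `build_links` are hypothetical choices made for illustration.

```python
from itertools import combinations


def dcell_sizes(n, k):
    """Return [(g_l, t_l) for l = 0..k] for a complete DCell_k.

    Assumed recursion: t_0 = n servers on one mini-switch; a DCell_l is
    built from g_l = t_{l-1} + 1 copies of DCell_{l-1} (every server gets
    exactly one level-l link), so t_l = g_l * t_{l-1}.
    """
    sizes = [(1, n)]
    for _ in range(k):
        t_prev = sizes[-1][1]
        g = t_prev + 1
        sizes.append((g, g * t_prev))
    return sizes


def uid_to_digits(uid, level, t):
    """Map a server index inside a DCell_level to its tuple (a_level, ..., a_0)."""
    digits = []
    for l in range(level, 0, -1):
        a, uid = divmod(uid, t[l - 1])  # which DCell_{l-1}, and offset inside it
        digits.append(a)
    digits.append(uid)
    return tuple(digits)


def build_links(n, k):
    """Return the server-to-server links (a, b, level) of a complete DCell_k.

    Hypothetical wiring rule: inside each DCell_l, for every pair of
    sub-DCells i < j, server j-1 of sub-DCell i is linked to server i of
    sub-DCell j, so the g_l sub-DCells are fully connected.
    Level-0 server-to-switch links are not listed.
    """
    sizes = dcell_sizes(n, k)
    g = [gl for gl, _ in sizes]
    t = [tl for _, tl in sizes]
    links = []

    def build(prefix, level):
        if level == 0:
            return  # a DCell_0 is just n servers on one switch
        for i in range(g[level]):
            build(prefix + (i,), level - 1)  # wire each sub-DCell first
        for i, j in combinations(range(g[level]), 2):  # every pair i < j
            a = prefix + (i,) + uid_to_digits(j - 1, level - 1, t)
            b = prefix + (j,) + uid_to_digits(i, level - 1, t)
            links.append((a, b, level))

    build((), k)
    return links
```

Under these assumptions, `dcell_sizes(6, 3)[-1]` is `(1807, 3263442)`: six-port mini-switches and servers using only four ports (k + 1 = 4) already yield a DCell_3 of about 3.26 million servers. The pairing rule is simply one way to give every server exactly one link per level while fully connecting the sub-DCells at that level.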




Published In

• SIGCOMM '08: Proceedings of the ACM SIGCOMM 2008 conference on Data communication (ACM Conferences). August 2008, 452 pages. ISBN: 9781605581750. DOI: 10.1145/1402958.
• ACM SIGCOMM Computer Communication Review, Volume 38, Issue 4. October 2008, 436 pages. ISSN: 0146-4833. DOI: 10.1145/1402946.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.


Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. data center
2. fault-tolerance
3. network topology
4. throughput


Conference

SIGCOMM '08: ACM SIGCOMM 2008 Conference
August 17-22, 2008, Seattle, WA, USA

Acceptance Rates

Overall acceptance rate: 462 of 3,389 submissions, 14%

Article Metrics

• Downloads (last 12 months): 484
• Downloads (last 6 weeks): 64
Reflects downloads up to 14 Sep 2024

Cited By

• (2024) ScaleDFS: Accelerating Decentralized and Private File Sharing via Scaling Directed Acyclic Graph Processing. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, pp. 295-308. DOI: 10.1145/3625549.3658690. Online publication date: 3-Jun-2024.
• (2024) Comprehensive Performance and Robustness Analysis of Expander-Based Data Centers. IEEE Transactions on Network and Service Management 21:1, pp. 670-683. DOI: 10.1109/TNSM.2023.3306971. Online publication date: Feb-2024.
• (2024) AlveoliNet: An Incrementally Scalable, Cost-Effective, and High-Performance Two-Layer Based Architecture for Data Centers. IEEE Transactions on Network Science and Engineering 11:5, pp. 4413-4427. DOI: 10.1109/TNSE.2024.3418890. Online publication date: Sep-2024.
• (2024) Towards Easy-to-Monitor Networks: Network Design and Measurement Path Construction. IEEE Transactions on Network Science and Engineering 11:5, pp. 4397-4412. DOI: 10.1109/TNSE.2024.3418781. Online publication date: Sep-2024.
• (2024) Kirigami, the Verifiable Art of Network Cutting. IEEE/ACM Transactions on Networking 32:3, pp. 2447-2462. DOI: 10.1109/TNET.2024.3360371. Online publication date: Jun-2024.
• (2024) A New Measure of Fault-Tolerance for Network Reliability: Double-Structure Connectivity. IEEE/ACM Transactions on Networking 32:1, pp. 874-889. DOI: 10.1109/TNET.2023.3305611. Online publication date: Feb-2024.
• (2024) Learning to Configure Converters in Hybrid Switching Data Center Networks. IEEE/ACM Transactions on Networking 32:1, pp. 520-534. DOI: 10.1109/TNET.2023.3294803. Online publication date: Feb-2024.
• (2024) Topologies in distributed machine learning: Comprehensive survey, recommendations and future directions. Neurocomputing 567, art. 127009. DOI: 10.1016/j.neucom.2023.127009. Online publication date: Jan-2024.
• (2024) SARS: Towards minimizing average Coflow Completion Time in MapReduce systems. Computer Networks 247, art. 110429. DOI: 10.1016/j.comnet.2024.110429. Online publication date: Jun-2024.
• (2023) Arithmetic Study about Efficiency in Network Topologies for Data Centers. Network 3:3, pp. 298-325. DOI: 10.3390/network3030015. Online publication date: 26-Jun-2023.
