Abstract
Identifying heavy hitters in a network traffic stream is important for a variety of network applications ranging from traffic engineering to anomaly detection such as detection of denial-of-service attacks. Existing methods generally examine newly arriving items in the stream, perform a small number of operations using a small amount of memory, and still provide guarantees on the identifying accuracy. In high-speed network monitoring, the update speed per item is extremely critical. However, so far as we know, there are no identifying algorithms which can provide constant update time (O(1)) in a weighted data stream. In this paper, we present an algorithm named Weighted Lossy Counting (WLC) which is able to identify heavy hitters in a high-speed weighted data stream with constant update time. WLC employs a novel efficient partially ordered data structure which is able to provide a fast per-item update speed while keeping the memory cost relatively low. We compare WLC with state-of-the-art algorithms for finding heavy hitters in real traffic traces. The experimental results show that WLC performs well in accuracy (recall, precision and average relative error) as other algorithms; moreover it has a much higher update speed at the cost of relatively larger memory space used. A theoretical worst-case memory bound of WLC is also derived in this paper; however, experiments with long real traffic traces show that WLC requires much less space than the theoretical bound in practice.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Sommer R, Feldmann A. NetFlow: Information loss or win? In: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement. New York: ACM, 2002. 173–174
Fred S, Bonald T, Proutiere A, et al. Statistical bandwidth sharing: A study of congestion at flow level. ACM SIGCOMM Comput Commun Rev, 2001, 31: 111–122
Mori T, Kawahara S, Naito S, et al. On the characteristics of Internet traffic variability: Spikes and elephants. In: Proceedings of the 2004 International Symposium on Applications and the Internet. Tokyo: IEEE, 2004. 99–106
Papagiannaki K, Taft N, Bhattachayya S, et al. On the feasibility of identifying elephants in Internet backbone traffic. Sprint ATL Technical Report TR01-ATL-110918, 2001
Zhang Y, Breslau L, Paxson V, et al. On the characteristics and origins of internet flow rates. In: Proceedings of the 2002 SIGCOMM Conference. New York: ACM, 2002. 309–322
Fang W J, Peterson L. Inter-AS traffic patterns and their implications. In: Proceedings of GLOBECOM’99. Rio de Janeireo: IEEE, 1999. 1859–1868
Mahajan R, Floyd S, Wetherall D. Controlling high-bandwidth flows at the congested router. In: Proceedings of IEEE ICNP’01. Washington: IEEE, 2001. 192
Feldmann A, Greenberg A, Lund C, et al. Deriving traffic demands for operational IP networks: Methodology and experience. IEEE/ACM Trans Netw (ToN), 2001, 9: 265–280
Estan C, Varghese G. New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM Trans Comput Syst (ToCS), 2003, 21: 270–313
Wang F Y, Yun X C, Wang X F, et al. Identifying heavy hitters in high-speed network monitoring (in Chinese). J Softw, 2007, 18: 3060–3070
Wang H B, Pei Y J, Lin Y, et al. A LRU based algorithm for identifying and measuring large flows (in Chinese). J Electron Inf Tech, 2007, 29: 2487–2492
Pei Y J, Wang H B, Cheng S D, et al. A dual-LRU based algorithm for identifying and measuring large flows (in Chinese). Acta Electron Sin, 2009, 37: 684–691
Raspall F, Sallent S. Adaptive shared-state sampling. In: Proceedings of the 8th ACM SIGCOMM Workshop on Internet Measurement. New York: ACM, 2008. 271–284
Kumar A, Sung M, Xu J, et al. Data streaming algorithms for efficient and accurate estimation of flow size distribution. In: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems. New York: ACM, 2004. 177–188
Metwally A, Agrawal D, El Abbadi A. An integrated efficient solution for computing frequent and top-k elements in data streams. ACM Trans Database Syst (TODS), 2006, 31: 1095–1133
Charikar M, Chen K, Farach-Colton M. Finding frequent items in data streams. Theor Comput Sci, 2004, 312: 3–15
Cormode G, Muthukrishnan S. An improved data stream summary: The count-min sketch and its applications. J Algorithms, 2005, 55: 58–75
Karp R, Shenker S, Papadimitriou C. A simple algorithm for finding frequent elements in streams and bags. ACM Trans Database Syst (TODS), 2003, 28: 51–55
Manku G, Motwani R. Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases. Hong Kong: VLDB Endowment, 2002. 346–357
Wang W P, Li J Z, Zhang D D, et al. An efficient algorithm for mining approximate frequent item over data streams (in Chinese). J Software, 2007, 18: 884–892
Jin C, Qian W, Sha C, et al. Dynamically maintaining frequent items over a data stream. In: Proceedings of the 12th International Conference on Information and Knowledge Management. New York: ACM, 2003. 287–294
Cormode G, Muthukrishnan S. What’s hot and what’s not: Tracking most frequent items dynamically. ACM Trans Database Syst (TODS), 2005, 30: 249–278
Cormode G, Hadjieleftheriou M. Finding frequent items in data streams. In: Proceedings of the VLDB Endowment. Auckland: VLDB Endowment, 2008. 1530–1541
Pang Y H, Wang J L, Xu C F. State-of-the-art on frequent pattern mining in data streams (in Chinese). Acta Automat Sin, 2006, 32: 594–602
Agilent Technologies. JTC 003: Mixed packet size throughput. J Internet Test Method, 2004, 16–18. Available at http://advanced.comms.agilent.com/n2x/docs/journal/index.htm
Cao Z, Wang Z. Flow identification for supporting per-flow queueing. In: Proceedings of 9th International Conference on Computer Communications and Networks. Las Vegas: IEEE, 2000. 88–93
Cormode G, Hadjieleftheriou M. Finding frequent items in data streams: Source code. http://www.research.att.com/~marioh/frequent-items/index.html
Cormode G, Muthukrishnan S. MassDAL public code bank. http://www.cs.rutgers.edu/~muthu/massdal.html
CAIDA anonymized OC48 Internet traces dataset. http://www.caida.org/data/passive/passive oc48 dataset.xml
CAIDA anonymized 2008 Internet traces dataset. http://www.caida.org/data/passive/passive 2008 dataset.xml
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Zhang, Y., Fang, B. & Zhang, Y. Identifying heavy hitters in high-speed network monitoring. Sci. China Inf. Sci. 53, 659–676 (2010). https://doi.org/10.1007/s11432-010-0053-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11432-010-0053-5