Abstract
Failures are unavoidable in communication networks, so detecting and identifying them is vital for reliable network operation. Existing fault diagnosis techniques draw on paradigms from different areas (e.g., mathematical theories, machine learning, statistical analysis) and serve different purposes, such as obtaining a representation model of the network for fault localization, selecting optimal probe sets for monitoring network devices, reducing fault detection time, and detecting faulty components in the network. Nevertheless, challenges remain because these techniques are invasive: they increase network traffic and control overhead. They also intensify the network's internal processing by deploying management processes or monitoring agents on almost all networking devices. This paper introduces a non-invasive fault detection approach based on observing, at gateway routers (called peripheral elements), the symptoms of failures that occur inside the network. We developed a link failure induction experiment in an emulated network, which showed that faults propagate to the peripheral level and thereby demonstrated the feasibility of our approach. Our results encourage the use of learning techniques that do not require a complete dependency model of the network and that can diagnose failure symptoms continuously while remaining resilient to dynamic changes in the network.
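The idea that an internal link failure can surface as a symptom at the network's edge can be illustrated with a small sketch. It assumes a hypothetical six-node topology (gateways `G1` and `G2`, internal routers `R1` to `R4`) and uses hop count from a gateway as a stand-in symptom; both are illustrative assumptions, not the paper's emulated network or the symptoms it actually measured. The hypothetical helpers `hop_counts`, `fail_link`, and `peripheral_symptoms` compute breadth-first-search distances, remove one link, and report which destinations a gateway sees differently after the failure, using only the gateway's own view and no agents on internal routers.

```python
from collections import deque

# Hypothetical topology (an illustrative assumption, not the paper's testbed):
# G1 and G2 are gateway (peripheral) routers; R1..R4 are internal routers.
TOPOLOGY = {
    "G1": {"R1"},
    "G2": {"R4"},
    "R1": {"G1", "R2", "R3"},
    "R2": {"R1", "R4"},
    "R3": {"R1", "R4"},
    "R4": {"R2", "R3", "G2"},
}


def hop_counts(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def fail_link(graph, a, b):
    """Return a copy of the graph with the undirected link a-b removed."""
    failed = {node: set(neighbors) for node, neighbors in graph.items()}
    failed[a].discard(b)
    failed[b].discard(a)
    return failed


def peripheral_symptoms(graph, failed_graph, gateway):
    """Destinations whose hop count, as seen from the gateway, changed.

    Maps destination -> (hops before, hops after); None means unreachable.
    Only the gateway's own view is used, so no internal agent is needed.
    """
    before = hop_counts(graph, gateway)
    after = hop_counts(failed_graph, gateway)
    return {
        dst: (hops, after.get(dst))
        for dst, hops in before.items()
        if after.get(dst) != hops
    }


if __name__ == "__main__":
    failed = fail_link(TOPOLOGY, "R1", "R2")
    for gw in ("G1", "G2"):
        print(gw, peripheral_symptoms(TOPOLOGY, failed, gw))
```

In this toy topology, failing the R1-R2 link lengthens G1's path to R2 from 2 to 4 hops, while G2 observes no change at all. This hints at why a diagnoser working only at the periphery might need to combine symptoms from several gateways rather than rely on a single one.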
Acknowledgements
This work was developed with the support of the Telematics Engineering Group (GIT) of the University of Cauca and the Systems Control, Learning and Optimization group (CAOS) of the Carlos III University of Madrid. The authors are grateful to the following Colombian institutions for funding the Ph.D. project in which this work was developed: the Administrative Department of Science, Technology, and Innovation (COLCIENCIAS), under the call for national doctorates No. 647–2014, and the Vice-Rectorate for Research of the University of Cauca (Project ID 4660). This work has also been supported by the Spanish Government under Projects TRA2016-78886-C3-1-R and PID2019-104793RB-C31.
About this article
Cite this article
Vargas-Arcila, A.M., Corrales, J.C., Sanchis, A. et al. Peripheral Diagnosis for Propagated Network Faults. J Netw Syst Manage 29, 14 (2021). https://doi.org/10.1007/s10922-020-09579-0
DOI: https://doi.org/10.1007/s10922-020-09579-0