Abstract
We consider the problem of approximate reduction of non-deterministic automata that appear in hardware-accelerated network intrusion detection systems (NIDSes). We define an error distance of a reduced automaton from the original one as the probability of packets being incorrectly classified by the reduced automaton (wrt the probabilistic distribution of packets in the network traffic). We use this notion to design an approximate reduction procedure that achieves a great size reduction (much beyond the state-of-the-art language-preserving techniques) with a controlled and small error. We have implemented our approach and evaluated it on use cases from Snort, a popular NIDS. Our results provide experimental evidence that the method can be highly efficient in practice, allowing NIDSes to follow the rapid growth in the speed of networks.
Similar content being viewed by others
Notes
In theory, disambiguation can produce smaller automata, but, in our experiments, determinization proved to work better.
\({\mathcal {R}}\) is not necessarily a PA since there might be transitions in \({\mathcal {P}}\) that are either removed or copied several times in the product construction.
We use \(\log k\) to denote the base-2 logarithm of k.
We assume an implicit bijection between states of the product \({\mathcal {R}}\) and \(\{1, \ldots , |Q[{\mathcal {R}}]|\}\).
We emphasize that this does not mean that states from V will be simply removed from \({{\mathcal {A}}}\)—the performed operation depends on the particular reduction.
References
Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75(2), 87–106 (1987). https://doi.org/10.1016/0890-5401(87)90052-6
Baier, C., Kiefer, S., Klein, J., Klüppelholz, S., Müller, D., Worrell, J.: Markov chains and unambiguous Büchi automata. In: CAV’16, pp. 23–42. Springer (2016)
Becchi, M., Crowley, P.: A hybrid finite automaton for practical deep packet inspection. In: CoNEXT’07, p. 1. ACM (2007)
Becchi, M., Crowley, P.: An improved algorithm to accelerate regular expression evaluation. In: ANCS’07, pp. 145–154. ACM (2007)
Becchi, M., Wiseman, C., Crowley, P.: Evaluating regular expression matching engines on network and general purpose processors. In: Proceedings of the 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ANCS ’09, pp. 30–39. ACM (2009)
Benedikt, M., Lenhardt, R., Worrell, J.: Model checking Markov chains against unambiguous Büchi automata. CoRR arXiv:1405.4560v2 (2016)
Bouajjani, A., Habermehl, P., Rogalewicz, A., Vojnar, T.: Abstract regular (tree) model checking. STTT 14(2), 167–191 (2012)
Brodie, B.C., Taylor, D.E., Cytron, R.K.: A scalable architecture for high-throughput regular-expression pattern matching. In: ISCA’06, pp. 191–202. IEEE Computer Society (2006)
Bustan, D., Grumberg, O.: Simulation-based minimization. ACM Trans. Comput. Log. 4(2), 181–206 (2003)
Carrasco, R.C., Oncina, J.: Learning stochastic regular grammars by means of a state merging method. In: Proceedings of the Second International Colloquium on Grammatical Inference and Applications, ICGI ’94, pp. 139–152. Springer (1994)
Češka, M., Havlena, V., Holík, L., Lengál, O., Vojnar, T.: Approximate reduction of finite automata for high-speed network intrusion detection. In: Figshare (2018). https://doi.org/10.6084/m9.figshare.5907055
Češka, M., Havlena, V., Holík, L., Lengál, O., Vojnar, T.: Approximate reduction of finite automata for high-speed network intrusion detection. In: Proceedings of TACAS’18, LNCS, vol. 10806. Springer (2018)
Champarnaud, J., Coulon, F.: NFA reduction algorithms by means of regular inequalities. Theor. Comput. Sci. 327(3), 241–253 (2004)
Clark, C.R., Schimmel, D.E.: Efficient reconfigurable logic circuits for matching complex network intrusion detection patterns. In: FPL’03, Lecture Notes in Computer Science, vol. 2778, pp. 956–959. Springer (2003)
Clemente, L.: Büchi automata can have smaller quotients. In: ICALP’11, Lecture Notes in Computer Science, vol. 6756, pp. 258–270. Springer (2011)
Csanky, L.: Fast parallel matrix inversion algorithms. In: 16th Annual Symposium on Foundations of Computer Science, pp. 11–12 (1975). https://doi.org/10.1109/SFCS.1975.14
Deza, M.M., Deza, E.: Encyclopedia of Distances. Springer, Berlin (2009)
Etessami, K.: A hierarchy of polynomial-time computable simulations for automata. In: CONCUR 2002—Concurrency Theory, 13th International Conference, Brno, Czech Republic, August 20-23, 2002, Proceedings, Lecture Notes in Computer Science, vol. 2421, pp. 131–144. Springer (2002)
Fortune, S., Wyllie, J.: Parallelism in random access machines. In: Proceedings of the Tenth Annual ACM Symposium on Theory of Computing, STOC ’78, pp. 114–118. ACM, New York, NY, USA (1978). https://doi.org/10.1145/800133.804339
Gange, G., Ganty, P., Stuckey, P.J.: Fixing the state budget: approximation of regular languages with small DFAs. In: ATVA’17, Lecture Notes in Computer Science, vol. 10482, pp. 67–83. Springer (2017)
Gawrychowski, P., Jez, A.: Hyper-minimisation made efficient. In: MFCS’09, Lecture Notes in Computer Science, vol. 5734, pp. 356–368. Springer (2009)
Hartmanns, A., Wendler, P.: TACAS 2018 artifact evaluation VM. In: Figshare (2018). https://doi.org/10.6084/m9.figshare.5896615
Hogben, L.: Handbook of Linear Algebra, 2nd edn. CRC Press, Boca Raton (2013)
Hopcroft, J.E.: An N log N algorithm for minimizing states in a finite automaton. Technical report (1971)
Hutchings, B.L., Franklin, R., Carver, D.: Assisting network intrusion detection with reconfigurable hardware. In: FCCM’02, pp. 111–120. IEEE Computer Society (2002)
Jiang, T., Ravikumar, B.: Minimal NFA problems are hard. SIAM J. Comput. 22(6), 1117–1141 (1993)
Kaštil, J., Kořenek, J., Lengál, O.: Methodology for fast pattern matching by deterministic finite automaton with perfect hashing. In: 2009 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools, pp. 823–829 (2009)
Kořenek, J., Kobierský, P.: Intrusion detection system intended for multigigabit networks. In: 2007 IEEE Design and Diagnostics of Electronic Circuits and Systems, pp. 1–4 (2007)
Kumar, S., Chandrasekaran, B., Turner, J.S., Varghese, G.: Curing regular expressions matching algorithms from insomnia, amnesia, and acalculia. In: ANCS’07, pp. 155–164. ACM (2007)
Kumar, S., Dharmapurikar, S., Yu, F., Crowley, P., Turner, J.S.: Algorithms to accelerate multiple regular expressions matching for deep packet inspection. In: SIGCOMM’06, pp. 339–350. ACM (2006)
Kumar, S., Turner, J.S., Williams, J.: Advanced algorithms for fast and scalable deep packet inspection. In: ANCS’06, pp. 81–92. ACM (2006)
Liu, C., Wu, J.: Fast deep packet inspection with a dual finite automata. IEEE Trans. Comput. 62(2), 310–321 (2013)
Luchaup, D., De Carli, L., Jha, S., Bach, E.: Deep packet inspection with DFA-trees and parametrized language overapproximation. In: INFOCOM’14, pp. 531–539. IEEE (2014)
Malcher, A.: Minimizing finite automata is computationally hard. Theor. Comput. Sci. 327(3), 375–390 (2004)
Maletti, A., Quernheim, D.: Optimal hyper-minimization. CoRR arXiv:1104.3007 (2011)
Matoušek, D., Kořenek, J., Puš, V.: High-speed regular expression matching with pipelined automata. In: 2016 International Conference on Field-Programmable Technology (FPT), pp. 93–100 (2016)
Mayr, R., Clemente, L.: Advanced automata minimization. In: POPL’13, Transactions on Computer Logic, pp. 63–74. ACM (2013)
Mayr, R., et al.: Reduce: A tool for minimizing nondeterministic finite-word and Büchi automata. http://languageinclusion.org/doku.php?id=tools (2017). Accessed 30 Sept 2017
Mitra, A., Najjar, W.A., Bhuyan, L.N.: Compiling PCRE to FPGA for accelerating SNORT IDS. In: ANCS’07, pp. 127–136. ACM (2007)
Mohri, M.: A disambiguation algorithm for finite automata and functional transducers. In: CIAA’12, pp. 265–277. Springer (2012)
Mohri, M.: Edit-distance of weighted automata. In: CIAA’02, Lecture Notes in Computer Science, vol. 2608, pp. 1–23. Springer (2002)
Paige, R., Tarjan, R.E.: Three partition refinement algorithms. SIAM J. Comput. 16(6), 973–989 (1987)
Papadimitriou, C.M.: Computational Complexity. Addison-Wesley, Reading (1994)
Parker, A.J., Yancey, K.B., Yancey, M.P.: Regular language distance and entropy. CoRR arXiv:1602.07715 (2016)
Puš, V., Tobola, J., Košař, V., Kaštil, J., Kořenek, J.: Netbench: framework for evaluation of packet processing algorithms. In: Symposium On Architecture For Networking And Communications Systems pp. 95–96 (2011)
Shützenberger, M.: On the definition of a family of automata. Inf. Control 4, 245–270 (1961)
Sidhu, R.P.S., Prasanna, V.K.: Fast regular expression matching using FPGAs. In: FCCM’01, pp. 227–238. IEEE Computer Society (2001)
Solodovnikov, V.I.: Upper bounds on the complexity of solving systems of linear equations. J. Sov. Math. 29(4), 1482–1501 (1985)
Tan, L., Sherwood, T.: A high throughput string matching architecture for intrusion detection and prevention. In: ISCA’05, pp. 112–122. IEEE Computer Society (2005)
The Snort Team: Snort. http://www.snort.org. Accessed 30 Sept 2017
Thollard, F., Clark, A.: Learning stochastic deterministic regular languages. In: G. Paliouras, Y. Sakakibara (eds.) Grammatical Inference: Algorithms and Applications: 7th International Colloquium, ICGI 2004, Athens, Greece, October 11–13, 2004. Proceedings, pp. 248–259. Springer Berlin Heidelberg, Berlin, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30195-0_22
Vardi, M.Y.: Automatic verification of probabilistic concurrent finite state programs. In: SFCS ’85, pp. 327–338. IEEE
Yu, F., Chen, Z., Diao, Y., Lakshman, T.V., Katz, R.H.: Fast and memory-efficient regular expression matching for deep packet inspection. In: ANCS’06, pp. 93–102. ACM (2006)
Acknowledgements
We thank Jan Kořenek, Vlastimil Košař, and Denis Matoušek for their help with translating regexes into automata and synthesis of FPGA designs, and Martin Žádník for providing us with the backbone network traffic. We thank Stefan Kiefer for helping us proving the PSPACE part of Lemma 1 and Petr Peringer for testing our artefact. We also thank the anonymous reviewers for their useful comments, which improved the quality of the paper. The work on this paper was supported by the Czech Science Foundation project 16-24707Y, the IT4IXS: IT4Innovations Excellence in Science project (LQ1602), and the FIT BUT internal project FIT-S-17-4014.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Češka, M., Havlena, V., Holík, L. et al. Approximate reduction of finite automata for high-speed network intrusion detection. Int J Softw Tools Technol Transfer 22, 523–539 (2020). https://doi.org/10.1007/s10009-019-00520-8
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10009-019-00520-8