Abstract
Many stemming techniques are used in Arabic text classification. In this paper, we show the effect of stemming on classification systems. We introduce a new stemming technique, approximate stemming, based on the use of Arabic patterns. These patterns are modeled using transducers, and stemming is done without depending on any dictionary. Because words are stemmed using transducers, documents are transformed into finite-state transducers. This allows us to use rational kernels as a framework for Arabic text classification. Experiments show that, when compared with other approaches, our approach is more effective, especially in terms of accuracy, recall, and F1.
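As a rough illustration of the two ideas named in the abstract, the sketch below matches words against a few Arabic patterns to extract a root without a dictionary, then compares two documents with an n-gram count kernel, which is one simple member of the rational kernel family. It is not the authors' implementation: the pattern list, the function names, and the use of plain string matching in place of real transducers are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical pattern list (assumption, not taken from the paper).
# The letters ف, ع, ل mark the three root slots; other letters are fixed.
PATTERNS = ["فاعل", "مفعول", "فعال", "فعل"]
ROOT_SLOTS = "فعل"


def match_pattern(word, pattern):
    """Return the root letters if `word` fits `pattern`, else None."""
    if len(word) != len(pattern):
        return None
    root = []
    for w, p in zip(word, pattern):
        if p in ROOT_SLOTS:
            root.append(w)      # slot position: copy the word's letter
        elif w != p:
            return None         # fixed position: letters must agree
    return "".join(root)


def approximate_stem(word):
    """Try each pattern in order; fall back to the unchanged word."""
    for pattern in PATTERNS:
        root = match_pattern(word, pattern)
        if root is not None:
            return root
    return word


def ngram_counts(tokens, n):
    """Count contiguous n-grams of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_kernel(doc_a, doc_b, n=2):
    """Inner product of the n-gram count vectors of two stemmed documents."""
    a = ngram_counts([approximate_stem(w) for w in doc_a.split()], n)
    b = ngram_counts([approximate_stem(w) for w in doc_b.split()], n)
    return sum(count * b[gram] for gram, count in a.items())
```

For instance, `approximate_stem("كاتب")` and `approximate_stem("مكتوب")` both yield the same three-letter root under these hypothetical patterns, so two documents that use different surface forms still share n-grams. The resulting kernel values could be passed to an SVM as a precomputed kernel matrix.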
This work is supported by the MESRS - Algeria under Project 8/U03/7015.
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nehar, A., Ziadi, D., Cherroun, H. (2013). Rational Kernels for Arabic Text Classification. In: Dediu, AH., Martín-Vide, C., Mitkov, R., Truthe, B. (eds) Statistical Language and Speech Processing. SLSP 2013. Lecture Notes in Computer Science, vol 7978. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39593-2_16
DOI: https://doi.org/10.1007/978-3-642-39593-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39592-5
Online ISBN: 978-3-642-39593-2
eBook Packages: Computer Science, Computer Science (R0)