Research projects

1. ANT Corpus active project

ANT Corpus stands for “Arabic News Texts Corpus”. It is a research project that collects texts from different web sources and grows the amount of data progressively.

Project website:

GitHub project page:

“ANT Corpus: un Corpus Multi-Sources pour la Classification et la Fouille des Textes Arabes” (“ANT Corpus: A Multi-Source Corpus for Arabic Text Classification and Mining”), poster presented at the Journées Scientifiques Pluridisciplinaires (JSP’2018)

View the full JSP’2018 poster image (in French)

News:

  • v1.1 released (10,161 articles, 10 June 2018)

2. Kunuz AlMustafa – كنوز المصطفى past project

Abstract: Kunuz is a standard Arabic test collection for monolingual and cross-language information retrieval (CLIR). The project deals with Hadith texts and provides a portal for sampling and evaluating Hadith retrieval results, listed in both Arabic and English versions. This new standard Arabic test collection, called Kunuz, aims to promote the development of Arabic monolingual retrieval and CLIR systems, which had been abandoned since the earlier TREC-2001 and TREC-2002 editions.

View the full Kunuz poster (presented at the University of Manouba Symposium’2013)

Project data description: