Research projects

ANT Corpus

ANT Corpus stands for “Arabic News Texts Corpus”. It is a research project that collects Arabic news texts from multiple web sources, growing the amount of data progressively.

Project website:

GitHub project page:

“ANT Corpus: un Corpus Multi-Sources pour la Classification et la Fouille des Textes Arabes” (poster presented at the Journées Scientifiques Pluridisciplinaires, JSP’2018)

View the full JSP’2018 poster image (in French)

News:

  • v1.1 released (10,161 articles | 10 June 2018)