1. ANT Corpus (active project)
ANT Corpus stands for “Arabic News Texts Corpus”. It is a research project that aims to collect texts from a variety of web sources, progressively increasing the amount of data.
Project website : https://antcorpus.github.io
GitHub project page : https://github.com/antcorpus
- v1.1 is released (10 161 articles | 10 June 2018)
2. Kunuz AlMustafa – كنوز المصطفى (past project)
Abstract : Kunuz is a standard Arabic test collection for monolingual and cross-language Information Retrieval (CLIR). The project deals with “Hadith” texts and provides a portal for sampling and evaluating Hadith results listed in both Arabic and English versions. The new standard Arabic test collection, called “Kunuz”, aims to promote the development of Arabic monolingual retrieval and CLIR systems, which have been neglected since the TREC-2001 and TREC-2002 editions.
Project data description : http://www.jarir.tn/kunuzcorpus