Conference paper

Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers

Abstract: In this paper, we present a new parallel corpus addressed to researchers, teachers, and speech therapists interested in text simplification as a means of alleviating difficulties in children learning to read. The corpus is composed of excerpts drawn from 79 authentic literary (tales, stories) and scientific (documentary) texts commonly used in French schools for children aged 7 to 9. The excerpts were manually simplified at the lexical, morpho-syntactic, and discourse levels in order to provide a parallel corpus for reading tests and for the development of automatic text simplification tools. A sample of 21 poor-reading and dyslexic children with an average reading delay of 2.5 years read a portion of the corpus. The transcripts of their reading errors were integrated into the corpus with the goal of identifying lexical difficulty in the target population. By means of statistical testing, we provide evidence that the manual simplifications significantly reduced reading errors, which indicates that the words targeted for simplification were not only well chosen but also replaced with substantially easier alternatives. The entire corpus can be consulted through a web interface and is available on request for research purposes.
Cited literature: 32 references

https://hal.archives-ouvertes.fr/hal-02503986
Contributor: Núria Gala Pavia
Submitted on: Tuesday, 10 March 2020 - 13:50:36
Last modified on: Friday, 13 March 2020 - 01:45:28

File

LREC_2020_ALECTOR_Corpus-final...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-02503986, version 1

Citation

Núria Gala, Anaïs Tack, Ludivine Javourey-Drevet, Thomas François, Johannes C. Ziegler. Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers. Language Resources and Evaluation for Language Technologies (LREC), May 2020, Marseille, France. ⟨hal-02503986⟩

Metrics

Record views

124

File downloads

102