
Datasets for Automatic Scoring of Educational Text Data

This page provides an overview of freely available datasets for automatic free-text scoring, covering automatic essay grading and content/short-answer scoring.

| Corpus name | Link/Paper | Population | Language | Task | Modality | Prompts | Answers | Labels |
|---|---|---|---|---|---|---|---|---|
| ASAP-AES | Dataset | high school students (grade 7 to 10) | English | Essay | ? | 7 | 13000 | varies per prompt (holistic only) |
| ASAP-AES++ | Dataset, Paper | high school students (grade 7 to 10) | English | Essay | ? | 6 | 13000 | varies per prompt (with fine-grained traits) |
| Kaggle ELL Feedback Prize | Dataset, Competition | high school students (grade 8 to 12) | English | Essay | ? | 1 | 2700 | 1-5 for 6 trait scores |
| ASAP-SAS | Dataset | high school students (mainly grade 10) | English | Short answers (SLA, biology, sciences) | ? | 10 | | 0-2/0-3 |
| ASAP-DE | Paper | crowdworkers (unclear) | German | Short answers (SLA, biology, sciences) | typewritten | 3 | 903 | 0-2/0-3 |
| Powergrading | Paper | unclear | English | Short answers (US immigration test) | typewritten | 10 | 6980 | binary |
| SRA beetle | Paper, Paper | students (native) | English | Short answers (sciences) | typewritten | 56 | 3000 | 2/3/5-way entailment labels |
| SRA SciEntsBank | Paper, Paper | students (native) | English | Short answers (sciences) | handwritten | 197 | 10000 | 2/3/5-way entailment labels |
| AR-ASAG | Paper, Dataset | university students (native?) | Arabic | Short answers (cybercrime) | ? | 48 | 2133 | 0-5 points |
| PT-SAG | Paper | school students (12-14 years) | Portuguese | Short answers (biology) | typewritten, some handwritten | 15 | 3675 | 0-3 points |
| CSSAG | Paper | university students (mostly native) | German | Short answers (computer science) | typewritten | 31 | 1926 | mostly 0-1 or 0-2 in 0.5 steps |
| CS (Mohler) | Paper | CS students in the US | English | Short answers (computer science) | typewritten | 21 | 630 | 0-5 points (0.5 point steps) |
| CREG | Paper | language learners | German | Short answers (reading comprehension) | handwritten | 177 | 1032 | binary + 5 diagnostic labels |
| CREE | Paper | language learners | English | Short answers (reading comprehension) | handwritten | 62 | 566 | binary + 6 diagnostic labels |
| Indonesian SAS/Ukara | Dataset, Paper | native? | Indonesian | Short answers (opinion questions) | unknown | 2 | 1032 | binary |
| SweLL | Paper | language learners | Swedish | Essay | handwritten | unknown | growing | CEFR levels |
| Falko | Homepage | language learners (+ native control group) | German | Essay (argumentative) | typewritten | 4 | 248 (+95 from natives) | B2/C1/C2 |
| COPLE-2 | Paper | language learners | Portuguese | Essay (various genres) | handwritten | multiple | 966 | A1/A2/B1/B2/C1 |
| CESA | Paper | university students | Chinese | Short answers (physics & computer science) | typewritten | 5 | 1800 | 0-2 points |
| ASAP-ZH | Paper | high school students | Chinese | Short answers (sciences) | handwritten | 3 | 942 | 0-2/0-3 |
| SAF | Paper, Dataset | college students, job applicants | English, German | Short answers (tech, pre-job training) | typewritten | 54 | 4519 | 0.0-1.0 |
| GLUPS | Paper, Dataset | school children 11 to 12 years | Arabic | Short answers (religious) | typewritten | 18 | 1276 | 0-2 points |
| MindReading | Paper, Dataset | school children 7 to 14 years | English | Short answers (explain behaviour) | handwritten | 10 | 11311 | 0-2 points |
| Essay-BR | Paper, Dataset | high school | Portuguese | Essay | typewritten | 86 | 4570 | 0-1000 |
| AES-ENEM | Paper, Dataset | high school | Portuguese | Essay | typewritten | 127 | 3586 | 5 traits (0-200 points) |
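
For readers who want to select corpora from the table programmatically, the following is a minimal sketch, assuming a few rows are transcribed by hand into Python records. The `Corpus` class, the `CORPORA` list, and the `filter_corpora` helper are hypothetical names chosen for illustration; they are not part of any tooling released with these datasets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Corpus:
    name: str
    language: str
    task: str                # "essay" or "short-answer"
    answers: Optional[int]   # None where the table gives no count
    labels: str


# A handful of rows copied from the table above; extend as needed.
CORPORA = [
    Corpus("ASAP-SAS", "English", "short-answer", None, "0-2/0-3"),
    Corpus("ASAP-DE", "German", "short-answer", 903, "0-2/0-3"),
    Corpus("CSSAG", "German", "short-answer", 1926, "mostly 0-1 or 0-2 in 0.5 steps"),
    # Falko: learner texts only, without the 95 native control texts.
    Corpus("Falko", "German", "essay", 248, "B2/C1/C2"),
    Corpus("Essay-BR", "Portuguese", "essay", 4570, "0-1000"),
    Corpus("PT-SAG", "Portuguese", "short-answer", 3675, "0-3 points"),
]


def filter_corpora(corpora, language=None, task=None, min_answers=0):
    """Return corpora matching language/task that have at least min_answers answers."""
    result = []
    for c in corpora:
        if language is not None and c.language != language:
            continue
        if task is not None and c.task != task:
            continue
        # Rows with an unknown answer count are treated as 0, so they
        # only pass when no minimum is requested.
        if (c.answers or 0) < min_answers:
            continue
        result.append(c)
    return result


german_short = filter_corpora(CORPORA, language="German", task="short-answer")
print([c.name for c in german_short])
```

Keeping the label scale as a free-text field reflects the table itself: scales differ per corpus (points, binary, CEFR levels), so any normalization would have to be decided per dataset.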

Contact

This page is maintained by CATALPA, FernUniversität in Hagen, Germany.
Did we miss your dataset? Is there an error in how we represented your data in the table? Please contact Andrea Horbach or Torsten Zesch at firstname.lastname@fernuni-hagen.de.