Datasets for Automatic Scoring of Educational Text Data
This page provides an overview of freely available datasets for automatic free-text scoring, covering automatic essay grading and content/short-answer scoring.
Corpus Name | Link/Paper | Population | Language | Task | Modality | Prompts | Answers | Labels |
---|---|---|---|---|---|---|---|---|
ASAP-AES | Dataset | high school students (grade 7 to 10) | English | Essay | ? | 7 | 13000 | varies per prompt (holistic only) |
ASAP-AES++ | Dataset Paper | high school students (grade 7 to 10) | English | Essay | ? | 6 | 13000 | varies per prompt (with fine-grained traits) |
Kaggle ELL Feedback Prize | Dataset Competition | high school students (grade 8 to 12) | English | Essay | ? | 1 | 2700 | 1-5 for 6 trait scores |
ASAP-SAS | Dataset | high school students (mainly grade 10) | English | Short-answers (SLA, biology, sciences) | ? | 10 | | 0-2/0-3 |
ASAP-DE | Paper | crowdworkers (unclear) | German | Short-answers (SLA, biology, sciences) | typewritten | 3 | 903 | 0-2/0-3 |
Powergrading | Paper | unclear | English | Short-answers (US immigration test) | typewritten | 10 | 6980 | binary |
SRA beetle | Paper Paper | students (native) | English | Short-answers (sciences) | typewritten | 56 | 3000 | 2/3/5-way entailment labels |
SRA SciEntsBank | Paper Paper | students (native) | English | Short-answers (sciences) | handwritten | 197 | 10000 | 2/3/5-way entailment labels |
AR-ASAG | Paper Dataset | university students (native?) | Arabic | Short-answers (cybercrime) | ? | 48 | 2133 | 0-5 points |
PT-SAG | Paper | school students (12-14 years) | Portuguese | Short-answers (biology) | typewritten, some handwritten | 15 | 3675 | 0-3 points |
CSSAG | Paper | university students (mostly native) | German | Short-answers (computer science) | typewritten | 31 | 1926 | mostly 0-1 or 0-2 in 0.5 steps |
CS (Mohler) | Paper | CS students in the US | English | Short-answers (computer science) | typewritten | 21 | 630 | 0-5 points (0.5 point steps) |
CREG | Paper | language learners | German | Short-answers (Reading comprehension) | handwritten | 177 | 1032 | binary + 5 diagnostic labels |
CREE | Paper | language learners | English | Short-answers (Reading comprehension) | handwritten | 62 | 566 | binary + 6 diagnostic labels |
Indonesian SAS/Ukara | Dataset Paper | native? | Indonesian | Short-answers (opinion questions) | unknown | 2 | 1032 | binary |
SweLL | Paper | language learners | Swedish | Essay | handwritten | unknown | growing | CEFR levels |
Falko | Homepage | language learners (+ native control group) | German | Essay (argumentative) | typewritten | 4 | 248 (+95 from natives) | B2/C1/C2 |
COPLE-2 | Paper | language learners | Portuguese | Essay (various genres) | handwritten | multiple | 966 | A1/A2/B1/B2/C1 |
CESA | Paper | university students | Chinese | Short-Answer (physics & computer science) | typewritten | 5 | 1800 | 0-2 points |
ASAP-ZH | Paper | high school students | Chinese | Short-Answer (sciences) | handwritten | 3 | 942 | 0-2/0-3 |
SAF | Paper Dataset | college students, job applicants | English, German | Short-Answer (tech, pre-job training) | typewritten | 54 | 4519 | 0.0-1.0 |
GLUPS | Paper Dataset | school children 11 to 12 years | Arabic | Short-Answer (religious) | typewritten | 18 | 1276 | 0-2 points |
MindReading | Paper Dataset | school children 7 to 14 years | English | Short-answers (explain behaviour) | handwritten | 10 | 11311 | 0-2 points |
Essay-BR | Paper Dataset | high school | Portuguese | Essay | typewritten | 86 | 4570 | 0-1000 |
AES-ENEM | Paper Dataset | high school | Portuguese | Essay | typewritten | 127 | 3586 | 5 traits (0-200 points) |
Contact
This page is maintained by CATALPA, FernUniversität in Hagen, Germany. Did we miss your dataset? Is there an error in how we represented your data in the table? Please contact Andrea Horbach or Torsten Zesch at firstname.lastname@fernuni-hagen.de.