Language Technology Lab

2022

German Medical Natural Language Processing–A Data-centric Survey

Zesch, Torsten, and Bewersdorff, Jeanette

Applications in Medicine and Manufacturing Nov 2022

CNN-based Ruled Line Removal in Handwritten Documents

Gold, Christian, and Zesch, Torsten

In Proceedings of the 18th International Conference on Frontiers in Handwriting Recognition (ICFHR 2022) Dec 2022

Similarity-Based Content Scoring - How to Make S-BERT Keep Up With BERT

Bexte, Marie, Horbach, Andrea, and Zesch, Torsten

In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022) Jul 2022

The dominating paradigm for content scoring is to learn an instance-based model, i.e. to use lexical features derived from the learner answers themselves. An alternative approach that receives much less attention is however to learn a similarity-based model. We introduce an architecture that efficiently learns a similarity model and find that results on the standard ASAP dataset are on par with a BERT-based classification approach.

Don’t Drop the Topic - The Role of the Prompt in Argument Identification in Student Writing

Ding, Yuning, Bexte, Marie, and Horbach, Andrea

In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022) Jul 2022

In this paper, we explore the role of topic information in student essays from an argument mining perspective. We cluster a recently released corpus through topic modeling into prompts and train argument identification models on different data settings. Results show that, given the same amount of training data, prompt-specific training performs better than cross-prompt training. However, the advantage can be overcome by introducing large amounts of cross-prompt training data.

‘Meet me at the ribary’ – Acceptability of spelling variants in free-text answers to listening comprehension prompts

Laarmann-Quante, Ronja, Schwarz, Leska, Horbach, Andrea, and Zesch, Torsten

In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022) Jul 2022

When listening comprehension is tested as a free-text production task, a challenge for scoring the answers is the resulting wide range of spelling variants. When judging whether a variant is acceptable or not, human raters perform a complex holistic decision. In this paper, we present a corpus study in which we analyze human acceptability decisions in a high stakes test for German. We show that for human experts, spelling variants are harder to score consistently than other answer variants. Furthermore, we examine how the decision can be operationalized using features that could be applied by an automatic scoring system. We show that simple measures like edit distance and phonetic similarity between a given answer and the target answer can model the human acceptability decisions with the same inter-annotator agreement as humans, and discuss implications of the remaining inconsistencies.

Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)

In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022) Jul 2022

LeSpell - A Multi-Lingual Benchmark Corpus of Spelling Errors to Develop Spellchecking Methods for Learner Language

Bexte, Marie, Laarmann-Quante, Ronja, Horbach, Andrea, and Zesch, Torsten

In Proceedings of the Language Resources and Evaluation Conference Jun 2022

Spellchecking text written by language learners is especially challenging because errors made by learners differ both quantitatively and qualitatively from errors made by already proficient learners. We introduce LeSpell, a multi-lingual (English, German, Italian, and Czech) evaluation data set of spelling mistakes in context that we compiled from seven underlying learner corpora. Our experiments show that existing spellcheckers do not work well with learner data. Thus, we introduce a highly customizable spellchecking component for the DKPro architecture, which improves performance in many settings.

2021

VL-BERT+: Detecting Protected Groups in Hateful Multimodal Memes

Aggarwal, Piush, Liman, Michelle Espranita, Gold, Darina, and Zesch, Torsten

In Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021) Aug 2021

This paper describes our submission (winning solution for Task A) to the Shared Task on Hateful Meme Detection at WOAH 2021. We build our system on top of a state-of-the-art system for binary hateful meme classification that already uses image tags such as race, gender, and web entities. We add further metadata such as emotions and experiment with data augmentation techniques, as hateful instances are underrepresented in the data set.

Implicit Phenomena in Short-answer Scoring Data

Bexte, Marie, Horbach, Andrea, and Zesch, Torsten

In Proceedings of the First Workshop on Understanding Implicit and Underspecified Language Aug 2021

Künstliche Intelligenz in der Bildung

Zesch, Torsten, Horbach, Andrea, and Laarmann-Quante, Ronja

Unikate: Berichte aus Forschung und Lehre Aug 2021

Personalizing Handwriting Recognition Systems with Limited User-Specific Samples

Gold, Christian, Boom, Dario, and Zesch, Torsten

In Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR 2021) Aug 2021

Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

In Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR 2021) Apr 2021

C-Test Collector: A Proficiency Testing Application to Collect Training Data for C-Tests

Haring, Christian, Lehmann, Rene, Horbach, Andrea, and Zesch, Torsten

In Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications Apr 2021

Fully vs. Weakly Supervised Caries Localization in Smartphone Images with CNNs

Pham, Duc Duy, Müller, Jonas, Aggarwal, Piush, Khatri, Amit, Sharma, Mayank, Zesch, Torsten, and Pauli, Josef

In Artificial Intelligence for Healthcare Applications International Workshop - ICPR 2020 Workshop Proceedings Jan 2021

2020

Don’t take "nswvtnvakgxpm" for an answer - The surprising vulnerability of automatic content scoring systems to adversarial input

Ding, Yuning, Riordan, Brian, Horbach, Andrea, Cahill, Aoife, and Zesch, Torsten

In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020) Jan 2020

Chinese Content Scoring: Open-Access Datasets and Features on Different Segmentation Levels

Ding, Yuning, Horbach, Andrea, Wang, Haoshi, Song, Xuefeng, and Zesch, Torsten

In Proceedings of the 1st conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2020) Jan 2020

Digital Transformation: A unique Chance to Shape the Future

Yekta, Semire

In Proceedings of the 1st conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2020) Jan 2020

Automated Scoring of Teachers’ Pedagogical Content Knowledge - A Comparison between Human and Machine Scoring

Wahlen, Andreas, Kuhn, Christiane, Zlatkin-Troitschanskaia, Olga, Gold, Christian, Zesch, Torsten, and Horbach, Andrea

Frontiers in Education Jan 2020

Appropriateness and Pedagogic Usefulness of Reading Comprehension Questions

Horbach, Andrea, Aldabe, Itziar, Bexte, Marie, Lacalle, Oier, and Maritxalar, Montse

In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC-2020) Jan 2020

Exploring the Impact of Handwriting Recognition on the Automated Scoring of Handwritten Student Answers

Gold, Christian, and Zesch, Torsten

In Proceedings of the 17th International Conference on Frontiers in Handwriting Recognition (ICFHR 2020) Jan 2020

Decomposing and Comparing Meaning Relations: Paraphrasing, Textual Entailment, Contradiction, and Specificity

Kovatchev, Venelin, Gold, Darina, Marti, M. Antonia, Salamo, Maria, and Zesch, Torsten

In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC-2020) Jan 2020

2019

A survey of semantic relatedness evaluation datasets and procedures

Taieb, Mohamed Ali Hadj, Zesch, Torsten, and Aouicha, Mohamed Ben

Artificial Intelligence Review Jan 2019

Identification of Good and Bad News on Twitter

Aggarwal, Piush, and Aker, Ahmet

In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019) Sep 2019

Social media plays a great role in news dissemination which includes good and bad news. However, studies show that news, in general, has a significant impact on our mental stature and that this influence is more in bad news. An ideal situation would be that we have a tool that can help to filter out the type of news we do not want to consume. In this paper, we provide the basis for such a tool. In our work, we focus on Twitter. We release a manually annotated dataset containing 6,853 tweets from 5 different topical categories. Each tweet is annotated with good and bad labels. We also investigate various machine learning systems and features and evaluate their performance on the newly generated dataset. We also perform a comparative analysis with sentiments showing that sentiment alone is not enough to distinguish between good and bad news.

Classification Approaches to Identify Informative Tweets

Aggarwal, Piush

In Proceedings of the Student Research Workshop Associated with RANLP 2019 Sep 2019

Social media platforms have become prime forums for reporting news, with users sharing what they saw, heard or read on social media. News from social media is potentially useful for various stakeholders including aid organizations, news agencies, and individuals. However, social media also contains a vast amount of non-news content. For users to be able to draw on benefits from news reported on social media it is necessary to reliably identify news content and differentiate it from non-news. In this paper, we tackle the challenge of classifying a social post as news or not. To this end, we provide a new manually annotated dataset containing 2,992 tweets from 5 different topical categories. Unlike earlier datasets, it includes postings posted by personal users who do not promote a business or a product and are not affiliated with any organization. We also investigate various baseline systems and evaluate their performance on the newly generated dataset. Our results show that the best classifiers are the SVM and BERT models.

Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Yannakoudakis, Helen, Kochmar, Ekaterina, Leacock, Claudia, Madnani, Nitin, Pilán, Ildikó, and Zesch, Torsten

In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications Sep 2019

German End-to-end Speech Recognition based on DeepSpeech

Agarwal, Aashish, and Zesch, Torsten

In Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019): Long Papers Sep 2019

Annotating and analyzing the interactions between meaning relations

Gold, Darina, Kovatchev, Venelin, and Zesch, Torsten

In Proceedings of the 13th Linguistic Annotation Workshop Sep 2019

Automatic Diacritization as Prerequisite Towards the Automatic Generation of Arabic Lexical Recognition Tests

Hamed, Osama, and Zesch, Torsten

In Proceedings of the 3rd International Conference on Natural Language and Speech Processing Sep 2019

RELATIONS-Workshop on meaning relations between phrases and sentences

Kovatchev, Venelin, Gold, Darina, and Zesch, Torsten

In RELATIONS-Workshop on meaning relations between phrases and sentences Sep 2019

The Influence of Variance in Learner Answers on Automatic Content Scoring

Horbach, Andrea, and Zesch, Torsten

Frontiers in Education Sep 2019

From legal to technical concept: Towards an automated classification of German political Twitter postings as criminal offenses

Zufall, Frederike, Horsmann, Tobias, and Zesch, Torsten

In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL) Sep 2019

ltl.uni-due at SemEval 2019 Task 5: Simple but Effective Lexico-Semantic Features for Detecting Hate Speech in Twitter

Zhang, Huangpan, Wojatzki, Michael, Horsmann, Tobias, and Zesch, Torsten

In Proceedings of the International Workshop on Semantic Evaluation (SemEval) Sep 2019

LTL-UDE at SemEval-2019 Task 6: BERT and Two-Vote Classification for Categorizing Offensiveness

Aggarwal, Piush, Horsmann, Tobias, Wojatzki, Michael, and Zesch, Torsten

In Proceedings of the International Workshop on Semantic Evaluation (SemEval) Sep 2019

Computer-assisted Understanding of Stance in Social Media: Formalizations, Data Creation, and Prediction Models

Wojatzki, Michael

In Proceedings of the International Workshop on Semantic Evaluation (SemEval) Sep 2019

2018

Corpus of Aspect-based Sentiment in Political Debates

Gold, Darina, Bexte, Marie, and Zesch, Torsten

In KONVENS Sep 2018

Do Women Perceive Hate Differently: Examining the Relationship Between Hate Speech, Gender, and Agreement Judgments

Wojatzki, Michael, Horsmann, Tobias, Gold, Darina, and Zesch, Torsten

In Proceedings of the 14th Conference on Natural Language Processing (KONVENS 2018) Sep 2018

The Role of Diacritics in Increasing the Difficulty of Arabic Lexical Recognition Tests

Hamed, Osama, and Zesch, Torsten

In Proceedings of the 7th Workshop on NLP for Computer Assisted Language Learning at SLTC 2018 (NLP4CALL 2018) Sep 2018

A flexible online system for curating reduced redundancy language exercises and tests

Zesch, Torsten, Horbach, Andrea, Goggin, Melanie, and Wrede-Jackes, Jennifer

In Future-proof CALL: language learning as exploration and encounters – short papers from EUROCALL 2018 Sep 2018

ESCRITO-An NLP-Enhanced Educational Scoring Toolkit

Zesch, Torsten, and Horbach, Andrea

In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) Sep 2018

Quantifying qualitative data for understanding controversial issues

Wojatzki, Michael, Mohammad, Saif, Zesch, Torsten, and Kiritchenko, Svetlana

In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) Sep 2018

DeepTC–An Extension of DKPro Text Classification for Fostering Reproducibility of Deep Learning Experiments

Horsmann, Tobias, and Zesch, Torsten

In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) Sep 2018

Robust Part-of-Speech Tagging of Social Media Text

Horsmann, Tobias

In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) Sep 2018

Semi-Supervised Clustering for Short Answer Scoring

Horbach, Andrea, and Pinkal, Manfred

In LREC Sep 2018

Exploring the Effects of Diacritization on Arabic Frequency Counts

Hamed, Osama, and Zesch, Torsten

In International Conference on Natural Language and Speech Processing (ICNLSP 2018) Sep 2018

Cross-lingual Content Scoring

Horbach, Andrea, Stennmanns, Sebastian, and Zesch, Torsten

In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications Sep 2018

2017

A Survey and Comparative Study of Arabic Diacritization Tools

Hamed, Osama, and Zesch, Torsten

JLCL: Special Issue-NLP for Perso-Arabic Alphabets Sep 2017

The Influence of Spelling Error on Content Scoring Performance

Horbach, Andrea, Ding, Yuning, and Zesch, Torsten

In Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications Sep 2017

GermEval 2017: Shared Task on Aspect-based Sentiment in Social Media Customer Feedback

Wojatzki, Michael, Ruppert, Eugen, Holschneider, Sarah, Zesch, Torsten, and Biemann, Chris

In Proceedings of the GermEval 2017 Shared Task on Aspect-based Sentiment in Social Media Customer Feedback Sep 2017

Same same, but different: Compositionality of paraphrase granularity levels

Benikova, Darina, and Zesch, Torsten

In Proceedings of the Recent Advances in Natural Language Processing (RANLP-2017) Sep 2017

The Role of Diacritics in Designing Lexical Recognition Tests for Arabic

Hamed, Osama, and Zesch, Torsten

In 3rd International Conference on Arabic Computational Linguistics (ACLing 2017) Sep 2017

Part-of-speech tagging for corpora of computer-mediated communication: A case study on finding rare phenomena

Beißwenger, Michael, Horsmann, Tobias, and Zesch, Torsten

In 3rd International Conference on Arabic Computational Linguistics (ACLing 2017) Sep 2017

Investigating neural architectures for short answer scoring

Riordan, Brian, Horbach, Andrea, Cahill, Aoife, Zesch, Torsten, and Lee, Chong Min

In Proceedings of the Building Educational Applications Workshop at EMNLP Sep 2017

Neural, Non-neural and Hybrid Stance Detection in Tweets on Catalan Independence

Wojatzki, Michael, and Zesch, Torsten

In Stance and Gender Detection in Tweets on Catalan Independence at Ibereval 2017 Sep 2017

Fine-grained essay scoring of a complex writing task for native speakers

Horbach, Andrea, Scholten-Akoun, Dirk, Ding, Yuning, and Zesch, Torsten

In Proceedings of the Building Educational Applications Workshop at EMNLP Sep 2017

Reliable Part-of-Speech Tagging of Low-frequency Phenomena in the Social Media Domain

Horsmann, Tobias, Beißwenger, Michael, and Zesch, Torsten

In Proceedings of the Conference on CMC and Social Media Corpora for the Humanities Sep 2017

Do LSTMs really work so well for PoS tagging? – A replication study

Horsmann, Tobias, and Zesch, Torsten

In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) Sep 2017

What does this imply? Examining the Impact of Implicitness on the Perception of Hate Speech

Benikova, Darina, Wojatzki, Michael, and Zesch, Torsten

In Proceedings of the International Conference of the German Society for Computational Linguistics and Language Technology (GSCL-2017) Sep 2017

2016

Bridging the gap between computable and expressive event representations in Social Media

Benikova, Darina, and Zesch, Torsten

In Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods Sep 2016

An important goal in text understanding is making sense of events. However, there is a gap between computable representations on the one hand and expressive representations on the other hand. We aim to bridge this gap by inducing distributional semantic clusters as labels in a frame structural representation.

LTL-UDE at EmpiriST 2015: Tokenization and PoS Tagging of Social Media Text

Horsmann, Tobias, and Zesch, Torsten

In Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task Sep 2016

Stance-based Argument Mining – Modeling Implicit Argumentation Using Stance

Wojatzki, Michael, and Zesch, Torsten

In Proceedings of the KONVENS Sep 2016

ltl.uni-due at SemEval-2016 Task 6: Stance Detection in Social Media Using Stacked Classifiers

Wojatzki, Michael, and Zesch, Torsten

In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016) Sep 2016

Assigning Fine-grained PoS Tags based on High-precision Coarse-grained Tagging

Horsmann, Tobias, and Zesch, Torsten

In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers Sep 2016

Predicting proficiency levels in learner writings by transferring a linguistic complexity model from expert-written coursebooks

Pilán, Ildikó, Volodina, Elena, and Zesch, Torsten

In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers Sep 2016

Predicting the Spelling Difficulty of Words for Language Learners

Beinborn, Lisa, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the Building Educational Applications Workshop at NAACL Sep 2016

Bundled Gap Filling: A New Paradigm for Unambiguous Cloze Exercises

Wojatzki, Michael, Melamud, Oren, and Zesch, Torsten

In Proceedings of the Building Educational Applications Workshop at NAACL Sep 2016

FlexTag: A Highly Flexible PoS Tagging Framework

Zesch, Torsten, and Horsmann, Tobias

In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) Sep 2016

Building a Social Media Adapted PoS Tagger Using FlexTag – A Case Study on Italian Tweets

Horsmann, Tobias, and Zesch, Torsten

In Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian - EVALITA 2016 Sep 2016

Validating Bundled Gap Filling – Empirical Evidence for Ambiguity Reduction and Language Proficiency Testing Capabilities

Meyer, Niklas, Wojatzki, Michael, and Zesch, Torsten

In Proceedings of the NLP4CALL at SLTC 2016 Sep 2016

Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis

Ross, Björn, Rist, Michael, Carbonell, Guillermo, Cabrera, Ben, Kurowsky, Nils, and Wojatzki, Michael

In Proceedings of NLP4CMC III: 3rd Workshop on Natural Language Processing for Computer-Mediated Communication Sep 2016

2015

Candidate evaluation strategies for improved difficulty prediction of language tests

Beinborn, Lisa, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the Building Educational Applications Workshop at NAACL Sep 2015

Fast or Accurate ? – A Comparative Evaluation of PoS Tagging Models

Horsmann, Tobias, Erbs, Nicolai, and Zesch, Torsten

In Proceedings of the International Conference of the German Society for Computational Linguistics and Language Technology (GSCL-2015) Sep 2015

Reducing Annotation Efforts in Supervised Short Answer Scoring

Zesch, Torsten, Heilman, Michael, and Cahill, Aoife

In Proceedings of the Building Educational Applications Workshop at NAACL Sep 2015

Task-Independent Features for Automated Essay Grading

Zesch, Torsten, Wojatzki, Michael, and Scholten-Akoun, Dirk

In Proceedings of the Building Educational Applications Workshop at NAACL Sep 2015

Counting What Counts: Decompounding for Keyphrase Extraction

Erbs, Nicolai, Santos, Pedro Bispo, Zesch, Torsten, and Gurevych, Iryna

In Counting What Counts: Decompounding for Keyphrase Extraction Sep 2015

Effectiveness of Domain Adaptation Approaches for Social Media PoS Tagging

Horsmann, Tobias, and Zesch, Torsten

In Proceeding of the Second Italian Conference on Computational Linguistics Sep 2015

Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Nakov, Preslav, Zesch, Torsten, Cer, Daniel, and Jurgens, David

In Proceeding of the Second Italian Conference on Computational Linguistics Sep 2015

Composing Measures for Computing Text Similarity

Bär, Daniel, Zesch, Torsten, and Gurevych, Iryna

In Proceeding of the Second Italian Conference on Computational Linguistics Sep 2015

Generating Nonwords for Vocabulary Proficiency Testing

Hamed, Osama, and Zesch, Torsten

In Proceeding of the 7th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics Sep 2015

2014

DKPro Keyphrases: Flexible and Reusable Keyphrase Extraction Experiments

Erbs, Nicolai, Santos, Pedro Bispo, Gurevych, Iryna, and Zesch, Torsten

In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. System Demonstrations Sep 2014

Readability for foreign language learning: The importance of cognates

Beinborn, Lisa, Zesch, Torsten, and Gurevych, Iryna

International Journal of Applied Linguistics Sep 2014

Predicting the Difficulty of Language Proficiency Tests

Beinborn, Lisa, Zesch, Torsten, and Gurevych, Iryna

Transactions of the Association for Computational Linguistics Sep 2014

Language proficiency tests are used to evaluate and compare the progress of language learners. We present an approach for automatic difficulty prediction of C-tests that performs on par with human experts. On the basis of detailed analysis of newly collected data, we develop a model for C-test difficulty introducing four dimensions: solution difficulty, candidate ambiguity, inter-gap dependency, and paragraph difficulty. We show that cues from all four dimensions contribute to C-test difficulty.

Automatic Generation of Challenging Distractors Using Context-Sensitive Inference Rules

Zesch, Torsten, and Melamud, Oren

In Proceedings of the 9th Workshop on Innovative Use of NLP for Building Educational Applications at ACL Sep 2014

DKPro TC: A Java-based Framework for Supervised Learning Experiments on Textual Data

Daxenberger, Johannes, Ferschke, Oliver, Gurevych, Iryna, and Zesch, Torsten

In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. System Demonstrations Sep 2014

Sense and Similarity: A Study of Sense-level Similarity Measures

Erbs, Nicolai, Gurevych, Iryna, and Zesch, Torsten

In Proceedings of the 3rd Joint Conference on Lexical and Computational Semantics (*SEM 2014) Sep 2014

Towards Automatic Scoring of Cloze Items by Selecting Low-Ambiguity Contexts

Horsmann, Tobias, and Zesch, Torsten

In 3rd workshop on NLP for computer-assisted language learning Sep 2014

2013

Recognizing Partial Textual Entailment

Levy, Omer, Zesch, Torsten, Dagan, Ido, and Gurevych, Iryna

In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) Sep 2013

UKP-BIU: Similarity and Entailment Metrics for Student Response Analysis

Levy, Omer, Zesch, Torsten, Dagan, Ido, and Gurevych, Iryna

In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) Sep 2013

DKPro WSD: A Generalized UIMA-based Framework for Word Sense Disambiguation

Miller, Tristan, Erbs, Nicolai, Zorn, Hans-Peter, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (System Demonstrations) (ACL 2013) Sep 2013

Implementations of word sense disambiguation (WSD) algorithms tend to be tied to a particular test corpus format and sense inventory. This makes it difficult to test their performance on new data sets, or to compare them against past algorithms implemented for different data sets. In this paper we present DKPro WSD, a freely licensed, general-purpose framework for WSD which is both modular and extensible. DKPro WSD abstracts the WSD process in such a way that test corpora, sense inventories, and algorithms can be freely swapped. Its UIMA-based architecture makes it easy to add support for new resources and algorithms. Related tasks such as word sense induction and entity linking are also supported.

SemEval-2013 Task 5: Evaluating Phrasal Semantics

Korkontzelos, Ioannis, Zesch, Torsten, Zanzotto, Fabio Massimo, and Biemann, Chris

In Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013) Sep 2013

DKPro Similarity: An Open Source Framework for Text Similarity

Bär, Daniel, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (System Demonstrations) Sep 2013

Detecting Malapropisms Using Measures of Contextual Fitness

Zesch, Torsten

Special Issue of the TAL Journal on "Managing Noise in the Signal: Error Handling in Natural Language Processing" Sep 2013

Scalable Construction of High-Quality Web Corpora

Biemann, Chris, Bildhauer, Felix, Evert, Stefan, Goldhahn, Dirk, Quasthoff, Uwe, Schäfer, Roland, Simon, Johannes, Swiezinski, Leonard, and Zesch, Torsten

JLCL Sep 2013

Hierarchy Identification for Automatically Generating Table-of-Contents

Erbs, Nicolai, Gurevych, Iryna, and Zesch, Torsten

In Proceedings of 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013) Sep 2013

A table-of-contents (TOC) provides a quick reference to a document’s content and structure. We present the first study on identifying the hierarchical structure for automatically generating a TOC using only textual features instead of structural hints, e.g. from HTML tags. We create two new datasets to evaluate our approaches for hierarchy identification. We find that our algorithm performs on a level that is sufficient for a fully automated system. For documents without given segment titles, we extend our work by automatically generating segment titles. We make the datasets and our experimental framework publicly available in order to foster future research in TOC generation.

Language Resources and Evaluation Journal - Special Issue on Collaboratively Constructed Language Resources

Gurevych, Iryna, and Zesch, Torsten

In Proceedings of 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013) Sep 2013

Cognate Production using Character-based Machine Translation

Beinborn, Lisa, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the 6th International Joint Conference on Natural Language Processing Sep 2013

2012

Measuring Contextual Fitness Using Error Contexts Extracted from the Wikipedia Revision History

Zesch, Torsten

In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012) Sep 2012

HOO 2012 Shared Task: UKP Lab System Description

Zesch, Torsten, and Haase, Jens

In Proceedings of the Seventh Workshop on Innovative Use of NLP for Building Educational Applications at NAACL-HLT Sep 2012

UKP-UBC Entity Linking at TAC-KBP

Erbs, Nicolai, Agirre, Eneko, Soroa, Aitor, Barrena, Ander, Etxebarria, Ugaitz, Gurevych, Iryna, and Zesch, Torsten

In Proceedings of the 5th Text Analysis Conference Sep 2012

UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures

Bär, Daniel, Biemann, Chris, Gurevych, Iryna, and Zesch, Torsten

In Proceedings of the 6th International Workshop on Semantic Evaluation, held in conjunction with the 1st Joint Conference on Lexical and Computational Semantics Sep 2012

Towards fine-grained readability measures for self-directed language learning

Beinborn, Lisa, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the SLTC 2012 workshop on NLP for CALL Sep 2012

Text Reuse Detection Using a Composition of Text Similarity Measures

Bär, Daniel, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012) Sep 2012

Collective Intelligence and Language Resources: Introduction to the Special Issue on Collaboratively Constructed Language Resources

Gurevych, Iryna, and Zesch, Torsten

In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012) Sep 2012

Using Distributional Similarity for Lexical Expansion in Knowledge-based Word Sense Disambiguation

Miller, Tristan, Biemann, Chris, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012) Sep 2012

2011

Wikipedia Revision Toolkit: Efficiently Accessing Wikipedia’s Edit History

Ferschke, Oliver, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. System Demonstrations Sep 2011

PDF
Wikulu: An Extensible Architecture for Integrating Natural Language Processing Techniques with Wikis

Bär, Daniel, Erbs, Nicolai, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. System Demonstrations Sep 2011

PDF
A Reflective View on Text Similarity

Bär, Daniel, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the International Conference on Recent Advances in Natural Language Processing Sep 2011

PDF
First Aid for Information Chaos in Wikis: Collaborative Information Management Enhanced Through Language Technology

Erbs, Nicolai, Bär, Daniel, Gurevych, Iryna, and Zesch, Torsten

In Proceedings of the International Conference on Recent Advances in Natural Language Processing Sep 2011

PDF
Aufbereitung und Strukturierung von Information mittels automatischer Sprachverarbeitung

Stille, Wolfgang, Erbs, Nicolai, Zesch, Torsten, Gurevych, Iryna, and Weihe, Karsten

In Proceedings of KnowTech Sep 2011

PDF
Link Discovery: A Comprehensive Analysis

Erbs, Nicolai, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the 5th IEEE International Conference on Semantic Computing (IEEE-ICSC) Sep 2011

Abstract PDF

We present a comprehensive analysis of link discovery approaches. We classify them with regard to the type of knowledge being used, and identify three commonly used sources of knowledge: the text of a document, the document title, and already existing links. We analyze the influence of the knowledge source as well as of the amount of training data used. Results show that the link-based approach performs best if the amount of training data is huge. In a more realistic setting with less training data, the text-based approach yields better results.
Combining heterogeneous knowledge resources for improved distributional semantic models

Szarvas, György, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the 5th IEEE International Conference on Semantic Computing (IEEE-ICSC) Sep 2011

Abstract PDF

The Explicit Semantic Analysis (ESA) model based on term cooccurrences in Wikipedia has been regarded as a state-of-the-art semantic relatedness measure in recent years. We provide an analysis of the important parameters of ESA using datasets in five different languages. Additionally, we propose the use of ESA with multiple lexical semantic resources, thus exploiting multiple sources of evidence of term cooccurrence to improve over the Wikipedia-based measure. Exploiting the improved robustness and coverage of the proposed combination, we report improved performance over single resources in word semantic relatedness, solving word choice problems, classification of semantic relations between nominals, and text similarity.
Helping Our Own 2011: UKP Lab System Description

Zesch, Torsten

In Proceedings of the Helping Our Own Working Group Session at the 13th European Workshop on Natural Language Generation Sep 2011

PDF

2010

The More the Better? Assessing the Influence of Wikipedia’s Growth on Semantic Relatedness Measures

Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the Seventh International Conference on Language Resources and Evaluation Sep 2010

PDF
2nd Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources

Gurevych, Iryna, and Zesch, Torsten

Sep 2010

PDF
Wisdom of Crowds versus Wisdom of Linguists - Measuring the Semantic Relatedness of Words

Zesch, Torsten, and Gurevych, Iryna

Journal of Natural Language Engineering Sep 2010

PDF
Effektivere Informationssuche im World Wide Web

Schwarz, Christopher Kim, Keith, Nina, Gurevych, Iryna, Erbs, Nicolai, and Zesch, Torsten

Journal of Natural Language Engineering Sep 2010

PDF

2009

Proceedings of the Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources

Gurevych, Iryna, and Zesch, Torsten

Journal of Natural Language Engineering Sep 2009
Discovering Links Using Semantic Relatedness

Hoffart, Johannes, Bär, Daniel, Zesch, Torsten, and Gurevych, Iryna

In INEX 2009 Workshop Preproceedings Sep 2009

Abstract PDF

We present our approaches for link discovery in document collections with or without existing links. In collections containing links, we discover links using measures of link anchor ranking based on existing links. In collections without links, we gather noun phrases as anchor candidates. To discover targets, we use a measure of semantic relatedness between texts. We find that semantic relatedness is useful to identify targets for ambiguous link anchors. In collections that contain no existing links, using only document titles as anchor candidates can be enhanced by using arbitrary noun phrases extracted from documents.
Semantic relations in a bilingual corpus of different registers

Čulo, Oliver, Kunz, Kerstin, and Zesch, Torsten

In Deutsche Gesellschaft für Sprachwissenschaft (DGfS) Workshop on Corpus, Colligation, Register Variation Sep 2009

PDF
An Architecture to Support Intelligent User Interfaces for Wikis by Means of Natural Language Processing

Hoffart, Johannes, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the International Symposium on Wikis and Open Collaboration (WikiSym ’09) Sep 2009

PDF
Approximate Matching for Evaluating Keyphrase Extraction

Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the 7th International Conference on Recent Advances in Natural Language Processing Sep 2009

PDF
Study of Semantic Relatedness of Words Using Collaboratively Constructed Semantic Resources

Zesch, Torsten

In Proceedings of the 7th International Conference on Recent Advances in Natural Language Processing Sep 2009

PDF

2008

Flexible UIMA Components for Information Retrieval Research

Müller, Christof, Zesch, Torsten, Müller, Mark-Christoph, Bernhard, Delphine, Ignatova, Kateryna, Gurevych, Iryna, and Mühlhäuser, Max

In Proceedings of the LREC 2008 Workshop ‘Towards Enhanced Interoperability for Large HLT Systems: UIMA for NLP’ Sep 2008

PDF
Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary

Zesch, Torsten, Müller, Christof, and Gurevych, Iryna

In Proceedings of the 6th International Conference on Language Resources and Evaluation Sep 2008

PDF
Selbstorganisierende Wikis

Gurevych, Iryna, and Zesch, Torsten

In Proceedings of KnowTech Sep 2008

PDF
Representational Interoperability of Linguistic and Collaborative Knowledge Bases

Garoufi, Konstantina, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the KONVENS Workshop on Lexical-Semantic and Ontological Resources – Maintenance, Representation, and Standards Sep 2008

PDF
Graph-Theoretic Analysis of Collaborative Knowledge Bases in Natural Language Processing

Garoufi, Konstantina, Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the Poster Session of the 7th International Semantic Web Conference Sep 2008

PDF
Using Wiktionary for Computing Semantic Relatedness

Zesch, Torsten, Müller, Christof, and Gurevych, Iryna

In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence Sep 2008

PDF
Using Similarity Measures for Context-Aware User Interfaces

Hartmann, Melanie, Zesch, Torsten, Mühlhäuser, Max, and Gurevych, Iryna

In Proceedings of the 2nd IEEE International Conference on Semantic Computing Sep 2008

PDF

2007

Analysis of the Wikipedia Category Graph for NLP Applications

Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the TextGraphs-2 Workshop (NAACL-HLT 2007) Sep 2007

PDF
Cross-lingual Distributional Profiles of Concepts for Measuring Semantic Distance

Mohammad, Saif, Gurevych, Iryna, Hirst, Graeme, and Zesch, Torsten

In Proceedings of EMNLP-CoNLL Sep 2007

PDF
Darmstadt Knowledge Processing Repository Based on UIMA

Gurevych, Iryna, Mühlhäuser, Max, Müller, Christof, Steimle, Jürgen, Weimer, Markus, and Zesch, Torsten

In Proceedings of the First Workshop on Unstructured Information Management Architecture at Biannual Conference of the Society for Computational Linguistics and Language Technology Sep 2007

PDF
Teaching “Unstructured Information Management: Theory and Applications” to Computational Linguistics Students

Gurevych, Iryna, Müller, Christof, and Zesch, Torsten

In Proceedings of the First Workshop on Unstructured Information Management Architecture at Biannual Conference of the Society for Computational Linguistics and Language Technology Sep 2007

PDF
Analyzing and Accessing Wikipedia as a Lexical Semantic Resource

Zesch, Torsten, Gurevych, Iryna, and Mühlhäuser, Max

In Proceedings of the First Workshop on Unstructured Information Management Architecture at Biannual Conference of the Society for Computational Linguistics and Language Technology Sep 2007

PDF
Comparing Wikipedia and German Wordnet by Evaluating Semantic Relatedness on Multiple Datasets

Zesch, Torsten, Gurevych, Iryna, and Mühlhäuser, Max

In Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2007) Sep 2007

PDF
What to be? - Electronic Career Guidance Based on Semantic Relatedness

Gurevych, Iryna, Müller, Christof, and Zesch, Torsten

In Proceedings of ACL Sep 2007

PDF

2006

Automatically Creating Datasets for Measures of Semantic Relatedness

Zesch, Torsten, and Gurevych, Iryna

In Proceedings of the COLING/ACL Workshop on Linguistic Distances Sep 2006

PDF