Skill level: Advanced
Language: English
Format: Tutorial, self-study unit, source code
Media type: Text media, data
Published: 8 January 2026
ID: 10.71627/subdata
SubData
Siyu Zhang, Leon Fröhling
Contributors: This tool was developed as part of the KODAQS project, a partnership between GESIS, the University of Mannheim, and LMU Munich.
The SubData tool, implemented as a Python library, evaluates the alignment between large language models (LLMs) and human perspectives in subjective annotation tasks. It is particularly relevant during data preprocessing, where it helps to improve data quality by harmonizing heterogeneous datasets, providing standardized keyword mappings and taxonomies, and enabling theory-driven analyses of perspective alignment. In addition, SubData includes ten curated datasets that can be used to identify model biases, test generalizability, and contextualize hate speech datasets within the broader literature.
This resource is available under the following license:
Creative Commons Attribution-NonCommercial 4.0 International (CC-BY-NC-4.0)