Skill level: Advanced
Language: English
Format: Code, Tutorial, Self-study unit
Media type: Code snippets, Data, Text media
Published: 15 August 2025
ID: 10.71627/tubecleanR
tubecleanR
Johannes Breuer, Julian Kohne
Contributors: This tool was developed as part of the KODAQS project, a partnership between GESIS, the University of Mannheim, and LMU Munich.
tubecleanR is an R package that provides functions for cleaning and preprocessing YouTube comment data collected with the R packages tuber or vosonSML. It addresses potential measurement errors by offering structured routines for typical challenges, such as separating text, emoticons, and paradata, which helps researchers prepare high-quality datasets for analysis. A tutorial demonstrates its use on a synthetic dataset, generated with Google Gemini, that replicates the structure of real YouTube comment data.
This resource is available under the following license:
Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)