At the conference hosted by the GESIS Leibniz Institute for the Social Sciences in Cologne, July 10-13, 2017, Cornelius Puschmann will co-present the following tutorial:
"Topic modeling European political debates with the EUSpeech dataset"
TATJANA SCHEFFLER, University of Potsdam
DAMIAN TRILLING, University of Amsterdam
CORNELIUS PUSCHMANN, Hans Bredow Institute for Media Research
The tutorial provides a concise, hands-on introduction to topic modeling, an increasingly popular method in computational social science (Blei, 2012; DiMaggio, 2015; Puschmann & Scheffler, 2016). A growing number of packages in widely used programming languages are at researchers' disposal, but deploying topic models in the research process involves a number of caveats, both on the technical level and in terms of research design: which algorithm to rely on, how to set parameters, and how to preprocess the data. Interpreting the output of popular algorithms such as Latent Dirichlet Allocation (LDA; Blei, Ng & Jordan, 2003) and evaluating the validity of topic model statistics are key challenges for social scientists interested in topic modeling, as is embedding topic models in a research design in a way that allows concrete hypotheses to be tested.
In a series of eight compact segments, we will provide a user-friendly description of the workflow, in Python and R, for the practical application of topic models to our example data, the EUSpeech corpus (Schumacher et al., 2016). We will also discuss the conceptual basis of topic models, along with approaches for evaluating topic model fit and the validity of the results.
We expect an audience of both social scientists and computational researchers, mostly at the PhD student and postdoctoral level. Our approach will be research-oriented, involving an overview of relevant packages in Python and R, in addition to our own scripts (also in both languages), which will be shared via GitHub. We aim to be both highly practical and language-agnostic by focusing on what topic models do, how they do it, and what questions can be studied effectively using them.
We expect some familiarity with programming for those participants who want to apply topic modeling in their own research, but the segments on how to interpret topic models should be equally relevant for those with and without programming knowledge.