Matthew Tyler - Getting More out of Human Coders with Statistical Models (Job Talk Practice)
Human-coded data is the basis for research and decisions in the social sciences, industry, government, and beyond. But coders often make mistakes, are biased,
or are unmotivated, and such errors render standard coding methods unreliable and inefficient. In this paper, I explain how explicitly modeling coder mistakes helps
to overcome errors in human coding, increasing the efficiency and reducing the bias of the final analysis. I introduce a new model, the biased-annotator competence
estimation (BACE), as a default coding model for typical social science coding tasks. I prove the conditions for identification of the key parameters of interest for this new
model and clarify the identification conditions for several models widely used across computer science and statistics. In simulations, I show that BACE serves as a viable
default model for human coders. In an application, I show that, once corrected with BACE, there is 2-3 times more partisan polarization in the discussion of presidential
candidates than would have been found using conventional hand-coding methods. I provide an easy-to-use R package that makes these models immediately applicable to
coding tasks.
Matthew Tyler is a Ph.D. candidate in political science at Stanford University. He researches media polarization and statistical methods.