Unifying deep learning with item response theory: interval measurement, annotator debiasing, efficiency, and explainability

Colloquium | December 2 | 4-5:30 p.m. | Room 1102, Berkeley Way West (2121 Berkeley Way, Berkeley, CA 94720)

 Claudia von Vacano, Executive Director, D-Lab

 Graduate School of Education

Outcome phenomena are typically measured at the binary level: a comment is toxic or not, an image has sexual content or it doesn’t, a patient is healthy or deceased. But the real world is more complex: most target variables are inherently continuous, just as physical quantities such as temperature and weight can be measured on continuous scales. How can we achieve the same kind of meaningful, continuous scales for arbitrary outcomes?

We propose a method for measuring phenomena as continuous variables by unifying deep learning with the Constructing Measures approach to Rasch item response theory (IRT). The crux of our method is decomposing the target construct into multiple components measured as ordinal survey items, which are then transformed into a continuous measure of unprecedented quality. In particular, we estimate first-order labeler bias and eliminate its influence on the final construct, which renders obsolete the notion of inter-rater reliability as a quality metric. To our knowledge, this IRT bias adjustment has never before been implemented in machine learning, but it is critical for algorithmic fairness. We further estimate the response quality of each individual labeler, allowing responses from low-quality labelers to be removed.
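As a rough illustration of how a labeler-bias term can enter a Rasch-type model, the sketch below uses a textbook rating-scale form with an added rater-severity facet. The abstract does not give the model equations, so the functional form, the sign convention (higher severity means lower ratings), and every name here (`category_probs`, `expected_rating`, `rater_severity`, `thresholds`) are assumptions made for illustration, not the speakers' implementation.

```python
import numpy as np

def category_probs(theta, item_difficulty, rater_severity, thresholds):
    """Probabilities of ordinal categories 0..K for one rating.

    Assumed form: a rating-scale Rasch model with an extra rater facet.
    theta           -- latent level of the thing being rated (e.g. a comment)
    item_difficulty -- difficulty of the survey item
    rater_severity  -- how harshly this labeler rates (hypothetical bias term)
    thresholds      -- K step thresholds shared across items
    """
    steps = theta - item_difficulty - rater_severity - np.asarray(thresholds, dtype=float)
    # Category 0 has log-weight 0; category k adds the first k step logits.
    log_weights = np.concatenate(([0.0], np.cumsum(steps)))
    log_weights -= log_weights.max()  # shift for numerical stability
    weights = np.exp(log_weights)
    return weights / weights.sum()

def expected_rating(theta, item_difficulty, rater_severity, thresholds):
    """Mean category under the model above."""
    probs = category_probs(theta, item_difficulty, rater_severity, thresholds)
    return float(np.dot(np.arange(len(probs)), probs))

# Same comment, same item, two labelers who differ only in severity.
thresholds = [-1.0, 0.0, 1.0]  # made-up values for a 0..3 scale
lenient = expected_rating(0.5, 0.0, -1.0, thresholds)
severe = expected_rating(0.5, 0.0, 1.0, thresholds)
print(f"lenient labeler: {lenient:.2f}, severe labeler: {severe:.2f}")
```

In this sketch the two labelers give different expected ratings to an identical comment. Once the severity term is estimated, it can be separated from `theta`, which is the sense in which the labeler's influence is removed from the final measure. With a single threshold at zero, the model reduces to the familiar dichotomous Rasch form.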

Our IRT scaling procedure fits naturally into multi-task, weight-sharing deep learning architectures in which our theorized components of the target outcome are used as latent variables for the neural networks’ internal representation learning. This approach improves sample efficiency and promotes generalizability. Built-in explainability is an inherent advantage of our method because the final numeric prediction can be directly explained by the predictions on the components.
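The weight-sharing idea and the component-level explanation can be sketched with a toy forward pass. Everything below is a simplified assumption: the component names, layer sizes, softmax heads, and the summed total are hypothetical stand-ins (the talk's IRT scaling step is not reproduced), and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

COMPONENTS = ["insult", "dehumanize", "violence"]  # hypothetical component names
N_CATEGORIES = 5                 # assumed 0..4 ordinal scale per component
EMBED_DIM, HIDDEN_DIM = 16, 8    # assumed sizes

# Shared layer: one set of weights feeds every component head.
W_shared = rng.normal(scale=0.1, size=(EMBED_DIM, HIDDEN_DIM))
# One small output head per component (untrained, for illustration only).
heads = {name: rng.normal(scale=0.1, size=(HIDDEN_DIM, N_CATEGORIES))
         for name in COMPONENTS}

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict(embedding):
    """Return (total, per-component expected ratings) for one text embedding."""
    hidden = np.tanh(embedding @ W_shared)  # shared representation
    component_scores = {}
    for name, W_head in heads.items():
        probs = softmax(hidden @ W_head)    # distribution over categories
        component_scores[name] = float(probs @ np.arange(N_CATEGORIES))
    # Stand-in aggregation: sum of expected component ratings.
    total = sum(component_scores.values())
    return total, component_scores

embedding = rng.normal(size=EMBED_DIM)  # stand-in for a sentence-encoder output
total, parts = predict(embedding)
print(f"total = {total:.2f}")
for name, score in parts.items():
    print(f"  {name}: {score:.2f}")
```

Because the total is computed from the component predictions, each component's contribution can be read off directly, which is the kind of built-in explanation the abstract describes.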

We demonstrate our method on a new dataset of 50,000 online comments labeled to measure a spectrum from hate speech to counter-speech, and sourced from YouTube, Twitter, and Reddit. We evaluate Universal Sentence Encoders, RoBERTa, and XLNet as contextual representation models for the comment text, and benchmark our predictive accuracy against Google Jigsaw’s Perspective API models.

About the speaker. Dr. Claudia von Vacano is the Executive Director of the D-Lab and the Digital Humanities at Berkeley, and is on the boards of the Social Science Matrix and Berkeley Center for New Media. She has worked in policy and educational administration since 2000, and at the UC Office of the President and UC Berkeley since 2008.

While working at various educational institutions, she managed multimillion-dollar budgets and successfully implemented large-scale projects. She is also the lead online course developer of the SAGE Campus Introduction to Applied Data Science Methods for Social Scientists. Claudia has created a meta-organization at UC Berkeley: Data Science for Social Good. Under this rubric, she is leading an online hate speech research project that employs machine learning, with the support of the Anti-Defamation League. She is deeply committed to, and invested in, supporting diversity in data science through a partnership with the Data Science Division and the Data Scholars program. The D-Lab is also working closely with the College Futures Foundation on college-going patterns and career success. Claudia has worked on Career Pathways at the UC Office of the President and in committee work investigating Next Generation opportunities for PhDs within and beyond academia. The Career Pathways work has been undertaken with the leadership of Dean Anthony J. Cascardi, Dean AnnaLee Saxenian, and Data Science Division Faculty Lead Cathryn Carson.

She received a Master’s degree from Stanford University in Learning, Design, and Technology. Her doctorate is in Policy, Organizations, Measurement, and Evaluation from UC Berkeley. Her expertise is in organizational theory and behavior and in educational and language policy implementation. The Phi Beta Kappa Society, the Andrew W. Mellon Foundation, the Rockefeller Brothers Foundation, and the Thomas J. Watson Foundation, among others, have recognized her scholarly work and service contributions.