Turning big corpora into big data – Digital Humanities at scale

Lecture | September 7 | 12-1 p.m. | 200C Warren Hall

 Adam Anderson, Mellon Postdoctoral Fellow in Digital Humanities

 Social Sciences Data Lab

Adam Anderson, a Mellon Postdoctoral Fellow in Digital Humanities at UC Berkeley, has spent his career scanning texts in order to draw upon secondary literature in archaeology and computational linguistics. He will discuss his current OCR project (described in this article) and how he intends to use the data it generates: "My plan is to use a combination of analytical tools, including network analysis and text analysis to 'clean' and 'tidy' the data, and to extract the primary sources using the Hit-list that Maurice Manning set up. I will then use the networks I generate to build content-specific datasets within the sub-fields of Near Eastern studies. The secondary sources can then be anonymized for each author and used openly for teaching and research, similar to the methods Underwood has described."