This talk presents a new tool for exploratory text analysis that attempts to provide frictionless access to text and its metadata. The design of the tool was motivated by the unmet need for text analysis tools in the humanities and social sciences. In these fields, it is common for scholars to have hundreds, even thousands, of text-based source documents of interest from which they extract evidence for complex arguments about society and culture. These collections are difficult to make sense of and navigate. Unlike numerical data, they cannot be condensed, overviewed, and summarized in an automated fashion without losing significant information. And the metadata that accompanies the documents, often from library records, does not capture the varied content of the text within.
Further, adoption of computational tools remains low among these scholars despite such tools having existed for decades. A recent study found that the main culprits were poor user interfaces and lack of communication between tool builders and tool users. We therefore took an iterative, user-centered approach to the development of the tool. We collaborated with active scholars and students working on text analysis problems of existing interest to them, and let their needs, behaviors, and feedback drive development.
The system we designed, called WordSeer, supports highly flexible slicing and dicing, as well as frictionless transitions between visual analyses, drill-downs, lateral explorations, and overviews of slices of a text collection. The tool uses computational linguistics, information retrieval, and data visualization. As a measure of success, we report how it has enabled humanities and social science scholars to conduct analyses yielding otherwise inaccessible results useful to their research.
This talk will describe WordSeer, a fully functional exploratory text analysis system that has proven useful in practice; present a tool-builder's model of how humanities and social science scholars undertake exploratory text analysis during the course of their work; and give design guidelines for future systems.