RISE Seminar: Strategies for integrating people and machine learning in online systems

Seminar | September 21 | 12:30-1:30 p.m. | Soda Hall, Wozniak Lounge

 Jason Laska, Clara Labs


Clara Labs is an email-based scheduling service for busy people. Simply CC Clara on an email to a person you want to meet with, and we'll handle the back-and-forth of email tag for you in accordance with your preferences. To build a robust and accurate system that gracefully handles nuanced requests, we've combined machine learning (ML) with a distributed human labor force. This system enables a single person to schedule consistently for an unbounded number of customers, regardless of the worker's location or lack of a priori customer context.

A partially automated system has clear benefits, such as increased accuracy and decreased cost (i.e., increased scalability). Further, human input to the system yields new annotations for retraining algorithms. We’ve also found great advantages in vertically integrating the ML annotation process directly with the product; e.g., the fidelity of labelled data increases when the annotator understands what actions will be derived directly from their work.

Despite these advantages, there are several distinct challenges to building such a system: annotators are noisy and may be biased by bad ML predictions (if displayed); there tends to be an inverse relationship between speed of data entry and annotator accuracy; and the learning curve for using a unique data-entry system may be high. In fact, even measuring accuracy in the system may be challenging, depending on time and cost constraints.

In this talk we'll discuss incentives and algorithms for increasing both the accuracy and speed of human operators, methods for measuring their performance, strategies for dealing with task ambiguity, and tricks for building an effective ramping system to onboard workers. These topics will be covered in the context of bounded time and cost constraints. We will further discuss the "automation spectrum," i.e., the automation subtasks that can be surfaced to people and how they can be leveraged for progressive cost and speed gains over time.

 sanjay@eecs.berkeley.edu, 4082210207