Holistic Optimization of Data-Intensive Applications

Seminar | March 7 | 1-2 p.m. | Soda Hall, 430-438 Wozniak Lounge

Alvin Cheung, Assistant Professor, University of Washington

Electrical Engineering and Computer Sciences (EECS)

From social networking websites to bank transactions, we interact with data-intensive applications every day. Such applications rely on databases for persistent storage, but the strong separation between the application and the database layer makes it difficult to satisfy end-to-end goals such as performance and correctness. For instance, each large-scale data processing system (e.g., relational databases, Spark, and TensorFlow) specializes in different workloads and exposes its own domain-specific language or programming interface for applications to use. It is very difficult for application developers to decide which system to use, and making the wrong choice can result in a drastic performance hit. Building new data processing systems is not easy either: the lack of tools often leads developers to repeat the same work (and bugs!) each time they build one.

In this talk, I will show how examining the programming system and the database management system in tandem allows developers to build data-intensive applications and systems that are both performant and correct. To illustrate, I will discuss three projects: verified lifting, a methodology that enables applications to leverage optimizations exposed by different programming interfaces; Cosette, a tool that lets system developers validate the correctness of their query transformations; and, briefly, Cuttlefish, a system that uses machine learning to adaptively choose among different implementations of a data operator. Using real-world examples, I will show that these tools enable system builders to reason about the correctness of their optimizations and achieve orders-of-magnitude performance improvements while preserving the same programming interface for the developer. Our work is currently being deployed in industry and has been used by hundreds of students at the University of Washington. To conclude, I will describe how these projects open up new opportunities for generating application-specific database systems.

Alvin Cheung is an assistant professor in the Allen School of Computer Science & Engineering at the University of Washington, affiliated with the programming languages and database research groups. His research focuses on designing new techniques to solve systems and end-user programming problems. Alvin previously received the George M. Sprowls Award for an outstanding doctoral dissertation in computer science at MIT. Since joining the University of Washington in 2015, he has received several awards, including the Department of Energy Early Career Research Award and the NSF CAREER Award. He and his students have also won best paper and best demo awards at multiple conferences and workshops.