Dissertation Talk: Building Interactive Query Systems at Scale
Seminar | May 9 | 10-11 a.m. | 465H Soda Hall
Anurag Khandelwal, University of California at Berkeley
Modern cloud data services aim to support increasingly sophisticated queries with interactive response times. These services can be broadly divided into two categories: read-intensive applications such as web services, and write-intensive applications such as real-time monitoring of event streams. In both cases, supporting sophisticated queries interactively and at scale raises significant challenges. As a result, existing systems compromise on either functionality or interactivity when deployed at scale. In this talk, I will present two systems that address these challenges for read-intensive and write-intensive applications, respectively.
First, I will talk about BlowFish, a distributed data store that admits a smooth tradeoff between storage and performance and can dynamically navigate this tradeoff at fine-grained timescales. This unique ability not only allows BlowFish to perform interactive queries on data sizes larger than memory capacity, but also leads to previously unachievable operating points in the system design space. In fact, BlowFish can efficiently handle skew in query workloads, and even dynamically adapt to changes in skew. I will then talk about Confluo, a system for real-time monitoring and diagnosis of high-throughput data streams. Confluo exploits workload characteristics in the design of a new data structure, the Atomic MultiLog, which supports efficiently updating a collection of lock-free concurrent logs as a single atomic operation, while supporting rich online and offline queries for monitoring and diagnosis. Confluo also supports a wide range of real-time streaming applications, from network monitoring and diagnosis to distributed messaging and time-series databases.
Both BlowFish and Confluo are open source. While BlowFish is already being used in production, Confluo is in the early stages of adoption.