Tiny functions for codecs, compilation, and (maybe) soon everything
Colloquium | January 25 | 12-1 p.m. | 430 Soda Hall
Keith Winstein, Stanford
Networks, applications, and media codecs frequently treat one another as strangers. By expressing large systems as compositions of small, pure functions, we've found it's possible to achieve tighter couplings between these components, improving performance without giving up modularity or the ability to debug. I'll discuss our experience with three systems that demonstrate this basic idea.
ExCamera (NSDI 2017) parallelizes video encoding into thousands of tiny tasks, each handling a fraction of a second of video, much shorter than the interval between key frames, and executing in parallel on AWS Lambda. This was the first system to demonstrate "burst-parallel" thousands-way computation on functions-as-a-service infrastructure.
Salsify (NSDI 2018) is a low-latency network video system that uses a purely functional video codec to explore execution paths of the encoder without committing to them, allowing it to closely match the capacity estimates from a video-aware transport protocol. This architecture outperforms more loosely coupled applications -- Skype, FaceTime, Hangouts, WebRTC -- in delay and visual quality, and suggests that while improvements in video codecs may have reached the point of diminishing returns, video systems still have low-hanging fruit.
Lepton (NSDI 2017) uses a purely functional JPEG/VP8 transcoder to compress images in parallel across a distributed network filesystem with arbitrary block boundaries. This free-software system is in production at Dropbox, where it has compressed more than 200 petabytes of user JPEGs by 23%.
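To illustrate what "exploring execution paths of the encoder without committing to them" can mean, here is a minimal Python sketch. It is not Salsify's codec: `EncoderState`, `encode`, `send_frame`, and the quality values are hypothetical stand-ins, and the "compression" is a simple truncation. The only point is that a pure encoder takes its state as an argument and returns a new one, so the caller can try several options from the same state and keep just the one that fits the transport's capacity estimate.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EncoderState:
    # Stand-in for real codec state (reference frames, probability tables, ...).
    frames_encoded: int


def encode(state: EncoderState, frame: bytes, quality: int) -> Tuple[EncoderState, bytes]:
    """Toy pure 'encoder': returns a new state and an output, never mutating its inputs."""
    keep = max(1, len(frame) * quality // 100)  # higher quality -> larger output
    return EncoderState(state.frames_encoded + 1), frame[:keep]


def send_frame(
    state: EncoderState,
    frame: bytes,
    capacity_bytes: int,
    qualities: Tuple[int, ...] = (80, 40),
) -> Tuple[EncoderState, Optional[bytes]]:
    """Try each quality from the same starting state; commit only the first one that fits."""
    for quality in sorted(qualities, reverse=True):
        new_state, out = encode(state, frame, quality)
        if len(out) <= capacity_bytes:
            return new_state, out
    # Nothing fits: skip the frame. The caller's state is untouched.
    return state, None


state = EncoderState(frames_encoded=0)
frame = bytes(1000)  # hypothetical 1000-byte raw frame
state, out = send_frame(state, frame, capacity_bytes=500)
```

Because `encode` has no side effects, discarding an unsuccessful attempt costs nothing beyond the computation itself; a conventional encoder that mutates internal state would have to be rolled back or re-initialized.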
Based on our experience, we propose a general abstraction for outsourced morsels of computation, called cloud "thunks" -- stateless closures that describe their data dependencies by content-hash. We have created a tool that uses this abstraction to capture off-the-shelf Makefiles and other build systems, letting the user treat a FaaS service like an outsourced build farm with global memoization of results. The bottom line: expressing systems and protocols as compositions of small, pure functions will lead to a new wave of "general-purpose" lambda computing, permitting us to transform many time-consuming operations into large numbers of functions executing with massive parallelism for short durations in the cloud.
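The thunk abstraction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool described in the talk: the in-memory `store` and `memo` dictionaries stand in for a cloud object store and a shared memoization table, and `put`, `make_thunk`, `thunk_hash`, `force`, and the `FUNCTIONS` table are hypothetical names. A thunk here is a stateless description of a computation (a function name plus the content hashes of its inputs), so two identical thunks hash to the same key and the second one is answered from the memo table without running anything.

```python
import hashlib
import json

store = {}  # content hash -> bytes (stand-in for a cloud object store)
memo = {}   # thunk hash -> content hash of its output (memoization table)
calls = []  # names of functions that actually executed, for demonstration

# Hypothetical registry of pure functions that a thunk may name.
FUNCTIONS = {
    "concat": lambda inputs: b"".join(inputs),
    "upper": lambda inputs: inputs[0].upper(),
}


def put(data: bytes) -> str:
    """Store a blob and return its content hash."""
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def make_thunk(func: str, input_hashes: list) -> dict:
    """Describe a computation by function name and the content hashes of its inputs."""
    return {"func": func, "inputs": list(input_hashes)}


def thunk_hash(thunk: dict) -> str:
    """Hash the thunk's canonical JSON form; identical thunks get identical keys."""
    return hashlib.sha256(json.dumps(thunk, sort_keys=True).encode()).hexdigest()


def force(thunk: dict) -> str:
    """Run the thunk unless its result is already memoized; return the output's hash."""
    key = thunk_hash(thunk)
    if key not in memo:
        calls.append(thunk["func"])
        inputs = [store[h] for h in thunk["inputs"]]
        memo[key] = put(FUNCTIONS[thunk["func"]](inputs))
    return memo[key]


a = put(b"hello, ")
b = put(b"world")
joined = force(make_thunk("concat", [a, b]))
result = force(make_thunk("upper", [joined]))
again = force(make_thunk("upper", [joined]))  # served from memo, nothing re-runs
```

In a build setting, each compile or link step would play the role of an entry in `FUNCTIONS`, with source files and intermediate objects as the content-addressed inputs. Since a thunk carries no state beyond its hashes, it can in principle be shipped to any worker that has access to the store.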