Sketching Big Data

Lecture: Theory of Computing: CS: Data Science | November 5 | 4-5 p.m. | Sutardja Dai Hall, Banatao Auditorium

 Jelani Nelson, Harvard Universty

 Simons Institute for the Theory of Computing

A "sketch" is a data structure supporting some pre-specified set of queries and updates to a database while consuming space substantially (often exponentially) less than the information theoretic minimum required to store everything seen, and thus can also be seen as some form of functional compression. The advantages of sketching include less memory consumption, faster algorithms, and reduced bandwidth requirements in distributed computing environments.

This talk will touch on some of the magic made possible by sketching techniques, such as:

• (Approximately) counting up to an integer N in exponentially less memory than what's required to actually write the digits of N down.

• (Approximately) computing the number of distinct words ever appearing in any of Shakespeare's works, via a method that reads through them all once while only remembering 3 lines' worth of text in memory at any given time.

• Detecting trending keywords queried to a search engine, such as 'covfefe', or 'cambridge analytica', while never remembering more than a negligible fraction of the query stream seen thus far.

Light refreshments will be served before the lecture at 3:30 p.m.

 simonsevents@berkeley.edu