Sketching Data and Other Magic Tricks

Published

September 24, 2019

Presented at Strata Data NYC (New York, New York)

This hands-on tutorial explores ways to answer interesting queries about truly massive datasets almost instantly, using a fixed amount of space. It sounds like magic, but you’ll learn about the sketching data structures that make it work and the key trick that makes them possible. Sophie and William introduce scalable techniques for several fundamental problems: set membership, set and document similarity, counting kinds of events, and counting distinct elements. You’ll learn how these structures work as well as how and when to use them, and you’ll see how the same techniques apply to parallel, distributed, and stream processing at scale. You’ll leave able to put these techniques to work in real data engineering and machine learning applications such as join processing, document classification, and content personalization.
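To make the set membership problem concrete, here is a minimal sketch of one well-known sketching structure, a Bloom filter. The abstract does not say which structures the tutorial covers, so the choice of structure, the class name BloomFilter, and the parameters num_bits and num_hashes are all assumptions made for illustration. The example shows the two properties the abstract highlights: space stays fixed no matter how many items are added, and sketches built independently (for example on separate partitions of a stream) can be combined afterwards.

```python
import hashlib


class BloomFilter:
    """Illustrative fixed-size set-membership sketch (not the tutorial's code)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # The bit array is allocated once and never grows.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present" (false positives can occur);
        # False means "definitely not present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )

    def merge(self, other):
        # Two filters with identical parameters combine with a bitwise OR,
        # which is what lets partitions be sketched independently.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share num_bits and num_hashes")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return merged


# Sketch two partitions separately, then combine them.
left, right = BloomFilter(), BloomFilter()
left.add("alice")
right.add("bob")
combined = left.merge(right)
print("alice" in combined, "bob" in combined)
```

Membership answers from a structure like this are approximate in one direction only: an item that was added is always reported as present, while an item that was never added may occasionally be reported as present. Larger values of num_bits lower that false positive rate at the cost of more space.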

Slides Handout