Sketching data and other magic tricks

Materials from a tutorial on some very cool data structures.
data science
sketching
python
Published: September 25, 2019

I had a lot of fun presenting a tutorial at Strata Data NYC with my teammate Sophie Watson yesterday. In just over three hours, we covered a variety of hash-based data structures for answering interesting queries about large data sets or streams. These structures all have the following properties:

I’ve been interested in these sorts of structures for a while, and it was great to have a chance to develop a tutorial covering the magic of hashing and some fun applications, like Sophie’s recent work on using MinHash for recommendation engines.
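To give a flavor of the kind of structure involved, here is a minimal MinHash sketch in plain Python. It is an illustration under stated assumptions, not the code from our notebooks: the names `minhash_signature` and `estimate_jaccard` are hypothetical, and salted BLAKE2b from the standard library stands in for a family of independent hash functions. The idea is that the fraction of positions where two signatures agree approximates the Jaccard similarity of the underlying sets.

```python
import hashlib


def _seeded_hash(seed: int, item: str) -> int:
    """Hash `item` to a 64-bit integer; `seed` picks one member of the hash family.

    Assumption: salted BLAKE2b behaves well enough as a family of independent hashes.
    """
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=128):
    """Return the MinHash signature of a non-empty collection of strings.

    Position i holds the smallest value that hash function i assigns to any item.
    """
    items = list(items)
    if not items:
        raise ValueError("cannot sketch an empty collection")
    return [min(_seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    # Two sets of hypothetical item IDs with a true Jaccard similarity of 1/3.
    a = {f"item-{i}" for i in range(0, 1000)}
    b = {f"item-{i}" for i in range(500, 1500)}
    print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

Each signature has a fixed size (here 128 integers) regardless of how large the original set is, which is what makes this kind of summary attractive for large data sets or streams. More hash functions give a tighter estimate at the cost of a larger signature.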

If you’re interested in the tutorial, you can run through our notebooks at your own pace.