As the scale of data our monitoring systems produce keeps increasing, the techniques our systems use to process it have to evolve. We need to start leveraging more sophisticated data structures, and sketches are one option.
Sketching data structures are probabilistic structures that store a summary of the full dataset. They’re specialized to answer specific questions — how many unique values a large dataset contains, or what the p95 of the dataset is. By leveraging some neat mathematical properties, sketching data structures trade off some accuracy for a significant increase in both computational and storage efficiency.
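To make the accuracy-for-efficiency trade concrete, here is a minimal sketch of one such structure, a k-minimum-values (KMV) estimator for counting unique values. It is an illustration only and not necessarily one of the structures covered in the talk or used at Stripe; the class name `KMVSketch`, the default `k=256`, and the choice of SHA-1 as the hash are all assumptions made for the example.

```python
import hashlib
import heapq


class KMVSketch:
    """Hypothetical k-minimum-values sketch for approximate distinct counts.

    Each value is hashed to a number in [0, 1). Only the k smallest distinct
    hashes are kept, so memory stays bounded no matter how large the input is.
    """

    def __init__(self, k=256):
        self.k = k
        self._heap = []        # negated hashes, so the largest kept hash is at index 0
        self._members = set()  # hashes currently kept, used to ignore repeats

    @staticmethod
    def _hash(value):
        # Assumption: SHA-1 is used only as a convenient, well-mixed hash.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, value):
        h = self._hash(value)
        if h in self._members:
            return  # this value is already represented in the sketch
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept one: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return len(self._heap)  # fewer than k distinct values: count is exact
        kth_smallest = -self._heap[0]
        # If n distinct hashes are spread uniformly over [0, 1), the k-th
        # smallest lands near k / n, which gives the estimator (k - 1) / kth.
        return int((self.k - 1) / kth_smallest)
```

Feeding the sketch a stream with many repeated values and calling `estimate()` returns an approximate unique count while holding at most `k` hashes in memory. A larger `k` reduces the error at the cost of more storage, which is the trade-off described above.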
In this talk, Kiran will cover the workings of a few basic sketching data structures, and will provide a few examples of how Stripe uses them in our metrics pipeline.
Kiran is a software engineer at Stripe. At work, she thinks a lot about distributed systems fallacies and how we can observe what our software is doing. A normal day working with Kiran involves conversations about operating distributed systems and learning that she made that awesome space dress she’s wearing.