Analyzing Log Data With Apache Spark

Published

June 8, 2016

Presented at Spark Summit (San Francisco, CA)

Contemporary applications and infrastructure software leave behind a tremendous volume of metric and log data. This aggregated “digital exhaust” is inscrutable to humans and difficult for computers to analyze, since it is vast, complex, and not explicitly structured. This session will introduce the log processing domain and provide practical advice for analyzing log data with Apache Spark, including:

You’ll have a better understanding of the unique challenges posed by infrastructure log data after this session. You’ll also learn the most important lessons from our efforts both to develop analytic capabilities for an open-source log aggregation service and to evaluate these at enterprise scale.

Talk video Slides