Dimensionality reduction in Spark

spark
pca
tsne
machine learning
logs
Published February 16, 2016

Here’s a quick video I put together introducing infrastructure log processing in Spark. At the end, there are a couple of nice graphs contrasting PCA and t-SNE for embedding high-dimensional log metadata into two dimensions.

Log processing demo from William Benton on Vimeo.
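
For readers who want a feel for the PCA half of that comparison without watching the video, here is a minimal sketch of projecting high-dimensional rows onto their top two principal components. It is not the code from the demo: the function names (`center`, `covariance`, `top_eigenvector`, `pca_2d`) and the toy `rows` are made up for illustration, and it uses plain Python with power iteration rather than Spark, so it only suits small inputs. In a real Spark job you would more likely reach for the PCA implementation in Spark's MLlib.

```python
import random


def center(rows):
    """Subtract the per-column mean so every feature has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Approximate the dominant eigenpair of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # matrix maps v to zero; nothing left to find
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue estimate for the unit vector v.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


def pca_2d(rows):
    """Return (embedded points, components) for a two-component PCA."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(2):
        eigenvalue, v = top_eigenvector(cov)
        components.append(v)
        # Deflate: remove the component just found so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    embedded = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                for r in centered]
    return embedded, components


# Hypothetical toy data standing in for numeric log metadata (4 features per record).
rows = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [3.0, 6.5, 9.0, 12.0],
    [4.0, 8.0, 12.5, 16.0],
    [5.0, 10.0, 15.0, 20.0],
]
embedded, components = pca_2d(rows)
for point in embedded:
    print(point)
```

Each printed pair is one record's position in the two-dimensional embedding. PCA is a linear projection, which is part of why it can look so different from t-SNE on the same data: t-SNE is nonlinear and tries to preserve local neighborhoods rather than overall variance.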