15th August 2015
Spark is currently one of the hottest open-source projects in the big data
space, even eclipsing Hadoop in terms of excitement. Originally a Berkeley AMPLab
project, Spark became a top-level Apache project early last year and has b…
14th July 2015
This post is part of a series covering the exercises from Andrew Ng's machine
learning [https://www.coursera.org/course/ml] class on Coursera. The original
code, exercise text, and data files for this post are available here
[…
16th June 2015
In this blog post we'll examine some common techniques used in time series
analysis by applying them to a data set containing daily closing values for the
S&P 500 stock market index from 1950 to the present day. The objecti…
23rd May 2015
The topic of this post is perhaps a bit mundane, but after spending a
considerable amount of time getting this right, I decided to put together a
step-by-step guide so I would remember how to do it next time. And since I
already w…
2nd May 2015
One of the deep, dark secrets of machine learning is that beneath all of the
math, statistics, and algorithms, there's sort of a "black art" to actually
building useful models. I think the reason for this stems prima…