Learning with no labels!
K-Means: Cluster data into k clusters
Step 1: Randomly initialize k clusters ($\mu_{k}$)
Step 2: Fix the cluster centers, and assign points to clusters
Step 3: Fix the cluster membership, and update cluster centers
Repeat steps 2 and 3 until convergence/stopping criteria met
K is a hyperparameter
Prior knowledge
Visualize your data (if possible)
NOTE: If let k increase without penalty, error will always decrease
K-Means guarantees a local optima if run to convergence
Better cluster initialization
Shown to converge more quickly than K-Means
Instead of randomly initializing the k clusters, tries to spread out the clusters
The rest of the algorithm is same as K-Means
Spark implements a parallel version of cluster initialization called K-Means||