DevOps for ML
- How is it different from traditional Software Pipelines?
- Containers - with GPUs and CUDA
- Multiple code bases - Train, Learning etc
- Skills - Data Science vs DevOps
- ROI and success of machine learning projects
Cluster Analysis for Customer Segments
Jul 31 2019
k-means, Silhouette method, Elbow method, Cosine Distance, MiniBatchKMeans, BIRCH, DBSCAN, OPTICS, PCA, t-distributed Stochastic Neighbor Embedding (t-SNE)
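A minimal sketch of the Elbow and Silhouette methods for picking k, assuming scikit-learn is available. The synthetic data from `make_blobs`, the range of k values, and the variable names are illustrative assumptions, not taken from a real customer data set.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for customer features (assumption: 4 well-separated groups).
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.6, random_state=0)

inertias = {}     # Elbow method: within-cluster sum of squares for each k
silhouettes = {}  # Silhouette method: mean silhouette coefficient for each k

for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# Elbow: look for the k where the drop in inertia flattens out.
# Silhouette: prefer the k with the highest mean score.
best_k = max(silhouettes, key=silhouettes.get)
for k in inertias:
    print(k, round(inertias[k], 1), round(silhouettes[k], 3))
print("k with highest silhouette:", best_k)
```

The elbow is usually read off a plot of inertia against k, so it stays a judgment call; the silhouette score gives a single number per k that can be compared directly.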
One thing to note: since k-means typically uses Euclidean distance, it does not work well on high-dimensional data sets because of the curse of dimensionality. Part of this curse is that Euclidean distances lose much of their meaning in high dimensions, since the distances between pairs of points tend to become nearly equal.
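The effect can be checked numerically. The sketch below, which assumes uniformly random points and uses a hypothetical helper named `relative_contrast`, compares the nearest and farthest Euclidean distance from a query point in low and high dimensions.

```python
import numpy as np

def relative_contrast(n_points, n_dims, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_points, n_dims))      # uniform points in the unit cube
    query = rng.random(n_dims)
    d = np.linalg.norm(X - query, axis=1)   # Euclidean distances to the query
    return (d.max() - d.min()) / d.min()

contrast_low = relative_contrast(1000, 2)
contrast_high = relative_contrast(1000, 1000)

# In high dimensions the nearest and farthest points are almost equally far away.
print("2 dims:   ", contrast_low)
print("1000 dims:", contrast_high)
```

A small contrast means "nearest centroid" carries little information, which is what hurts k-means on wide feature sets.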
Using Cosine distance when dimensionality is high?
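One possible way to try this with scikit-learn, whose `KMeans` only supports Euclidean distance, is to L2-normalize each row first. For unit vectors the squared Euclidean distance equals twice the cosine distance, so k-means on normalized rows approximates clustering by cosine distance. The data below is random and purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
X = rng.random((200, 50))          # illustrative high-dimensional data

X_unit = normalize(X)              # scale every row to unit L2 norm

# Check the identity on one pair: ||a - b||^2 == 2 * (1 - cos(a, b))
a, b = X_unit[0], X_unit[1]
sq_euclid = np.sum((a - b) ** 2)
cos_dist = 1.0 - a @ b
print(sq_euclid, 2 * cos_dist)

# Euclidean k-means on unit vectors as a stand-in for cosine-based clustering.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_unit)
```

Note that the centroids themselves are not re-normalized here, so this is an approximation rather than true spherical k-means.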
MiniBatchKMeans and BIRCH are faster, but not very accurate.
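A rough sketch of the trade-off using scikit-learn, with synthetic data and parameter values chosen only for illustration. Inertia (within-cluster sum of squares) is used as a simple quality measure for the two k-means variants.

```python
from sklearn.cluster import Birch, KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs

# Illustrative data; real customer data would replace this.
X, _ = make_blobs(n_samples=5000, centers=5, n_features=10, random_state=0)

full = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
mini = MiniBatchKMeans(n_clusters=5, batch_size=256, n_init=3,
                       random_state=0).fit(X)
birch = Birch(n_clusters=5).fit(X)

# Lower inertia means tighter clusters; the mini-batch result is usually
# close to, but often slightly worse than, the full k-means result.
print("KMeans inertia:         ", full.inertia_)
print("MiniBatchKMeans inertia:", mini.inertia_)
print("BIRCH clusters found:   ", len(set(birch.labels_)))
```

Wrapping each `fit` call in a timer on a realistically sized data set is the easiest way to see whether the speed gain is worth the loss in quality.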
For DBSCAN, minPts should be at least the number of features.
OPTICS is similar to DBSCAN but does not need eps to be set.
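A minimal sketch of both algorithms in scikit-learn, where minPts is called `min_samples`. The data is synthetic and the `eps=1.0` value is an arbitrary assumption; in practice eps has to be tuned for the data.

```python
from sklearn.cluster import DBSCAN, OPTICS
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, n_features=4,
                  cluster_std=0.5, random_state=0)

# Rule of thumb from the note above: minPts >= number of features.
min_samples = X.shape[1]

# DBSCAN needs a neighborhood radius eps (1.0 is an illustrative guess).
db = DBSCAN(eps=1.0, min_samples=min_samples).fit(X)

# OPTICS orders points by reachability and does not require eps up front.
op = OPTICS(min_samples=min_samples).fit(X)

# A label of -1 marks points treated as noise by either algorithm.
print("DBSCAN labels:", sorted(set(db.labels_)))
print("OPTICS labels:", sorted(set(op.labels_)))
```

The `reachability_` attribute of the fitted OPTICS model can be plotted in `ordering_` order to inspect cluster structure at different density levels.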
To visualize clusters, use PCA or t-distributed Stochastic Neighbor Embedding (t-SNE).
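As a sketch, both projections can be produced with scikit-learn and then passed to any scatter plot, colored by cluster label. The data, the number of clusters, and the t-SNE `perplexity` value below are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, _ = make_blobs(n_samples=300, centers=4, n_features=20, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# PCA: linear projection onto the two directions of highest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: non-linear embedding that tries to preserve local neighborhoods.
X_tsne = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=0).fit_transform(X)

# Each row is now an (x, y) point; plot it colored by its entry in `labels`.
print(X_pca.shape, X_tsne.shape)
```

PCA is fast and its axes are interpretable; t-SNE often separates clusters more clearly, but distances between far-apart groups in the plot should not be over-interpreted.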