DevOps for ML
 How is it different from traditional Software Pipelines?
 Containers with GPUs and CUDA
 Multiple code bases: Train, Learning, etc.
 Skills: Data Science vs DevOps
 ROI and success of machine learning projects
DevOps for ML and other Half-Truths: Processes and Tools for the ML Life Cycle
Cluster Analysis for Customer Segments
Jul 31 2019
k-means, Silhouette method, Elbow method, Cosine Distance, MiniBatchKMeans, BIRCH, DBSCAN, OPTICS, PCA, t-distributed Stochastic Neighbor Embedding (t-SNE)
One thing to note: because k-means typically uses Euclidean distance, it does not work well with high-dimensional data sets due to the curse of dimensionality. In part, this curse means that Euclidean distances in high dimensions carry very little information, since the distances between different pairs of points tend to be nearly equal.
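A minimal sketch of that distance concentration effect, assuming points drawn uniformly from the unit hypercube. The helper name relative_contrast and the sample sizes are made up for illustration.

```python
import numpy as np

def relative_contrast(n_points, n_dims, seed=0):
    """Return (max - min) / min of the Euclidean distances from one point to the rest."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))  # uniform samples in the unit hypercube
    query = points[0]
    dists = np.linalg.norm(points[1:] - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# Same number of points, very different dimensionality.
contrast_low = relative_contrast(1000, 2)
contrast_high = relative_contrast(1000, 1000)
print(f"2 dims:    relative contrast = {contrast_low:.2f}")
print(f"1000 dims: relative contrast = {contrast_high:.2f}")
```

The smaller the relative contrast, the less the nearest and farthest neighbours differ, which is what makes Euclidean k-means assignments less meaningful in high dimensions.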

Use cosine distance when dimensionality is high?
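One way to try this with scikit-learn, whose KMeans only supports Euclidean distance, is to L2-normalize each row first. For unit vectors the squared Euclidean distance equals 2 * (1 - cosine similarity), so the two distances rank neighbours identically. This is a sketch on synthetic data, not true spherical k-means (the centroids are not re-normalized), and the data set is invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)

# Hypothetical data: two groups that differ in direction, with widely varying magnitudes.
directions = np.vstack([
    np.tile([1.0, 0.0, 0.0], (50, 1)),
    np.tile([0.0, 1.0, 0.0], (50, 1)),
])
scales = rng.uniform(1.0, 100.0, size=(100, 1))
X = directions * scales + rng.normal(scale=0.05, size=(100, 3))

# L2-normalize rows so that only direction matters, then run ordinary k-means.
X_unit = normalize(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_unit)

# Check the identity ||a - b||^2 = 2 * (1 - cos(a, b)) on one pair of unit vectors.
a, b = X_unit[0], X_unit[50]
squared_euclidean = np.sum((a - b) ** 2)
cosine_distance = 1.0 - a @ b
print(squared_euclidean, 2 * cosine_distance)
```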

MiniBatchKMeans and BIRCH are faster, but not as accurate.
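A rough sketch of how the three could be compared in scikit-learn. The blob data, batch size and threshold are arbitrary choices for illustration, and the adjusted Rand index against the known blob labels stands in for "accuracy".

```python
from sklearn.cluster import Birch, KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic, well-separated blobs (an assumption; real customer data is messier).
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, y_true = make_blobs(n_samples=3000, centers=centers, cluster_std=1.0, random_state=0)

models = {
    "KMeans": KMeans(n_clusters=4, n_init=10, random_state=0),
    "MiniBatchKMeans": MiniBatchKMeans(n_clusters=4, batch_size=256, n_init=3, random_state=0),
    "BIRCH": Birch(n_clusters=4, threshold=0.5),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # Adjusted Rand index: 1.0 means the clustering matches the true grouping exactly.
    scores[name] = adjusted_rand_score(y_true, labels)
    print(f"{name}: ARI = {scores[name]:.3f}")
```

On easy data like this the scores are likely to be close; any accuracy gap, and the speed advantage, would be expected to show up on larger and noisier data sets.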

https://medium.com/predict/threepopularclusteringmethodsandwhentouseeach4227c80ba2b6

For DBSCAN, minPts should be at least the number of features.

OPTICS is similar to DBSCAN but does not need eps to be set.
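A short sketch of both in scikit-learn, where minPts is called min_samples. The two-moons data, eps=0.3, and the choice of twice the number of features for minPts are assumptions made for illustration, not tuned values.

```python
import numpy as np
from sklearn.cluster import DBSCAN, OPTICS
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Synthetic two-moons data, standardized so that eps is on a comparable scale per feature.
X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

# minPts at least the number of features; 2 * n_features is an assumed heuristic.
n_features = X.shape[1]
min_pts = 2 * n_features

# DBSCAN needs eps chosen by hand (0.3 is a guess that suits this toy data).
dbscan = DBSCAN(eps=0.3, min_samples=min_pts).fit(X)
n_dbscan = len(set(dbscan.labels_) - {-1})  # label -1 marks noise points

# OPTICS takes min_samples but no eps; clusters are extracted from the reachability plot.
optics = OPTICS(min_samples=min_pts).fit(X)
n_optics = len(set(optics.labels_) - {-1})

print("DBSCAN clusters:", n_dbscan)
print("OPTICS clusters:", n_optics)
print("OPTICS noise points:", int(np.sum(optics.labels_ == -1)))
```

OPTICS still has tuning knobs (xi, min_cluster_size, max_eps), so it removes the need to fix eps up front rather than removing parameters altogether.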

To visualize clusters, use PCA or t-distributed Stochastic Neighbor Embedding (t-SNE).
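A minimal sketch of both projections with scikit-learn, using the bundled iris data as a stand-in for customer features. The number of clusters and the perplexity are arbitrary choices for illustration, and plotting is left as a comment to keep the example free of extra dependencies.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Stand-in data set: 150 samples, 4 features.
X = StandardScaler().fit_transform(load_iris().data)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# PCA: linear projection onto the two directions of greatest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()
print(f"Variance kept by 2 principal components: {explained:.1%}")

# t-SNE: non-linear embedding that tries to keep near neighbours close together.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# Either 2-D array can be drawn as a scatter plot coloured by cluster label, e.g.
# plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels) with matplotlib.
print(X_pca.shape, X_tsne.shape)
```

PCA axes are interpretable as combinations of the original features, while t-SNE distances between far-apart groups should not be read literally.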

PCA https://builtin.com/datascience/stepstepexplanationprincipalcomponentanalysis