Excerpts in Data Science

DevOps for ML

  • How is it different from traditional Software Pipelines?
  • Containers - with GPUs and CUDA
  • Multiple code bases - Train, Learning, etc.
  • Skills - Data Science vs DevOps
  • ROI and success of machine learning projects

DevOps for ML and other Half-Truths: Processes and Tools for the ML Life Cycle

TDS Team | Kenny Daniel


Cluster Analysis for Customer Segments

Jul 31 2019

k-Means, Silhouette method, Elbow method, Cosine Distance, MiniBatchKMeans, BIRCH, DBSCAN, OPTICS, PCA, t-distributed Stochastic Neighbor Embedding (t-SNE)

One thing to note: because k-Means typically uses Euclidean distance, it does not work well with high-dimensional data sets due to the curse of dimensionality. In part, this curse means that Euclidean distances in high dimensions carry very little meaning, since the distances between different pairs of points tend to be nearly equal.
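
The effect described above can be illustrated with a small experiment. This is a minimal sketch, not taken from the article: it assumes points drawn uniformly at random from the unit hypercube, and the helper name `relative_contrast` is hypothetical. It measures how much the farthest point differs from the nearest one, relative to the nearest distance, as the number of dimensions grows.

```python
import numpy as np


def relative_contrast(n_points, n_dims, seed=0):
    """Return (max - min) / min of the Euclidean distances
    from one query point to all other points.

    Assumption: points are sampled uniformly from [0, 1)^n_dims.
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    query = points[0]
    # Distances from the query point to every other point.
    dists = np.linalg.norm(points[1:] - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


# Compare the contrast across increasing dimensionality.
contrasts = {d: relative_contrast(500, d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"dims={d:5d}  relative contrast={c:.3f}")
```

Under these assumptions the relative contrast should shrink as the dimensionality rises, meaning the nearest and farthest neighbors end up at similar distances, which is what makes distance-based cluster assignment less informative.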

Cluster Analysis: Create, Visualize and Interpret Customer Segments

Maarten Grootendorst