Introduction to Clustering & Unsupervised Learning
Clustering is an unsupervised learning method that groups data points into clusters based on their similarity.
Unsupervised Learning: Unlike supervised learning, there’s no “label” or “answer” given. The model learns the structure from the data.
K-means Clustering
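A clustering method that partitions a dataset into ‘k’ clusters, each represented by a centroid (the mean of the points in that cluster). Every data point is allocated to the cluster with the nearest centroid, with the aim of keeping the clusters as tight as possible, that is, minimizing the total squared distance from each point to its centroid.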
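The quantity k-means is usually described as minimizing is the within-cluster sum of squares. A sketch in LaTeX notation follows; the symbols are notation assumed here rather than taken from these notes: C_j is the set of points in cluster j, \mu_j is its centroid, and x_i is a data point.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

Reassigning points to their nearest centroid and then recomputing the centroids can only lower J or leave it unchanged, which is why the procedure settles, though possibly at a local rather than a global minimum.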
Steps:
1. Choose the number ‘k’ of clusters.
2. Select a random initial centroid for each cluster.
3. Assign each data point to the nearest centroid.
4. Recalculate the centroid of each cluster as the mean of the points assigned to it.
5. Repeat steps 3-4 until the cluster assignments no longer change or a set number of iterations is reached.
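The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: the names kmeans, squared_distance, mean_point, max_iters and seed are hypothetical, the sample data is made up, the initial centroids are drawn from the data points themselves (one common choice), and an empty cluster simply keeps its previous centroid.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iters=100, seed=0):
    """Return (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the starting centroids
    # (an assumed initialization; other schemes exist).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Step 3: assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop when no assignment changed since the previous pass.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return centroids, labels


# Two well-separated groups of 2-D points (made-up data).
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
print(labels)
```

On this data the first three points should end up sharing one label and the last three the other. Which group gets label 0 depends on the random starting centroids, so the seed is fixed only to make runs repeatable.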
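Advantages:
Fast and efficient for large datasets: for a fixed number of features, each iteration takes time roughly proportional to the number of data points multiplied by ‘k’.
Can produce tighter clusters than hierarchical clustering, particularly when the clusters are roughly spherical.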