Introduction to Clustering & Unsupervised Learning

  • Clustering is an unsupervised learning method that groups data points into clusters based on their similarity.
  • Unsupervised Learning: Unlike supervised learning, there’s no “label” or “answer” given. The model learns the structure from the data.

K-means Clustering

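  • A clustering method that partitions a dataset into ‘k’ clusters, each represented by a centroid (the mean of its points). Every data point is allocated to the cluster with the nearest centroid, and the centroids are placed so that the total squared distance between points and their own centroid is as small as possible.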
  • Steps:
    1. Choose the number ‘k’ of clusters.
    2. Select ‘k’ initial centroids at random, one for each cluster.
    3. Assign each data point to the nearest centroid.
    4. Recalculate the centroid of each cluster as the mean of the data points assigned to it.
    5. Repeat steps 3-4 until there are no changes in the assigned clusters or a set number of iterations is reached.
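  • Advantages:
    • Fast and efficient for large datasets: the work in each iteration grows linearly with the number of data points.
    • Often produces tighter clusters than hierarchical clustering, especially when the clusters are roughly spherical.
  • Applications: Market segmentation, image compression, anomaly detection.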
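
The five steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name kmeans, its parameters (max_iters, seed), the choice of Euclidean distance, and the synthetic two-blob data are all assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, max_iters=100, seed=0):
    """Basic K-means following the five steps (illustrative sketch)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        new_labels = distances.argmin(axis=1)
        # Step 5: stop once the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Synthetic data (assumed for illustration): two well-separated 2-D blobs.
data_rng = np.random.default_rng(42)
blob_a = data_rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = data_rng.normal(loc=[10.0, 10.0], scale=0.5, size=(50, 2))
data = np.vstack([blob_a, blob_b])

centroids, labels = kmeans(data, k=2)
print(centroids)
```

Step 1 (choosing ‘k’) is left to the caller here. The result depends on the random starting centroids, so different seeds can give different clusterings on less clearly separated data.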
  1. Differences between supervised and unsupervised learning.
  2. Explore the impact of the value of ‘k’ in K-means.
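
One possible way to explore the impact of ‘k’ is to run K-means for several values of ‘k’ and compare the total squared distance from each point to its nearest centroid (often called inertia). The sketch below is only an illustration under stated assumptions: the helper name kmeans_inertia, the three-blob synthetic data, and the use of 20 random restarts per ‘k’ (to reduce sensitivity to the starting centroids) are not part of the original notes.

```python
import numpy as np

def kmeans_inertia(points, k, max_iters=100, seed=0):
    """Run a basic K-means and return the total squared distance
    from each point to its nearest centroid (illustrative sketch)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iters):
        sq_dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    sq_dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(sq_dist.min(axis=1).sum())

# Synthetic data (assumed for illustration): three well-separated 2-D blobs.
data_rng = np.random.default_rng(7)
centers = [[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]]
data = np.vstack(
    [data_rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in centers]
)

# For each k, keep the best (lowest) inertia over 20 random restarts.
inertia_by_k = {
    k: min(kmeans_inertia(data, k, seed=s) for s in range(20))
    for k in range(1, 7)
}
for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 1))
```

On data like this, the inertia typically falls sharply until ‘k’ reaches the number of natural groups and only slowly afterwards; looking for that bend is commonly called the elbow method. Inertia alone always favours a larger ‘k’, so it is a guide rather than a rule.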
