5.2 Fuzzy C-Means Clustering

  1. Motivations
    • Classification vs. clustering
    • Here is an example.
  2. Clustering    
    • Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
    • It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.
  3. Fuzzy C-Means Clustering
    • Fuzzy clustering (also referred to as soft clustering or soft k-means) is a form of clustering in which each data point can belong to more than one cluster. You may think of memberships to clusters.
    • Algorithm
      let m be in [1, ∞);  // Note m is usually 2. m controls the level of fuzziness.
      initialize the means (oac centroids or prototypes) vector C=[cj];
      while the change is not insignificant
          Update membership matrix U=[uij] with C:
              for point i and cluster j,
                  uij = 1 / (Σk((∥xi-cj∥ / ∥xi-ck∥)2/(m-1))
          Update C with U:
              for cluster j,
                  cj = Σi(uijm · xi) / Σi(uijm);
      
  4. Exercise



    • Exercise 1: Let's generate NO_POINTS=40 points in the above canvas, using draw().



    • Exercise 2: Let's decide 3 random means using a random generator. The x-value should be in [0, MAP_WIDTH), and y-value should be in [0, MAP_HEIGHT). This exercise should be done after Exercise 1.



    • Exercise 3: Assignment step. This exercise should be done after Exercise 2.


  5. How to determine k? How to determine m?
  6. Other clustering algorithms?