Fuzzy C-Means Clustering

Motivations
- Classification vs. clustering
- Here is an example.
Clustering
- Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
- It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.

Fuzzy clustering (also referred to as soft clustering or soft k-means) is a form of clustering in which each data point can belong to more than one cluster. You may think of memberships to clusters.

Algorithm

let m be in [1, ∞);  // Note m is usually 2. m controls the level of fuzziness.
initialize the means (oac centroids or prototypes) vector C=[c_j];
while the change is not insignificant
    Update membership matrix U=[u_ij] with C:
        for point i and cluster j,
            u_ij = 1 / (Σ_k((∥x_i-c_j∥ / ∥x_i-c_k∥)^2/(m-1))
    Update C with U:
        for cluster j,
            c_j = Σ_i(u_ij^m · x_i) / Σ_i(u_ij^m);

Exercise
- Exercise 1: Let's generate NO_POINTS=40 points in the above canvas, using draw().
  <script> NO_POINTS = 40; draw(); </script>
- Exercise 2: Let's decide 3 random means using a random generator. The x-value should be in [0, MAP_WIDTH), and y-value should be in [0, MAP_HEIGHT). This exercise should be done after Exercise 1.
  <script> function initialize_means() { var means = []; means[0] = {x: Math.random() * MAP_WIDTH, y: Math.random() * MAP_HEIGHT}; means[1] = {x: Math.random() * MAP_WIDTH, y: Math.random() * MAP_HEIGHT}; means[2] = ??? return means; } means = initialize_means(); draw_means(); </script>
- Exercise 3: Assignment step. This exercise should be done after Exercise 2.
  <script> function distance(point1, point2) { return ???(Math.pow(point1.x - point2.x, 2) + ???); } function update_memberships() { for (var i = 0; i < NO_POINTS; i++) for (var j = 0; j < 3; j++) { var sum = 0; for (var k = 0; k < 3; k++) sum += Math.pow(distance(points[i], means[j]) / ???, 2); memberships[i][j] = ??? } } function update_means() { for (var j = 0; j < 3; j++) { var sum_upper_x = 0; var sum_upper_y = 0; var sum_lower = 0; for (var i = 0; i < NO_POINTS; i++) { sum_upper_x += Math.pow(memberships[i][j], 2) * points[i].x; sum_upper_y += ??? } for (var i = 0; i < NO_POINTS; i++) sum_lower += Math.pow(???, 2); means[j].x = sum_upper_x / sum_lower; means[j].y = ??? } } update_memberships(); update_means(); draw_clusters(); draw_means(); </script>
How to determine k? How to determine m?
- How to determine the number of clusters?
- Read "Choosing K" from Introduction to K-means Clustering
Other clustering algorithms?
- Expectation-maximization algorithm and an example
- Hierarchical clustering
- The 5 Clustering Algorithms Data Scientists Need to Know includes interesting examples with explations.

5.2 Fuzzy C-Means Clustering