The k-means clustering algorithm is an unsupervised machine learning technique that seeks to group similar data into distinct clusters to uncover patterns in the data that may not be apparent to the naked eye.

It is possibly the most widely known algorithm for data clustering and is implemented in the OpenCV library.

In this tutorial, you will learn how to apply OpenCV’s k-means clustering algorithm for color quantization of images.

After completing this tutorial, you will know:

What data clustering is within the context of machine learning.

How to apply the k-means clustering algorithm in OpenCV to a simple two-dimensional dataset containing distinct data clusters.

How to apply the k-means clustering algorithm in OpenCV for color quantization of images.

Kick-start your project with my book Machine Learning in OpenCV. It provides self-study tutorials with working code.

Let’s get started.

## Tutorial Overview

This tutorial is divided into three parts; they are:

Clustering as an Unsupervised Machine Learning Task

Discovering k-Means Clustering in OpenCV

Color Quantization Using k-Means

## Clustering as an Unsupervised Machine Learning Task

Cluster analysis is an unsupervised learning technique.

It involves automatically grouping data into distinct groups (or clusters), where the data within each cluster are similar to one another but different from those in the other clusters. It aims to uncover patterns in the data that may not be apparent before clustering.

There are many different clustering algorithms, as explained in this tutorial, with k-means clustering being one of the most widely known.

The k-means clustering algorithm takes unlabelled data points. It seeks to assign them to k clusters, where each data point belongs to the cluster with the nearest cluster center, and the center of each cluster is taken as the mean of the data points that belong to it. The algorithm requires that the user provide the value of k as an input; hence, this value needs to be known a priori or tuned according to the data.
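The iteration just described can be sketched in a few lines of NumPy. This is a minimal illustration of the classic Lloyd iteration, not OpenCV’s implementation; the function name `lloyd_kmeans` and the toy blobs are invented for this sketch:

```python
import numpy as np

def lloyd_kmeans(points, k, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as the starting centers
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated blobs of 20 points each
rng = np.random.default_rng(42)
pts = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])
labels, centers = lloyd_kmeans(pts, k=2)
```

With such well-separated blobs, the two recovered centers land near (0, 0) and (10, 10), and each blob receives a single label.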

## Discovering k-Means Clustering in OpenCV

Let’s first consider applying k-means clustering to a simple two-dimensional dataset containing distinct data clusters before moving on to more complex tasks.

For this purpose, we will be generating a dataset consisting of 100 data points (specified by n_samples), which are equally divided into 5 Gaussian clusters (identified by centers) having a standard deviation of 1.5 (determined by cluster_std). To be able to replicate the results, let’s also define a value for random_state, which we are going to set to 10:

```python
# Generate a dataset of 2D data points and their ground truth labels
x, y_true = make_blobs(n_samples=100, centers=5, cluster_std=1.5, random_state=10)

# Plot the dataset
scatter(x[:, 0], x[:, 1])
show()
```


The code above should generate the following plot of data points:

If we look at this plot, we may already be able to visually distinguish one cluster from another, which means that this should be a sufficiently straightforward task for the k-means clustering algorithm.

In OpenCV, the k-means algorithm is not part of the ml module but can be called directly. To be able to use it, we need to specify values for its input arguments as follows:

The input, unlabelled data.

The number, K, of required clusters.

The termination criteria, TERM_CRITERIA_EPS and TERM_CRITERIA_MAX_ITER, defining the desired accuracy and the maximum number of iterations, respectively; when either is reached, the algorithm iterations stop.

The number of attempts, denoting the number of times the algorithm will be executed with different initial labellings in order to find the best cluster compactness.

How the cluster centers will be initialized, whether random, user-supplied, or through a center initialization method such as kmeans++, as specified by the parameter flags.

The k-means clustering algorithm in OpenCV returns:

The compactness of each cluster, computed as the sum of the squared distances of each data point to its corresponding cluster center. A smaller compactness value indicates that the data points are distributed closer to their corresponding cluster center and, hence, that the cluster is more compact.

The predicted cluster labels, y_pred, which associate each input data point with its corresponding cluster.

The coordinates, centers, of the center of each cluster of data points.
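The compactness value can be reproduced by hand from the labels and centers, which is a useful sanity check on the definition above. A small self-contained example, using made-up points and centers rather than OpenCV output:

```python
import numpy as np

points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 1.0], [10.0, 11.0]])

# Sum of squared distances of each point to its assigned cluster center
sq_dists = np.sum((points - centers[labels]) ** 2, axis=1)
compactness = sq_dists.sum()
print(compactness)  # 4.0: each of the four points lies 1 unit from its center
```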

Let’s now apply the k-means clustering algorithm to the dataset generated earlier. Note that we are type-casting the input data to float32, as expected by the kmeans() function in OpenCV:

```python
# Specify the algorithm's termination criteria
criteria = (TERM_CRITERIA_MAX_ITER + TERM_CRITERIA_EPS, 10, 1.0)

# Run the k-means clustering algorithm on the input data
compactness, y_pred, centers = kmeans(data=x.astype(float32), K=5, bestLabels=None, criteria=criteria, attempts=10, flags=KMEANS_RANDOM_CENTERS)

# Plot the data clusters, each having a different color, together with the corresponding cluster centers
scatter(x[:, 0], x[:, 1], c=y_pred)
scatter(centers[:, 0], centers[:, 1], c='red')
show()
```


The code above generates the following plot, where each data point is now colored according to its assigned cluster, and the cluster centers are marked in red:

The complete code listing is as follows:

```python
from cv2 import kmeans, TERM_CRITERIA_MAX_ITER, TERM_CRITERIA_EPS, KMEANS_RANDOM_CENTERS
from numpy import float32
from matplotlib.pyplot import scatter, show
from sklearn.datasets import make_blobs

# Generate a dataset of 2D data points and their ground truth labels
x, y_true = make_blobs(n_samples=100, centers=5, cluster_std=1.5, random_state=10)

# Plot the dataset
scatter(x[:, 0], x[:, 1])
show()

# Specify the algorithm's termination criteria
criteria = (TERM_CRITERIA_MAX_ITER + TERM_CRITERIA_EPS, 10, 1.0)

# Run the k-means clustering algorithm on the input data
compactness, y_pred, centers = kmeans(data=x.astype(float32), K=5, bestLabels=None, criteria=criteria, attempts=10, flags=KMEANS_RANDOM_CENTERS)

# Plot the data clusters, each having a different color, together with the corresponding cluster centers
scatter(x[:, 0], x[:, 1], c=y_pred)
scatter(centers[:, 0], centers[:, 1], c='red')
show()
```



## Color Quantization Using k-Means

One of the applications of k-means clustering is the color quantization of images.

Color quantization refers to the process of reducing the number of distinct colors used in the representation of an image.

Color quantization is critical for displaying images with many colors on devices that can only display a limited number of colors, usually due to memory limitations, and enables efficient compression of certain types of images.

(Color quantization, 2023)

In this case, the data points that we will provide to the k-means clustering algorithm are the RGB values of each image pixel. As we will be seeing, we will provide these values in the form of an $M \times 3$ array, where $M$ denotes the number of pixels in the image.
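As a quick sanity check of this reshaping, a toy array shows how an $H \times W \times 3$ image flattens to $M \times 3$ with $M = H \times W$, and how the original image can be restored afterwards (pure NumPy, no OpenCV needed):

```python
import numpy as np

# A toy 2x2 "image" with 3 channels flattens to a 4x3 array of pixels
img = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
pixels = img.reshape(-1, 3)
print(pixels.shape)  # (4, 3): one row per pixel, one column per channel

# Reshaping back with the original shape recovers the image exactly
assert (pixels.reshape(img.shape) == img).all()
```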

Let’s try out the k-means clustering algorithm on this image, which I have named bricks.jpg:

The dominant colors that stand out in this image are red, orange, yellow, green, and blue. However, many shadows and glints introduce additional shades and colors to the dominant ones.

We will start by first reading the image using OpenCV’s imread function.

Remember that OpenCV loads this image in BGR rather than RGB order. There is no need to convert it to RGB before feeding it to the k-means clustering algorithm, because the latter will still group similar colors no matter in which order the pixel values are specified. However, since we are making use of Matplotlib to display the images, we will convert it to RGB so that we may display the quantized result correctly later on:
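The reason channel order does not matter to the clustering itself is that swapping BGR for RGB only permutes the coordinates of every pixel in the same way, which leaves all pairwise distances between pixels unchanged. A small pure-NumPy check of this claim, using random stand-ins for pixel data:

```python
import numpy as np

rng = np.random.default_rng(1)
bgr = rng.integers(0, 256, size=(10, 3)).astype(np.float64)
rgb = bgr[:, ::-1]  # reversing the channel axis converts BGR to RGB

# Pairwise Euclidean distances between pixels, in each channel order
d_bgr = np.linalg.norm(bgr[:, None] - bgr[None], axis=2)
d_rgb = np.linalg.norm(rgb[:, None] - rgb[None], axis=2)
assert np.allclose(d_bgr, d_rgb)  # identical, so k-means groups pixels alike
```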

```python
# Read image
img = imread('Images/bricks.jpg')

# Convert it from BGR to RGB
img_RGB = cvtColor(img, COLOR_BGR2RGB)
```


As we have mentioned earlier, the next step involves reshaping the image to an $M \times 3$ array, and we may then proceed to apply k-means clustering to the resulting array values, using a number of clusters that corresponds to the number of dominant colors mentioned above.

In the code snippet below, I have also included a line that prints out the number of unique RGB pixel values out of the total number of pixels in the image. We find that we have 338,742 unique RGB values out of 14,155,776 pixels, which is substantial:

```python
# Reshape image to an Mx3 array
img_data = img_RGB.reshape(-1, 3)

# Find the number of unique RGB values
print(len(unique(img_data, axis=0)), 'unique RGB values out of', img_data.shape[0], 'pixels')

# Specify the algorithm's termination criteria
criteria = (TERM_CRITERIA_MAX_ITER + TERM_CRITERIA_EPS, 10, 1.0)

# Run the k-means clustering algorithm on the pixel values
compactness, labels, centers = kmeans(data=img_data.astype(float32), K=5, bestLabels=None, criteria=criteria, attempts=10, flags=KMEANS_RANDOM_CENTERS)
```


At this point, we can proceed to apply the actual RGB values of the cluster centers to the predicted pixel labels and reshape the resulting array to the shape of the original image before displaying it:

```python
# Apply the RGB values of the cluster centers to all pixel labels
colours = centers[labels].reshape(-1, 3)

# Find the number of unique RGB values
print(len(unique(colours, axis=0)), 'unique RGB values out of', img_data.shape[0], 'pixels')

# Reshape array to the original image shape
img_colours = colours.reshape(img_RGB.shape)

# Display the quantized image
imshow(img_colours.astype(uint8))
show()
```


Printing again the number of unique RGB values in the quantized image, we find that this has now been reduced to the number of clusters that we had specified to the k-means algorithm:

```
5 unique RGB values out of 14155776 pixels
```


If we have a look at the color quantized image, we find that the pixels belonging to the yellow and orange bricks have been grouped into the same cluster, possibly due to the similarity of their RGB values. In contrast, one of the clusters aggregates pixels belonging to areas of shadow:

Now try changing the value specifying the number of clusters for the k-means clustering algorithm and investigate its effect on the quantization result.
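One property worth keeping in mind during that experiment: however the clusters fall, the quantized image can never contain more than K distinct colors, because every pixel is replaced by one of the K center values. A pure-NumPy sketch of that bookkeeping, using random stand-ins for what cv2.kmeans would return:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 500  # stand-in for the number of pixels in an image

unique_counts = {}
for k in (2, 5, 10):
    # Hypothetical stand-ins for cv2.kmeans outputs: k cluster centers
    # and one label per pixel (randomized here purely for illustration)
    centers = rng.integers(0, 256, size=(k, 3)).astype(np.float32)
    labels = rng.integers(0, k, size=m)
    quantized = centers[labels]
    # The quantized pixels can contain at most k distinct colors
    unique_counts[k] = len(np.unique(quantized, axis=0))
    print(k, unique_counts[k])
```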

The complete code listing is as follows:

```python
from cv2 import kmeans, TERM_CRITERIA_MAX_ITER, TERM_CRITERIA_EPS, KMEANS_RANDOM_CENTERS, imread, cvtColor, COLOR_BGR2RGB
from numpy import float32, uint8, unique
from matplotlib.pyplot import show, imshow

# Read image
img = imread('Images/bricks.jpg')

# Convert it from BGR to RGB
img_RGB = cvtColor(img, COLOR_BGR2RGB)

# Reshape image to an Mx3 array
img_data = img_RGB.reshape(-1, 3)

# Find the number of unique RGB values
print(len(unique(img_data, axis=0)), 'unique RGB values out of', img_data.shape[0], 'pixels')

# Specify the algorithm's termination criteria
criteria = (TERM_CRITERIA_MAX_ITER + TERM_CRITERIA_EPS, 10, 1.0)

# Run the k-means clustering algorithm on the pixel values
compactness, labels, centers = kmeans(data=img_data.astype(float32), K=5, bestLabels=None, criteria=criteria, attempts=10, flags=KMEANS_RANDOM_CENTERS)

# Apply the RGB values of the cluster centers to all pixel labels
colours = centers[labels].reshape(-1, 3)

# Find the number of unique RGB values
print(len(unique(colours, axis=0)), 'unique RGB values out of', img_data.shape[0], 'pixels')

# Reshape array to the original image shape
img_colours = colours.reshape(img_RGB.shape)

# Display the quantized image
imshow(img_colours.astype(uint8))
show()
```



## Additional Studying

This section provides more resources on the topic if you want to go deeper.

### Books

### Websites

## Summary

In this tutorial, you learned how to apply OpenCV’s k-means clustering algorithm for color quantization of images.

Specifically, you learned:

What data clustering is within the context of machine learning.

How to apply the k-means clustering algorithm in OpenCV to a simple two-dimensional dataset containing distinct data clusters.

How to apply the k-means clustering algorithm in OpenCV for color quantization of images.

Do you have any questions?

Ask your questions in the comments below, and I will do my best to answer.

## Get Started on Machine Learning in OpenCV!

Learn how to use machine learning techniques in image processing projects

...using OpenCV in advanced ways and working beyond pixels

Discover how in my new Ebook:

Machine Learning in OpenCV

It provides self-study tutorials with all working code in Python to take you from a novice to an expert. It equips you with logistic regression, random forest, SVM, k-means clustering, neural networks, and much more... all using the machine learning module in OpenCV.

Kick-start your deep learning journey with hands-on exercises

See What's Inside