The Naive Bayes algorithm is a simple but powerful technique for supervised machine learning. Its Gaussian variant is implemented in the OpenCV library.

In this tutorial, you will learn how to apply OpenCV’s normal Bayes algorithm, first on a custom two-dimensional dataset and subsequently for segmenting an image.

After completing this tutorial, you will know:

- Several of the most important points in applying the Bayes theorem to machine learning.
- How to use the normal Bayes algorithm on a custom dataset in OpenCV.
- How to use the normal Bayes algorithm to segment an image in OpenCV.

Kick-start your project with my book Machine Learning in OpenCV. It provides self-study tutorials with working code.

Let’s get started.

## Tutorial Overview

This tutorial is divided into three parts; they are:

- Reminder of the Bayes Theorem As Applied to Machine Learning
- Discovering Bayes Classification in OpenCV
- Image Segmentation Using a Normal Bayes Classifier

## Reminder of the Bayes Theorem As Applied to Machine Learning

This tutorial by Jason Brownlee gives an in-depth explanation of the Bayes Theorem for machine learning, so let’s start by brushing up on some of the most important points from his tutorial:

The Bayes Theorem is useful in machine learning because it provides a statistical model to formulate the relationship between data and a hypothesis.

Expressed as $P(h | D) = P(D | h) * P(h) / P(D)$, the Bayes Theorem states that the probability of a given hypothesis being true (denoted by $P(h | D)$ and known as the posterior probability of the hypothesis) can be calculated in terms of:

- The probability of observing the data given the hypothesis (denoted by $P(D | h)$ and known as the likelihood).
- The probability of the hypothesis being true, independently of the data (denoted by $P(h)$ and known as the prior probability of the hypothesis).
- The probability of observing the data independently of the hypothesis (denoted by $P(D)$ and known as the evidence).

In its general form, the Bayes Theorem makes no independence assumption: every variable (or feature) making up the input data, $D$, may depend on all the other variables (or features).

In the context of data classification, the Bayes Theorem can be applied to the problem of calculating the conditional probability of a class label given a data sample: $P(class | data) = P(data | class) * P(class) / P(data)$, where the class label now takes the place of the hypothesis. The evidence, $P(data)$, is the same for every class, so it is a constant that can be dropped without changing which class scores highest.
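As a quick numerical check of why the evidence can be dropped, the sketch below uses made-up priors and likelihoods for a single sample and two classes. Dividing by $P(data)$ rescales every class by the same factor. It turns the scores into probabilities that sum to one, but it does not change which class wins.

```python
import numpy as np

# Illustrative (made-up) numbers for one data sample and two classes
prior = np.array([0.6, 0.4])          # P(class)
likelihood = np.array([0.02, 0.05])   # P(data | class)

# Numerator of Bayes' theorem for each class: P(data | class) * P(class)
unnormalized = likelihood * prior

# The evidence P(data) is the sum of the numerators over all classes
evidence = unnormalized.sum()
posterior = unnormalized / evidence

print("unnormalized:", unnormalized)
print("posterior:", posterior)
# Both arrays point to the same most likely class
print("same winner:", unnormalized.argmax() == posterior.argmax())
```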

In the formulation above, estimating the likelihood, $P(data | class)$, can be difficult because it requires the number of data samples to be large enough to contain all possible combinations of variables (or features) for each class. This is seldom the case, especially with high-dimensional data with many variables.

The formulation above can be simplified into what is known as Naive Bayes, where each input variable is treated separately (the proportionality sign reflects the dropped evidence term): $P(class | X_1, X_2, \dots, X_n) \propto P(X_1 | class) * P(X_2 | class) * \dots * P(X_n | class) * P(class)$

The Naive Bayes estimation changes the formulation from a dependent conditional probability model to an independent conditional probability model. The input variables (or features) are now assumed to be conditionally independent given the class. This assumption rarely holds with real-world data, hence the name naive.
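To make the naive factorization concrete, here is a minimal NumPy sketch of Gaussian Naive Bayes written from scratch. It is an illustration of the technique, not OpenCV's implementation, and the helper names `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical.

The sketch rests on three choices:

- Each feature of each class is modeled by its own mean and variance.
- The per-feature densities are multiplied together with the prior.
- Logarithms are used, so the product becomes a sum and does not underflow.

The small `eps` added to the variances is an assumption to avoid division by zero.

```python
import numpy as np


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate a prior, plus a mean and a variance for every (class, feature) pair."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # A small eps keeps the variances strictly positive
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    return classes, priors, means, variances


def predict_gaussian_nb(X, classes, priors, means, variances):
    """Return the class maximizing log P(class) + sum_i log P(X_i | class) for each row."""
    # Broadcast to shape (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (np.log(2.0 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :])
    # Naive assumption: the product over features becomes a sum of log-densities
    log_scores = np.log(priors)[None, :] + log_pdf.sum(axis=2)
    return classes[np.argmax(log_scores, axis=1)]


# Synthetic, well-separated 2D classes for illustration
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), rng.normal(5.0, 1.0, size=(50, 2))])
y = np.repeat([0, 1], 50)

params = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(np.array([[0.2, -0.1], [4.8, 5.3]]), *params))  # [0 1]
```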

## Discovering Bayes Classification in OpenCV

Suppose the input data we are working with is continuous. In that case, it may be modeled using a continuous probability distribution, such as a Gaussian (or normal) distribution, where the data belonging to each class is modeled by its mean and standard deviation.

The Bayes classifier implemented in OpenCV is a normal Bayes classifier, which assumes that the input features from each class are normally distributed. It is often referred to as Gaussian Naive Bayes. Strictly speaking, though, the OpenCV documentation notes that it does not require the features to be independent:

This simple classification model assumes that feature vectors from each class are normally distributed (though, not necessarily independently distributed).

– OpenCV, Machine Learning Overview, 2023.
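"Not necessarily independently distributed" means that each class can be described by a mean vector and a full covariance matrix, rather than by one variance per feature. The minimal NumPy sketch below illustrates that idea from scratch. It is not OpenCV's source code, details such as how OpenCV handles priors or regularization may differ, and the helper names are hypothetical.

The synthetic data is built so that the two classes share the same per-feature spread and differ only in the sign of their correlation. Only the off-diagonal covariance terms can therefore tell them apart.

```python
import numpy as np


def fit_normal_bayes(X, y):
    """Estimate a prior, a mean vector and a full covariance matrix for each class."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # rowvar=False: rows are samples and columns are features
    covs = np.array([np.cov(X[y == c], rowvar=False) for c in classes])
    return classes, priors, means, covs


def predict_normal_bayes(X, classes, priors, means, covs):
    """Return the class maximizing log P(class) + log N(x; mean, cov) for each row of X."""
    n_features = X.shape[1]
    scores = np.empty((X.shape[0], len(classes)))
    for k in range(len(classes)):
        diff = X - means[k]
        inv_cov = np.linalg.inv(covs[k])
        _, log_det = np.linalg.slogdet(covs[k])
        # Squared Mahalanobis distance from every sample to the class mean
        mahalanobis_sq = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        scores[:, k] = np.log(priors[k]) - 0.5 * (
            log_det + mahalanobis_sq + n_features * np.log(2.0 * np.pi)
        )
    return classes[np.argmax(scores, axis=1)]


# Two synthetic classes with equal per-feature variance but opposite correlation
rng = np.random.default_rng(0)
cov_pos = np.array([[1.0, 0.95], [0.95, 1.0]])
cov_neg = np.array([[1.0, -0.95], [-0.95, 1.0]])
X = np.vstack([rng.multivariate_normal([0.0, 0.0], cov_pos, size=500),
               rng.multivariate_normal([0.0, 0.0], cov_neg, size=500)])
y = np.repeat([0, 1], 500)

params = fit_normal_bayes(X, y)
print(predict_normal_bayes(np.array([[2.0, 2.0], [2.0, -2.0]]), *params))  # [0 1]
```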

To discover how to use the normal Bayes classifier in OpenCV, let’s start by testing it on a simple two-dimensional dataset, as we did in previous tutorials.

For this purpose, let’s generate a dataset consisting of 100 data points (specified by n_samples). The points are equally divided into 2 Gaussian clusters (specified by centers), each with a standard deviation of 1.5 (specified by cluster_std). Let’s also define a value for random_state to be able to replicate the results:

```python
from sklearn.datasets import make_blobs
from matplotlib.pyplot import scatter, show

# Generating a dataset of 2D data points and their ground truth labels
x, y_true = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=15)

# Plotting the dataset
scatter(x[:, 0], x[:, 1], c=y_true)
show()
```


The code above should generate a scatter plot of the data points, colored according to their ground truth labels.

We can then split this dataset, allocating 80% of the data to the training set and the remaining 20% to the test set:

```python
from sklearn import model_selection as ms

# Split the data into training and testing sets
x_train, x_test, y_train, y_test = ms.train_test_split(x, y_true, test_size=0.2, random_state=10)
```


Following this, we will create the normal Bayes classifier and proceed with training and testing it on the dataset, after type casting the input features to 32-bit floats:

```python
from cv2 import ml
from numpy import float32

# Create a new Normal Bayes Classifier
norm_bayes = ml.NormalBayesClassifier_create()

# Train the classifier on the training data
norm_bayes.train(x_train.astype(float32), ml.ROW_SAMPLE, y_train)

# Generate a prediction from the trained classifier
ret, y_pred, y_probs = norm_bayes.predictProb(x_test.astype(float32))
```


By applying the predictProb method, we obtain two outputs. The first is the predicted class for each input vector, where each vector is stored in one row of the array fed into the normal Bayes classifier. The second is the output probabilities.

In the code above, the predicted classes are stored in y_pred. y_probs is an array with as many columns as classes (two in this case) that holds the probability value of each input vector belonging to each class under consideration. One might expect the output probability values the classifier returns for each input vector to sum up to one. However, this is not necessarily the case. These values are not normalized by the evidence, $P(data)$, which we removed from the denominator, as explained in the previous section.

Instead, what is being reported is a likelihood, which is basically the numerator of the conditional probability equation, p(C) p(M | C). The denominator, p(M), does not need to be computed.

– Machine Learning for OpenCV, 2017.

Nonetheless, whether the values are normalized or not, the class prediction for each input vector can be found by identifying the class with the highest probability value.
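If values that sum to one are needed, for example for display, each row of `y_probs` could be divided by its own sum, which plays the role of the omitted evidence.

The minimal sketch below uses a hypothetical helper, `normalize_scores`, on made-up scores. It rests on two assumptions:

- The returned values are non-negative and proportional to $P(data | class) * P(class)$, as the quote above suggests.
- No row is entirely zero. Very small likelihoods can underflow to zero in 32-bit floats.

```python
import numpy as np


def normalize_scores(scores):
    """Rescale each row of non-negative per-class scores so that it sums to one."""
    scores = np.asarray(scores, dtype=np.float64)
    row_sums = scores.sum(axis=1, keepdims=True)
    return scores / row_sums


# Hypothetical unnormalized scores for three samples and two classes
scores = np.array([[2e-3, 6e-3], [5e-4, 1e-4], [0.3, 0.1]])
probs = normalize_scores(scores)

print(probs.sum(axis=1))      # each row now sums to 1
print(probs.argmax(axis=1))   # [1 0 0], the same classes as scores.argmax(axis=1)
```

Because every row is divided by a positive constant, the predicted class (the arg max of each row) is unchanged.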

The code listing so far is the following:

```python
from sklearn.datasets import make_blobs
from sklearn import model_selection as ms
from numpy import float32
from matplotlib.pyplot import scatter, show
from cv2 import ml

# Generate a dataset of 2D data points and their ground truth labels
x, y_true = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=15)

# Plot the dataset
scatter(x[:, 0], x[:, 1], c=y_true)
show()

# Split the data into training and testing sets
x_train, x_test, y_train, y_test = ms.train_test_split(x, y_true, test_size=0.2, random_state=10)

# Create a new Normal Bayes Classifier
norm_bayes = ml.NormalBayesClassifier_create()

# Train the classifier on the training data
norm_bayes.train(x_train.astype(float32), ml.ROW_SAMPLE, y_train)

# Generate a prediction from the trained classifier
ret, y_pred, y_probs = norm_bayes.predictProb(x_test.astype(float32))

# Plot the class predictions (y_pred is a column array, so flatten it for the colors)
scatter(x_test[:, 0], x_test[:, 1], c=y_pred.ravel())
show()
```


We may see that the class predictions produced by the normal Bayes classifier trained on this simple dataset are correct.
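The visual check can be complemented with a single number. Here is a minimal sketch with a hypothetical `accuracy` helper. It assumes that `y_pred` comes back as a column array of shape `(n_samples, 1)`, which is why both inputs are flattened before comparing them.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the ground truth label."""
    # Flatten both sides so a (n, 1) column array compares element-wise with a (n,) array
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    return float(np.mean(y_true == y_pred))


# Usage with the variables from the listing above:
# print(accuracy(y_test, y_pred))
```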

## Image Segmentation Using a Normal Bayes Classifier

Among their many applications, Bayes classifiers have frequently been used for skin segmentation, which separates skin pixels from non-skin pixels in an image.

We can adapt the code above for segmenting skin pixels in images. For this purpose, we will use the Skin Segmentation dataset, consisting of 50,859 skin samples and 194,198 non-skin samples, to train the normal Bayes classifier. The dataset presents the pixel values in BGR order, together with their corresponding class labels.

After loading the dataset, we will convert the BGR pixel values into HSV (denoting Hue, Saturation, and Value) and then use the hue values to train a normal Bayes classifier. Hue is often preferred over RGB in image segmentation tasks because it captures the color itself, separately from its saturation and brightness. It is therefore less affected by lighting variations than the RGB values. In the HSV color model, the hue values are arranged radially and span between 0 and 360 degrees. The following code loads the dataset, extracts the hue values, and trains the classifier:

```python
from cv2 import ml
from numpy import loadtxt, float32
from matplotlib.colors import rgb_to_hsv

# Load data from text file
data = loadtxt("Data/Skin_NonSkin.txt", dtype=int)

# Select the BGR values from the loaded data
BGR = data[:, :3]

# Convert to RGB by swapping the array columns
RGB = BGR.copy()
RGB[:, [2, 0]] = RGB[:, [0, 2]]

# Convert RGB values to HSV (rgb_to_hsv expects values in the range 0 to 1)
HSV = rgb_to_hsv(RGB.reshape(RGB.shape[0], -1, 3) / 255)
HSV = HSV.reshape(RGB.shape[0], 3)

# Select only the hue values, converted from [0, 1) to degrees
hue = HSV[:, 0] * 360

# Select the labels from the loaded data
labels = data[:, -1]

# Create a new Normal Bayes Classifier
norm_bayes = ml.NormalBayesClassifier_create()

# Train the classifier on the hue values
norm_bayes.train(hue.astype(float32), ml.ROW_SAMPLE, labels)
```


Note 1: The OpenCV library provides the cvtColor method to convert between color spaces, as seen in a previous tutorial, but the cvtColor method expects the source image in its original shape as input. The rgb_to_hsv method in Matplotlib, on the other hand, accepts a NumPy array of shape (…, 3) as input, with the array values expected to be normalized to the range 0 to 1. We use the latter here because our training data consists of individual pixels, which are not structured in the usual form of a three-channel image.
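As an alternative to the column swap and reshape used above, the sketch below does two things differently:

- It reverses the channel order with `[:, ::-1]`.
- It passes an `(N, 3)` array straight to `rgb_to_hsv`, which accepts any array whose last dimension is 3.

The helper name `bgr_pixels_to_hue` is hypothetical. The three test pixels are pure red, green, and blue (in BGR order), whose hues are 0, 120, and 240 degrees. Gray pixels have no defined hue; Matplotlib returns 0 for them, the same value as pure red.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv


def bgr_pixels_to_hue(bgr):
    """Convert an (N, 3) array of 8-bit BGR pixels to hue angles in degrees."""
    bgr = np.asarray(bgr, dtype=np.float64)
    rgb = bgr[:, ::-1] / 255.0    # reverse the channel order and scale to [0, 1]
    hsv = rgb_to_hsv(rgb)         # works on any array whose last dimension is 3
    return hsv[:, 0] * 360.0      # hue is returned in [0, 1); map it to degrees


pixels = np.array([[0, 0, 255],     # pure red in BGR order
                   [0, 255, 0],     # pure green
                   [255, 0, 0]])    # pure blue
print(bgr_pixels_to_hue(pixels))    # approximately [0, 120, 240]
```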

Note 2: The normal Bayes classifier assumes that the data to be modeled follows a Gaussian distribution. While this is not a strict requirement, the classifier’s performance may degrade if the data is distributed otherwise. We can check the distribution of the data we are working with by plotting its histogram. If we take the hue values of the skin pixels as an example, we find that a Gaussian curve can describe their distribution:

```python
from numpy import histogram
from matplotlib.pyplot import bar, title, xlabel, ylabel, show

# Choose the skin-labelled hue values (label 1 denotes skin)
skin = hue[labels == 1]

# Compute their histogram using 1-degree bins
hist, bin_edges = histogram(skin, range=[0, 360], bins=360)

# Display the computed histogram
bar(bin_edges[:-1], hist, width=4)
xlabel('Hue')
ylabel('Frequency')
title('Histogram of the hue values of skin pixels')
show()
```

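To go one step beyond eyeballing the histogram, one could overlay a Gaussian fitted to the same values. A minimal NumPy sketch follows. The helper names `fit_gaussian` and `gaussian_curve` are hypothetical, and the demo uses synthetic values rather than the skin dataset. Keep in mind that hue is an angle, so values just above 0 and just below 360 are neighbors; a plain Gaussian fit ignores that wraparound.

```python
import numpy as np


def fit_gaussian(values):
    """Return the sample mean and standard deviation of a 1D array of values."""
    values = np.asarray(values, dtype=np.float64)
    return values.mean(), values.std()


def gaussian_curve(x, mu, sigma, n_samples, bin_width=1.0):
    """Gaussian density scaled to match a histogram of raw counts with the given bin width."""
    x = np.asarray(x, dtype=np.float64)
    pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return pdf * n_samples * bin_width


# Synthetic stand-in for a set of hue values (not taken from the skin dataset)
rng = np.random.default_rng(0)
demo_values = rng.normal(loc=20.0, scale=5.0, size=10_000)
mu, sigma = fit_gaussian(demo_values)

# Evaluate the scaled curve at the centers of 1-degree bins spanning [0, 360)
bin_centers = np.arange(360) + 0.5
expected_counts = gaussian_curve(bin_centers, mu, sigma, len(demo_values))
print(mu, sigma, expected_counts.sum())
```

With the histogram listing above, `gaussian_curve(bin_edges[:-1] + 0.5, mu, sigma, len(skin))` could be passed to `matplotlib.pyplot.plot` before calling `show()`, after fitting `mu, sigma = fit_gaussian(skin)`.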

Once the normal Bayes classifier has been trained, we can test it on an image (let’s consider this example image for testing):

```python
from cv2 import imread
from numpy import float32
from matplotlib.colors import rgb_to_hsv
from matplotlib.pyplot import show, imshow

# Load a test image (OpenCV loads it in BGR order)
face_img = imread("Images/face.jpg")

# Reshape the image into a three-column array
face_BGR = face_img.reshape(-1, 3)

# Convert to RGB by swapping the array columns
face_RGB = face_BGR.copy()
face_RGB[:, [2, 0]] = face_RGB[:, [0, 2]]

# Convert from RGB to HSV
face_HSV = rgb_to_hsv(face_RGB.reshape(face_RGB.shape[0], -1, 3) / 255)
face_HSV = face_HSV.reshape(face_RGB.shape[0], 3)

# Select only the hue values
face_hue = face_HSV[:, 0] * 360

# Display the hue image
imshow(face_hue.reshape(face_img.shape[0], face_img.shape[1]))
show()

# Generate a prediction from the classifier trained on the skin dataset
ret, labels_pred, output_probs = norm_bayes.predictProb(face_hue.astype(float32))

# Reshape the array into the input image size and choose the skin-labelled pixels
skin_mask = labels_pred.reshape(face_img.shape[0], face_img.shape[1]) == 1

# Display the segmented image
imshow(skin_mask, cmap='gray')
show()
```


The resulting segmented mask displays the pixels labeled as belonging to skin (with a class label equal to 1).

By qualitatively examining the result, we may see that most of the skin pixels have been correctly labeled as such. We may also see that some hair strands (hence, non-skin pixels) have been incorrectly labeled as skin. If we were to look at their hue values, we might find that these are very similar to those of skin regions, hence the mislabeling. Furthermore, we may notice the effectiveness of using the hue values. They remain relatively constant across regions of the face that appear either illuminated or in shadow in the original RGB image.
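To view the skin regions in their original colors rather than as a binary mask, the mask could be used to black out every non-skin pixel. Below is a minimal sketch with a hypothetical helper, `apply_mask`. It accepts a mask of shape `(height, width)` or `(height, width, 1)`.

```python
import numpy as np


def apply_mask(image, mask):
    """Keep the pixels where mask is True and set every other pixel to zero (black)."""
    mask = np.asarray(mask, dtype=bool).reshape(image.shape[:2])
    # Add a channel axis so the mask broadcasts over the color channels
    return np.where(mask[..., None], image, 0).astype(image.dtype)


# Usage with the variables from the listing above:
# segmented = apply_mask(face_img, skin_mask)
# imshow(segmented[:, :, ::-1])  # reverse BGR to RGB for Matplotlib
# show()
```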

Can you think of more tests to try out with a normal Bayes classifier?

## Further Reading

This section provides more resources on the topic if you want to go deeper.

### Books

## Summary

In this tutorial, you learned how to apply OpenCV’s normal Bayes algorithm, first on a custom two-dimensional dataset and subsequently for segmenting an image.

Specifically, you learned:

- Several of the most important points in applying the Bayes theorem to machine learning.
- How to use the normal Bayes algorithm on a custom dataset in OpenCV.
- How to use the normal Bayes algorithm to segment an image in OpenCV.

Do you have any questions?

Ask your questions in the comments below, and I will do my best to answer.
