One of the pre-processing steps often performed on images before feeding them into a machine learning algorithm is conversion into a feature vector. As we will see in this tutorial, there are several advantages to converting an image into a feature vector that make the latter more efficient.

Among the different techniques for converting an image into a feature vector, two of the most popular, used together with different machine learning algorithms, are the Histogram of Oriented Gradients and the Bag-of-Words techniques.

In this tutorial, you will discover the Histogram of Oriented Gradients (HOG) and the Bag-of-Words (BoW) techniques for image vector representation.

After completing this tutorial, you will know:

The advantages of using the Histogram of Oriented Gradients and the Bag-of-Words techniques for image vector representation.

How to use the Histogram of Oriented Gradients technique in OpenCV.

How to use the Bag-of-Words technique in OpenCV.

Kick-start your project with my book Machine Learning in OpenCV. It provides self-study tutorials with working code.

Let's get started.

## Tutorial Overview

This tutorial is divided into four parts; they are:

What Are the Advantages of Using HOG or BoW for Image Vector Representation?

The Histogram of Oriented Gradients Technique

The Bag-of-Words Technique

Putting the Techniques to the Test

## What Are the Advantages of Using HOG or BoW for Image Vector Representation?

When working with machine learning algorithms, image data typically undergoes a pre-processing step that structures it so the machine learning algorithms can work with it.

In OpenCV, for instance, the ml module requires that image data is fed into the machine learning algorithms in the form of feature vectors of equal length.

Each training sample is a vector of values (in Computer Vision it's sometimes referred to as feature vector). Usually all the vectors have the same number of components (features); OpenCV ml module assumes that.

– OpenCV, 2023.

One way of structuring the image data is to flatten it into a one-dimensional vector whose length equals the number of pixels in the image. For example, a $20\times 20$ pixel image would result in a one-dimensional vector of length 400. This one-dimensional vector serves as the feature set fed into the machine learning algorithm, where the intensity value of each pixel represents a feature.
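As a quick illustration of this flattening step, here is a minimal NumPy sketch (the random image below is just a stand-in for real data):

```python
import numpy as np

# A stand-in 20x20 grayscale image with random pixel intensities
img = np.random.randint(0, 256, size=(20, 20), dtype=np.uint8)

# Flatten the 2D pixel grid into a one-dimensional feature vector
feature_vector = img.flatten()

print(feature_vector.shape)  # (400,)
```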

However, while this is the simplest feature set we can create, it is not the most effective one, especially when working with larger images that result in too many input features to be processed effectively by a machine learning algorithm.

This can dramatically impact the performance of machine learning algorithms fit on data with many input features, generally referred to as the "curse of dimensionality."

– Introduction to Dimensionality Reduction for Machine Learning, 2020.

Rather, we want to reduce the number of input features that represent each image so that, in turn, the machine learning algorithm can generalize better to the input data. In more technical terms, it is desirable to perform dimensionality reduction, which transforms the image data from a high-dimensional space to a lower-dimensional one.

One way of doing so is to apply feature extraction and representation techniques, such as the Histogram of Oriented Gradients (HOG) or the Bag-of-Words (BoW), to represent an image in a more compact manner and, in turn, reduce the redundancy in the feature set and the computational requirements to process it.

Another advantage of converting the image data into a feature vector using the aforementioned techniques is that the vector representation of the image becomes more robust to variations in illumination, scale, or viewpoint.

In the following sections, we will explore the use of the HOG and BoW techniques for image vector representation.

## The Histogram of Oriented Gradients Technique

HOG is a feature extraction technique that aims to represent the local shape and appearance of objects inside the image space by a distribution of their edge directions.

In a nutshell, the HOG technique performs the following steps when applied to an image:

Computes the image gradients in the horizontal and vertical directions using, for example, a Prewitt operator. The magnitude and direction of the gradient are then computed for every pixel in the image.

Divides the image into non-overlapping cells of fixed size and computes a histogram of gradients for each cell. This histogram representation of every image cell is more compact and more robust to noise. The cell size is typically set according to the size of the image features we want to capture.

Concatenates the histograms over blocks of cells into one-dimensional feature vectors and normalizes them. This makes the descriptor more robust to lighting variations.

Finally, concatenates all normalized feature vectors representing the blocks of cells to obtain a final feature vector representation of the entire image.
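The first of these steps can be sketched with plain NumPy using simple central-difference kernels; the tiny image below is a made-up example, not part of the tutorial's dataset:

```python
import numpy as np

# A small synthetic grayscale image (values are arbitrary)
img = np.array([[10, 10, 10, 10],
                [10, 50, 50, 10],
                [10, 50, 50, 10],
                [10, 10, 10, 10]], dtype=np.float32)

# Horizontal and vertical gradients via central differences
gx = np.zeros_like(img)
gy = np.zeros_like(img)
gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
gy[1:-1, :] = img[2:, :] - img[:-2, :]

# Gradient magnitude and direction (degrees in [0, 180), i.e. unsigned gradients)
magnitude = np.sqrt(gx ** 2 + gy ** 2)
direction = np.degrees(np.arctan2(gy, gx)) % 180

print(direction[1, 1])  # 45.0
```

The per-cell histograms are then built by binning `direction` into nbins bins, weighted by `magnitude`.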

The HOG implementation in OpenCV takes several input arguments that correspond to the aforementioned steps, including:

The window size (winSize), which corresponds to the minimum object size to be detected.

The cell size (cellSize), which typically captures the size of the image features of interest.

The block size (blockSize), which tackles the problem of variation in illumination.

The block stride (blockStride), which controls how much neighboring blocks overlap.

The number of histogram bins (nbins), which captures gradients between 0 and 180 degrees.

Let's create a function, hog_descriptors(), that computes feature vectors for a set of images using the HOG technique:

```python
from cv2 import HOGDescriptor
from numpy import uint8, array


def hog_descriptors(imgs):
    # Create a list to store the HOG feature vectors
    hog_features = []

    # Set parameter values for the HOG descriptor based on the image data in use
    winSize = (20, 20)
    blockSize = (10, 10)
    blockStride = (5, 5)
    cellSize = (10, 10)
    nbins = 9

    # Set the remaining parameters to their default values
    derivAperture = 1
    winSigma = -1.
    histogramNormType = 0
    L2HysThreshold = 0.2
    gammaCorrection = False
    nlevels = 64

    # Create a HOG descriptor
    hog = HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins, derivAperture, winSigma,
                        histogramNormType, L2HysThreshold, gammaCorrection, nlevels)

    # Compute HOG descriptors for the input images and append the feature vectors to the list
    for img in imgs:
        hist = hog.compute(img.reshape(20, 20).astype(uint8))
        hog_features.append(hist)

    return array(hog_features)
```


Note: The way the images are reshaped here corresponds to the image dataset that will be used later in this tutorial. If you use a different dataset, don't forget to tweak this part of the code accordingly.
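As an aside, the length of the feature vector these parameter values will produce can be worked out by hand. The back-of-the-envelope sketch below is my own arithmetic (not an OpenCV API call); it simply mirrors how OpenCV slides blocks across the detection window:

```python
# Parameter values matching hog_descriptors() above
winSize, blockSize, blockStride, cellSize, nbins = (20, 20), (10, 10), (5, 5), (10, 10), 9

# Number of block positions when sliding the block across the window
blocks_x = (winSize[0] - blockSize[0]) // blockStride[0] + 1  # 3
blocks_y = (winSize[1] - blockSize[1]) // blockStride[1] + 1  # 3

# Cells per block, each contributing one nbins-sized histogram
cells_per_block = (blockSize[0] // cellSize[0]) * (blockSize[1] // cellSize[1])  # 1

descriptor_length = blocks_x * blocks_y * cells_per_block * nbins
print(descriptor_length)  # 81
```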

## The Bag-of-Words Technique

The BoW technique was introduced in this tutorial as applied to modeling text with machine learning algorithms.

Nonetheless, the same technique can also be applied to computer vision, where images are treated as collections of visual words from which features can be extracted. For this reason, when applied to computer vision, the BoW technique is often called the Bag-of-Visual-Words technique.

In a nutshell, the BoW technique performs the following steps when applied to an image:

Extracts feature descriptors from an image using algorithms such as the Scale-Invariant Feature Transform (SIFT) or Speeded Up Robust Features (SURF). Ideally, the extracted features should be invariant to intensity, scale, rotation, and affine variations.

Generates codewords from the feature descriptors, where each codeword is representative of similar image patches. One way of generating these codewords is to use k-means clustering to aggregate similar descriptors into clusters, where the centers of the clusters would then represent the visual words, while the number of clusters represents the vocabulary size.

Maps the feature descriptors to the nearest cluster in the vocabulary, essentially assigning a codeword to each feature descriptor.

Bins the codewords into a histogram and uses this histogram as a feature vector representation of the image.
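The last two steps can be sketched in NumPy with a toy vocabulary; the 2-D "descriptors" and cluster centers below are made up for illustration (real SIFT descriptors are 128-dimensional):

```python
import numpy as np

# Made-up 2-D "descriptors" extracted from one image
descriptors = np.array([[0.1, 0.2], [0.9, 0.8], [0.15, 0.25], [0.85, 0.9]])

# A toy vocabulary of 2 codewords (cluster centers found by k-means)
vocabulary = np.array([[0.1, 0.2], [0.9, 0.9]])

# Map each descriptor to its nearest codeword by Euclidean distance
distances = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
codewords = distances.argmin(axis=1)

# Bin the codewords into a normalized histogram: the image's BoW feature vector
hist = np.bincount(codewords, minlength=len(vocabulary)).astype(float)
hist /= hist.sum()

print(hist)  # [0.5 0.5]
```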

Let's create a function, bow_descriptors(), that applies the BoW technique using SIFT to a set of images:

```python
from cv2 import (SIFT_create, BOWKMeansTrainer, BOWImgDescriptorExtractor,
                 BFMatcher, NORM_L2, cvtColor, COLOR_RGB2GRAY)
from numpy import array, reshape


def bow_descriptors(imgs):
    # Create a SIFT descriptor
    sift = SIFT_create()

    # Create a BoW descriptor
    # The number of clusters equal to 50 (analogous to the vocabulary size) has been chosen empirically
    bow_trainer = BOWKMeansTrainer(50)
    bow_extractor = BOWImgDescriptorExtractor(sift, BFMatcher(NORM_L2))

    for img in imgs:
        # Reshape each RGB image and convert it to grayscale
        img = reshape(img, (32, 32, 3), 'F')
        img = cvtColor(img, COLOR_RGB2GRAY).transpose()

        # Extract the SIFT descriptors
        _, descriptors = sift.detectAndCompute(img, None)

        # Add the SIFT descriptors to the BoW vocabulary trainer
        if descriptors is not None:
            bow_trainer.add(descriptors)

    # Perform k-means clustering and return the vocabulary
    voc = bow_trainer.cluster()

    # Assign the vocabulary to the BoW descriptor extractor
    bow_extractor.setVocabulary(voc)

    # Create a list to store the BoW feature vectors
    bow_features = []

    for img in imgs:
        # Reshape each RGB image and convert it to grayscale
        img = reshape(img, (32, 32, 3), 'F')
        img = cvtColor(img, COLOR_RGB2GRAY).transpose()

        # Compute the BoW feature vector
        hist = bow_extractor.compute(img, sift.detect(img))

        # Append the feature vector to the list
        if hist is not None:
            bow_features.append(hist[0])

    return array(bow_features)
```


Note: The way the images are reshaped here corresponds to the image dataset that will be used later in this tutorial. If you use a different dataset, don't forget to tweak this part of the code accordingly.

## Putting the Techniques to the Test

There isn't necessarily a single best technique for all cases, and choosing a technique for the image data you are working with often requires controlled experiments.

In this tutorial, as an example, we will apply the HOG technique to the digits dataset that comes with OpenCV, and the BoW technique to images from the CIFAR-10 dataset. We will only be considering a subset of images from these two datasets to reduce the required processing time. Nonetheless, the same code can easily be extended to the full datasets.

We will start by loading the datasets we will be working with. Recall that we have previously seen how to extract the images from each dataset in this tutorial. The digits_dataset and the cifar_dataset are Python scripts that I have created, containing the code for loading the digits and the CIFAR-10 datasets, respectively:

```python
from digits_dataset import split_images, split_data
from cifar_dataset import load_images

# Load the digits image
img, sub_imgs = split_images('Images/digits.png', 20)

# Obtain a dataset from the digits image
digits_imgs, _, _, _ = split_data(20, sub_imgs, 0.8)

# Load a batch of images from the CIFAR dataset
cifar_imgs = load_images('Images/cifar-10-batches-py/data_batch_1')

# Consider only a subset of images
digits_subset = digits_imgs[0:100, :]
cifar_subset = cifar_imgs[0:100, :]
```


We can then proceed to pass the datasets to the hog_descriptors() and bow_descriptors() functions that we created earlier in this tutorial:

```python
digits_hog = hog_descriptors(digits_subset)
print('Size of HOG feature vectors:', digits_hog.shape)

cifar_bow = bow_descriptors(cifar_subset)
print('Size of BoW feature vectors:', cifar_bow.shape)
```


The complete code listing looks as follows:

```python
from cv2 import (imshow, waitKey, HOGDescriptor, SIFT_create, BOWKMeansTrainer,
                 BOWImgDescriptorExtractor, BFMatcher, NORM_L2, cvtColor, COLOR_RGB2GRAY)
from digits_dataset import split_images, split_data
from cifar_dataset import load_images
from numpy import uint8, array, reshape

# Load the digits image
img, sub_imgs = split_images('Images/digits.png', 20)

# Obtain a dataset from the digits image
digits_imgs, _, _, _ = split_data(20, sub_imgs, 0.8)

# Load a batch of images from the CIFAR dataset
cifar_imgs = load_images('Images/cifar-10-batches-py/data_batch_1')

# Consider only a subset of images
digits_subset = digits_imgs[0:100, :]
cifar_subset = cifar_imgs[0:100, :]


def hog_descriptors(imgs):
    # Create a list to store the HOG feature vectors
    hog_features = []

    # Set parameter values for the HOG descriptor based on the image data in use
    winSize = (20, 20)
    blockSize = (10, 10)
    blockStride = (5, 5)
    cellSize = (10, 10)
    nbins = 9

    # Set the remaining parameters to their default values
    derivAperture = 1
    winSigma = -1.
    histogramNormType = 0
    L2HysThreshold = 0.2
    gammaCorrection = False
    nlevels = 64

    # Create a HOG descriptor
    hog = HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins, derivAperture, winSigma,
                        histogramNormType, L2HysThreshold, gammaCorrection, nlevels)

    # Compute HOG descriptors for the input images and append the feature vectors to the list
    for img in imgs:
        hist = hog.compute(img.reshape(20, 20).astype(uint8))
        hog_features.append(hist)

    return array(hog_features)


def bow_descriptors(imgs):
    # Create a SIFT descriptor
    sift = SIFT_create()

    # Create a BoW descriptor
    # The number of clusters equal to 50 (analogous to the vocabulary size) has been chosen empirically
    bow_trainer = BOWKMeansTrainer(50)
    bow_extractor = BOWImgDescriptorExtractor(sift, BFMatcher(NORM_L2))

    for img in imgs:
        # Reshape each RGB image and convert it to grayscale
        img = reshape(img, (32, 32, 3), 'F')
        img = cvtColor(img, COLOR_RGB2GRAY).transpose()

        # Extract the SIFT descriptors
        _, descriptors = sift.detectAndCompute(img, None)

        # Add the SIFT descriptors to the BoW vocabulary trainer
        if descriptors is not None:
            bow_trainer.add(descriptors)

    # Perform k-means clustering and return the vocabulary
    voc = bow_trainer.cluster()

    # Assign the vocabulary to the BoW descriptor extractor
    bow_extractor.setVocabulary(voc)

    # Create a list to store the BoW feature vectors
    bow_features = []

    for img in imgs:
        # Reshape each RGB image and convert it to grayscale
        img = reshape(img, (32, 32, 3), 'F')
        img = cvtColor(img, COLOR_RGB2GRAY).transpose()

        # Compute the BoW feature vector
        hist = bow_extractor.compute(img, sift.detect(img))

        # Append the feature vector to the list
        if hist is not None:
            bow_features.append(hist[0])

    return array(bow_features)


digits_hog = hog_descriptors(digits_subset)
print('Size of HOG feature vectors:', digits_hog.shape)

cifar_bow = bow_descriptors(cifar_subset)
print('Size of BoW feature vectors:', cifar_bow.shape)
```


The code above returns the following output:

```
Size of HOG feature vectors: (100, 81)
Size of BoW feature vectors: (100, 50)
```


Based on our choice of parameter values, we can see that the HOG technique returns feature vectors of size $1\times 81$ for each image. This means each image is now represented by a point in an 81-dimensional space. The BoW technique, on the other hand, returns vectors of size $1\times 50$ for each image, where the vector length is determined by the chosen number of k-means clusters, which is also analogous to the vocabulary size.

Hence, we see that, instead of merely flattening each image into a one-dimensional vector, we have managed to represent each image more compactly by applying the HOG and BoW techniques.

Our next step will be to see how we can exploit this data using different machine learning algorithms.

## Further Reading

If you want to go deeper, this section provides more resources on the topic.

### Books

### Websites

## Summary

In this tutorial, you discovered the Histogram of Oriented Gradients and the Bag-of-Words techniques for image vector representation.

Specifically, you learned:

The advantages of using the Histogram of Oriented Gradients and the Bag-of-Words techniques for image vector representation.

How to use the Histogram of Oriented Gradients technique in OpenCV.

How to use the Bag-of-Words technique in OpenCV.

Do you have any questions?

Ask your questions in the comments below, and I will do my best to answer.

## Get Started on Machine Learning in OpenCV!

Learn how to use machine learning techniques in image processing projects

…using OpenCV in advanced ways and working beyond pixels

Discover how in my new Ebook:

Machine Learning in OpenCV

It provides self-study tutorials with all working code in Python to take you from a novice to an expert. It equips you with

logistic regression, random forest, SVM, k-means clustering, neural networks,

and much more…all using the machine learning module in OpenCV

Kick-start your deep learning journey with hands-on exercises

See What's Inside