Python Project Tutorial - Week 2: Face Detection
Face detection is a computer vision technology for identifying and locating human faces in an
image. It is a special case of object detection. The basic idea of object detection is to find an
object's specific features and use these features to identify the object in other images. These
specific features are usually called feature descriptors. A feature descriptor is a
representation of an image or an image patch that simplifies the image by extracting useful
information. Commonly used feature descriptors include HOG, SIFT and SURF.
In week 2, we will build our own face detector from scratch. The basic steps include
1. Calculate a feature descriptor. (No external libraries)
2. Train a classifier with the feature descriptor
3. Test the classifier on other images
1. Calculate a Feature Descriptor
You can choose to calculate any feature descriptor you would like. Commonly used
feature descriptors include histogram of oriented gradients (HOG), Scale-invariant feature
transform (SIFT), Speeded up robust features(SURF) and Haar-like features. Do not use any
prewritten feature extraction functions in libraries.
Histogram of oriented gradients (HOG) uses the distribution (histograms) of gradient
directions (oriented gradients) as features. Gradients of an image are useful because their
magnitude is large around edges and corners, and we know that edges and
corners pack in a lot more information about object shape than flat regions.
Scale Invariant Feature Transform (SIFT) can robustly identify objects even among
clutter and under partial occlusion, because the SIFT feature descriptor is invariant to uniform
scaling, orientation, illumination changes, and partially invariant to affine distortion. The main
steps include scale-space extrema detection, keypoint localization, orientation assignment,
keypoint descriptor and keypoint matching.
Speeded up robust features (SURF) is based on the same principles and steps as SIFT,
but details in each step are different. The standard version of SURF is several times faster than
SIFT and claimed by its authors to be more robust against different image transformations than
SIFT.
The following is an example of how to calculate the HOG feature of a human face.
● Preprocessing
Choose any image from your dataset and crop out the face from the image according to
the label. Resize the image to 128 * 128. Of course, an image may be of any size. Typically
image patches of faces at multiple scales are analyzed at many image locations. The only
constraint is that the patches being analyzed have a fixed aspect ratio. In our case, the patches
need to have an aspect ratio of 1:1. For example, they can be 100 * 100, 256 * 256, or 1000 *
1000 but not 128 * 256. Resizing the image guarantees that we get HOG feature vectors of the
same length for different images, which makes classification easier.
Fig 1. Preprocessing
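The preprocessing step can be sketched as follows. This is a minimal NumPy-only illustration; the bounding-box format (x, y, w, h) and the nearest-neighbour resampling are our own assumptions — in practice you may resize with any image library, since the no-library rule applies only to the feature extraction itself.

```python
import numpy as np

def crop_and_resize(image, box, size=128):
    """Crop a face patch given a bounding box (x, y, w, h) and
    resize it to size x size with nearest-neighbour sampling."""
    x, y, w, h = box
    patch = image[y:y + h, x:x + w]
    # Nearest-neighbour resize: map each output pixel to a source pixel.
    rows = (np.arange(size) * patch.shape[0] / size).astype(int)
    cols = (np.arange(size) * patch.shape[1] / size).astype(int)
    return patch[rows][:, cols]

# Example on a synthetic grayscale image (stand-in for a dataset photo).
img = np.random.randint(0, 256, (300, 400), dtype=np.uint8)
face = crop_and_resize(img, (50, 40, 200, 200))
print(face.shape)  # (128, 128)
```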
● Calculate the Gradient Images
To calculate a HOG descriptor, we first need to calculate the horizontal and vertical
gradients. This is easily achieved by filtering the image with the Sobel operator with kernel
size 1. The kernels are as follows
Fig 2. Kernel
Next, we can find the magnitude and direction of the gradient using the following formulas:
g = sqrt(gx^2 + gy^2) and theta = arctan(gy / gx).
Before moving to the next step, we need to transform the gradients into "unsigned"
gradients, meaning we map the angles from the 0-360 range to 0-180. Empirically, unsigned
gradients have been shown to work better than signed gradients for face detection. This also
makes the histograms easier to calculate.
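The gradient computation described above can be sketched as follows. This is a minimal NumPy version using the centred difference kernel [-1, 0, 1]; border pixels are simply left at zero, and the modulo folds the signed angle into the unsigned 0-180 range.

```python
import numpy as np

def gradients(img):
    """Return gradient magnitude and unsigned direction (0-180 degrees)
    using the centred 1-D kernel [-1, 0, 1] in each direction."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal gradient
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical gradient
    mag = np.sqrt(gx ** 2 + gy ** 2)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # fold 0-360 into 0-180
    return mag, ang

# Toy image: intensity increases by 1 per column and 8 per row.
img = np.arange(64, dtype=float).reshape(8, 8)
mag, ang = gradients(img)
print(mag.shape, ang.shape)
```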
● Calculate Histogram of Gradients in 8×8 cells
The histogram contains 9 bins corresponding to angles 0, 20, 40 … 160. A bin is
selected based on the direction, and the vote (the value that goes into the bin) is selected based
on the magnitude. Notice that the gradient at the pixel encircled by red has an angle of 10
degrees and magnitude of 4. Since 10 degrees is halfway between 0 and 20, the vote by the
pixel splits evenly into the two bins.
Fig 3. HOG 1
There is one more detail to be aware of. If the angle is greater than 160 degrees, it is
between 160 and 180, and we know the angle wraps around making 0 and 180 equivalent. So
in the example below, the pixel with angle 165 degrees contributes proportionally to the 0
degree bin and the 160 degree bin.
Fig 4. HOG 2
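The split-vote scheme from the two examples above can be sketched like this. It is a didactic, loop-based version; the function name cell_histogram is our own, and the wrap of the highest bin back to the 0-degree bin handles the 165-degree case from Fig 4.

```python
import numpy as np

def cell_histogram(mag, ang, nbins=9):
    """9-bin histogram of one 8x8 cell with proportional (bilinear)
    voting between the two nearest bins; angles wrap at 180 degrees."""
    bin_width = 180.0 / nbins          # 20 degrees per bin
    hist = np.zeros(nbins)
    for m, a in zip(mag.ravel(), ang.ravel()):
        pos = a / bin_width            # fractional bin position
        lo = int(np.floor(pos)) % nbins
        hi = (lo + 1) % nbins          # wrap 180 back to the 0-degree bin
        frac = pos - np.floor(pos)
        hist[lo] += m * (1.0 - frac)
        hist[hi] += m * frac
    return hist

# The worked example: angle 10, magnitude 4 splits evenly between bins 0 and 20.
h = cell_histogram(np.array([[4.0]]), np.array([[10.0]]))
print(h[0], h[1])  # 2.0 2.0
```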
● 16×16 Block Normalization
Set a 16 * 16 sliding window with a step size of 8. As the window moves across the
image patch, concatenate the four cell histograms inside it into a 36 * 1 vector and normalize
the vector. The window movement is shown in the following figure.
● Calculate the HOG feature vector
Concatenate the normalized vectors of all blocks into one giant vector. For a 128 * 128
patch there are 15 * 15 = 225 block positions of 36 values each, so the vector size should be
8100 * 1.
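Putting the last two steps together, a sketch of the full descriptor might look like this. It assumes the gradient magnitude and unsigned direction images are already available, and a vectorised cell histogram is included so the snippet is self-contained.

```python
import numpy as np

def cell_hist(mag, ang, nbins=9):
    """9-bin histogram of one cell, with each pixel's vote split
    proportionally between the two nearest bins (wrapping at 180)."""
    pos = ang.ravel() / (180.0 / nbins)
    lo = np.floor(pos).astype(int) % nbins
    frac = pos - np.floor(pos)
    hist = np.zeros(nbins)
    np.add.at(hist, lo, mag.ravel() * (1.0 - frac))
    np.add.at(hist, (lo + 1) % nbins, mag.ravel() * frac)
    return hist

def hog_descriptor(mag, ang, cell=8, nbins=9):
    """Per-cell histograms, then 16x16-pixel (2x2-cell) blocks taken with
    a stride of 8 pixels; each block is L2-normalised and concatenated."""
    ny, nx = mag.shape[0] // cell, mag.shape[1] // cell
    hists = np.array([[cell_hist(mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell],
                                 ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell],
                                 nbins)
                       for j in range(nx)] for i in range(ny)])
    blocks = []
    for i in range(ny - 1):                    # stride of one cell = 8 px
        for j in range(nx - 1):
            v = hists[i:i+2, j:j+2].ravel()    # 4 histograms -> 36 values
            n = np.linalg.norm(v)
            blocks.append(v / n if n > 0 else v)
    return np.concatenate(blocks)

mag = np.random.rand(128, 128)
ang = np.random.rand(128, 128) * 180.0
print(hog_descriptor(mag, ang).shape)  # (8100,)
```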
Questions:
● If you choose to calculate the HOG descriptor, visualize the x-gradient, y-gradient and
gradient magnitude of the image.
● Try to visualize the feature descriptor.
● Compare the feature descriptor you calculated with a feature extracted from
functions in libraries.
2. Train the detection classifier
● Randomly split the dataset into 90 training and 10 testing photos.
● Use the bounding box image patch to extract the features from training data.
● Train the classifier. Try multiple classifiers for each feature (SVM, Naive Bayes,
KNN, Logistic Regression, etc.).
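A training sketch for this step, assuming scikit-learn is available (the no-library rule applies to feature extraction, not classification). The random features here are a synthetic stand-in for your extracted HOG vectors; replace X and y with your real feature matrix and face/non-face labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for HOG features: one row per patch, label 1 = face.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 36)), rng.normal(3, 1, (50, 36))])
y = np.array([0] * 50 + [1] * 50)

# 90 / 10 split as required by the assignment.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

for clf in (LinearSVC(), GaussianNB(), KNeighborsClassifier()):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, clf.score(X_te, y_te))
```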
Questions:
● What kind of features did you extract from the data? Try at least 3 different kinds
of features. Please show a sample of the features extracted.
● What kind of classifiers did you train for each feature? Try at least 3 different
kinds of classifiers for each feature.
3. Test the detector
1. Use the previously selected training and testing photos
2. Set the width, height and step size of a sliding window. The aspect ratio of the
sliding window should be the same as the aspect ratio of the training patches.
3. Slide the window across the image. Extract the feature within the sliding window.
4. Test the feature in the pre-trained classifiers
5. Repeat steps 3 and 4 until the window has covered the whole image.
6. Compute the testing and training accuracy of the detectors.
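The sliding-window loop in the steps above can be sketched like this. The MeanClassifier and the brightness-based "face" are toy stand-ins so the snippet runs on its own; plug in your trained classifier and your feature function (e.g. the HOG descriptor) instead.

```python
import numpy as np

def sliding_window_detect(image, classifier, feature_fn,
                          win=(128, 128), step=32):
    """Slide a fixed-size window over the image; score each patch with a
    pre-trained classifier and keep windows predicted as faces.
    feature_fn maps a patch to a 1-D feature vector."""
    detections = []
    h, w = win
    for y in range(0, image.shape[0] - h + 1, step):
        for x in range(0, image.shape[1] - w + 1, step):
            patch = image[y:y + h, x:x + w]
            if classifier.predict(feature_fn(patch)[None, :])[0] == 1:
                detections.append((x, y, w, h))
    return detections

# Toy "classifier" that fires on bright patches.
class MeanClassifier:
    def predict(self, X):
        return (X.mean(axis=1) > 128).astype(int)

img = np.zeros((256, 256))
img[64:192, 64:192] = 255          # a bright square plays the "face"
boxes = sliding_window_detect(img, MeanClassifier(), lambda p: p.ravel())
print(boxes)
```

Note that several overlapping windows fire around the bright square, which is exactly the multiple-positives situation the questions below ask you to handle.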
Questions:
● How do you measure the accuracy of the detector?
● How did you deal with multiple positive classifications in the near-face region?
● What kind of window size and step size did you try? Which gave the best result?
Try at least three combinations.
● What are the training and testing accuracy of the 9 pre-trained models?
Comment on the results.
● Plot the training and testing accuracy of different training/testing split ratio of the
best model. Comment on the results.
Homework submission:
Hand in a written report answering the questions in each step. Append the Python code to the report.
Additional reading materials: