Python Project Tutorial - Week 2: Face Detection
Face detection is a computer vision technology for identifying and locating human faces in an
image. It is a special case of object detection. The basic idea of object detection is to find an
object's specific features and use these features to identify the object in other images. These
specific features are usually called feature descriptors. A feature descriptor is a
representation of an image or an image patch that simplifies the image by extracting useful
information. Commonly used feature descriptors include HOG, SIFT and SURF.
In week 2, we will build our own face detector from scratch. The basic steps include
1. Calculate a feature descriptor. (No external libraries)
2. Train a classifier with the feature descriptor
3. Test the classifier on other images
1. Calculate a Feature Descriptor
You can choose to calculate any feature descriptor you would like. Commonly used
feature descriptors include histogram of oriented gradients (HOG), Scale-invariant feature
transform (SIFT), Speeded up robust features(SURF) and Haar-like features. Do not use any
prewritten feature extraction functions in libraries.
Histogram of oriented gradients (HOG) uses the distribution (histograms) of gradient
directions (oriented gradients) as features. Gradients of an image are useful because their
magnitude is large around edges and corners, and we know that edges and
corners pack in a lot more information about object shape than flat regions.
Scale Invariant Feature Transform (SIFT) can robustly identify objects even among
clutter and under partial occlusion, because the SIFT feature descriptor is invariant to uniform
scaling, orientation, illumination changes, and partially invariant to affine distortion. The main
steps include scale-space extrema detection, keypoint localization, orientation assignment,
keypoint descriptor and keypoint matching.
Speeded up robust features (SURF) is based on the same principles and steps as SIFT,
but details in each step are different. The standard version of SURF is several times faster than
SIFT and claimed by its authors to be more robust against different image transformations than
SIFT.
The following is an example of how to calculate the HOG feature of a human face.
● Preprocessing
Choose any image from your dataset and crop out the face from the image according to
the label. Resize the image to 128 * 128. Of course, an image may be of any size. Typically
image patches of faces at multiple scales are analyzed at many image locations. The only
constraint is that the patches being analyzed have a fixed aspect ratio. In our case, the patches
need to have an aspect ratio of 1:1. For example, they can be 100 * 100, 256 * 256, or 1000 *
1000 but not 128 * 256. Resizing the image guarantees that we get HOG feature vectors of the
same length for different images, which makes classification easier.
Fig 1. Preprocessing
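The preprocessing step can be sketched as follows. This is a minimal NumPy-only illustration; the bounding-box format (x, y, w, h) and the nearest-neighbour resampling are our own assumptions — in practice you may resize with any image library, since the no-library rule applies only to the feature extraction itself.

```python
import numpy as np

def crop_and_resize(image, box, size=128):
    """Crop a face patch given a bounding box (x, y, w, h) and
    resize it to size x size with nearest-neighbour sampling."""
    x, y, w, h = box
    patch = image[y:y + h, x:x + w]
    # Nearest-neighbour resize: map each output pixel to a source pixel.
    rows = (np.arange(size) * patch.shape[0] / size).astype(int)
    cols = (np.arange(size) * patch.shape[1] / size).astype(int)
    return patch[rows][:, cols]

# Example on a synthetic grayscale image (stand-in for a dataset photo).
img = np.random.randint(0, 256, (300, 400), dtype=np.uint8)
face = crop_and_resize(img, (50, 40, 200, 200))
print(face.shape)  # (128, 128)
```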
● Calculate the Gradient Images
To calculate a HOG descriptor, we first need to calculate the horizontal and vertical
gradients. This is easily achieved by filtering the image with the Sobel operator with kernel
size 1. The kernels are as follows
Fig 2. Kernel
Next, we can find the magnitude and direction of the gradient using the following formulas:
g = sqrt(gx^2 + gy^2) and theta = arctan(gy / gx).
Before moving to the next step, we need to transform the gradients into "unsigned"
gradients, meaning we map the angles from the 0-360 range to 0-180. Empirically, unsigned
gradients have been shown to work better than signed gradients for face detection. This also
makes the histograms easier to calculate.
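The gradient computation described above can be sketched as follows. This is a minimal NumPy version using the centred difference kernel [-1, 0, 1]; border pixels are simply left at zero, and the modulo folds the signed angle into the unsigned 0-180 range.

```python
import numpy as np

def gradients(img):
    """Return gradient magnitude and unsigned direction (0-180 degrees)
    using the centred 1-D kernel [-1, 0, 1] in each direction."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal gradient
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical gradient
    mag = np.sqrt(gx ** 2 + gy ** 2)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # fold 0-360 into 0-180
    return mag, ang

# Toy image: intensity increases by 1 per column and 8 per row.
img = np.arange(64, dtype=float).reshape(8, 8)
mag, ang = gradients(img)
print(mag.shape, ang.shape)
```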
● Calculate Histogram of Gradients in 8×8 cells
The histogram contains 9 bins corresponding to angles 0, 20, 40 … 160. A bin is
selected based on the direction, and the vote (the value that goes into the bin) is selected based
on the magnitude. Notice that the gradient at the pixel encircled by red has an angle of 10
degrees and magnitude of 4. Since 10 degrees is halfway between 0 and 20, the vote by the
pixel splits evenly into the two bins.
Fig 3. HOG 1
There is one more detail to be aware of. If the angle is greater than 160 degrees, it is
between 160 and 180, and we know the angle wraps around making 0 and 180 equivalent. So
in the example below, the pixel with angle 165 degrees contributes proportionally to the 0
degree bin and the 160 degree bin.
Fig 4. HOG 2
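The split-vote scheme from the two examples above can be sketched like this. It is a didactic, loop-based version; the function name cell_histogram is our own, and the wrap of the highest bin back to the 0-degree bin handles the 165-degree case from Fig 4.

```python
import numpy as np

def cell_histogram(mag, ang, nbins=9):
    """9-bin histogram of one 8x8 cell with proportional (bilinear)
    voting between the two nearest bins; angles wrap at 180 degrees."""
    bin_width = 180.0 / nbins          # 20 degrees per bin
    hist = np.zeros(nbins)
    for m, a in zip(mag.ravel(), ang.ravel()):
        pos = a / bin_width            # fractional bin position
        lo = int(np.floor(pos)) % nbins
        hi = (lo + 1) % nbins          # wrap 180 back to the 0-degree bin
        frac = pos - np.floor(pos)
        hist[lo] += m * (1.0 - frac)
        hist[hi] += m * frac
    return hist

# The worked example: angle 10, magnitude 4 splits evenly between bins 0 and 20.
h = cell_histogram(np.array([[4.0]]), np.array([[10.0]]))
print(h[0], h[1])  # 2.0 2.0
```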
● 16×16 Block Normalization
Set a 16 * 16 sliding window with a step size of 8. As the window moves across the
image patch, concatenate the four cell histograms inside it into a 36 * 1 vector and normalize
the vector. The window movement is shown in the following figure.
● Calculate the HOG feature vector
Concatenate the normalized vectors of all blocks into one giant vector. For a 128 * 128
patch there are 15 * 15 = 225 block positions of 36 values each, so the vector size should be
8100 * 1.
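Putting the last two steps together, a sketch of the full descriptor might look like this. It assumes the gradient magnitude and unsigned direction images are already available, and a vectorised cell histogram is included so the snippet is self-contained.

```python
import numpy as np

def cell_hist(mag, ang, nbins=9):
    """9-bin histogram of one cell, with each pixel's vote split
    proportionally between the two nearest bins (wrapping at 180)."""
    pos = ang.ravel() / (180.0 / nbins)
    lo = np.floor(pos).astype(int) % nbins
    frac = pos - np.floor(pos)
    hist = np.zeros(nbins)
    np.add.at(hist, lo, mag.ravel() * (1.0 - frac))
    np.add.at(hist, (lo + 1) % nbins, mag.ravel() * frac)
    return hist

def hog_descriptor(mag, ang, cell=8, nbins=9):
    """Per-cell histograms, then 16x16-pixel (2x2-cell) blocks taken with
    a stride of 8 pixels; each block is L2-normalised and concatenated."""
    ny, nx = mag.shape[0] // cell, mag.shape[1] // cell
    hists = np.array([[cell_hist(mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell],
                                 ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell],
                                 nbins)
                       for j in range(nx)] for i in range(ny)])
    blocks = []
    for i in range(ny - 1):                    # stride of one cell = 8 px
        for j in range(nx - 1):
            v = hists[i:i+2, j:j+2].ravel()    # 4 histograms -> 36 values
            n = np.linalg.norm(v)
            blocks.append(v / n if n > 0 else v)
    return np.concatenate(blocks)

mag = np.random.rand(128, 128)
ang = np.random.rand(128, 128) * 180.0
print(hog_descriptor(mag, ang).shape)  # (8100,)
```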
Questions:
● If you choose to calculate the HOG descriptor, visualize the x-gradient, y-gradient and
gradient magnitude of the image.
● Try to visualize the feature descriptor.
● Compare the feature descriptor you calculated with a feature extracted from
functions in libraries.
2. Train the detection classifier
● Randomly split the dataset into 90 training and 10 testing photos.
● Use the bounding box image patch to extract the features from training data.
● Train the classifier. Try multiple classifiers for each feature (SVM, Naive Bayes,
KNN, Logistic Regression, etc.).
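A training sketch for this step, assuming scikit-learn is available (the no-library rule applies to feature extraction, not classification). The random features here are a synthetic stand-in for your extracted HOG vectors; replace X and y with your real feature matrix and face/non-face labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for HOG features: one row per patch, label 1 = face.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 36)), rng.normal(3, 1, (50, 36))])
y = np.array([0] * 50 + [1] * 50)

# 90 / 10 split as required by the assignment.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

for clf in (LinearSVC(), GaussianNB(), KNeighborsClassifier()):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, clf.score(X_te, y_te))
```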
Questions:
● What kind of features did you extract from the data? Try at least 3 different kinds
of features. Please show a sample of the features extracted.
● What kind of classifiers did you train for each feature? Try at least 3 different
kinds of classifiers for each feature.
3. Test the detector
1. Use the previously selected training and testing photos
2. Set the width, height and step size of a sliding window. The aspect ratio of the
sliding window should be the same as the aspect ratio of the training patches.
3. Slide the window across the image. Extract the feature within the sliding window.
4. Test the feature in the pre-trained classifiers
5. Repeat steps 3 and 4 until the window has covered the whole image.
6. Compute the testing and training accuracy of the detectors.
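The sliding-window loop in the steps above can be sketched like this. The MeanClassifier and the brightness-based "face" are toy stand-ins so the snippet runs on its own; plug in your trained classifier and your feature function (e.g. the HOG descriptor) instead.

```python
import numpy as np

def sliding_window_detect(image, classifier, feature_fn,
                          win=(128, 128), step=32):
    """Slide a fixed-size window over the image; score each patch with a
    pre-trained classifier and keep windows predicted as faces.
    feature_fn maps a patch to a 1-D feature vector."""
    detections = []
    h, w = win
    for y in range(0, image.shape[0] - h + 1, step):
        for x in range(0, image.shape[1] - w + 1, step):
            patch = image[y:y + h, x:x + w]
            if classifier.predict(feature_fn(patch)[None, :])[0] == 1:
                detections.append((x, y, w, h))
    return detections

# Toy "classifier" that fires on bright patches.
class MeanClassifier:
    def predict(self, X):
        return (X.mean(axis=1) > 128).astype(int)

img = np.zeros((256, 256))
img[64:192, 64:192] = 255          # a bright square plays the "face"
boxes = sliding_window_detect(img, MeanClassifier(), lambda p: p.ravel())
print(boxes)
```

Note that several overlapping windows fire around the bright square, which is exactly the multiple-positives situation the questions below ask you to handle.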
Questions:
● How do you measure the accuracy of the detector?
● How did you deal with multiple positive classifications in the near-face region?
● What kind of window size and step size did you try? Which gave the best result?
Try at least three combinations.
● What are the training and testing accuracy of the 9 pre-trained models?
Comment on the results.
● Plot the training and testing accuracy of different training/testing split ratio of the
best model. Comment on the results.
Homework submission:
Hand in a written report answering the questions in each step. Append the Python code to the report.
Additional reading materials: