Building Face Detector Using Principal Component Analysis (PCA) From Scratch in Python

Hands-on coding from scratch without using inbuilt libraries.

Apr 7, 2019

Introduction

Principal Component Analysis (PCA) is a technique used for data reduction: the data is compressed in such a way that its main features are preserved. Consider an image of size m × n, where each pixel is a feature of the image. Not all of these features are facial features. With the help of PCA we extract the facial features, which reduces the number of dimensions. This article focuses on building the face detector rather than on the underlying concepts of using PCA for face detection. To know more about the theory, read Muyi Tao’s Medium post on the same topic: https://medium.com/@monicatmy777/face-recognition-using-pca-b15e934b5b64

Step 1: Create a database of images

Collect approximately 6 pictures of 8–9 people via the camera of your mobile phone or through the internet. All the images must be of the same size (all of my images are 720 × 1280 pixels). Once your database is ready, read all the images using the cv2 library and create a face vector with dimensions (total pixels in one image) × (total images). The following code does this:

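import cv2
import numpy as np

face_vector = []
for i in range(total_images):
    # path must point to the file of the i-th image
    face_image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)  # imread returns BGR
    face_image = face_image.reshape(total_pixels,)
    face_vector.append(face_image)
face_vector = np.asarray(face_vector)
face_vector = face_vector.transpose()  # shape: (total_pixels, total_images)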
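To make the layout of the face vector concrete, here is a minimal sketch that stacks a few synthetic grayscale arrays instead of real photos. The helper name `build_face_matrix` and the tiny 4 × 3 image size are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

def build_face_matrix(images):
    """Stack equally sized 2-D grayscale images into a (total_pixels, total_images) matrix."""
    columns = [np.asarray(img, dtype=np.float64).reshape(-1) for img in images]
    return np.stack(columns, axis=1)

# Three synthetic 4x3 "images" stand in for real photos (assumption for illustration).
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(4, 3)) for _ in range(3)]

face_matrix = build_face_matrix(images)
print(face_matrix.shape)  # (12, 3): 12 pixels per image, 3 images
```

Each column of the result is one flattened image, which is the same orientation the transposed face_vector has in the code above.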

Step 2: Normalising the Face Vectors

Once the face_vector is formed, we calculate the mean image over all the images and subtract it from each image. This is done to eliminate the features that are common to all the images.

avg_face_vector = face_vector.mean(axis=1)
avg_face_vector = avg_face_vector.reshape(face_vector.shape[0], 1)
normalized_face_vector = face_vector - avg_face_vector
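A quick way to convince yourself that the subtraction worked is to check that every pixel row now averages to zero across the images. The toy matrix below uses made-up values, and `keepdims=True` is used here only as a shorthand for the reshape shown above.

```python
import numpy as np

# Toy face matrix: 4 pixels (rows) x 3 images (columns); values are made up.
face_vector = np.array([[10.0, 20.0, 30.0],
                        [ 0.0,  5.0, 10.0],
                        [ 7.0,  7.0,  7.0],
                        [ 1.0,  2.0,  6.0]])

avg_face_vector = face_vector.mean(axis=1, keepdims=True)  # shape (4, 1)
normalized_face_vector = face_vector - avg_face_vector     # broadcast over the columns

# After centering, every pixel row has zero mean across the images.
print(np.allclose(normalized_face_vector.mean(axis=1), 0.0))  # True
```

Note that the third pixel, which is identical in every image, becomes all zeros: it carries no information that distinguishes one face from another.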

Step 3: Calculate the Co-variance Matrix

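A co-variance matrix is a matrix whose element in the (i, j) position is the co-variance between the i-th and j-th elements of a random vector. Because np.cov treats each row of its input as one variable, passing the transposed matrix (one row per image) produces a small total_images × total_images matrix rather than a total_pixels × total_pixels one.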

covariance_matrix = np.cov(np.transpose(normalized_face_vector))
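The orientation of the input to np.cov matters a lot here. The sketch below, on random data with toy sizes chosen only for illustration, compares the shapes produced by the two orientations.

```python
import numpy as np

rng = np.random.default_rng(0)
total_pixels, total_images = 50, 6            # toy sizes, assumed for illustration
normalized_face_vector = rng.normal(size=(total_pixels, total_images))
normalized_face_vector -= normalized_face_vector.mean(axis=1, keepdims=True)

# np.cov treats each ROW of its input as one variable.
small_cov = np.cov(normalized_face_vector.T)  # rows = images
large_cov = np.cov(normalized_face_vector)    # rows = pixels
print(small_cov.shape, large_cov.shape)       # (6, 6) (50, 50)
```

With 720 × 1280 images the pixel-by-pixel version would be a 921,600 × 921,600 matrix, which is why working with the small image-by-image matrix is the practical choice.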

Step 4: Calculate the Eigen Values and Eigen Vectors

Eigenvalues are a special set of scalars associated with a linear system of equations (i.e., a matrix equation) that are sometimes also known as characteristic roots, characteristic values (Hoffman and Kunze 1971), proper values, or latent roots (Marcus and Minc 1988, p. 144). They can be calculated using NumPy's inbuilt function, which returns the eigen values as a 1-D array and the eigen vectors as the columns of a matrix.

eigen_values, eigen_vectors = np.linalg.eig(covariance_matrix)

Step 5: Select K-Best Eigen Vectors

The eigen vectors are sorted in descending order of their eigen values, and the first K among them are chosen.

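order = np.argsort(eigen_values)[::-1]    # indices from largest to smallest eigen value
eigen_vectors = eigen_vectors[:, order]   # np.linalg.eig stores eigen vectors as columns
k_eigen_vectors = eigen_vectors[:, :k].T  # shape: (k, total_images), one eigen vector per row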
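Because np.linalg.eig returns eigen values in no particular order and stores the eigen vectors column-wise, the selection is easy to get wrong. Here is a small sketch on a made-up symmetric matrix; the names `top_values` and `top_vectors` are introduced only for this example.

```python
import numpy as np

# Small symmetric matrix standing in for the covariance matrix (made-up values).
covariance_matrix = np.array([[2.0, 0.0, 0.0],
                              [0.0, 5.0, 0.0],
                              [0.0, 0.0, 3.0]])
eigen_values, eigen_vectors = np.linalg.eig(covariance_matrix)

k = 2
order = np.argsort(eigen_values)[::-1]        # largest eigen value first
top_values = eigen_values[order][:k]
top_vectors = eigen_vectors[:, order][:, :k]  # eigen vectors are the COLUMNS

# Each selected column v should satisfy C v = lambda v.
for lam, v in zip(top_values, top_vectors.T):
    assert np.allclose(covariance_matrix @ v, lam * v)
print(top_values)
```

Indexing rows instead of columns would silently pick vectors that are not eigen vectors at all, so the check in the loop is a cheap safeguard.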

Step 6: Convert the K Lower Dimensional Eigen Vectors to the Original Dimensions

This step is performed by multiplying the K-best eigen vectors with the transpose of the normalized face vector. The result has one row per eigen face, and each row has total_pixels values.

eigen_faces = k_eigen_vectors.dot(normalized_face_vector.T)
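Why does this multiplication give valid eigen faces? In the usual eigenface derivation, if v is an eigen vector of the small matrix AᵀA, then Av is an eigen vector of the large matrix AAᵀ with the same eigen value. The sketch below checks this numerically on random data. Note the assumptions: it uses AᵀA directly rather than np.cov, and np.linalg.eigh rather than np.linalg.eig, purely to keep the check simple.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(40, 5))                  # 40 pixels x 5 images, toy sizes
A -= A.mean(axis=1, keepdims=True)            # subtract the mean image

small = A.T @ A                               # (5, 5) instead of (40, 40)
values, vectors = np.linalg.eigh(small)       # symmetric input, ascending eigen values
v, lam = vectors[:, -1], values[-1]           # largest eigen pair of the small matrix

u = A @ v                                     # lift to pixel space: length 40
u /= np.linalg.norm(u)                        # unit-length eigen face

# u is an eigen vector of the large matrix A A^T with the same eigen value.
print(np.allclose((A @ A.T) @ u, lam * u))    # True
```

This is what lets the whole pipeline avoid ever forming the pixel-by-pixel matrix.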

Step 7: Represent Each Face as a Combination of the K Eigen Faces

We now find the weights by multiplying the normalized face vectors with the eigen faces. Row i of weights holds the K coefficients that describe training image i.

weights = (normalized_face_vector.T).dot(eigen_faces.T)

This step completes our Training Part.

Step 8: Testing

1. Convert the input test image into a face vector. The image must be grayscale and of the same size as the training images.

test_img = test_img.reshape(total_pixels, 1)

2. Normalize the face vector.

test_normalized_face_vector = test_img - avg_face_vector

3. Projection of the vector onto the eigen space.

test_weight = (test_normalized_face_vector.T).dot(eigen_faces.T)

4. Calculate the index for which the distance is minimum.

index = np.argmin(np.linalg.norm(test_weight - weights, axis=1))

The index identifies the training image that the test picture matches most closely, and hence the subject it belongs to. This completes our Testing Part.
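Putting the training and testing steps together, here is a compact sketch that runs on synthetic data instead of photos. The function names `train` and `match`, the labels list and the random "faces" are all hypothetical. It also uses AᵀA with np.linalg.eigh and normalizes the eigen faces, which are assumptions of this sketch rather than what the code above does.

```python
import numpy as np

def train(faces, k):
    """faces: (total_pixels, total_images). Returns mean face, eigen faces, weights."""
    mean_face = faces.mean(axis=1, keepdims=True)
    centered = faces - mean_face
    values, vectors = np.linalg.eigh(centered.T @ centered)  # small image-by-image matrix
    top = vectors[:, np.argsort(values)[::-1][:k]]           # k leading eigen vectors (columns)
    eigen_faces = (centered @ top).T                         # (k, total_pixels)
    eigen_faces /= np.linalg.norm(eigen_faces, axis=1, keepdims=True)
    weights = centered.T @ eigen_faces.T                     # (total_images, k)
    return mean_face, eigen_faces, weights

def match(test_img, mean_face, eigen_faces, weights):
    """Return the index of the training image closest to test_img in eigen space."""
    test_weight = (test_img.reshape(-1, 1) - mean_face).T @ eigen_faces.T  # (1, k)
    return int(np.argmin(np.linalg.norm(weights - test_weight, axis=1)))

# Synthetic data: 3 subjects x 2 noisy images each, 64 pixels per image.
rng = np.random.default_rng(42)
subjects = rng.normal(scale=50.0, size=(3, 64))
faces = np.stack([s + rng.normal(scale=1.0, size=64)
                  for s in subjects for _ in range(2)], axis=1)
labels = [0, 0, 1, 1, 2, 2]                   # subject of each training column

mean_face, eigen_faces, weights = train(faces, k=3)
probe = subjects[1] + rng.normal(scale=1.0, size=64)  # unseen noisy image of subject 1
print(labels[match(probe, mean_face, eigen_faces, weights)])  # 1
```

The labels list is what turns the index of the nearest training image into a subject, so keep it in the same order in which the images were loaded.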

Images captured in the same lighting conditions are detected very well. However, the algorithm fails to identify images taken under different lighting conditions or with more background noise. Another algorithm, Linear Discriminant Analysis (LDA), does a slightly better job in these cases. We shall build a Face Detector using LDA in the next post.

Stay tuned!

Update: Link to the source code https://github.com/xanmolx/FaceDetectorUsingPCA/blob/master/PCA_Face_Recognition_IIT2016040.ipynb