

Human face analysis has been recognized as a crucial part of intelligent systems. However, several challenges must be overcome to design a robust and reliable face analysis system before it can be deployed in real-world environments. One of the main difficulties is detecting faces under varying illumination conditions and viewing perspectives. In this paper we present the development of a computational framework for robust detection, tracking, and pose estimation of faces captured by video arrays. We discuss the development of a multi-primitive skin-tone and edge-based detection module integrated with a tracking module for efficient and robust detection and tracking. A multi-state continuous density Hidden Markov Model (CDHMM) based pose estimation module is developed to provide an accurate estimate of the orientation of the face. These algorithms and the overall framework are evaluated systematically with an extensive set of experiments. The results of these experiments support the validity of the proposed framework and its computational modules.



Skin color and elliptical edge features are used in this algorithm. The proposed closed-loop face detection and tracking is illustrated below.

On the skin color track, skin blobs are labeled if their area is above a threshold. Face cropping windows are derived from the blob moments. On the edge track, the face is detected by matching an ellipse to the face contour. Since fitting an ellipse to edge pixels by regression is slow and does not always yield a valid head ellipse, we use a combination of two template methods that search a set of predefined head ellipses for the one that best matches the edges.
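The skin color track can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary skin mask is already available, and the function names, the 4-connectivity, and the `scale` factor that turns blob spread into a window size are all hypothetical choices.

```python
from collections import deque

import numpy as np


def label_blobs(mask, min_area):
    """Return pixel lists of 4-connected blobs whose area exceeds min_area."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    blobs = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0, x0] or seen[y0, x0]:
                continue
            # Breadth-first flood fill from an unvisited skin pixel.
            queue = deque([(y0, x0)])
            seen[y0, x0] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) > min_area:  # keep only blobs above the area threshold
                blobs.append(np.array(pixels))
    return blobs


def window_from_moments(pixels, scale=2.0):
    """Crop window (cx, cy, width, height) from first and second blob moments.

    The centroid is the first moment; the window extent is taken as
    2 * scale standard deviations per axis (scale is an assumed value).
    """
    ys = pixels[:, 0].astype(float)
    xs = pixels[:, 1].astype(float)
    cx, cy = xs.mean(), ys.mean()
    width = 2.0 * scale * xs.std()
    height = 2.0 * scale * ys.std()
    return cx, cy, width, height
```

Calling `label_blobs(mask, min_area)` and then `window_from_moments` on each returned blob gives one candidate cropping window per sufficiently large skin region.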

The resulting skin-tone and elliptic face-cropping windows are then fused by averaging. The face candidates are scaled to 64x64 pixels and compensated for uneven illumination by subtracting a plane fitted to the image intensity by least squares. They are then verified by their distance from feature space (DFFS) in a PCA subspace to reject non-face candidates. Each positive face window is associated with an existing face window track by nearest-neighbor matching and used to update the constant-velocity Kalman filter of that track. The Kalman filter interpolates detection gaps and predicts the face location in the next frame. For each track prediction, an ellipse search mask is derived for the next frame, which speeds up ellipse detection by minimizing the ellipse search area. A face track is initialized when a face is detected in several consecutive frames. The track is terminated if the predicted face window is classified as non-face for several frames.
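The verification and tracking steps can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the original system: the function and class names are hypothetical, the PCA basis is assumed to have orthonormal columns, and the noise levels `q` and `r` and the initial covariance are arbitrary placeholder values.

```python
import numpy as np


def remove_intensity_plane(face):
    """Subtract the least-squares plane a*x + b*y + c from a grayscale patch."""
    face = np.asarray(face, dtype=float)
    h, w = face.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(A, face.ravel(), rcond=None)
    return face - (A @ coeffs).reshape(h, w)


def dffs(patch, mean, basis):
    """Distance from feature space: norm of the residual left after
    projecting (patch - mean) onto the PCA subspace spanned by the
    orthonormal columns of basis."""
    d = np.asarray(patch, dtype=float).ravel() - mean
    residual = d - basis @ (basis.T @ d)
    return float(np.linalg.norm(residual))


class ConstantVelocityKalman:
    """Kalman filter with state (x, y, vx, vy) and position-only measurements."""

    def __init__(self, x, y, dt=1.0, q=1e-2, r=1.0):
        self.state = np.array([x, y, 0.0, 0.0], dtype=float)
        self.P = np.eye(4) * 10.0  # assumed initial uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q  # process noise (assumed)
        self.R = np.eye(2) * r  # measurement noise (assumed)

    def predict(self):
        """Advance one frame and return the predicted face position."""
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2].copy()

    def update(self, z):
        """Correct the state with a detected face position z = (x, y)."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

In a tracking loop, `predict` would be called once per frame; `update` is called only when a verified detection has been associated with the track, so frames without a detection are bridged by the prediction alone.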



The face orientation in the video can be estimated for active camera control and for assessing the direction of the person's attention. We compare two face orientation estimation schemes:

ML-Kalman face orientation estimation.

CDHMM face orientation estimation.


Evaluation of head tracking and face orientation estimation is accomplished using an extensive array of indoor, outdoor, and mobile videos.

The images below show some results of the multi-primitive face detection and tracking. The top row shows various backgrounds and lighting conditions, the middle row shows the combined skin-color and edge detections, and the bottom row shows the cropped faces. These examples suggest that the multi-primitive face detection and tracking is robust to extreme illumination changes, highly cluttered backgrounds, interference from skin-tone objects, and dim colored lighting in a dark room. The standard deviation of face alignment within the 64x64 face video after Kalman tracking is approximately 8 pixels.

The two face orientation estimation schemes are compared using a mobile video of 2300 frames, for which the ground truth facing angles were estimated manually frame by frame. Because ground truth is currently difficult to obtain, the video is processed twice to extract training and testing face videos of the same length but with different face alignments.

The CDHMM and ML-Kalman schemes produced horizontal face orientation estimates with standard deviations of 12 degrees and 19 degrees from ground truth, respectively. The CDHMM scheme is preferable because it is a delayed-decision approach, whereas in the ML-Kalman scheme the ML decision is made before Kalman filtering and so discards useful cues.


Poster (1.3MB ppt).

For more information, see related publications.