Welcome to the VIVA head pose estimation benchmark! The goal of this challenge is to robustly and accurately estimate the vehicle occupant’s head pose under varying illumination, in the presence of common partially occluding objects or actions, and under different camera configurations and across varying drivers [1].

Participants are strongly encouraged to train their algorithms using only publicly available databases as training data. If you use any of these databases, please cite the paper presenting the corresponding database.

Participants will have their algorithms tested on a newly collected data set of 417 images of vehicle cabin interiors. Some of the data was captured in our test beds, while some was collected from YouTube. Each image in our data set contains between one and four faces. Sample images and annotations representing the challenges in our test set are shown here: Sample Annotations. When submitting results, save the output for each image as a text file with the .pose extension and the same base name as the corresponding image. In the text file, each detected face is written on a separate line. Each line contains the (x,y) coordinates of the four corners of the face region followed by the head pose (pitch,yaw,roll), giving 11 comma-delimited entries per line. If no faces are detected in an image, do not write a text file for it. Zip all the head pose estimation results into one file and submit here.
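As a concrete illustration, the submission format above can be produced as follows. This is a minimal sketch; the detection values and image name are made up for the example, and the corner ordering is whatever your detector produces.

```python
import os

def write_pose_file(image_name, detections, out_dir="results"):
    """Write one .pose file per image in the submission format:
    x1,y1,x2,y2,x3,y3,x4,y4,pitch,yaw,roll -- one line per detected face.
    If there are no detections, no file is written."""
    if not detections:
        return  # no faces detected: no text file should be written
    os.makedirs(out_dir, exist_ok=True)
    base, _ = os.path.splitext(image_name)
    path = os.path.join(out_dir, base + ".pose")
    with open(path, "w") as f:
        for corners, (pitch, yaw, roll) in detections:
            # corners: four (x, y) tuples of the face region
            values = [c for xy in corners for c in xy] + [pitch, yaw, roll]
            f.write(",".join(str(v) for v in values) + "\n")

# Example: one hypothetical face detected in "frame_0001.png"
write_pose_file("frame_0001.png",
                [([(10, 20), (110, 20), (110, 120), (10, 120)],
                  (2.5, -14.0, 1.2))])
```

This writes `results/frame_0001.pose` containing a single 11-entry line.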

The test data can be downloaded here: Head Pose Test Data

For evaluation, we compute the detection rate (DR), success rate (SR) and error statistics. DR is the percentage of face instances for which there exists at least one estimated face region with at least 50% overlap with the ground truth. SR is the percentage of detected faces whose estimated yaw angle is within 15 degrees of the annotation. For evaluation purposes, we use only the detected face with the highest overlap with the ground truth. The Mean Absolute Error (MAE) and Standard Deviation (STD) of the yaw angle of the head pose (in degrees) are calculated and shown in the table below. These measures are evaluated on three levels:

  • Level-0 (L0): all face instances (total 556 faces).
  • Level-1 (L1): face instances with no face parts occluded (total 290 faces).
  • Level-2 (L2): face instances with at least one face part occluded (total 266 faces).
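The metrics above can be sketched for a single evaluation level as follows. This is an illustrative implementation under stated assumptions: overlap is taken to be intersection-over-union on axis-aligned boxes, and SR is computed over detected faces; the organizers' exact definitions may differ.

```python
import statistics

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    Assumed here as the overlap measure; the benchmark may define overlap differently."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def evaluate(ground_truth, detections, overlap_thr=0.5, yaw_thr=15.0):
    """ground_truth: one (box, yaw) per face instance.
    detections: for each instance, a list of (box, yaw) candidates.
    Returns (DR, SR, MAE, STD) for the yaw angle."""
    n = len(ground_truth)
    detected, yaw_errors = 0, []
    for (gt_box, gt_yaw), dets in zip(ground_truth, detections):
        matches = [(iou(gt_box, d_box), d_yaw) for d_box, d_yaw in dets]
        matches = [m for m in matches if m[0] >= overlap_thr]
        if not matches:
            continue  # instance missed: counts against DR only
        detected += 1
        # use only the detected face with the highest overlap with the ground truth
        _, best_yaw = max(matches)
        yaw_errors.append(abs(best_yaw - gt_yaw))
    dr = 100.0 * detected / n
    sr = 100.0 * sum(e <= yaw_thr for e in yaw_errors) / detected if detected else 0.0
    mae = statistics.mean(yaw_errors) if yaw_errors else float("nan")
    std = statistics.pstdev(yaw_errors) if yaw_errors else float("nan")
    return dr, sr, mae, std
```

Running `evaluate` separately on the L0, L1 and L2 subsets of face instances yields one row of the results table per method.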

Below, we display the current DR, SR, MAE and STD results for the L0, L1 and L2 evaluation settings. Higher DR and SR are better, while lower MAE and STD are better. The results are sorted by MAE on L0.

Result (Yaw Angle)
Each cell lists DR (%), SR (%), and MAE/STD (degrees).

 Rank  Method                                                     L0                       L1                       L2
 1     Anonymous_1                                                90.0  89.0  6.8/8.5      96.9  97.5  4.5/4.0      82.3  78.1  9.8/11.3
 2     Anonymous_2                                                60.8  66.9  13.9/12.6    72.4  68.6  13.6/12.8    48.1  64.1  14.4/12.3
 3     Mixture of Tree Structures (with all independent models)   67.3  63.1  16.0/16.5    70.0  68.5  14.0/13.5    64.3  56.7  18.3/19.1
Method references:
1. Not available
2. Not available
3. X. Zhu and D. Ramanan, “Face detection, pose estimation, and landmark localization in the wild,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

Pitch and roll angles of the head pose will be evaluated in the near future.

Important Dates:
Sample annotations released: May 30th, 2015
Test data released: ___

Any questions related to the VIVA-Face Challenge should be sent to VIVA.Faces@gmail.com.

[1] S. Martin, A. Tawari, E. Murphy-Chutorian, S. Y. Cheng, M. M. Trivedi, “On the Design and Evaluation of Robust Head Pose for Visual User Interfaces: Algorithms, Databases, and Comparisons,” 4th ACM SIGCHI International Conference on Automotive User Interfaces and Interactive Vehicular Applications (AUTO-UI), October, 2012.
[2] A. Tawari, S. Martin and M. M. Trivedi, “Continuous Head Movement Estimator (CoHMET) for Driver Assistance: Issues, Algorithms and On-Road Evaluations,” IEEE Transactions on Intelligent Transportation Systems, 2014.
[3] E. Murphy-Chutorian and M. M. Trivedi, “Head Pose Estimation in Computer Vision: A Survey”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 4, pp. 607-626, April 2009.
[4] P. Belhumeur, D. Jacobs, D. Kriegman and N. Kumar, “Localizing parts of faces using a consensus of exemplars,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2011.
[5] V. Le, J. Brandt, Z. Lin, L. Bourdev and T. S. Huang, “Interactive Facial Feature Localization,” European Conference on Computer Vision (ECCV), October 2012.
[6] V. Jain and E. Learned-Miller, “FDDB: A Benchmark for Face Detection in Unconstrained Settings,” Technical Report UM-CS-2010-009, Dept. of Computer Science, University of Massachusetts, Amherst, 2010.