Welcome to the VIVA face detection benchmark! The goal of this challenge is to robustly and accurately localize vehicle occupants' faces under varying illumination, in the presence of common partially occluding objects or actions, with different camera configurations, and across varying drivers [1].

Participants are strongly encouraged to train their algorithms using only publicly available databases (e.g., LFPW [2], HELEN [3], Faces in the Wild [4]). Should you use any of these databases, please cite the paper presenting the corresponding database.

Participants will have their algorithms tested on a newly collected set of 417 images looking inside the vehicle cabin. Some of the data was captured in our test beds, while some was sourced from YouTube. Each image in our data set contains between one and four faces. Sample images and annotations that represent the challenges in our test set are shown here: Sample Annotations. When submitting results, save the detections for each image as a text file with the .corners extension and the same name as the corresponding image. In the text file, each detected face is written on a separate line. Each line contains the (x,y) coordinates of the face's four corners, giving 8 comma-delimited entries per line. If no faces are detected in an image, do not write a text file for that image. Zip all face detection results into one file and submit here.
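The output format described above can be sketched as follows; the helper name, argument names, and output directory here are illustrative assumptions, not part of the challenge:

```python
import os

def write_corners(image_path, faces, out_dir="results"):
    """Write detections for one image in the challenge's .corners format.

    `faces` is a list of detections; each detection is a list of four
    (x, y) corner tuples. One comma-delimited line of 8 values per face.
    Helper and argument names are illustrative, not part of the challenge.
    """
    if not faces:
        # No detections for this image: write no file at all.
        return None
    os.makedirs(out_dir, exist_ok=True)
    # Output file shares the image's base name, with a .corners extension.
    stem = os.path.splitext(os.path.basename(image_path))[0]
    out_path = os.path.join(out_dir, stem + ".corners")
    with open(out_path, "w") as f:
        for corners in faces:
            flat = [v for (x, y) in corners for v in (x, y)]
            f.write(",".join(str(v) for v in flat) + "\n")
    return out_path
```

For example, a single detection covering corners (10,20), (110,20), (110,120), (10,120) for img001.png would produce img001.corners containing the line `10,20,110,20,110,120,10,120`.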

The test data can be downloaded here: Face Detection Test Data

For evaluation, we compute the area under the precision-recall curve (AP) and the average recall (AR) rate. AR is calculated over 9 evenly sampled points in log space between 10^-2 and 10^0 false positives per image. A detection is counted as correct under the PASCAL overlap criterion: at least 50% intersection-over-union with a ground-truth face. The face detection challenge is evaluated on three levels:

  • Level-0 (L0): all face instances (total 556 faces).
  • Level-1 (L1): face instances with no face parts occluded (total 290 faces).
  • Level-2 (L2): face instances with at least one face part occluded (total 266 faces).
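To make the matching criterion concrete, here is a minimal sketch of the PASCAL 50% overlap test and the log-spaced FPPI grid used for AR. It assumes axis-aligned boxes; the challenge annotates four corners per face, so treat this as an approximation, and the function names are ours:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width/height of the intersection rectangle (clamped at zero).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def matches_pascal(det, gt, threshold=0.5):
    """PASCAL overlap requirement: detection matches if IoU >= 0.5."""
    return iou(det, gt) >= threshold

# 9 evenly sampled points in log space from 10^-2 to 10^0 FPPI,
# the operating points over which AR is averaged.
fppi_points = [10 ** (-2 + i * 0.25) for i in range(9)]
```

For example, a detection box overlapping a ground-truth face with IoU 0.7 counts as correct, while one with IoU 0.3 counts as a false positive.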

Below, we display the current AP/AR results for the L0, L1, and L2 evaluation settings. For both metrics, higher is better. The results are sorted by AP on L0.

Rank  Method                                                      L0 (AP/AR)  L1 (AP/AR)  L2 (AP/AR)
1     Anonymous_1                                                 88.3/76.2   94.8/92.9   78.7/66.2
2     Anonymous_2                                                 85.8/76.0   93.7/90.2   75.8/68.1
3     Mixture of Tree Structures (with all independent models)    58.5/49.2   62.6/58.2   52.1/45.5
4     ACF                                                         53.1/38.3   66.1/54.2   35.6/29.0
5     Viola-Jones                                                 N/A         N/A         N/A

Rank  Method Reference
1     Not Available
2     Not Available
3     X. Zhu and D. Ramanan, "Face detection, pose estimation, and landmark localization in the wild," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
4     P. Dollár, R. Appel, S. Belongie, and P. Perona, "Fast feature pyramids for object detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014.
5     Not Available

Want the full curves for these methods? Please email us; the website is still being updated to display them.

Important Dates:
Sample annotations released: May 30, 2015
Test data released: ___


Any questions related to the VIVA-Face Challenge should be sent to VIVA.Faces@gmail.com.

[1] S. Martin, K. Yuen and M. M. Trivedi, “Vision for Intelligent Vehicles & Applications (VIVA): Face Detection and Head Pose Challenge,” IEEE Intelligent Vehicles Symposium, June 2016.
[2] P. Belhumeur, D. Jacobs, D. Kriegman, and N. Kumar, “Localizing parts of faces using a consensus of exemplars,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2011.
[3] V. Le, J. Brandt, Z. Lin, L. Bourdev, and T. S. Huang, “Interactive Facial Feature Localization,” European Conference on Computer Vision (ECCV), October 2012.
[4] V. Jain and E. Learned-Miller, “FDDB: A Benchmark for Face Detection in Unconstrained Settings,” Technical Report UM-CS-2010-009, Dept. of Computer Science, University of Massachusetts, Amherst, 2010.