Welcome to the VIVA Hand Detection Challenge!

The dataset consists of 2D bounding boxes around driver and passenger hands in 54 videos collected in naturalistic driving settings featuring illumination variation, large hand movements, and frequent occlusion. There are 7 possible viewpoints, including a first-person view. Some of the data was captured in our testbeds, while some was kindly sourced from YouTube.

Download the dataset! (6.3 GB)

If you want to work with temporally adjacent frames, download the additional dataset below.
Download here! (24.0 GB)

Please use the following kit to evaluate your method or to plot curves for the methods in the tables below: Download here!

For evaluation, we compute the area under the precision-recall curve (AP) and the average recall (AR). AR is calculated over 9 evenly spaced points in log space between 10^-2 and 10^0 false positives per image (FPPI). Detections are evaluated with the PASCAL overlap criterion: a predicted box matches a ground-truth box if their intersection-over-union is at least 50%. As a minimum requirement for submission, you must provide predicted hand bounding boxes for each image. Classification of left/right and driver/passenger hands, and identification of the number of hands on the wheel, are optional but encouraged challenges.
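The PASCAL overlap criterion can be sketched as follows. This is an illustrative implementation, not code from the evaluation kit; the helper names `iou` and `is_match` are our own, and boxes are assumed to be in (x1, y1, x2, y2) format:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_match(pred, gt, threshold=0.5):
    """PASCAL criterion: a detection matches a ground-truth box if IoU >= 50%."""
    return iou(pred, gt) >= threshold
```

For example, two 10x10 boxes offset horizontally by 2 pixels overlap with IoU 2/3 and count as a match, while an offset of 5 pixels gives IoU 1/3 and does not.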

The hand detection challenge is evaluated on two levels:

  • Level-1 (L1): hand instances with a minimum height of 70 pixels, over-the-shoulder (back) camera view only.
  • Level-2 (L2): hand instances with a minimum height of 25 pixels, all camera views.
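As a rough illustration of the AR computation, the sketch below samples recall at 9 log-spaced FPPI points between 10^-2 and 10^0. The recall/FPPI values are made up, and the sampling rule used here (highest recall achieved at or below each reference FPPI) is our assumption about the convention, not taken from the evaluation kit:

```python
import numpy as np

# Hypothetical per-threshold results: false positives per image (FPPI) and
# recall, one entry per confidence threshold, after IoU-based matching.
fppi = np.array([0.005, 0.01, 0.05, 0.1, 0.3, 0.6, 1.0, 2.0])
recall = np.array([0.40, 0.50, 0.65, 0.72, 0.80, 0.85, 0.88, 0.90])

# 9 evenly spaced points in log space between 10^-2 and 10^0 FPPI.
ref_points = np.logspace(-2, 0, 9)

# For each reference FPPI, take the highest recall achieved at or below it
# (assumed convention); 0.0 if no operating point is that conservative.
sampled = [recall[fppi <= p].max() if np.any(fppi <= p) else 0.0
           for p in ref_points]
average_recall = float(np.mean(sampled))
```

Note that operating points beyond 1 FPPI (such as the 0.90-recall entry above) never enter the average, so AR rewards detectors that reach high recall at low false-positive rates.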

Below, we display the current AP/AR results for the L1 and L2 evaluation settings; for both metrics, higher is better. Results are sorted by AR on L2. Left-right (L-R) hand classification, driver-passenger (D-P) hand classification, and the number of hands on the wheel (#HANDS), all measured with mAP/mAR, are shown in the second table. You are encouraged to include labels for both hand detection and hand type classification in your submission.

Detection Results
| Rank | Method | Static Only | L1 (AP/AR) | L2 (AP/AR) | Runtime | Environment |
| 1 | MS-RFCN | Yes | 95.1/94.5 | 86.0/83.4 | 0.215 s | 6 cores @ 3.5 GHz, 32 GB RAM, Titan X GPU |
| 2 | SCUT Augmented FRCNN | Yes | 93.9/91.5 | 85.2/77.8 | 6.32 fps | 6 cores @ 3.5 GHz, 32 GB RAM, Titan X GPU |
| 3 | MS-RFCN | Yes | 94.2/91.1 | 86.9/77.3 | 0.215 s | 6 cores @ 3.5 GHz, 32 GB RAM, Titan X GPU |
| 4 | Multi-scale fast RCNN | Yes | 92.8/82.8 | 84.7/66.5 | 0.3 | 6 cores @ 3.5 GHz, 64 GB RAM, Titan X GPU |
| 5 | MS-FRCNN | Yes | 90.8/84.1 | 77.6/65.1 | N/A | N/A |
| 6 | SCUT Incremental FRCNN | Yes | 89.5/86.0 | 73.4/64.4 | 4.9 fps | 6 cores @ 3.5 GHz, 32 GB RAM, Titan X GPU |
| 7 | FRCNN | Yes | 90.7/55.9 | 86.5/53.3 | N/A | N/A |
| 8 | ACF_Depth4 | Yes | 70.1/53.8 | 60.1/40.4 | N/A | N/A |
| 9 | YOLO | Yes | 76.4/46.0 | 69.5/39.1 | 35 | 6 cores @ 3.5 GHz, 16 GB RAM, Titan X GPU |
| 10 | CNN with Spatial Region Sampling (all Hands) | Yes | 66.8/48.1 | 57.8/36.6 | 0.783 fps | 32 GB RAM, NVIDIA Tesla K20 GPU |
| 11 | ACF | Yes | 62.4/36.9 | 52.3/27.5 | 11.6 | RAM |
Rank Method Info
1 ?. Robust Hand Detection in Vehicles. To be submitted to CVPRW 2017, 2016.
2 ?. SCUT HCII Lab. We use FRCNN as the basic framework to achieve relatively good performance. By visualizing features, we found that some unwanted features are learned due to the imbalanced data distribution (i.e., a strong linear correlation between left and right hands). We apply data augmentation targeted at these features.
3 ?. Robust Hand Detection in Vehicles and in the Wild. To be submitted to CVPRW 2017, 2016.
4 Not Available
5 Le, T.H.N., Zhu, C., Zheng, Y., Luu, K. & Savvides, M. Robust Hand Detection in Vehicles. In ICPR, 2016.
6 ?. SCUT HCII Lab. We use the Faster RCNN framework to achieve relatively good performance, then apply specific data augmentation as well as incremental learning to further enhance performance.
7 Zhou, T., Pillai, P.J., Yalla, V.G. & Oguchi, K. Hierarchical Context-Aware Hand Detection Algorithm for Naturalistic Driving. In ITSC, 2016.
8 Das, N., Ohn-Bar, E. & Trivedi, M.M. On Performance Evaluation of Driver Hand Detection Algorithms: Challenges, Dataset, and Metrics. In IEEE Conf. Intelligent Transportation Systems, 2015.
9 Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In CVPR, 2016.
10 Bambach, S., Crandall, D. & Yu, C. Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions. In ICCV, 2015.
11 Ohn-Bar, E. & Trivedi, M.M. Beyond Just Keeping Hands on the Wheel: Towards Visual Interpretation of Driver Hand Motion Patterns. In IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), pages 1245-1250, 2014.
Dollár, P., Appel, R., Belongie, S. & Perona, P. Fast Feature Pyramids for Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8):1532-1545, 2014. (code)
Classification Results
| Rank | Method | Static Only | L-R (mAP/mAR) | D-P (mAP/mAR) | #Hands | Runtime | Environment |
| 1 | MS-RFCN | Yes | 75.3/69.8 | 70.9/65.6 | NA | 0.215 s | 6 cores @ 3.5 GHz, 32 GB RAM, Titan X GPU |
| 2 | SCUT Incremental FRCNN | Yes | 68.6/63.0 | 57.7/53.3 | NA | 4.9 fps | 6 cores @ 3.5 GHz, 32 GB RAM, Titan X GPU |
| 3 | CNN with Spatial Region Sampling (all Hands) | Yes | 52.7/42.3 | 57.3/47.3 | NA | 0.783 fps | 32 GB RAM, NVIDIA Tesla K20 GPU |
| 4 | ACF | Yes | 47.5/33.7 | 43.1/30.5 | NA | 11.6 | RAM |
Rank Method Info
1 ?. Robust Hand Detection in Vehicles and in the Wild. To be submitted to CVPRW 2017, 2016.
2 ?. SCUT HCII Lab. We use the Faster RCNN framework to achieve relatively good performance, then apply specific data augmentation as well as incremental learning to further enhance performance.
3 Bambach, S., Crandall, D. & Yu, C. Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions. In ICCV, 2015.
4 Ohn-Bar, E. & Trivedi, M.M. Beyond Just Keeping Hands on the Wheel: Towards Visual Interpretation of Driver Hand Motion Patterns. In IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), pages 1245-1250, 2014.
Dollár, P., Appel, R., Belongie, S. & Perona, P. Fast Feature Pyramids for Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8):1532-1545, 2014. (code)

Related datasets: