Please see the end of the page for a table of results on the dataset. To submit your own results, feel free to email me.

The CVRR-HANDS 3D dataset was designed to study natural human activity under difficult settings of cluttered background, volatile illumination, and frequent occlusion. The dataset was captured using a Kinect under real-world driving settings. The approach is motivated by studying actions, as well as semantic elements in the scene and the driver's interaction with them, which may be used to infer driver inattentiveness. For more information, see the related publications below. The dataset contains three subsets: hand localization, hand and object localization, and 19 hand gestures for occupant-vehicle interaction.

Interactive Hand Gestures

Example Video


The example video shows hand activity recognition in five regions, using the approach in E. Ohn-Bar and M. M. Trivedi, "The Power is in Your Hands: 3D Analysis of Hand Gestures in Naturalistic Video", CVPRW, 2013.



  • Interactive hand gesture RGBD dataset - 19 gestures performed by 8 subjects as driver and passenger users.
  • Hand-only part of the detection dataset: 858 training frames and 1065 testing frames.
  • Hand and objects part of the detection dataset: 2437 training and 3113 testing samples.
  • 320x240 RGB and Depth images.
  • Raw data with videos.
  • Annotations: hand-object interaction type, number of hands on the wheel, and hand location among the five defined regions.


Click on the link below to download (~2GB):


Gesture recognition average accuracy results (8-fold cross-validation, testing on one subject and training on the rest):

| Rank | Method | Modality | Accuracy | Runtime | Environment | Code |
|------|--------|----------|----------|---------|-------------|------|
| 1 | HOG+HOG² [1] | RGBD | 64.5% | 50 fps | 8GB RAM, Intel Core i7 950 @ 3.07 GHz (4 cores) | link |
| 2 | HON4D [2] | D | 58.7% | 25 fps | 8GB RAM, Intel Core i7 950 @ 3.07 GHz (4 cores) | link |
| 3 | Dense Trajectories [3] | RGBD | 54.0% | 18 fps | 8GB RAM, Intel Core i7 950 @ 3.07 GHz (4 cores) | link |
| 4 | HOG3D [4] | RGBD | 44.6% | 3 fps | 8GB RAM, Intel Core i7 950 @ 3.07 GHz (4 cores) | link |
| 5 | Harris-3.5D [5] | RGBD | 36.4% | 0.2 fps | 8GB RAM, Intel Core i7 950 @ 3.07 GHz (4 cores) | link |
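
The evaluation protocol behind the accuracy figures (train on all subjects but one, test on the held-out subject, repeat for each of the 8 subjects) can be sketched as follows. This is only an illustrative sketch, not the code used to produce the table: the function names, the `fit_predict` callback, and the toy data are hypothetical, and it assumes the reported average is the unweighted mean of the per-subject accuracies.

```python
from collections import Counter, defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    by_subject = defaultdict(list)
    for idx, subject in enumerate(subject_ids):
        by_subject[subject].append(idx)
    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        train_idx = sorted(
            i for s, idxs in by_subject.items() if s != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def mean_accuracy(subject_ids, labels, fit_predict):
    """Average the per-fold accuracies (assumption: unweighted mean over folds).

    fit_predict(train_idx, test_idx) is a hypothetical callback that trains on
    the training samples and returns one predicted label per test sample.
    """
    fold_accuracies = []
    for _, train_idx, test_idx in leave_one_subject_out(subject_ids):
        predicted = fit_predict(train_idx, test_idx)
        correct = sum(1 for i, p in zip(test_idx, predicted) if labels[i] == p)
        fold_accuracies.append(correct / len(test_idx))
    return sum(fold_accuracies) / len(fold_accuracies)


if __name__ == "__main__":
    # Toy stand-in data: 8 subjects, 4 samples each, 19 gesture classes.
    subject_ids = [s for s in range(1, 9) for _ in range(4)]
    labels = [i % 19 for i in range(len(subject_ids))]

    def majority_baseline(train_idx, test_idx):
        # Placeholder classifier: always predict the most frequent training label.
        most_common = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        return [most_common] * len(test_idx)

    print(f"mean accuracy: {mean_accuracy(subject_ids, labels, majority_baseline):.3f}")
```

Splitting by subject rather than by sample keeps every gesture instance of the test subject out of training, so the score reflects generalization to an unseen user.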
| Index | Method Info |
|-------|-------------|
| 1 | E. Ohn-Bar and M. M. Trivedi, "Hand Gesture Recognition in Real-Time for Automotive Interfaces: A Multimodal Vision-based Approach and Evaluations", IEEE Trans. on Intelligent Transportation Systems, 2014. (pdf) |
| 2 | O. Oreifej and Z. Liu, "HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences", CVPR, 2013. (pdf) |
| 3 | H. Wang, A. Kläser, C. Schmid, and C.-L. Liu, "Dense trajectories and motion boundary descriptors for action recognition", International Journal of Computer Vision, 2013. (pdf) |
| 4 | A. Kläser, M. Marszalek, and C. Schmid, "A Spatio-Temporal Descriptor Based on 3D-Gradients", BMVC, 2008. (pdf) |
| 5 | S. Hadfield and R. Bowden, "Recognizing actions in 3D natural scenes", CVPR, 2013. (pdf) |