
Shape-from-Silhouette Based Occupant Posture Estimation

The feasibility of a multi-camera, voxel-based occupant posture estimation system is investigated. Several new considerations are made to allow this tested human body modeling system to work reliably in the passenger seat of a vehicle, including camera position, segmentation, and body modeling with voxel reconstructions, all from a constrained four-camera setup. To describe the occupant posture, a partial human body model consisting of a head and torso is proposed. The accuracy of the estimated model is compared against ground truth.



The parameters of the pinhole camera model, namely the intrinsic, extrinsic, and radial distortion parameters, are estimated using the OpenCV and Matlab calibration toolboxes. The calibration uses planar points on a checkerboard as data for the parameter estimation and is performed for all four cameras.

With knowledge of the parameters, points in the scene space can be related to a pixel on the image plane using the pinhole camera model.
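The mapping from a scene point to a pixel can be sketched as follows. This is a minimal illustration, not the calibration toolboxes' code: the function name `project_point`, the parameter names, and the two-coefficient radial distortion model (`k1`, `k2`) are assumptions made for the example.

```python
def project_point(point_world, R, t, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project a 3-D scene point to pixel coordinates (u, v).

    R (3x3 nested list) and t (3-vector) are the extrinsic parameters;
    fx, fy, cx, cy are the intrinsic parameters; k1, k2 are assumed
    radial distortion coefficients. Returns None if the point lies
    behind the camera.
    """
    # Extrinsics: scene coordinates -> camera coordinates.
    xc = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i]
          for i in range(3)]
    if xc[2] <= 0:
        return None
    # Perspective division onto the normalized image plane.
    x = xc[0] / xc[2]
    y = xc[1] / xc[2]
    # Radial distortion as a polynomial in the squared radius.
    r2 = x * x + y * y
    d = 1.0 + k1 * r2 + k2 * r2 * r2
    # Intrinsics: normalized coordinates -> pixels.
    u = fx * x * d + cx
    v = fy * y * d + cy
    return (u, v)
```

With an identity rotation, zero translation, and no distortion, a point on the optical axis projects to the principal point (cx, cy).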


Silhouettes are generated using statistical background subtraction with shadow suppression. The variance in each of the three color channels is learned from a sequence of background images. Shadows and highlights are assumed to be pixel values that lie along the "chromaticity line" in the vicinity of the mean value. Pixel values that fall off this line are classified as foreground. The result is shown in the black and white image below.
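One way to read the chromaticity-line test is sketched below for a single pixel. It is an illustrative interpretation, not the system's actual code: the function names, the thresholds `chroma_thresh`, `alpha_lo`, and `alpha_hi`, and the floor of 1.0 on the standard deviation are all assumptions chosen for the example.

```python
import math

def learn_background(samples):
    """Per-channel mean and standard deviation of one pixel.

    samples: list of (r, g, b) values of that pixel over background frames.
    The standard deviation is floored at 1.0 (an assumption) to avoid
    division by zero for perfectly static pixels.
    """
    n = len(samples)
    mean = [sum(s[c] for s in samples) / n for c in range(3)]
    std = [max(math.sqrt(sum((s[c] - mean[c]) ** 2 for s in samples) / n), 1.0)
           for c in range(3)]
    return mean, std

def classify_pixel(pixel, mean, std,
                   chroma_thresh=3.0, alpha_lo=0.5, alpha_hi=1.3):
    """Return 'background' (including shadow/highlight) or 'foreground'."""
    # Scale each channel by its learned standard deviation.
    p = [pixel[c] / std[c] for c in range(3)]
    m = [mean[c] / std[c] for c in range(3)]
    mm = sum(v * v for v in m)
    if mm == 0:
        return "foreground" if any(p) else "background"
    # alpha: brightness scale that brings the mean closest to the pixel,
    # i.e. the pixel's position along the chromaticity line.
    alpha = sum(p[c] * m[c] for c in range(3)) / mm
    # Distance of the pixel from the chromaticity line.
    chroma_dist = math.sqrt(sum((p[c] - alpha * m[c]) ** 2 for c in range(3)))
    if chroma_dist > chroma_thresh or not (alpha_lo <= alpha <= alpha_hi):
        return "foreground"
    return "background"
```

A pixel that is a uniformly darker copy of the background mean (a shadow) stays on the line and is suppressed, while a pixel with a different color falls off the line and is kept as foreground.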


Shape-from-silhouette (SFS) has been studied extensively in recent years, and several real-time algorithms have been reported. They all entail an efficient determination of voxel occupancy: each voxel is projected onto the silhouette image planes, and the voxel is tested for overlap with the silhouette in each view.
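The occupancy test can be sketched as below. This is a simplified illustration: it projects only the voxel center rather than the full imaged voxel, and the function name `carve_voxels` and the camera representation (a projection callable paired with a binary silhouette) are assumptions made for the example.

```python
def carve_voxels(voxel_centers, cameras):
    """Keep the voxels whose centers project inside every silhouette.

    voxel_centers: iterable of (x, y, z) points.
    cameras: list of (project, silhouette) pairs, where project maps a
             3-D point to (u, v) pixel coordinates or None, and silhouette
             is a 2-D list of 0/1 values indexed as silhouette[v][u].
    """
    occupied = []
    for center in voxel_centers:
        keep = True
        for project, silhouette in cameras:
            uv = project(center)
            if uv is None:
                keep = False
                break
            u, v = int(round(uv[0])), int(round(uv[1]))
            inside = 0 <= v < len(silhouette) and 0 <= u < len(silhouette[0])
            # A voxel is carved away as soon as one view sees background.
            if not inside or not silhouette[v][u]:
                keep = False
                break
        if keep:
            occupied.append(center)
    return occupied
```

Because a single background hit rejects the voxel, the inner loop exits early for most of the volume, which is what makes this kind of test cheap enough for real-time use.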

The video below illustrates the raw voxel reconstruction results from the silhouette images captured in the LISA-M. The boundaries for in-position, out-of-position, and critically-out-of-position regions are also shown. Voxels in the respective regions are colored green/blue, yellow and red.


Following voxel reconstruction, the head and torso are located. A spherical crust kernel is convolved over the surface voxels of the reconstruction to locate the head. The neck voxels are found by calculating the centroid of the voxels surrounding the outer sphere of the crust kernel. The neck point serves as an anchor point for iteratively aligning the torso cylinder over the remaining voxel data.
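The head search can be illustrated with a brute-force equivalent of convolving a binary spherical-shell kernel: each candidate center is scored by the number of surface voxels that fall inside the shell. The function names, the candidate grid, and the radii `r_inner` and `r_outer` are assumptions for this sketch; the source does not give the kernel dimensions.

```python
import math

def crust_score(center, surface_voxels, r_inner, r_outer):
    """Count surface voxels lying inside the spherical crust around center."""
    count = 0
    for voxel in surface_voxels:
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(center, voxel)))
        if r_inner <= dist <= r_outer:
            count += 1
    return count

def locate_head(surface_voxels, candidates, r_inner, r_outer):
    """Return the candidate center with the highest crust score."""
    return max(candidates,
               key=lambda c: crust_score(c, surface_voxels, r_inner, r_outer))
```

A roughly spherical surface of the right size yields a sharp peak at its center, which also suggests why a reconstruction artifact that happens to look spherical can attract the detector.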


The position of the head is tracked with the Polhemus FasTrak electromagnetic motion tracking device. Two receivers are placed on either side of the occupant's head and their positions are tracked. The average of the two measured receiver locations is taken to be the ground-truth location of the head at that instant in time. The difference between the estimated head location and ground truth is shown in the figure below.
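The ground-truth computation and the per-frame difference amount to a midpoint and a Euclidean norm, sketched here. The function names are hypothetical, and both positions are assumed to be expressed in the same coordinate frame and units.

```python
import math

def head_ground_truth(receiver_a, receiver_b):
    """Midpoint of the two receiver positions, taken as the head location."""
    return tuple((a + b) / 2.0 for a, b in zip(receiver_a, receiver_b))

def normed_difference(estimate, truth):
    """Euclidean distance between estimated and ground-truth head locations."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)))
```

For example, receivers at (0, 0, 0) and (2, 4, 6) give a ground-truth head location of (1, 2, 3).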

The occupant was asked to assume the following six poses, holding each for 50 frames.

Pose  Lower Body      Upper Body
1     Seated back     Move head forwards and backwards
2     Move whole body forwards and backwards
3     Seated forward  Move head forwards and backwards
4     Seated back     Move head left and right
5     Seated back     Move head counterclockwise
6     Seated back     Move head clockwise

Three peaks in the normed difference plot that exceed 50 cm indicate instances when the head was incorrectly detected elsewhere in the scene. The average difference is 10.7 cm. Omitting frames in which the head is incorrectly detected, the average difference decreases to 7.55 cm. This remaining difference can be attributed to the approximate placement of the sensors on the subject's head during the test, which only roughly measures the centroid of the head. In addition, the alignment between the two tracks does not take head tilt into account. This can be observed in the y-axis plot: the subject's head is tilted back at 60 cm, upright at 40 cm, and tilted forward at 20 cm, producing this deviation from ground truth.

Of the 300 frames, 19 (6.33%) resulted in a head detection error. The primary cause is a voxel reconstruction artifact in a region that appeared more spherical than the head.
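The summary statistics above (mean difference over all frames, mean difference with misdetected frames omitted, and the detection error rate) can be computed as in the following sketch. The function name `summarize_errors` is hypothetical, and the per-frame misdetection flags are assumed to be given, since the source does not state how frames were flagged.

```python
def summarize_errors(errors_cm, misdetected):
    """Return (mean over all frames, mean over good frames, error rate in %).

    errors_cm: per-frame normed difference in centimeters.
    misdetected: per-frame booleans, True where the head was misdetected.
    """
    n = len(errors_cm)
    mean_all = sum(errors_cm) / n
    good = [e for e, bad in zip(errors_cm, misdetected) if not bad]
    mean_good = sum(good) / len(good) if good else float("nan")
    rate_percent = 100.0 * sum(misdetected) / n
    return mean_all, mean_good, rate_percent

# The reported rate follows directly from the frame counts: 19 of 300.
reported_rate = round(100.0 * 19 / 300, 2)
```

The toy inputs used to exercise this function are not the experiment's data; only the 19-of-300 ratio comes from the text.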


For references, see the publications page.