Shape-from-Silhouette Based Occupant Posture Estimation
The feasibility of a multi-camera, voxel-based occupant
posture estimation system is investigated. Several new
considerations are made to allow this tested human body
modeling system to work reliably in the passenger seat
of a vehicle, including camera position, segmentation,
and body modeling with voxel reconstructions, all from a
constrained four-camera setup. To describe the occupant
posture, a partial human body model consisting of a head
and torso is proposed. The accuracy of the estimation of
this model is compared against ground truth.
The parameters of the pinhole camera model are estimated
using the OpenCV and Matlab calibration toolboxes. These
parameters comprise the intrinsic, extrinsic, and radial
distortion parameters. The
calibration uses planar points on a checkerboard as data
for the parameter estimation. Calibration is done for
all four cameras.
With knowledge of the parameters, points in the scene
space can be related to a pixel on the image plane using
the pinhole camera model.
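
As an illustration of how a scene point maps to a pixel, the following is a minimal sketch of a pinhole projection with a two-term radial distortion. The function name, the parameter names (`fx`, `fy`, `cx`, `cy`, `k1`, `k2`), and the two-coefficient distortion model are assumptions made for this example; the calibration toolboxes may use more coefficients or different conventions.

```python
def project_point(point_world, R, t, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project a 3-D scene point to pixel coordinates (hypothetical helper).

    R is a 3x3 rotation (list of rows) and t a 3-vector: the extrinsics.
    fx, fy, cx, cy are the intrinsics; k1, k2 are radial distortion terms.
    Returns (u, v), or None if the point lies behind the camera.
    """
    # Extrinsics: move the scene point into the camera coordinate frame.
    xc, yc, zc = (
        sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)
    )
    if zc <= 0:
        return None
    # Perspective division onto the normalized image plane.
    x, y = xc / zc, yc / zc
    # Radial distortion scales the normalized coordinates by a polynomial in r^2.
    r2 = x * x + y * y
    d = 1.0 + k1 * r2 + k2 * r2 * r2
    # Intrinsics: focal lengths and principal point map to pixels.
    return (fx * d * x + cx, fy * d * y + cy)
```

A point on the optical axis lands on the principal point, which is a quick sanity check for a calibrated camera.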
Silhouettes are generated using a statistical background
subtraction with shadow suppression. The variance in
each of the three color channels for a sequence of
background images is learned. Shadows and highlights are
assumed to be pixel values along the "chromaticity line"
in the vicinity of the mean value. Pixel values outside
of this line are classified as foreground. The result is
shown in the black and white image below.
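
The per-pixel test can be sketched as follows. This is only one plausible reading of the description above: the pixel is scaled along the line through the origin and the background mean (brightness factor `alpha`), and its variance-normalized distance from that line (`cd`) decides foreground. The function names and the thresholds `cd_max`, `alpha_min`, and `alpha_max` are hypothetical and are not taken from the system described here.

```python
import math

def channel_stats(samples, min_std=1.0):
    """Learn per-channel mean and standard deviation for one pixel
    from a sequence of background (R, G, B) samples."""
    n = len(samples)
    mean = tuple(sum(s[c] for s in samples) / n for c in range(3))
    std = tuple(
        # Floor the deviation so perfectly static channels do not divide by zero.
        max(math.sqrt(sum((s[c] - mean[c]) ** 2 for s in samples) / n), min_std)
        for c in range(3)
    )
    return mean, std

def is_foreground(pixel, mean, std, cd_max=3.0, alpha_min=0.5, alpha_max=1.3):
    """Return True if the pixel lies off the chromaticity line, or too far
    along it to be a shadow or highlight (assumed thresholds)."""
    den = sum((m / s) ** 2 for m, s in zip(mean, std))
    if den == 0.0:
        alpha = 1.0  # black background: no line direction, compare to the mean
    else:
        # Brightness factor: least-squares scale of the mean that best fits the pixel.
        alpha = sum(p * m / (s * s) for p, m, s in zip(pixel, mean, std)) / den
    # Chromaticity distortion: normalized distance from the scaled mean.
    cd = math.sqrt(sum(((p - alpha * m) / s) ** 2 for p, m, s in zip(pixel, mean, std)))
    return cd > cd_max or not (alpha_min <= alpha <= alpha_max)
```

With these assumed thresholds, a uniformly darkened background pixel (a shadow) stays background, while a pixel whose color shifts away from the line is marked foreground.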
Shape-from-silhouette (SFS) has been studied extensively
in recent years, and several real-time algorithms have
been reported. They all entail an efficient determination
of voxel occupancy that involves projecting the voxel
onto the silhouette image planes and determining whether
the silhouette and imaged voxel overlap.
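
A minimal sketch of this occupancy test is given below. It makes two simplifying assumptions that are not from the source: each voxel is represented by its center only (rather than its full imaged footprint), and a voxel that falls outside any view is treated as empty. The `project` callables stand in for the calibrated camera models.

```python
def _hits_silhouette(pixel, silhouette):
    """True if the projected pixel lands on a foreground silhouette pixel."""
    if pixel is None:
        return False
    u, v = int(round(pixel[0])), int(round(pixel[1]))
    in_bounds = 0 <= v < len(silhouette) and 0 <= u < len(silhouette[0])
    return in_bounds and bool(silhouette[v][u])

def carve(voxel_centers, views):
    """Keep the voxels whose projection is foreground in every view.

    views is a list of (project, silhouette) pairs, where project maps a
    3-D point to (u, v) pixel coordinates (or None) and silhouette is a
    2-D grid of truthy foreground flags indexed as silhouette[v][u].
    """
    return [
        center
        for center in voxel_centers
        if all(_hits_silhouette(project(center), sil) for project, sil in views)
    ]
```

For example, with two toy orthographic views whose silhouettes each contain a single foreground pixel, only the voxel seen as foreground by both cameras survives.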
The video below illustrates the raw
voxel reconstruction results from the silhouette images
captured in the LISA-M.
The boundaries for in-position, out-of-position, and
critically-out-of-position regions are also shown.
Voxels in the respective regions are colored green/blue,
yellow and red.
MODELING FROM VOXEL DATA
Following voxel reconstruction, the head
and torso are found. A spherical crust kernel is
convolved over the surface voxels of the reconstruction
to locate the head. The neck voxels are found by
calculating the centroid of voxels surrounding the outer
sphere of the crust kernel. The neck point serves as an
anchor point to iteratively align the torso cylinder
over the remaining voxel data.
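
The head search can be illustrated with a simplified crust score. This sketch assumes the crust kernel is a spherical shell between an inner and an outer radius, and that the response at a candidate center is simply the number of surface voxels inside the shell; the actual kernel weights, radii, and search strategy are not given in the source, so all names and parameters here are hypothetical.

```python
import math

def crust_score(center, surface_voxels, r_inner, r_outer):
    """Count surface voxels lying inside the spherical shell around center."""
    return sum(
        1 for v in surface_voxels if r_inner <= math.dist(center, v) <= r_outer
    )

def locate_head(candidates, surface_voxels, r_inner, r_outer):
    """Return the candidate center with the strongest shell response."""
    return max(
        candidates,
        key=lambda c: crust_score(c, surface_voxels, r_inner, r_outer),
    )
```

A score of this kind favors any roughly spherical blob of head-like size, which is consistent with reconstruction artifacts occasionally being mistaken for the head.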
TRACKING VS. GROUND-TRUTH
The position of the head is tracked with
the Polhemus FasTrak electromagnetic motion tracking
device. Two receivers are placed on either side of the
head of the occupant and their positions tracked. The
average of the two measured receiver locations is taken
to be the ground-truth location of the head at that
instant in time. The difference between the estimated
head location and ground truth is shown in the figure
below.
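
The ground-truth computation and the per-frame normed difference can be sketched as follows. The function names are chosen for this example, and the sketch assumes the receiver positions and the estimate are already expressed in a common coordinate frame and unit.

```python
import math

def ground_truth_head(receiver_a, receiver_b):
    """Midpoint of the two receiver positions, taken as the head location."""
    return tuple((a + b) / 2.0 for a, b in zip(receiver_a, receiver_b))

def normed_difference(estimate, truth):
    """Euclidean distance between estimated and ground-truth head locations."""
    return math.dist(estimate, truth)
```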
The occupant was asked to assume the following 6 poses
for 50 frames each:
Move head forwards and backwards
Move whole body forwards and backwards
Move head forwards and backwards
Move head left and right
Move head counterclockwise
Move head clockwise
Three peaks in the normed difference plot that exceed
50 cm indicate instances when the head was incorrectly
detected elsewhere in the scene. The average difference
is 10.7 cm. Omitting frames in which the head is
incorrectly detected, the average difference decreases
to 7.55 cm. This remaining difference can be attributed
to the approximate placement of the sensors used to
measure the centroid of the subject's head during the
test. In addition, the alignment between the two tracks
does not take head tilt into account. This can be
observed in the y-axis plot: the subject's head is
tilted back at 60 cm, upright at 40 cm, and tilted
forward at 20 cm, producing this deviation from ground
truth.
Of the 300 frames, 19 (6.33%) resulted in a head
detection error. The primary cause is a voxel
reconstruction artifact in a region that appeared more
spherical than the head.
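
The summary statistics reported above can be computed as in the sketch below. It assumes a per-frame list of normed differences in centimeters and a matching list of flags marking frames where the head was misdetected; how those flags were assigned in the experiment is not specified, so the function and its inputs are illustrative only.

```python
def summarize_errors(errors_cm, misdetected):
    """Mean error over all frames, mean error over correctly detected
    frames, and the percentage of frames with a head detection error."""
    n = len(errors_cm)
    valid = [e for e, bad in zip(errors_cm, misdetected) if not bad]
    return {
        "mean_all_cm": sum(errors_cm) / n,
        "mean_valid_cm": sum(valid) / len(valid) if valid else float("nan"),
        "failure_rate_pct": 100.0 * sum(misdetected) / n,
    }
```

With 19 flagged frames out of 300, the failure rate evaluates to 6.33% after rounding, matching the figure quoted above.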