My research interests are vision-based
articulated body pose tracking and human activity analysis for
interactive applications such as intelligent driver assistance
systems, gesture-based interactive games, and smart rooms. In
particular, some research issues that I have been working on are the following.
Human activities can be observed at different levels of detail, such as
the full body, upper body, head, hands, and feet. Depending on the
specific application, we need to develop algorithms focusing on the
levels of detail that provide useful information about the activities
of concern. How to efficiently track and utilize information from
multiple levels of detail for better human activity analysis is an
important open issue.
Typically, we need to deal with the trade-offs between obtaining
detailed information about human gestures at different levels and the
efficiency (e.g., real-time performance) and robustness that are
particularly important for interactive applications.
Interaction with the user could also provide helpful information, so we
should figure out how to utilize such input.
Driver foot behavior modeling and prediction
Driver foot behavior analysis is an important factor in developing
Intelligent Driver Assistance Systems, but it has not been explored
much in related research studies. We propose and implement a new
vision-based framework for driver foot behavior analysis, using optical
flow based foot tracking and a Hidden Markov Model (HMM) based
technique to characterize temporal foot behavior. Our experimental
analysis with a real-world driving testbed showed good results both in
characterizing driver foot behavior into the semantic states we
proposed (e.g., moving towards the brake, release from the brake) and
in predicting a pedal press before it actually happens. These results
indicate a unique opportunity to harness computer vision to improve
safety on our roads.
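The HMM-based characterization can be sketched with a small discrete model and Viterbi decoding. This is an illustrative toy: the state names follow the semantic states above, but all transition and emission probabilities below are made-up placeholders, not the trained model from the actual system.

```python
import math

# Toy HMM for foot behavior; all probabilities are illustrative placeholders.
states = ["neutral", "moving_to_brake", "on_brake", "release_from_brake"]

trans = {  # transition probabilities (each row sums to 1)
    "neutral":            {"neutral": 0.7, "moving_to_brake": 0.3, "on_brake": 0.0, "release_from_brake": 0.0},
    "moving_to_brake":    {"neutral": 0.1, "moving_to_brake": 0.5, "on_brake": 0.4, "release_from_brake": 0.0},
    "on_brake":           {"neutral": 0.0, "moving_to_brake": 0.0, "on_brake": 0.7, "release_from_brake": 0.3},
    "release_from_brake": {"neutral": 0.6, "moving_to_brake": 0.1, "on_brake": 0.0, "release_from_brake": 0.3},
}

emit = {  # emission probabilities over quantized optical-flow observations
    "neutral":            {"still": 0.8, "toward_pedal": 0.1, "away_from_pedal": 0.1},
    "moving_to_brake":    {"still": 0.1, "toward_pedal": 0.8, "away_from_pedal": 0.1},
    "on_brake":           {"still": 0.8, "toward_pedal": 0.1, "away_from_pedal": 0.1},
    "release_from_brake": {"still": 0.1, "toward_pedal": 0.1, "away_from_pedal": 0.8},
}

def _log(p):
    return math.log(p) if p > 0 else -1e9  # avoid log(0)

def viterbi(observations, init):
    """Most likely hidden state sequence for an observation sequence."""
    V = [{s: _log(init[s]) + _log(emit[s][observations[0]]) for s in states}]
    back = []
    for o in observations[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: V[-1][p] + _log(trans[p][s]))
            col[s] = V[-1][prev] + _log(trans[prev][s]) + _log(emit[s][o])
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    path = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Decoding the flow observations still, toward_pedal, toward_pedal, still, away_from_pedal with a uniform initial distribution yields neutral, moving_to_brake, moving_to_brake, on_brake, release_from_brake; a pedal press can then be anticipated whenever the decoder enters moving_to_brake.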
[Figure: optical flow based foot tracking]
[Figure: foot tracking output along with synchronously captured input]
[Figure: examples of driver foot behavior analysis on brake and acceleration]
C. Tran, A. Doshi, and M.M. Trivedi,
"Modeling and Prediction of Driver Behavior by Foot Gesture
Analysis", Computer Vision and Image Understanding, vol.
116, no. 3, pp. 435-445, 2012. (pdf)
C. Tran, A. Doshi, and M.M. Trivedi,
"Pedal Errors Prediction by Driver Foot Gesture Analysis: A
Vision-based Inquiry", IEEE Intelligent Vehicle Symposium,
June 2011. (pdf)
C. Tran, A. Doshi, and M.M. Trivedi, "Investigating
Pedal Errors and Multi-modal Effects: Novel Driving Testbeds and
Experimental Analysis", submitted to IEEE Intelligent Vehicle Symposium,
under review 2012.
XMOB (Extremity MOvement Observation) for upper body tracking and gesture recognition
[Figure: XMOB upper body tracking]
We develop real-time, marker-less upper body pose tracking in 3D. To
achieve robustness and real-time performance, the idea is to break the
exponentially large search problem of upper body pose into two steps:
first, the 3D movements of the upper body extremities (i.e., head and
hands) are tracked; then, using knowledge of upper body model
constraints, these extremity movements are used to infer the whole 3D
upper body motion as an inverse kinematics problem. Since the head and
hand regions are typically well defined and undergo less occlusion,
tracking them is more reliable and enables more robust upper body pose
determination. Moreover, by breaking the problem of upper body pose
tracking into two steps, the complexity is reduced considerably.
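The extremities-to-pose inference step can be illustrated with the classic analytic solution for a planar two-link limb: given a tracked hand position and the link lengths, the shoulder and elbow angles follow from the law of cosines. This toy planar sketch is my own illustration of the idea, not the system's 3D solver.

```python
import math

def two_link_ik(x, y, l1, l2):
    """Joint angles (shoulder, elbow) placing the hand of a planar two-link
    arm (upper arm l1, forearm l2, shoulder at the origin) at (x, y)."""
    d2 = x * x + y * y
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)  # law of cosines
    cos_elbow = max(-1.0, min(1.0, cos_elbow))  # clamp unreachable targets
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow

def forward(shoulder, elbow, l1, l2):
    """Forward kinematics: hand position for the given joint angles."""
    hx = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    hy = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return hx, hy
```

Running the forward kinematics on the recovered angles reproduces the hand position, which is the consistency the full system enforces in 3D under the upper body model constraints.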
Using the pose tracking output, recognition of several gestures (e.g.,
pointing, punching, waving, clapping, lifting a dumbbell) is then done
based on Longest Common Sub-sequence (LCS) similarity measurement of
upper body joint angle dynamics. To the best of my knowledge, this is
the first system that does both 3D upper body pose inference in real
time and gesture recognition based on the pose tracking outputs. We
have also applied this system to develop an interactive balloon game in
which the player uses upper body gestures to interact with the
balloons.
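The LCS matching over joint-angle trajectories can be sketched as a dynamic program in which two angle samples "match" when they are within a tolerance. The tolerance value and normalization below are my own illustrative choices, not necessarily the system's exact parameters.

```python
def lcs_similarity(a, b, tol=5.0):
    """Normalized longest-common-subsequence length between two joint-angle
    sequences (in degrees); samples match when they differ by at most tol."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if abs(a[i - 1] - b[j - 1]) <= tol:
                dp[i][j] = dp[i - 1][j - 1] + 1  # matching samples extend the LCS
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n] / min(m, n)
```

An observed joint-angle sequence is then labeled with the gesture template that gives the highest similarity.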
[Figure: the balloon game based on XMOB]
C. Tran and M.M. Trivedi, "3D Posture
and Gesture Recognition for Interactivity in Smart Space",
IEEE Transactions on Industrial Informatics, vol. 8, no. 1, pp. 178-187,
Feb 2012. (pdf)
C. Tran and M.M. Trivedi, "Introducing
XMOB: Extremity Movement Observation Framework for Upper Body Pose
Tracking in 3D", IEEE International Symposium on Multimedia.
C. Tran and M.M. Trivedi, "Driver
Assistance for Keeping Hands on the Wheel and Eyes on the Road",
IEEE International Conference on Vehicular Electronics and Safety.
Body and hand tracking from
voxel data with automatic initialization
We develop an integrated framework for automated body and hand model
initialization and tracking using voxel (volumetric) data. We combine
the Kinematically Constrained Gaussian Mixture Model (KC-GMM) method
for articulated body pose inference [Cheng and Trivedi '07] with the
Laplacian Eigenspace (LE) based voxel segmentation method [Sundaresan
and Chellappa '08]. The idea is to exploit the advantages of both
methods: the LE based voxel segmentation fills the gap of automated
model initialization for the KC-GMM method; on the other hand, a
tracking-based method like KC-GMM, to some extent, helps to overcome
the sensitivity to noise of performing LE based voxel segmentation at
every frame.
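The LE idea can be sketched as follows: build an adjacency graph over occupied voxels, form its graph Laplacian, and embed each voxel using the first few non-constant eigenvectors. This dense-matrix sketch is my own illustration, not the implementation of [Sundaresan and Chellappa '08], and it only scales to small voxel sets.

```python
import numpy as np

def laplacian_embedding(voxels, k=3):
    """Embed (N, 3) integer voxel coordinates using the first k non-constant
    eigenvectors of the Laplacian of the 6-connectivity voxel graph."""
    n = len(voxels)
    index = {tuple(v): i for i, v in enumerate(voxels)}
    W = np.zeros((n, n))
    for i, v in enumerate(voxels):
        for dx, dy, dz in [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)]:
            j = index.get((v[0] + dx, v[1] + dy, v[2] + dz))
            if j is not None:
                W[i, j] = 1.0           # neighboring voxels are connected
    L = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:k + 1]             # drop the constant eigenvector
```

In this embedding, elongated parts such as limbs unfold into distinct smooth 1-D curves, so clustering in eigenspace can separate body segments that touch in the original volume; a tracking prior (here, KC-GMM) then compensates for the embedding's frame-to-frame noise.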
[Figure: overview of the combined method for automatic body/hand model initialization and tracking]
[Figure: procedure for body/hand model initialization]
Although the ability to track the human body at multiple levels of
detail (e.g., full body, head, hands, and feet) is desirable, related
research studies typically deal with each level separately. Some
reasons for this are: first, the search space for an optimal pose grows
exponentially with the number of degrees of freedom of the body model,
so a body model including both the coarse level of the body and a fine
level, e.g., the hand and fingers, would lead to an "explosion in the
search space"; applying current methods to solve this huge problem in
one shot would be very inefficient or even impossible. Second, the
difference in size between body and hand makes it difficult to acquire
good data (e.g., voxel data) and to estimate the pose of both body and
hand at once. In a very first attempt to deal with this issue, we
develop a system in which we use different camera arrays for the body,
hand, and head in order to obtain good data for each task.
The huge problem of estimating a full model of the human body is still
broken into the separate tasks of estimating body pose, hand pose, and
head pose. These tasks are done separately, so the issue of search
space explosion does not arise. However, we calibrate these different
camera arrays into the same extrinsic world coordinates and capture
data for body, hand, and head simultaneously so that, at the final
step, the 3D pose tracking outputs at the different levels can be
combined into a full model of the human body.
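Because each camera array is calibrated to the same world coordinates, merging the per-level outputs reduces to mapping each array's 3D estimates through its extrinsic rotation and translation. A minimal sketch, where the extrinsic values and joint positions are placeholders rather than real calibration data:

```python
import numpy as np

def to_world(points, R, t):
    """Map (N, 3) points from a camera-array frame into the shared world frame."""
    return points @ R.T + t

# Placeholder extrinsics for two arrays (not real calibration values):
# the hand array is rotated 90 degrees about z and shifted 1 m up.
R_hand = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
t_hand = np.array([0.0, 0.0, 1.0])
R_body = np.eye(3)
t_body = np.zeros(3)

# Joints estimated independently in each array's own frame...
hand_joints = np.array([[1.0, 0.0, 0.0]])
body_joints = np.array([[0.0, 0.5, 1.0]])

# ...land in one coordinate system, where the full model is assembled.
full_model = np.vstack([to_world(body_joints, R_body, t_body),
                        to_world(hand_joints, R_hand, t_hand)])
```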
[Figure: result of combining the achieved body model and hand model into a full model]
C. Tran and M.M. Trivedi, "Hand
Modeling and Tracking from Voxel Data: An Integrated Framework with
Automatic Initialization", IEEE International Conference on
Pattern Recognition, 2008. (pdf)
C. Tran, “Towards Multilevel
Human Body Modeling and Tracking in 3D: Investigation in
Laplacian Eigenspace Based Initialization and Kinematically
Constrained Gaussian Mixture Modeling”, UCSD Research
Exam, June 2008. (pdf)
Highway emission monitoring
In this project, I am in charge of integrating the VECTOR system for
highway vehicle tracking and classification, developed by our lab-mate
Dr. Brendan Morris, with the vehicle emission models developed by our
colleagues at UC Riverside, for real-time highway emission monitoring.
I also developed a visualization system with plots of several traffic
measurements and emission status, as well as a view of the monitored
area on Google Maps.
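The tracking-to-emissions integration can be sketched as a lookup from the tracker's per-vehicle class and speed into per-kilometer emission rates. All rates below are made-up placeholders, not the UC Riverside models used in the actual system.

```python
# Placeholder emission rates (grams CO2-equivalent per km); the real system
# uses emission models developed at UC Riverside, not these numbers.
RATES_G_PER_KM = {
    ("car",   "slow"): 350.0,  ("car",   "free"): 200.0,
    ("truck", "slow"): 1400.0, ("truck", "free"): 900.0,
}

def segment_emissions(vehicles, segment_km):
    """Total emissions over a highway segment.

    vehicles: iterable of (vehicle_class, speed_kmh) pairs, as a tracker and
    classifier such as VECTOR could produce (the pair format is assumed here).
    """
    total = 0.0
    for vclass, speed_kmh in vehicles:
        band = "slow" if speed_kmh < 50 else "free"  # crude congestion split
        total += RATES_G_PER_KM[(vclass, band)] * segment_km
    return total
```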
[Figure: the system monitoring part of the Interstate 5 highway every day from 5am to 8pm]
B.T. Morris, C. Tran, G. Scora, M.M.
Trivedi, and M.J.
Barth, “Real-Time Video-Based Traffic Measurement and
Visualization System for Energy/Emissions”, IEEE Transactions on Intelligent
Transportation Systems, to appear 2012.
G. Scora, B. Morris, C. Tran, M.
Barth, and M.M. Trivedi, “Real-Time Roadway Emissions Estimation
using Visual Traffic Measurements”, IEEE Forum on Integrated and
Sustainable Transportation System, 2011.
Testbed Including Audio-Visual and Physiological Signals for Multiplayer Interactivity
I take part in designing and setting up a distributed testbed to
synchronously acquire multisensory signals for investigating their
dynamics over time in multiplayer interactivity.
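One recurring chore in such a testbed is aligning samples from streams recorded at different rates. A common approach, sketched here as my own illustration rather than the testbed's actual code, is to pair each reference timestamp with the nearest sample of another stream and reject pairs whose skew is too large:

```python
import bisect

def align_nearest(ref_times, other_times, max_skew=0.02):
    """For each reference timestamp, the index of the nearest sample in
    other_times (both sorted, in seconds), or None if beyond max_skew."""
    out = []
    for t in ref_times:
        i = bisect.bisect_left(other_times, t)
        # nearest neighbor is either just before or just after position i
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_times)]
        j = min(candidates, key=lambda c: abs(other_times[c] - t))
        out.append(j if abs(other_times[j] - t) <= max_skew else None)
    return out
```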
A. Tawari, C. Tran, A. Doshi, T.O.
Zander, and M.M. Trivedi, "Multisensory Signals Acquisition and
Analysis in Dyadic Interactions",
ACM SIGCHI, to appear 2012.
Open source driving
simulator with audio-visual inputs and displays
We adapted the open source TORCS racing simulator to design simulated
driving experiments for studying driver behavior and attention models.
There are multiple surrounding vehicles with pre-defined behaviors, as
well as spatial audio-visual displays.
[Video: a clip of the driving simulator with audio-visual inputs and displays]
A. Doshi, C. Tran, M.H. Wilder, M.C.
Mozer, M.M. Trivedi, “Sequential Dependencies in Driving”,
Cognitive Science, to appear 2012. (pdf)
Interactive video array in UCSD Jacobs Hall for real-time analysis
of human activity and interaction in a common area (news)
This is ongoing work in which I am in charge of setting up the two
omni-directional Ladybug systems (each with its own six cameras) and
streaming the live video feeds over the network.