


Computer Vision and Robotics Research Lab

Department of Computer Science and Engineering

University of California, San Diego


SERF Building, Room 150, 9500 Gilman Dr, La Jolla, CA 92093-0434

cutran AT



My research interests are vision-based articulated body pose tracking and human activity analysis for interactive applications such as intelligent driver assistance systems, gesture-based interactive games, and smart rooms. In particular, some of the research issues I have been working on are:

  • Human activities can be observed at different levels of detail, such as full body, upper body, head, hands, and feet. Depending on the application, we need to develop algorithms that focus on the levels of detail that provide useful information about the activities of interest. How to efficiently track and combine information from multiple levels of detail for better human activity analysis remains an important open issue.

  • Typically, there are trade-offs between extracting detailed information about human gestures at different levels and achieving the efficiency (e.g. real-time performance) and robustness that are particularly important for interactive applications.

  • Interaction with the user can itself provide helpful information, so we should determine how to exploit such input.


Driver foot behavior modeling and prediction

Driver foot behavior analysis is an important factor in developing Intelligent Driver Assistance Systems, but it has received little attention in related research. We propose and implement a new vision-based framework for driver foot behavior analysis that uses optical flow based foot tracking and a Hidden Markov Model (HMM) based technique to characterize temporal foot behavior. Our experimental analysis with a real-world driving testbed showed good results both in characterizing driver foot behavior in terms of the semantic states we proposed (e.g. moving towards brake, releasing from brake) and in predicting a pedal press before it actually happens. These results indicate a unique opportunity to harness computer vision to improve safety on our roads.
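The HMM stage can be sketched with a standard Viterbi decoder over a small set of semantic foot states. The state names and the toy probabilities below are illustrative assumptions for this sketch, not the actual model from the paper:

```python
import numpy as np

# Illustrative semantic foot states (not necessarily the paper's exact set)
STATES = ["neutral", "towards_brake", "on_brake", "release_brake"]

def viterbi(log_A, log_B, log_pi):
    """Most likely state sequence given log transition probabilities A (N x N),
    per-frame log emission likelihoods B (T x N), and log initial probs pi."""
    T, N = log_B.shape
    delta = np.empty((T, N))                      # best log score ending in each state
    back = np.zeros((T, N), dtype=int)            # argmax backpointers
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A    # scores[i, j]: transition i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    path = [int(delta[-1].argmax())]              # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In the real system the emission likelihoods would come from the optical flow based foot tracker (e.g. foot position and velocity relative to the pedals); predicting a pedal press then amounts to detecting entry into a pre-press state before contact occurs.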

Example of optical flow based foot tracking

Visualization of foot tracking output along with synchronously captured input data


Examples of the driver foot behavior analysis on brake and acceleration foot movements

Related publications

C. Tran, A. Doshi, and M.M. Trivedi, "Modeling and Prediction of Driver Behavior by Foot Gesture Analysis", Computer Vision and Image Understanding, vol. 116, no. 3, pp. 435-445, 2012. (pdf)

C. Tran, A. Doshi, and M.M. Trivedi, "Pedal Errors Prediction by Driver Foot Gesture Analysis: A Vision-based Inquiry", IEEE Intelligent Vehicles Symposium, June 2011. (pdf)

C. Tran, A. Doshi, and M.M. Trivedi, "Investigating Pedal Errors and Multi-modal Effects: Novel Driving Testbeds and Experimental Analysis", submitted to IEEE Intelligent Vehicles Symposium, under review 2012.



XMOB (Extremity MOvement Observation) for upper body tracking and gesture analysis

XMOB upper body tracker

We develop a real-time, marker-less upper body pose tracker in 3D. To achieve robustness and real-time performance, the idea is to break the exponentially large search problem of upper body pose into two steps: first, the 3D movements of the upper body extremities (i.e. head and hands) are tracked; then, using knowledge of upper body model constraints, these extremity movements are used to infer the whole 3D upper body motion as an inverse kinematics problem. Since the head and hand regions are typically well defined and undergo less occlusion, tracking them is more reliable, which enables more robust upper body pose determination. Moreover, breaking upper body pose tracking into two steps reduces the complexity considerably.
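The second step, recovering interior joints from tracked extremities, can be illustrated with a closed-form two-link inverse kinematics computation. The function below is a simplified sketch, not the actual XMOB constraint model; the swivel reference direction used to break the elbow ambiguity is an assumption of this sketch:

```python
import numpy as np

def elbow_from_extremities(shoulder, hand, l_upper, l_fore,
                           swivel_ref=(0.0, -1.0, 0.0)):
    """Closed-form two-link IK: recover a plausible elbow position from
    tracked shoulder and hand positions plus known limb lengths."""
    s, h = np.asarray(shoulder, float), np.asarray(hand, float)
    d = h - s
    full = np.linalg.norm(d)
    axis = d / full                               # unit shoulder -> hand direction
    dist = min(full, l_upper + l_fore - 1e-9)     # clamp unreachable targets
    # distance along axis from the shoulder to the center of the elbow circle
    a = (l_upper**2 - l_fore**2 + dist**2) / (2.0 * dist)
    r = np.sqrt(max(l_upper**2 - a**2, 0.0))      # elbow-circle radius
    # break the one-parameter swivel ambiguity with a reference direction
    ref = np.asarray(swivel_ref, float)
    perp = ref - ref.dot(axis) * axis
    n = np.linalg.norm(perp)
    perp = perp / n if n > 1e-9 else np.array([1.0, 0.0, 0.0])
    return s + a * axis + r * perp
```

Given the elbow, the shoulder and elbow joint angles follow directly, which is how extremity tracks can constrain the full upper body pose.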

Using the pose tracking output, several gestures (e.g. pointing, punching, waving, clapping, lifting a dumbbell) are then recognized based on a Longest Common Subsequence (LCS) similarity measure over the dynamics of the upper body joint angles. To the best of my knowledge, this is the first system that performs both 3D upper body pose inference in real time and gesture recognition based on the pose tracking output. We have also applied this system to develop an interactive balloon game in which the player uses upper body gestures to interact with the balloons.
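The LCS similarity itself is a small dynamic program. The sketch below assumes the joint-angle trajectories have been quantized into discrete symbols beforehand; the normalization by the longer sequence is my choice for this sketch, not necessarily the paper's:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences,
    via the classic O(len(a) * len(b)) dynamic program."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def lcs_similarity(a, b):
    """Normalized LCS similarity in [0, 1] between two symbol sequences,
    e.g. quantized joint-angle trajectories of a candidate gesture and a
    stored template."""
    return lcs_length(a, b) / max(len(a), len(b)) if a and b else 0.0
```

Because LCS tolerates insertions and deletions, the measure is robust to differences in gesture speed, which matters when matching live joint-angle dynamics against templates.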

Interactive balloon game based on XMOB

Related publications

C. Tran and M.M. Trivedi, “3D Posture and Gesture Recognition for Interactivity in Smart Space”, IEEE Transactions on Industrial Informatics, vol. 8, no. 1, pp. 178-187, Feb 2012. (pdf)

C. Tran and M.M. Trivedi, “Introducing XMOB: Extremity Movement Observation Framework for Upper Body Pose Tracking in 3D", IEEE International Symposium on Multimedia, 2009. (pdf)

C. Tran and M.M. Trivedi, "Driver Assistance for Keeping Hands on the Wheel and Eyes on the Road," IEEE International Conference on Vehicular Electronics and Safety, 2009. (pdf)



Body and hand tracking from voxel data with automatic initialization

We develop an integrated framework for automatic body and hand model initialization and tracking using voxel (volumetric) data. We combine the Kinematically Constrained Gaussian Mixture Model (KC-GMM) method for articulated body pose inference [Cheng and Trivedi '07] with the Laplacian Eigenspace (LE) based voxel segmentation method [Sundaresan and Chellappa '08]. The idea is to exploit the advantages of both methods: the LE-based voxel segmentation provides the automatic model initialization that the KC-GMM method lacks, while a tracking-based method like KC-GMM, to some extent, helps overcome the noise sensitivity of performing LE-based voxel segmentation at every frame.
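The role of the Laplacian Eigenspace step can be illustrated on a neighborhood graph: embedding nodes with the smallest non-constant eigenvectors of the graph Laplacian stretches chain-like structures (such as limbs in a voxel volume) into smooth curves that are easy to segment. A minimal sketch, assuming a plain unnormalized Laplacian and a dense adjacency matrix:

```python
import numpy as np

def laplacian_embedding(adj, k=1):
    """Embed graph nodes (e.g. voxels connected to their spatial neighbors)
    using the k smallest non-constant eigenvectors of the graph Laplacian
    L = D - W."""
    W = np.asarray(adj, float)
    L = np.diag(W.sum(axis=1)) - W           # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)           # eigh: L is symmetric
    order = np.argsort(vals)
    return vecs[:, order[1:k + 1]]           # skip the constant eigenvector
```

On a simple chain of nodes, the first coordinate (the Fiedler vector) is monotone along the chain, so limb-like parts unroll into ordered 1D segments that can be cut apart and assigned to body parts.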


Steps of the combined method for automatic body/hand model initialization and tracking

A fast procedure for body/hand model initialization with a stretch pose


Although the ability to track the human body at multiple levels of detail (e.g. full body, head, hands, and feet) is desirable, related research studies typically deal with each level separately. There are several reasons for this. First, the search space for an optimal pose grows exponentially with the number of degrees of freedom of the body model, so a model that includes both the coarse level of the body and a fine level, e.g. hand and fingers, leads to an explosion of the search space; applying current methods to solve this huge problem in one shot is very inefficient or even infeasible. Second, the difference in scale between body and hand makes it difficult both to acquire good data (e.g. voxel data) and to estimate the pose of body and hand in one shot.

As a first attempt to deal with this issue, we develop a system that uses different camera arrays for the body, hand, and head so that good data can be captured for each task. The huge problem of estimating a full model of the human body is still broken into separate tasks of estimating body pose, hand pose, and head pose. Because these tasks are done separately, the search-space explosion does not arise. However, we calibrate the different camera arrays into the same extrinsic world coordinates and capture data for body, hand, and head simultaneously so that, in the final step, the 3D pose tracking outputs at the different levels can be combined into a full model of the human body.
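Combining the outputs boils down to expressing each array's pose estimate in the shared world frame through its extrinsic calibration. A minimal sketch with homogeneous transforms (function names and the joint-list representation are mine, for illustration):

```python
import numpy as np

def extrinsic(R, t):
    """Build the 4x4 homogeneous transform from a camera-array frame to
    the shared world frame, given rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R, float)
    T[:3, 3] = np.asarray(t, float)
    return T

def merge_models(body_joints_world, hand_joints_local, T_hand_to_world):
    """Express hand joints (Nx3, in the hand array's frame) in the world
    frame and append them to the body joints, yielding one combined model."""
    h = np.c_[hand_joints_local, np.ones(len(hand_joints_local))]  # homogeneous
    hand_world = (T_hand_to_world @ h.T).T[:, :3]
    return np.vstack([body_joints_world, hand_world])
```

Since capture is simultaneous and all arrays share one world frame, no additional temporal or spatial alignment is needed at this merging step.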

Visual result of combining achieved body model and hand model into a full model


Related publications

C. Tran and M.M. Trivedi, "Hand Modeling and Tracking from Voxel Data: An Integrated Framework with Automatic Initialization", IEEE International Conference on Pattern Recognition, 2008. (pdf)

C. Tran, “Towards Multilevel Human Body Modeling and Tracking in 3D: Investigation in Laplacian Eigenspace Based Initialization and Kinematically Constrained Gaussian Mixture Modeling”, UCSD Research Exam, June 2008. (pdf)



Highway emission monitoring

In this project, I am in charge of integrating the VECTOR system for highway vehicle tracking and classification, developed by our lab-mate Dr. Brendan Morris, with the vehicle emission models developed by our colleagues at UC Riverside for real-time highway emission monitoring. I also developed a visualization system that plots several traffic measurements and the emission status, and shows the monitored area on Google Maps.
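The integration can be pictured as a per-class aggregation step: the tracker/classifier produces vehicle counts, and an emission model turns them into a roadway estimate. The factors and classes below are made-up illustrative numbers, not the actual UC Riverside modal emission models, which are driven by the measured vehicle dynamics:

```python
# Toy per-class emission factors in grams CO2 per vehicle-km (illustrative
# values only; the real system uses UC Riverside's emission models).
FACTORS = {"car": 200.0, "truck": 900.0, "bus": 1100.0}

def link_emissions(counts, link_km, factors=FACTORS):
    """Rough total emissions (grams) for one road link, from per-class
    vehicle counts produced by a tracker/classifier such as VECTOR."""
    return sum(counts.get(cls, 0) * f * link_km for cls, f in factors.items())
```

In the deployed system this aggregation runs continuously on live counts, and the resulting rates feed the real-time plots and map overlay.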



A test system monitoring part of the Interstate 5 highway every day from 5am to 8pm



Related publications

B.T. Morris, C. Tran, G. Scora, M.M. Trivedi, and M.J. Barth, “Real-Time Video-Based Traffic Measurement and Visualization System for Energy/Emissions”, IEEE Transactions on Intelligent Transportation Systems, to appear 2012.

G. Scora, B. Morris, C. Tran, M. Barth, and M.M. Trivedi, “Real-Time Roadway Emissions Estimation using Visual Traffic Measurements”, IEEE Forum on Integrated and Sustainable Transportation System, 2011.



Multimodal Research Testbed Including Audio-Visual and Physiological Signals for Multiplayer Interactivity

I take part in designing and setting up a distributed testbed to synchronously acquire multisensory signals for investigating their dynamics over time in multiplayer interactivity.




Related publications

A. Tawari, C. Tran, A. Doshi, T.O. Zander, and M.M. Trivedi, "Multisensory Signals Acquisition and Analysis in Dyadic Interactions", ACM SIGCHI, to appear 2012.



Open source driving simulator with audio-visual inputs and displays

We build on the open-source TORCS racing simulator to design simulated driving experiments for studying driver behavior and attention models. There are multiple surrounding vehicles with pre-defined behaviors, as well as spatial audio-visual feedback.

A clip of the driving simulator with audio-visual inputs and displays

Related publications

A. Doshi, C. Tran, M.H. Wilder, M.C. Mozer, M.M. Trivedi, “Sequential Dependencies in Driving”, Cognitive Science, to appear 2012. (pdf)



Triton Eyes

A distributed interactive video array in UCSD Jacobs Hall for real-time analysis of human activity and interaction in a common area (news). This is ongoing work in which I was in charge of setting up the two omni-directional Ladybug systems (each with its own six cameras) and streaming the live video feeds over the network.


An illustrative clip