The system tracks participants and their identities, as well as the identity of the current speaker. Based on this information and the known environment, an important set of activities can be defined and recognized in real time. 3D tracking is based on video from static cameras with overlapping fields of view.
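With overlapping fields of view, a point seen by two calibrated static cameras can be localized in 3D by triangulation. The project does not publish its tracking math, so the following is only a minimal sketch of standard linear (DLT) triangulation; the function name and the assumption of two calibrated views with known projection matrices are illustrative.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices (cameras assumed calibrated).
    x1, x2: pixel coordinates (u, v) of the same point in each view.
    Returns the estimated 3D point in world coordinates.
    """
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

Applied to the tracked image positions of a person in each pair of overlapping views, this yields the 3D trajectories the system reasons over.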

The active camera network serves two purposes: taking snapshots of people's faces for face recognition, and capturing video of interesting events. People are identified by combining face and voice recognition results for more robust performance. In addition to this real-time functionality, the system supports reviewing past events in the environment. Events are summarized graphically so that the user can easily grasp the spatio-temporal relationships between events and participants.
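Combining the two modalities can be as simple as score-level fusion: each recognizer scores every candidate identity, and the scores are merged before picking a winner. The project does not specify its fusion rule, so this is a hypothetical sketch using a weighted sum; the function name, weights, and score ranges are all assumptions.

```python
def fuse_scores(face_scores, voice_scores, w_face=0.6, w_voice=0.4):
    """Fuse per-identity face and voice scores with a weighted sum.

    face_scores, voice_scores: dicts mapping identity -> score in [0, 1].
    Weights are illustrative, not the project's actual values.
    Returns (best identity, dict of fused scores).
    """
    identities = set(face_scores) | set(voice_scores)
    fused = {
        pid: w_face * face_scores.get(pid, 0.0)
             + w_voice * voice_scores.get(pid, 0.0)
        for pid in identities
    }
    return max(fused, key=fused.get), fused
```

A fused decision is more robust than either modality alone: a face match degraded by pose or lighting can be outvoted by a confident voice match, and vice versa.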

This graphic also serves as a user interface for interactively reviewing events, for example replaying the video associated with a particular event.
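Such a clickable summary needs each event linked to the video recorded for it. The project does not document its replay mechanism; one plausible backing structure is an indexed event list, sketched below with entirely hypothetical names and fields.

```python
from dataclasses import dataclass

@dataclass
class Event:
    start: float            # seconds since session start
    end: float
    participants: tuple     # identities involved in the event
    video_file: str         # clip captured by the active cameras

def events_at(events, t):
    """Return the events whose time span covers instant t,
    e.g. the point where the user clicked on the timeline graphic."""
    return [e for e in events if e.start <= t <= e.end]
```

Clicking a point on the timeline would call `events_at` and hand the matching `video_file` to the player, replaying exactly the footage associated with that event.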


AVIARY video: Digivation Conference, September 2000. (17MB RealVideo)

System Demonstration, CVPR Workshop on Human Motion Analysis and Synthesis, June 2000. (3MB RealVideo)

HUMO 2000 Video, December 2000. (7MB RealVideo)