ECE 285:  Topics In Intelligent Systems

Computer Vision and Multimodal Perception

-Winter 2009-

 

Instructor:
Mohan M. Trivedi 
trivedi@ece.ucsd.edu

 


 


 

 

Outline

Prerequisites

Announcements

Relevant Papers

Useful Links


 

Course Outline:

“Intelligent or Smart” environments should support efficient and effective interactions among humans and machines. This requires versatile multimodal systems for perceiving the environment and its dynamic states. In this course we will introduce and discuss topics associated with the semantic interpretation of events and patterns captured by multimodal sensors embedded in intelligent environments. The main focus of the course is to understand how to extract useful cues and representations from information captured by single and multiple sensory modules. Students will learn important concepts in computer vision and perception systems designed for real-world applications, including intelligent vehicles, intelligent meeting rooms, and intelligent assistive living facilities. The class will study selected "classic" publications of the field and critically examine promising new approaches from the literature. A highlight of the course will be a project in which students design, develop, and evaluate a full system or a major subsystem in the CVRR (http://cvrr.ucsd.edu), SHIVA (http://cvrr.ucsd.edu/SHIVA), or LISA (http://cvrr.ucsd.edu/LISA) laboratories.

Topics covered in the course include:

  • Models and Hierarchy for Computational Vision and Learning
  • Dynamic Scene Analysis and Tracking
  • Human Activity and Interactions Recognition
  • Active Perception and Exploration of Environments with Multiple Sensors
  • Multimodal Sensors and Information Fusion
  • Active Safety, Infotainment and Intelligent Driver Assistance Systems
  • Human Movement, Body, Gesture and Intent Analysis
  • Situational Awareness and Semantic Databases

 

Prerequisites:

Students enrolled in the course should be familiar with image processing and pattern recognition concepts (e.g., ECE 172A, ECE 253A). Knowledge of programming, interfacing, and computer graphics is also highly desirable. Motivated undergraduate students may enroll in the course as an elective with the consent of the instructor.

 

Timings: Monday and Wednesday, 4:00-5:50pm, SERF 102

 

First Meeting: Monday, January 5th, 4:00-5:50pm, SERF 102


Final Presentation:  Program (pdf)


Project Report: Due Monday, March 23rd, 2009.

 

The report should be 4-6 pages long and should bring out the following points clearly (the same points as for the presentation, repeated here for clarity):

 

Abstract

Introduction - Research Objectives, Research Motivation and Significance

Brief review of related studies - possibly with a comparative table

Methodology - Research approach: why it was selected, key ideas and principles, computational approach and algorithmic details, novelty and originality (is there anything in your approach that is not reported in any published paper?)

Experimental studies for validation and evaluation - analysis of experimental data

Summary of your contributions

Concluding remarks and suggested directions for further research

 

 Write-up Tools for Report:
IEEE guidelines for articles
This site contains templates in Word and LaTeX formats. LaTeX is a high-quality typesetting system well suited to technical and scientific documents, and it is the favored method for producing the papers read in this course. Visit http://www.latex-project.org to learn more and to download a LaTeX distribution.

 



Announcements:

January 5th 2009

First Meeting: January 5, 4pm-5:20pm in SERF 102

 

Introduction and class overview.

Examining vision from neuroscientific, perceptual-psychology, and computational perspectives.

Viewing of the "Mind's Eye" documentary. Email shankarts@gmail.com if you would like to see it again.

 

 

January 6th 2009

 

Assignment 1 (Research Readings):

The objective of this assignment is to develop a good understanding of important findings about vision, as discussed in the BBC documentary "Mind's Eye" shown in class. You will have to read related papers and material. A good place to start would be related links from Wikipedia and/or Google Scholar. Think about the ideas and then submit relatively short answers to the following questions. Please be precise and to the point. Cite the specific references you used in answering each question.

1. Summarize and discuss the important ideas presented in the documentary (100 words or less).
2. Read about the landmark research of Hubel and Wiesel. Summarize it in your own words.
3. The main neurophysiological findings discussed in the documentary are about 25 years old. Read about the important new developments in human visual perception and summarize what you have learnt.
4. Read about the research of Marr and his colleagues on the “Primal Sketch” and Multi-Scale Edge Matching. Discuss its relationship with the findings of Hubel and Wiesel.
5. Read about Marr and Ullman’s research on Motion Perception. Present a brief summary of the main ideas.
6. Read about Prof. Gunnar Johansson’s contributions to the perception and modeling of human movement. Summarize the main ideas and their implications.
7. Read about Bela Julesz’s insightful findings about human stereo perception. How did they influence early research in computational stereo?
8. Read about Julesz’s conjecture about Human Texture Perception. How did it influence computational frameworks for texture analysis?
9. State four different ways of extracting depth that animals (including humans) use. Discuss issues associated with the development of computational algorithms corresponding to the depth cues identified above.
10. How do humans differentiate the motion of objects from the motion associated with their own body and head (ego-motion)? Discuss a computational model for ego-motion compensation.
11. Read about the “Vision-Based Action” and “Active Vision” paradigms. How do they differ from the paradigm introduced by David Marr?

DUE DATE: Monday, January 12. Email to shankarts@gmail.com by 11:59:59.99pm

January 7th 2009

Second Meeting: January 7, 4pm-5:20pm in SERF 102

Relevant Books –

1.      Computer Vision Related Textbooks:

a.       Computer Vision, by Linda Shapiro and George Stockman, Prentice Hall, 2001

b.      Computer Vision: A Modern Approach by Forsyth and Ponce, Prentice Hall, 2003

c.       Machine Vision by Ramesh Jain, Rangachar Kasturi and Brian Schunck, McGraw-Hill, Inc., 1995

d.      Image Processing, Analysis and Machine Vision by Sonka, Hlavac and Boyle, 2nd Edition, PWS Publishers, 1999.

 

2.      Image Processing and Analysis:

a.       Gonzalez and Woods

b.      Anil Jain

 

ECE 172A Fall 2008 – Class website

Reading Material:

 

1.      Mohan M. Trivedi, Kohsia S. Huang, Ivana Mikic, Dynamic Context Capture and Distributed Video Arrays for Intelligent Spaces, IEEE Trans. on Systems, Man and Cybernetics, Part A, vol. 35, no. 1, pp. 145-163, Jan. 2005. (pdf)

 

2.      Kohsia Samuel Huang, Mohan M. Trivedi, Integrated Detection, Tracking, and Recognition of Faces with Omnivideo Array in Intelligent Environments, EURASIP Journal on Image and Video Processing, vol. 2008, Article ID 374528, 19 pages, 2008. (pdf)

January 12th 2009

Third Meeting: January 12th, 4pm-5:20pm in SERF 102

Reading Material:

1.      L. Marchesotti, S. Piva, C. Regazzoni, Structured context-analysis techniques in biologically inspired ambient-intelligence systems, IEEE Transactions on Systems, Man and Cybernetics, Part A, 2005. (pdf)

 

2.      Katja Nummiaro, Esther Koller-Meier, Tom Svoboda, Daniel Roth and Luc Van Gool, Color-Based Object Tracking in Multi-camera Environment, Lecture Notes in Computer Science, September 2003. (pdf)

 

 

3.      Anton Nijholt, Rieks op den Akker and Dirk Heylen, Meetings and meeting modeling in smart environments, AI & Society, 2006. (pdf)

January 14th 2009

Fourth Meeting: January 14th, 4pm-5:20pm in SERF 102

 

Mini-Project Assignments.

 

White papers due by 19th January 2009.

January 19th 2009

Martin Luther King Day!

January 21st 2009

Fifth Meeting: January 21st, 4pm-5:20pm in SERF 102

 

Presentations –

 

·         Shankar Shivappa – Hidden Markov models and Dynamic Bayesian networks

Rabiner, L.R., "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989. (pdf)

HMM tutorial by Prof. Andrew Moore. (pdf)

Hidden Markov toolkit (HTK)

 

·         Anup Doshi – Background modeling and subtraction

Background Subtraction Tutorial by Prof. Massimo Piccardi. (pdf)

Piccardi, M., "Background subtraction techniques: a review," 2004 IEEE International Conference on Systems, Man and Cybernetics, vol. 4, pp. 3099-3104, 10-13 Oct. 2004. (pdf)

Stauffer, C., Grimson, W. E. L., "Adaptive background mixture models for real-time tracking," in Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 1999. (pdf)



K. Kim, T. H. Chalidabhongse, D. Harwood and L. Davis, "Real-time Foreground-Background Segmentation using Codebook Model", Real-time Imaging, Volume 11, Issue 3, Pages 167-256, June 2005. (pdf)



Anup Doshi, Mohan M. Trivedi, "Satellite Imagery Based Robust, Adaptive Background Models and Shadow Suppression." Signal, Image, and Video Processing (SIViP) Journal, 1(2):119-132, June 2007. (pdf)

January 26th 2009

Sixth Meeting: January 26th, 4pm-5:20pm in SERF 102

 

Mini-Project presentations –

 

·         Ashish Tawari and Alexis Allegra (pdf)

 

Answer the following questions :

  1. Name the features that are useful for gender classification and define each one.
  2. A speech signal can be divided into three types of segments. What are they? Which one is useful for extracting the pitch of the speech?
  3. What does voice activity detection (VAD) do? Why do we want to use VAD in the analysis of speech signals? Does VAD always provide useful information? If not, how can we handle these situations? (Hint for the last part: Think about question 2 in detail.)
  4. For accurate pitch extraction, what are “good” characteristics of a given audio signal?
  5. Name three methods available for extracting pitch information. What are the pros and cons of these methods?

 

·         Mulloy Morrow (pdf)

 

Answer the following questions :

1.      What is Speech Recognition, particularly Speaker Identification? What are two methods of evaluating performance? 

2.      What are MFCCs, and why are they desirable features for this problem?

3.      What are DCT (discrete cosine transform) coefficients?

4.      What are Gaussian Mixture Models, and why are they an excellent modeling scheme?

5.      Besides MFCCs and GMMs, what additional tractable elements would improve adaptability and accuracy?

 

January 28th 2009

Seventh Meeting: January 28th, 4pm-5:20pm in SERF 102

 

Mini-Project presentations –

 

·         Ankit K. Jain (pdf)

 

Answer the following questions :

  1. Why is emotion recognition based on facial expression meaningful?
  2. What are thin-plate splines?
  3. Why are thin-plate splines useful in affect analysis?
  4. Why is affect analysis a difficult problem?
  5. What are some applications for emotion recognition?

 

·          Jacoby Larson (pdf)

 

Answer the following questions :

1.      What is the basic idea of the mean shift algorithm?

2.      Why is a Gaussian mixture model better than single-frame background subtraction?

3.      What does the Kalman filter do?

4.      What are some ways to do connected component analysis?

5.      How do you handle occlusion?

 

·          Ali Jamaleddine and Yin-Kai Wang (pdf)

 

Answer the following questions :

1.      Why bother designing a sound source localizer when cameras are present?

2.      What makes the SRP-maximization-based technique superior to the TDOA-based technique?

3.      Why does SRP-maximization work?

4.      What can be done to improve the efficiency of the system?

5.      What can be a better solution to localize sounds and at what cost (relative to SRP and TDOA)?

 

 

 


Some Relevant Papers (more to come!):

 


Links:

 

CVRR Videos

Check out some of the recent work at CVRR

Laboratory for Safe and Intelligent Vehicles (LISA)

Prof. Trivedi’s Lab (SERF 267)

Systems for Human Interactivity, Visualization and Analysis (SHIVA)

Prof. Trivedi’s Lab (SERF 266)

 

Computer Vision Bibliography

Useful resource for Computer Vision Papers

Research Index

Good Research Paper and Citation index

Note some of the top Computer Vision Journals/Conferences:
* IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
* International Journal of Computer Vision (IJCV)
* Computer Vision and Image Understanding (CVIU)
* IEEE International Conference on Computer Vision (ICCV)
* Computer Vision and Pattern Recognition (CVPR)
* European Conference on Computer Vision (ECCV)

 

Notes On Giving A Research Talk

By Prof. Charles Elkan

 

MSDN AVI site

Has some sample code for reading an AVI file.

Video Mach

Program that converts still images into AVIs

GameDev.net - Working with AVI Files

Website explaining how to read AVI files into C++ programs

Machine Perception Laboratory

Downloadable neural network libraries and other useful MP tools

Intel's OpenCV Library

High Level C++ Image Processing Library

Intel's Image Processing Library (IPL)

Medium Level C++ Image Processing Library

Intel's Performance Library Suite

Useful C++ Libraries

Vision For Matlab

Useful for grabbing images from VFW-compatible cameras

Questions or comments should be directed to Mohan Trivedi.
Page Last Updated: 3/20/2009