Special Topics: Autonomous Driving
Mondays and Wednesdays 5:00 - 6:20 PM
Zoom (passcode available on Canvas and Piazza)
Instructor: Prof. Mohan M. Trivedi (mtrivedi)
TAs: Nachiket Deo (ndeo), Ross Greer (regreer)
Office Hours: Weds. 11 AM (Ross), Fri. 12 PM (Nachiket)
The field of intelligent vehicles has seen significant research activity in recent years, with computer vision playing a key role in the development of autonomous systems capable of safely navigating real-world traffic. In this course we will review some of the concepts involved in such systems. In particular, we will focus on three themes:
Understanding the environment around the intelligent vehicle in order to safely navigate through it. This will involve topics in object detection, tracking, semantic and instance segmentation, and motion/intent prediction of surrounding agents.
Monitoring the state of the driver. This is crucial for the development of semi-autonomous systems and for safe and smooth control transitions between the driver and the vehicle. This will involve topics such as driver behavior analysis, driver activity estimation, and driver readiness and takeover time estimation.
Auxiliary topics such as end-to-end driving, adversarial attacks, and domain adaptation.
This is an advanced class covering recent developments in applied computer vision. Students at all levels (undergraduate, Master's, and PhD) with a strong interest in computer vision may enroll. Prior background in computer vision, machine learning, and deep learning is required, preferably through research experience or through related courses in ECE and CSE. In particular, students should understand common loss functions, convolutional network architectures, and the challenges associated with high-volume data. Knowledge of attention, transformers, and evaluation metrics for detection and tracking is highly recommended.
Students are encouraged to contact the instructor if unsure about meeting any criteria for enrollment. Students are additionally required to perform satisfactorily in the aptitude test administered in the first lecture.
Course Format and Requirements
The course will involve presenting and discussing recent papers related to intelligent vehicles, primarily focused on vision-based approaches. The papers will be grouped by topic. These presentations will be supplemented by lectures from the instructor and members of LISA that provide background or introduce topics.
You will be assigned one or more papers to read and present to the class. The first four presenters each week present on Monday, and the next four present on Wednesday.
The presentation should last approximately 20 minutes and will be followed by 10 minutes of Q&A and discussion. We will time presentations to maintain the reading schedule and may cut off presenters who exceed the allotted time. Similarly, if one partner presents for more than half of the time, we may cut them off and move on to the other partner.
At a minimum, your presentation should cover:
Required Background: for example, if your paper explores a novel CNN architecture, perhaps you might take a few slides to explain what a convolutional neural network is before you share details about the paper's unique network architecture. Your classmates will appreciate the lesson (or review), and you can continue to build your understanding of fundamental ideas as you explore your assigned paper.
Analysis and results
Advantages & disadvantages of approach
Key contribution(s) to the field
Some papers have readily-available repositories for readers to replicate and explore the models. If applicable to your paper, feel free to do so (and share your findings) to enhance your presentation.
You are additionally required to make a private post on Piazza with two questions & answers related to your paper. These questions should be at a level that a classmate could answer following your presentation.
If you have questions while reading your paper and developing your presentation, post them publicly on Piazza to initiate class discussion before asking in office hours. TAs will monitor this channel to assist.
Presentation Overview Quizzes (30%)
Quizzes will be given at the end of each unit. Quizzes will draw on concepts and ideas from presented papers, and may include questions created by your classmates.
Class & Piazza Participation (20%)
You can earn your participation grade by:
Coming to class ready to discuss reading assignments,
Engaging in questions or discussion with presenters and peers during class,
Engaging in questions or discussion with presenters and peers on Piazza.
This list is non-exhaustive.
The course will cover a diverse range of topics in intelligent vehicle technology with a focus on computer vision applications. Topics may be added and/or removed as the course progresses based on timing and resource constraints.
Module 1: Overview
Module 2: Datasets for perception, prediction and planning
Module 3: Object detection
Module 4: Object tracking
Module 5: Segmentation (semantic, instance and panoptic)
Module 6: Behavior and trajectory prediction
Module 7: Planning and control
Module 8: Misc topics (e.g. simulation, mapping, domain adaptation, object saliency)
Module 9: Driver behavior analysis
Module 10: Control transitions
Week 1: Introduction & Aptitude Test [Presenter: Prof. Trivedi]
Welcome and Course Logistics
Introduction to Intelligent Vehicles and Autonomous Driving
Levels of Automation for Vehicles, Key Milestones, and Research Contributions
Week 2: Autonomous Driving: Milestones and Challenges
Safety as a Critical Factor for Real-World Intelligent Vehicles [Presenter: Prof. Trivedi]
Perception, Planning, Control Loop for Human-Centered Safe Vehicles [Presenter: Prof. Trivedi]
Data Driven Approaches, Machine Learning and CNN Overview [Presenter: Ross]
Week 3: Datasets for Perception, Prediction, and Planning
A. Geiger, P. Lenz, and R. Urtasun. "Are we ready for autonomous driving? The KITTI vision benchmark suite." CVPR 2012. [Cameron Lewis & Tanvir Reza Hussain]
M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. "The Cityscapes dataset for semantic urban scene understanding." CVPR 2016. [Cameron Lewis & Tanvir Reza Hussain]
N. Gählert, N. Jourdan, M. Cordts, U. Franke, and J. Denzler. "Cityscapes 3D: Dataset and benchmark for 9 DoF vehicle detection." arXiv preprint arXiv:2006.07864 (2020). [Cameron Lewis & Tanvir Reza Hussain]
P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo et al. "Scalability in perception for autonomous driving: Waymo Open Dataset." CVPR 2020. [Sumega Mandadi & Lulua Rakla]
S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y. Chai et al. "Large scale interactive motion forecasting for autonomous driving: The Waymo Open Motion Dataset." ICCV 2021. [Sumega Mandadi & Lulua Rakla]
H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom. "nuScenes: A multimodal dataset for autonomous driving." CVPR 2020. [Kai Chuen Tan & Hao Yu]
H. Caesar, J. Kabzan, K. S. Tan, W. K. Fong, E. Wolff, A. Lang, L. Fletcher, O. Beijbom, and S. Omari. "nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles." arXiv preprint arXiv:2106.11810 (2021). [Kai Chuen Tan & Hao Yu]
M-F. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang et al. "Argoverse: 3d tracking and forecasting with rich maps." CVPR 2019. [Albert Liao & Zhaowei Yu]
B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, S. Khandelwal, B. Pan et al. "Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting." ICLR 2022. [Albert Liao & Zhaowei Yu]
Quiz 1 (Modules 1 & 2) during class
Anchor free 2-D detectors
Z. Tian, C. Shen, H. Chen, and T. He. "FCOS: Fully convolutional one-stage object detection." ICCV 2019. [Neil Chen & Chieko Sarah Imai]
X. Zhou, D. Wang, and P. Krähenbühl. "Objects as points." arXiv preprint arXiv:1904.07850 (2019). [Neil Chen & Chieko Sarah Imai]
Transformer based 2-D detectors
N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko. "End-to-end object detection with transformers." ECCV 2020. [Afnan Alofi & Mo Chen]
X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai. "Deformable DETR: Deformable transformers for end-to-end object detection." ICLR 2021. [Afnan Alofi & Mo Chen]
3-D detection (Lidar-based)
A. H. Lang et al. "PointPillars: Fast encoders for object detection from point clouds." CVPR 2019. [Songlin Xu & Jack Bae]
T. Yin, X. Zhou, and P. Krähenbühl. "Center-based 3D object detection and tracking." CVPR 2021. [Songlin Xu & Jack Bae]
3-D detection (Camera-based)
Y. You, Y. Wang, W-L. Chao, D. Garg, G. Pleiss, B. Hariharan, M. Campbell, and K. Q. Weinberger. "Pseudo-LiDAR++: Accurate depth for 3D object detection in autonomous driving." ICLR 2020. [Xiaoqiang Qi]
D. Park, R. Ambrus, V. Guizilini, J. Li, and A. Gaidon. "Is Pseudo-LiDAR needed for monocular 3D object detection?" ICCV 2021. [Curtis Lee]
3-D detection (Sensor fusion)
S. Vora, A. H. Lang, B. Helou, and O. Beijbom. "PointPainting: Sequential fusion for 3D object detection." CVPR 2020. [Sean Carda & Alex Dalla-Ricca]
T. Yin, X. Zhou, and P. Krähenbühl. "Multimodal Virtual Point 3D Detection." NeurIPS 2021. [Sean Carda & Alex Dalla-Ricca]
Annotation for 3-D object detection and tracking
W. Zimmer, A. Rangesh, and M. Trivedi. "3D BAT: A semi-automatic, web-based 3D annotation toolbox for full-surround, multi-modal data streams." IEEE Intelligent Vehicles Symposium (IV) 2019. [M. Ali Zolfaghari]
C. R. Qi, Y. Zhou, M. Najibi, P. Sun, K. Vo, B. Deng, and D. Anguelov. "Offboard 3D object detection from point cloud sequences." CVPR 2021. [Kevin Anderson]
Multi-object tracking metrics:
K. Bernardin and R. Stiefelhagen. "Evaluating multiple object tracking performance: the CLEAR MOT metrics." EURASIP Journal on Image and Video Processing 2008. [Ziming Qi & Junxian Qu]
J. Luiten, A. Osep, P. Dendorfer, P. Torr, A. Geiger, L. Leal-Taixé, and B. Leibe. "HOTA: A higher order metric for evaluating multi-object tracking." IJCV 2021. [Ziming Qi & Junxian Qu]
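For intuition on the first paper above: the CLEAR MOT framework's headline score, MOTA, folds misses (false negatives), false positives, and identity switches into a single number. A minimal sketch of the formula (function and variable names are illustrative, not taken from the paper or any evaluation toolkit):

```python
def mota(misses, false_positives, id_switches, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / total ground-truth objects.

    All counts are totals accumulated over every frame of a sequence.
    Note that MOTA can be negative when errors outnumber ground-truth objects.
    """
    if num_gt == 0:
        raise ValueError("sequence has no ground-truth objects")
    return 1.0 - (misses + false_positives + id_switches) / num_gt

# Example: 1000 ground-truth boxes, 80 misses, 50 false positives, 10 ID switches
print(mota(80, 50, 10, 1000))  # 1 - 140/1000 = 0.86
```

One observation worth carrying into the discussion: MOTA weights detection errors far more heavily than association errors, which is part of the motivation for HOTA in the second paper.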
Quiz 2 during class, covering material through 5/2
2-D Multi-object tracking:
X. Zhou, V. Koltun, and P. Krähenbühl. "Tracking objects as points." ECCV 2020. [Muyao Liu & Jianyu Tao]
P. Tokmakov, J. Li, W. Burgard, and A. Gaidon. "Learning to track with object permanence." ICCV 2021. [Muyao Liu & Jianyu Tao]
2-D Multi-object tracking:
T. Meinhardt, A. Kirillov, L. Leal-Taixe, and C. Feichtenhofer. "TrackFormer: Multi-object tracking with transformers." arXiv preprint (2021). [Samveed Desai & Sriram Shreedharan]
A. Rangesh, P. Maheshwari, M. Gebre, S. Mhatre, V. Ramezani, and M. M. Trivedi. "TrackMPNN: A message passing graph neural architecture for multi-object tracking." arXiv preprint (2021). [Samveed Desai & Sriram Shreedharan]
3-D Multi-object tracking:
X. Weng, J. Wang, D. Held, and K. Kitani. "3D multi-object tracking: A baseline and new evaluation metrics." IROS 2020. [Xingyue Wang & Haoran Zhu]
A. Kim, A. Ošep, and L. Leal-Taixé. "EagerMOT: 3D multi-object tracking via sensor fusion." ICRA 2021. [Xingyue Wang & Haoran Zhu]
N. Deo, E. Wolff, O. Beijbom. "Multimodal Trajectory Prediction conditioned on Lane-Graph Traversals." CoRL 2021. [Yen-Ting Huang & Ching-Jin Chen]
J. Ngiam, B. Caine, V. Vasudevan, Z. Zhang, H-T. L. Chiang, J. Ling, R. Roelofs et al. "Scene Transformer: A unified architecture for predicting multiple agent trajectories." ICLR 2022. [Yen-Ting Huang & Ching-Jin Chen]
K. Mangalam, Y. An, H. Girase, J. Malik. "From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting." ICCV 2021. [Jiawei Zhang & Rujun Yan]
B. Varadarajan, A. Hefny, A. Srivastava, K. S. Refaat, N. Nayakanti, A. Cornman, K. Chen et al. "MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction." arXiv preprint arXiv:2111.14973 (2021). [Jiawei Zhang & Rujun Yan]
Y. Liu, Q. Yan, A. Alahi. "Social NCE: Contrastive Learning of Socially-aware Motion Representations." ICCV 2021. [Hung-Te Cheng]
Greer, Ross, Nachiket Deo, and Mohan Trivedi. "Trajectory prediction in autonomous driving with a lane heading auxiliary loss." IEEE Robotics and Automation Letters (2021) [Kunaal Malodhakar]
Kevan Yuen and Mohan M. Trivedi, "Looking at Hands in Autonomous Vehicles: A ConvNet Approach using Part Affinity Fields," IEEE Transactions on Intelligent Vehicles, 2019 [Jason Isa]
Akshay Rangesh, Bowen Zhang and Mohan M. Trivedi, "Gaze Preserving CycleGANs for Eyeglass Removal & Persistent Gaze Estimation," 2020. [Evan Smith]
Martin, Manuel, et al. "Drive&Act: A multi-modal dataset for fine-grained driver behavior recognition in autonomous vehicles." ICCV 2019. [Tritai Nguyen]
Ortega, Juan Diego, et al. "DMD: A large-scale multi-modal driver monitoring dataset for attention and alertness analysis." ECCV 2020. [Yichen Jia]
Akshay Rangesh, Nachiket Deo, Ross Greer, Pujitha Gunaratne and Mohan M. Trivedi, "Autonomous Vehicles that Alert Humans to Take-Over Controls: Modeling with Real-World Data," 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). IEEE, 2021 [Jia Qiu]
Palazzi, Andrea, et al. "Predicting the Driver's Focus of Attention: the DR(eye)VE Project." IEEE Transactions on Pattern Analysis and Machine Intelligence (2018) [Ashwin Rao]
Abati, Davide, et al. "Latent space autoregression for novelty detection." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019 [Muqing Li]
M. Gao, A. Tawari and S. Martin, "Goal-oriented Object Importance Estimation in On-road Driving Videos," 2019 International Conference on Robotics and Automation (ICRA), 2019 [Tian Qiu]
Quiz 3 [Following Presentations, during our 7-10 PM final time]
DARPA Grand Challenge, Journal of Field Robotics, Special Issues (1 & 2), 2006 [link1, link2]
K. Bengler, K. Dietmayer, B. Farber, M. Maurer, C. Stiller, H. Winner, "Three Decades of Driver Assistance Systems: Review and Future Perspectives", IEEE Intelligent Transportation Systems, 2014 [pdf]
Geiger, A., Lenz, P., Stiller, C., & Urtasun, R., "Vision meets robotics: The KITTI dataset", The International Journal of Robotics Research, 2013 [pdf]
Mohan M. Trivedi, Tarak Gandhi, Joel McCall, "Looking-In and Looking-Out of a Vehicle: Computer-Vision-Based Enhanced Vehicle Safety", IEEE Transactions on Intelligent Transportation Systems, 2007 [pdf]
Anup Doshi, Brendan Morris, and Mohan Trivedi, "On-Road Prediction of Driver's Intent with Multimodal Sensory Cues", IEEE Pervasive Computing, Special Issue on Automotive Pervasive Computing, 2011 [pdf]
Ashish Tawari, Sayanan Sivaraman, M Trivedi, T Shanon, M Tippelhofer, "Looking-in and Looking-out Vision for Urban Intelligent Assistance: Estimation of Driver Attention and Dynamic Surround for Safe Merging and Braking", IEEE Intelligent Vehicles Symposium, 2014 [pdf]
Frankie Lu, Sean Lee, Ravi Kumar Satzoda, and Mohan M. Trivedi, "Embedded Computing Framework for Vision-Based Real-Time Surround Threat Analysis and Driver Assistance", IEEE Conference on Computer Vision and Pattern Recognition - Workshop on Embedded Vision (Best Demo Award), 2016 [pdf]
Eshed Ohn-Bar and Mohan M. Trivedi, "Looking at Humans in the Age of Self-Driving and Highly Automated Vehicles", IEEE Transactions on Intelligent Vehicles, 2016 [pdf]
Nachiket Deo, Akshay Rangesh, Mohan Trivedi, "How would surround vehicles move? A Unified Framework for Maneuver Classification and Motion Prediction", IEEE Transactions on Intelligent Vehicles, 2018 [pdf]
Akshay Rangesh, Nachiket Deo, Kevan Yuen, Kirill Pirozhenko, M. Trivedi, Heishiro Toyoda, Pujitha Gunaratne, "Exploring the Situational Awareness of Humans inside Autonomous Vehicles", IEEE International Conference on Intelligent Transportation Systems (ITSC), 2018 [pdf]
P. Varaiya, "Smart Cars on Smart Roads: Problems of Control", IEEE Transactions on Automatic Control, 1992
J.K. Hedrick, M. Tomizuka, & P. Varaiya, "Control Issues in Automated Highway Systems", IEEE Control Systems, 1994
Batavia, P. H., Pomerleau, D. A., & Thorpe, C. E., "Applying Advanced Learning Algorithms to ALVINN", 1996
Jochem, T. M., Pomerleau, D. A., & Thorpe, C. E., "MANIAC: A Next Generation Neurally Based Autonomous Road Follower", Proceedings of the International Conference on Intelligent Autonomous Systems, 1996
D. Langer & T. Jochem, "Fusing Radar and Vision for Detecting, Classifying and Avoiding Roadway Obstacles", Proceedings of Conference on Intelligent Vehicles, 1994
Benjamin Ranft and Christoph Stiller, "The Role of Machine Vision for Intelligent Vehicles"