Pattern Recognition Laboratory

Institute of Automation, Chinese Academy of Sciences


Academic Lectures

Bridging Vision, Language, and Action: Toward Generalizable and Efficient Multi-modal AI Systems

Lecture Series in Pattern Recognition

TITLE: Bridging Vision, Language, and Action: Toward Generalizable and Efficient Multi-modal AI Systems

SPEAKER: Prof. Rama Chellappa

CHAIR: Prof. Zhaoxiang Zhang

TIME: April 24, 2026, 9:00 AM

VENUE: Microsoft Teams, Meeting ID: 939 331 714 3542, Password: Tj6M3a

ABSTRACT:

Recent advances in multimodal learning are transforming how AI systems integrate perception, reasoning, and interaction across visual and linguistic domains. In this talk, I will present a set of research contributions aimed at developing generalizable and computationally efficient vision-language models. I will begin with egocentric video-language pretraining (EgoVLPv2) and general-purpose vision-language reasoning (VistaLLM), which unify coarse-to-fine understanding across diverse tasks within a single framework. I will then describe progress in temporal video grounding (Enrich-and-Detect), adaptive low-rank personalization (DuoLoRA), and open-vocabulary robotic task execution (ConceptAgent), highlighting advances that extend multimodal reasoning into embodied settings. Complementary work on building spatial intelligence, visual distillation (ViT-Linearizer), and federated domain adaptation (ADAPT) further demonstrates scalable approaches for data- and resource-efficient learning. I will conclude by discussing recent studies on robustness and bias evaluation in medical AI applications. Together, these developments outline a path toward flexible, dependable, and broadly capable multimodal AI systems.

BIOGRAPHY:

Prof. Rama Chellappa is a Bloomberg Distinguished Professor in the Departments of Electrical and Computer Engineering and Biomedical Engineering, with a secondary appointment in Computer Science, at Johns Hopkins University (JHU). At JHU, he is affiliated with the Center for Imaging Science, the Center for Language and Speech Processing, the Data Science and Artificial Intelligence Institute, the Institute for Assured Autonomy, and the Mathematical Institute for Data Science. He also holds a non-tenured appointment as a College Park Professor in the ECE department at the University of Maryland. His research spans artificial intelligence, computer vision, image processing and analysis, and machine learning. In recent years, his work has increasingly focused on developing large multi-modal foundational models. Prof. Chellappa is the recipient of numerous prestigious honors, including the 2012 K. S. Fu Prize from the International Association for Pattern Recognition (IAPR); the Society, Technical Achievement, and Meritorious Service Awards from the IEEE Signal Processing Society; the Technical Achievement and Meritorious Service Awards from the IEEE Computer Society; the Inaugural Leadership Award from the IEEE Biometrics Council; the 2020 IEEE Jack S. Kilby Medal for Signal Processing; and the 2024 Edwin H. Land Medal from Optica. Most recently, he received both the Distinguished Researcher Award and the Azriel Rosenfeld Lifetime Achievement Award from the IEEE Computer Society’s Pattern Analysis and Machine Intelligence Technical Committee. He is a member of the National Academy of Engineering, a Foreign Fellow of the Indian National Academy of Engineering, and a Fellow of AAAI, AAAS, ACM, AIMBE, IAPR, IEEE, the National Academy of Inventors (NAI), Optica, and the Washington Academy of Science. He holds nine patents.
