The Center for Research on Intelligent Perception and Computing
Research focus

Pattern Recognition Theories and Frontiers
 

The NLPR has an energetic team of talented young researchers working at the cutting edge of pattern recognition theory and computer vision, with specific interests in Science for AI (Science4AI), pattern representation learning, image/video generative models, AI security and ethics, and AI for Science (AI4Science). The team undertakes projects funded by the National Science Fund for Distinguished Young Scholars, the National Science Fund for Excellent Young Scholars, the Beijing Natural Science Foundation for Distinguished Young Scholars, the Young Elite Scientists Sponsorship Program, the Beijing Nova Program, and the Excellent Member program of the CAS Youth Innovation Promotion Association. Team members serve or have served as deputy editors-in-chief, senior associate editors, and associate editors of leading international journals including IEEE TPAMI, TIP, TIFS, TCSVT, TBIOM, IJCV, PR, and TMLR, and have acted as area chairs for premier conferences (CVPR, ICCV, ECCV, ICML, NeurIPS, ICLR, AAAI, IJCAI) more than 30 times. In education, faculty have received awards including the UCAS Li Pei Excellent Teacher Award, the UCAS Zhu Li Yuehua Excellent Teacher Award, the CAS Lu Jiaxi Young Talent Award, and the CAS Outstanding Mentor Award. Students have won honors such as the Beijing Outstanding Doctoral Dissertation Award, the CAS Outstanding Doctoral Dissertation Award, the CSIG Outstanding Doctoral Dissertation Award, the CAS President Scholarship Excellence Award, the CAS President Scholarship Special Award, and the Baosteel Scholarship.
 
Brain-Like and Embodied Intelligence
 

This research group pursues the core objective of "integrating brain science mechanisms with embodied intelligence to advance next-generation artificial intelligence." We are committed to developing intelligent agents with the cognitive characteristics of biological brains, the ability to interact with their environment, and the capacity for autonomous evolution. The aim is to overcome the bottlenecks of traditional AI in environmental adaptation, causal reasoning, and autonomous learning, thereby providing theoretical and technical support for cutting-edge fields such as intelligent robotics and human-machine collaboration. Our research covers four key directions: 1) Brain-inspired intelligent modeling, constructing biologically plausible brain-like computational architectures based on neuroscientific analyses of cognitive mechanisms including perception, memory, and decision-making; 2) Multimodal environmental interaction, focusing on scene perception and understanding, spatial intelligence enhancement, vision-language models (VLMs), and vision-language-action models; 3) Agent architecture and evolution, building autonomous decision-making systems, developing evolutionary learning frameworks, and enabling multi-agent collaboration; 4) World models and embodiment platforms, conducting research on dynamic world modeling, embodied training environments, and robotic experimental platforms.
The research group has established a closed-loop research system of "mechanism exploration, algorithm innovation, platform verification": deriving insights into cognitive mechanisms from neuroscience experiments and transforming them into novel neural network architectures; building adaptive training paradigms based on reinforcement learning and meta-learning frameworks; optimizing agent behavioral strategies through brain-machine co-evolutionary approaches; and validating algorithms on independently developed simulation platforms and robotic systems. Together these form a complete innovation chain from theory to application.
 
Multi-Modal Computing and Intelligence
 

To meet the need for intelligent analysis of multimodal big data such as text, images, graphs, and time series, this research studies the theoretical methods and key technologies of multimodal big data computation and applies them to a variety of intelligent industrial applications in both physical and cyber spaces. The research directions include: (1) Multimodal Foundation Models. Concentrating on innovation in and application of multimodal large models, we systematically tackle key technologies such as instruction fine-tuning, continual learning, value alignment, and model interpretability, striving to enhance the models' perception, understanding, and complex reasoning capabilities in multimodal scenarios and to improve their security, reliability, and generalization. (2) Multimodal Network Big Data Mining Technology. Targeting major application scenarios such as national security strategy, enterprise deployment, and cutting-edge scientific research, we focus on complex relationship mining and cross-modal semantic understanding of large-scale heterogeneous multimodal network data, and work to achieve breakthroughs in key technologies such as intelligent analysis of multi-source heterogeneous data, high-performance computing, and knowledge discovery in support of industry applications. (3) Multimodal Vision-Language-Action Models. Addressing the core challenges of multimodal understanding and action for intelligent agents, we study vision-language-action large models driven jointly by data and knowledge that can stably guide leg and hand movements across scenes and tasks, providing technical support for the multimodal navigation and manipulation capabilities of robots in real-world scenarios. (4) Human-Machine Collaborative Multimodal Situational Awareness. Targeting real-world human-machine collaborative tasks, we study key technologies for human-centered multimodal perception and reasoning and for situation evaluation in complex scenarios, and we are establishing a multimodal situational awareness computing platform for human-machine collaboration that serves important applications such as national public safety and intelligent transportation.
 
Intelligent Recognition Systems and Applications
 

The research on intelligent recognition systems and applications is driving a paradigm shift in artificial intelligence technology toward human-machine symbiosis, with its core lying in constructing a closed-loop system integrating multimodal perception, intelligent decision-making, and dynamic feedback. Anchored by foundational technologies including computational imaging, pattern recognition, large models, and edge computing, this field achieves comprehensive perception of human posture, behavior, facial expressions, and psychological characteristics. It develops high-precision, high-efficiency, and ultra-reliable AI systems to establish a "human-in-the-loop" interactive environment for future intelligent societies where humans and machines coexist. These advancements enable pioneering AI applications across industrial automation, security surveillance, healthcare diagnostics, and other critical domains, ultimately fostering next-generation human-centric intelligent interfaces.
 
Network Content Analysis and Security
 

The research in Network Content Analysis and Security aims at forensic analysis of the authenticity, integrity, and originality of multimedia content using advanced technologies such as intelligent statistical learning and pattern recognition. Our focus is on theoretical and applied research in multimedia content security, false information detection, and artificial intelligence security. The research primarily encompasses three areas: (1) controllable generation and alignment of multimedia content, (2) identification and tracing of misinformation and disinformation, and (3) artificial intelligence system security. The related outcomes have been successfully applied to practical needs in information security, including intelligent multimedia forgery and privacy protection for biometric big data. We have published over 110 research papers and have received numerous accolades, including the first prize of the Beijing Science and Technology Award, the first prize of the National Big Data and Computational Intelligence Challenge, the first prize of the Wu Wenjun Artificial Intelligence Science and Technology Award for Technical Invention, the first prize in the Deep Synthesis Technology Application category of the Second Broadcasting, Television, and Audiovisual Artificial Intelligence Applied Innovation Competition (MediaAIAC), the first prize of the Invention and Entrepreneurship Award of the China Association of Inventions, and the second prize of the CSIG Technical Invention Award, among other scientific research awards and honors.
 

 
Copyright © Editorial Board of New Laboratory of Pattern Recognition