Ubiquitous Embodied Intelligence
August 22 (Friday) 3:30 PM-5:30 PM
Location: Guangzhou Hall, 3rd Floor
| Guest Profile | Introduction |
|---|---|
| Liu Yong, Zhejiang University | Report Title: From Artificial Intelligence to Embodied Intelligence<br>Report Introduction: Embodied intelligence has potential applications in many fields, including industrial manufacturing, autonomous driving, logistics and transportation, home services, and healthcare. This report will briefly outline the historical journey from the AI boom to embodied intelligence and introduce the key elements and implementation paths of embodied intelligence. |
| Liu Si, Beihang University | Report Title: UAV Vision-Language Navigation in the Era of Large Models<br>Report Introduction: In recent years, continued exploration of large language models has opened new avenues for enhancing the interpretability and controllability of embodied decision-making. However, planners based on large models still face significant challenges, including high resource consumption and long inference times, which pose a major obstacle to practical deployment. To address these challenges, this report introduces an asynchronous large-model-enhanced closed-loop decision-making framework that decouples the inference processes of large- and small-model planners. The framework was validated in both UAV vision-language navigation and autonomous driving scenarios, achieving excellent navigation performance. |
| Zhao Hengshuang, The University of Hong Kong | Report Title: Vision Foundation Models with Spatial Intelligence<br>Report Introduction: With the increasing capabilities of deep learning models and the efficient acquisition and utilization of massive data, the construction of large-scale vision foundation models has attracted widespread attention. These vision foundation models have demonstrated strong generalization in complex visual scene tasks across multiple domains. However, they typically focus on understanding images and videos, while neglecting higher-dimensional visual scenes with critical spatial properties. To address these limitations, we focus on developing spatially intelligent vision foundation models in higher dimensions, such as 2.5D and 3D. This report will introduce our recent research on endowing vision foundation models with spatial intelligence, as well as their applications in important downstream scenarios such as autonomous driving and embodied intelligence. It will also discuss the remaining challenges and future frontiers for vision foundation models. |
| Sun Zhuo, Northwestern Polytechnical University | Report Title: A Preliminary Exploration of Multi-Agent Networks for Perception, Communication, and Decision-Making Collaboration<br>Report Introduction: As embodied agents such as drones and robotic dogs become increasingly common, multiple agents can be connected through networks to collaborate on tasks such as exploring unknown environments and cooperative navigation. During this process, the agents iteratively execute a dynamically coupled temporal loop of perception, communication, and decision-making until the task is completed. Unlike existing cellular networks, these multi-agent embodied networks are designed around task-completion performance and can make autonomous decisions and evolve their communication behaviors. Designing multi-agent embodied networks that coordinate perception, communication, and decision-making therefore becomes a key technology for multi-agent embodied collaboration. To this end, this report will present the architecture, characteristics, and preliminary research results for such networks. |
| Chen Longbiao, Xiamen University | Report Title: Crowdverse: A Large-Model-Driven Training Ground for Swarm Embodied Intelligence<br>Report Introduction: With the deep integration of artificial intelligence and robotics, embodied agents have transcended the limitations of single-task execution and demonstrated complex system capabilities, including physical-space perception, multimodal collaborative decision-making, and swarm self-organization and evolution. As demand surges for applying swarm embodied intelligence across multiple scenarios, a low-cost, unified platform that can integrate diverse applications and algorithms is urgently needed. The Crowdverse training ground, developed by the presenter's team, explores the deep integration of large models and swarm embodied intelligence, building a systematic platform for simulating, training, and optimizing multi-agent collaboration. Crowdverse encompasses high-fidelity scenario construction and swarm-intelligence updating, intelligent task understanding and dynamic scheduling, and efficient multi-agent collaborative communication with adaptive optimization. By projecting a real-world swarm environment into a virtual simulation, it leverages large models for perception, decision-making, and simulated scheduling, mobilizes intelligent devices in the scene to execute swarm tasks, and feeds simulation results back into reality, achieving intelligent governance of open real-world systems. |