Ubiquitous Embodied Intelligence
August 22 (Friday) 3:30 PM-5:30 PM
Location: Guangzhou Hall, 3rd Floor
| Guest Profile | Introduction |
|---|---|
| Liu Yong, Zhejiang University | Report Title: From Artificial Intelligence to Embodied Intelligence<br>Report Introduction: Embodied intelligence has potential applications in many fields, including industrial manufacturing, autonomous driving, logistics and transportation, home services, and healthcare. This report will briefly outline the historical journey from the AI boom to embodied intelligence and introduce the key elements and implementation paths of embodied intelligence. |
| Liu Si, Beihang University | Report Title: UAV Vision-Language Navigation in the Era of Large Models<br>Report Introduction: In recent years, continued exploration of large language models has opened new avenues for enhancing the interpretability and controllability of embodied decision-making. However, planners based on large models still face significant challenges, including high resource consumption and long inference times, which pose a major obstacle to practical deployment. To address these challenges, this report introduces an asynchronous large-model-enhanced closed-loop decision-making framework that decouples the inference processes of large- and small-model planners. The framework was validated in both UAV vision-language navigation and autonomous driving scenarios, achieving excellent navigation performance. |
| Zhao Hengshuang, The University of Hong Kong | Report Title: Vision Foundation Models with Spatial Intelligence<br>Report Introduction: With the increasing capabilities of deep learning models and the efficient acquisition and utilization of massive data, the construction of large-scale vision foundation models has attracted widespread attention. These vision foundation models have demonstrated strong generalization in complex visual scene tasks across multiple domains. However, they typically focus on understanding images and videos, while neglecting higher-dimensional visual scenes with critical spatial properties. To address these limitations, we focus on developing spatially intelligent vision foundation models in higher dimensions, such as 2.5D and 3D. This report will introduce our recent research on endowing vision foundation models with spatial intelligence, as well as their applications in important downstream scenarios such as autonomous driving and embodied intelligence. It will also discuss the remaining challenges and future frontiers for vision foundation models. |
| Sun Zhuo, Northwestern Polytechnical University | Report Title: A Preliminary Exploration of Multi-Agent Networks for Perception, Communication, and Decision-Making Collaboration<br>Report Introduction: As embodied agents such as drones and robotic dogs become increasingly common, multiple agents can be connected through networks to collaborate on tasks such as exploring unknown environments and cooperative navigation. During this process, the agents iteratively execute a dynamically coupled temporal loop of perception, communication, and decision-making until the task is completed. Unlike existing cellular networks, these multi-agent embodied networks are designed around task-completion performance and can make autonomous decisions and evolve their communication behaviors. Designing multi-agent embodied networks that coordinate perception, communication, and decision-making therefore becomes a key technology for multi-agent embodied collaboration. To this end, this report will present the architecture, characteristics, and preliminary research results for such networks. |
| Chen Longbiao, Xiamen University | Report Title: Crowdverse: A Large-Model-Driven Training Ground for Swarm Embodied Intelligence<br>Report Introduction: With the deep integration of artificial intelligence and robotics, embodied agents have transcended the limitations of single-task execution and demonstrated complex system capabilities, including physical-space perception, multimodal collaborative decision-making, and swarm self-organization and evolution. As demand surges for applying swarm embodied intelligence across multiple scenarios, a low-cost, unified platform that can integrate diverse applications and algorithms is urgently needed. The Crowdverse training ground, developed by the presenter's team, explores the deep integration of large models and swarm embodied intelligence, building a systematic platform for simulating, training, and optimizing multi-agent collaboration. Crowdverse encompasses high-fidelity scenario construction and swarm-intelligence updating, intelligent task understanding and dynamic scheduling, and efficient multi-agent collaborative communication with adaptive optimization. By projecting a real-world swarm environment into a virtual simulation, it leverages large models for perception, decision-making, and simulated scheduling, mobilizes intelligent devices in the scene to execute swarm tasks, and feeds simulation results back into reality, achieving intelligent governance of open real-world systems. |