Machine Embodied Interactive Intelligence Forum
August 22 (Friday) 9:00-11:30
Location: VIP Room, 3rd Floor

Guest Profiles

Zhuang Yan

Dalian University of Technology

Introduction:
       Zhuang Yan, Professor and Doctoral Supervisor, Senior Member of IEEE and CCF, Member of the Academic Committee of Dalian University of Technology, Director of the Academic Committee of the School of Control Science and Engineering, Director of the Liaoning Industrial Equipment Distributed Control Technology Innovation Center, Standing Committee Member of the Robotics Committee of the Chinese Association of Automation, Vice President of the Liaoning Artificial Intelligence Society, and a Leading Talent of the Xingliao Elite Program. He has long been dedicated to research on autonomous perception, modeling, scene understanding, and autonomous environmental adaptation for intelligent robots and unmanned systems in complex scenarios. He has presided over or collaborated on more than 10 projects, including key and general projects of the National Natural Science Foundation of China, national key R&D programs, and provincial and ministerial-level science and technology projects. He has published over 100 papers, been granted 36 national invention patents, and led his team in developing two international standards. He has been invited to give keynote speeches at major academic conferences at home and abroad, including IEEE CYBER 2018, the TAOYAKA International Forum in Hiroshima, Japan, and the National SLAM Technology Forum. He serves on the editorial boards of international journals such as IEEE Transactions on Instrumentation and Measurement, and has supervised more than 80 PhD and Master's graduates and 6 postdoctoral researchers.

 

Report Title: Full-Condition Perception and Autonomous Environmental Adaptation of Mobile Robots

 

Report Introduction: 

        Mobile robots operating in large outdoor environments are subject to a variety of factors, including lighting variations, dynamic interference, and unstructured scenes. These factors challenge their capabilities for autonomous environmental modeling, understanding, and adaptation. As the technology matures, enabling different types of mobile robots to perform autonomous environmental modeling and scene understanding during long-term operation, while adapting to complex and changing environmental conditions, has become a research hotspot in mobile robotics. This report focuses on mobile robot behavioral capabilities, including multi-sensor data acquisition and fusion, large-scale map construction and maintenance, and autonomous scene understanding. It also introduces the research progress achieved by the presenter's laboratory in large-scale environmental modeling and autonomous scene understanding, and offers a preliminary discussion of research priorities in this field, particularly development trends in air-ground cross-domain collaboration.

Fu Ying

 Beijing Institute of Technology

Introduction:
        Fu Ying is a professor and doctoral supervisor at Beijing Institute of Technology. In 2017, she was selected for the Overseas High-level Young Talent Introduction Program. Over the past five years, she has published more than 50 papers as first or corresponding author in CAS Tier-1 journals and in Category A conferences recommended by the China Computer Federation (CCF), and has been granted more than 30 invention patents. Her papers have received the ICML Outstanding Paper Award (2/4990) and the PRCV Best Paper Award (1/412). She primarily conducts research in computer vision and computational imaging, and has led more than ten national-level projects and programs, including key projects of the National Natural Science Foundation of China. She has received the First Prize of the Science and Technology Progress Award of the Chinese Society for Command and Control, the First Prize of the Teaching Achievement Award of the Chinese Society of Artificial Intelligence, the Shi Qingyun Female Scientist Award of the Chinese Society of Image and Graphics, and the Young Scientist Award of the Chinese Institute of Electronics.

 

Report Title: Visual Representation Learning and Scene Understanding Based on Frequency Domain Analysis

 

Report Introduction:

        Scene understanding is the core of embodied intelligence, but existing models still face challenges in feature representation and adaptability to complex environments. This report focuses on how frequency domain analysis can systematically improve the representation learning and scene understanding capabilities of visual models. Starting from a frequency domain perspective, we reshape key neural network operators such as convolution, feature fusion, and attention mechanisms to balance the representation of high-frequency details and low-frequency context. This approach not only diagnoses and addresses the frequency-representation issues common in deep learning models, but also significantly improves model performance and robustness in common scene understanding tasks and in challenging environments such as low light. This work provides a theoretical framework and practical solutions for building more accurate and robust embodied perception systems.

Jin Yi

 Beijing Jiaotong University

Introduction:
        Jin Yi is a professor and doctoral supervisor at the School of Computer Science, Beijing Jiaotong University, and a Distinguished Member of CCF. He has served as Vice Chairman of CCF YOCSEF (2025-26) and as an executive member of the CCF Multimedia and Big Data technical committees, and is a member of the expert group for key projects under the National Key R&D Program. His research focuses on multimodal data fusion and perception, traffic video semantic understanding, trusted behavior analysis, and multimedia privacy protection. He has presided over more than 10 national and provincial/ministerial projects and published more than 70 academic papers, including five ESI highly cited papers, in leading journals such as the IEEE/ACM Transactions and in CCF Category A conferences such as CVPR, AAAI, ICCV, IJCAI, and ACM MM. He has been granted 37 national invention patents and contributed to the compilation of three national and industry standards. He has received three international paper awards, including a nomination for the IEEE Computer Society's Annual Best Paper Award, as well as a Silver Medal at the 50th Geneva International Exhibition of Inventions, a Second Prize of the 2023 China Industry-University-Research Cooperation Innovation and Promotion Award, a Second Prize of the 2024 Science and Technology Progress Award of the China Communications Society, and a Second Prize of the Science and Technology Progress Award of the China Railway Society. In 2023, he was named an Outstanding Young Talent by the Beijing Rail Transit Society.

 

Report Title: Multimodal Perception and Embodied Interaction in Complex Traffic Scenarios: Technologies and Challenges

 

Report Introduction:

        This report focuses on the critical role of multimodal perception and embodied interaction intelligence in intelligent transportation systems, exploring how to integrate heterogeneous sensory data such as vision and LiDAR to enhance intelligent agents' ability to understand and respond to complex traffic environments. We propose a multimodal collaborative perception and cognition mechanism for complex traffic scenarios, aiming to address the semantic gap between heterogeneous modalities and enhance intelligent agents' ability to perceive and model key objects, environmental structures, and dynamic relationships. This mechanism lays the core technical foundation for building efficient and robust embodied intelligent agents. Drawing on the team's recent research achievements in three-dimensional perception, cross-modal alignment, and fusion, the report systematically outlines the technical challenges and future trends of embodied intelligence in smart transportation, aiming to provide key support and inspiration for building a safer, more efficient, and autonomous next-generation transportation system.

Li Xiaohui

 China Automotive Engineering Research Institute Co., Ltd.

Introduction:
       Li Xiaohui holds a PhD and is a senior engineer and expert at China Automotive Engineering Research Institute Co., Ltd. He has long been engaged in research on autonomous driving and vehicle-road-cloud integration, leading the development of products such as vehicle-road collaborative driving systems, floating-vehicle-based automated road inspection systems, intelligent roadside perception systems, automated roadside sensor deployment software, a comprehensive Internet of Vehicles (IoV) application platform, and a vehicle-city testing and certification platform. He has presided over or participated in numerous national and provincial/ministerial projects on the construction and application of IoV technologies and equipment, producing more than 20 intellectual property outputs, including papers and patents. He received a Second Prize of the China Automotive Industry Science and Technology Progress Award in 2020 and a Second Prize of the China Intelligent Transportation Association Science and Technology Progress Award in 2023.

 

Report Title: High-Quality Digital Scene Construction and Generalization Technology for Intelligent Vehicles Based on Collaborative Vehicle-Infrastructure Perception

 

Report Introduction: 

        The coordinated development of intelligent vehicles and smart cities is driving a profound transformation of the modern transportation system towards digitalization, intelligence, and connectivity, and data has become the core driving force for technological innovation and industrial development in the vehicle and transportation sector. Vehicle-road-cloud integration leverages holographic digital scenarios constructed through fused vehicle-road perception to systematically explore and generalize the edge cases behind the long-tail problem of autonomous driving. This enables a data-driven, closed-loop development and verification system that supports rapid iteration and robustness verification of vehicle-traffic systems, helping the industry overcome the challenge of accumulating 10 billion kilometers of test mileage. This report focuses on the digital construction of holographic traffic scenarios using multi-source fused roadside perception, high-precision dynamic scenario modeling methods for fused vehicle-road perception, and multi-dimensional scenario generalization and convergence technologies. It also briefly discusses the achievements already made and their impact on industry development.

Ma Nan

 Beijing University of Technology

Introduction:
      Ma Nan is a professor and doctoral supervisor at Beijing University of Technology. He is a recipient of the National Key Talent Program, a Young Beijing Scholar (in the field of artificial intelligence), a project leader of the National Key R&D Program, a project leader of the Beijing Municipal Intelligent Manufacturing and Robotics Technology Innovation Program, and director of a national first-class undergraduate program. He serves as Vice Dean of the School of Information Science and Technology at Beijing University of Technology, Deputy Director of the Engineering Research Center for Intelligent Perception and Autonomous Control of the Ministry of Education, a Council Member and Deputy Secretary-General of the China Artificial Intelligence Society, and a Distinguished Member of the China Federation of Industrial Automation. His research focuses on interactive cognition, embodied intelligence, autonomous driving, and intelligent robotics. As the lead author, he has received the First Prize of the China Society of Image and Graphics Scientific and Technological Progress Award and the Second Prize of the China Electronics Society Science and Technology Award (Technical Invention Category). He has led numerous national, provincial, and ministerial projects, including more than ten autonomous vehicle and service robot intelligent interaction projects commissioned by companies such as BAIC Group, Dongfeng Yuexiang, and Yunji Technology. He has led his team to numerous championships in artificial intelligence and autonomous driving competitions both domestically and internationally; his team's achievement, "Unmanned Driving Cloud Intelligent Interaction System," won the Grand Prize in the finals of the Second China "AI+" Innovation and Entrepreneurship Competition. He has published over 90 papers in domestic and international academic journals and conferences, including IEEE TRO, TIP, TNNLS, TMM, PR, Science China Information Sciences, ICRA, and ACM MM.
In talent development, he created "Intelligent Interaction Technology," a national first-class undergraduate course that has been offered 12 times on the China University MOOC platform. He has edited five monographs and textbooks, including "Intelligent Interaction Technology and Applications," which was selected as a national key publication planning textbook under the 13th Five-Year Plan and as part of a series of higher-education textbooks for strategic emerging fields under the 14th Five-Year Plan. He has also won a Second Prize of the 6th National Education Science Research Outstanding Achievement Award and a First Prize of the Beijing Teaching Achievement Award.

 

Report Title: Artificial Intelligence Empowers Mobile Agricultural Robots

 

Report Introduction:

        Agricultural production urgently needs intelligent upgrades to meet the strategic needs of the country's modern agricultural development. Artificial intelligence, particularly cutting-edge technologies such as computer vision and embodied intelligence, is providing strong support for transforming agricultural production methods and unleashing new productivity. Addressing the perception and interaction challenges in greenhouse tomato harvesting, we have developed an intelligent harvesting robot system based on a top-down fusion network. This system integrates crop phenotypic perception, key point detection, and multimodal human-machine interaction, achieving high-precision, low-damage autonomous harvesting and collaborative operations. Practical applications have demonstrated that this system can significantly improve harvesting efficiency and operational intelligence. This report will focus on the intersection of AI and agriculture, highlighting the key technological breakthroughs of the intelligent interactive tomato harvesting robot and its industrial application in practical greenhouse scenarios. It will also explore the supporting role and development prospects of intelligent agricultural machinery and equipment in agricultural modernization.