PCC Forum on Papers from Top-Tier Conferences and Journals

Session Chair Profiles

周斌彬

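Associate Professor

Hangzhou City University

Session 1: Multimodal Sensing

Bio:
       周斌彬 is an Associate Professor at the School of Computing, Hangzhou City University. Other roles include specially appointed Deputy Director of the university's Office of Scientific Research, Deputy Director of the Zhejiang Provincial International Research Cooperation Base for Intelligent IoT Technologies and Systems, and Executive Committee Member of the CCF Technical Committee on Pervasive Computing. Research interests mainly include spatio-temporal deep learning, multimodal data fusion, artificial intelligence, and brain-inspired computing. 周斌彬 has led or participated in multiple research projects, including projects funded by the National Natural Science Foundation of China and the National Key R&D Program of China, and has published more than 40 papers in major journals and conferences, including ACM venues and IEEE Transactions.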

王楚豫

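Assistant Professor

Nanjing University

Session 2: Wireless Sensing

Bio:
       王楚豫 is a Distinguished Research Fellow (tenure-track Assistant Professor) and doctoral supervisor at the School of Computer Science, Nanjing University, and received a Ph.D. from the Department of Computer Science and Technology, Nanjing University, in October 2018. Research interests mainly include pervasive computing and wireless sensing. 王楚豫 has published more than 40 papers in pervasive and mobile computing, including in leading journals such as IEEE JSAC, IEEE/ACM TON, and IEEE TMC and at leading conferences such as ACM MobiCom, ACM UbiComp, and IEEE INFOCOM, and received the IEEE INFOCOM 2018 Best-in-Session Presentation Award. Other honors include the Jiangsu Provincial Excellent Doctoral Dissertation Award, an ACM China Doctoral Dissertation Award nomination, and recognition among the Top Ten Science and Technology Advances in Jiangsu's industry sectors.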

刘志丹

Assistant Professor

The Hong Kong University of Science and Technology (Guangzhou)

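Session 3: Cities, Transportation, and Networks

Bio:
       刘志丹 is an Assistant Professor and doctoral supervisor at The Hong Kong University of Science and Technology (Guangzhou). Other roles include Executive Committee Member of the CCF Technical Committee on Pervasive Computing, the CCF Technical Committee on Internet of Things, and the ACM SIGSPATIAL China Chapter, and CCF Senior Member. 刘志丹 was selected as an overseas high-level talent under Shenzhen's "Peacock Plan". 刘志丹 received a Ph.D. from Zhejiang University in 2014, was a postdoctoral researcher at Nanyang Technological University, Singapore, from 2015 to 2017, and held a faculty position at Shenzhen University from 2017 to 2024. Research interests include pervasive computing, urban computing, AIoT, and spatio-temporal big data mining and analytics. 刘志丹 has published more than 40 papers in renowned journals and conferences such as IEEE TON, IEEE TMC, IEEE TITS, ACM MobiSys, IEEE ICDE, and ACM/IEEE IPSN. This research received the Best Paper Award at IEEE ICPADS 2020. Part of it has been compiled into an English monograph, and it has led to 16 Chinese invention patent applications (7 granted). 刘志丹 has served on the program committees of international conferences including ACM KDD, ACM WSDM, and IEEE ICDCS, and as a reviewer for major international journals including IEEE TMC and IEEE TKDE.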

卢立

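Research Professor

Zhejiang University

Session 4: Sound and Vision

Bio:
       卢立 is a Distinguished Research Professor and doctoral supervisor at the College of Computer Science and Technology / School of Cyber Science and Technology, Zhejiang University. 卢立 received dual bachelor's degrees from Xi'an Jiaotong University and a Ph.D. from Shanghai Jiao Tong University, and visited Rutgers University, USA, during those studies. Research interests mainly include IoT security, intelligent voice security, and pervasive computing. 卢立 has published more than 50 papers in leading journals and conferences such as USENIX Security and UbiComp. 卢立 has led multiple projects, including a research topic under the National Key R&D Program of China and National Natural Science Foundation of China grants. Honors include Best Poster Presentation Award nominations at MobiCom 2019 and MobiCom 2022 and the ACM SIGAPP China Chapter Rising Star Award. 卢立 is an Executive Committee Member of the CCF Technical Committees on Pervasive Computing and on Internet of Things. 卢立 is also Deputy Secretary-General of the Professional and Technical Committee of the Zhejiang Cyberspace Security Association. 卢立 has served on the program committees of international conferences including INFOCOM, ICDCS, and IWQoS, and as a reviewer for international journals and conferences such as UbiComp, TIFS, and TMC.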

Session 1: Multimodal Sensing

PCC_M001

Talk title: Spatial-Temporal Masked Autoencoder for Multi-Device Wearable Human Activity Recognition

Speakers: 缪盛欢, 陈岭, 胡容

Published in: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Abstract: The widespread adoption of wearable devices has led to a surge in the development of multi-device wearable human activity recognition (WHAR) systems. Nevertheless, the performance of traditional supervised learning-based methods for WHAR is limited by the difficulty of collecting ample annotated wearable data. Self-supervised learning (SSL) has emerged as a promising way to overcome this limitation. It first trains a competent feature extractor on a substantial quantity of unlabeled data and then trains a lightweight classifier with a small amount of labeled data. Despite the promise of SSL in WHAR, most studies have not considered missing-device scenarios in multi-device WHAR. To bridge this gap, we propose a multi-device SSL WHAR method termed Spatial-Temporal Masked Autoencoder (STMAE). STMAE captures discriminative activity representations through an asymmetric encoder-decoder structure and a two-stage spatial-temporal masking strategy. Together, these exploit the spatial-temporal correlations in multi-device data to improve the performance of SSL WHAR, especially in missing-device scenarios. Experiments on four real-world datasets demonstrate the efficacy of STMAE in various practical scenarios.
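The following minimal Python sketch makes the two-stage spatial-temporal masking idea concrete by building a mask over a devices × time-patches grid. It rests on several assumptions. The spatial stage is assumed to hide every patch of randomly chosen devices, imitating a missing device. The temporal stage is assumed to then hide random patches on the remaining devices. The function names, ratios, and grid layout are hypothetical and are not taken from the STMAE paper.

```python
import random

def two_stage_mask(num_devices, num_patches, device_ratio, patch_ratio, seed=None):
    """Return (mask, dropped): mask[d][p] is True if patch p of device d is masked.

    Stage 1 (spatial): mask every patch of a random subset of devices,
    imitating a missing device.
    Stage 2 (temporal): on each remaining device, mask a random subset of
    time patches.
    """
    rng = random.Random(seed)
    mask = [[False] * num_patches for _ in range(num_devices)]

    # Stage 1: drop whole devices.
    n_dev = int(round(device_ratio * num_devices))
    dropped = set(rng.sample(range(num_devices), n_dev))
    for d in dropped:
        mask[d] = [True] * num_patches

    # Stage 2: drop time patches on the devices that are still visible.
    n_patch = int(round(patch_ratio * num_patches))
    for d in range(num_devices):
        if d not in dropped:
            for p in rng.sample(range(num_patches), n_patch):
                mask[d][p] = True
    return mask, sorted(dropped)

def visible_positions(mask):
    """(device, patch) positions left for the encoder; masked ones go to the decoder."""
    return [(d, p) for d, row in enumerate(mask) for p, m in enumerate(row) if not m]

# Hypothetical setup: 4 body-worn devices, 10 time patches each.
mask, dropped = two_stage_mask(num_devices=4, num_patches=10,
                               device_ratio=0.25, patch_ratio=0.5, seed=0)
print(dropped, len(visible_positions(mask)))
```

In a masked autoencoder, only the positions returned by `visible_positions` would be fed to the larger encoder. A lighter decoder would then reconstruct the masked positions, and this split is what makes the encoder-decoder pair asymmetric.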

PCC_M002

Talk title: HMGAN: A Hierarchical Multi-Modal Generative Adversarial Network Model for Wearable Human Activity Recognition

Speakers: 陈岭, 胡容, 武梦晗, 周鑫

Published in: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Abstract: Wearable Human Activity Recognition (WHAR) is an important research field of ubiquitous and mobile computing. Deep WHAR models suffer from overfitting caused by the lack of large and varied labeled data. This is usually addressed by generating data to enlarge the training set, i.e., Data Augmentation (DA). Generative Adversarial Networks (GANs) have shown excellent data generation ability, and GAN-based DA can improve the generalization ability of a classification model. However, existing GANs cannot make full use of important modality information and fail to balance modality details against global consistency, so they cannot meet the requirements of deep multi-modal WHAR. In this paper, a hierarchical multi-modal GAN model (HMGAN) is proposed for WHAR. HMGAN consists of multiple modal generators, one hierarchical discriminator, and one auxiliary classifier. The modal generators learn the complex multi-modal distributions of sensor data. The hierarchical discriminator provides discrimination outputs for both low-level modal discrimination losses and a high-level overall discrimination loss, striking a balance between modality details and global consistency. Experiments on five public WHAR datasets demonstrate that HMGAN achieves state-of-the-art performance for WHAR. It outperforms the best baseline by an average of 3.4%, 3.8%, and 3.5% in accuracy, macro F1 score, and weighted F1 score, respectively.

PCC_M003

Talk title: HeadMon: Head Dynamics Enabled Riding Maneuver Prediction

Speakers: Zengyi Han, Liqiang Xu, Xuefu Dong, Yuuki Nishiyama, Kaoru Sezaki

Published in: 2023 IEEE International Conference on Pervasive Computing and Communications (PerCom)

Abstract: Although micro-mobility brings convenience to modern cities, it also causes various social problems, such as traffic accidents, casualties, and substantial economic losses. Wearing protective equipment has become the primary recommendation for safe riding. However, passive protection cannot prevent accidents from occurring. Timely prediction of the rider's maneuvers is therefore essential for active protection, since it provides more time to avoid potential accidents. Based on a qualitative study, we argue that the rider's head dynamics can serve as an information source for predicting the rider's subsequent maneuvers. We accordingly present HeadMon, a riding maneuver prediction system for safe riding. HeadMon captures the rider's head dynamics with an inertial measurement unit installed on the helmet. It feeds the extracted head-dynamics features into a deep learning architecture to make predictions. We implemented a HeadMon prototype on an Android smartphone as a proof of concept. Comprehensive experiments with 20 participants demonstrate the excellent performance of HeadMon. It achieves an overall precision of at least 85% for maneuver prediction with a 4 s prediction time gap, and it maintains high accuracy at a low sampling rate. Its low cost makes HeadMon readily deployable, a step toward safer riding.

PCC_M004

Talk title: Generalizable Sleep Staging via Multi-Level Domain Alignment

Speakers: 王跻权, 赵莎, 江海腾, 李石坚, 李涛, 潘纲

Published in: The Thirty-Eighth AAAI Conference on Artificial Intelligence

Abstract: Automatic sleep staging is essential for sleep assessment and disorder diagnosis. Most existing methods are trained and tested on the same specific dataset and generalize poorly to other, unseen datasets. In this paper, we introduce domain generalization into automatic sleep staging. We propose the task of generalizable sleep staging, which aims to improve the model's generalization ability to unseen datasets. Inspired by existing domain generalization methods, we adopt the feature alignment idea and propose a framework called SleepDG to solve this task. Both local salient features and sequential features are important for sleep staging. We therefore propose a Multi-level Feature Alignment that combines epoch-level and sequence-level feature alignment to learn domain-invariant feature representations. Specifically, we design an Epoch-level Feature Alignment to align the feature distribution of each single sleep epoch across different domains. We also design a Sequence-level Feature Alignment to minimize the discrepancy of sequential features across different domains. SleepDG is validated on five public datasets, achieving state-of-the-art performance.

PCC_M005

Talk title: DiffMDD: A Diffusion-based Deep Learning Framework for MDD Diagnosis Using EEG

Speakers: 王宜霖, 赵莎, 江海腾, 李石坚, 李涛, 潘纲

Published in: IEEE Transactions on Neural Systems and Rehabilitation Engineering

Abstract: Major Depressive Disorder (MDD) is a common yet destructive mental disorder that affects millions of people worldwide, so early and accurate diagnosis is highly valuable. Recently, EEG, a non-invasive technique for recording the spontaneous electrical activity of the brain, has been widely used for MDD diagnosis. However, EEG still poses two challenges in data quality and data size. (1) A large amount of noise is inevitable during EEG collection, making it difficult to extract discriminative features from raw EEG. (2) It is difficult to recruit enough subjects to collect sufficient and diverse data for model training. Both challenges cause overfitting, especially for deep learning methods. In this paper, we propose DiffMDD, a diffusion-based deep learning framework for MDD diagnosis using EEG. Specifically, we design a Forward Diffusion Noisy Training Module that extracts noise-irrelevant features to improve the model's robustness. We then design a Reverse Diffusion Data Augmentation Module that increases the size and diversity of the data, helping the model learn more generalizable features. Finally, we re-train the classifier on the augmented dataset for MDD diagnosis. We conducted comprehensive experiments to test the overall performance and the effectiveness of each module. The framework was validated on two public MDD diagnosis datasets, achieving state-of-the-art performance.
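The abstract does not specify the noise schedule. The sketch below therefore uses the standard DDPM forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε with ᾱ_t = ∏(1 - β_s), as one plausible way to produce noisy copies of an EEG segment for noise-robust training. The linear β schedule (1e-4 to 0.02 over 1000 steps) is the common DDPM default and is an assumption here. The function names and the toy segment are also hypothetical.

```python
import math
import random

def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (common DDPM default)."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ N(sqrt(a_bar_t) * x0, (1 - a_bar_t) * I) for a 1-D EEG segment x0."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

# Build noisy copies of one toy EEG segment at random (early) diffusion steps,
# e.g. to train a classifier that has to ignore the added noise.
alpha_bars = linear_alpha_bars()
rng = random.Random(0)
segment = [math.sin(0.1 * i) for i in range(256)]
noisy_batch = [q_sample(segment, rng.randrange(200), alpha_bars, rng) for _ in range(4)]
```

Restricting t to early steps keeps the signal recognizable; how DiffMDD actually chooses t is not stated in the abstract.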

PCC_M006

Talk title: Simplifying Multimodal With Single EOG Modality for Automatic Sleep Staging

Speakers: 周杨煊, 赵莎, 王跻权, 江海腾, 于正和, 李石坚, 李涛, 潘纲

Published in: IEEE Transactions on Neural Systems and Rehabilitation Engineering

Abstract: Polysomnography (PSG) recordings, which contain multiple modality signals (e.g., EEG and EOG), have been widely used for sleep staging in clinics. Recently, many studies have combined the EEG and EOG modalities for sleep staging, since they are, respectively, the most and second most powerful modalities for sleep staging among PSG recordings. However, EEG is complex to collect and sensitive to environmental noise and other body activities, which impedes its use in clinical practice. By comparison, EOG is much easier to obtain. To exploit both the strong discriminative power of EEG and the easy collection of EOG, we propose a novel framework that simplifies multimodal sleep staging to a single EOG modality. The framework still performs well with only the EOG modality, in the absence of EEG. Specifically, we first model the correlation between EEG and EOG. Based on this correlation, we then generate multimodal features with time- and frequency-guided generators, adopting the idea of generative adversarial learning. We collected a real-world sleep dataset containing 67 recordings and used four other public datasets for evaluation. Compared with other existing sleep staging methods, our framework performs best when using the EOG modality alone. Moreover, under our framework, EOG provides performance comparable to that of EEG.

PCC_M007

Talk title: A Multiscale Cross-modal Interactive Fusion Network for Human Activity Recognition Using Wearable Sensors and Smartphones

Speakers: Xin Yang, Zeju Xu, Haodong Liu, Guanzheng Liu, Changhong Wang

Published in: IEEE Internet of Things Journal

Abstract: Human activity recognition (HAR) enables real-time monitoring of human movement, posture, and activity level, and can provide valuable information for health management. With the continuous advancement of Internet of Things (IoT) technology, wearable sensors and smartphones equipped with various types of sensors have become widely used to collect multimodal data for HAR. However, in multimodal HAR, current fusion methods fall short in capturing inter-modality correlations. This hampers full exploitation of the complementary information between modalities and leads to lower recognition accuracy. We thus propose a novel multiscale cross-modal interactive fusion network (MCIFN), which can fully capture correlations between various modalities and obtain an effective fused representation for HAR. Specifically, we employ a multiscale parallel convolution module to extract features from each modality at multiple scales. Then, an interactive fusion strategy based on the cross-modal attention mechanism is introduced to adjust and enhance each modality based on its correlations with other modalities. To resolve the information redundancy caused by the interactive fusion strategy, we further utilize a hybrid attention module to focus on important information in the fused representation. Extensive experiments conducted on three publicly available datasets and one private dataset demonstrate that our proposed network outperforms previous baseline networks for HAR. In addition, our proposed fusion strategy yielded a notable accuracy improvement ranging from 1.87% to 9.96% over existing strategies. These findings imply that our network can realize comprehensive multimodal fusion and effectively enhance HAR accuracy. This could contribute to advancements in individual health management and personalized healthcare interventions.
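The core of a cross-modal attention step can be sketched in plain Python. Each time step of one modality (the queries) attends over the time steps of another modality (the keys and values), and the attended result is added back to enhance the first modality. This is a generic scaled dot-product formulation, not the MCIFN implementation. It assumes away the learned query/key/value projections, multiple heads, and the multiscale convolution front end. The function names and toy sensor values are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention from modality A (queries) to modality B (keys/values)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out

def enhance(a_seq, b_seq):
    """Residual enhancement of modality A with what it attends to in B (keys = values = B here)."""
    attended = cross_modal_attention(a_seq, b_seq, b_seq)
    return [[x + y for x, y in zip(a, c)] for a, c in zip(a_seq, attended)]

# Toy example: 3 accelerometer steps attend over 2 gyroscope steps (feature dimension 2).
acc = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
gyro = [[0.2, 0.8], [0.9, 0.1]]
acc_enhanced = enhance(acc, gyro)
```

The residual form requires both modalities to share a feature dimension. In a real network, a learned projection would normally map each modality into that shared space first.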

PCC_M008

Talk title: Smart Garment: A Long-Term Feasible, Whole-Body Textile Pressure-Sensing System

Speakers: Zhen Liang, Dongquan Zhang, Guanghua Xu, Fangting Xie, Hui Cai, Hao Guo, Xiaohui Cai

Published in: IEEE Sensors Journal

Abstract: Tactile sensation, which covers the entire body surface, is important for human beings. To understand what kind of force distribution our bodies might sense, we created a set of pressure-sensing garments, consisting of a sweater and trousers. The set provides 1952 sensing points and evenly covers 80% of the body surface. As our skin works day and night, an ideal pressure acquisition system for this purpose should also offer both high temporal coverage and high population coverage. This places simultaneous demands on wearability, durability, and affordability. Special care was thus given to every design step, from material selection, sensor structure, and electronic driving architecture to garment design. We then demonstrate the capability of this smart garment to obtain rich information about both the wearer and the environment. This includes, but is not limited to, the recognition of postures, self-contacts, object contacts, and interactions.

PCC_M009

Talk title: HDTSLR: A Framework Based on Hierarchical Dynamic Positional Encoding for Sign Language Recognition

Speakers: 张江涛, 王青山, 王琦

Published in: IEEE Transactions on Mobile Computing

Abstract: Sign language is the primary means of communication for people with hearing impairments, and sign language recognition (SLR) can effectively support this communication. Mainstream Transformer-based SLR requires positional encoding (PE) to capture the positional information of the data. However, existing PE methods encode sign data globally, which weakens or even ignores the sequence variation within gestures. This article proposes HDTSLR, a Transformer-based SLR framework built on hierarchical dynamic positional encoding (HDPE). HDPE enhances the sequence features of individual gestures while preserving the overall temporal features of the sign. Its semantic positional encoding uses predefined scale functions with trainable biases to emphasize semantic relationships in signs. Its lexical positional encoding uses the t-distribution to explore the unique variation of gestures. Before the HDPE operation, an autoencoder performs feature extraction and chunking, splitting the sign language data into equal-length feature clips. Clips with significant changes within a gesture chunk are further selected and aggregated with the remaining clips by deforming the Gram matrix. HDTSLR is evaluated on one-handed and two-handed datasets, achieving word error rates of 16.59% and 21.67%, respectively. Comparison experiments show that it outperforms known SLR methods in both accuracy and robustness.

PCC_M010

Talk title: UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural Language

Speakers: 王重阳, 冯渊, 钟凌逍, 朱思忆, 张弛, 郑思齐, 梁宸, 王运涛, 何成奇, 喻纯, 史元春

Published in: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Abstract: We introduce UbiPhysio, a milestone framework that delivers fine-grained action description and feedback in natural language to support people's daily functioning, fitness, and rehabilitation activities. This expert-like capability assists users in properly executing actions and maintaining engagement in remote fitness and rehabilitation programs. Specifically, the proposed UbiPhysio framework comprises a fine-grained action descriptor and a knowledge retrieval-enhanced feedback module. The action descriptor takes action data, represented by a set of biomechanical movement features we designed based on clinical priors. It translates this data into textual descriptions of action types and potential movement patterns. Building on physiotherapeutic domain knowledge, the feedback module provides clear and engaging expert feedback. We evaluated UbiPhysio's performance through extensive experiments with data from 104 diverse participants. The data were collected in a home-like setting during 25 types of everyday activities and exercises. We assessed the quality of the language output under different tuning strategies using standard benchmarks. We also conducted a user study to gather insights about our framework from clinical physiotherapists and potential users. Our initial tests show promise for deploying UbiPhysio in real-life settings without specialized devices.

PCC_M011

Talk title: Integrating Gaze and Mouse Via Joint Cross-Attention Fusion Net for Students' Activity Recognition in E-learning

Speakers: 朱蓉蓉, 施亮, 宋云鹏, 蔡忠闽

Published in: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Abstract: E-learning has emerged as an indispensable educational mode in the post-pandemic era. However, without appropriate activity monitoring, it is difficult for students to stay engaged in learning in this mode. Our work explores a promising solution that combines gaze and mouse data to recognize students' activities, thereby facilitating activity monitoring and analysis during e-learning. We first surveyed 200 students from a local university and found that they were more accepting of eye trackers and mouse loggers than of video surveillance. We then designed eight routine digital activities for students, used them to collect a multimodal dataset, and analyzed the patterns and correlations between gaze and mouse across activities. Our proposed Joint Cross-Attention Fusion Net is a multimodal activity recognition framework that leverages the gaze-mouse relationship to improve classification performance. It integrates cross-modal representations through a cross-attention mechanism and incorporates joint features that characterize gaze-mouse coordination. Evaluation results show that our method achieves an F1 score of up to 94.87% in predicting 8 activity classes. This is an improvement of at least 7.44% over using gaze or mouse data alone. This research illuminates new possibilities for monitoring student engagement in intelligent education systems. It also suggests a promising strategy for melding perception and action modalities in behavioral analysis across a range of ubiquitous computing environments.

Session 2: Wireless Sensing

PCC_W001

Talk title: LiqDetector: Enabling Container-Independent Liquid Detection with mmWave Signals Based on a Dual-Reflection Model Learning

Speakers: 王柱, 郭依凡, 任智慧, 宋文超, 孙卓, 陈超, 郭斌, 於志文

Published in: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Abstract: With the advancement of wireless sensing technologies, RF-based contactless liquid detection has attracted increasing attention. Compared with other RF devices, mmWave radar has the advantages of large bandwidth and low cost. Existing radar-based liquid detection systems demonstrate promising performance, but their detection results still depend on container-related factors (e.g., container placement, caliber, and material). In this paper, to enable container-independent liquid detection with a COTS mmWave radar, we propose a dual-reflection model that exploits reflections from different interfaces of the liquid container. Specifically, we design a pair of amplitude ratios based on the signals reflected from different interfaces. We then theoretically demonstrate how the refractive index of a liquid can be estimated while eliminating the container's impact. To validate the proposed approach, we implement a liquid detection system, LiqDetector. Experimental results show that LiqDetector achieves cross-container estimation of the liquid's refractive index with a mean absolute percentage error (MAPE) of about 4.4%. Moreover, the classification accuracies exceed 96% for 6 different liquids and 95% for alcohol of different strengths, even when strengths differ by only 1%. To the best of our knowledge, this is the first study that achieves container-independent liquid detection with a COTS mmWave radar using only one pair of Tx-Rx antennas.

PCC_W002

Talk title: WiProfile: Unlocking Diffraction Effects for Sub-Centimeter Target Profiling Using Commodity WiFi Devices

Speakers: 姚智允, 王炫之, 牛凯, 郑榕, 王俊喆, 张大庆

Published in: Proceedings of the 30th Annual International Conference on Mobile Computing and Networking

Abstract: Despite intensive research efforts in radio frequency noncontact sensing, capturing fine-grained geometric properties of objects, such as shape and size, with commodity WiFi devices remains an open problem. Prior attempts cannot characterize object shape or size because they predominantly rely on weak signals reflected off objects in a very small number of directions. We observe that the signals diffracted around an object between two WiFi devices carry the object's contour information. Motivated by this observation, we formulate the problem of reconstructing the 2D target profile. We develop WiProfile, the first WiFi-based system that unlocks diffraction effects for target profiling. We introduce a CSI-Profile model to characterize the relationship between the CSI measured at different target positions and the target profile in the diffraction zone. With suitable approximations, the inverse problem of deriving the target profile from CSI can be solved by the inverse Fresnel transform. To mitigate CSI measurement errors on commodity WiFi devices, we propose a novel antenna placement strategy. Comprehensive experiments demonstrate that WiProfile can accurately reconstruct profiles with median absolute errors of less than 1 cm under various conditions. They also show that it can effectively estimate the profiles of everyday objects of diverse shapes, sizes, and materials. We believe this work opens up new directions for fine-grained target imaging using commodity WiFi devices.

PCC_W003

Talk title: Robust WiFi Respiration Sensing in the Presence of Interfering Individual

Speakers: 谢学诚, 张东恒, 李亚东, 胡杨, 孙启彬, 陈彦

Published in: IEEE Transactions on Mobile Computing

Abstract: WiFi-based respiration sensing has gained increasing attention due to its contactless sensing capability and its use of existing WiFi devices. However, existing studies are limited to specific scenarios and do not address motion interference from other individuals. In this paper, we tackle the challenge of robust respiration sensing in the presence of other individuals. Specifically, we conduct an in-depth examination of the correlation between respiratory signals and spatial beam patterns. Building on it, we develop a respiratory-energy-based approach to evaluate the varying impact of dynamic interference on respiratory signals. When significant interference is detected, we employ a convex-optimization-based beam control strategy, which exploits the inherent characteristics of human respiration, to adaptively adjust the spatial beam pattern. This approach enables robust and precise gain adjustment between the target and the interfering individual, effectively mitigating the impact of interference. Experimental results demonstrate that our approach can reduce the mean absolute error (MAE) of respiration detection by up to 32% compared to state-of-the-art methods. This significantly enhances the accuracy and robustness of WiFi-based respiration sensing.
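The abstract does not define its respiratory-energy metric. As a hypothetical stand-in, the sketch below measures the fraction of a CSI amplitude series' spectral energy that falls in a typical breathing band, 0.1 to 0.5 Hz (6 to 30 breaths per minute). A low ratio would suggest that another person's motion dominates the signal. The band limits, the sampling rate, and the function name are assumptions. The naive DFT is used only to stay dependency-free.

```python
import cmath
import math

def band_energy_ratio(signal, fs, f_lo=0.1, f_hi=0.5):
    """Fraction of (mean-removed) one-sided spectral energy inside [f_lo, f_hi] Hz.

    Uses a naive O(N^2) DFT, which is fine for short windows.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]
    total = band = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energy = abs(coeff) ** 2
        total += energy
        if f_lo <= k * fs / n <= f_hi:
            band += energy
    return band / total if total > 0 else 0.0

# 30 s of a synthetic 0.3 Hz "breathing" amplitude, alone and with a 1.5 Hz interferer.
fs = 10.0  # hypothetical CSI sampling rate, Hz
breath = [math.sin(2 * math.pi * 0.3 * t / fs) for t in range(300)]
mixed = [b + math.sin(2 * math.pi * 1.5 * t / fs) for t, b in enumerate(breath)]
print(band_energy_ratio(breath, fs), band_energy_ratio(mixed, fs))  # roughly 1.0 and 0.5
```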

PCC_W004

Talk title: UWB-enabled Sensing for Fast and Effortless Blood Pressure Monitoring

Speakers: 王志, 金蓓弘, 张扶桑, 李思恒, 马俊麒

Published in: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Abstract: Blood pressure (BP) is a critical vital sign for assessing cardiovascular health. However, existing cuff-based and wearable-based BP measurement methods require direct contact between the user's skin and the device. This results in poor user experience and limited engagement in regular daily BP monitoring. In this paper, we propose a contactless approach that uses Ultra-WideBand (UWB) signals for regular daily BP monitoring. To remove components of the received signals that are unrelated to the pulse waves, we propose two methods that use peak detection and principal component analysis to identify aliased and deformed parts. We also aim to extract BP-related features and improve the accuracy of BP prediction, particularly for hypertensive users. To this end, we construct a deep learning model that extracts pulse-wave features at different scales and identifies the different effects of these features on BP. We build the corresponding BP monitoring system, named RF-BP, and conduct extensive experiments on both a public dataset and a self-built dataset. The experimental results show that RF-BP can accurately predict users' BP. On the self-built dataset, the mean absolute error (MAE) and standard deviation (SD) for systolic BP (SBP) are 6.5 mmHg and 6.1 mmHg. For diastolic BP (DBP), they are 4.7 mmHg and 4.9 mmHg.

PCC_W005

Talk title: PmTrack: Enabling Personalized mmWave-based Human Tracking

Speakers: 刘涵凯, 刘秀龙, 谢鑫, 佟鑫宇, 李克秋

Published in: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Abstract: The difficulty of obtaining targets' identities poses a significant obstacle to personalized and customized millimeter-wave (mmWave) sensing. Existing solutions that learn individual differences from signal features have limitations in practical applications. This paper presents PmTrack, a Personalized mmWave-based human Tracking system that introduces inertial measurement units (IMUs) as identity indicators. IMUs are widely available in portable devices such as smartwatches and smartphones and can upload identity and sensor data over existing wireless networks. They can therefore assist radar target identification in a lightweight manner, with little deployment or carrying burden for users. PmTrack adopts orientation as the matching feature, which overcomes the data heterogeneity between radar and IMU while avoiding the effect of cumulative errors. In implementing PmTrack, we propose a comprehensive set of optimization methods for detection enhancement, interference suppression, continuity maintenance, and trajectory correction. These methods address a series of practical problems caused by three major challenges in multi-person tracking: weak reflections, point cloud overlap, and body-bounce ghosts. In addition, an orientation correction method is proposed to overcome IMU gimbal lock. Extensive experimental results demonstrate that PmTrack achieves identification accuracies of 98% and 95% with five people in a hall and a meeting room, respectively.
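Orientation matching can be illustrated with a small sketch that works in three steps:

1. Derive a heading for each step of every radar track.
2. Compare it with each IMU's yaw series using a wrapped angular difference.
3. Pick the track-to-IMU assignment with the lowest total error.

The sketch assumes that radar headings and IMU yaw are already time-aligned and expressed in the same reference frame. It also uses brute-force search over permutations, which is only practical for a handful of people. All names are hypothetical, and this is not PmTrack's actual pipeline.

```python
import math
from itertools import permutations

def headings(track):
    """Heading (rad) of each step of a radar track given as [(x, y), ...]."""
    return [math.atan2(y1 - y0, x1 - x0) for (x0, y0), (x1, y1) in zip(track, track[1:])]

def angle_diff(a, b):
    """Absolute wrapped difference between two angles, in [0, pi]."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))

def match_tracks_to_imus(tracks, imu_yaws):
    """Return assignment[i] = index of the IMU matched to radar track i.

    imu_yaws[j] is a yaw series aligned with the track steps. The cost is the
    mean heading error, minimized jointly over all tracks by brute force.
    """
    cost = [[sum(angle_diff(h, y) for h, y in zip(headings(t), yaws)) / max(1, len(t) - 1)
             for yaws in imu_yaws] for t in tracks]
    best = min(permutations(range(len(imu_yaws)), len(tracks)),
               key=lambda p: sum(cost[i][p[i]] for i in range(len(tracks))))
    return list(best)

east = [(0, 0), (1, 0), (2, 0), (3, 0)]
north = [(0, 0), (0, 1), (0, 2), (0, 3)]
imu_yaws = [[math.pi / 2] * 3, [0.05] * 3]  # IMU 0 faces north, IMU 1 roughly east
print(match_tracks_to_imus([east, north], imu_yaws))  # [1, 0]
```

For more than a few people, the permutation search would normally be replaced by the Hungarian algorithm on the same cost matrix.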

PCC_W006

Talk title: Waffle: A Waterproof mmWave-based Human Sensing System inside Bathrooms with Running Water

Speakers: 张旭升, 张舵, 谢亚雄, 吴丹, 李洋, 张大庆

Published in: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Abstract: The bathroom has consistently ranked among the most perilous rooms in households, with slip-and-fall incidents during showers posing a critical threat, particularly to the elderly. To address this concern while ensuring privacy and accuracy, mmWave-based sensing has emerged as a promising solution. Such a system can precisely detect human activities and promptly trigger alarms in response to critical events, which has proved especially valuable in bathroom environments. However, deploying such a system in bathrooms faces a significant challenge: interference from running water. Like the human body, water droplets reflect substantial mmWave signals, presenting a major obstacle to accurate sensing. Through a rigorous empirical study, we confirm that the interference caused by running water follows a Weibull distribution, offering insight into its behavior. Leveraging this understanding, we propose a customized Constant False Alarm Rate (CFAR) detector tailored to handle interference from running water. This detector effectively isolates human-generated signals, enabling accurate human detection even in the presence of running-water interference. Our implementation of Waffle on a commercial off-the-shelf mmWave radar demonstrates exceptional sensing performance. It achieves median errors of 1.8 cm for human height estimation and 6.9 cm for tracking, even with running water present. Furthermore, our fall detection system built on this technique achieves a recall of 97.2% and an accuracy of 97.8%, surpassing the state-of-the-art method.
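Waffle's detector itself is not described in the abstract. The following sketch shows only the general idea of a CFAR test under Weibull clutter with a known shape parameter k. Because E[X^k] = λ^k, the scale λ can be estimated from the reference cells, and the threshold follows from solving exp(-(T/λ)^k) = P_fa. Since λ is estimated from a finite number of cells, the realized false-alarm rate only approximates P_fa. The window sizes, the shape value, and the function names are assumptions.

```python
import math

def weibull_cfar_threshold(reference, shape_k, pfa):
    """Threshold T with P(X > T) = pfa for Weibull(k, lambda) clutter, lambda estimated from reference cells.

    Uses E[X^k] = lambda^k, so lambda_hat = (mean of x^k)^(1/k) and
    T = lambda_hat * (-ln pfa)^(1/k).
    """
    lam = (sum(x ** shape_k for x in reference) / len(reference)) ** (1.0 / shape_k)
    return lam * (-math.log(pfa)) ** (1.0 / shape_k)

def weibull_cfar(signal, shape_k=1.5, pfa=1e-3, guard=2, train=8):
    """Sliding-window CFAR over a 1-D range profile; returns indices of detected cells."""
    hits = []
    for i in range(len(signal)):
        # Training cells on both sides of the cell under test, skipping the guard cells.
        ref = [signal[j] for j in range(i - guard - train, i + guard + train + 1)
               if 0 <= j < len(signal) and abs(j - i) > guard]
        if ref and signal[i] > weibull_cfar_threshold(ref, shape_k, pfa):
            hits.append(i)
    return hits

profile = [1.0] * 40
profile[20] = 30.0  # a strong return from a person
print(weibull_cfar(profile))  # [20]
```

A customized detector such as Waffle's would also need to estimate or fix k from measured running-water clutter, which this sketch simply takes as given.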

PCC_W007

Talk title: XRF55: A Radio Frequency Dataset for Human Indoor Action Analysis

Speakers: 王飞, 吕一喆, 朱梦蝶, 丁菡, 韩劲松

Published in: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Abstract: Radio frequency (RF) devices such as Wi-Fi transceivers, radio frequency identification (RFID) tags, and millimeter-wave radars have become widespread in daily life. The presence and movement of humans affect the propagation of RF signals, and this phenomenon can be exploited for human action recognition. However, current works have many limitations. These include unavailable datasets, insufficient training samples, and simple or limited action categories tied to specific applications. Such limitations seriously hinder the growth of RF solutions and present a significant obstacle to moving RF sensing research from the laboratory into a wide range of everyday applications. To facilitate this transition, in this paper we introduce and release a large-scale multiple radio frequency dataset, named XRF55, for indoor human action analysis. XRF55 encompasses 42.9K RF samples collected from 39 subjects over 100 days. It covers 55 action classes spanning human-object interactions, human-human interactions, fitness, body motions, and human-computer interactions. These actions were meticulously selected from 19 RF sensing papers and 16 video action recognition datasets. XRF55 was captured with 23 RFID tags at 922.38 MHz, 9 Wi-Fi links at 5.64 GHz, one mmWave radar at 60-64 GHz, and one Azure Kinect with RGB+D+IR sensors. Together, these span decimeter-wave, centimeter-wave, and millimeter-wave frequencies. In addition, we apply a mutual learning strategy over XRF55 for the task of action recognition. Unlike simple modality fusion, mutual learning trains the three RF modalities collaboratively, after which each works on its own. We find that the three RF modalities promote each other. Notably, with the synchronized Kinect, XRF55 also supports exploring action detection, action segmentation, pose estimation, human parsing, mesh reconstruction, and more, with RF-only or RF-Vision approaches.
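One common mutual learning formulation is deep mutual learning. In it, each network minimizes cross-entropy on the label plus the average KL divergence from its peers' predictions to its own. The sketch below computes that per-modality loss for any number of modalities. Whether XRF55 uses exactly this loss is an assumption. Gradient handling is left to a training framework; here the peers would be treated as fixed targets. The branch logits are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    e = [math.exp(z - m) for z in logits]
    s = sum(e)
    return [v / s for v in e]

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mutual_learning_losses(logits_per_modality, label):
    """Per-modality loss: cross-entropy on the label plus the mean KL(peer || self)."""
    probs = [softmax(z) for z in logits_per_modality]
    k = len(probs)
    losses = []
    for i, p in enumerate(probs):
        ce = -math.log(p[label] + 1e-12)
        mimic = sum(kl(probs[j], p) for j in range(k) if j != i) / (k - 1)
        losses.append(ce + mimic)
    return losses

# Logits from three hypothetical RF branches (RFID, Wi-Fi, mmWave) for one sample of class 0.
losses = mutual_learning_losses([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]], label=0)
```

After such joint training, each branch keeps its own classifier, which matches the abstract's point that the modalities can later work individually.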

PCC_W008

Talk title: Beamforming for Sensing: Hybrid Beamforming based on Transmitter-Receiver Collaboration for Millimeter-Wave Sensing

Speakers: Long Fan, Lei Xie, Wenhui Zhou, Chuyu Wang, Yanling Bu, Sanglu Lu

Published in: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Abstract: Previous mmWave sensing solutions assume good signal quality, but ensuring an unblocked or strong line-of-sight (LoS) path is challenging. Finding a non-line-of-sight (NLoS) path is therefore crucial to enhancing the perceived signal quality. This paper proposes Trebsen, a Transmitter-REceiver collaboration-based Beamforming SENsing system using commercial mmWave radars. Specifically, we formulate hybrid beamforming as an optimization problem: a search over beamforming angles based on transmitter-receiver collaboration. We derive a comprehensive expression for parameter optimization by modeling the variations in signal attenuation caused by the propagation path. To comprehensively assess perceived signal quality, we design a novel metric, the perceived signal-to-interference-plus-noise ratio (PSINR). PSINR combines the carrier signal and the baseband signal to quantify the quality of fine-grained motion-sensing signals. Because exhaustive or random search is time-consuming, we employ a search method based on deep reinforcement learning to quickly find the optimal beamforming angles at both the transmitter and the receiver. We implement Trebsen and evaluate its performance in a fine-grained sensing application (i.e., heartbeat sensing). Experimental results show that Trebsen significantly enhances heartbeat sensing performance in blocked or weakened LoS scenarios. Compared with no beamforming, Trebsen reduces heart rate (HR) error by 23.6% and inter-beat interval (IBI) error by 27.47%. Moreover, compared with random search, Trebsen is 90% faster.

PCC_W009

Title: Understanding the Diffraction Model in Static Multipath-Rich Environments for WiFi Sensing System Design

Speakers: 王炫之, 余安澜, 牛凯, 石唯妍, 王俊喆, 姚智允, Rahul C. Shah, H. Lu, 张大庆

Venue: IEEE Transactions on Mobile Computing

Abstract: Although WiFi-based contactless sensing has made significant progress in the past decade, most prior work still focuses on the reflection zone far from the WiFi transceivers, and few studies explore the diffraction zone near the transceivers. Additionally, previous diffraction models consider only the CSI amplitude and ignore the impact of multipath. In this work, we develop an accurate diffraction model that characterizes the relationship between the target's movement in the diffraction zone and both the CSI amplitude and phase. We further derive the deformed forms of the model under static multipath conditions and find that the CSI patterns vary significantly with multipath. Consequently, the common assumption in existing work of a one-to-one mapping between CSI patterns and activities breaks down under multipath, degrading sensing performance when the multipath changes. To address this challenge, we propose extracting a relative change pattern from the CSI signals to recover the one-to-one mapping and eliminate the impact of static multipath. Extensive experiments under various multipath conditions demonstrate an accuracy above 96% for coarse-grained intrusion detection and an average error of 0.6 bpm for fine-grained respiration monitoring.

PCC_W010

Title: Push the Limit of Highly Accurate Ranging on Commercial UWB Devices

Speakers: 马俊麒, 张扶桑, 金蓓弘, 苏畅, 李思恒, 王志, 倪嘉志

Venue: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Abstract: Ranging plays a crucial role in many wireless sensing applications. Among the wireless techniques employed for ranging, Ultra-Wideband (UWB) has received much attention due to its excellent performance and widespread integration into consumer electronics. However, because of limited bandwidth, the ranging accuracy of current UWB systems is limited to the centimeter level, hindering their use in applications that require very high resolution. This paper proposes a novel system that, for the first time, achieves sub-millimeter-level ranging accuracy on commercial UWB devices. Our approach leverages the fine-grained phase information of commercial UWB devices. To eliminate phase drift, we design a fine-grained phase recovery method that utilizes the bi-directional messages in UWB two-way ranging. We further present a dual-frequency switching method to resolve phase ambiguity. Building on these techniques, we design and implement the ranging system on commercial UWB modules. Extensive experiments demonstrate that our system achieves a median ranging error of just 0.77 mm, reducing the error by 96.54% compared with the state-of-the-art method. We also present three real-life applications that showcase the fine-grained sensing capabilities of our system: i) smart speaker control, ii) free-style user handwriting, and iii) 3D tracking for virtual-reality (VR) controllers.
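The idea behind resolving phase ambiguity with two carrier frequencies can be sketched with the generic two-frequency (synthetic wavelength) technique. This sketch assumes ideal, drift-free one-way phases, hypothetical carriers at the UWB channel 5 and channel 9 center frequencies, and a coarse distance from ordinary two-way ranging that is accurate to within half a synthetic wavelength (about 10 cm here). It is not the paper's exact dual-frequency switching or phase recovery procedure.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def wrap(phi):
    # Wrap an angle into [0, 2*pi).
    return phi % (2 * math.pi)

def phase_at(distance_m, freq_hz):
    # Ideal one-way carrier phase for a given distance (no drift, no noise).
    return wrap(2 * math.pi * freq_hz * distance_m / C)

def dual_frequency_range(phi1, phi2, f1, f2, coarse_m):
    """Resolve distance from two wrapped phases plus a coarse estimate.

    Step 1: the phase difference behaves like a carrier at |f1 - f2|, whose
    synthetic wavelength is far longer than either carrier's wavelength.
    Step 2: the synthetic estimate fixes the integer cycle count of phi1.
    """
    lam_syn = C / abs(f1 - f2)
    lam1 = C / f1
    dphi = wrap(phi1 - phi2) if f1 > f2 else wrap(phi2 - phi1)
    # Distance modulo one synthetic wavelength.
    frac_syn = dphi / (2 * math.pi) * lam_syn
    # The coarse estimate picks the synthetic-wavelength cycle.
    n_syn = round((coarse_m - frac_syn) / lam_syn)
    d_syn = frac_syn + n_syn * lam_syn
    # The synthetic estimate picks the fine-wavelength cycle.
    frac1 = phi1 / (2 * math.pi) * lam1
    n1 = round((d_syn - frac1) / lam1)
    return frac1 + n1 * lam1

# Usage: hypothetical carriers (UWB channels 9 and 5), coarse estimate off by 4 cm.
f1, f2 = 7987.2e6, 6489.6e6
d_true = 3.217
d_hat = dual_frequency_range(phase_at(d_true, f1), phase_at(d_true, f2), f1, f2, d_true + 0.04)
```

With noisy phases, the final precision is set by the fine carrier wavelength (a few centimeters here), which is why phase-based methods can go far below the resolution of time-of-flight ranging.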

PCC_W011

Title: Revisiting Cardinality Estimation in COTS RFID Systems

Speakers: 陈星宇, 刘佳

Venue: Proceedings of the 29th Annual International Conference on Mobile Computing and Networking

Abstract: With 30 billion RFID tags sold worldwide in 2021, a basic functionality commonly needed by RFID-enabled applications is cardinality estimation, i.e., quickly estimating the number of distinct tags in an RFID system. Although many advanced solutions have been proposed over the past decade, they suffer from one major limitation in practice: they need either to modify the existing RFID standard or to obtain MAC-layer information, neither of which is supported by commercial off-the-shelf (COTS) devices. In this paper, we revisit the counting problem and propose a novel counting scheme called the average time duration based counter (ATD), which quickly estimates the number of distinct tags in a standards-compliant manner. Compared with existing work, the competitive advantage of ATD is that it can be deployed directly on a COTS RFID system without any hardware modifications. In ATD, we identify a new, measurable indicator: the time duration between two adjacent singleton slots, which depends on the number of tags. Based on this observation, we derive the theoretical relationship between this time indicator and the number of tags, and then provide a proof of the estimation together with its parameter settings. Additionally, we propose a flag-flipping solution to address the overlapping problem in the multi-reader case. We implement ATD in a COTS RFID system with 1,000 tags. Experimental results show that ATD is 4.2 times faster than the tag inventory baseline, and the performance gain increases further in larger RFID systems.
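Why the gap between adjacent singleton slots carries information about the tag count can be shown with a simplified framed slotted ALOHA model. This sketch assumes a fixed, known frame size f with n ≤ f tags, each picking a slot uniformly at random, and measures the gap in slots rather than in time. Real COTS readers run the EPC Gen2 Q-algorithm with unequal slot durations, which ATD's own derivation accounts for and this sketch does not. Under the simplified model, a slot is a singleton with probability p1(n) = (n/f)(1 - 1/f)^(n-1), so the mean singleton gap is roughly 1/p1(n), which can be inverted for n.

```python
import math
import random

def singleton_prob(n, f):
    # Probability that a given slot holds exactly one of n tags in a frame of f slots.
    return (n / f) * (1 - 1 / f) ** (n - 1)

def simulate_singleton_gaps(n, f, frames, seed=0):
    """Simulate FSA frames and return gaps (in slots) between adjacent singleton slots."""
    rng = random.Random(seed)
    singleton_idx = []
    for k in range(frames):
        counts = [0] * f
        for _ in range(n):
            counts[rng.randrange(f)] += 1
        singleton_idx += [k * f + s for s, c in enumerate(counts) if c == 1]
    return [b - a for a, b in zip(singleton_idx, singleton_idx[1:])]

def estimate_tags(mean_gap, f):
    """Invert p1(n) = 1 / mean_gap by bisection on the increasing branch of p1."""
    target = 1 / mean_gap
    lo, hi = 1.0, -1 / math.log(1 - 1 / f)  # p1(n) peaks near n = -1/ln(1 - 1/f)
    for _ in range(60):
        mid = (lo + hi) / 2
        if singleton_prob(mid, f) < target:
            lo = mid  # too few singletons predicted: more tags needed
        else:
            hi = mid
    return (lo + hi) / 2

# Usage: 200 simulated tags, frame size 512, 50 frames.
gaps = simulate_singleton_gaps(n=200, f=512, frames=50)
n_hat = estimate_tags(sum(gaps) / len(gaps), f=512)
```

Restricting the search to n below the peak of p1 is needed because p1 is not monotonic: the same singleton rate occurs once for a lightly loaded frame and once for an overloaded one.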

Session 3: Cities, Transportation, and Networks

PCC_C001

Title: Make Partition Fit Task: A Novel Framework for Joint Learning of City Region Partition and Representation

Speakers: Mingyu Deng, Wanyi Zhang, Jie Zhao, Zhu Wang, Mingliang Zhou, Jun Luo, Chao Chen

Venue: ACM Transactions on Multimedia Computing, Communications and Applications

Abstract: The proliferation of multimodal big data in cities provides unprecedented opportunities for modeling and forecasting urban problems, e.g., crime prediction and house price prediction, through data-driven approaches. A fundamental and critical issue in modeling and forecasting urban problems is identifying suitable spatial analysis units, a task known as city region partition. Existing works rely on subjective domain knowledge to produce static partitions that are applied uniformly to all tasks. In fact, different tasks may need different city region partitions. To address this issue, we propose a task-oriented framework for Joint Learning of region Partition and Representation (hereafter JLPR). To make the partition fit the task, JLPR integrates region partition into representation model training and learns region partitions using the supervision signal from the downstream task. We evaluate the framework on two prediction tasks (i.e., crime prediction and house price prediction) in Chicago. Experiments show that JLPR consistently outperforms state-of-the-art partitioning methods in both tasks, achieving improvements of more than 25% and 70% in Mean Absolute Error (MAE) for crime prediction and house price prediction, respectively. Additionally, we conduct three visualization case studies that yield insightful findings from diverse perspectives and further demonstrate the effectiveness of our approach.

PCC_C002

Title: Coupling Makes Better: An Intertwined Neural Network for Taxi and Ridesourcing Demand Co-Prediction

Speakers: Jie Zhao, Chao Chen, Wanyi Zhang, Ruiyuan Li, Fuqiang Gu, Songtao Guo, Jun Luo, Yu Zheng

Venue: IEEE Transactions on Intelligent Transportation Systems

Abstract: While a variety of innovative travel modes, such as taxi and ridesourcing services, have been launched to improve transportation efficiency, people still encounter travel problems in real life. The major cause is the imbalance between transportation supply and demand. To strike a balance, it is well recognized that an accurate and timely passenger demand prediction model is the foundation that enables human intelligence (e.g., taxi drivers) or machine intelligence (e.g., ride-hailing platforms) to allocate resources in advance. Although many deep models have been designed to capture the complicated spatial and temporal dependencies in a data-driven way, they focus on demand prediction for a single mode and ignore the fact that passengers may shift between modes, especially between taxis and ridesourcing cars. In this paper, we target a co-prediction problem that treats taxi and ridesourcing demand prediction as two coupled tasks, and propose a novel Temporal and Spatial Intertwined Network (TSIN) consisting of two twin components and an intertwined component. Each twin in TSIN separately extracts the spatial and temporal dependencies of its corresponding travel mode (i.e., intra-mode features), while the intertwined component between them bridges the twins and allows them to exchange information (i.e., inter-mode features), thus enabling better prediction. We first evaluate our model on four real-world datasets. The results demonstrate the outstanding performance of our model and the necessity of accounting for the influence between modes. Using additional bike demand data from NYC, we then discuss the generalizability of coupling more transportation modes. Further results demonstrate that the proposed intertwined neural network is highly flexible and extendable and yields better prediction performance.

PCC_C003

Title: Seeking Based on Dynamic Prices: Higher Earnings and Better Strategies in Ride-on-demand Services

Speakers: Suiming Guo, Qianrong Shen, Zhiquan Liu, Chao Chen, Chaoxiong Chen, Jingyuan Wang, Zhetao Li, Ke Xu

Venue: IEEE Transactions on Intelligent Transportation Systems

Abstract: In recent years, ride-on-demand (RoD) services such as Uber and DiDi have become increasingly popular. Unlike traditional taxi services, RoD services adopt dynamic pricing mechanisms to regulate supply and demand on the road, and such mechanisms improve service capacity and quality. Seeking route recommendation has been widely studied for taxi services. In RoD services, the dynamic price is a new and accurate indicator of supply and demand, but it has rarely been studied as a source of clues to help drivers find passengers. In this paper, we propose incorporating the impact of dynamic prices as a key factor in recommending seeking routes to drivers. We first justify, by analyzing real service data from a typical RoD service, why it is necessary to recommend seeking routes and to consider dynamic prices. We then design a reinforcement learning model based on order and GPS trajectory datasets, taking dynamic prices into account in the design. Results show that our model improves both driver earnings and seeking strategies. For driver earnings, the reinforcement learning model increases revenue efficiency by up to 34.52%, and considering dynamic prices yields a further increase of 6.19%. For seeking strategies, drivers are encouraged to serve local demand first, and they are redistributed more evenly and effectively.

PCC_C004

Title: Data-Driven Pick-Up Location Recommendation for Ride-Hailing Services

Speakers: 刘志丹, 张鸿权, 欧阳国峰, 陈俊杨, 伍楷舜

Venue: IEEE Transactions on Mobile Computing

Abstract: Ride-hailing service (RHS) has become an important transportation mode in daily life. Although many works have been proposed to improve RHS from different aspects, only a few focus on the selection of pick-up locations, where the rider and driver meet and start a trip. In this paper, we present MPLRec, a data-driven pick-up location recommendation system that exploits riders' specific mobility demands (e.g., destination) and historical experiences to meet riders' travel requirements. MPLRec generates potential pick-up locations over the road network and characterizes them with rich features that describe a location from the rider's perspective. We also build spatio-temporal indexes that organize potential pick-up locations and historical data to facilitate online recommendation. When processing an online recommendation request, MPLRec derives candidate pick-up locations and evaluates them with materialized features, which are computed from historical order and trajectory data while considering the rider's mobility demands. Based on these features, a novel scoring function derives the best pick-up location for each request. Moreover, we implement an RHS simulator to evaluate MPLRec using large-scale real-world ride-hailing datasets. Extensive experiments and simulations demonstrate the effectiveness and efficiency of MPLRec, which completes each request within 0.5 s and substantially reduces ride-hailing costs compared with baseline methods.

PCC_C005

Title: Federated Representation Learning With Data Heterogeneity for Human Mobility Prediction

Speakers: 张啸, 应豪超, 于东晓

Venue: IEEE Transactions on Intelligent Transportation Systems

Abstract: The advancement of smart wearable devices and location-based smart services has enabled a new paradigm of smart human mobility prediction (HMP), which has a broad range of applications in smart healthcare and smart cities. Because of privacy concerns and rigorous data regulations, federated learning provides a distributed framework for collaboratively training HMP models without sharing highly sensitive location data. However, in real-world scenarios, federated human mobility prediction suffers from data heterogeneity, which has two main aspects: heterogeneous mobility patterns and data scarcity. In this paper, we propose FR-HMP, an end-to-end federated representation learning framework for human mobility prediction that overcomes both obstacles. Specifically, to enhance the representation abilities of data-scarce clients, we propose a two-phase learning process. The clustering module groups similar clients on the parameter server to address heterogeneous mobility patterns, and the representation learning module learns enhanced representations for each client through a graph learning layer and a graph convolution layer on a third-party server. Finally, extensive experiments on two diverse real-world HMP datasets show the advantages of FR-HMP over state-of-the-art methods.

PCC_C006

Title: A Unified Model for Spatio-Temporal Prediction Queries with Arbitrary Modifiable Areal Units

Speakers: 陈李越, 房江祎, 刘腾飞, 曹绍升, 王乐业

Venue: IEEE International Conference on Data Engineering

Abstract: Spatio-Temporal (ST) prediction is crucial for informed decision-making in urban location-based applications such as ride-sharing. However, existing ST models often require region partition as a prerequisite, which leads to two main pitfalls. First, location-based services need ad-hoc regions for various purposes, requiring multiple ST models with varying scales and zones, which can be costly to support. Second, different ST models may produce conflicting outputs, resulting in confusing predictions. In this paper, we propose One4All-ST, a framework that conducts ST prediction for arbitrary modifiable areal units using only one model. To reduce the cost of obtaining multi-scale predictions, we design an ST network with hierarchical spatial modeling and scale normalization modules to learn multi-scale representations efficiently and equally. To address prediction inconsistencies across scales, we propose a dynamic programming scheme that solves the formulated optimal combination problem and, as shown through theoretical analysis, minimizes the prediction error. Besides, we use an extended quad-tree to index the optimal combinations for quick response to arbitrary modifiable areal units in practical online scenarios. Extensive experiments on two real-world datasets verify the efficiency and effectiveness of One4All-ST for ST prediction on arbitrary modifiable areal units. The source code and data of this work are available at https://github.com/uctb/One4All-ST.
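The optimal-combination step can be pictured as a bottom-up dynamic program over a quad-tree: each node either answers with its own prediction or delegates to the best combination of its children. The minimal sketch below assumes additive per-node error scores (e.g., validation errors) and sum-aggregatable predictions such as demand counts. The node names and numbers are hypothetical, and One4All-ST's actual formulation and its extended quad-tree index are richer than this.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    pred: float   # prediction made directly at this node's scale
    err: float    # hypothetical error score of that direct prediction
    children: list = field(default_factory=list)

def best_combination(node):
    """Return (error, prediction, chosen node names) covering node's area at minimum error."""
    if not node.children:
        return node.err, node.pred, [node.name]
    results = [best_combination(c) for c in node.children]
    child_err = sum(r[0] for r in results)
    if child_err < node.err:
        # Combining the children's best answers beats predicting at this scale.
        child_pred = sum(r[1] for r in results)
        chosen = [name for r in results for name in r[2]]
        return child_err, child_pred, chosen
    return node.err, node.pred, [node.name]

# Usage: a root split into four quadrants, one of which is split again.
d = Node("d", 26.0, 3.0, [Node(f"d{i}", 6.0, 0.25) for i in range(1, 5)])
root = Node("root", 100.0, 10.0,
            [Node("a", 30.0, 1.0), Node("b", 20.0, 2.0), Node("c", 25.0, 1.0), d])
err, pred, chosen = best_combination(root)
```

Each node is visited once, so the cost is linear in the tree size; caching each node's chosen combination is what allows a quad-tree index to answer later queries quickly.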

PCC_C007

Title: Privacy Leakage from Dynamic Prices: Trip Purpose Mining as an Example

Speakers: Suiming Guo, Chao Chen, Zhetao Li, Chengwu Liao, Yaxiao Liu, Ke Xu, Daqing Zhang

Venue: IEEE Transactions on Mobile Computing

Abstract: Dynamic prices are used in many scenarios, e.g., flight ticketing, hotel room booking, and ride-on-demand (RoD) services such as Uber and DiDi. While they benefit service providers, practitioners, and users, they raise concerns about privacy leakage, i.e., the possibility of learning user information from dynamic prices. In this paper, we study this possibility and choose trip purpose mining in RoD services as an attack example, based on large real-world datasets. We discuss the criteria for choosing datasets (ubiquitous, collective, and easily accessible) from the perspective of an attacker, and extract features describing trip information, spatio-temporal context, and dynamic price context. The trip purpose mining problem is then solved both as a multi-class classification problem and as multiple binary classification problems. In the multi-class problem, we verify that dynamic price information yields a 17.1% improvement in classification accuracy; in the binary problems, we quantify feature contributions and explain the different extents of privacy leakage in identifying different trip purposes. We hope this study not only serves as a case study demonstrating the privacy leakage problem in RoD services, but also sheds light on similar privacy problems in other services that use dynamic prices and motivates further research.

PCC_C008

Title: RF-Boundary: RFID-Based Virtual Boundary

Speakers: 李潇宇, 刘佳

Venue: Proceedings of the IEEE International Conference on Computer Communications

Abstract: A boundary is a physical or virtual line that marks the edge or limit of a specific region, and boundaries are widely used in applications such as autonomous driving, virtual walls, and robotic lawn mowers. However, no existing work balances the cost, deployability, and scalability of a boundary well. In this paper, we propose a new RFID-based boundary scheme, together with its detection algorithm, called RF-Boundary, which has the competitive advantages of being battery-free, low-cost, and easy to maintain. We develop two techniques, phase gradient and dual-antenna direction of arrival (DoA), to address the key challenges posed by RF-Boundary, namely the lack of calibration information and multi-edge interference. We implement a prototype of RF-Boundary with commercial RFID systems and a mobile robot. Extensive experiments verify the feasibility and good performance of RF-Boundary.
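The textbook relation behind dual-antenna DoA estimation can be sketched as follows. It assumes a far-field tag, two reader antennas separated by a known spacing, and monostatic backscatter, where each antenna measures a round-trip phase, so the path difference spacing·sin(θ) contributes 4π·spacing·sin(θ)/λ to the phase difference. The 920 MHz carrier and the λ/4 spacing are hypothetical choices, and RF-Boundary's actual processing, including its handling of multi-edge interference, is not reproduced here.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def doa_from_phase_difference(delta_phi, spacing_m, freq_hz):
    """Angle of arrival (radians) from the phase difference of two antennas.

    Assumes round-trip (backscatter) phases, so
        delta_phi = 4 * pi * spacing * sin(theta) / wavelength.
    """
    lam = C / freq_hz
    # Wrap the measured difference into [-pi, pi) before inverting.
    delta_phi = (delta_phi + math.pi) % (2 * math.pi) - math.pi
    s = delta_phi * lam / (4 * math.pi * spacing_m)
    if abs(s) > 1:
        raise ValueError("phase difference inconsistent with antenna spacing")
    return math.asin(s)

# Usage: hypothetical 920 MHz carrier, quarter-wavelength spacing, tag at 30 degrees.
freq = 920e6
lam = C / freq
dphi = 4 * math.pi * (lam / 4) * math.sin(math.radians(30.0)) / lam
theta_deg = math.degrees(doa_from_phase_difference(dphi, lam / 4, freq))
```

A spacing of at most a quarter wavelength keeps the round-trip phase difference within one 2π cycle over the whole ±90° field of view, which is why it is used in this sketch.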

PCC_C009

Title: Adaptive Budgeting for Collaborative Multi-Task Data Collection in Online Sparse Crowdsensing

Speakers: 涂淳钰, 於志勇, 韩磊, 郭贤伟, 黄昉菀, 郭文忠, 王乐业

Venue: IEEE Transactions on Mobile Computing

Abstract: Sparse crowdsensing collects data from a subset of the sensing area and infers data for the unsensed areas, reducing data collection costs. Previous works have primarily focused on collecting and inferring a single type of data independently. However, real-world scenarios often involve multiple types of data that can complement each other by providing missing spatiotemporal distribution information. In this paper, we fully consider both intra-data correlations among data of the same type and inter-data correlations among data of different types, enabling the collaborative execution of various tasks. In addition, we enhance adaptability to practical application scenarios by using sparse data collected in real time to guide task execution. For this purpose, we propose a multi-task adaptive budgeting framework for online sparse crowdsensing, called MTAB-SC. The framework consists of three parts: training data updating, data inference, and data collection. First, we propose a multi-task data updating method to keep the models up to date. Second, we design a data inference network for joint multi-task data inference. Finally, to allocate suitable budgets to each task and facilitate collaborative data collection across multiple tasks, we propose an Adaptive Budgeting for Collaborative Data Collection model (AB-CoDC). Extensive experiments on two real-world datasets demonstrate the effectiveness of our proposals.

PCC_C010

Title: Edge-Assisted Spectrum Sharing for Freshness-Aware Industrial Wireless Networks: A Learning-Based Approach

Speakers: Mingyan Li, Cailian Chen, Huaqing Wu, Xinping Guan, Xuemin Shen

Venue: IEEE Transactions on Wireless Communications

Abstract: Information freshness is essential to industrial wireless networks (IWNs) and can be quantified by the age-of-information (AoI) metric. This paper addresses an AoI-aware spectrum sharing (AgeS) problem in IWNs, where multiple device-to-device (D2D) links opportunistically access the spectrum to satisfy their AoI constraints while maximizing the throughput of primal links. In particular, we orchestrate the access of D2D links in a distributed manner. Since distributed scheduling results in incomplete observations, D2D links share the spectrum with uncertainty about the transmission environment. Therefore, we propose a distributed scheduling scheme, called D-age, to deal with the transmission uncertainty in the AgeS problem, in which an adapted actor-critic method is adopted and the AoI constraints are tackled in the dual domain. To address the non-stationary environment and the multi-agent credit assignment issue, we develop a cooperative multi-agent reinforcement learning (MARL) approach, in which multiple local actors guide D2D links to make real-time decisions via distributed scheduling policies, and these policies are evaluated by an edge-assisted global critic with action-aware advantage functions. Integrated with graph attention networks (GATs), the critic selectively learns contextual information by assigning different importance to neighboring links, enabling scalable and computationally efficient evaluation of scheduling policies. A theoretical guarantee on the time-averaged AoI constraints is provided, and simulations demonstrate the effectiveness of D-age in terms of both the AoI violation ratio and the capacity of primal links.
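The time-averaged AoI that D-age's constraints refer to can be computed from a delivery trace with a few lines of Python. This is a minimal sketch assuming a slotted, generate-at-will model in which a successful D2D transmission resets the age to one slot and a failed or skipped slot increments it by one; the threshold in the usage example is hypothetical. D-age itself learns scheduling policies that keep this average below a constraint, which the snippet does not attempt.

```python
def aoi_trace(deliveries, initial_age=1):
    """Per-slot age of information for a discrete-time, generate-at-will link.

    deliveries[t] is True if a fresh update is delivered in slot t; the age
    then resets to 1, otherwise it grows by 1.
    """
    ages = []
    age = initial_age
    for ok in deliveries:
        age = 1 if ok else age + 1
        ages.append(age)
    return ages

def time_average_aoi(deliveries, initial_age=1):
    ages = aoi_trace(deliveries, initial_age)
    return sum(ages) / len(ages)

def violates(deliveries, threshold, initial_age=1):
    # A time-averaged AoI constraint is violated when the average exceeds the threshold.
    return time_average_aoi(deliveries, initial_age) > threshold

# Usage: deliveries succeed in slots 0 and 3 only.
trace = [True, False, False, True, False]
avg_age = time_average_aoi(trace)              # ages 1, 2, 3, 1, 2
too_stale = violates(trace, threshold=1.5)     # hypothetical AoI bound
```

Tracking the per-slot age rather than only the delivery rate matters because two links with the same number of deliveries can have very different average ages if one of them delivers in bursts.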

PCC_C011

Title: Citywide LoRa Network Deployment and Operation: Measurements, Analysis, and Implications

Speakers: 童率, 王继良

Venue: Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems

Abstract: LoRa, a representative Low-Power Wide-Area Network (LPWAN) technology, holds tremendous potential for various city and industrial applications. However, because few real large-scale deployments exist, it is unclear whether and how well LoRa can eventually live up to its promise. In this paper, we demystify the real performance of LoRa by deploying and measuring a citywide LoRa network, named CityWAN, which consists of 100 gateways and 19,821 LoRa end nodes, covering an area of 130 km² for 12 applications. Our measurement focuses on the following perspectives: (i) performance of applications running on the citywide LoRa network; (ii) infrastructure efficiency and deployment optimization; (iii) physical-layer signal features and link performance; and (iv) energy profiling and cost estimation for LoRa applications. The results reveal that LoRa performance in urban settings is bottlenecked by prevalent blind spots, and that there is a gap between gateway efficiency and network coverage in infrastructure deployment. Besides, we find that LoRa links at the physical layer are susceptible to environmental variations, and that LoRa and other LPWANs incur different costs in different scenarios. Our measurement provides insights for large-scale LoRa network deployment and for future academic research to fully unleash the potential of LoRa.
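A common first step in energy profiling for LoRa applications is the packet time-on-air formula published in Semtech's SX127x documentation. The sketch below implements that formula and multiplies the result by a transmit current and supply voltage to obtain a radio-only energy estimate. The current and voltage values are hypothetical placeholders rather than CityWAN measurements, and the paper's own profiling methodology may differ.

```python
import math

def lora_time_on_air(payload_bytes, sf=7, bw_hz=125_000, cr=1, preamble_symbols=8,
                     explicit_header=True, crc=True, low_dr_optimize=None):
    """LoRa packet time on air in seconds (Semtech SX127x formula).

    cr is 1..4 for coding rates 4/5..4/8.
    """
    t_sym = (2 ** sf) / bw_hz
    if low_dr_optimize is None:
        # Typically enabled when the symbol time exceeds 16 ms (SF11/SF12 at 125 kHz).
        low_dr_optimize = t_sym > 0.016
    de = 1 if low_dr_optimize else 0
    ih = 0 if explicit_header else 1
    t_preamble = (preamble_symbols + 4.25) * t_sym
    num = 8 * payload_bytes - 4 * sf + 28 + 16 * int(crc) - 20 * ih
    den = 4 * (sf - 2 * de)
    n_payload = 8 + max(math.ceil(num / den) * (cr + 4), 0)
    return t_preamble + n_payload * t_sym

def tx_energy_joules(time_on_air_s, tx_current_a, supply_v):
    # Radio transmit energy only; sleep, MCU, and receive windows are ignored.
    return time_on_air_s * tx_current_a * supply_v

# Usage: a 10-byte uplink at SF7/125 kHz versus SF12/125 kHz.
toa_sf7 = lora_time_on_air(10)            # about 0.041 s
toa_sf12 = lora_time_on_air(10, sf=12)    # about 0.99 s
energy_sf7 = tx_energy_joules(toa_sf7, tx_current_a=0.12, supply_v=3.3)  # hypothetical radio figures
```

The roughly 24-fold gap between SF7 and SF12 for the same payload shows why link conditions, which decide the spreading factor a node must use, dominate per-packet energy cost.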

Session 4: Sound and Vision

PCC_S001

Title: RFSpy: Eavesdropping on Online Conversations with Out-of-Vocabulary Words by Sensing Metal Coil Vibration of Headsets Leveraging RFID

Speakers: Yunzhong Chen, Jiadi Yu, Yingying Chen, Linghe Kong, Yanmin Zhu, Yi-Chao Chen

Venue: Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services

Abstract: Eavesdropping on human speech is one of the most common yet harmful threats to personal privacy. As essential accessories, headsets are widely used in online conversations such as online calls and video meetings. The metal coil vibration patterns of headset speakers and microphones have been shown to be highly correlated with the sound produced by the speaker and received by the microphone, respectively. This paper presents RFSpy, an online conversation eavesdropping system that uses only one RFID tag attached to a headset to alternately sense the metal coil vibrations of the headset speaker and microphone, thereby eavesdropping on both speaker-produced and microphone-received sound. In accessible scenarios such as meeting rooms and offices, we assume that attackers secretly attach a small, battery-free RFID tag under one ear cushion of the victim's headset without being noticed, while RFID readers camouflaged as decorations are placed inside or outside the room to transmit and receive RF signals. When the victim talks with other users online through the headset, RFSpy first activates the RFID tag attached to the headset to capture the metal coil vibration patterns of the headset speaker and microphone from the RF signals. Then, RFSpy reconstructs sound spectrograms from the RF signal-based vibration patterns, not only for trained words but also for untrained (i.e., out-of-vocabulary) words, using a purpose-designed Sound Spectrogram Reconstruction (SSR) network. Finally, RFSpy converts the sound spectrograms into conversation content through a sound recognition API. Extensive experiments in real environments demonstrate that RFSpy can effectively eavesdrop on online conversations containing out-of-vocabulary (OOV) words.

PCC_S002

Title: The EarSAVAS Dataset: Enabling Subject-Aware Vocal Activity Sensing on Earables

Speakers: 张奚宇星, 王运涛, 韩宇轩, 梁宸, Ishan Chatterjee, 唐健凯, 易鑫, Shwetak Patel, 史元春

Venue: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Abstract: Subject-aware vocal activity sensing on wearables, which specifically recognizes and monitors the wearer's own vocal activities, is essential for advancing personal health monitoring and enabling context-aware applications. While recent advances in earables present new opportunities, the absence of relevant datasets and effective methods remains a significant challenge. In this paper, we introduce EarSAVAS, the first publicly available dataset constructed specifically for subject-aware human vocal activity sensing on earables. EarSAVAS encompasses eight distinct vocal activities from both the earphone wearer and bystanders, including synchronous two-channel audio and motion data collected from 42 participants, totaling 44.5 hours. Further, we propose EarVAS, a lightweight multi-modal deep learning architecture that enables efficient subject-aware vocal activity recognition on earables. To validate the reliability of EarSAVAS and the efficiency of EarVAS, we implemented two advanced benchmark models. Evaluation results on EarSAVAS show that EarVAS is effective, with an accuracy of 90.84% and a Macro-AUC of 89.03%. Comprehensive ablation experiments on the benchmark models demonstrated the effectiveness of feedback microphone audio and highlighted the potential value of sensor fusion for subject-aware vocal activity sensing on earables. We hope that EarSAVAS and the benchmark models will inspire other researchers to further explore efficient subject-aware human vocal activity sensing on earables.
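Macro-AUC, one of the metrics reported for EarVAS, is commonly computed as the unweighted mean of one-vs-rest ROC AUCs. A small NumPy sketch follows, assuming that definition (the abstract does not spell out the exact averaging protocol) and assuming every class has at least one positive and one negative sample.

```python
import numpy as np

def binary_auc(scores, is_pos):
    """ROC AUC via the Mann-Whitney statistic (ties count one half)."""
    pos = scores[is_pos]
    neg = scores[~is_pos]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def macro_auc(probs, labels):
    """One-vs-rest AUC for each class present in labels, averaged with equal weight.

    probs: (samples, classes) scores; labels: (samples,) integer class indices.
    """
    classes = np.unique(labels)
    return float(np.mean([binary_auc(probs[:, c], labels == c) for c in classes]))

# Usage: three samples, three classes, perfectly ranked scores.
perfect = macro_auc(np.eye(3), np.array([0, 1, 2]))
```

Equal weighting means rare vocal activities count as much as frequent ones, which is the usual reason to report a macro average alongside accuracy.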

PCC_S003

Title: Self-supervised domain exploration with an Optimal Transport Regularization for Open Set Cross-domain Speech Emotion Recognition

Speakers: 张瑞腾, 魏建国, 路文焕, 更太加, 徐君海

Venue: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

Abstract: In domain adaptation (DA) for speech emotion recognition (SER), self-supervised learning (SSL) algorithms can effectively explore domain and structural information from target-domain samples, thereby mitigating domain discrepancies. However, in a more general setting where the target domain contains emotions never observed in the source domain, namely open-set DA, existing SSL-based DA methods cannot maintain robustness because of interference from the extra unknown classes. To address this challenge, we propose the self-supervised domain exploration with an optimal transport (OT) regularization (SDEOTR) algorithm. First, we integrate the SSL algorithm into the SER model to mitigate domain differences. Next, we categorize target-domain samples into known and unknown groups based on the network's prediction confidence. Finally, we employ OT to maximize the global probability distance between the two groups, aiming to reduce the impact of unknown emotions on the SER model. Cross-domain SER experiments showed that our label-free SDEOTR significantly improved the performance of existing adaptive SER algorithms in open-set scenarios.
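The confidence-based split and the OT distance between the two groups can be sketched in NumPy. This sketch assumes that "confidence" means the maximum softmax probability with a hypothetical threshold of 0.5, and it computes an entropy-regularized OT cost with the standard Sinkhorn iterations between the groups' probability vectors, using uniform weights and a squared Euclidean cost. These choices are illustrative assumptions; SDEOTR's actual grouping rule, cost, and OT solver may differ.

```python
import numpy as np

def split_by_confidence(probs, threshold=0.5):
    """Split target-domain samples into 'known' and 'unknown' by max softmax probability."""
    conf = probs.max(axis=1)
    return probs[conf >= threshold], probs[conf < threshold]

def sinkhorn_distance(x, y, reg=0.1, n_iter=200):
    """Entropy-regularized OT cost between two point sets with uniform weights."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals a and b.
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return float((plan * cost).sum())

# Usage: two confident and two uncertain predictions over three emotions.
probs = np.array([[0.9, 0.05, 0.05], [0.8, 0.1, 0.1],
                  [0.3, 0.35, 0.35], [0.34, 0.33, 0.33]])
known, unknown = split_by_confidence(probs)
ot_gap = sinkhorn_distance(known, unknown)
```

In training, a regularizer of the form `-ot_gap` would push the two groups apart, which corresponds to the "maximize the global probability distance" step; a differentiable framework would be needed to back-propagate through it.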

PCC_S004

Title: AdvReverb: Rethinking the Stealthiness of Audio Adversarial Examples to Human Perception

Speakers: 陈锰, 卢立, 俞嘉地, 巴钟杰, 林峰, 任奎

Venue: IEEE Transactions on Information Forensics and Security

Abstract: Audio systems, including keyword spotting, automatic speech recognition, and speaker identification, are among the most representative applications built on deep learning, and they have recently been shown to be vulnerable to adversarial examples, raising general concern in both academia and industry. Existing attacks follow the adversarial example generation paradigm from computer vision, i.e., overlaying optimized additive perturbations on the original audio. However, because additive perturbations are inherently audible to humans, balancing stealthiness and attack capability remains challenging. In this paper, we rethink the stealthiness of audio adversarial examples and instead introduce another kind of audio distortion, reverberation, as a new perturbation format for generating stealthy adversarial examples. Such convolutional adversarial perturbations are crafted as real-world impulse responses and sound like natural reverberation, deceiving human listeners. Based on this idea, we propose AdvReverb, which constructs, optimizes, and delivers phoneme-level convolutional adversarial perturbations on both speech and music carriers using a well-designed objective.
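The contrast between the two perturbation formats can be sketched in NumPy: an additive perturbation is summed with the waveform, whereas a reverberation-style perturbation convolves the waveform with an impulse response. The impulse response here is a toy, randomly generated one (a direct path plus exponentially decaying noise with a hypothetical RT60 of 0.3 s), and the peak renormalization is an illustrative design choice. AdvReverb instead optimizes its phoneme-level impulse responses against a target model, which this sketch does not do.

```python
import numpy as np

def additive_perturbation(x, delta):
    # Conventional paradigm: overlay an additive perturbation on the waveform.
    return x + delta

def synthetic_impulse_response(sr, rt60_s=0.3, seed=0):
    """Toy room impulse response: a direct path plus exponentially decaying noise."""
    rng = np.random.default_rng(seed)
    n = int(sr * rt60_s)
    t = np.arange(n) / sr
    # Amplitude falls by 60 dB (a factor of 1000) after rt60 seconds: ln(1000) ~ 6.908.
    h = rng.normal(size=n) * np.exp(-6.908 * t / rt60_s) * 0.1
    h[0] = 1.0  # direct path
    return h

def convolutional_perturbation(x, h):
    # Reverberation-style paradigm: convolve the waveform with an impulse response,
    # keep the original length, and rescale to the original peak level.
    y = np.convolve(x, h)[: len(x)]
    return y / (np.max(np.abs(y)) + 1e-12) * np.max(np.abs(x))

# Usage: reverberate a 0.5 s, 440 Hz tone sampled at 16 kHz.
sr = 16_000
t = np.arange(sr // 2) / sr
tone = np.sin(2 * np.pi * 440 * t)
reverberant = convolutional_perturbation(tone, synthetic_impulse_response(sr))
```

A convolutional perturbation changes the signal multiplicatively in the frequency domain, so its "budget" is naturally expressed through impulse-response properties such as decay time rather than an additive noise norm.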

PCC_S005

Title: MoiréVision: A Generalized Moiré-based Mechanism for 6-DoF Motion Sensing

Speakers: Jingyi Ning, Lei Xie, Zhihao Yan, Yanling Bu, Jun Luo

Venue: Proceedings of the 30th Annual International Conference on Mobile Computing and Networking

Abstract: Ultra-high-precision motion sensing based on computer vision (CV) is a key technology in many high-precision AR/VR applications, such as precise industrial manufacturing and image-guided surgery, yet conventional CV is being challenged by moiré-based sensing mechanisms, thanks to the high sensitivity of moiré patterns to six-degree-of-freedom (6-DoF) pose changes. Unfortunately, existing moiré-based solutions are still in their infancy and cannot handle the complicated curvilinear moiré patterns caused by various perspective angles. In this paper, we propose MoiréVision, a generalized moiré-based mechanism aimed at practical adoption; it relies on high-frequency gratings as visual markers to help extract fine-grained feature points for ultra-high-precision motion sensing. As the foundation of general moiré-based sensing, we propose a formulation that characterizes uncontrolled curvilinear moiré patterns in practical scenarios. To deal with moiré feature interference in practice, we propose a Gabor-based algorithm that separates overlapped curvilinear moiré patterns along two dimensions. Furthermore, to extract fine-grained feature points for high-precision motion sensing, we propose a bending function-based model and a resolution-enhanced strategy to reconstruct the detailed texture of moiré markers and extract moiré feature points at the sub-pixel level. Extensive experimental results show that MoiréVision greatly enhances the usability and generalizability of moiré-based sensing systems in real-world applications.
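The orientation selectivity that Gabor-based separation relies on can be illustrated with a small NumPy sketch. Two synthetic straight gratings at different orientations are superimposed, and a complex Gabor kernel tuned to one orientation responds strongly only to the matching grating. The straight gratings, the 6-pixel wavelength, and σ = 6 are hypothetical simplifications: MoiréVision deals with curvilinear moiré patterns, and its actual algorithm is more involved than a single filter response.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex 2-D Gabor kernel tuned to a grating of given wavelength and orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Coordinate along the grating's normal (wave-vector) direction.
    x_theta = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(1j * 2 * np.pi * x_theta / wavelength)

def grating(size, wavelength, theta):
    # Straight cosine grating whose stripes are perpendicular to direction theta.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    return np.cos(2 * np.pi * (x * np.cos(theta) + y * np.sin(theta)) / wavelength)

def gabor_response(image, kernel):
    # Magnitude of the inner product at the patch centre (one sample of the filtered image).
    return float(np.abs(np.sum(image * np.conj(kernel))))

# Usage: superimpose gratings at 0 and 60 degrees, then probe three orientations.
img = grating(41, 6.0, 0.0) + grating(41, 6.0, np.pi / 3)
r0 = gabor_response(img, gabor_kernel(41, 6.0, 0.0, 6.0))
r60 = gabor_response(img, gabor_kernel(41, 6.0, np.pi / 3, 6.0))
r90 = gabor_response(img, gabor_kernel(41, 6.0, np.pi / 2, 6.0))
```

The Gaussian envelope trades frequency selectivity against spatial localization: a larger σ separates nearby orientations more sharply but averages over a wider image region.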

PCC_S006

Title: EFEVD: Enhanced Feature Extraction for Smart Contract Vulnerability Detection

Speakers: 江池, 刘熙涵, 王申奥, Jinzhuo Liu, 张引

Venue: The 33rd International Joint Conference on Artificial Intelligence

Abstract: Because of the wide deployment of smart contracts, smart contract vulnerabilities pose a serious risk to blockchain security. Currently, deep learning-based vulnerability detection is an attractive solution due to its ability to identify complex patterns and features. Existing methods mainly consider contract code content features, expert knowledge patterns, and contract code modalities. To further enhance smart contract vulnerability detection, this paper identifies community features from smart contracts with similar semantic and syntactic structures, as well as shared features from two related vulnerability detection tasks: vulnerability classification and localization. Experimental results verify that the proposed approach significantly outperforms state-of-the-art methods in terms of accuracy, recall, precision, and F1-score.

PCC_S007

报告题目:Cautiously-Optimistic Knowledge Sharing for Cooperative Multi-Agent Reinforcement Learning

论坛讲者:Yanwen Ba,Xuan Liu,Xinning Chen,Hao Wang,Yang Xu,Kenli Li,Shigeng Zhang

出处:The Thirty-Eighth AAAI Conference on Artificial Intelligence

 

报告摘要:Decentralized training is attractive in multi-agent reinforcement learning (MARL) for its excellent scalability and robustness. However, its inherent coordination challenges in collaborative tasks mean that agents need numerous interactions to learn good policies. To alleviate this problem, action-advising methods have experienced agents share their knowledge about what to do, while less experienced agents strictly follow the received advice. However, this way of sharing and using knowledge may hinder the team's exploration of better states, because agents can be unduly influenced by suboptimal or even adverse advice, especially in the early stages of learning. Humans learn not only from others' successes but also from their failures. Inspired by this, the paper proposes a novel knowledge-sharing framework called Cautiously-Optimistic kNowledge Sharing (CONS). CONS enables each agent to share both positive and negative knowledge and to cautiously assimilate knowledge from others. This improves the efficiency of early-stage exploration and the agents' robustness to adverse advice. Moreover, because policies improve continuously, agents value negative knowledge more in the early stages of learning and shift their focus to positive knowledge in the later stages. The framework can be easily integrated into existing Q-learning-based methods without introducing additional training costs. We evaluate CONS on several challenging multi-agent tasks. It excels in environments where optimal behavioral patterns are difficult to discover, surpassing the baselines in both convergence rate and final performance.
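The abstract does not specify how CONS combines advice with an agent's own Q-values. The following is only a hypothetical sketch of the two ideas it describes: advice cautiously nudges rather than overrides the agent's own choice, and the emphasis shifts from negative knowledge early in training to positive knowledge later. The function name `advised_action`, the additive bias, and the linear schedule are all assumptions made for illustration.

```python
def advised_action(q_values, positive_advice, negative_advice, step, total_steps, strength=1.0):
    """Greedy action over the agent's own Q-values, adjusted by advice.

    Hypothetical scheme (not CONS's actual formulation):
      - positive_advice: index of an action another agent recommends (or None)
      - negative_advice: index of an action another agent warns against (or None)
      - the weight on negative advice decays linearly from 1 to 0 over
        training, while the weight on positive advice grows from 0 to 1.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    w_negative = 1.0 - progress
    w_positive = progress
    scores = list(q_values)
    # Advice is an additive bias, so a strongly preferred own action can
    # still win: the agent assimilates advice cautiously.
    if positive_advice is not None:
        scores[positive_advice] += strength * w_positive
    if negative_advice is not None:
        scores[negative_advice] -= strength * w_negative
    # Greedy choice over the adjusted scores (ties go to the lowest index).
    return max(range(len(scores)), key=lambda a: scores[a])


q = [0.5, 0.4, 0.3]
for step in (0, 500, 1000):
    print(step, advised_action(q, positive_advice=2, negative_advice=0,
                               step=step, total_steps=1000))
```

Early in training, the warning against action 0 dominates, so the agent avoids its own top choice. Late in training, the recommendation of action 2 dominates instead. The additive form keeps the rule compatible with any Q-learning agent that exposes per-action values.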

PCC_S008

报告题目:Minimizing Latency for Multi-DNN Inference on Resource-Limited CPU-Only Edge Devices

论坛讲者:王涛,石拓,刘秀龙

出处:Proceedings of the IEEE International Conference on Computer Communications

 

报告摘要:Despite considerable advances in specialized hardware, most IoT edge devices still rely on CPUs. The growing number of IoT users amplifies the challenge of running multiple deep neural network (DNN) inferences on these resource-limited, CPU-only edge devices. Existing strategies, including model compression, hardware acceleration, and model partitioning, often sacrifice inference accuracy, depend on specific hardware, or use resources inefficiently. In response, this paper introduces L-PIC (Latency Minimized Parallel Inference on CPU), a framework designed to optimize resource allocation, decrease inference latency, and maintain result accuracy on CPU-only edge devices. Comprehensive experiments verify the efficiency and effectiveness of L-PIC. Compared with the state-of-the-art method, it reduces multi-DNN inference latency by approximately 30% on average across all tested scenarios.
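The abstract does not describe L-PIC's allocation algorithm. To make "optimize resource allocation to decrease multi-DNN latency" concrete, here is a hypothetical greedy heuristic. It assumes the DNNs run concurrently on disjoint sets of CPU cores, so the multi-DNN latency is that of the slowest model (the makespan). It also assumes each model's latency follows Amdahl's law. The function names, the `serial_fraction` parameter, and the per-model work estimates are illustrative assumptions, not L-PIC's model.

```python
def amdahl_latency(work_ms, cores, serial_fraction):
    """Assumed latency model: Amdahl's law for one DNN on `cores` cores."""
    return work_ms * (serial_fraction + (1.0 - serial_fraction) / cores)


def allocate_cores(work_ms, total_cores, serial_fraction=0.0):
    """Greedy heuristic: each DNN starts with one core, then every spare core
    goes to the DNN that currently finishes last (the makespan bottleneck)."""
    n = len(work_ms)
    if total_cores < n:
        raise ValueError("need at least one core per DNN")
    cores = [1] * n
    for _ in range(total_cores - n):
        latencies = [amdahl_latency(w, c, serial_fraction) for w, c in zip(work_ms, cores)]
        bottleneck = max(range(n), key=lambda i: latencies[i])  # ties: lowest index
        cores[bottleneck] += 1
    makespan = max(amdahl_latency(w, c, serial_fraction) for w, c in zip(work_ms, cores))
    return cores, makespan


# Two models with single-core latencies of 100 ms and 50 ms sharing 5 cores.
cores, makespan = allocate_cores([100.0, 50.0], total_cores=5)
print(cores, round(makespan, 2))
```

Always feeding the current bottleneck is the simplest rule that can lower the maximum latency. A real system would also have to account for memory bandwidth contention and per-operator parallel efficiency, which this model ignores.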

PCC_S009

报告题目:Seeing through the Tactile: 3D Human Shape Estimation from Temporal In-Bed Pressure Images

论坛讲者:Ziyu Wu,Fangting Xie,Yiran Fang,Zhen Liang,Quan Wan,Yufan Xiong,Xiaohui Cai

出处:Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

 

报告摘要:Humans spend about one-third of their lives resting. Reconstructing human dynamics in in-bed scenarios is of considerable significance for sleep studies, bedsore monitoring, and biomedical factor extraction. However, mainstream human pose and shape estimation methods rely mainly on visual cues and face serious issues in non-line-of-sight environments. In-bed scenarios involve complicated human-environment contact, and pressure-sensing bedsheets offer a non-invasive, privacy-preserving way to capture the pressure distribution on the contact surface. They have shown promise in many downstream tasks. However, few studies focus on in-bed human mesh recovery. To explore the potential of reconstructing human meshes from the sensed pressure distribution, we first build TIP, a high-quality temporal in-bed human pose dataset with 152K multi-modality synchronized images. We then propose a label generation pipeline for in-bed scenarios that produces reliable 3D mesh labels with a SMPLify-based optimizer. Finally, we present PIMesh, a simple yet effective temporal human shape estimator that directly generates human meshes from pressure image sequences. We conduct various experiments to evaluate PIMesh, which achieves a joint position error of 79.17 mm on our TIP dataset. The results demonstrate that the pressure-sensing bedsheet could be a promising alternative for long-term in-bed human shape estimation.
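Joint position errors like the 79.17 mm figure above are commonly reported as a mean per-joint position error. This is the Euclidean distance between each predicted and ground-truth 3D joint, averaged over joints and then over frames. The abstract does not state PIMesh's exact protocol (for example, whether poses are root-aligned first), so the sketch below is the plain, unaligned version. The helper names are hypothetical.

```python
import math


def mean_joint_position_error(pred, gt):
    """Mean Euclidean distance between corresponding 3D joints, in the
    same units as the inputs (e.g. millimetres). No alignment is applied."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def sequence_error(pred_frames, gt_frames):
    """Average the per-frame joint error over a temporal pose sequence."""
    if len(pred_frames) != len(gt_frames) or not pred_frames:
        raise ValueError("sequences must be non-empty and the same length")
    return sum(mean_joint_position_error(p, g)
               for p, g in zip(pred_frames, gt_frames)) / len(pred_frames)


# Two toy frames of two joints each (coordinates in mm).
pred_seq = [[(0, 0, 0), (10, 0, 0)], [(1, 0, 0), (1, 0, 0)]]
gt_seq = [[(0, 0, 0), (13, 4, 0)], [(1, 0, 0), (1, 2, 0)]]
print(sequence_error(pred_seq, gt_seq))
```

`math.dist` requires Python 3.8 or later.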

PCC_S010

报告题目:Edge-Feature-Based Visual Odometry Localization Method

论坛讲者:赵辉,尚建嘎,刘凯,陈超,古富强

出处:2023 IEEE International Conference on Robotics and Automation

 

报告摘要:Visual odometry is important for many applications, such as autonomous vehicles and robot navigation. Visual odometry is challenging in textureless scenes and in environments with sudden illumination changes, where popular feature-based or direct methods do not work well. Some edge-based methods have been proposed to address this challenge, but they usually struggle to balance efficiency and accuracy. In this work, we propose EdgeVO, a novel visual odometry approach that is accurate, efficient, and robust. By efficiently selecting a small set of edges with certain strategies, we significantly improve computational efficiency without sacrificing accuracy. Compared with existing edge-based methods, our method significantly reduces computational complexity while maintaining similar or even better accuracy, because it removes useless or noisy edges. Experimental results on the TUM datasets indicate that EdgeVO significantly outperforms other methods in terms of efficiency, accuracy, and robustness.
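The abstract says EdgeVO selects "a small set of edges with certain strategies" but does not give them. One generic way to thin out edge pixels while keeping them spread across the image is grid-based selection: compute the image gradient, discard weak responses, and keep only the strongest edge pixel in each grid cell. The sketch below assumes exactly that. The function name, the cell size, and the threshold are hypothetical, and this is not EdgeVO's strategy.

```python
import math


def select_edges(image, cell=4, threshold=0.25):
    """Keep at most one edge pixel (the strongest gradient) per cell x cell
    grid cell, so the selected edges are sparse and spread over the image.
    `image` is a list of rows of grey levels; returns sorted (x, y) tuples."""
    h, w = len(image), len(image[0])
    best = {}  # (cell_row, cell_col) -> (gradient magnitude, x, y)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference image gradient.
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag < threshold:
                continue  # too weak to count as an edge
            key = (y // cell, x // cell)
            if key not in best or mag > best[key][0]:
                best[key] = (mag, x, y)
    return sorted((x, y) for _, x, y in best.values())


# 8x8 image with a vertical step edge between columns 3 and 4.
step = [[0.0 if x < 4 else 1.0 for x in range(8)] for y in range(8)]
print(select_edges(step))
```

On the step image, 12 interior pixels pass the gradient threshold, but only 4 survive the per-cell selection. Fewer pixels enter the subsequent alignment, which is where the computational savings of edge subsampling come from.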