PCC Top Conference and Journal Paper Exchange Forum
Session Chairs | ||||||
---|---|---|---|---|---|---|
周斌彬, Associate Professor, Hangzhou City University |
Session 1: Multimodal Sensing
Bio: |
|||||
王楚豫, Assistant Professor, Nanjing University |
Session 2: Wireless Sensing
Bio: |
|||||
刘志丹, Professor, The Hong Kong University of Science and Technology |
Session 3: Cities, Transportation, and Networks
Bio: |
|||||
卢立, Researcher, Zhejiang University |
Session 4: Sound and Vision
Bio: |
PCC_M001 | |
---|---|
Talk title: Spatial-Temporal Masked Autoencoder for Multi-Device Wearable Human Activity Recognition
Abstract: The widespread adoption of wearable devices has led to a surge in the development of multi-device wearable human activity recognition (WHAR) systems. Nevertheless, the performance of traditional supervised learning-based methods for WHAR is limited by the challenge of collecting ample annotated wearable data. To overcome this limitation, self-supervised learning (SSL) has emerged as a promising solution: a competent feature extractor is first trained on a substantial quantity of unlabeled data, and a minimal classifier is then refined with a small amount of labeled data. Despite the promise of SSL in WHAR, the majority of studies have not considered missing-device scenarios in multi-device WHAR. To bridge this gap, we propose a multi-device SSL WHAR method termed Spatial-Temporal Masked Autoencoder (STMAE). STMAE captures discriminative activity representations by utilizing an asymmetrical encoder-decoder structure and a two-stage spatial-temporal masking strategy, which can exploit the spatial-temporal correlations in multi-device data to improve the performance of SSL WHAR, especially in missing-device scenarios. Experiments on four real-world datasets demonstrate the efficacy of STMAE in various practical scenarios. |
|
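The two-stage spatial-temporal masking named in the STMAE abstract can be illustrated with a minimal sketch. The abstract does not give the masking details, so everything here is an assumption for illustration: the function name `two_stage_mask`, the idea that the spatial stage hides whole devices while the temporal stage hides time patches on the remaining devices, and both masking ratios.

```python
import random


def two_stage_mask(num_devices, num_patches, device_ratio, patch_ratio, rng):
    """Build a num_devices x num_patches grid of booleans (True = masked).

    Hypothetical illustration only, not the published STMAE procedure.
    """
    # Stage 1 (spatial): hide every patch of a random subset of devices,
    # which imitates a device that is missing altogether.
    n_hidden_devices = round(num_devices * device_ratio)
    hidden_devices = set(rng.sample(range(num_devices), n_hidden_devices))

    # Stage 2 (temporal): on each remaining device, hide a random subset
    # of time patches.
    n_hidden_patches = round(num_patches * patch_ratio)
    mask = []
    for device in range(num_devices):
        if device in hidden_devices:
            mask.append([True] * num_patches)
        else:
            hidden = set(rng.sample(range(num_patches), n_hidden_patches))
            mask.append([t in hidden for t in range(num_patches)])
    return mask


if __name__ == "__main__":
    grid = two_stage_mask(5, 10, 0.4, 0.5, random.Random(0))
    for row in grid:
        print("".join("#" if m else "." for m in row))
```

In a masked autoencoder, only the unmasked patches would be fed to the encoder, and the decoder would be trained to reconstruct the masked ones.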
PCC_M002 | |
Talk title: HMGAN: A Hierarchical Multi-Modal Generative Adversarial Network Model for Wearable Human Activity Recognition
Abstract: Wearable Human Activity Recognition (WHAR) is an important research field of ubiquitous and mobile computing. Deep WHAR models suffer from overfitting caused by the lack of large and varied labeled data, which is usually addressed by generating data to enlarge the training set, i.e., Data Augmentation (DA). Generative Adversarial Networks (GANs) have shown excellent data generation ability, and the generalization ability of a classification model can be improved by GAN-based DA. However, existing GANs cannot make full use of the important modality information and fail to balance modality details and global consistency, so they cannot meet the requirements of deep multi-modal WHAR. In this paper, a hierarchical multi-modal GAN model (HMGAN) is proposed for WHAR. HMGAN consists of multiple modal generators, one hierarchical discriminator, and one auxiliary classifier. The multiple modal generators can learn the complex multi-modal data distributions of sensor data. The hierarchical discriminator provides discrimination outputs for both low-level modal discrimination losses and a high-level overall discrimination loss to strike a balance between modality details and global consistency. Experiments on five public WHAR datasets demonstrate that HMGAN achieves state-of-the-art performance for WHAR, outperforming the best baseline by an average of 3.4%, 3.8%, and 3.5% in accuracy, macro F1 score, and weighted F1 score, respectively. |
|
PCC_M003 | |
Talk title: HeadMon: Head Dynamics Enabled Riding Maneuver Prediction
Abstract: Although micro-mobility brings convenience to modern cities, it also causes various social problems, such as traffic accidents, casualties, and substantial economic losses. Wearing protective equipment has become the primary recommendation for safe riding. However, passive protection cannot prevent accidents from occurring. Thus, timely prediction of the rider's maneuver is essential for active protection and for providing more time to avoid potential accidents. Through a qualitative study, we argue that the rider's head dynamics can serve as an information source for predicting the rider's subsequent maneuvers. We accordingly present HeadMon, a riding maneuver prediction system for safe riding. HeadMon captures the head dynamics of a rider through an inertial measurement unit installed on the helmet. It uses the extracted head dynamics features as the input to a deep learning architecture to make predictions. We implemented the HeadMon prototype on an Android smartphone as a proof of concept. Comprehensive experiments with 20 participants demonstrate the excellent performance of HeadMon: not only does it achieve an overall precision of at least 85% for maneuver prediction under a 4 s prediction time gap, but it also maintains high accuracy under a low sampling rate. The low cost of HeadMon makes it readily deployable and a step toward safer riding. |
|
PCC_M004 | |
Talk title: Generalizable Sleep Staging via Multi-Level Domain Alignment
Abstract: Automatic sleep staging is essential for sleep assessment and disorder diagnosis. Most existing methods depend on one specific dataset, with training and testing data drawn from the same dataset, and generalize poorly to unseen datasets. In this paper, we introduce domain generalization into automatic sleep staging and propose the task of generalizable sleep staging, which aims to improve the model's generalization ability to unseen datasets. Inspired by existing domain generalization methods, we adopt the feature alignment idea and propose a framework called SleepDG to solve it. Considering that both local salient features and sequential features are important for sleep staging, we propose a Multi-level Feature Alignment that combines epoch-level and sequence-level feature alignment to learn domain-invariant feature representations. Specifically, we design an Epoch-level Feature Alignment to align the feature distribution of each single sleep epoch among different domains, and a Sequence-level Feature Alignment to minimize the discrepancy of sequential features among different domains. SleepDG is validated on five public datasets, achieving state-of-the-art performance. |
|
PCC_M005 | |
Talk title: DiffMDD: A Diffusion-based Deep Learning Framework for MDD Diagnosis Using EEG
Abstract: Major Depressive Disorder (MDD) is a common yet destructive mental disorder that affects millions of people worldwide. Early and accurate diagnosis is therefore very valuable. Recently, EEG, a non-invasive technique for recording the spontaneous electrical activity of the brain, has been widely used for MDD diagnosis. However, there are still challenges in the data quality and data size of EEG: (1) A large amount of noise is inevitable during EEG collection, making it difficult to extract discriminative features from raw EEG; (2) It is difficult to recruit a large number of subjects to collect sufficient and diverse data for model training. Both challenges cause overfitting, especially for deep learning methods. In this paper, we propose DiffMDD, a diffusion-based deep learning framework for MDD diagnosis using EEG. Specifically, we extract more noise-irrelevant features to improve the model’s robustness by designing the Forward Diffusion Noisy Training Module. Then we increase the size and diversity of data to help the model learn more generalized features by designing the Reverse Diffusion Data Augmentation Module. Finally, we re-train the classifier on the augmented dataset for MDD diagnosis. We conducted comprehensive experiments to test the overall performance and each module’s effectiveness. The framework was validated on two public MDD diagnosis datasets, achieving state-of-the-art performance. |
|
PCC_M006 | |
Talk title: Simplifying Multimodal With Single EOG Modality for Automatic Sleep Staging
Abstract: Polysomnography (PSG) recordings, which contain signals from multiple modalities (e.g., EEG and EOG), have been widely used for sleep staging in clinics. Recently, many studies have combined EEG and EOG modalities for sleep staging, since they are the most and the second most powerful modalities for sleep staging among PSG recordings, respectively. However, EEG is complex to collect and sensitive to environmental noise or other body activities, impeding its use in clinical practice. By comparison, EOG is much easier to obtain. In order to make full use of the powerful ability of EEG and the easy collection of EOG, we propose a novel framework to simplify multimodal sleep staging with a single EOG modality. It still performs well with only the EOG modality in the absence of EEG. Specifically, we first model the correlation between EEG and EOG, and then, based on the correlation, we generate multimodal features with time and frequency guided generators by adopting the idea of generative adversarial learning. We collected a real-world sleep dataset containing 67 recordings and used four other public datasets for evaluation. Compared with other existing sleep staging methods, our framework performs the best when using only the EOG modality. Moreover, under our framework, EOG provides performance comparable to EEG. |
|
PCC_M007 | |
Talk title: A Multiscale Cross-modal Interactive Fusion Network for Human Activity Recognition Using Wearable Sensors and Smartphones
Abstract: Human activity recognition (HAR) enables real-time monitoring of human movement, posture, and activity level, and can provide valuable information for health management. With the continuous advancement of Internet of Things (IoT) technology, wearable sensors and smartphones equipped with various types of sensors have become widely utilized to collect multimodal data for HAR. However, in multimodal HAR, current fusion methods fall short in capturing inter-modality correlations, hampering the full exploitation of complementary information between modalities and leading to lower recognition accuracy. We thus propose a novel multiscale cross-modal interactive fusion network (MCIFN), which can fully capture correlations between various modalities and obtain an effective fused representation for HAR. Specifically, we employ a multiscale parallel convolution module to extract features from each modality at multiple scales. Then, an interactive fusion strategy based on the cross-modal attention mechanism is introduced to adjust and enhance each modality based on its correlations with other modalities. Additionally, to resolve the information redundancy caused by the interactive fusion strategy, we utilize a hybrid attention module to focus on important information in the fused representation. Extensive experiments conducted on three publicly available datasets and one private dataset demonstrate that our proposed network outperforms the previous baseline networks for HAR. Additionally, our proposed fusion strategy yielded a notable improvement in accuracy ranging from 1.87% to 9.96% compared to existing strategies. These findings imply that our newly proposed network can realize comprehensive multimodal fusion and effectively enhance HAR accuracy, potentially contributing to advancements in individual health management and personalized healthcare interventions. |
|
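The cross-modal attention idea in the MCIFN abstract, where one modality is adjusted according to its correlations with another, can be sketched with plain scaled dot-product attention. This is a generic textbook formulation, not the MCIFN architecture: the function names are hypothetical, the learned query/key/value projections are omitted, and the residual addition in `enhance` is an assumed design choice.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over key rows.

    queries come from one modality; keys and values come from another.
    Returns (attended_rows, attention_weights).
    """
    dim = len(keys[0])
    attended, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        weights.append(w)
        attended.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return attended, weights


def enhance(modality_a, modality_b):
    """Add to each row of modality_a the information it attends to in modality_b."""
    attended, _ = cross_attention(modality_a, modality_b, modality_b)
    return [[x + y for x, y in zip(row_a, row_att)] for row_a, row_att in zip(modality_a, attended)]


if __name__ == "__main__":
    accel = [[1.0, 0.0], [0.0, 1.0]]             # two time steps, two features
    gyro = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]  # three time steps, two features
    print(enhance(accel, gyro))
```

Applying `enhance` in both directions (accelerometer attending to gyroscope and vice versa) gives each modality a view of the other before fusion.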
PCC_M008 | |
Talk title: Smart Garment: A Long-Term Feasible, Whole-Body Textile Pressure-Sensing System
Abstract: Tactile sensation, which spans the whole body surface, is important for human beings. To understand what kind of force distribution our bodies might sense, we created a pressure-sensing garment set consisting of a sweater and trousers, which provides 1952 sensing points and covers 80% of the body surface evenly. As our skin works day and night, an ideal pressure acquisition system for this purpose should also feature both high temporal coverage and population coverage, placing simultaneous demands on wearability, durability, and affordability. Special care was thus given to all design procedures, from material selection, sensor structure, and electronic-driving architecture to garment design. The capability of this smart garment to obtain rich information about both the wearer and the environment is then demonstrated, including but not limited to the recognition of postures, self-contacts, object contacts, and interactions. |
|
PCC_M009 | |
Talk title: HDTSLR: A Framework Based on Hierarchical Dynamic Positional Encoding for Sign Language Recognition
Abstract: Sign language is the basic way for people with hearing impairment to communicate, and sign language recognition (SLR) could effectively help in this regard. Mainstream Transformer-based SLR requires positional encoding (PE) to sense the positional information of the data. However, existing PE methods encode the sign data globally, which weakens or even ignores the sequence variation within gestures. This article proposes HDTSLR, a Transformer-based SLR framework built on hierarchical dynamic positional encoding (HDPE) that enhances individual gesture sequence features while preserving the overall temporal features of the sign. HDPE designs a semantic positional encoding that uses predefined scale functions with trainable biases to emphasize semantic relationships in the sign. The designed lexical positional encoding uses the t-distribution to explore the unique variation of gestures. Before the HDPE operation, the sign language data is split into equal-length feature clips, while feature extraction and chunking are performed by the autoencoder. The feature clips with significant changes within a gesture chunk are further selected and aggregated with the remaining ones by deforming the Gram matrix. In addition, HDTSLR is evaluated on one-handed and two-handed datasets, achieving word error rates of 16.59% and 21.67%, respectively. Comparison experiments show that it outperforms known SLR methods in both accuracy and robustness. |
|
PCC_M010 | |
Talk title: UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural Language
Abstract: We introduce UbiPhysio, a milestone framework that delivers fine-grained action description and feedback in natural language to support people's daily functioning, fitness, and rehabilitation activities. This expert-like capability assists users in properly executing actions and maintaining engagement in remote fitness and rehabilitation programs. Specifically, the proposed UbiPhysio framework comprises a fine-grained action descriptor and a knowledge retrieval-enhanced feedback module. The action descriptor translates action data, represented by a set of biomechanical movement features we designed based on clinical priors, into textual descriptions of action types and potential movement patterns. Building on physiotherapeutic domain knowledge, the feedback module provides clear and engaging expert feedback. We evaluated UbiPhysio's performance through extensive experiments with data from 104 diverse participants, collected in a home-like setting during 25 types of everyday activities and exercises. We assessed the quality of the language output under different tuning strategies using standard benchmarks. We conducted a user study to gather insights from clinical physiotherapists and potential users about our framework. Our initial tests show promise for deploying UbiPhysio in real-life settings without specialized devices. |
|
PCC_M011 | |
Talk title: Integrating Gaze and Mouse Via Joint Cross-Attention Fusion Net for Students’ Activity Recognition in E-learning
Abstract: E-learning has emerged as an indispensable educational mode in the post-epidemic era. However, this mode makes it difficult for students to stay engaged in learning without appropriate activity monitoring. Our work explores a promising solution that combines gaze and mouse data to recognize students' activities, thereby facilitating activity monitoring and analysis during e-learning. We first surveyed 200 students from a local university and found greater acceptance of eye trackers and mouse loggers than of video surveillance. We then designed eight routine digital activities for students in order to collect a multimodal dataset and analyze the patterns and correlations between gaze and mouse across various activities. Our proposed Joint Cross-Attention Fusion Net, a multimodal activity recognition framework, leverages the gaze-mouse relationship to yield improved classification performance by integrating cross-modal representations through a cross-attention mechanism and integrating the joint features that characterize gaze-mouse coordination. Evaluation results show that our method can achieve an F1 score of up to 94.87% in predicting eight activity classes, an improvement of at least 7.44% over using gaze or mouse data independently. This research illuminates new possibilities for monitoring student engagement in intelligent education systems and suggests a promising strategy for melding perception and action modalities in behavioral analysis across a range of ubiquitous computing environments. |
PCC_W001 | |
---|---|
Talk title: LiqDetector: Enabling Container-Independent Liquid Detection with mmWave Signals Based on a Dual-Reflection Model Learning
Abstract: With the advancement of wireless sensing technologies, RF-based contactless liquid detection is attracting more and more attention. Compared with other RF devices, the mmWave radar has the advantages of large bandwidth and low cost. While existing radar-based liquid detection systems demonstrate promising performance, they share a shortcoming: the detection result depends on container-related factors (e.g., container placement, container caliber, and container material). In this paper, to enable container-independent liquid detection with a COTS mmWave radar, we propose a dual-reflection model by exploring reflections from different interfaces of the liquid container. Specifically, we design a pair of amplitude ratios based on the signals reflected from different interfaces, and theoretically demonstrate how the refractive index of liquids can be estimated by eliminating the container’s impact. To validate the proposed approach, we implement a liquid detection system, LiqDetector. Experimental results show that LiqDetector achieves cross-container estimation of the liquid’s refractive index with a mean absolute percentage error (MAPE) of about 4.4%. Moreover, the classification accuracies for 6 different liquids and for alcohol of different strengths (even with a difference of 1%) exceed 96% and 95%, respectively. To the best of our knowledge, this is the first study that achieves container-independent liquid detection based on a COTS mmWave radar by leveraging only one pair of Tx-Rx antennas. |
|
PCC_W002 | |
Talk title: WiProfile: Unlocking Diffraction Effects for Sub-Centimeter Target Profiling Using Commodity WiFi Devices
Abstract: Despite intensive research efforts in radio frequency noncontact sensing, capturing fine-grained geometric properties of objects, such as shape and size, remains an open problem using commodity WiFi devices. Prior attempts are incapable of characterizing object shape or size because they predominantly rely on weak signals reflected off objects in a very small number of directions. In this paper, motivated by the observation that the diffracted signals around an object between two WiFi devices carry the contour information of the object, we formulate the problem of reconstructing the 2D target profile and develop WiProfile, the first WiFi-based system that unlocks the diffraction effects for target profiling. We introduce a CSI-Profile model to characterize the relationship between the CSI measured at different target positions and the target profile in the diffraction zone. With suitable approximations, the inverse problem of deriving the target profile from CSI can be solved by the inverse Fresnel transform. To mitigate CSI measurement errors on commodity WiFi devices, we propose a novel antenna placement strategy. Comprehensive experiments demonstrate that WiProfile can accurately reconstruct profiles with median absolute errors of less than 1 cm under various conditions, and effectively estimate the profiles of everyday objects of diverse shapes, sizes, and materials. We believe this work opens up new directions for fine-grained target imaging using commodity WiFi devices. |
|
PCC_W003 | |
Talk title: Robust WiFi Respiration Sensing in the Presence of Interfering Individual
Abstract: WiFi-based respiration sensing technology has gained increasing attention due to its contactless sensing capabilities and utilization of existing WiFi devices. However, existing studies are limited to certain scenarios and do not address the motion interference from other individuals. In this paper, we tackle the challenge of robust respiration sensing in the presence of other individuals. Specifically, through an in-depth examination of the correlation between respiratory signals and spatial beam patterns, we develop a respiratory-energy based approach to evaluate the diverse impact of dynamic interference on respiratory signals. When significant interference is detected, we employ a convex-optimization-based beam control strategy, which exploits the inherent characteristics of human respiration, to adaptively adjust the spatial beam pattern. This approach enables a robust and precise gain adjustment between the target and the interfering individual, effectively mitigating the impact of interference. Experimental results demonstrate that our approach can reduce the mean absolute error (MAE) of respiration detection by up to 32% compared to state-of-the-art methods, significantly enhancing the accuracy and robustness of WiFi-based respiration sensing. |
|
PCC_W004 | |
Talk title: UWB-enabled Sensing for Fast and Effortless Blood Pressure Monitoring
Abstract: Blood Pressure (BP) is a critical vital sign for assessing cardiovascular health. However, existing cuff-based and wearable-based BP measurement methods require direct contact between the user's skin and the device, resulting in poor user experience and limited engagement for regular daily monitoring of BP. In this paper, we propose a contactless approach using Ultra-WideBand (UWB) signals for regular daily BP monitoring. To remove components of the received signals that are not related to the pulse waves, we propose two methods that utilize peak detection and principal component analysis to identify aliased and deformed parts. Furthermore, to extract BP-related features and improve the accuracy of BP prediction, particularly for hypertensive users, we construct a deep learning model that extracts features of pulse waves at different scales and identifies the different effects of features on BP. We build the corresponding BP monitoring system, named RF-BP, and conduct extensive experiments on both a public dataset and a self-built dataset. The experimental results show that RF-BP can accurately predict the BP of users. On the self-built dataset, the mean absolute error (MAE) and standard deviation (SD) for SBP are 6.5 mmHg and 6.1 mmHg, and the MAE and SD for DBP are 4.7 mmHg and 4.9 mmHg. |
|
PCC_W005 | |
Talk title: PmTrack: Enabling Personalized mmWave-based Human Tracking
Abstract: The difficulty in obtaining targets' identity poses a significant obstacle to the pursuit of personalized and customized millimeter-wave (mmWave) sensing. Existing solutions that learn individual differences from signal features have limitations in practical applications. This paper presents a Personalized mmWave-based human Tracking system, PmTrack, by introducing inertial measurement units (IMUs) as identity indicators. Widely available in portable devices such as smartwatches and smartphones, IMUs use existing wireless networks to upload identity and data, and are therefore able to assist in radar target identification in a lightweight manner with little deployment and carrying burden for users. PmTrack adopts orientation as the matching feature, thus overcoming the data heterogeneity between radar and IMU while avoiding the effect of cumulative errors. In the implementation of PmTrack, we propose a comprehensive set of optimization methods for detection enhancement, interference suppression, continuity maintenance, and trajectory correction, which solve a series of practical problems caused by the three major challenges of weak reflection, point cloud overlap, and body-bounce ghosts in multi-person tracking. In addition, an orientation correction method is proposed to overcome IMU gimbal lock. Extensive experimental results demonstrate that PmTrack achieves identification accuracies of 98% and 95% with five people in the hall and the meeting room, respectively. |
|
PCC_W006 | |
Talk title: Waffle: A Waterproof mmWave-based Human Sensing System inside Bathrooms with Running Water
Abstract: The bathroom has consistently ranked among the most perilous rooms in households, with slip and fall incidents during showers posing a critical threat, particularly to the elderly. To address this concern while ensuring privacy and accuracy, the mmWave-based sensing system has emerged as a promising solution. Capable of precisely detecting human activities and promptly triggering alarms in response to critical events, it has proved especially valuable within bathroom environments. However, deploying such a system in bathrooms faces a significant challenge: interference from running water. Similar to the human body, water droplets reflect substantial mmWave signals, presenting a major obstacle to accurate sensing. Through rigorous empirical study, we confirm that the interference caused by running water adheres to a Weibull distribution, offering insight into its behavior. Leveraging this understanding, we propose a customized Constant False Alarm Rate (CFAR) detector, specifically tailored to handle the interference from running water. This detector effectively isolates human-generated signals, thus enabling accurate human detection even in the presence of running water interference. Our implementation of "Waffle" on a commercial off-the-shelf mmWave radar demonstrates exceptional sensing performance. It achieves median errors of 1.8 cm and 6.9 cm for human height estimation and tracking, respectively, even in the presence of running water. Furthermore, our fall detection system, built upon this technique, achieves remarkable performance (a recall of 97.2% and an accuracy of 97.8%), surpassing the state-of-the-art method. |
|
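The link between a Weibull interference model and a CFAR threshold in the Waffle abstract can be made concrete with a small sketch. If interference amplitude follows a Weibull distribution with scale λ and shape k, then P(X > T) = exp(-(T/λ)^k), so a target false-alarm probability P_fa gives T = λ(-ln P_fa)^(1/k). The code below is not the Waffle detector: it assumes the scale and shape are already known, whereas a practical CFAR would estimate them from reference cells, and the function names and parameter values are hypothetical.

```python
import math
import random


def weibull_cfar_threshold(scale, shape, pfa):
    """Threshold T with P(X > T) = pfa for Weibull(scale, shape) interference."""
    if not 0.0 < pfa < 1.0:
        raise ValueError("pfa must lie strictly between 0 and 1")
    return scale * (-math.log(pfa)) ** (1.0 / shape)


def detect(samples, scale, shape, pfa):
    """Flag samples that exceed the constant-false-alarm-rate threshold."""
    threshold = weibull_cfar_threshold(scale, shape, pfa)
    return [x > threshold for x in samples]


if __name__ == "__main__":
    rng = random.Random(1)
    # Assumed interference parameters, for illustration only.
    scale, shape, pfa = 2.0, 1.5, 0.01
    clutter = [rng.weibullvariate(scale, shape) for _ in range(200_000)]
    false_alarms = sum(detect(clutter, scale, shape, pfa))
    print(f"empirical false-alarm rate: {false_alarms / len(clutter):.4f}")
```

Because the threshold is derived from the interference distribution rather than fixed by hand, the false-alarm rate stays near the chosen `pfa` as the interference level changes, provided the parameters track it.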
PCC_W007 | |
Talk title: XRF55: A Radio Frequency Dataset for Human Indoor Action Analysis
Abstract: Radio frequency (RF) devices such as Wi-Fi transceivers, radio frequency identification tags, and millimeter-wave radars have appeared in large numbers in daily life. The presence and movement of humans can affect the propagation of RF signals, and this phenomenon is exploited for human action recognition. However, current works have many limitations, including the unavailability of datasets, insufficient training samples, and simple or limited action categories for specific applications, which seriously hinder the growth of RF solutions and present a significant obstacle in transitioning RF sensing research from the laboratory to a wide range of everyday applications. To facilitate this transition, we introduce and release a large-scale multiple radio frequency dataset, named XRF55, for indoor human action analysis. XRF55 encompasses 42.9K RF samples and 55 action classes of human-object interactions, human-human interactions, fitness, body motions, and human-computer interactions, collected from 39 subjects within 100 days. These actions were meticulously selected from 19 RF sensing papers and 16 video action recognition datasets. XRF55 contains 23 RFID tags at 922.38 MHz, 9 Wi-Fi links at 5.64 GHz, one mmWave radar at 60-64 GHz, and one Azure Kinect with RGB+D+IR sensors, covering frequencies across decimeter wave, centimeter wave, and millimeter wave. In addition, we apply a mutual learning strategy over XRF55 for the task of action recognition. Unlike simple modality fusion, under mutual learning the three RF modalities are trained collaboratively and then work independently. We find that these three RF modalities promote each other. It is worth mentioning that, with the synchronized Kinect, XRF55 also supports the exploration of action detection, action segmentation, pose estimation, human parsing, mesh reconstruction, etc., with RF-only or RF-Vision approaches. |
|
PCC_W008 | |
Talk title: Beamforming for Sensing: Hybrid Beamforming based on Transmitter-Receiver Collaboration for Millimeter-Wave Sensing
Abstract: Previous mmWave sensing solutions assumed good signal quality. Ensuring an unblocked or strengthened LoS path is challenging. Therefore, finding an NLoS path is crucial to enhancing perceived signal quality. This paper proposes Trebsen, a Transmitter-REceiver collaboration-based Beamforming SENsing system using commercial mmWave radars. Specifically, we define the hybrid beamforming problem as an optimization problem involving a beamforming angle search based on transmitter-receiver collaboration. We derive a comprehensive expression for parameter optimization by modeling the signal attenuation variations resulting from the propagation path. To comprehensively assess the perceived signal quality, we design a novel metric, the perceived signal-to-interference-plus-noise ratio (PSINR), which combines the carrier signal and the baseband signal to quantify the signal quality of fine-grained sensed motion. Considering the high time cost of exhaustive or random search methods, we employ a search method based on deep reinforcement learning to quickly explore optimal beamforming angles at both the transmitter and the receiver. We implement Trebsen and evaluate its performance in a fine-grained sensing application (i.e., heartbeat). Experimental results show that Trebsen significantly enhances heartbeat sensing performance in blocked or weakened LoS scenes. Compared with no beamforming, Trebsen demonstrates a reduction of 23.6% in HR error and 27.47% in IBI error. Moreover, compared with random search, Trebsen exhibits a 90% increase in speed. |
|
PCC_W009 | |
Talk title: Understanding the Diffraction Model in Static Multipath-Rich Environments for WiFi Sensing System Design
Abstract: Although WiFi-based contactless sensing has made significant progress in the past decade, most prior work still focuses on the reflection zone far from WiFi transceivers, while few studies explore the diffraction zone near transceivers. Additionally, previous diffraction models only consider the CSI amplitude signal and ignore the impact of multipath. In this work, we develop an accurate diffraction model to characterize the relationship between both CSI amplitude and phase and the target's movement in the diffraction zone. We further derive the deformed forms of the model under static multipath conditions and find that the CSI patterns vary significantly with multipath. Consequently, the common assumption in existing work of a one-to-one mapping between CSI patterns and activities fails due to multipath, degrading sensing performance when multipath changes. To address this challenge, we propose to extract a relative change pattern from CSI signals to recover the one-to-one mapping and eliminate the impact of static multipath. Extensive experiments under various multipath conditions demonstrate an accuracy higher than 96% for coarse-grained intrusion detection and an average error of 0.6 bpm for fine-grained respiration monitoring. |
|
PCC_W010 | |
Talk title: Push the Limit of Highly Accurate Ranging on Commercial UWB Devices
Abstract: Ranging plays a crucial role in many wireless sensing applications. Among the wireless techniques employed for ranging, Ultra-Wideband (UWB) has received much attention due to its excellent performance and widespread integration into consumer-level electronics. However, the ranging accuracy of current UWB systems is limited to the centimeter level due to bandwidth limitation, hindering their use in applications that require a very high resolution. This paper proposes a novel system that achieves sub-millimeter-level ranging accuracy on commercial UWB devices for the first time. Our approach leverages the fine-grained phase information of commercial UWB devices. To eliminate the phase drift, we design a fine-grained phase recovery method that utilizes the bi-directional messages in UWB two-way ranging. We further present a dual-frequency switching method to resolve phase ambiguity. Building upon this, we design and implement the ranging system on commercial UWB modules. Extensive experiments demonstrate that our system achieves a median ranging error of just 0.77 mm, reducing the error by 96.54% compared to the state-of-the-art method. We also present three real-life applications to showcase the fine-grained sensing capabilities of our system, including i) smart speaker control, ii) free-style user handwriting, and iii) 3D tracking for virtual-reality (VR) controllers. |
|
PCC_W011 | |
报告题目:Revisiting Cardinality Estimation in COTS RFID Systems
报告摘要:With 30 billion RFID tags sold worldwide in 2021, a common basic functionality needed by RFID-enabled applications is cardinality estimation --- to quickly estimate the number of distinct tags in an RFID system. Although many advanced solutions have been proposed over the past decade, they suffer from one major limitation in practical use: they need to either modify the existing RFID standard or obtain MAC-layer information, both of which however cannot be supported by commercial off-the-shelf (COTS) devices. In this paper, we revisit the counting problem and propose a novel counting scheme called average time duration based counter (ATD) that quickly estimates the number of distinct tags in a standards-compliant manner. Compared with existing work, the competitive advantage of ATD is that it can be directly deployed on a COTS RFID system, with no need for any hardware modifications. In ATD, we found a new and measurable indicator --- the time duration between two adjacent singleton slots, which depends on the number of tags. Following this observation, we derive the theoretical relationship between the time indicator and the number of tags and then give the proof of the estimation as well as its parameter settings. Additionally, we propose a flag-flipping solution to address the overlapping problem in the multi-reader case. We implement ATD in a COTS RFID system with 1000 tags. Experimental results show that ATD is 4.2 times faster than the baseline of tag inventory; the performance gain will be further increased in a larger RFID system. |
PCC_C001 | |
---|---|
报告题目:Make Partition Fit Task: A Novel Framework for Joint Learning of City Region Partition and Representation
报告摘要:The proliferation of multimodal big data in cities provides unprecedented opportunities for modeling and forecasting urban problems, e.g., crime prediction and house price prediction, through data-driven approaches. A fundamental and critical issue in modeling and forecasting urban problems lies in identifying suitable spatial analysis units, also known as city region partition. Existing works rely on subjective domain knowledge for static partitions, which is general and universal for all tasks. In fact, different tasks may need different city region partitions. To address this issue, we propose a task-oriented framework for Joint Learning of region Partition and Representation (JLPR for short hereafter). To make partition fit task, JLPR integrates the region partition into the representation model training and learns region partitions using the supervision signal from the downstream task. We evaluate the framework on two prediction tasks (i.e., crime prediction and housing price prediction) in Chicago. Experiments show that JLPR consistently outperforms state-of-the-art partitioning methods in both tasks, which achieves above 25% and 70% performance improvements in terms of Mean Absolute Error (MAE) for crime prediction and house price prediction tasks, respectively. Additionally, we meticulously undertake three visualization case studies, which yield profound and illuminating findings from diverse perspectives, demonstrating the remarkable effectiveness and superiority of our approach. |
|
PCC_C002 | |
报告题目:Coupling Makes Better: An Intertwined Neural Network for Taxi and Ridesourcing Demand Co-Prediction
报告摘要:While a variety of innovative travel modes, such as taxi service and ridesourcing service, have been launched to improve transportation efficiency, people still encounter travel problems in real life. The major cause is the imbalance between transportation supply and demand. To strike a balance, it is well-recognized that an accurate and timely passenger demand prediction model is the foundation to enable high-level human intelligence (i.e., taxi drivers) or machine intelligence (i.e., ride-hailing platforms) to allocate resources in advance. Although quite a lot of deep models have been designed to model the complicated spatial and temporal dependencies in a data-driven way, they focus on the demand prediction of a single mode and ignore the fact that passengers may shift between different modes, especially between taxis and ridesourcing cars. In this paper, we target a co-prediction problem that considers the prediction of taxi and ridesourcing demand as two coupled and associated tasks, and propose a novel Temporal and Spatial Intertwined Network (TSIN) that consists of two twin components and an intertwined component. Each twin in the TSIN model is able to extract spatial and temporal dependencies from its corresponding travel mode separately (i.e., intra-mode features), and the in-between intertwined component is designed to bridge the twins and allow them to exchange information (i.e., inter-mode features), thus enabling better prediction. We first evaluate our model on four real-world datasets. Results demonstrate the outstanding performance of our model and the necessity of taking into account the influence between modes. Using additional bike demand data from NYC, we then discuss the generalizability of coupling more transportation modes. Further results demonstrate that our proposed intertwined neural network is highly flexible and extendable, and can yield better prediction performance. |
|
PCC_C003 | |
报告题目:Seeking Based on Dynamic Prices: Higher Earnings and Better Strategies in Ride-on-demand Services
报告摘要:In recent years, ride-on-demand (RoD) services such as Uber and DiDi have become increasingly popular. Different from traditional taxi services, RoD services adopt dynamic pricing mechanisms to manipulate the supply and demand on the road, and such mechanisms improve service capacity and quality. Seeking route recommendation has been widely studied in taxi service. In RoD service, the dynamic price is a new and accurate indicator describing the supply and demand, but it has rarely been studied as a source of clues for drivers seeking passengers. In this paper, we propose to incorporate the impacts of dynamic prices as a key factor in recommending seeking routes to drivers. We first justify why it is necessary to recommend seeking routes and consider dynamic prices, by analyzing real service data from a typical RoD service. We then design a reinforcement learning model based on order and GPS trajectory datasets, and take into account dynamic prices in the design. Results show that our model improves both driver earnings and seeking strategies. On driver earnings, the reinforcement learning model increases revenue efficiency by up to 34.52%, and considering dynamic prices leads to another increase of 6.19%. On seeking strategies, drivers are encouraged to serve local demand first, and they are redistributed more evenly and effectively. |
|
PCC_C004 | |
报告题目:Data-Driven Pick-Up Location Recommendation for Ride-Hailing Services
报告摘要:Ride-hailing service (RHS) has become an important transportation mode in our daily life. Although many works have been proposed to improve RHS from different aspects, only a few works focus on the selection of pick-up locations, where rider and driver meet and start a trip. In this paper, we present MPLRec, a data-driven pick-up location recommendation system that exploits riders' specific mobility demands, e.g., destination, and historical experiences to meet riders' travel requirements. MPLRec generates potential pick-up locations over the road network and characterizes them with rich features that describe a location from the riders' perspective. We also build spatio-temporal indexes to organize potential pick-up locations and historical data to facilitate online recommendation. When processing an online recommendation request, MPLRec derives candidate pick-up locations and investigates them with materialized features, which are computed from historical order and trajectory data while considering the rider's mobility demands. Based on these features, a novel scoring function is used to derive the best pick-up location for each request. Moreover, we implement an RHS simulator to evaluate MPLRec using large-scale practical ride-hailing datasets. Extensive experiments and simulations demonstrate the effectiveness and efficiency of MPLRec, which can complete each request within 0.5 s and largely reduce the ride-hailing costs when compared to baseline methods. |
|
PCC_C005 | |
报告题目:Federated Representation Learning With Data Heterogeneity for Human Mobility Prediction
报告摘要:The advancement of smart wearable devices and location-based smart services has enabled a new paradigm for smart human mobility prediction (HMP), which has a broad range of applications in smart healthcare and smart cities. Due to privacy concerns and rigorous data regulations, federated learning provides a distributed learning framework to collaboratively train the HMP model without sharing the highly sensitive location data with others. However, in real-world scenarios, federated human mobility prediction suffers from the data heterogeneity challenge, which includes two main aspects: heterogeneous mobility patterns and data scarcity. In this paper, we propose an end-to-end federated representation learning framework for human mobility prediction, named FR-HMP, to overcome the above obstacles. Specifically, in order to enhance the representation abilities of data-scarce clients, a two-phase learning process is proposed. The clustering module clusters similar clients together on the parameter server to address the heterogeneous mobility patterns, and the representation learning module learns the enhanced representations of each client through the graph learning layer and graph convolution layer on the third-party server. Finally, extensive experiments are conducted using two diverse real-world HMP datasets to show the advantages of FR-HMP over state-of-the-art methods. |
|
PCC_C006 | |
报告题目:A Unified Model for Spatio-Temporal Prediction Queries with Arbitrary Modifiable Areal Units
报告摘要:Spatio-Temporal (ST) prediction is crucial for making informed decisions in urban location-based applications like ride-sharing. However, existing ST models often require region partition as a prerequisite, resulting in two main pitfalls. Firstly, location-based services necessitate ad-hoc regions for various purposes, requiring multiple ST models with varying scales and zones, which can be costly to support. Secondly, different ST models may produce conflicting outputs, resulting in confusing predictions. In this paper, we propose One4All-ST, a framework that can conduct ST prediction for arbitrary modifiable areal units using only one model. To reduce the cost of getting multiscale predictions, we design an ST network with hierarchical spatial modeling and scale normalization modules to efficiently and equally learn multi-scale representations. To address prediction inconsistencies across scales, we propose a dynamic programming scheme to solve the formulated optimal combination problem, minimizing predicted error through theoretical analysis. Besides, we suggest using an extended quad-tree to index the optimal combinations for quick response to arbitrary modifiable areal units in practical online scenarios. Extensive experiments on two real-world datasets verify the efficiency and effectiveness of One4All-ST in ST prediction for arbitrary modifiable areal units. The source codes and data of this work are available at https://github.com/uctb/One4All-ST. |
|
PCC_C007 | |
报告题目:Privacy Leakage from Dynamic Prices: Trip Purpose Mining as an Example
报告摘要:Dynamic prices are used in many scenarios, e.g., flight ticketing, hotel room booking and ride-on-demand (RoD) service such as Uber and DiDi, and while they are beneficial for service providers, practitioners or users, they lead to the concern of privacy leakage -- the possibility of learning user information from dynamic prices. In this paper, we aim to study this possibility and choose trip purpose mining in RoD service as an attack example, based on real-world large datasets. We discuss the criteria of choosing datasets -- ubiquitous, collective and easily accessible -- from the perspective of an attacker, and extract features describing trip information, spatio-temporal and dynamic prices context. The trip purpose mining problem is then solved as a multi-class classification problem and multiple binary-class problems. In the multi-class problem, we verify that dynamic prices information results in a 17.1% improvement in classification accuracy; in the binary-class problems, we quantify feature contributions and explain the different extents of privacy leakage in identifying different trip purposes. Our hope is that the study not only serves as a case study demonstrating the privacy leakage problem in RoD service, but also sheds light on such privacy problem in other services using dynamic prices and triggers more research efforts. |
|
PCC_C008 | |
报告题目:RF-Boundary: RFID-Based Virtual Boundary
报告摘要:A boundary is a physical or virtual line that marks the edge or limit of a specific region, and it has been widely used in many applications, such as autonomous driving, virtual walls, and robotic lawn mowers. However, no existing work balances the cost, the deployability, and the scalability of a boundary well. In this paper, we propose a new RFID-based boundary scheme together with its detection algorithm, called RF-Boundary, which has the competitive advantages of being battery-free, low-cost, and easy to maintain. We develop two techniques, phase gradient and dual-antenna DoA, to address the key challenges posed by RF-Boundary, namely the lack of calibration information and multi-edge interference. We implement a prototype of RF-Boundary with commercial RFID systems and a mobile robot. Extensive experiments verify the feasibility as well as the good performance of RF-Boundary. |
|
PCC_C009 | |
报告题目:Adaptive Budgeting for Collaborative Multi-Task Data Collection in Online Sparse Crowdsensing
报告摘要:Sparse crowdsensing collects data from a subset of the sensing area and infers data for unsensed areas, reducing data collection costs. Previous works have primarily focused on independently collecting and inferring single types of data. However, real-world scenarios often involve multiple types of data that can complement each other by providing missing spatiotemporal distribution information. In this paper, we fully consider both intra-data correlations among data of the same type and inter-data correlations among data of different types, enabling collaborative execution of various tasks. In addition, we enhance the adaptability in practical application scenarios by utilizing real-time collected sparse data to guide task execution. For this purpose, we propose a multi-task adaptive budgeting framework for online sparse crowdsensing, called MTAB-SC. This framework consists of three parts: training data updating, data inference, and data collection. First, we propose a multi-task data updating method to keep models up-to-date. Second, we design a data inference network for multi-task data joint inference. Finally, to allocate suitable budgets for each task and facilitate collaborative data collection across multiple tasks, we propose an Adaptive Budgeting for Collaborative Data Collection model (AB-CoDC). The effectiveness of our proposals is demonstrated through extensive experiments on two real-world datasets. |
|
PCC_C010 | |
报告题目:Edge-Assisted Spectrum Sharing for Freshness-Aware Industrial Wireless Networks: A Learning-Based Approach
报告摘要:Information freshness is essential to industrial wireless networks (IWNs) and can be quantified by the age-of-information (AoI) metric. This paper addresses an AoI-aware spectrum sharing (AgeS) problem in IWNs, where multiple device-to-device (D2D) links opportunistically access the spectrum to satisfy their AoI constraints while maximizing primal links’ throughput. Particularly, we orchestrate the access of D2D links in a distributed manner. Since distributed scheduling results in incomplete observation, D2D links share the spectrum with uncertainty about the transmission environment. Therefore, we propose a distributed scheduling scheme, called D-age, to deal with the transmission uncertainty in the AgeS problem, where an adaptation of the actor-critic method is adopted with AoI constraints tackled in the dual domain. To address the non-stationary environment and the multi-agent credit assignment issue, a cooperative multi-agent reinforcement learning (MARL) approach is developed, where multiple local actors are designed to guide D2D links to make real-time decisions via distributed scheduling policies, which are evaluated by an edge-assisted global critic with action-aware advantage functions. Integrated with graph attention networks (GATs), the critic selectively learns contextual information by assigning different importance to neighboring links, which enables the evaluation of scheduling policies in a scalable and computation-efficient manner. A theoretical guarantee of the time-averaged AoI constraints is provided, and the effectiveness of D-age in terms of both AoI violation ratio and the capacity of primal links is demonstrated by simulation. |
|
PCC_C011 | |
报告题目:Citywide LoRa Network Deployment and Operation: Measurements, Analysis, and Implications
报告摘要:LoRa, as a representative Low-Power Wide-Area Network (LPWAN) technology, holds tremendous potential for various city and industrial applications. However, as there are few real large-scale deployments, it is unclear whether and how well LoRa can eventually meet its prospects. In this paper, we demystify the real performance of LoRa by deploying and measuring a citywide LoRa network, named CityWAN, which consists of 100 gateways and 19,821 LoRa end nodes, covering an area of 130 km^2 for 12 applications. Our measurement focuses on the following perspectives: (i) Performance of applications running on the citywide LoRa network; (ii) Infrastructure efficiency and deployment optimization; (iii) Physical layer signal features and link performance; (iv) Energy profiling and cost estimation for LoRa applications. The results reveal that LoRa performance in urban settings is bottlenecked by the prevalent blind spots, and there is a gap between the gateway efficiency and network coverage for the infrastructure deployment. Besides, we find that LoRa links at the physical layer are susceptible to environmental variations, and LoRa and other LPWANs show diverse costs for different scenarios. Our measurement provides insights for large-scale LoRa network deployment and also for future academic research to fully unleash the potential of LoRa. |
PCC_S001 | |
---|---|
报告题目:RFSpy: Eavesdropping on Online Conversations with Out-of-Vocabulary Words by Sensing Metal Coil Vibration of Headsets Leveraging RFID
报告摘要:Eavesdropping on human sound is one of the most common but harmful ways to threaten personal privacy. As one of the most essential accessories, headsets have been widely used in common online conversations, such as online calls, video meetings, etc. The metal coil vibration patterns of headset speakers/microphones have been proven to be highly correlated with the speaker-produced/microphone-received sound content. This paper presents an online conversation eavesdropping system, RFSpy, which uses only one RFID tag attached to a headset to alternately sense the metal coil vibrations of the headset speaker and microphone for eavesdropping on speaker-produced and microphone-received sound. In some accessible scenarios, such as meeting rooms, offices, etc., we assume that attackers secretly attach a small, battery-free RFID tag under one ear cushion of an eavesdropped user’s headset without being noticed. Meanwhile, RFID readers are camouflaged as decorations placed inside or outside rooms to transmit and receive RF signals. When the eavesdropped user talks with other users online by using the headset, RFSpy first activates the RFID tag attached to the headset to capture the metal coil vibration patterns of the headset speaker and microphone from RF signals. Then, RFSpy reconstructs sound spectrograms from the RF signal-based vibration patterns for not only trained words but also untrained (i.e., out-of-vocabulary) words by utilizing a designed Sound Spectrogram Reconstruction (SSR) network. Finally, RFSpy converts the sound spectrograms to conversation content through a sound recognition API. Extensive experiments in real environments demonstrate that RFSpy can eavesdrop on online conversations with out-of-vocabulary (OOV) words effectively. |
|
PCC_S002 | |
报告题目:The EarSAVAS Dataset: Enabling Subject-Aware Vocal Activity Sensing on Earables
报告摘要:Subject-aware vocal activity sensing on wearables, which specifically recognizes and monitors the wearer's distinct vocal activities, is essential in advancing personal health monitoring and enabling context-aware applications. While recent advancements in earables present new opportunities, the absence of relevant datasets and effective methods remains a significant challenge. In this paper, we introduce EarSAVAS, the first publicly available dataset constructed specifically for subject-aware human vocal activity sensing on earables. EarSAVAS encompasses eight distinct vocal activities from both the earphone wearer and bystanders, including synchronous two-channel audio and motion data collected from 42 participants totaling 44.5 hours. Further, we propose EarVAS, a lightweight multi-modal deep learning architecture that enables efficient subject-aware vocal activity recognition on earables. To validate the reliability of EarSAVAS and the efficiency of EarVAS, we implemented two advanced benchmark models. Evaluation results on EarSAVAS reveal EarVAS's effectiveness with an accuracy of 90.84% and a Macro-AUC of 89.03%. Comprehensive ablation experiments were conducted on benchmark models and demonstrated the effectiveness of feedback microphone audio and highlighted the potential value of sensor fusion in subject-aware vocal activity sensing on earables. We hope that the proposed EarSAVAS and benchmark models can inspire other researchers to further explore efficient subject-aware human vocal activity sensing on earables. |
|
PCC_S003 | |
报告题目:Self-supervised domain exploration with an Optimal Transport Regularization for Open Set Cross-domain Speech Emotion Recognition
报告摘要:In the tasks of domain adaptation (DA) for speech emotion recognition (SER), self-supervised learning (SSL) algorithms could effectively explore domain and structural information from target domain samples, thereby mitigating domain discrepancies. However, in a general setting, when the target domain contains emotions that are never observed in the source domain, namely in open-set DA, existing SSL-based DA methods cannot maintain robustness because of the interference of the extra unknown classes. To address this challenge, we propose the self-supervised domain exploration with an optimal transport (OT) regularization (SDEOTR) algorithm. First, we integrate the SSL algorithm into the SER model to mitigate the domain differences. Further, we categorize target domain samples into known and unknown groups based on the network’s prediction confidence. Finally, we employ OT to maximize the global probability distance between the two groups, aiming to decrease the impact of unknown emotions on the SER model. Cross-domain SER experimental results showed that our label-free SDEOTR significantly improved the performance of existing adaptive SER algorithms in open-set scenarios. |
|
PCC_S004 | |
报告题目:AdvReverb: Rethinking the Stealthiness of Audio Adversarial Examples to Human Perception
报告摘要:As one of the most representative applications built on deep learning, audio systems, including keyword spotting, automatic speech recognition, and speaker identification, have recently been demonstrated to be vulnerable to adversarial examples, which have already raised general concerns in both academia and industry. Existing attacks follow the same adversarial example generation paradigm from computer vision, i.e., overlaying the optimized additive perturbations on original voices. However, due to the additive perturbations’ nature on human audibility, balancing the stealthiness and attack capability remains a challenging problem. In this paper, we rethink the stealthiness of audio adversarial examples and turn to introduce another kind of audio distortion, i.e., reverberation, as a new perturbation format for stealthy adversarial example generation. Such convolutional adversarial perturbations are crafted as real-world impulse responses and behave as a natural reverberation for deceiving humans. Based on this idea, we propose AdvReverb to construct, optimize, and deliver phoneme-level convolutional adversarial perturbations on both speech and music carriers with a well-designed objective. |
|
PCC_S005 | |
报告题目:MoiréVision: A Generalized Moiré-based Mechanism for 6-DoF Motion Sensing
报告摘要:Ultra-high precision motion sensing leveraging computer vision (CV) is a key technology in many high-precision AR/VR applications such as precise industrial manufacture and image-guided surgery, yet conventional CV can be challenged by moiré-based sensing mechanism, thanks to moiré pattern’s high sensitivity to six degrees of freedom (6-DoF) pose changes. Unfortunately, existing moiré-based solutions, in their infancy, cannot deal with complicated curvilinear moiré patterns caused by various perspective angles. In this paper, we propose a generalized moiré-based mechanism, MoiréVision, towards practical adoptions; it relies on high-frequency gratings as visual marker to help extract the fine-grained feature points for ultra-high precision motion sensing. As the foundation of general moiré-based sensing, we propose a formulation to characterize uncontrolled curvilinear moiré patterns in practical scenarios. To deal with the problem of moiré feature interference in practice, we propose a Gabor-based algorithm to separate overlapped curvilinear moiré patterns from two dimensions. Furthermore, to extract fine-grained feature points for high-precision motion sensing, we propose a bending function-based model and a resolution-enhanced strategy to reconstruct detailed texture of moiré markers and extract moiré feature points at sub-pixel level. Extensive experimental results show that MoiréVision greatly enhances the usability and generalizability of moiré-based sensing systems in real-world applications. |
|
PCC_S006 | |
报告题目:EFEVD: Enhanced Feature Extraction for Smart Contract Vulnerability Detection
报告摘要:Because of the wide deployment of smart contracts, smart contract vulnerabilities pose a challenging risk to blockchain security. Currently, deep learning-based vulnerability detection is a very attractive solution due to its ability to identify complex patterns and features. The existing methods mainly consider the contract code content features, expert knowledge patterns, and contract code modalities. To further enhance smart contract vulnerability detection, this paper attempts to identify community features from smart contracts with similar semantic and syntactic structures, and shared features from two related vulnerability detection tasks, vulnerability classification and localization. The experimental results verify that the proposed approach significantly outperforms the state-of-the-art methods in terms of accuracy, recall, precision, and F1-score. |
|
PCC_S007 | |
报告题目:Cautiously-Optimistic Knowledge Sharing for Cooperative Multi-Agent Reinforcement Learning
报告摘要:While decentralized training is attractive in multi-agent reinforcement learning (MARL) for its excellent scalability and robustness, its inherent coordination challenges in collaborative tasks result in numerous interactions for agents to learn good policies. To alleviate this problem, action advising methods make experienced agents share their knowledge about what to do, while less experienced agents strictly follow the received advice. However, this method of sharing and utilizing knowledge may hinder the team's exploration of better states, as agents can be unduly influenced by suboptimal or even adverse advice, especially in the early stages of learning. Inspired by the fact that humans can learn not only from the success but also from the failure of others, this paper proposes a novel knowledge sharing framework called Cautiously-Optimistic kNowledge Sharing (CONS). CONS enables each agent to share both positive and negative knowledge and cautiously assimilate knowledge from others, thereby enhancing the efficiency of early-stage exploration and the agents' robustness to adverse advice. Moreover, considering the continuous improvement of policies, agents value negative knowledge more in the early stages of learning and shift their focus to positive knowledge in the later stages. Our framework can be easily integrated into existing Q-learning based methods without introducing additional training costs. We evaluate CONS in several challenging multi-agent tasks and find it excels in environments where optimal behavioral patterns are difficult to discover, surpassing the baselines in terms of convergence rate and final performance. |
|
PCC_S008 | |
报告题目:Minimizing Latency for Multi-DNN Inference on Resource-Limited CPU-Only Edge Devices
报告摘要:Despite considerable advancements in specialized hardware, the majority of IoT edge devices still rely on CPUs. The burgeoning number of IoT users amplifies the challenges associated with performing multiple Deep Neural Network inferences on these resource-limited, CPU-only edge devices. Existing strategies, including model compression, hardware acceleration, and model partitioning, often involve a trade-off in inference accuracy, are unsuitable due to hardware specificity, or lead to inefficient resource utilization. In response to these challenges, this paper introduces L-PIC (Latency Minimized Parallel Inference on CPU)—a framework expressly devised to optimize resource allocation, decrease inference latency, and maintain result accuracy on CPU-only edge devices. A series of comprehensive experiments have verified the superior efficiency and effectiveness of the L-PIC framework in comparison to the state-of-the-art method. Remarkably, compared to the state-of-the-art method, L-PIC can reduce the inference latency of multi-DNN by an average of approximately 30% across all tested scenarios. |
|
PCC_S009 | |
报告题目:Seeing through the Tactile: 3D Human Shape Estimation from Temporal In-Bed Pressure Images
报告摘要:Humans spend about one-third of their lives resting. Reconstructing human dynamics in in-bed scenarios is of considerable significance in sleep studies, bedsore monitoring, and biomedical factor extraction. However, the mainstream human pose and shape estimation methods mainly focus on visual cues, facing serious issues in non-line-of-sight environments. Since in-bed scenarios contain complicated human-environment contact, pressure-sensing bedsheets provide a non-invasive and privacy-preserving approach to capture the pressure distribution on the contact surface, and have shown prospects in many downstream tasks. However, few studies focus on in-bed human mesh recovery. To explore the potential of reconstructing human meshes from the sensed pressure distribution, we first build a high-quality temporal human in-bed pose dataset, TIP, with 152K multi-modality synchronized images. We then propose a label generation pipeline for in-bed scenarios to generate reliable 3D mesh labels with a SMPLify-based optimizer. Finally, we present PIMesh, a simple yet effective temporal human shape estimator to directly generate human meshes from pressure image sequences. We conduct various experiments to evaluate PIMesh’s performance, showing that PIMesh achieves a joint position error of 79.17 mm on our TIP dataset. The results demonstrate that the pressure-sensing bedsheet could be a promising alternative for long-term in-bed human shape estimation. |
|
PCC_S010 | |
报告题目:A Visual Odometry Localization Method Based on Edge Features
报告摘要:Visual odometry is important for many applications, such as autonomous vehicles and robot navigation. It is challenging to conduct visual odometry in textureless scenes or environments with sudden illumination changes, where popular feature-based methods or direct methods cannot work well. To address this challenge, some edge-based methods have been proposed, but they usually struggle to balance efficiency and accuracy. In this work, we propose a novel visual odometry approach called EdgeVO, which is accurate, efficient, and robust. By efficiently selecting a small set of edges with certain strategies, we significantly improve the computational efficiency without sacrificing accuracy. Compared to existing edge-based methods, our method can significantly reduce the computational complexity while maintaining similar or even better accuracy. This is because our method removes useless or noisy edges. Experimental results on the TUM datasets indicate that EdgeVO significantly outperforms other methods in terms of efficiency, accuracy, and robustness. |