CHCI Top Conference and Journal Paper Exchange - Human-Computer Collaboration and Intelligent Systems
Host Introduction

Sun Wei, Associate Research Fellow, Institute of Software, Chinese Academy of Sciences
Bio: Sun Wei is an Associate Research Fellow at the Institute of Software, Chinese Academy of Sciences. He received his Ph.D. from the University of Chinese Academy of Sciences and was a visiting scholar at Cornell University. His research covers natural human-computer interaction and smart healthcare. He has published more than ten papers at top international HCI venues (ACM CHI, UIST, IMWUT/UbiComp, etc.) and filed nearly ten patents. He serves as a committee member of the CCF Technical Committee on Human-Computer Interaction and the Smart Healthcare Subcommittee of the China Society for Stereology and Image Analysis. He has participated in major projects of the New Generation Artificial Intelligence national science and technology program, National Key R&D Program projects, and key projects of the National Natural Science Foundation of China; has led a Young Scientists Fund project among other national grants; and has headed several industry collaboration projects.
CHCI_M001

Title: Swift-Eye: Towards Anti-blink Pupil Tracking for Precise and Robust High-Frequency Near-Eye Movement Analysis with Event Camera

Abstract: Eye tracking has shown great promise in many scientific fields and daily applications, ranging from the early detection of mental health disorders to foveated rendering in virtual reality (VR). These applications all call for a robust system for high-frequency near-eye movement sensing and analysis with high precision, which cannot be guaranteed by existing eye-tracking solutions based on CCD/CMOS cameras. To bridge the gap, we propose Swift-Eye, an offline framework for precise and robust pupil estimation and tracking that supports high-frequency near-eye movement analysis, especially when the pupil region is partially occluded. Swift-Eye is built upon emerging event cameras, which capture the high-speed movement of the eyes at high temporal resolution. A series of bespoke components then generates high-quality near-eye movement video at frame rates beyond one kilohertz and handles occlusion of the pupil caused by involuntary blinks. In extensive evaluations on EV-Eye, a large-scale public dataset for eye tracking with event cameras, Swift-Eye shows high robustness against significant occlusion: when over 80% of the pupil region is occluded by the eyelid, it improves the IoU and F1-score of pupil estimation by 20% and 12.5% respectively over the second-best competing approach. Finally, it provides continuous and smooth pupil traces at extremely high temporal resolution, supporting high-frequency eye movement analysis and potential applications such as mental health diagnosis and behaviour-brain association. Implementation details and source code can be found at https://github.com/ztysdu/Swift-Eye
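The reported gains are measured by IoU and F1-score over pupil masks. As a quick illustration of how these two metrics relate on binary masks (this is our own sketch with a toy example, not the authors' code):

```python
import numpy as np

def iou_and_f1(pred: np.ndarray, truth: np.ndarray):
    """IoU and F1 (Dice) between two boolean segmentation masks."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    iou = inter / union if union else 1.0
    f1 = 2 * inter / (pred.sum() + truth.sum()) if (pred.sum() + truth.sum()) else 1.0
    return float(iou), float(f1)

# Toy 4x4 masks: a ground-truth "pupil" and a prediction shifted one column right.
truth = np.zeros((4, 4), dtype=bool); truth[1:3, 1:3] = True  # 4 pixels
pred = np.zeros((4, 4), dtype=bool);  pred[1:3, 2:4] = True   # 4 pixels, 2 overlap
iou, f1 = iou_and_f1(pred, truth)
print(round(iou, 3), round(f1, 3))  # → 0.333 0.5
```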
CHCI_M002

Title: No need to integrate action information during coarse semantic processing of man-made tools

Abstract: Action representation of man-made tools consists of two subtypes: structural action representation, concerning how to grasp an object, and functional action representation, concerning the skilled use of an object. Compared with structural action representation, functional action representation plays the dominant role in fine-grained (i.e., basic-level) object recognition. However, it remains unclear whether the two types of action representation are involved differently in coarse semantic processing, in which an object is recognized at a superordinate level (i.e., living/non-living). Here we conducted three experiments using the priming paradigm, with video clips displaying structural and functional action hand gestures as prime stimuli and grayscale photos of man-made tools as target stimuli. Participants recognized the target objects at the basic level in Experiment 1 (a naming task) and at the superordinate level in Experiments 2 and 3 (a categorization task). We observed a significant priming effect for functional action prime-target pairs only in the naming task. In contrast, no priming effect was found in either the naming or the categorization task for structural action prime-target pairs (Experiment 2), even when the categorization task was preceded by a preliminary action imitation of the prime gestures (Experiment 3). Our results suggest that only functional action information is retrieved during fine-grained object processing, whereas coarse semantic processing requires the integration of neither structural nor functional action information.
CHCI_M003

Title: EmoTake: Exploring Drivers' Emotion for Takeover Behavior Prediction

Abstract: Burgeoning semi-automated vehicles allow drivers to engage in various non-driving-related tasks, which may stimulate diverse emotions and thus affect takeover safety. Although the effects of emotion on takeover behavior have recently been examined, how to effectively obtain and utilize drivers' emotions for predicting takeover behavior remains largely unexplored. We propose EmoTake, a deep learning-empowered system that exploits drivers' emotional and physical states to predict takeover readiness, reaction time, and quality. The key enabler is a deep neural framework that extracts drivers' fine-grained body movements from a camera and interprets them into multi-channel emotional and physical information (e.g., facial expression and head pose) for prediction. Our study (N = 26) verifies the effectiveness of EmoTake and shows that: 1) facial expression benefits prediction; and 2) emotions have diverse impacts on takeovers. Our findings provide insights into takeover prediction and in-vehicle emotion regulation.
CHCI_M004

Title: Co-designing the Collaborative Digital Musical Instruments for Group Music Therapy

Abstract: Digital Musical Instruments (DMIs) have been integrated into group music therapy, providing therapists with alternative ways to engage in musical dialogues with their clients. However, existing DMIs used in group settings are primarily designed for individual use and often overlook the social dynamics inherent in group therapy. Recognizing the crucial role of social interaction in the effectiveness of group therapy, we argue that Collaborative Digital Musical Instruments (CDMIs), which seamlessly integrate social interaction with musical expression, hold significant potential to enhance group music therapy. To better tailor CDMIs to group music therapy, we engaged in a co-design process with music therapists, designing and conducting group therapy sessions involving the prototype ComString. Finally, we reflected on the co-design case to suggest future directions for designing CDMIs for group music therapy.
CHCI_M005

Title: Towards benchmarking and assessing visual naturalness of physical world adversarial attacks

Abstract: Physical-world adversarial attacks are highly practical and threatening: they fool real-world deep learning systems with conspicuous, maliciously crafted physical artifacts. In physical-world attacks, evaluating naturalness is heavily emphasized, since humans can easily detect and remove unnatural attacks. However, current studies evaluate naturalness in a case-by-case fashion, which suffers from errors, bias, and inconsistency. In this paper, we take the first step toward benchmarking and assessing the visual naturalness of physical-world attacks, taking the autonomous driving scenario as a first attempt. First, to benchmark attack naturalness, we contribute the first Physical Attack Naturalness (PAN) dataset with human ratings and gaze. PAN verifies several insights for the first time: naturalness is (disparately) affected by contextual features (i.e., environmental and semantic variations) and correlates with behavioral features (i.e., gaze signals). Second, to automatically assess attack naturalness in alignment with human ratings, we further introduce the Dual Prior Alignment (DPA) network, which embeds human knowledge into the model's reasoning process. Specifically, DPA imitates human reasoning in naturalness assessment through rating prior alignment and mimics human gaze behavior through attentive prior alignment. We hope our work fosters research on improving and automatically assessing the naturalness of physical-world attacks. Our code and dataset can be found at https://github.com/zhangsn-19/PAN
CHCI_M006

Title: G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios

Abstract: Modern information querying systems are progressively incorporating multimodal inputs such as vision and audio. However, the integration of gaze, a modality deeply linked to user intent and increasingly accessible via gaze-tracking wearables, remains underexplored. This paper introduces a novel gaze-facilitated information querying paradigm, named G-VOILA, which synergizes users' gaze, visual field, and voice-based natural language queries to facilitate a more intuitive querying process. In a user-enactment study involving 21 participants in 3 daily scenarios (p = 21, scene = 3), we revealed ambiguity in users' query language and a gaze-voice coordination pattern in users' natural query behaviors with G-VOILA. Based on the quantitative and qualitative findings, we developed a design framework for the G-VOILA paradigm, which effectively integrates gaze data with the in-situ querying context. We then implemented a G-VOILA proof-of-concept using cutting-edge deep learning techniques. A follow-up user study (p = 16, scene = 2) demonstrated its effectiveness, achieving both higher objective and subjective scores than a baseline without gaze data. We further conducted interviews and provide insights for future gaze-facilitated information querying systems.
CHCI_M007

Title: See Widely, Think Wisely: Toward Designing a Generative Multi-agent System to Burst Filter Bubbles

Abstract: The proliferation of AI-powered search and recommendation systems has accelerated the formation of "filter bubbles" that reinforce people's biases and narrow their perspectives. Previous research has attempted to address this issue by increasing the diversity of information exposure, an approach often hindered by users' lack of motivation to engage with diverse content. In this study, we took a human-centered approach to explore how Large Language Models (LLMs) could assist users in embracing more diverse perspectives. We developed a prototype featuring LLM-powered multi-agent characters that users could interact with while reading social media content. We conducted a participatory design study with 18 participants and found that multi-agent dialogues with gamification incentives could motivate users to engage with opposing viewpoints, and that progressive interactions with assessment tasks could promote thoughtful consideration. Based on these findings, we provide design implications and future-work outlooks for leveraging LLMs to help users burst their filter bubbles.
CHCI_M008

Title: PM-Vis: A Visual Analytics System for Tracing and Analyzing the Evolution of Pottery Motifs

Abstract: In Chinese archaeological research, analyzing the evolution of motifs on ancient pottery is crucial for studying the spread and growth of cultures across eras and regions. However, such analyses are often challenging: identifying motifs with evolutionary connections requires tracking concurrent changes in appearance, space, and time, compounded by ineffective documentation. We propose PM-Vis, a visual analytics system for tracing and analyzing the evolution of pottery motifs, anchored in a "selection-organization-documentation" workflow. In the selection stage, we design a three-fold projection paired with a motif-based search mechanism, displaying the appearance similarity and the temporal and spatial proximities of all motifs or of a specific motif, to aid users in selecting motifs with evolutionary connections. The organization stage helps users establish the evolutionary sequence and segment the selected motifs into distinct evolutionary phases. Finally, the documentation stage enables users to record their observations and insights through various forms of annotation. We demonstrate the usefulness and effectiveness of PM-Vis through two case studies, expert feedback, and a user study.
CHCI_M009

Title: Learning from User-driven Events to Generate Automation Sequences

Abstract: Enabling smart devices to learn to automate actions as users expect is a crucial yet challenging task. The traditional trigger-action rule approach to device automation is prone to ambiguity in complex scenarios. To address this issue, we propose a data-driven approach that leverages recorded user-driven event sequences to predict the actions users may take and to generate fine-grained device automation sequences. Our key intuition is that user-driven event sequences, like human-written articles and programs, are governed by consistent semantic contexts and contain regularities that can be modeled to generate sequences expressing the user's preferences. We introduce ASGen, a deep learning framework that combines sequential information, event attributes, and external knowledge to form the event representation and outputs sequences of arbitrary length to facilitate automation. To evaluate our approach from both quantitative and qualitative perspectives, we conduct two studies using a realistic dataset containing over 4.4 million events. Our results show that our approach surpasses other methods by providing more accurate recommendations, and that the automation sequences generated by our model are perceived as equally or even more rational and useful than those generated by humans.
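ASGen itself is a deep model, but the underlying intuition that recorded event logs contain modelable regularities can be sketched with a toy transition-count model (all event names and functions here are our own invention, purely illustrative):

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count event-to-next-event transitions from recorded user-driven sequences."""
    model = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            model[cur][nxt] += 1
    return model

def generate(model, start, length):
    """Greedily extend a sequence by the most frequent observed next event."""
    out = [start]
    for _ in range(length):
        nxt = model.get(out[-1])
        if not nxt:
            break
        out.append(nxt.most_common(1)[0][0])
    return out

# Hypothetical smart-home logs: the same arrival routine, with one variation.
logs = [
    ["arrive_home", "unlock_door", "lights_on", "ac_on"],
    ["arrive_home", "unlock_door", "lights_on", "music_on"],
    ["arrive_home", "unlock_door", "lights_on", "ac_on"],
]
model = train_bigram(logs)
print(generate(model, "arrive_home", 3))
# → ['arrive_home', 'unlock_door', 'lights_on', 'ac_on']
```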
CHCI_M010

Title: Who Should Hold Control? Rethinking Empowerment in Home Automation among Cohabitants through the Lens of Co-Design

Abstract: Recent HCI research has highlighted home automation's potential to provide residents with technology-enhanced domestic autonomy. However, in the cohabitation context, the prevalent solutionist paradigm of automated systems introduces challenges for non-experts, paradoxically marginalizing certain members. This paper reports a co-creation initiative involving cohabitants, exploring a new understanding of empowerment in home automation. Participants collaborated to construct Trigger-Action Program (TAP) schemes using card-based tools during workshops. Our findings show how cohabitants engaged in collective ideation and embodied different negotiation patterns, revealing the significance of more perceptible and participatory design. We frame home automation as "problematic co-design", arguing that collaborative resources are widely overlooked. Furthermore, we examine through the co-design lens how automation systems act as both obstacles to and sources of empowerment. The paper concludes with pragmatic recommendations for designers and researchers, emphasizing the need to foster contestability for cohabitants in the evolving home automation landscape.
CHCI_M011

Title: A new dynamic spatial information design framework for AR-HUD to evoke drivers' instinctive responses and improve accident prevention

Abstract: Drivers' instinctive responses and skill-based behaviors enable them to react faster and control their vehicle better in dangerous situations. This study incorporated dynamic spatial information design (DSID) into an augmented reality head-up display (AR-HUD) under manual driving conditions. By integrating the skill, rule, and knowledge (SRK) taxonomy and situation awareness (SA) theory, our AR-HUD successfully evoked drivers' instinctive responses and improved driving safety. First, we converted symbol and sign information, processed at the knowledge-based and rule-based levels respectively, into signal information processed at the skill-based level. We then developed four AR-HUD interfaces with different dynamic designs for use in a hazardous intersection scenario. Finally, we investigated each design's impact on drivers' SA and driving performance. Experimental results demonstrated that our DSID enhanced drivers' SA and accident-avoidance capabilities while reducing their cognitive workload. Among the four AR-HUD interfaces, the one incorporating all three information elements under study (i.e., lateral warning, dynamic driving space, and speedometer) performed best. This indicates that the proposed framework has potential applications in other similarly dangerous driving scenarios, contributing to safer and more efficient driving environments.
CHCI_M012

Title: Charting the Future of AI in Project-Based Learning: A Co-Design Exploration with Students

Abstract: Students' increasing use of Artificial Intelligence (AI) presents new challenges for assessing their mastery of knowledge and skills in project-based learning (PBL). This paper introduces a co-design study exploring the potential of students' AI usage data as a novel material for PBL assessment. We conducted workshops with 18 college students, encouraging them to speculate about an alternative world in which they could freely employ AI in PBL while reporting this process so that their skills and contributions could be assessed. Our workshops yielded various scenarios of students' use of AI in PBL and ways of analyzing such usage, grounded in students' visions of how educational goals may transform. We also found that students with different attitudes toward AI exhibited distinct preferences in how to analyze and understand their use of AI. Based on these findings, we discuss future research opportunities on student-AI interaction and on understanding AI-enhanced learning.
CHCI_M013

Title: Beyond Numbers: Creating Analogies to Enhance Data Comprehension and Communication with Generative AI

Abstract: Unfamiliar measurements often hinder readers from grasping the scale of numerical data, understanding the content, and feeling engaged with the context. To enhance data comprehension and communication, we leverage analogies to bridge the gap between abstract data and familiar measurements. In this work, we first conduct semi-structured interviews with design experts to identify design problems and summarize design considerations. We then collect an analogy dataset of 138 cases from various online sources and, based on it, characterize a design space for creating data analogies. Next, we build a prototype system, AnalogyMate, that automatically suggests data analogies, their corresponding design solutions, and generated visual representations, powered by generative AI. The study results show the usefulness of AnalogyMate in aiding the creation of data analogies and the effectiveness of data analogies in enhancing data comprehension and communication.
CHCI_M014

Title: From Gap to Synergy: Enhancing Contextual Understanding through Human-Machine Collaboration in Personalized Systems

Abstract: This paper presents LangAware, a collaborative approach for constructing personalized contexts for context-aware applications. The need for personalization arises from significant variations in context between individuals, depending on scenarios, devices, and preferences. However, there is often a notable gap between how humans and machines understand the construction of contexts, as observed in trigger-action programming studies such as those on IFTTT. LangAware enables end-users to participate in establishing contextual rules in situ using natural language. The system leverages large language models (LLMs) to semantically connect low-level sensor detectors to high-level contexts and to provide understandable natural language feedback for effective user involvement. We conducted a user study with 16 participants in real-life settings, which revealed an average success rate of 87.50% for defining contextual rules across 12 campus scenarios, typically accomplished within just two modifications. Furthermore, users reported a better understanding of the machine's capabilities after interacting with LangAware.
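As a toy illustration of the low-level-to-high-level mapping that LangAware asks an LLM to produce, a hand-written contextual rule over hypothetical sensor detectors might look like the following (every name and threshold here is invented for illustration):

```python
# Hypothetical low-level detectors feeding one high-level contextual rule.
def context_reading_at_night(sample):
    """High-level context 'reading at night', grounded in low-level sensor readings."""
    return (sample["desk_lux"] > 150   # desk lamp on
            and not sample["motion"]   # user seated, little movement
            and sample["hour"] >= 21)  # late evening

# A contextual rule pairs the recognized context with an action, trigger-action style.
def rule(sample):
    return "dim_ceiling_light" if context_reading_at_night(sample) else None

print(rule({"desk_lux": 200, "motion": False, "hour": 22}))  # → dim_ceiling_light
print(rule({"desk_lux": 40, "motion": True, "hour": 10}))    # → None
```

The point of LangAware is that the user states the rule in natural language and the LLM grounds it in detectors like these, then explains the grounding back in natural language.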
CHCI_M015

Title: Space Brain: An AI Autonomous Spatial Decision System

Abstract: Ubiquitous computing long ago proposed the seamless integration of computing devices into smart spaces. The proliferation of Internet of Things (IoT) technologies offers users an expanding array of device control options, including voice activation and home automation. However, these technologies still require user intervention to adjust device states, relying on information derived from the environment and events within the smart space. This article introduces an AI autonomous decision system designed to remove this reliance on user intervention in IoT-enabled smart spaces. The proposed system consists of a sensing layer, a transmission layer, a decision layer, and an execution layer, responsible for information sensing, transmission, decision generation, and device control, respectively. The decision layer incorporates a large language model (LLM) and accompanying modules to facilitate real-time decision making. This study validates the core functionalities of the system in an unmanned smart home and examines the advantages, disadvantages, potential security risks, and future development directions of AI autonomous decision making.
CHCI Top Conference and Journal Paper Exchange - Assistive Technology and Accessibility Design
Host Introduction

Fan Mingming, Professor, The Hong Kong University of Science and Technology
Bio: Fan Mingming is an Assistant Professor and doctoral supervisor in the Computational Media and Arts Thrust and the Internet of Things Thrust of the Information Hub at the Hong Kong University of Science and Technology (Guangzhou), with affiliate appointments in the Division of Integrative Systems and Design and the Department of Computer Science and Engineering at HKUST. He received his Ph.D. from the Department of Computer Science at the University of Toronto in 2019. His research area is intelligent human-computer interaction, including: 1) intelligent accessibility aids and age-friendly interaction technology design; 2) human-AI collaboration; and 3) multimodal interaction techniques for virtual and augmented reality. He has published over 80 papers in leading international journals and conferences on HCI and accessibility, including more than 50 CCF-recommended Class A papers. His research has won multiple international paper awards, including Best Paper Honorable Mention awards (4 times) and a Best Paper Award at CHI, Best Paper Honorable Mention awards at Chinese CHI (2 times), a Best Artifact Award at ASSETS, and a Best Paper Honorable Mention at UbiComp. He currently holds several academic positions at home and abroad, including Subcommittee Chair (Accessibility and Aging) for CHI 2025, member of the CSCW 2023 Best Paper Award committee, member of the ASSETS 2024 program committee, executive member of the CCF Technical Committee on Human-Computer Interaction, and member of the Rehabilitation Engineering and Assistive Technology Committee of the Chinese Association of Rehabilitation of Disabled Persons.
CHCI_A001

Title: "Can It Be Customized According to My Motor Abilities?": Toward Designing User-Defined Head Gestures for People with Dystonia

Abstract: Recent studies have proposed above-the-neck gestures for people with upper-body motor impairments to interact with mobile devices without finger touch, resulting in an appropriate user-defined gesture set. However, many of these gestures involve sustaining the eyelids in closed or open states for a period, which is challenging for people with dystonia, who have difficulty sustaining and intermitting muscle contractions. Meanwhile, other facial parts, such as the tongue and nose, can be used to alleviate sustained use of the eyes in interaction. We therefore conducted a user study inviting 16 individuals with dystonia to design gestures based on facial muscle movements for 26 common smartphone commands. We collected 416 user-defined head gestures involving facial features and the shoulders, and obtained a preferred gesture set for individuals with dystonia. Participants preferred to make gestures with their heads and to use unnoticeable gestures. Our findings provide valuable references for the universal design of natural interaction technology.
CHCI_A002

Title: CDSD: Chinese Dysarthria Speech Database

Abstract: We present the Chinese Dysarthria Speech Database (CDSD) as a valuable resource for dysarthria research. The database comprises speech data from 24 participants with dysarthria: each participant recorded one hour of speech, and one recorded an additional 10 hours, resulting in 34 hours of speech material. To accommodate participants with varying cognitive levels, our text pool primarily consists of content from the AISHELL-1 dataset and speeches by primary and secondary school students. Participants read these texts aloud, recording their speech with a mobile device or a ZOOM F8n multi-track field recorder. In this paper, we describe the data collection and annotation processes and present an approach to establishing a baseline for dysarthric speech recognition. Furthermore, we conducted a speaker-dependent dysarthric speech recognition experiment using the additional 10 hours of speech data from one participant. Our findings indicate that, on top of extensive data-driven model training, fine-tuning on limited quantities of individual-specific data yields commendable results in speaker-dependent dysarthric speech recognition, although we observe significant variation in recognition results across dysarthric speakers. These insights provide valuable reference points for speaker-dependent dysarthric speech recognition.
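Speech recognition baselines of this kind are typically scored by word or character error rate. A minimal edit-distance WER sketch (our own illustration, not the paper's evaluation code):

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: Levenshtein distance over tokens, divided by reference length."""
    r, h = ref.split(), hyp.split()
    # Dynamic-programming table for edit distance (substitutions/insertions/deletions).
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                      # deletion
                          d[i][j - 1] + 1,                      # insertion
                          d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))  # substitution
    return d[len(r)][len(h)] / len(r)

print(round(wer("the cat sat", "the cat sat down"), 3))  # → 0.333 (one insertion / 3 words)
```

For Chinese, the same routine is usually run over characters rather than space-separated words (character error rate).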
CHCI_A003

Title: Gaze-directed Vision GNN for Mitigating Shortcut Learning in Medical Image

Abstract: Deep neural networks have demonstrated remarkable performance in medical image analysis, but their susceptibility to spurious correlations due to shortcut learning raises concerns about interpretability and reliability. Shortcut learning is exacerbated in medical contexts, where disease indicators are often subtle and sparse. In this paper, we propose a novel gaze-directed Vision GNN (GD-ViG) that leverages radiologists' visual patterns, captured from gaze, as expert knowledge to direct the network toward disease-relevant regions and thereby mitigate shortcut learning. GD-ViG consists of a gaze map generator (GMG) and a gaze-directed classifier (GDC). Combining the global modelling ability of GNNs with the locality of CNNs, the GMG generates gaze maps based on radiologists' visual patterns; notably, it eliminates the need for real gaze data during inference, enhancing the network's practical applicability. Using gaze as expert knowledge, the GDC directs the construction of graph structures by incorporating both feature distances and gaze distances, enabling the network to focus on disease-relevant foregrounds, thereby avoiding shortcut learning and improving interpretability. Experiments on two public medical image datasets demonstrate that GD-ViG outperforms state-of-the-art methods and effectively mitigates shortcut learning. Our code is available at this https URL.
CHCI_A004

Title: Understanding the Needs of Novice Developers in Creating Self-Powered IoT

Abstract: The rise of the Internet of Things (IoT) has given birth to transformative, massively deployed computing applications that raise the significant issue of energy sources: it is impractical and irresponsible to rely on wires and batteries to power trillions of devices. One promising prediction is that energy harvesting technologies will serve as alternative power sources for IoT devices. However, this prophecy may go unfulfilled for lack of understanding of how novice developers reason about energy when developing IoT. In response, we conducted a mentored physical prototyping study built around a two-day workshop with eight novice developers. The study comprised qualitative and quantitative analyses of the resulting artifacts, interviews with both the novice developers and an expert, and design implications for future tools. The findings reveal informational gaps that call for educational efforts and assistive features to support novice developers.
CHCI_A005

Title: Designing Scaffolding Strategies for Conversational Agents in Dialog Task of Neurocognitive Disorders Screening

Abstract: Regular screening is critical for individuals at risk of neurocognitive disorders (NCDs) to receive early intervention. Conversational agents (CAs) have been adopted to administer dialog-based NCD screening tests because of their scalability compared with human-administered tests. However, CAs need particular communication skills during NCD screening; for example, clinicians often apply scaffolding to ensure subjects' understanding of and engagement in screening tests. Based on scaffolding theories and an analysis of clinicians' practices in recordings of human-administered tests, we designed a scaffolding framework for the CA. In an exploratory Wizard-of-Oz study, the CA, empowered by ChatGPT, administered tasks in the Grocery Shopping Dialog Task with 15 participants (10 diagnosed with NCDs). Clinical experts verified the quality of the CA's scaffolding, and we explored its effects on participants' task understanding. We further propose implications for the future design of CAs that enable scaffolding for scalable NCD screening.
CHCI_A006

Title: Bimanual Asymmetric Coordination in a Three-Dimensional Zooming Task Based on Leap Motion: Implications for Upper Limb Rehabilitation

Abstract: This paper explores the practicability of bimanual asymmetric interaction, and the factors affecting it, when manipulating virtual three-dimensional objects through Leap Motion, a hand motion capture device. This research has the potential to aid the rehabilitation of people with upper limb impairments. Participants were asked to enlarge a virtual three-dimensional box to a predefined scale. Three influencing factors were addressed: task difficulty, task allocation, and interaction mode. The results indicate that all factors have significant effects on movement time. Analysis-of-variance tests show significant effects of task difficulty and interaction mode on error rate and spatial patterns, but no significant effect of task allocation. Task allocation does, however, significantly affect each hand's time from zooming-phase onset to peak velocity. The action of the nondominant hand is coarser than that of the dominant hand. Interestingly, the velocities of both hands synchronized as the task difficulty increased, even though the limbs moved at quite different speeds in the initial stage. This research provides insights into how one hand coordinates with the other in the temporal aspects of movement kinematics and can thus help in designing rehabilitative devices that interact with the healthy hand.
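A bimanual zoom task of this kind typically maps the ratio of current to initial hand separation onto the object's scale. A minimal sketch of that mapping (the coordinates and function name are our assumptions for illustration, not the Leap Motion API):

```python
import math

def zoom_scale(left0, right0, left1, right1):
    """Bimanual zoom factor: current hand separation / initial hand separation."""
    return math.dist(left1, right1) / math.dist(left0, right0)

# Hands start 10 cm apart and end 15 cm apart → the box is enlarged 1.5x.
print(zoom_scale((0, 0, 0), (10, 0, 0), (-2.5, 0, 0), (12.5, 0, 0)))  # → 1.5
```

In an asymmetric design, one hand could instead anchor the object while only the other hand's displacement drives the scale.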
CHCI_A007

Title: A Multi-modal Toolkit to Support DIY Assistive Technology Creation for Blind and Low Vision People

Abstract: We design and build A11yBits, a tangible toolkit that empowers blind and low vision (BLV) people to easily create personalized do-it-yourself assistive technologies (DIY-ATs). A11yBits includes (1) a series of Sensing modules to detect both environmental information and user commands, (2) a set of Feedback modules to deliver multi-modal feedback, and (3) two Base modules (Sensing Base and Feedback Base) to power and connect the sensing and feedback modules. The toolkit enables accessible and easy assembly via a "plug-and-play" mechanism, so BLV users can select and assemble their preferred modules to create personalized DIY-ATs.
CHCI_A008

Title: ChatScratch: An AI-Augmented System Toward Autonomous Visual Programming Learning for Children Aged 6-12

Abstract: As Computational Thinking (CT) continues to permeate younger age groups in K-12 education, established CT platforms such as Scratch face challenges in catering to these younger learners, particularly those in elementary school (ages 6-12). Through a formative investigation with Scratch experts, we uncover three key obstacles to children's autonomous Scratch learning: artist's block in project planning, bounded creativity in asset creation, and inadequate coding guidance during implementation. To address these barriers, we introduce ChatScratch, an AI-augmented system that facilitates autonomous programming learning for young children. ChatScratch employs structured interactive storyboards and visual cues to overcome artist's block, integrates digital drawing and advanced image generation technologies to elevate creativity, and leverages Scratch-specialized Large Language Models (LLMs) for professional coding guidance. Our study shows that, compared with Scratch, ChatScratch efficiently fosters autonomous programming learning and contributes to the creation of high-quality, personally meaningful Scratch projects by children.
CHCI_A009

Title: OdorAgent: Generate Odor Sequences for Movies Based on Large Language Model

Abstract: Numerous studies have shown that integrating scents into movies enhances viewer engagement and immersion. However, creating such olfactory experiences often requires professional perfumers to match scents, limiting their widespread use. To address this, we propose OdorAgent, which combines an LLM with a text-image model to automate video-odor matching. The generation framework considers four dimensions: subject matter, emotion, space, and time. We applied it to a specific movie and conducted user studies to evaluate and compare the effectiveness of different system elements. The results indicate that OdorAgent possesses significant scene adaptability and enables inexperienced individuals to design odor experiences for video and images.
|
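As a rough illustration of the four-dimension matching idea in the OdorAgent abstract above, the sketch below combines per-dimension similarity scores (subject matter, emotion, space, time) into a single ranking score for candidate odors. The weights, scores, and odor names are invented for illustration; the actual system derives these matches with an LLM and a text-image model.

```python
# Toy four-dimension video-odor matching. The dimensions come from the paper;
# the weights, scores, and odor names are illustrative assumptions.
DIMENSIONS = ("subject", "emotion", "space", "time")

def match_score(scene_scores, weights=None):
    """scene_scores: dict mapping dimension -> similarity in [0, 1]."""
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}
    total = sum(weights[d] * scene_scores[d] for d in DIMENSIONS)
    return total / sum(weights[d] for d in DIMENSIONS)

def rank_odors(candidates):
    """Order candidate odors by their weighted match to the current scene."""
    return sorted(candidates, key=lambda o: match_score(candidates[o]), reverse=True)

candidates = {
    "pine":  {"subject": 0.9, "emotion": 0.6, "space": 0.8, "time": 0.7},
    "smoke": {"subject": 0.3, "emotion": 0.8, "space": 0.4, "time": 0.5},
}
print(rank_odors(candidates))  # ['pine', 'smoke']
```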
CHCI_A010 | |
题目:A Novel Refreshable Braille Display Based on the Layered Electromagnetic Driving Mechanism of Braille Dots
摘要:In the digital era, Braille displays enable visually impaired people to easily access information. Different from traditional piezoelectric Braille displays, a novel electromagnetic Braille display is realized in this study. The novel display has the advantages of a stable performance, a long service life and a low cost and is based on an innovative layered electromagnetic driving mechanism of Braille dots, which can achieve a dense arrangement of Braille dots and provide a sufficient support force for them. The T-shaped screw compression spring, which causes the Braille dots to fall back instantaneously, is optimized to achieve a high refresh frequency and to enable visually impaired people to read Braille quickly. The experimental results show that under an input voltage of 6 V, the Braille display can work stably and reliably and provide a good fingertip touch; the Braille dot support force is greater than 150 mN, the maximum refresh frequency can reach 50 Hz, and the operating temperature is lower than 32 °C. Therefore, this cost-effective Braille display is expected to benefit a vast number of low-income visually impaired people in developing countries and improve their learning, working and living conditions. |
|
CHCI_A011 | |
题目:“There is a Job Prepared for Me Here”: Understanding How Short Video and Live-streaming Platforms Empower Ageing Job Seekers in China
摘要:In recent years, the global unemployment rate has remained persistently high. Compounding this issue, the ageing population in China often encounters additional challenges in finding employment due to prevalent age discrimination in daily life. However, with the advent of social media, there has been a rise in the popularity of short videos and live-streams for recruiting ageing workers. To better understand the motivations of ageing job seekers to engage with these video-based recruitment methods and to explore the extent to which such platforms can empower them, we conducted an interview-based study with ageing job seekers who have had exposure to these short recruitment videos and live-streaming channels. Our findings reveal that these platforms can provide a job-seeking choice that is particularly friendly to ageing job seekers, effectively improving their disadvantaged situation. |
|
CHCI_A012 | |
题目:Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
摘要:Disordered speech recognition has profound implications for improving the quality of life of individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a method for speaker adaptation that utilizes P-Tuning on the Whisper large-scale model. We first fine-tune Whisper using LoRA and then integrate a trainable Perceiver to generate fixed-length speaker prompts from variable-length inputs, to improve model recognition of Chinese dysarthric speech. Experimental results on our Chinese dysarthric speech dataset demonstrate consistent improvements in recognition performance with Perceiver-Prompt, with a relative CER reduction of up to 13.04% over the fine-tuned Whisper. |
|
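The fixed-length-prompt idea above can be illustrated with a toy cross-attention pooling step: a fixed set of latent queries attends over a variable-length feature sequence and always returns an output of the same shape. This is a bare-bones NumPy sketch; the actual Perceiver uses learned key/value projections, multiple heads, and stacked layers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def perceiver_pool(features, latents):
    """Cross-attention pooling: latent queries attend over a variable-length
    feature sequence, returning a fixed-length output.
    features: (T, d) variable-length input; latents: (L, d) learned queries."""
    d = features.shape[1]
    attn = softmax(latents @ features.T / np.sqrt(d))  # (L, T)
    return attn @ features                             # (L, d), independent of T

rng = np.random.default_rng(0)
latents = rng.normal(size=(4, 16))       # 4 latent queries -> prompt length 4
short = perceiver_pool(rng.normal(size=(50, 16)), latents)
long = perceiver_pool(rng.normal(size=(900, 16)), latents)
print(short.shape, long.shape)  # both (4, 16)
```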
CHCI_A013 | |
题目:Learning from Hybrid Craft: Investigating and Reflecting on Innovating and Enlivening Traditional Craft through Literature Review
摘要:The key to preserving traditional crafts lies in living transmission, which is inseparable from sustaining artistic production, audience consumption, and progressive innovation with the physical media. As HCI researchers, we focus on the hybrid crafts field, which involves numerous cross-disciplinary integration cases between traditional craftsmanship and digital technology at the physical level, providing inspiration for innovating and enlivening traditional crafts. We conducted a multi-perspective review of 85 hybrid craft articles related to traditional crafts over the past decade, considering aspects such as craft categories, digital technology, target users, and research areas. Through reflection, we propose a design framework for fostering innovation and revitalizing traditional crafts. This paper aims to offer insight into the innovation and enlivenment of traditional crafts through a hybrid craft perspective while also serving as a first review of the hybrid craft field from the traditional craftsmanship perspective. |
|
CHCI_A014 | |
题目:CataAnno: An Ancient Catalog Annotator to Uphold Annotation Unification by Relevant Recommendation
摘要:Classical bibliography, by scrutinizing preserved catalogs from both official archives and personal collections of accumulated books, examines the books throughout history, thereby elucidating cultural development across historical periods. In this work, we collaborate with domain experts to accomplish the task of data annotation concerning Chinese ancient catalogs. We introduce the CataAnno system, which helps users complete annotations more efficiently through cross-linked views, recommendation methods and convenient annotation interactions. The recommendation method can learn the background knowledge and annotation patterns that experts subconsciously integrate into the data during prior annotation processes. CataAnno searches for the most relevant previously annotated examples and recommends them to the user. Meanwhile, the cross-linked views assist users in comprehending the correlations between entries and offer explanations for these recommendations. Evaluation and expert feedback confirm that the CataAnno system, by offering high-quality recommendations and visualizing the relationships between entries, can mitigate the necessity for specialized knowledge during the annotation process. This results in enhanced accuracy and consistency in annotations, thereby improving overall efficiency. |
|
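A minimal sketch of the "recommend the most relevant previously annotated examples" step described above, using bag-of-words cosine similarity as a stand-in for whatever similarity CataAnno actually learns; the entries and labels are invented for illustration.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(entry, annotated, k=2):
    """Return the k previously annotated entries most similar to `entry`."""
    q = Counter(entry.split())
    ranked = sorted(annotated, key=lambda e: cosine(q, Counter(e.split())),
                    reverse=True)
    return [(e, annotated[e]) for e in ranked[:k]]

annotated = {
    "song dynasty poetry catalog": "literature",
    "ming dynasty history catalog": "history",
}
print(recommend("song dynasty poetry index", annotated, k=1))
# [('song dynasty poetry catalog', 'literature')]
```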
CHCI_A015 | |
题目:"Voices Help Correlate Signs and Words": Analyzing Deaf and Hard-of-Hearing (DHH) TikTokers’ Content, Practices, and Pitfalls
摘要:Video-sharing platforms such as TikTok have offered new opportunities for d/Deaf and hard-of-hearing (DHH) people to create public-facing content using sign language – an integral part of DHH culture. Besides sign language, DHH creators deal with a variety of modalities when creating videos, such as captions and audio. However, hardly any work has comprehensively addressed DHH creators’ multimodal practices with the lay public’s reactions taken into account. In this paper, we systematically analyzed 308 DHH-authored TikTok videos using a mixed-methods approach, focusing on DHH TikTokers’ content, practices, pitfalls, and viewer engagement. Our findings highlight that while voice features such as synchronous voices are scant and challenging for DHH TikTokers, they may help promote viewer engagement. Other empirical findings, including the distributions of topics, practices, pitfalls, and their correlations with viewer engagement, further lead to actionable suggestions for DHH TikTokers and video-sharing platforms. |
CHCI Top Conference and Journal Paper Exchange - Interaction Techniques and Input Methods
Session Chair Biography | ||||||
---|---|---|---|---|---|---|
![]() Kaixing Zhao, Assistant Professor, Northwestern Polytechnical University |
Bio: Kaixing Zhao is an assistant professor in the School of Software and a postdoctoral researcher in the School of Computer Science at Northwestern Polytechnical University. He received his Ph.D. in human-computer interaction from the University of Toulouse, France, in July 2021. He serves as an executive committee member of the CCF Technical Committee on Human-Computer Interaction and a full member of the CSIG Technical Committee on Visualization, and as a PC member and AC for top conferences and journals in HCI and ubiquitous computing (e.g., CHI, CSCW, and IMWUT). Over the past two years he has led more than ten research projects and published more than twenty high-quality papers. |
CHCI_I001 | |
---|---|
题目:SnapInflatables: Designing Inflatables with Snap-through Instability for Responsive Interaction
摘要:Snap-through instability, like the rapid closure of the Venus flytrap, is gaining attention in robotics and HCI. It offers rapid shape reconfiguration, self-sensing, actuation, and enhanced haptic feedback. However, conventional snap-through structures face limitations in fabrication efficiency, scale, and tunability. We introduce SnapInflatables, enabling safe, multi-scale interaction with adjustable sensitivity and force reactions, utilizing the snap-through instability of inflatables. We designed six types of heat-sealing structures enabling versatile snap-through passive motion of inflatables with diverse reaction and trigger directions. A block structure enables ultra-sensitive states for rapid energy release and force amplification. The motion range is governed by geometry parameters, while force feedback properties are tunable through internal pressure settings. Based on experiments, we developed a design tool for creating desired inflatable snap-through shapes and motions, offering previews and inflation simulations. Example applications, including a self-locking medical stretcher, interactive animals, a bounce button, and a large-scale light, demonstrate enhanced passive interaction with inflatables. |
|
CHCI_I002 | |
题目:EmTex: Prototyping Textile-Based Interfaces through An Embroidered Construction Kit
摘要:As electronic textiles have become more advanced in sensing, actuating, and manufacturing, incorporating smartness into fabrics has become of special interest to ubiquitous computing and interaction researchers and designers. However, innovating smart textile interfaces for numerous input and output modalities usually requires expert-level knowledge of specific materials, fabrication, and protocols. This paper presents EmTex, a construction kit based on embroidered textiles, patterned with dedicated sensing, actuating, and connecting components to facilitate the design and prototyping of smart textile interfaces. With machine embroidery, EmTex is compatible with a wide range of threads and underlay fabrics, proficient in various stitches to control the electric parameters, and capable of integrating versatile and reliable interaction functionalities with aesthetic patterns and precise designs. EmTex consists of 28 textile-based sensors, actuators, connectors, and displays, presented with standardized visual and tactile effects. Along with a visual programming tool, EmTex enables the prototyping of everyday textile interfaces for diverse daily-life scenarios, embodying touch input and visual and haptic output properties. With EmTex, we conducted a workshop and invited 25 designers and makers to create freeform textile interfaces. Our findings revealed that EmTex helped the participants explore novel interaction opportunities with various smart textile prototypes. We also identified challenges that EmTex must address for practical use in promoting the design innovation of smart textiles. |
|
CHCI_I003 | |
题目:SpeciFingers: Finger Identification and Error Correction on Capacitive Touchscreens
摘要:The inadequate use of finger properties has limited the input space of touch interaction. By leveraging the category of contacting fingers, finger-specific interaction is able to expand the input vocabulary. However, accurate finger identification remains challenging: previous works required either additional sensors or a limited set of identifiable fingers to achieve ideal accuracy. We introduce SpeciFingers, a novel approach to identifying fingers using the capacitive raw data on touchscreens. We apply a neural network with an encoder-decoder architecture, which captures the spatio-temporal features in capacitive image sequences. To assist users in recovering from misidentification, we propose a correction mechanism to replace the existing undo-redo process. We also present a design space of finger-specific interaction with example interaction techniques. In particular, we designed and implemented a use case of optimizing pointing performance on small targets, and evaluated our identification model and error correction mechanism in this use case. |
|
CHCI_I004 | |
题目:SpaceGTN: A Time-Agnostic Graph Transformer Network for Handwritten Diagram Recognition and Segmentation
摘要:Online handwriting recognition is pivotal in domains like note-taking, education, healthcare, and office tasks. Existing diagram recognition algorithms mainly rely on the temporal information of strokes, resulting in a decline in recognition performance when dealing with notes that have been modified or carry no temporal information. Current datasets are drawn from templates and cannot reflect real free-drawing conditions. To address these challenges, we present SpaceGTN, a time-agnostic Graph Transformer Network that leverages spatial integration and removes the need for temporal data. Extensive experiments on multiple datasets demonstrate that our method consistently outperforms existing methods and achieves state-of-the-art performance. We also propose a pipeline that seamlessly connects offline and online handwritten diagrams. By integrating a stroke restoration technique with SpaceGTN, it enables intelligent editing of previously uneditable offline diagrams at the stroke level. In addition, we launch the first online handwritten diagram dataset, OHSD, which is collected using a free-drawing method and comes with modification annotations. |
|
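A toy illustration of the time-agnostic spatial idea above: instead of chaining strokes in drawing order, each stroke is connected to its spatially nearest neighbors. The k-nearest-neighbor construction over stroke centroids is an assumption for illustration, not necessarily the paper's exact graph.

```python
import numpy as np

def spatial_knn_edges(centroids, k=2):
    """Build edges from each stroke to its k nearest strokes by centroid
    distance -- a time-agnostic alternative to chaining strokes in drawing order.
    centroids: (N, 2) array of stroke bounding-box centers."""
    c = np.asarray(centroids, dtype=float)
    d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)  # pairwise dists
    np.fill_diagonal(d, np.inf)            # no self-edges
    nn = np.argsort(d, axis=1)[:, :k]      # indices of k nearest neighbours
    return [(i, int(j)) for i in range(len(c)) for j in nn[i]]

# four strokes forming two spatially close pairs
edges = spatial_knn_edges([(0, 0), (1, 0), (10, 0), (11, 0)], k=1)
print(edges)  # [(0, 1), (1, 0), (2, 3), (3, 2)]
```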
CHCI_I005 | |
题目:Interaction Proxy Manager: Semantic Model Generation and Run-time Support for Reconstructing Ubiquitous User Interfaces of Mobile Services
摘要:Emerging terminals, such as smartwatches, true wireless earphones, and in-vehicle computers, are complementing our portals to ubiquitous information services. However, the current ecology of information services, encapsulated in millions of mobile apps, is largely restricted to smartphones; accommodating them to new devices requires tremendous, almost unbearable engineering effort. Interaction Proxy, first proposed as an accessibility technique, is a potential solution to this problem. Rather than re-building an entire application, an Interaction Proxy constructs an alternative user interface that intercepts and translates interaction events and states between users and the original app's interface. In such a system, one key challenge is how to robustly and efficiently "communicate" with the original interface given the instability and dynamicity of mobile apps (e.g., dynamic application status and unstable layout). To handle this, we first define the UI-Independent Application Description (UIAD), a reverse-engineered semantic model of mobile services, and then propose the Interaction Proxy Manager (IPManager), which is responsible for synchronizing and managing the original apps' interfaces and for providing a concise programming interface that exposes the information and method entries of the concerned mobile services. In this way, developers can build alternative interfaces without dealing with the complexity of communicating with the original app's interfaces. In this paper, we elaborate on the design and implementation of IPManager and demonstrate its effectiveness by developing three typical proxies: mobile-smartwatch, mobile-vehicle and mobile-voice. We conclude by discussing the value of our approach in promoting ubiquitous computing, as well as its limitations. |
|
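The "concise programming interface" idea above can be sketched as follows. Every class and method name here is hypothetical, invented to illustrate how a developer-facing proxy might hide UI synchronization behind semantic method entries; it is not the actual IPManager API.

```python
# Hypothetical sketch of a developer-facing proxy interface. Names are
# illustrative assumptions, not the real IPManager API.
class MusicServiceProxy:
    """Wraps a mobile music app's UI behind stable semantic method entries."""

    def __init__(self, backend):
        self.backend = backend          # handles UI state synchronization

    def now_playing(self):
        # The backend resolves the current UI state to a semantic field,
        # shielding the caller from layout changes in the original app.
        return self.backend.query("now_playing")

    def play(self, track):
        self.backend.invoke("play", track=track)

class FakeBackend:
    """Stand-in for the state-synchronization layer, for demonstration only."""
    def __init__(self):
        self.state = {"now_playing": "none"}
    def query(self, field):
        return self.state[field]
    def invoke(self, method, **kwargs):
        if method == "play":
            self.state["now_playing"] = kwargs["track"]

proxy = MusicServiceProxy(FakeBackend())
proxy.play("Prelude in C")
print(proxy.now_playing())  # Prelude in C
```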
CHCI_I006 | |
题目:MagnaDip Kit: A User-Friendly Toolkit for Streamlined Fabrication of Electromagnetic Responsive Textiles
摘要:Smart materials play an essential role in enhancing the efficiency and diversity of human-computer interaction (HCI). This paper focuses on the domain of flexible smart materials. We introduce MagnaDip Kit, a toolkit designed for creating magnetic textiles, aiming to democratize the innovation of new materials and facilitate the wider adoption of smart materials in everyday applications. The MagnaDip Kit ensures a straightforward and user-friendly manufacturing process and serves as a resource for designers to employ smart materials in prototypes. Combining interdisciplinary knowledge from materials science and design, we aim to provide a tangible production experience, enabling the expansion of novel interaction modalities beyond traditional computing. Based on the toolkit's output, we further integrated the characteristics of electromagnetic responsive textiles to create a light-responsive interactive prototype, demonstrating one application of smart materials. At CHI 2024, we seek feedback from an international audience to refine the toolkit and conduct additional workshops. |
|
CHCI_I007 | |
题目:TouchEditor: Interaction Design and Evaluation of a Flexible Touchpad for Text Editing of Head-Mounted Displays in Speech-unfriendly Environments
摘要:A text editing solution that adapts to speech-unfriendly environments (where it is inconvenient to speak or difficult to recognize speech) is essential for head-mounted displays (HMDs) to work universally. Existing schemes, e.g., touch bar, virtual keyboard and physical keyboard, suffer from shortcomings such as insufficient speed, uncomfortable experience or restrictions on user location and posture. To mitigate these restrictions, we propose TouchEditor, a novel text editing system for HMDs based on a flexible piezoresistive film sensor, supporting cursor positioning, text selection, text retyping and editing commands (i.e., Copy, Paste, Delete, etc.). Through a literature overview and a heuristic study, we design a pressure-controlled menu and a shortcut gesture set for entering editing commands, and propose an area-and-pressure-based method for cursor positioning and text selection that skillfully maps gestures in different areas and with different strengths to cursor movements with different directions and granularities. The evaluation results show that TouchEditor i) adapts well to various contents and scenes with a stable correction speed of 0.075 corrections per second; ii) achieves 95.4% gesture recognition accuracy; iii) performs comparably to a mobile phone in text selection tasks. Comparison results with the speech-dependent EYEditor and the built-in touch bar further prove the flexibility and robustness of TouchEditor in speech-unfriendly environments. |
|
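The area-and-pressure-based mapping described above can be sketched as a toy rule: the touch area chooses the cursor direction and the pressure level chooses the granularity. The threshold and granularity levels below are placeholders, not the paper's calibrated design.

```python
# Toy area-and-pressure cursor mapping. Threshold and granularity values are
# illustrative assumptions, not TouchEditor's calibrated parameters.
LIGHT_PRESSURE_MAX = 0.5   # normalized pressure in [0, 1]

def cursor_step(area, pressure):
    """area: 'left' or 'right' region of the touchpad.
    Returns (direction, unit): light pressure moves by character,
    firm pressure moves by word."""
    direction = -1 if area == "left" else 1
    unit = "character" if pressure <= LIGHT_PRESSURE_MAX else "word"
    return direction, unit

print(cursor_step("left", 0.2))   # (-1, 'character'): fine leftward move
print(cursor_step("right", 0.8))  # (1, 'word'): coarse rightward move
```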
CHCI_I008 | |
题目:Eye-Hand Typing: Eye Gaze Assisted Finger Typing via Bayesian Processes in AR
摘要:Nowadays, AR HMDs are widely used in scenarios such as intelligent manufacturing and digital factories. In a factory environment, fast and accurate text input is crucial for operators’ efficiency and task completion quality. However, the traditional AR keyboard may not meet this requirement, and the noisy environment is unsuitable for voice input. In this article, we introduce Eye-Hand Typing, an intelligent AR keyboard. We leverage the speed advantage of eye gaze and use a Bayesian process based on the information of gaze points to infer users’ text input intentions. We improve the underlying keyboard algorithm without changing user input habits, thereby improving factory users’ text input speed and accuracy. In real-time applications, when the user’s gaze point is on the keyboard, the Bayesian process can predict the most likely characters, vocabulary, or commands that the user will input based on the position and duration of the gaze point and the input history. The system can enlarge and highlight recommended text input options based on the predicted results, thereby improving user input efficiency. A user study showed that compared with the current HoloLens 2 system keyboard, Eye-Hand Typing reduced input error rates by 28.31% and improved text input speed by 14.5%. It also outperformed a gaze-only technique, being 43.05% more accurate and 39.55% faster, with no significant compromise in eye fatigue. Users also expressed positive preferences. |
|
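The Bayesian inference step described above can be sketched as follows: the posterior over keys combines a gaze likelihood with a prior from the input history. The isotropic Gaussian likelihood, sigma, and prior values here are assumptions for illustration; the real system also uses gaze duration.

```python
import numpy as np

def key_posteriors(gaze, key_centers, prior, sigma=0.6):
    """P(key | gaze) ∝ P(gaze | key) * P(key), with an isotropic Gaussian
    likelihood around each key center. sigma and the prior are assumptions;
    the paper additionally uses dwell time and input history."""
    gaze = np.asarray(gaze, float)
    centers = np.asarray(key_centers, float)
    d2 = ((centers - gaze) ** 2).sum(axis=1)          # squared gaze-key distance
    likelihood = np.exp(-d2 / (2 * sigma ** 2))
    post = likelihood * np.asarray(prior, float)
    return post / post.sum()                          # normalized posterior

# three keys on a line; a language-model prior favors key 1
centers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
post = key_posteriors(gaze=(0.9, 0.1), key_centers=centers, prior=[0.2, 0.6, 0.2])
print(post.argmax())  # 1 -- the gaze lands near key 1 and the prior agrees
```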
CHCI_I009 | |
题目:TrackPose: Towards Stable and User Adaptive Finger Pose Estimation on Capacitive Touchscreens
摘要:Several studies have explored the estimation of finger pose/angle to enhance the expressiveness of touchscreens. However, the accuracy of previous algorithms is limited by large estimation errors, and the sequential output angles are unstable, making it difficult to meet the demands of practical applications. We believe this defect arises from improper rotation representation, the lack of time-series modeling, and the difficulty in accommodating individual differences among users. To address these issues, we conduct an in-depth study of rotation representation for the 2D pose problem by minimizing the errors between the representation space and the original space. A deep learning model, TrackPose, using a self-attention mechanism is proposed for time-series modeling to improve the accuracy and stability of finger pose estimation. A registration application on a mobile phone is developed to collect touchscreen images of each new user without the use of an optical tracking device. The combination of the three measures mentioned above results in a 33% reduction in the angle estimation error, and 47% for the yaw angle in particular. Additionally, the instability of sequential estimations, measured by the proposed metric MAEΔ, is reduced by 62%. A user study further confirms the effectiveness of our proposed algorithm. |
|
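The sequential-instability issue above can be illustrated with one plausible reading of a MAEΔ-style metric: the mean absolute frame-to-frame change of the predicted angle sequence. The paper's exact definition may differ; this is an illustrative assumption.

```python
import numpy as np

def mae_delta(pred_angles):
    """One plausible reading of an instability metric like MAEΔ: the mean
    absolute frame-to-frame change of the estimated angle sequence (degrees).
    This definition is an assumption for illustration."""
    a = np.asarray(pred_angles, float)
    return np.abs(np.diff(a)).mean()

steady = [30.0, 30.2, 30.1, 30.3]      # stable estimates of a held pose
jittery = [30.0, 34.0, 28.0, 33.0]     # similar mean, unstable output
print(mae_delta(steady) < mae_delta(jittery))  # True
```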
CHCI_I010 | |
题目:SwivelTouch: Boosting Touchscreen Input with 3D Finger Rotation Gesture
摘要:Today, touchscreens stand as the most prevalent input devices of mobile computing devices (smartphones, tablets, smartwatches). Yet, compared with desktop or laptop computers, the limited shortcut keys and physical buttons on touchscreen devices, coupled with the fat finger problem, often lead to slower and more error-prone input and navigation, especially when dealing with text editing and other complex interaction tasks. We introduce an innovative gesture set based on finger rotations in the yaw, pitch, and roll directions on a touchscreen, diverging significantly from traditional two-dimensional interactions and promising to expand the gesture library. Despite active research on the estimation of finger angles, previous work faces substantial challenges, including significant estimation errors and unstable sequential outputs. Variability in user behavior further complicates the isolation of movements to a single rotational axis, leading to accidental disturbances and screen coordinate shifts that interfere with existing sliding gestures. Consequently, the direct application of finger angle estimation algorithms to recognizing three-dimensional rotational gestures is impractical. SwivelTouch instead analyzes the characteristics of finger movement on the touchscreen, captured in raw capacitive image sequences, to rapidly and accurately identify these 3D gestures, clearly differentiating them from conventional touch interactions like tapping and sliding, thus enhancing user interaction with touch devices while remaining compatible with existing 2D gestures. A user study further confirms that SwivelTouch significantly enhances the efficiency of text editing on smartphones. |
|
CHCI_I011 | |
题目:Perceptually Inspired C0-Continuity Haptic Shape Display with Trichamber Soft Actuators
摘要:Shape display devices composed of actuation pixels enable dynamic rendering of surface morphological features, which have important roles in virtual reality and metaverse applications. The traditional pin-array solution produces sidestep-like structures between neighboring pins and normally relies on high-density pins to obtain curved surfaces. It remains a challenge to achieve continuous curved surfaces using a small number of actuated units. To address this challenge, we resort to the concept of surface continuity in computational geometry and develop a C0-continuity shape display device with trichamber fiber-reinforced soft actuators. Each trichamber unit produces three-dimensional (3D) deformation consisting of elongation, pitch, and yaw rotation, thus ensuring rendered surface continuity using low-resolution actuation units. Inspired by the human tactile discrimination thresholds for height and angle gradients between adjacent units, we propose mathematical criteria for C0-continuity shape display and compare the maximal number of distinguishable shapes renderable by the proposed device with that of a typical pin-array. We then establish a shape control model that accounts for the nonlinearity of soft materials to characterize and control the soft device to display C0-continuity shapes. Experimental results showed that the proposed device with nine trichamber units could render typical sets of distinguishable C0-continuity shape sequences. We envision that the concept of C0-continuity shape display with 3D deformation capability could improve the fidelity of rendered shapes in many metaverse scenarios, such as touching human organs in medical palpation simulations. |
|
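The perceptual C0-continuity criteria above can be sketched as a simple check that height and angle gradients between adjacent actuation units stay below tactile discrimination thresholds. The threshold values below are placeholders, not the paper's measured ones.

```python
# Illustrative C0-continuity check: a rendered surface counts as perceptually
# continuous when adjacent-unit height and angle gradients stay below human
# tactile discrimination thresholds. Threshold values are placeholders.
HEIGHT_THRESHOLD_MM = 1.0
ANGLE_THRESHOLD_DEG = 5.0

def is_c0_continuous(units):
    """units: list of (height_mm, angle_deg) for a row of adjacent actuators."""
    for (h1, a1), (h2, a2) in zip(units, units[1:]):
        if abs(h2 - h1) > HEIGHT_THRESHOLD_MM or abs(a2 - a1) > ANGLE_THRESHOLD_DEG:
            return False
    return True

smooth = [(10.0, 0.0), (10.6, 3.0), (11.2, 6.0)]    # gentle gradients
stepped = [(10.0, 0.0), (13.0, 0.0), (10.0, 0.0)]   # pin-array-like sidesteps
print(is_c0_continuous(smooth), is_c0_continuous(stepped))  # True False
```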
CHCI_I012 | |
题目:SimUser: Generating Usability Feedback by Simulating Various Users Interacting with Mobile Applications
摘要:The conflict between the rapid iteration demanded by prototyping and the time-consuming nature of user tests has led researchers to adopt AI methods to identify usability issues. However, these AI-driven methods concentrate on evaluating the feasibility of a system, while often overlooking the influence of specific user characteristics and usage contexts. We propose a tool named SimUser, based on large language models (LLMs) with a Chain-of-Thought structure and a user modeling method. It generates usability feedback by simulating the interaction between users and applications, which is influenced by user characteristics and contextual factors. An empirical study (48 human users and 21 designers) validated that, in the context of a simple smartwatch interface, SimUser could generate heuristic usability feedback whose similarity varied from 35.7% to 100% depending on the user group and usability category. Our work provides insights into simulating users with LLMs to improve future design activities. |
|
CHCI_I013 | |
题目:Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation
摘要:Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text with two main sets of functions: gist extraction and macro revision. Gist extraction generates keywords and summaries as anchors to support the review and interaction with spoken text. LLM-assisted macro revisions allow users to respeak, split, merge, and transform dictated text without specifying precise editing locations. Together they pave the way for interactive dictation and revision that help close gaps between spontaneously spoken words and well-structured writing. In a comparative study with 12 participants performing verbal composition tasks, Rambler outperformed the baseline of a speech-to-text editor + ChatGPT, as it better facilitates iterative revisions with enhanced user control over the content while supporting surprisingly diverse user strategies. |
|
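Gist-level macro revision, as described in the Rambler abstract above, can be sketched as operations over a list of gist units, so users never specify character-level edit locations. The data model below is an assumption for illustration; the real system drives split, merge, respeak, and transform operations with an LLM.

```python
# Dictated text held as gist units; macro revisions operate on whole units.
def split_gist(gists, index, at_sentence):
    """Split gist `index` into two units at the end of `at_sentence`."""
    head, _, tail = gists[index].partition(at_sentence)
    return gists[:index] + [head + at_sentence, tail.strip()] + gists[index + 1:]

def merge_gists(gists, index):
    """Merge gist `index` with the gist that follows it."""
    merged = gists[index] + " " + gists[index + 1]
    return gists[:index] + [merged] + gists[index + 2:]

gists = ["We met on Monday. We set two goals.", "Budget is open."]
gists = split_gist(gists, 0, "Monday.")   # separate the two topics
gists = merge_gists(gists, 1)             # group goal and budget talk
print(gists)  # ['We met on Monday.', 'We set two goals. Budget is open.']
```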
CHCI_I014 | |
题目:Designing Unobtrusive Modulated Electrotactile Feedback on Fingertip Edge to Assist Blind and Low Vision (BLV) People in Comprehending Charts
摘要:Charts are crucial in conveying information across various fields but are inaccessible to blind and low vision (BLV) people without assistive technology. Chart comprehension tools leveraging haptic feedback have been used widely but are often bulky, expensive, and static, rendering them inefficient for conveying chart data. To increase device portability, enable multitasking, and provide efficient assistance in chart comprehension, we introduce a novel system that delivers unobtrusive modulated electrotactile feedback directly to the fingertip edge. Our three-part study with twelve participants confirmed the effectiveness of this system, demonstrating that electrotactile feedback, when applied for 0.5 seconds with a 0.12-second interval, provides the most accurate position and direction recognition. Furthermore, our electrotactile device has proven valuable in assisting BLV participants in comprehending four commonly used charts: line charts, scatterplots, bar charts, and pie charts. We also delve into the implications of our findings on recognition enhancement, presentation modes, and function synergy. |
|
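The timing result above (0.5 s of stimulation with a 0.12 s interval) translates directly into an on/off schedule; generating it for a sequence of tactile cues is straightforward:

```python
# Stimulation timing from the study above: 0.5 s pulses with 0.12 s gaps.
PULSE_S, GAP_S = 0.5, 0.12

def schedule(n_cues, start=0.0):
    """Return (on_time, off_time) pairs for n successive tactile cues."""
    out, t = [], start
    for _ in range(n_cues):
        out.append((round(t, 2), round(t + PULSE_S, 2)))
        t += PULSE_S + GAP_S
    return out

print(schedule(3))  # [(0.0, 0.5), (0.62, 1.12), (1.24, 1.74)]
```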
CHCI_I015 | |
题目:Grip-Reach-Touch-Repeat: A Refined Model of Grasp to Encompass One-Handed Interaction with Arbitrary Form Factor Devices
摘要:We extend grasp models to encompass one-handed interaction with arbitrarily shaped touchscreen devices. Current models focus on how objects are stably held by external forces. With touchscreen devices, however, we postulate that users trade off holding securely against exploring interactively. To verify this, we first conducted a qualitative study that asked participants to grasp 3D printed objects while considering their different interactivity. The results confirm our hypothesis and reveal clear changes in posture. To further verify this trade-off and design interactions, we developed simulation software capable of computing the stability of a grasp and its reachability. We then conducted a second study, based on the observed predominant grasps, to validate our software using a glove. The results again confirm a consistent trade-off between stability and reachability. We conclude by discussing how this research can help in designing computational tools focused on hand-held interaction with arbitrarily shaped touchscreen devices. |
CHCI Top Conference and Journal Paper Exchange - Virtual and Augmented Reality Technologies
Session Chair Biography | ||||||
---|---|---|---|---|---|---|
![]() Weinan Shi, Assistant Researcher, Tsinghua University |
Bio: Weinan Shi is an assistant researcher in the Department of Computer Science and Technology at Tsinghua University, where he received his bachelor's and doctoral degrees in 2016 and 2021. His research focuses on intelligent human-computer interaction, including text entry, context awareness, wearable devices, and accessibility. His intelligent text-entry method, based on a Bayesian model of finger motor control, was deployed as a key feature of the Sogou Input Method, serving over 600 million users who type 90 billion characters per day, and significantly improved the recall of correction strings and the click-through rate of online corrections. He also proposed the world's first keyboard for blind users that performs character-level correction on their input, significantly reducing input error rates. He has received an ACM CHI Honorable Mention Award and first prize in the China Human Factors Engineering Design Competition. |
CHCI_V001 | |
---|---|
题目:Understanding the Impact of Longitudinal VR Training on Users with Mild Cognitive Impairment Using fNIRS and Behavioral Data
摘要:With the growing rehabilitation needs of the mild cognitive impairment (MCI) user group and the advantages of virtual reality (VR) technologies in cognitive training, the development of VR-based rehabilitation training methods has recently become a research hotspot. However, the challenges of accurately measuring users’ needs and quantifying training system efficacy are still not well resolved, especially for longitudinal tracking. In this study, a VR-based cognitive training and evaluation system is designed and implemented, targeting the rehabilitation needs of MCI users. It evaluates the impact of longitudinal VR-based training on MCI users with a number of feedback methodologies, including brain activation indicators, brain network connectivity indicators, behavioral indicators and Montreal Cognitive Assessment (MoCA) scale scores, extracted from multi-modal data collected during training. A two-month longitudinal ergonomics experiment was conducted to validate the usability of the feedback methodologies and to explore the influence of training duration on rehabilitation efficacy. The results showed that our proposed VR-based cognitive training and evaluation system had a significantly positive impact on the rehabilitation of the MCI group. Meanwhile, the multi-source feedback can also inform the updates and iterations of VR-based rehabilitation training systems. Finally, this study provides guidance for the selection of rehabilitation cycles and emphasizes the importance of quantitative studies with longitudinal follow-up in assessing rehabilitation efficacy. |
|
CHCI_V002 | |
题目:Spatial-Related Sensors Matters: 3D Human Motion Reconstruction Assisted with Textual Semantics
摘要:Leveraging wearable devices for motion reconstruction has emerged as an economical and viable technique. Certain methodologies employ sparse Inertial Measurement Units (IMUs) on the human body and harness data-driven strategies to model human poses. However, the reconstruction of motion based solely on sparse IMUs data is inherently fraught with ambiguity, a consequence of numerous identical IMU readings corresponding to different poses. In this paper, we explore the spatial importance of multiple sensors, supervised by text that describes specific actions. Specifically, uncertainty is introduced to derive weighted features for each IMU. We also design a Hierarchical Temporal Transformer (HTT) and apply contrastive learning to achieve precise temporal and feature alignment of sensor data with textual semantics. Experimental results demonstrate our proposed approach achieves significant improvements in multiple metrics compared to existing methods. Notably, with textual supervision, our method not only differentiates between ambiguous actions such as sitting and standing but also produces more precise and natural motion. |
CHCI_V003
Title: Dynamic Scene Adjustment Mechanism for Manipulating User Experience in VR
Abstract: With the progression of VR technology, virtual interactive environments are becoming increasingly realistic and controllable. Research has substantiated the influence of VR environmental variables on user experience and engagement. Concurrently, advances in real-time user status monitoring have made it possible to adjust VR environments dynamically based on the user’s real-time status and feedback, increasing researchers’ focus on enhancing user experience and engagement by adjusting VR environmental variables. This paper introduces an interactive paradigm for VR environments called the Dynamic Scene Adjustment (DSA) mechanism, which modifies VR environmental variables in real time according to the user’s status and performance to enhance user engagement and experience. We focused on the impact of visual environment variables on player status, embedding the DSA mechanism into a music VR game with brain-computer interaction for specific VR tasks. Experimental findings affirm that incorporating the DSA mechanism into the VR game enhances the user’s engagement and performance, strongly validating the proposed DSA approach. This work can help researchers think about dynamic regulation in VR environments from a new perspective and will shed light on the design of VR healing, VR education, VR games, and other fields.
CHCI_V004
Title: InkBrush: A Sketching Tool for 3D Ink Painting
Abstract: InkBrush is a new sketch-based 3D drawing tool for creating 3D ink paintings using free-form 3D ink strokes. It offers a digital calligraphy brush and various editing tools to generate realistic ink-like brush strokes with attributes such as hairy edges, ink drips, and scattered dots. Users can adjust parameters such as moisture, color, darkness, dryness, and stroke style to customize the appearance of the brush strokes. The development of InkBrush was guided by a design study involving artists and designers. It was developed as a plugin for Blender, a popular 3D modeling tool, and its effectiveness and usability were evaluated through a user study involving 75 participants. Preliminary feedback from the participants was overwhelmingly positive, indicating that InkBrush was intuitive and easy to use. Following this, we also sought in-depth assessments from experts in ink painting and 3D design. Their evaluations further demonstrated the effectiveness of InkBrush.
CHCI_V005
Title: PepperPose: Full-Body Pose Estimation with a Companion Robot
Abstract: Accurate full-body pose estimation across diverse actions, in a user-friendly and location-agnostic manner, paves the way for interactive applications in realms like sports, fitness, and healthcare. This task becomes challenging in real-world scenarios due to factors like the user’s dynamic positioning, the diversity of actions, and the varying acceptability of the pose-capturing system. In this context, we present PepperPose, a novel companion robot system tailored for optimized pose estimation. Unlike traditional methods, PepperPose actively tracks the user and refines its viewpoint, facilitating enhanced pose accuracy across different locations and actions. This allows users to enjoy a seamless action-sensing experience. Our evaluation, involving 30 participants undertaking daily functioning and exercise actions in a home-like space, underscores the robot’s promising capabilities. Moreover, we discuss the opportunities that PepperPose presents for human-robot interaction, along with its current limitations and future developments.
CHCI_V006
Title: Self-Guided DMT: Exploring a Novel Paradigm of Dance Movement Therapy in Mixed Reality for Children with ASD
Abstract: Children diagnosed with Autism Spectrum Disorder (ASD) often exhibit motor disorders. Dance Movement Therapy (DMT) has shown great potential for improving the motor control ability of children with ASD. However, traditional DMT methods often lack vividness and are difficult to implement effectively. To address this issue, we propose a Mixed Reality DMT approach utilizing interactive virtual agents, which offers immersive training content and multi-sensory feedback. To improve the training performance of children with ASD, we introduce a novel training paradigm featuring a self-guided mode. This paradigm enables the rapid creation, from a single photo, of a virtual twin agent that embodies the child with ASD and can then guide the child during training. We conducted an experiment with 24 children diagnosed with ASD (or ASD propensity), recording their training performance under various experimental conditions. Through expert rating, behavior coding of training sessions, and statistical analysis, our findings revealed that the use of the twin agent for self-guidance resulted in noticeable improvements in the training performance of children with ASD, particularly in enhancing movement quality and refining overall target-related responses. Our study holds clinical potential for the medical treatment and rehabilitation of children with ASD.
CHCI_V007
Title: Enhancing Positive Emotions through Interactive Virtual Reality Experiences: An EEG-Based Investigation
Abstract: Virtual reality (VR), as an immersive interactive technology, holds the potential to promote feelings of well-being by evoking positive emotions. However, the underlying causes and extent of emotional responses elicited by VR remain underexplored. Accordingly, we aimed to investigate the types of interaction behaviors in VR that effectively enhance positive emotions, using electroencephalogram (EEG) signals as measurements of emotional expression. In an exploratory study conducted in a virtual museum, we designed four interactive tasks with varying user autonomy and interaction functions. An individual emotion model based on EEG was employed to predict whether positive emotions were promoted and to what extent. The results indicated that simply roaming the virtual museum had no obvious impact on positive emotions. However, incorporating specific interaction functions such as doodles, emojis, and comments increased positive emotions, with the extent of the increase closely linked to the degree of user autonomy.
CHCI_V008
Title: Towards Building Condition-Based Cross-Modality Intention-Aware Human-AI Cooperation under VR Environment
Abstract: To address critical challenges in effectively identifying user intent and forming relevant information presentations and recommendations in VR environments, we propose an innovative condition-based multi-modal human-AI cooperation framework. It highlights intent tuples (intent, condition, intent prompt, action prompt) and a 2-Large-Language-Models (2-LLMs) architecture. This design uses the “condition” as the core of task description, dynamically matches user interactions with intentions, and enables the generation of tailored multi-modal AI responses. The 2-LLMs architecture separates the roles of intent detection and action generation, reducing prompt length and helping generate appropriate responses. We implemented a VR-based intelligent furniture purchasing system based on the proposed framework and conducted a three-phase comparative user study. The results conclusively demonstrate the system’s superiority in time efficiency and accuracy, intention conveyance, effective product acquisition, and user satisfaction and cooperation preference. Our framework provides a promising approach towards personalized and efficient user experiences in VR.
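The intent-tuple and 2-LLM role separation described in this abstract might be organized along these lines. This is a hypothetical sketch with stub callables standing in for the two LLMs; all field names, prompts, and example strings are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class IntentTuple:
    intent: str         # e.g. "buy_item"
    condition: str      # task-describing condition at the framework's core
    intent_prompt: str  # prompt fragment given to the intent-detection LLM
    action_prompt: str  # prompt fragment given to the action-generation LLM

def run_pipeline(user_input: str, tuples: list,
                 detect_llm: Callable[[str], str],
                 act_llm: Callable[[str], str]) -> str:
    """Two-stage flow: LLM #1 only detects the intent; LLM #2 only sees the
    matched tuple's action prompt, keeping each prompt short."""
    intent = detect_llm(user_input)                        # stage 1
    matched = next(t for t in tuples if t.intent == intent)
    return act_llm(matched.action_prompt + user_input)     # stage 2

tuples = [
    IntentTuple("buy_item", "user wants to purchase",
                "Classify the user's purchase intent.",
                "Recommend furniture for: "),
    IntentTuple("compare_items", "user weighs options",
                "Classify the user's comparison intent.",
                "Compare furniture options for: "),
]
reply = run_pipeline("I need a sofa", tuples,
                     detect_llm=lambda text: "buy_item",         # stub LLM #1
                     act_llm=lambda prompt: "[LLM2] " + prompt)  # stub LLM #2
```

Splitting the roles this way means neither model's prompt has to carry the full tuple table plus generation instructions at once, which is the prompt-length benefit the abstract describes.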
CHCI_V009
Title: Exploring the Role of AR Cognitive Interface in Enhancing Human-Vehicle Collaborative Driving Safety: A Design Perspective
Abstract: In autonomous vehicles, the heterogeneity between human and automation agents can cause conflicts in decision-making and behaviour due to differences in the perception of hazardous situations. Augmented Reality Human-Machine Interfaces (AR-HMI) provide an opportunity to support driving performance by enabling drivers to intuitively access the shared perception and explanations of the automated vehicle. One possible approach to AR-HMI design is to simplify driving-task information based on vehicle context understanding, although there is currently a lack of systematic understanding of how collaborative mechanisms or cognitive features contribute to AR-HMI information design. Therefore, this work develops an augmented reality cognitive interface design method for autonomous driving. It aims to identify novel collaborative interface information visualizations and provide a common language and inspiration for the design space.
CHCI_V010
Title: Effects of spatial constraints and ages on children’s upper limb performance in mid-air gesture interaction
Abstract: Through two controlled experiments, including a pie menu study and a target acquisition study, this paper investigates children’s performance in mid-air gesture interactions under different spatial constraints (i.e., different orientations/distances), as well as the effect of age in such interaction scenarios. The first experiment recorded children’s speed and accuracy when following certain directions under menus with different numbers of items, while the second evaluated the speed–accuracy trade-off (SAT) of children’s arm movements. We also compared performance differences between two age groups (i.e., 6–8 years old and 9–12 years old). Based on these experiments, we propose an improved design for UI menus based on mid-air gesture interaction for children. The improved design provides suggestions for setting appropriate directions and difficulty indexes, making the menus much easier and quicker for children to use with mid-air interaction.
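The "difficulty indexes" in such target-acquisition studies are conventionally computed with Fitts's index of difficulty; a minimal sketch of the Shannon formulation (the specific distances in the example are illustrative, not taken from the study):

```python
import math

def index_of_difficulty(distance: float, width: float) -> float:
    """Fitts's index of difficulty (Shannon formulation), in bits:
    ID = log2(D / W + 1). Smaller, farther targets yield a higher ID,
    i.e. a harder acquisition for the child's arm movement."""
    return math.log2(distance / width + 1)
```

For example, a target 300 mm away with a 100 mm width gives ID = log2(4) = 2 bits; doubling the distance raises the ID and, per the speed–accuracy trade-off, the expected movement time.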
CHCI_V011
Title: Foveated Fluid Animation in Virtual Reality
Abstract: Large-scale fluid simulation is widely useful in various Virtual Reality (VR) applications. While physics-based fluid animation holds the promise of generating highly realistic fluid details, it often imposes significant computational demands, particularly when simulating high-resolution fluid for VR. In this paper, we propose a novel foveated fluid simulation method that enhances both the visual quality and computational efficiency of physics-based fluid simulation in VR. To leverage the natural foveation of human vision, we divide the visible domain of the fluid simulation into foveal, peripheral, and boundary regions. Our foveated fluid system dynamically allocates computational resources, striking a balance between simulation accuracy and computational efficiency. We implement this approach using a multi-scale method. To evaluate its effectiveness, we conducted subjective studies. Our findings show a significant reduction in computational resource requirements, resulting in a speedup of up to 2.27 times. Crucially, our method preserves the visual quality of fluid animations at a level perceptually identical to full-resolution outcomes. Additionally, we investigate the impact of various factors, including particle radius and viewing distance, on the visual effects of fluid animations. Our work provides new techniques and evaluations tailored to facilitate real-time foveated fluid simulation in VR, which can enhance the efficiency and realism of fluids in VR applications.
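The foveal/boundary/peripheral division driven by gaze eccentricity could be sketched as follows. This is an illustrative assumption rather than the paper's implementation; the 10°/20° thresholds and the per-particle classification are placeholders.

```python
import numpy as np

def classify_regions(particle_dirs, gaze_dir, foveal_deg=10.0, boundary_deg=20.0):
    """Label each fluid particle by angular eccentricity from the gaze ray:
    0 = foveal (full-precision simulation), 1 = boundary (blending ring),
    2 = peripheral (coarse simulation).

    particle_dirs: (n, 3) unit view directions toward particles.
    gaze_dir:      (3,) unit gaze direction.
    """
    cos_ecc = particle_dirs @ gaze_dir
    ecc = np.degrees(np.arccos(np.clip(cos_ecc, -1.0, 1.0)))
    labels = np.full(len(ecc), 2)        # peripheral by default
    labels[ecc <= boundary_deg] = 1      # boundary ring
    labels[ecc <= foveal_deg] = 0        # fovea
    return labels

gaze = np.array([0.0, 0.0, 1.0])
angles = np.radians([0.0, 15.0, 45.0])
dirs = np.stack([np.sin(angles), np.zeros(3), np.cos(angles)], axis=1)
labels = classify_regions(dirs, gaze)
```

A multi-scale solver would then simulate label-0 particles at full resolution and progressively coarsen labels 1 and 2, with the boundary ring blending the two scales.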
CHCI_V012
Title: Re-directed Placement: Evaluating the Re-direction of Passive Props during Reach-to-Place in Virtual Reality
Abstract: Hand redirection is an effective technique that can provide users with haptic feedback in virtual reality (VR) when a disparity exists between virtual objects and their physical counterparts. Psychophysiological research has revealed the distinct motion profiles of different kinematic phases when people perform hand-object interactions. In this paper, we propose Redirected Placement (RP), which determines the new placement of a physical prop by solving a constrained optimization problem. The visual illusion is applied during the "reach-to-place" kinematic phase in the proposed RP method, rather than the "reach-to-grasp" phase used in the typical Redirected Reach (RR) method. We conducted two experiments based on the proposed RP method. Our first experiment showed that detection thresholds are generally higher with the proposed method than with the RR method. The second experiment evaluated the embodiment experience with hand redirection using RR-only, RP-only, and combined RR&RP methods. The results show an enhanced sense of embodiment with the combined use of both RR and RP techniques. Our study further indicates that a 1:1 combination ratio of RR&RP resulted in the subjective experience closest to the baseline.
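As an illustration of placement-as-constrained-optimization, here is a toy brute-force sketch. It is not the authors' formulation: the angular-redirection constraint, the discrete candidate set, and the cost function are all assumptions made for the example.

```python
import numpy as np

def choose_placement(candidates, virtual_pos, hand_pos, threshold_deg):
    """Pick the physical prop placement closest to the virtual target while
    keeping the implied redirection (angle between the real and virtual
    reach directions from the hand) under a detection threshold.
    Brute force over a discrete candidate set."""
    best, best_cost = None, np.inf
    virt = virtual_pos - hand_pos
    for c in candidates:
        real = c - hand_pos
        cosang = real @ virt / (np.linalg.norm(real) * np.linalg.norm(virt))
        angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
        if angle > threshold_deg:        # redirection would be noticeable
            continue
        cost = np.linalg.norm(c - virtual_pos)
        if cost < best_cost:
            best, best_cost = c, cost
    return best

hand = np.zeros(3)
virtual = np.array([1.0, 0.0, 0.0])
cands = np.array([[1.0, 0.05, 0.0],      # small offset, below threshold
                  [0.0, 1.0, 0.0]])      # 90 degrees off, rejected
best = choose_placement(cands, virtual, hand, threshold_deg=10.0)
```

The detection-threshold experiment in the abstract is what would supply the `threshold_deg` value in practice: higher thresholds during reach-to-place mean a larger feasible set of placements.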
CHCI_V013
Title: Retinotopic Foveated Rendering
Abstract: Foveated rendering (FR) improves the rendering performance of virtual reality (VR) by allocating less computational load to the peripheral field of view (FOV). Existing FR techniques are built on a radially symmetric regression model of human visual acuity. However, horizontal-vertical asymmetry (HVA) and vertical meridian asymmetry (VMA) in the cortical magnification factor (CMF) of the human visual system have been evidenced by retinotopy research in neuroscience, suggesting a radially asymmetric regression of visual acuity. In this paper, we begin with functional magnetic resonance imaging (fMRI) data, construct an anisotropic CMF model of the human visual system, and then introduce the first radially asymmetric regression model of rendering precision for FR applications. We conducted a pilot experiment to adapt the proposed model to VR head-mounted displays (HMDs). A user study demonstrates that retinotopic foveated rendering (RFR) provides participants with perceptually equal image quality compared to typical FR methods while reducing fragment shading by 27.2% on average, accelerating graphics rendering by roughly 1/6. We anticipate that our study will enhance the rendering performance of VR by bridging the gap between retinotopy research in neuroscience and computer graphics in VR.
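The radially asymmetric (HVA/VMA) falloff can be illustrated with a toy direction-dependent eccentricity metric. The constants and the functional form are invented for illustration; they are not the paper's fMRI-fitted CMF model.

```python
import math

def effective_eccentricity(ecc_x_deg, ecc_y_deg, hva=0.8, vma=0.9):
    """Toy radially asymmetric eccentricity metric (illustrative constants,
    not the paper's fMRI-fitted CMF model). Acuity falls off more slowly
    along the horizontal meridian (HVA) and in the lower visual field (VMA),
    so those directions are discounted: a smaller effective eccentricity
    means finer shading is allocated there. +y is the upper visual field."""
    ex = ecc_x_deg * hva                                 # horizontal discount
    ey = ecc_y_deg * (1.0 if ecc_y_deg > 0 else vma)     # lower-field discount
    return math.hypot(ex, ey)
```

Under this toy metric, a point 20° to the side is treated as less eccentric (and thus shaded more finely) than a point 20° above fixation, which is the qualitative behavior the asymmetries imply.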
CHCI_V014
Title: MagicMap: Enhancing Indoor Navigation Experience in VR Museums
Abstract: Museum visitors are typically advised to follow trajectories planned by curators. Nevertheless, the diverse locomotion techniques available in Virtual Reality (VR) offer navigation methods that are unattainable within physical museum spaces. Interestingly, these techniques have rarely been explored within museum settings. Our study aims to investigate appropriate navigation methods in VR museums. We first conducted a study in a virtual reconstruction of a local museum with the following navigation methods: a 2D minimap, a World-in-Miniature (WiM) system, and a WiM map. Our results showed that the WiM map with a point-and-select interaction technique outperformed the other two in ease of learning, reduced workload, lessened motion sickness, and user preference. Based on these findings, we improved the WiM map and introduced MagicMap, which builds upon the WiM map and translates the curatorial principles of museum visiting into a hierarchical menu layout. Our further evaluation showed that MagicMap supported prolonged engagement in VR museums, enhanced system usability and overall user experience, and reduced users’ perceived workload. Our findings have implications for the future design of navigation systems in VR museums and complex indoor environments.