Learnings from Tanzeem Choudhury [Note: the Japanese translation is machine-generated and will be revised later]
1. The Mobile Sensing Platform: An Embedded Activity Recognition System
1.1 Introduction
1.2 Activity recognition systems
1.3 Hardware platform v1.0: Wireless multimodal sensing
1.3.1 Deployments
1.3.2 Lessons learned
(1) Communication, storage, and processor issues.
(2) Privacy.

---

1. The Mobile Sensing Platform: An Embedded Activity Recognition System
(Tanzeem Choudhury, Intel Research, tanzeem.choudhury@dartmouth.edu, et al.; 16 authors from Intel Research, University of Washington, and Stanford University)
(IEEE PERVASIVE COMPUTING, April-June 2008, pp. 32-41)

・The MSP is a small wearable device designed for embedded activity recognition with the aim of broadly supporting context-aware ubiquitous computing applications.
・The MSP system architecture evolved in an iterative process that revealed a core set of activity recognition component requirements.

1.1 Introduction

Activity-aware systems have inspired novel user interfaces and new applications in smart environments, surveillance, emergency response, and military missions. Systems that recognize human activities from body-worn sensors can further open the door to a world of healthcare applications, such as fitness monitoring, eldercare support, long-term preventive and chronic care, and cognitive assistance. Wearable systems have the advantage of being with the user continuously. So, for example, a fitness application could use real-time activity information to encourage users to perform opportunistic activities. Furthermore, the general public is more likely to accept such activity recognition systems because they are usually easy to turn off or remove.

For systems implementing these applications to be practical, the underlying recognition module must detect a variety of activities that are performed routinely in many different manners by different individuals under different environmental conditions. This presents the challenge of building systems that can handle the real world’s noisy data and complexities. Furthermore, deploying the systems imposes some important constraints. The deployment must protect the user’s privacy as well as the privacy of those with whom the user comes in contact. The sensors must be lightweight and unobtrusive, and the machine-learning algorithms must be trainable without requiring extensive human supervision. These constraints have made robust recognition systems difficult to engineer.

Over the past four years, we’ve been building an automatic activity recognition system using on-body sensors. The Mobile Sensing Platform (MSP) tackles several of these design and deployment challenges. Moreover, we’ve carried out several real-world deployments and user studies, using the results to improve the hardware, software design, and activity recognition algorithms. The lessons learned have broad relevance to context-aware ubiquitous computing applications.

1.2 Activity recognition systems

Activity recognition systems typically have three main components (a rough code sketch of this pipeline follows the list):
・a low-level sensing module that continuously gathers relevant information about activities using microphones, accelerometers, light sensors, and so on;
・a feature processing and selection module that processes the raw sensor data into features that help discriminate between activities; and
・a classification module that uses the features to infer what activity an individual or group of individuals is engaged in—for example, walking, cooking, or having a conversation.
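As a concrete illustration of this three-part decomposition, here is a minimal sketch in Python. It is not the MSP implementation; the window length, the specific features (means, variances, spectral energy, inter-axis correlation), and the toy Gaussian classifier are placeholder assumptions chosen only to show how sensing, feature processing, and classification fit together.

```python
# Illustrative sketch only -- not the MSP code base.
# Assumes a single 3-axis accelerometer sampled in fixed-length windows.
import numpy as np

WINDOW = 128  # samples per window (assumed)

def sense_window(rng):
    """Stand-in for the low-level sensing module: returns one window
    of raw 3-axis accelerometer samples (here, synthetic data)."""
    return rng.normal(size=(WINDOW, 3))

def extract_features(window):
    """Feature processing module: turn raw samples into a small
    feature vector (means, variances, spectral energy, and the
    correlation between the x and y axes)."""
    means = window.mean(axis=0)
    variances = window.var(axis=0)
    spectral_energy = np.abs(np.fft.rfft(window[:, 0])).sum()
    corr_xy = np.corrcoef(window[:, 0], window[:, 1])[0, 1]
    return np.hstack([means, variances, spectral_energy, corr_xy])

class GaussianClassifier:
    """Classification module: a toy per-class Gaussian model that
    stands in for the probabilistic classifiers discussed below."""
    def fit(self, X, y):
        self.labels = sorted(set(y))
        X, y = np.asarray(X), np.asarray(y)
        self.stats = {c: (X[y == c].mean(0), X[y == c].std(0) + 1e-6)
                      for c in self.labels}
        return self

    def predict(self, x):
        def log_lik(c):
            mu, sd = self.stats[c]
            return -np.sum(np.log(sd) + 0.5 * ((x - mu) / sd) ** 2)
        return max(self.labels, key=log_lik)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Pretend training data: feature vectors labeled with activities.
    X = [extract_features(sense_window(rng)) for _ in range(20)]
    y = ["walking"] * 10 + ["sitting"] * 10
    clf = GaussianClassifier().fit(X, y)
    print(clf.predict(extract_features(sense_window(rng))))
```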
A feature might be low-level information, such as frequency content and correlation coefficients, or higher-level information such as the number of people present. Because human activities are complex and sensor signals have varying amounts of noise, classification algorithms are almost always probabilistic.

The MSP system architecture also consists of these three activity recognition components. However, the MSP evolved in an iterative process that revealed a core set of component requirements after several real-world deployments. The presentation here of our development process includes lessons learned at each stage and how the lessons contributed to our current system.

1.3 Hardware platform v1.0: Wireless multimodal sensing

Many recent wearable systems for activity recognition place a single type of sensor, typically accelerometers, in multiple locations (anywhere from two to 12) on the body.[1],[2] However, this approach’s obtrusive usage model has limited its mass adoption. In addition, its use of a single sensor type restricts the range of activities it can recognize—for example, accelerometers are mainly useful for inferring a limited set of physical activities.

An alternate approach is to use multiple sensor types—that is, multimodal sensors—and collect data from a single body location. Some older research in activity and context recognition explores this approach.[3] Recent studies have shown that the information gained from multimodal sensors can offset the information lost when sensor readings are collected from a single location.[4],[5] The sensors’ complementary cues are also useful for recognizing a wider range of activities. For example, an accelerometer and audio together can detect whether the user is sitting versus sitting and watching TV.

For wide-scale adoption of activity recognition systems, we hypothesized the need for a sensing platform that
・packages multimodal sensors into a single small device,
・avoids using physiological sensors or sensors that require direct contact with the skin, and
・either integrates into a mobile device, such as a cell phone, or wirelessly transmits data to an external device.

Originally, we assumed the external device would log and process the sensor streams, so the sensing platform could have limited processor capability and no local storage.

To better understand the usefulness of different sensor modalities in inferring human activities, we designed and built a multimodal sensor board that simultaneously captured data from seven different sensors (see figure 1). We selected the sensors for their general usefulness (as evidenced by related work in activity inference),[4],[6],[7] small footprint, and low power consumption. The sensor board attached to Intel’s iMote, a 32-bit ARM7-based wireless node, which communicated with handheld devices, desktop computers, and cell phones via its Bluetooth RF Communication (Rfcomm) protocol, a USB cable, or a CompactFlash bridge. The device was small and lightweight enough (1.52 oz/46 g) to wear comfortably for long periods of time.

With all the sensors running continuously, the platform’s first version consumed approximately 43 mW of power. It could run for more than 12 hours on a 200 mAh Li-Polymer battery. However, when streaming data over Bluetooth to a cell phone, the battery only lasted about four hours.
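These power figures can be sanity-checked with a rough energy-budget calculation, assuming a nominal 3.7 V cell voltage (the paper gives only the 200 mAh capacity, so the voltage is an assumption):

```latex
E \approx 200\,\mathrm{mAh} \times 3.7\,\mathrm{V} = 740\,\mathrm{mWh},
\qquad
t_{\mathrm{ideal}} \approx \frac{740\,\mathrm{mWh}}{43\,\mathrm{mW}} \approx 17\,\mathrm{h}
```

The gap between this ideal 17-hour figure and the reported "more than 12 hours" is plausibly regulator and conversion losses plus battery derating. The roughly four-hour runtime while streaming implies an average draw on the order of 740 mWh / 4 h ≈ 185 mW, suggesting the Bluetooth radio dominates the power budget in that mode.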
1.3.1 Deployments

Our initial deployments focused on gathering real-world activity traces for training activity classifiers. We recruited 15 volunteers to wear the MSPs as they went about day-to-day activities, such as walking, climbing stairs, cooking, working on a computer, and so on. The volunteers generated over 50 hours of data, which we collected over eight noncontiguous weeks.[5],[8],[9] We based our activity selection on two factors: application scenarios that interested us—specifically, those that encourage physical activity and support eldercare—and prior work in activity recognition systems. Focusing on activities already studied in existing systems helped us compare our system’s performance with that of others.

In addition, we conducted a larger, longitudinal deployment to gather data on group interactions and face-to-face social networks.[10] For this deployment, we recruited 24 graduate student volunteers and gathered data on them simultaneously for one week per month over the course of a year. The result was more than 4,400 hours of sensor data that we’re currently analyzing offline.

1.3.2 Lessons learned

Packaging the multimodal sensors into a small form factor was an appealing characteristic of the version 1.0 MSP platform. Unfortunately, depending on an external device for data processing and logging proved to be a problem even before any serious data collection began.

(1) Communication, storage, and processor issues.

The Bluetooth connectivity wasn’t reliable enough to continuously stream sensor data (including audio at 8 or 16 kHz). The packet losses and intermittent connection drops required us to switch to a wired solution in which the sensor board was physically attached to a PDA via a USB cable. We used this “sensor plus iPAQ” combination to collect the data sets we’ve described. This solution worked as a temporary research prototype, but it clearly wasn’t feasible longer term because it required participants to carry a bulky, wired device combination simply to collect data. We might have been able to mitigate the drops by implementing a standard transport protocol instead of relying on Bluetooth’s Rfcomm stack. However, this would have provided only a partial fix to the problems we encountered during data collection.

Our initial deployments showed that storing and processing data locally would significantly increase the data quality (no packet loss) and recording duration (via compression), while reducing the physical burden on the participant. Additionally, supporting interactive applications that react according to a given activity or context called for computational power sufficient to classify sensor traces into usable activity or context labels in real time. Some behavior and health-monitoring applications might also need to store the raw sensor data or inferred behavior statistics locally for additional offline processing to address longer-term trends or to infer more complex human behavior using models. A sketch of such local, compressed feature logging appears below.
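To make the "store and process locally" idea concrete, here is a minimal sketch, in Python, of a logger that computes per-window summaries on the device and appends them, compressed, to local storage instead of streaming raw samples over Bluetooth. The window size, feature set, and file format are assumptions for illustration, not the MSP's actual logging code.

```python
# Illustrative sketch only -- not the MSP firmware.
# Shows on-device feature extraction plus compressed local logging.
import json
import time
import zlib

WINDOW = 256  # samples per window (assumed)

def summarize(window):
    """Reduce a window of raw samples to a few features so that
    only compact summaries, not raw data, are written to storage."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n
    return {"t": time.time(), "mean": mean, "var": var}

def log_windows(windows, path="msp_features.log.z"):
    """Serialize feature records and append them zlib-compressed,
    one length-prefixed block per batch of windows."""
    records = [summarize(w) for w in windows]
    blob = zlib.compress(json.dumps(records).encode("utf-8"))
    with open(path, "ab") as f:
        f.write(len(blob).to_bytes(4, "big") + blob)

if __name__ == "__main__":
    fake_windows = [[float(i % 7) for i in range(WINDOW)] for _ in range(10)]
    log_windows(fake_windows)
```

Appending length-prefixed compressed blocks keeps writes cheap and append-only, which matters on a battery-powered device; a real implementation would also rotate files and handle storage exhaustion.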
(2) Privacy.

The larger group deployment reinforced the importance of considering the privacy aspects of data logging. Collecting sensor data, particularly from a microphone, involves recording people in unconstrained and unpredictable situations, both public and private. The results can include recorded information about uninvolved parties without their consent—a scenario that, if raw audio is involved, is always unethical and often illegal. We therefore needed the iPAQ’s computational resources to process the raw audio data on the fly and record useful features. For example, the system needed to record enough information to infer that a conversation had occurred but not enough to reconstruct the words that were spoken.[10]
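As an illustration of this kind of privacy-preserving audio processing, the sketch below computes only coarse per-frame features (energy, zero-crossing rate, spectral entropy) and discards the raw samples. These particular features are stand-ins chosen for the example; this section does not enumerate the exact features the MSP recorded. The point is only that such summaries can support detecting that speech occurred without allowing the words to be reconstructed.

```python
# Illustrative sketch only -- not the MSP's audio pipeline.
# Computes coarse per-frame summaries of an audio signal and never
# stores the raw samples, so speech content cannot be reconstructed.
import numpy as np

FRAME = 256         # samples per frame (assumed)
SAMPLE_RATE = 8000  # Hz, matching the 8 kHz figure mentioned above

def frame_features(frame):
    """Return non-invertible summaries of one audio frame:
    energy, zero-crossing rate, and spectral entropy."""
    energy = float(np.mean(frame ** 2))
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    p = spectrum / (spectrum.sum() + 1e-12)
    entropy = float(-np.sum(p * np.log2(p + 1e-12)))
    return {"energy": energy, "zcr": zcr, "spectral_entropy": entropy}

def summarize_audio(samples):
    """Split an audio buffer into frames and keep only the feature
    summaries; the raw frames are dropped immediately."""
    n_frames = len(samples) // FRAME
    return [frame_features(samples[i * FRAME:(i + 1) * FRAME])
            for i in range(n_frames)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    one_second = rng.normal(size=SAMPLE_RATE)  # synthetic "audio"
    feats = summarize_audio(one_second)
    print(len(feats), feats[0])
```

A downstream conversation detector could then operate on sequences of these summaries, for example by thresholding energy and tracking how the other features change over time, while the stored log reveals nothing about the spoken words.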