Rank Pooling Approach for Wearable Sensor-based ADLs Recognition

Muhammad Adeel Nisar1, Kimiaki Shirahama2, Frederic Li1, Xinyu Huang1, Marcin Grzegorzek1

1 Institute of Medical Informatics, University of Lübeck, Lübeck, Germany
2 Department of Informatics, Kindai University, Japan



We use wearable devices to recognize activities of daily living (ADLs), which are composed of several repetitive and concurrent short movements with temporal dependencies. Sensor data cannot be used directly to recognize these long-term (or composite) activities, because two executions of the same ADL produce largely different sensory data; the executions may nevertheless be similar in terms of the more semantic and meaningful underlying short-term actions (or atomic activities). We therefore propose a two-level hierarchical model for the recognition of ADLs. First, atomic activities are detected and their probabilistic scores are generated at the lower level. Second, to deal with the temporal transitions of the atomic activities, we apply a temporal pooling method, rank pooling, at the higher level of our model. Rank pooling enables us to encode the ordering of atomic activities by using their scores: it learns a function via ranking machines, and the parameters of these ranking machines are used as features to characterize composite activities. Classifiers trained on such features recognize composite activities effectively, yielding a 5-13% improvement over other popularly used techniques. We also produce a large dataset of 61 atomic and 7 composite activities for our experiments.


Contribution

This paper offers two main contributions: (1) a two-level hierarchical model that recognizes composite activities by rank pooling the probabilistic scores of atomic activities, and (2) a large dataset, the Cognitive Village (CogAge) dataset, covering 61 atomic and 7 composite activities.



Figure-A: Overview of our proposed two-level hierarchical model. The lower level recognizes atomic activities such as sitting, standing, walking, standing up, squatting, and opening and closing a door. The higher level recognizes composite activities such as cleaning the room and preparing food.

The atomic activities are detected by the codebook approach [1], which outputs a probabilistic score for each atomic activity; rank pooling then uses these probabilistic scores to construct feature vectors for the composite activities. Classifiers trained on the feature vectors obtained from rank pooling turn out to be very effective for distinguishing composite activities. We evaluate rank pooling against basic pooling techniques such as average and max pooling, as well as against HMM- and LSTM-based approaches, and find that rank pooling achieves 5 to 13% higher accuracy than these alternatives.
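For reference, the basic pooling baselines used in the comparison can be sketched in a few lines of NumPy. This is a minimal illustration, not our exact evaluation code; the `(T, D)` layout of the score sequence (T time steps, D atomic activities) is an assumption of the sketch:

```python
import numpy as np

def average_pool(scores):
    """Average pooling: mean atomic-activity score over time.

    scores: array of shape (T, D), one D-dimensional score vector
    per time step. Returns a fixed-size (D,) feature regardless of T.
    """
    return scores.mean(axis=0)

def max_pool(scores):
    """Max pooling: per-dimension maximum score over time."""
    return scores.max(axis=0)
```

Note that both baselines discard temporal ordering entirely: a score sequence and its time-reversed copy yield identical features, which is exactly the information rank pooling is designed to keep.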

Please see here for an extension of the codebook approach [1].



Figure-B: Overview of our composite activity recognition approach. The atomic scores are converted into time-varying mean vectors and then provided to the ranking machines. The parameters of these ranking machines serve as feature vectors for the composite activities, on which standard classification is performed. A detailed discussion of rank pooling can be found in our paper.
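The pipeline in Figure-B can be sketched as follows. This is a simplified illustration, not our implementation: the time-varying means follow the description above, while the L2 normalisation of the mean vectors and the use of plain least-squares regression of the time index (in place of an SVR-style ranking machine) are assumptions of this sketch.

```python
import numpy as np

def time_varying_mean(scores):
    """Smooth a (T, D) sequence of atomic-activity scores into
    time-varying mean vectors: m_t is the mean of scores[0..t]."""
    csum = np.cumsum(scores, axis=0)
    counts = np.arange(1, len(scores) + 1)[:, None]
    m = csum / counts
    # L2-normalise each mean vector (an assumption of this sketch)
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)

def rank_pool(scores):
    """Rank pooling sketch: fit a linear function whose output
    increases with time over the mean vectors; its parameters are
    the fixed-size feature vector of the sequence."""
    m = time_varying_mean(scores)
    t = np.arange(1, len(m) + 1, dtype=float)
    A = np.hstack([m, np.ones((len(m), 1))])  # append intercept column
    w, *_ = np.linalg.lstsq(A, t, rcond=None)
    return w[:-1]  # drop the intercept; shape (D,)
```

Unlike average or max pooling, the resulting feature changes when the same scores are presented in a different order, so the temporal structure of the atomic activities is preserved in a fixed-size vector.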



The Cognitive Village (CogAge) dataset


In our experiments, we use the Cognitive Village dataset. We collected data for daily life activities using three unobtrusive wearable devices: an LG G5 smartphone, a Huawei smartwatch, and JINS smart glasses.

We use eight sensor modalities provided by these wearable devices. The sensor data from the JINS glasses and the Huawei watch are first sent to the LG G5 smartphone via Bluetooth; all sensor data are then forwarded over a Wi-Fi connection, using RabbitMQ, to our home gateway, where our atomic and composite activity recognition methods are executed.
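The transmission step from phone to gateway can be illustrated with a small sketch. The message schema and field names below are our own assumptions for illustration (the actual CogAge transmission format is not specified here); the publish step assumes an open channel from the `pika` RabbitMQ client and is not executed without a running broker:

```python
import json
import time

def make_sample_message(device, modality, values):
    """Package one sensor reading as a JSON message for the gateway.

    Field names are illustrative assumptions, not the CogAge format.
    """
    return json.dumps({
        "device": device,          # e.g. "JINS-glasses", "Huawei-watch"
        "modality": modality,      # e.g. "accelerometer"
        "values": values,          # list of raw sensor readings
        "timestamp": time.time(),  # seconds since the epoch
    })

def publish_sample(channel, queue, message):
    """Publish one message to a RabbitMQ queue.

    `channel` is assumed to be an open pika channel; this function is
    only a sketch and requires a running RabbitMQ broker to execute.
    """
    channel.basic_publish(exchange="", routing_key=queue, body=message)
```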

We collected data for two kinds of activities: atomic and composite activities.

Atomic Activities Dataset: The data acquisition process targets 61 different atomic activities and involves 8 subjects, who contributed over 9700 samples. The data collection for the training and testing phases was carried out on different days, in order to include variations in how the activities are performed. In each phase, the subjects were asked to wear the three devices and perform 10 executions of each activity, with every execution lasting 5 seconds. After removing executions where data were not properly recorded due to sensor errors, we use 9029 instances in our experiments. The atomic activities are split into two distinct categories: 6 state activities characterizing the posture of a subject, and 55 behavioral activities characterizing his or her behavior. The complete list of activities is provided in the table below. Note that a behavioral activity can be performed while being in a particular state; e.g., rubbing hands can be performed while sitting, standing, or lying.


Table-A: Table of the 61 atomic activities of the CogAge dataset, split between state and behavioral activities.
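As a sanity check on the reported collection figures, the expected number of executions before cleaning can be computed directly (assuming every one of the 8 subjects performed all 61 activities 10 times in each of the two phases):

```python
# Expected number of recorded executions before cleaning.
activities = 61
subjects = 8
executions_per_phase = 10
phases = 2  # training and testing, collected on different days

total = activities * subjects * executions_per_phase * phases
print(total)  # 9760, consistent with "over 9700 samples"

removed = total - 9029  # executions dropped due to sensor errors
print(removed)  # 731
```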


Composite Activities Dataset: The data acquisition process for 7 composite activities was performed by six subjects using the same three wearable devices. An Android-based data acquisition application was developed to collect the data: it connected the smartwatch and the smart glasses to the smartphone via Bluetooth and saved the data of all 8 sensor modalities locally in the smartphone's memory. This made it convenient for the subjects to move with the set of devices to their kitchens, washrooms, or living rooms and perform the activities naturally. As with the atomic activities, the data for composite activities were collected separately for the training and testing phases on different days. We collected over 1000 instances of composite activities; the experiments were performed on 890 instances, since instances with missing sensory data were removed from the dataset. As in real life, the length of an activity is not fixed: it varies from 30 seconds to 5 minutes, because some composite activities, such as preparing food, take a long time to complete, whereas others, such as handling medications, are short-term.


Table-B: List of the composite activities and their instance counts.

Results (Confusion Matrices)