Stefano Mauceri

PhD Thesis Title: One Class Time Series Classification

Supervisor: Dr Miguel Nicolau

Second Supervisor: Dr James McDermott, NUIG

External Examiner: Professor Giuseppe Nicosia, University of Cambridge






Abstract

This thesis contributes to the state of the art of time series classification and machine learning by investigating three novel data-driven representations for time series in the context of one-class classification. The one-class assumption is useful for all classification problems where only data of a single class is available for training a classifier, or where it is not known whether novel classes may appear at prediction time or what they could look like. Applications that can benefit from our research include anomaly or novelty detection, fault detection, and identity authentication.

The common thread of our research is to represent time series as feature vectors that are then used for classification. We investigate three kinds of features: (1) features constructed using dissimilarity measures; (2) features constructed using an evolutionary algorithm; (3) latent features constructed using neural networks. The proposed representations are thoroughly investigated in a variety of one-class classification experiments involving numerous benchmark methods, the 85 data-sets of the UCR/UEA archive and a data-set provided by ICON plc.

The key difference between one-class classification and binary or multi-class classification is in the amount of effort needed to gather training data. Binary and multi-class classifiers require exhaustively labelled training data. This can be difficult for problems where samples of all but one class are scarce and ill-defined, e.g. anomaly detection. In other cases, gathering labelled data is simply infeasible because of the cost of the expert labour required to construct an appropriate data-set. Conversely, one-class classifiers are trained using only samples from a single class.

We present a subject authentication problem based on accelerometer data as a case study that motivates our research on one-class time series classification. We argue that it is not realistic to assume we can gather labelled training data that represent well both the subject of interest and a fixed population of "others". Hence the need to learn a classifier using only data related to the subject of interest.

We demonstrate that, compared with raw time series, feature-based representations allow substantial savings in storage and computational requirements, facilitate the interpretability of the solutions found, and enable visualisation of time series data-sets. We find that these advantages come at the cost of a slight loss in classification performance with respect to a 1-nearest neighbour classifier on raw data. However, by examining the data-sets one by one, we detail how our representations can outperform raw time series. Furthermore, for some applications, e.g. embedded systems, storage and computational requirements may be more important than a slight loss in classification performance.
