《AMultiresolutionSymbolicRepresentationofTimeSeries》由会员分享,可在线阅读,更多相关《AMultiresolutionSymbolicRepresentationofTimeSeries(28页珍藏版)》请在金锄头文库上搜索。
1、A Multiresolution Symbolic Representation of Time Series Vasileios MegalooikonomouVasileios MegalooikonomouQiang WangQiang WangGuo LiGuo LiChristos FaloutsosChristos FaloutsosPresented by Rui LiPresented by Rui LiAbstractIntroducing a new representation of time series, the Multiresolution Vector Qua
2、ntized (MVQ) approximationMVQ keeps both local and global MVQ keeps both local and global information about the original time information about the original time series in a hierarchical mechanismseries in a hierarchical mechanismProcessing the original time series at Processing the original time se
3、ries at multiple resolutionsmultiple resolutionsAbstract (cont.)Representation of time series is symbolic employing key subsequences and potentially allows the application of text-based retrieval techniques into the similarity analysis of time series.IntroductionTwo series should be considered simil
4、ar if they have enough non-overlapping time-ordered pairs of subsequences that are similar.Introduction (cont.)Instead of calculating the Euclidean distance, first extract key subsequences utilizing the Vector Quantization (VQ) technique and encode each time series based on the frequency of appearan
5、ce of each key subsequence.Then calculate similarities in terms of key subsequence matches.Introduction (cont.)Hierarchical mechanism: the original time series are processed at several different resolutions, and similarity analysis is performed using a weighted distance function combining all the re
6、solution levelsBackgroundMany of the previous work focus on the avoidance of false dismissals. However, in some cases the existence of too many false alarms may decrease the efficiency of retrieval.The Euclidean distance is not always the optimal distance measure.Background (cont.)For large datasets
7、, the computational complexity associated with the Euclidean distance calculation is a problem ( O(N*n) ).Euclidean distance (point-based model) is vulnerable to shape transformations such as shifting and scaling.Background (cont.)A new framework that utilizes high-level features is proposedCodebook
8、 generationCodebook generationTime series encodingTime series encodingTime series representation and retrievalTime series representation and retrievalIn order to keep both local and global information, use multiple codebooks with different resolutionsBackground (cont.)For each resolution, VQ is appl
9、ied to discover the vocabulary of subsequences (codewords)In VQ, a codeword is used to represent In VQ, a codeword is used to represent a number of similar vectors.a number of similar vectors.The Generalized Lloyd Algorithm is used to produce a “locally optimal” codebook from a training set.Backgrou
10、nd (cont.)To quantitatively measure the similarity between different time series encoded with a VQ codebook, the Histogram Model is employed. where where and refer to the appearance and refer to the appearance frequency of codeword in time series frequency of codeword in time series t t and and q q,
11、 respectively., respectively.Proposed MethodMVQ approximationPartitions each time series into equi-Partitions each time series into equi-length segments and represents each length segments and represents each segment with the most similar key segment with the most similar key subsequence from a code
12、book.subsequence from a codebook.Represent each time series as the Represent each time series as the appearance frequency of each codeword appearance frequency of each codeword in it.in it.Apply at several resolutionsApply at several resolutionsProposed Method (cont.)Codebook GenerationThe dataset i
13、s preprocessedThe dataset is preprocessedEach time series is partitioned into a Each time series is partitioned into a number of segments each of length number of segments each of length l l, and , and each segment forms a sample of the each segment forms a sample of the training set that is used to
14、 generate the training set that is used to generate the codebook.codebook.Each codeword corresponds to a key Each codeword corresponds to a key subsequencesubsequenceExample1Codewords of a 2-level codebookCodewords of a 2-level codebookProposed Method (cont.)Time Series EncodingEvery time series is
15、decomposed into Every time series is decomposed into segments of length segments of length l l. .For each segment, the closest codeword For each segment, the closest codeword in the codebook is found and the in the codebook is found and the corresponding index is used to corresponding index is used
16、to represent this segment.represent this segment.The appearance frequency of each The appearance frequency of each codeword is counted.codeword is counted.Proposed Method (cont.)Time Series Encoding (cont.)The representation of a time series is a The representation of a time series is a vector showi
17、ng the vector showing the appearance frequency of every appearance frequency of every codeword.codeword.Proposed Method (cont.)Time Series SummarizationThe codewords stand for the most The codewords stand for the most representative subsequences for the representative subsequences for the entire dat
18、aset.entire dataset.We can just check the appearance We can just check the appearance frequencies of the codewords and get an frequencies of the codewords and get an overview of the time series.overview of the time series.Example2Proposed Method (cont.)Distance Measure and Multiresolution Representa
19、tionUsing only one codebook (single Using only one codebook (single resolution) introduces problemsresolution) introduces problemsThe order among the indices of codewords is The order among the indices of codewords is not kept; some important global information not kept; some important global inform
20、ation is lostis lostIncreasing false alarmsIncreasing false alarmsProposed Method (cont.)Distance Measure and Multiresolution Representation (cont.)A hierarchical mechanism is introduced.A hierarchical mechanism is introduced.Several different resolutions are involved.Several different resolutions a
21、re involved.higher resolution higher resolution local information local informationlower resolution lower resolution global information global informationExample3Reconstruction of time series using Reconstruction of time series using different resolutionsdifferent resolutionsProposed Method (cont.)D
22、istance Measure and Multiresolution Representation (cont.)By being assigned different weights to By being assigned different weights to different resolutions, a weighted different resolutions, a weighted similarity measure (Hierarchical similarity measure (Hierarchical Histogram Model) is defined:Hi
23、stogram Model) is defined:ExperimentsBest Matches RetrievalSYNDATASYNDATA6 classes; 100 time series for each class; 60 6 classes; 100 time series for each class; 60 points for each time seriespoints for each time seriesExperiments (cont.)Best Matches Retrieval (cont.)CAMMOUSECAMMOUSE1600 points for each time series1600 points for each time seriesExperiments (cont.)Best Matches Retrieval (cont.)Comparisons with other methodsComparisons with other methodsExperiments (cont.)ClusteringSYNDATASYNDATAExperiments (cont.)Clustering (cont.)CAMMOUSECAMMOUSEThank you!