Masked time series modeling has recently gained much attention as a
self-supervised representation learning strategy for time series. Inspired by
masked image modeling in computer vision, recent works first patchify and
partially mask out time series, and then train Transformers to capture the
dependencies between patches by predicting masked patches from unmasked
patches. However, we argue that capturing such patch dependencies might not be
an optimal strategy for time series representation learning; rather, learning
to embed patches independently results in better time series representations.
Specifically, we propose to use 1) the simple patch reconstruction task, which
autoencodes each patch without looking at other patches, and 2) the simple
patch-wise MLP that embeds each patch independently. In addition, we introduce
complementary contrastive learning to hierarchically capture adjacent time
series information efficiently. Our proposed method improves time series
forecasting and classification performance compared to state-of-the-art
Transformer-based models, while it is more efficient in terms of the number of
parameters and training/inference time. Code is available at this repository:
https://github.com/seunghan96/pits.

Expert Commentary: Self-Supervised Representation Learning for Time Series using Patch Reconstruction and Contrastive Learning

Masked time series modeling, a self-supervised representation learning strategy for time series, has gained significant attention in recent years. Inspired by similar techniques in computer vision, researchers patchify and partially mask time series, then train Transformers to capture dependencies between patches by predicting the masked patches from the unmasked ones. However, in this article, the authors argue that this approach may not be the optimal strategy for time series representation learning.
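
To make the baseline setup concrete, here is a minimal sketch of how a time series might be patchified and partially masked before being fed to an encoder. The patch length, stride, and mask ratio below are illustrative assumptions, not the exact values used in the paper.

```python
# Minimal sketch (not the authors' code): patchify a univariate series and
# randomly mask a fraction of patches, as in masked time series modeling.
import torch

def patchify(series: torch.Tensor, patch_len: int = 12, stride: int = 12) -> torch.Tensor:
    """Split a (batch, time) series into (batch, num_patches, patch_len) patches."""
    return series.unfold(dimension=-1, size=patch_len, step=stride)

def random_mask(patches: torch.Tensor, mask_ratio: float = 0.4):
    """Zero out a random subset of patches; return masked patches and the mask."""
    b, n, _ = patches.shape
    mask = torch.rand(b, n, device=patches.device) < mask_ratio  # True = masked
    masked = patches.masked_fill(mask.unsqueeze(-1), 0.0)
    return masked, mask

x = torch.randn(8, 96)                  # batch of 8 series, 96 time steps each
patches = patchify(x)                   # shape: (8, 8, 12)
masked_patches, mask = random_mask(patches)
```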

The proposed method makes two key modifications. First, instead of predicting masked patches from unmasked patches, the authors use a simple patch reconstruction task in which each patch is autoencoded without looking at any other patch. Second, the Transformer encoder is replaced with a simple patch-wise MLP that embeds each patch independently, which the authors find yields better time series representations.
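
The core idea can be sketched in a few lines: a patch-wise MLP that autoencodes each patch on its own, so no patch ever attends to another. The layer sizes and the mean-squared-error objective below are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of a patch-wise MLP autoencoder; each patch is embedded and
# reconstructed independently of all other patches.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchAutoencoderMLP(nn.Module):
    def __init__(self, patch_len: int = 12, d_model: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(patch_len, d_model), nn.ReLU())
        self.decoder = nn.Linear(d_model, patch_len)

    def forward(self, patches: torch.Tensor):
        # patches: (batch, num_patches, patch_len); Linear acts on the last dim,
        # so every patch is embedded and reconstructed without seeing its neighbors.
        z = self.encoder(patches)       # per-patch representations
        recon = self.decoder(z)         # per-patch reconstruction
        return z, recon

patches = torch.randn(8, 8, 12)         # toy batch of patchified series
model = PatchAutoencoderMLP()
z, recon = model(patches)
loss = F.mse_loss(recon, patches)        # simple reconstruction objective
```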

Furthermore, the authors introduce complementary contrastive learning to efficiently capture information from adjacent parts of the time series in a hierarchical manner. Because each patch is embedded independently, this contrastive objective complements the reconstruction task by encouraging the learned representations to reflect their temporal neighborhood.
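
As a rough illustration of how a hierarchical contrastive term over patch representations might look, the sketch below contrasts two views of the same series (e.g., produced by complementary masking) and repeatedly max-pools adjacent patch embeddings to form coarser temporal scales. The InfoNCE-style loss, the temperature, and the pooling scheme are assumptions for clarity, not the paper's exact formulation.

```python
# Illustrative hierarchical contrastive loss over two views of patch embeddings.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Contrast matching patches across the two views within each series."""
    b, n, _ = z1.shape
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = torch.einsum("bnd,bmd->bnm", z1, z2) / tau        # (batch, n, n)
    labels = torch.arange(n, device=z1.device).expand(b, n)    # positives on the diagonal
    return F.cross_entropy(logits.reshape(b * n, n), labels.reshape(b * n))

def hierarchical_contrastive_loss(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Apply the contrastive term at successively coarser temporal scales."""
    loss, levels = 0.0, 0
    while z1.shape[1] > 1:
        loss, levels = loss + info_nce(z1, z2), levels + 1
        # merge adjacent patch representations via max-pooling along time
        z1 = F.max_pool1d(z1.transpose(1, 2), kernel_size=2).transpose(1, 2)
        z2 = F.max_pool1d(z2.transpose(1, 2), kernel_size=2).transpose(1, 2)
    return loss / max(levels, 1)

z1, z2 = torch.randn(8, 8, 128), torch.randn(8, 8, 128)        # two views of patch embeddings
loss = hierarchical_contrastive_loss(z1, z2)
```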

The method demonstrates improved performance in both time series forecasting and classification compared to state-of-the-art Transformer-based models, while using fewer parameters and requiring less training and inference time.

The multi-disciplinary nature of this work is worth mentioning. It combines concepts from self-supervised learning, computer vision (patch-based modeling), natural language processing (Transformer architectures), and contrastive learning. This interdisciplinary approach allows for the transfer of knowledge and techniques across domains, leading to new insights and improved performance.

A closely related line of work is masked language modeling (MLM) in natural language processing, particularly in Transformer-based models such as BERT. MLM involves predicting masked words in a sentence, much like predicting masked patches in masked time series modeling. The success of MLM has driven significant advances in language understanding; the method discussed here revisits this recipe for time series and finds that a simpler per-patch reconstruction objective yields better representations.

In conclusion, this article presents a novel approach to self-supervised representation learning for time series data. By combining patch reconstruction, patch-wise MLP embeddings, and complementary contrastive learning, the method achieves notable improvements in time series forecasting and classification performance. The multi-disciplinary nature of this work demonstrates the potential for cross-domain knowledge transfer and innovation.

Read the original article