https://arxiv.org/abs/1612.06370v1.pdf Learning Features by Watching Objects Move

https://arxiv.org/abs/1612.05596 Event-driven Random Back-Propagation: Enabling Neuromorphic Deep Learning Machines

https://arxiv.org/abs/1610.09513 Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences

https://arxiv.org/abs/1611.06678v1 Deep Temporal Linear Encoding Networks

We present a new video representation, called temporal linear encoding (TLE) and embedded inside of CNNs as a new layer, which captures the appearance and motion throughout entire videos. It encodes this aggregated information into a robust video feature representation, via end-to-end learning. Advantages of TLEs are: (a) they encode the entire video into a compact feature representation, learning the semantics and a discriminative feature space; (b) they are applicable to all kinds of networks like 2D and 3D CNNs for video classification; and (c) they model feature interactions in a more expressive way and without loss of information.

https://arxiv.org/abs/1611.06624v1 Temporal Generative Adversarial Nets

The temporal generator consists of 1D deconvolutional layers and outputs a set of latent variables, each of which corresponds to a frame in the generated video, and the image generator transforms them into a video with 2D deconvolutional layers.

https://arxiv.org/abs/1608.00486v3 Exploiting Temporal Information for DCNN-based Fine-Grained Object Classification

We evaluate three-dimensional DCNNs, two-stream DCNNs, and bilinear DCNNs. Two forms of the two-stream approach are used, where spatial and temporal data from two independent DCNNs are fused either via early fusion (combination of the fully-connected layers) or late fusion (concatenation of the softmax outputs of the DCNNs). For bilinear DCNNs, information from the convolutional layers of the spatial and temporal DCNNs is combined via local co-occurrences.
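
A minimal NumPy sketch of the two fusion variants described above. The random features, layer sizes, and single linear classifiers are illustrative stand-ins (assumptions), not the paper's networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

n_samples, d, n_classes = 4, 8, 5  # hypothetical sizes

# Stand-ins for the fully-connected features of the spatial and temporal DCNNs
fc_spatial = rng.normal(size=(n_samples, d))
fc_temporal = rng.normal(size=(n_samples, d))

# Early fusion: combine the fully-connected features, then classify jointly
W_early = rng.normal(size=(2 * d, n_classes))
early_probs = softmax(np.concatenate([fc_spatial, fc_temporal], axis=1) @ W_early)

# Late fusion: each stream predicts on its own, softmax outputs are concatenated
W_s = rng.normal(size=(d, n_classes))
W_t = rng.normal(size=(d, n_classes))
late_features = np.concatenate(
    [softmax(fc_spatial @ W_s), softmax(fc_temporal @ W_t)], axis=1)
```

In the late-fusion case the concatenated vector would still need a final classifier on top; that step is omitted here.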

https://arxiv.org/abs/1608.08242v1 Temporal Convolutional Networks: A Unified Approach to Action Segmentation

We propose a unified approach, as demonstrated by our Temporal Convolutional Network (TCN), that hierarchically captures relationships at low-, intermediate-, and high-level time-scales. Our model achieves superior or competitive performance using video or sensor data on three public action segmentation datasets and can be trained in a fraction of the time it takes to train an RNN.

https://arxiv.org/pdf/1609.09444v2.pdf Contextual RNN-GANs for Abstract Reasoning Diagram Generation

Understanding, predicting, and generating object motions and transformations is a core problem in artificial intelligence. Modeling sequences of evolving images may provide better representations and models of motion and may ultimately be used for forecasting, simulation, or video generation. Diagrammatic Abstract Reasoning is an avenue in which diagrams evolve in complex patterns and one needs to infer the underlying pattern sequence and generate the next image in the sequence. For this, we develop a novel Contextual Generative Adversarial Network based on Recurrent Neural Networks (Context-RNN-GANs), where both the generator and the discriminator modules are based on contextual history (modeled as RNNs) and the adversarial discriminator guides the generator to produce realistic images for the particular time step in the image sequence.

https://arxiv.org/abs/1612.01254 Deep Symbolic Representation Learning for Heterogeneous Time-series Classification

https://arxiv.org/abs/1603.06995 Multi-Scale Convolutional Neural Networks for Time Series Classification

https://arxiv.org/pdf/1612.05596v2.pdf Neuromorphic Deep Learning Machines

https://arxiv.org/abs/1702.04649 Generative Temporal Models with Memory

We consider the general problem of modeling temporal data with long-range dependencies, wherein new observations are fully or partially predictable based on temporally-distant, past observations. A sufficiently powerful temporal model should separate predictable elements of the sequence from unpredictable elements, express uncertainty about those unpredictable elements, and rapidly identify novel elements that may help to predict the future. To create such models, we introduce Generative Temporal Models augmented with external memory systems. They are developed within the variational inference framework, which provides both a practical training methodology and methods to gain insight into the models' operation. We show, on a range of problems with sparse, long-term temporal dependencies, that these models store information from early in a sequence, and reuse this stored information efficiently. This allows them to perform substantially better than existing models based on well-known recurrent neural networks, like LSTMs.

https://arxiv.org/abs/1703.01541 Soft-DTW: a Differentiable Loss Function for Time-Series

We propose in this paper a differentiable learning loss between time series. Our proposal builds upon the celebrated Dynamic Time Warping (DTW) discrepancy. Unlike the Euclidean distance, DTW is able to compare asynchronous time series of varying size and is robust to elastic transformations in time. To be robust to such invariances, DTW computes a minimal cost alignment between time series using dynamic programming. Our work takes advantage of a smoothed formulation of DTW, called soft-DTW, that computes the soft-minimum of all alignment costs. We show in this paper that soft-DTW is a differentiable loss function, and that both its value and its gradient can be computed with quadratic time/space complexity (DTW has quadratic time and linear space complexity). We show that our regularization is particularly well suited to average and cluster time series under the DTW geometry, a task for which our proposal significantly outperforms existing baselines (Petitjean et al., 2011). Next, we propose to tune the parameters of a machine that outputs time series by minimizing its fit with ground-truth labels in a soft-DTW sense.
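
The soft-minimum recursion can be sketched in a few lines of NumPy. This is an illustrative forward pass only, assuming 1-D series and a squared-difference cost; the function names and the `gamma` default are hypothetical, and the gradient computation is not shown.

```python
import numpy as np

def softmin(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    v = -np.asarray(values, dtype=float) / gamma
    m = v.max()
    return -gamma * (m + np.log(np.exp(v - m).sum()))

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW value between two 1-D series of possibly different lengths."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    cost = (x[:, None] - y[None, :]) ** 2      # pairwise squared differences
    R = np.full((n + 1, m + 1), np.inf)        # accumulated soft alignment cost
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = cost[i - 1, j - 1] + softmin(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
    return R[n, m]
```

As `gamma` shrinks towards zero the soft-minimum approaches the hard minimum, so the value approaches the classical DTW discrepancy. The full table `R` is what makes the space cost quadratic.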

https://pdfs.semanticscholar.org/b94c/cb595375bf57617575454b418fc6371b1d7c.pdf Time Series Classification Using Multi-Channels Deep Convolutional Neural Networks

We propose a novel deep learning framework for multivariate time series classification. We conduct two groups of experiments on real-world data sets from different application domains. The final results show that our model is not only more efficient than the state of the art but also competitive in accuracy.

https://aaltd16.irisa.fr/files/2016/08/AALTD16_paper_9.pdf Data Augmentation for Time Series Classification using Convolutional Neural Networks

To improve the performance of this CNN when faced with small training sets, we propose two approaches to artificially increase the size of training sets. The first one is based on data-augmentation techniques. The second one consists in mixing different training sets and learning the network in a semi-supervised way. We show that these two approaches improve the overall classification performance. As a future work, we intend to improve the warping approach by considering more warping ratios and use more datasets to learn better feature extractors.
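
One plausible reading of the warping approach is to stretch or compress a window of the series by a given ratio. The sketch below assumes linear interpolation and a caller-chosen window; the function name and parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def window_warp(series, start, length, ratio):
    """Resample series[start:start+length] to about length*ratio points.

    Values outside the window are left untouched, so the output length
    differs from the input length whenever ratio != 1.
    """
    series = np.asarray(series, dtype=float)
    window = series[start:start + length]
    new_len = max(2, int(round(length * ratio)))
    old_pos = np.linspace(0.0, 1.0, num=len(window))
    new_pos = np.linspace(0.0, 1.0, num=new_len)
    warped = np.interp(new_pos, old_pos, window)  # linear interpolation
    return np.concatenate([series[:start], warped, series[start + length:]])

# Usage: one slowed-down and one sped-up copy of the same series
x = np.arange(10.0)
slow = window_warp(x, start=2, length=4, ratio=2.0)
fast = window_warp(x, start=2, length=4, ratio=0.5)
```

Because the augmented series change length, a fixed-input CNN would need them cropped or resampled back to a common size.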

https://arxiv.org/pdf/1702.03584v1.pdf Similarity Preserving Representation Learning for Time Series Analysis

[...] algorithms are effective, robust, efficient, and easy to use. In this paper, we bridge this gap by proposing an efficient representation learning framework that is able to convert a set of time series with equal or unequal lengths to a matrix format. In particular, we guarantee that the pairwise similarities between time series are well preserved after the transformation. Therefore, the learned feature representation is particularly suitable to the class of learning problems that are sensitive to data similarities. Given a set of n time series, we first construct an n×n partially observed similarity matrix by randomly sampling O(n log n) pairs of time series and computing their pairwise similarities.

https://arxiv.org/pdf/1611.06455v4.pdf Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline

https://arxiv.org/abs/1610.04783v1 Similarity Learning for Time Series Classification

https://arxiv.org/pdf/1610.07258.pdf Representation Learning with Deconvolution for Multivariate Time Series Classification and Visualization

We propose a new model based on the deconvolutional networks and SAX discretization to learn the representation for multivariate time series. Deconvolutional networks fully exploit the advantage of the powerful expressiveness of deep neural networks in the manner of unsupervised learning. We design a network structure specifically to capture the cross-channel correlation with deconvolution, forcing the pooling operation to perform the dimension reduction along each position in the individual channel. SAX discretization is applied on the feature vectors to further extract the bag of features. We show how this representation and bag of features helps on classification. A full comparison with the sequence distance based approach is provided to demonstrate the effectiveness of our approach. We further build the Markov matrix from the discretized representation to visualize the time series as complex networks, which show more statistical properties and clear class-specific structures with respect to different labels.
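
For reference, a sketch of standard SAX discretization (z-normalise, piecewise aggregate approximation, Gaussian breakpoints). It is applied to a raw series here for illustration; in the paper it is applied to learned feature vectors. The function name and arguments are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def sax(series, n_segments, alphabet_size):
    """Return the SAX word of a 1-D series as a string of letters."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)   # z-normalise
    # Piecewise aggregate approximation: mean of each (near-)equal segment
    paa = np.array([seg.mean() for seg in np.array_split(x, n_segments)])
    # Breakpoints that cut the standard normal into equiprobable regions
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    symbols = np.searchsorted(breakpoints, paa)
    return "".join(chr(ord("a") + int(s)) for s in symbols)

word = sax(np.arange(8), n_segments=4, alphabet_size=4)  # a rising ramp
```

For the rising ramp above, each segment falls into the next higher region, so the word is `abcd`.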

https://github.com/codeaudit/deepDiagnosis

https://github.com/clinicalml/structuredinference https://arxiv.org/abs/1609.09869 Structured Inference Networks for Nonlinear State Space Models

Gaussian state space models have been used for decades as generative models of sequential data. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption. We introduce a unified algorithm to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are modeled by deep neural networks. Our learning algorithm simultaneously learns a compiled inference network and the generative model, leveraging a structured variational approximation parameterized by recurrent neural networks to mimic the posterior distribution. We apply the learning algorithm to both synthetic and real-world datasets, demonstrating its scalability and versatility. We find that using the structured approximation to the posterior results in models with significantly higher held-out likelihood.

https://arxiv.org/abs/1511.05121 Deep Kalman Filters

https://arxiv.org/pdf/1611.05267.pdf Temporal Convolutional Networks for Action Segmentation and Detection

We show that TCNs are capable of capturing action compositions, segment durations, and long-range dependencies, and are over an order of magnitude faster to train than competing LSTM-based Recurrent Neural Networks.

https://github.com/pbashivan/EEGLearn A set of functions for supervised feature learning/classification of mental states from EEG based on “EEG images”. This code can be used to construct sequence of images (EEG movie snippets) from ongoing EEG activities and to classify between different cognitive states through recurrent-convolutional neural nets. More generally it could be used to discover patterns in multi-channel timeseries recordings with known spatial relationship between sensors.

https://arxiv.org/pdf/1606.01865.pdf Recurrent Neural Networks for Multivariate Time Series with Missing Values

Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a., informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improving prediction performance. In this paper, we develop novel deep learning models, namely GRU-D, as one of the early attempts. GRU-D is based on Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to achieve better prediction results. Experiments of time series classification tasks on real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.
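
A sketch of how the two missing-pattern representations could be computed from a series with NaNs. It assumes the usual definitions: the mask is 1 where a value is observed, and the time interval is the time elapsed since that variable was last observed (or since the first step if it has not been observed yet). The function name is hypothetical, and the GRU-D cell itself is not shown.

```python
import numpy as np

def masking_and_intervals(x, timestamps):
    """x: (T, D) array with NaN for missing values; timestamps: (T,) times."""
    x = np.asarray(x, dtype=float)
    s = np.asarray(timestamps, dtype=float)
    mask = (~np.isnan(x)).astype(float)      # 1 = observed, 0 = missing
    delta = np.zeros_like(x)                 # interval is 0 at the first step
    for t in range(1, len(s)):
        gap = s[t] - s[t - 1]
        # Restart the interval after an observation, otherwise keep accumulating
        delta[t] = gap + (1.0 - mask[t - 1]) * delta[t - 1]
    return mask, delta

x = np.array([[1.0, np.nan],
              [np.nan, np.nan],
              [3.0, 4.0]])
mask, delta = masking_and_intervals(x, timestamps=[0.0, 1.0, 3.0])
```

Both arrays have the same shape as the input, so they can be fed to the model alongside the (imputed) values.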

http://staff.ustc.edu.cn/~cheneh/paper_pdf/2016/YiZheng-FCS2016.pdf Exploiting Multi-Channels Deep Convolutional Neural Networks for Multivariate Time Series Classification

This model first learns features from individual univariate time series in each channel, and combines information from all channels as feature representation at the final layer. Then, the learnt features are applied into a Multilayer Perceptron (MLP) for classification. Finally, the extensive experiments on real world data sets show that our model is not only more efficient than the state of the art but also competitive in accuracy. This study implies that feature learning is worth investigating for the problem of time series classification.

https://arxiv.org/pdf/1702.03584v2.pdf Similarity Preserving Representation Learning for Time Series Analysis

The learned feature representation is particularly suitable to the class of learning problems that are sensitive to data similarities. Given a set of n time series, we first construct an n×n partially observed similarity matrix by randomly sampling O(n log n) pairs of time series and computing their pairwise similarities. We then propose an extremely efficient algorithm that solves a highly non-convex and NP-hard problem to learn new features based on the partially observed similarity matrix. We use the learned features to conduct experiments on both data classification and clustering tasks.

https://arxiv.org/abs/1703.04691v2 Conditional Time Series Forecasting with Convolutional Neural Networks

Conditional time series forecasting based on the recent WaveNet architecture. The proposed network contains stacks of dilated convolutions that widen the receptive field of the forecast; multiple convolutional filters are applied in parallel to separate time series and allow for the fast processing of data and the exploitation of the correlation structure between the multivariate time series. The performance of the deep convolutional neural network is analyzed on various multivariate time series including commodities data and stock indices and compared to that of the well-known autoregressive model and a fully convolutional network. We show that our network is able to effectively learn dependencies between the series without the need of long historical time series and significantly outperforms the baseline neural forecasting models.
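
To see how stacked dilated convolutions widen the receptive field, here is a small NumPy sketch of a causal dilated 1-D convolution. The kernel size of 2 and the doubling dilation schedule are assumptions in the WaveNet style, not details taken from this paper.

```python
import numpy as np

def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation], zero-padded on the left."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for k, w in enumerate(weights):
        shift = k * dilation
        if shift == 0:
            y += w * x
        else:
            y[shift:] += w * x[:-shift]   # only past samples contribute
    return y

def receptive_field(kernel_size, dilations):
    """Number of input steps that can influence one output step."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Push a unit impulse through four layers with dilations 1, 2, 4, 8
h = np.zeros(20)
h[0] = 1.0
for d in (1, 2, 4, 8):
    h = causal_dilated_conv(h, [1.0, 1.0], d)
```

With these settings the impulse spreads over exactly `receptive_field(2, [1, 2, 4, 8])` time steps, which shows the exponential growth with depth.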

https://arxiv.org/pdf/1703.07015v1.pdf Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks

Multivariate time series forecasting is an important machine learning problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic jam situations. Temporal data arising in these real-world applications often involve a mixture of long-term and short-term patterns, for which traditional approaches such as autoregressive models and Gaussian processes may fail. In this paper, we propose a novel deep learning framework, namely Long- and Short-term Time-series network (LSTNet), to address this open challenge. LSTNet uses the Convolutional Neural Network (CNN) to extract short-term local dependency patterns among variables, and the Recurrent Neural Network (RNN) to discover long-term patterns and trends. In our evaluation on real-world data with complex mixtures of repetitive patterns, LSTNet achieved significant performance improvements over several state-of-the-art baseline methods.

https://arxiv.org/abs/1703.09938v3 Grouped Convolutional Neural Networks for Multivariate Time Series

Our algorithms exploit the covariance structure over multiple time series to partition input volume into groups. The first algorithm learns the group CNN structures explicitly by clustering individual input sequences. The second algorithm learns the group CNN structures implicitly from the error backpropagation. In experiments with two real-world datasets, we demonstrate that our group CNNs outperform existing CNN based regression methods.

http://www.cs.toronto.edu/~graves/icml_2006.pdf Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks

https://arxiv.org/abs/1704.04110 DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks

https://arxiv.org/abs/1704.06199v1 Dynamic Graph Convolutional Networks

Our goal is to jointly exploit structured data and temporal information through the use of a neural network model.

https://arxiv.org/abs/1606.00972v2 Synthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet

We show that a spatial-temporal generative ConvNet can be used to model and synthesize dynamic patterns. The model defines a probability distribution on the video sequence, and the log probability is defined by a spatial-temporal ConvNet that consists of multiple layers of spatial-temporal filters to capture spatial-temporal patterns of different scales. The model can be learned from the training video sequences by an “analysis by synthesis” learning algorithm that iterates the following two steps. Step 1 synthesizes video sequences from the currently learned model. Step 2 then updates the model parameters based on the difference between the synthesized video sequences and the observed training sequences. We show that the learning algorithm can synthesize realistic dynamic patterns.

http://www.stat.ucla.edu/~jxie/STGConvNet/STGConvNet.html

https://arxiv.org/pdf/1705.09137v2.pdf Neural Decomposition of Time-Series Data for Effective Generalization

We present a neural network technique for the analysis and extrapolation of time-series data called Neural Decomposition (ND). Units with a sinusoidal activation function are used to perform a Fourier-like decomposition of training samples into a sum of sinusoids, augmented by units with nonperiodic activation functions to capture linear trends and other nonperiodic components.
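
The idea of a sinusoidal part plus a nonperiodic part can be illustrated with a simplified stand-in: frequencies are held fixed and only the amplitudes and a linear trend are fit by least squares. ND as described trains these units as a network, so this is a sketch of the decomposition, not of the training procedure; all names and values are hypothetical.

```python
import numpy as np

def design_matrix(t, freqs):
    """Columns: sin and cos at each frequency, then a linear trend and a bias."""
    t = np.asarray(t, dtype=float)
    cols = [np.sin(2 * np.pi * f * t) for f in freqs]
    cols += [np.cos(2 * np.pi * f * t) for f in freqs]
    cols += [t, np.ones_like(t)]
    return np.stack(cols, axis=1)

# Synthetic training signal: one sinusoid plus a linear trend
t_train = np.linspace(0.0, 4.0, 200)
signal = 2.0 * np.sin(2 * np.pi * 1.0 * t_train) + 0.5 * t_train + 1.0

freqs = [0.5, 1.0, 2.0]  # assumed candidate frequencies
coef, *_ = np.linalg.lstsq(design_matrix(t_train, freqs), signal, rcond=None)

# Extrapolate beyond the training range
t_future = np.linspace(4.0, 6.0, 50)
forecast = design_matrix(t_future, freqs) @ coef
```

Because the trend is modelled separately from the periodic terms, the forecast continues both the oscillation and the slope past the training window.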

http://arxiv.org/abs/1706.02735v1 CortexNet: a Generic Network Family for Robust Visual Temporal Representations

CortexNet features not only bottom-up feed-forward connections but also models the abundant top-down feedback and lateral connections that are present in our visual cortex. We introduce two training schemes - the unsupervised MatchNet and weakly supervised TempoNet modes - where a network learns how to correctly anticipate a subsequent frame in a video clip or the identity of its predominant subject, by learning egomotion clues and how to automatically track several objects in the current scene. Find the project website at https://engineering.purdue.edu/elab/CortexNet/.

https://arxiv.org/abs/1703.04122 Autoregressive Convolutional Neural Networks for Asynchronous Time Series

We propose 'Significance-Offset Convolutional Neural Network', a deep convolutional network architecture for multivariate time series regression. The model is inspired by standard autoregressive (AR) models and gating mechanisms used in recurrent neural networks. It involves an AR-like weighting system, where the final predictor is obtained as a weighted sum of sub-predictors while the weights are data-dependent functions learnt through a convolutional network. The architecture was designed for applications on asynchronous time series with low signal-to-noise ratio and hence is evaluated on such datasets: a hedge fund proprietary dataset of over 2 million quotes for a credit derivative index and an artificially generated noisy autoregressive series. The proposed architecture achieves promising results compared to convolutional and recurrent neural networks. The code for the numerical experiments and the architecture implementation will be shared online to make the research reproducible.
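
The AR-like weighting can be sketched as a softmax-weighted sum. Here the sub-predictions and the scores are plain arrays standing in for the outputs of the two convolutional sub-networks, and normalising the scores with a softmax is an assumption of this sketch.

```python
import numpy as np

def weighted_prediction(sub_predictions, scores):
    """Combine per-lag sub-predictions with data-dependent softmax weights."""
    s = np.asarray(scores, dtype=float)
    w = np.exp(s - s.max())          # shift for numerical stability
    w /= w.sum()
    return float(w @ np.asarray(sub_predictions, dtype=float)), w

# Equal scores reduce to a plain average of the sub-predictors
avg, w_equal = weighted_prediction([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0])

# A dominant score lets one past observation drive the forecast
peak, _ = weighted_prediction([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 50.0])
```

In a classical AR model the weights are fixed coefficients; here they change with the input, which is what lets noisy observations be down-weighted.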

https://arxiv.org/pdf/1706.08838v1.pdf TimeNet: Pre-trained deep recurrent neural network for time series classification

https://arxiv.org/abs/1709.01907v1 Deep and Confident Prediction for Time Series at Uber

At Uber, probabilistic time series forecasting is used for robust prediction of number of trips during special events, driver incentive allocation, as well as real-time anomaly detection across millions of metrics. Classical time series models are often used in conjunction with a probabilistic formulation for uncertainty estimation. However, such models are hard to tune, scale, and add exogenous variables to. Motivated by the recent resurgence of Long Short Term Memory networks, we propose a novel end-to-end Bayesian deep model that provides time series prediction along with uncertainty estimation. We provide detailed experiments of the proposed solution on completed trips data, and successfully apply it to large-scale time series anomaly detection at Uber.

https://arxiv.org/pdf/1707.01926.pdf Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting

The task is challenging due to (1) complex spatial dependency on road networks, (2) non-linear temporal dynamics with changing road conditions and (3) inherent difficulty of long-term forecasting. To address these challenges, we propose to model the traffic flow as a diffusion process on a directed graph and introduce Diffusion Convolutional Recurrent Neural Network (DCRNN), a deep learning framework for traffic forecasting that incorporates both spatial and temporal dependency in the traffic flow.
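
A sketch of a diffusion convolution on a directed graph, using random walks along outgoing and incoming edges. The formulation (powers of the out-degree and in-degree normalised adjacency matrices, one filter weight per step and direction) is written from memory of the paper and should be read as an assumption. Every node is assumed to have at least one incoming and one outgoing edge.

```python
import numpy as np

def diffusion_conv(x, adj, theta_fwd, theta_bwd):
    """x: (N,) or (N, F) node signal; adj: (N, N) weighted directed adjacency."""
    adj = np.asarray(adj, dtype=float)
    x = np.asarray(x, dtype=float)
    p_fwd = adj / adj.sum(axis=1, keepdims=True)       # walk along out-edges
    p_bwd = adj.T / adj.T.sum(axis=1, keepdims=True)   # walk along in-edges
    out = np.zeros_like(x)
    xf = xb = x
    for a, b in zip(theta_fwd, theta_bwd):
        out += a * xf + b * xb       # k-step diffusion terms, k = 0, 1, ...
        xf, xb = p_fwd @ xf, p_bwd @ xb
    return out

# Directed 3-cycle 0 -> 1 -> 2 -> 0 with a unit signal on node 0
adj = np.array([[0, 1, 0],
                [0, 0, 1],
                [1, 0, 0]])
x = np.array([1.0, 0.0, 0.0])
one_step_fwd = diffusion_conv(x, adj, theta_fwd=[0.0, 1.0], theta_bwd=[0.0, 0.0])
one_step_bwd = diffusion_conv(x, adj, theta_fwd=[0.0, 0.0], theta_bwd=[0.0, 1.0])
```

Using both directions matters on road networks, where upstream and downstream traffic influence a sensor differently.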
