Predicting Deeper into the Future of Semantic Segmentation

We develop an autoregressive convolutional neural network that learns to iteratively generate multiple frames.

We explored five different models for this task relying on RGB and/or segmentations from previous frames. For prediction beyond a single future frame, we considered batch models that predict all future frames at once, and autoregressive models that sequentially predict the future frames. We found that autoregressive training produces the best results for our problem, and that models predicting in the segmentation space work better than those relying on the RGB frames.