Continual Learning

Note: This project aims to develop a model that sequentially predicts future video frames and motion (optical flow) from a shared hidden state, and that answers user questions about the events observed in the video.

Currently, the models are cheating: they memorize the past frame(s) and optical flow(s) and output them as the prediction for the next video frame(s) and optical flow(s). I am working on fixing this issue.
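One way to quantify this copying behaviour is to compare the model's error against a trivial baseline that simply repeats the last observed frame (or flow). The sketch below is not part of this repository. The function names `mse` and `copy_baseline_ratio` are hypothetical, and frames are assumed to be flattened into plain sequences of floats so that the example stays dependency-free.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized, flattened frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and have the same size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def copy_baseline_ratio(
    last_input: Sequence[float],
    prediction: Sequence[float],
    target: Sequence[float],
) -> float:
    """Model error divided by the error of repeating the last input frame.

    A ratio close to (or above) 1.0 suggests the model is doing no better
    than copying; a ratio well below 1.0 suggests it has learned some motion.
    """
    baseline_error = mse(last_input, target)
    if baseline_error == 0.0:
        # The scene did not change, so copying is already perfect.
        raise ValueError("target equals the last input frame; ratio undefined")
    return mse(prediction, target) / baseline_error


# Toy data (made up for illustration): the scene changes from all 0s to all 1s.
last_input = [0.0, 0.0, 0.0, 0.0]
target = [1.0, 1.0, 1.0, 1.0]

copied = list(last_input)          # a model that just repeats its input
informed = [0.9, 0.9, 0.9, 0.9]    # a model that anticipates the change

print(copy_baseline_ratio(last_input, copied, target))    # ratio of 1.0
print(copy_baseline_ratio(last_input, informed, target))  # far below 1.0
```

Tracking this ratio on a validation set alongside the usual reconstruction loss would make it easier to tell whether a change to the model actually reduces copying.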

Next Frame(s) Prediction

[Figure: Next Frame(s) Prediction GIF]

Optical Flow(s) Prediction

[Figure: Optical Flow(s) Prediction GIF]

Data

The full CLEVRER dataset can be downloaded from http://clevrer.csail.mit.edu.

Training

The training process is summarized in the figures below.

[Figures: Flow Reconstruction model and Image Reconstruction model]

Pipeline

[Figure: Pipeline]

After installing the libraries listed in requirements.txt, training can be started with the following command:

python train.py \
    --num_predictions 3 \
    --embed_dim 512 \
    --hidden_size 512 \
    --stride 1 \
    --num_frames 127 \
    --resize_img 224 \
    --patch_size 32

All of these parameters are optional, so training can also be started with the simpler command:

python train.py
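Since every flag is optional, train.py must fall back to default values. The sketch below shows one way such a parser could be written with Python's standard argparse module. It is an illustration, not the actual code in train.py: the function name `build_parser` is hypothetical, the integer types are assumed, and the defaults are simply the values from the longer command above, which may differ from the real defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the training flags shown in this README.

    Defaults are stand-ins copied from the example command; the real
    defaults in train.py may be different.
    """
    parser = argparse.ArgumentParser(description="Train the prediction models")
    parser.add_argument("--num_predictions", type=int, default=3)
    parser.add_argument("--embed_dim", type=int, default=512)
    parser.add_argument("--hidden_size", type=int, default=512)
    parser.add_argument("--stride", type=int, default=1)
    parser.add_argument("--num_frames", type=int, default=127)
    parser.add_argument("--resize_img", type=int, default=224)
    parser.add_argument("--patch_size", type=int, default=32)
    return parser


# Passing an explicit list avoids reading the real command line here.
# Flags that are not given keep their (assumed) defaults.
args = build_parser().parse_args(["--stride", "2"])
print(args.stride, args.num_frames)
```

With this layout, overriding a single setting only requires passing that one flag, for example `python train.py --stride 2`.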