We propose a new approach for accelerating video editing by identifying good moments at which to cut unedited videos. We formulate this problem as a classification task and propose a self-supervised scheme that requires only pre-existing edited videos for training, which are readily available in large and diverse quantities. We then propose a contrastive learning framework to train a 3D ResNet model to predict good regions to cut.
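To make the self-supervised idea concrete, here is a minimal sketch of how training labels could be derived from an edited video alone: windows that end near an existing cut are treated as positives, and all others as negatives. This is an illustrative assumption, not the exact sampling scheme used in this work. The function name `label_windows` and the parameters `window`, `stride`, and `tolerance` are hypothetical, and the cut timestamps are assumed to come from a separate shot-boundary detector.

```python
from typing import List, Tuple


def label_windows(
    cut_times: List[float],
    duration: float,
    window: float = 1.0,
    stride: float = 0.5,
    tolerance: float = 0.25,
) -> List[Tuple[float, float, int]]:
    """Slide a fixed-length window over an edited video and label each one.

    Hypothetical labeling rule (an assumption for illustration): a window is
    a positive (1) if it ends within `tolerance` seconds of a known cut,
    otherwise it is a negative (0). All times are in seconds.
    """
    samples = []
    # Number of windows that fit fully inside the video.
    n_steps = int((duration - window) / stride) + 1
    for i in range(max(n_steps, 0)):
        start = i * stride
        end = start + window
        near_cut = any(abs(end - c) <= tolerance for c in cut_times)
        samples.append((start, end, 1 if near_cut else 0))
    return samples


# Example: a 4-second clip with a single cut at t = 2.0 s.
samples = label_windows(cut_times=[2.0], duration=4.0)
labels = [label for _, _, label in samples]
```

In this sketch the labeled windows would then be decoded into short frame stacks and fed to the video model; how positives and negatives are paired for the contrastive objective is left open here.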
We thank Fred Morstatter and Aram Galstyan for helpful discussions and USC Advanced Research Computing for compute resources. This website is based in part on a template by Michaël Gharbi.