2022/12/07 · This paper presents SimVTP: a Simple Video-Text Pretraining framework via masked autoencoders. We randomly mask out the spatial-temporal tubes of input video.
From www.semanticscholar.org: SimVTP: Simple Video Text Pre-training with Masked Autoencoders.
SimVTP is a Simple Video-Text Pretraining framework via masked autoencoders that achieves surprisingly good results on MSRVTT, which is far above recent ...
This paper presents SimVTP: a Simple Video-Text Pretraining framework via masked autoencoders. Ranked #18 on Moment Retrieval on Charades-STA · Contrastive ...
We propose a novel evaluation task for video-text understanding, namely retrieval from counterfactually augmented data (RCAD), and a new Feint6K dataset, to ...
SimVTP: Simple video text pre-training with masked autoencoders. Y Ma, T Yang, Y Shan, X Li. arXiv preprint arXiv:2212.03490, 2022. 19, 2022 ; Multi-branch cross ...
This repository contains a collection of state-of-the-art self-supervised learning in video approaches for various downstream tasks.
2023/12/12 · For (1), we propose a two-stage process for pre-training a video encoder: (a) image-to-short video adaptation, and (b) short-to-long video, ...
SimVTP: Simple Video Text Pre-training with Masked Autoencoders. (SimVTP) ... SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training.