SimVTP: Simple Video Text Pre-training with Masked Autoencoders.

[2212.03490] SimVTP: Simple Video Text Pre-training with Masked ...

2022/12/07 · This paper presents SimVTP: a Simple Video-Text Pretraining framework via masked autoencoders. We randomly mask out the spatial-temporal tubes of input video.

Simple Video Text Pre-training with Masked Autoencoders

www.semanticscholar.org › paper

SimVTP is a Simple Video-Text Pretraining framework via masked autoencoders that achieves surprisingly good results on MSRVTT, which is far above recent ...

Simple Video Text Pre-training with Masked Autoencoders

www.researchgate.net › publication › 36...

This paper presents SimVTP: a Simple Video-Text Pretraining framework via masked autoencoders. We randomly mask out the spatial-temporal tubes of input video.

Yue Ma | Papers With Code

paperswithcode.com › author › yue-ma

This paper presents SimVTP: a Simple Video-Text Pretraining framework via masked autoencoders. Ranked #18 on Moment Retrieval on Charades-STA · Contrastive ...

Feint6K

feint6k.github.io

We propose a novel evaluation task for video-text understanding, namely retrieval from counterfactually augmented data (RCAD), and a new Feint6K dataset, to ...

Yue Ma (马跃) - Google Scholar

scholar.google.com › citations

SimVTP: Simple video text pre-training with masked autoencoders. Y Ma, T Yang, Y Shan, X Li. arXiv preprint arXiv:2212.03490, 2022. Cited by 19 (2022); Multi-branch cross ...

Awesome Self-Supervised Learning in Videos - GitHub

github.com › Malitha123 › awesome-vid...

This repository contains a collection of state-of-the-art self-supervised learning in video approaches for various downstream tasks.

[PDF] arXiv:2312.07395v1 [cs.CV] 12 Dec 2023

arxiv.org › pdf

2023/12/12 · For (1), we propose a two-stage process for pre-training a video encoder: (a) image-to-short video adaptation, and (b) short-to-long video, ...

Tianyu Yang | Papers With Code

paperswithcode.com › author › tianyu-ya...

This paper presents SimVTP: a Simple Video-Text Pretraining framework via masked autoencoders. Ranked #18 on Moment Retrieval on Charades-STA · Contrastive ...

Foundation Models for Video Understanding: A Survey - GitHub

github.com › NeeluMadan › ViFM_Survey

SimVTP: Simple Video Text Pre-training with Masked Autoencoders. (SimVTP) ... SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training.
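
The SimVTP snippets above mention randomly masking out spatial-temporal tubes of the input video. As a rough illustration only, the sketch below shows one common reading of tube masking: a random set of patch positions is chosen once and hidden in every frame, so each hidden position forms a tube through time. The function name `tube_mask`, the patch-grid sizes, and the 0.9 mask ratio are assumptions made for this example; none of them come from the listing, and this is not the authors' implementation.

```python
import random


def tube_mask(num_frames, grid_h, grid_w, mask_ratio, seed=None):
    """Return a nested list [frame][row][col] of booleans; True means masked.

    Hypothetical helper for illustration: the same spatial positions are
    masked in every frame, which is what makes the masked regions "tubes".
    """
    rng = random.Random(seed)
    num_patches = grid_h * grid_w
    num_masked = round(mask_ratio * num_patches)

    # Choose the hidden spatial positions once, without replacement.
    hidden = set(rng.sample(range(num_patches), num_masked))
    spatial = [
        [(row * grid_w + col) in hidden for col in range(grid_w)]
        for row in range(grid_h)
    ]

    # Repeat the same spatial pattern along the time axis.
    return [[row[:] for row in spatial] for _ in range(num_frames)]


# Assumed example sizes: 8 temporal positions over a 14 x 14 patch grid.
mask = tube_mask(num_frames=8, grid_h=14, grid_w=14, mask_ratio=0.9, seed=0)
masked_per_frame = sum(sum(row) for row in mask[0])
print(masked_per_frame, "of", 14 * 14, "patches masked in each frame")
```

Because every frame shares one pattern, a model cannot recover a hidden patch simply by copying the same location from a neighbouring frame, which is the usual motivation given for tube-style masking over independent per-frame masking.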