Paper Link: https://arxiv.org/pdf/2412.00811
In this paper, we propose a new dataset and algorithm for video moment retrieval (VMR), which effectively reduces the high cost of human annotation. Our experiments highlight that:
- Compared to the fully supervised approach SimBase, our ReCorrect model achieves 81.3% and 86.7% of its mIoU performance in the zero-shot and unsupervised settings, respectively.
- This narrow performance gap underscores the potential of our Vid-Morp dataset to address the critical challenge of VMR's heavy reliance on manual annotations.
To run the code, use the following command, which integrates the evaluation process for 1) the zero-shot, 2) the unsupervised, and 3) the fully supervised settings:

```bash
python main.py --cfg ./experiment/charades/recorrect_eval_configs_on_ZeroShot+Unsup+Full.json --eval
```

No extra downloads are needed to run the code, as the repository is self-contained with the necessary features and checkpoints:
- CLIP features are available in the `data/charades/feat` directory.
- Pre-trained checkpoints are located in `ckpt/charades`:
  - `zero_shot.ckpt`: zero-shot model
  - `unsup.ckpt`: unsupervised model
  - `full_sup.ckpt`: fully supervised model
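For a quick sanity check of the bundled assets, the minimal sketch below loads a checkpoint and a feature file. It assumes the checkpoints are standard PyTorch files and the CLIP features are saved as NumPy arrays, and `video_id.npy` is a hypothetical filename; adjust to the repository's actual formats.

```python
# Minimal sketch for inspecting the bundled assets (assumptions noted above);
# this is illustrative, not the repository's own loading code.
import numpy as np
import torch

# Load a pre-trained checkpoint (e.g., the zero-shot model).
ckpt = torch.load("ckpt/charades/zero_shot.ckpt", map_location="cpu")
print(type(ckpt))  # typically a dict of tensors or a wrapper dict

# Load per-video CLIP features; "video_id.npy" is a hypothetical filename.
feats = np.load("data/charades/feat/video_id.npy")
print(feats.shape)  # expected (num_frames, feature_dim)
```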
Fully Supervised Setting
| Method | R@0.3 | R@0.5 | R@0.7 | mIoU |
|---|---|---|---|---|
| SimBase | 77.77 | 66.48 | 44.01 | 56.15 |
| ReCorrect (Ours) | 78.55 | 68.39 | 45.78 | 57.42 |
Zero-Shot Setting
| Method | R@0.3 | R@0.5 | R@0.7 | mIoU |
|---|---|---|---|---|
| ReCorrect | 66.54 | 51.15 | 28.54 | 45.63 |
| % of SimBase | 85.6% | 76.9% | 64.8% | 81.3% |
Unsupervised Setting
| Method | R@0.3 | R@0.5 | R@0.7 | mIoU |
|---|---|---|---|---|
| ReCorrect | 70.96 | 54.42 | 31.10 | 48.66 |
| % of SimBase | 91.2% | 81.9% | 70.7% | 86.7% |
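For reference, R@μ and mIoU in the tables above are the standard temporal grounding metrics: R@μ is the fraction of queries whose top-1 predicted moment has temporal IoU of at least μ with the ground truth, and mIoU is the mean IoU over all queries. Below is a self-contained sketch of these metrics (illustrative only, not the repository's evaluation code):

```python
# Standalone sketch of the R@mu and mIoU metrics used in the tables above;
# illustrative, not the repository's evaluation code.
def temporal_iou(pred, gt):
    """IoU between two (start, end) moments in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def evaluate(preds, gts, thresholds=(0.3, 0.5, 0.7)):
    """Return R@mu (%) for each threshold and the mean IoU (%)."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    scores = {f"R@{t}": 100.0 * sum(i >= t for i in ious) / len(ious)
              for t in thresholds}
    scores["mIoU"] = 100.0 * sum(ious) / len(ious)
    return scores

# Toy usage with hypothetical predictions and ground truths.
print(evaluate([(2.0, 7.5), (10.0, 15.0)], [(2.5, 8.0), (12.0, 14.0)]))
```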
A crucial challenge in video moment retrieval is its heavy reliance on extensive manual annotations for training. To overcome this, we introduce a large-scale dataset for Video Moment Retrieval Pretraining (Vid-Morp), collected with minimal human involvement. Vid-Morp comprises over 50K in-the-wild videos and 200K pseudo training samples. Models pretrained on Vid-Morp significantly reduce annotation costs and demonstrate strong generalizability across diverse downstream settings.
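To make "pseudo training sample" concrete, here is a sketch of what one entry might look like. All field names and values are hypothetical illustrations; the released format may differ.

```python
# Hypothetical pseudo training sample; the field names are illustrative
# assumptions, not the dataset's released schema.
pseudo_sample = {
    "video_id": "wild_video_00042",  # hypothetical identifier of an in-the-wild video
    "sentence": "a person opens the fridge and takes out a bottle",
    "moment": [12.4, 18.9],          # machine-generated (start, end) in seconds
}
```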
To obtain the dataset download link, please send an email to peijun001@e.ntu.edu.sg. Note that the dataset is for academic use only.
If you use our code or dataset in your research, please cite:
@article{bao2024vid,
title={Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild},
author={Bao, Peijun and Kong, Chenqi and Shao, Zihao and Ng, Boon Poh and Er, Meng Hwa and Kot, Alex C},
journal={arXiv preprint arXiv:2412.00811},
year={2024}
}

