MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos
Minghan LI, Shuai LI, Wangmeng XIANG, Lei ZHANG
[arXiv]
Updates
March 28, 2023: Code and paper are now available!
Installation
See installation instructions.
Getting Started
We provide a script, train_net.py, that trains all the configs provided in MDQE.
Before training: To train a model with "train_net.py" on VIS, first set up the corresponding datasets following Preparing Datasets for MDQE.
Then download the pretrained weights from the Model Zoo to 'pretrained/coco/*.pth' and run:
python train_net.py --num-gpus 8 \
--config-file configs/R50_ovis_360.yaml
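The training command above expects the COCO-pretrained weights to be in place already. As a minimal sketch, assuming the weights sit directly under 'pretrained/coco/' as '*.pth' files, a small check like the following can confirm that before launching a multi-GPU job. The helper name find_pretrained_weights is hypothetical and is not part of MDQE.

```python
from pathlib import Path


def find_pretrained_weights(root="pretrained/coco"):
    """Return the .pth files found directly under `root`, sorted by path.

    Hypothetical helper (not part of MDQE); `root` defaults to the
    directory named in the instructions above.
    """
    return sorted(Path(root).glob("*.pth"))


weights = find_pretrained_weights()
if weights:
    for path in weights:
        print(f"found pretrained weights: {path}")
else:
    print("no .pth files under pretrained/coco; download them from the Model Zoo first")
```

An empty result means the download step was skipped or the files landed in a different directory.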
To evaluate a model's performance, run:
python train_net.py \
--config-file configs/R50_ovis_360.yaml \
--eval-only \
MODEL.WEIGHTS /path/to/checkpoint_file
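When sweeping over several configs or checkpoints, it can help to assemble these invocations programmatically. The sketch below rebuilds the two commands shown above as argument lists. The function build_command is a hypothetical wrapper, not part of MDQE, and it uses only the flags that appear in this README. The comment about trailing KEY VALUE pairs assumes train_net.py follows the usual Detectron2 convention for config overrides.

```python
import shlex


def build_command(config_file, num_gpus=None, eval_only=False, weights=None):
    """Assemble a train_net.py invocation as an argument list.

    Hypothetical wrapper (not part of MDQE); it only emits flags that
    appear in the commands shown in this README.
    """
    cmd = ["python", "train_net.py"]
    if num_gpus is not None:
        cmd += ["--num-gpus", str(num_gpus)]
    cmd += ["--config-file", config_file]
    if eval_only:
        cmd.append("--eval-only")
    if weights is not None:
        # Assumption: trailing KEY VALUE pairs override config entries,
        # as in the evaluation command above.
        cmd += ["MODEL.WEIGHTS", weights]
    return cmd


train_cmd = build_command("configs/R50_ovis_360.yaml", num_gpus=8)
eval_cmd = build_command(
    "configs/R50_ovis_360.yaml",
    eval_only=True,
    weights="/path/to/checkpoint_file",
)

# Print shell-ready strings; pass the lists to subprocess.run to execute them.
print(shlex.join(train_cmd))
print(shlex.join(eval_cmd))
```

Keeping the command as a list avoids shell-quoting problems when a checkpoint path contains spaces.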
Model Zoo
Pretrained weights on COCO
| Name | R50 | Swin-L |
|---|---|---|
| MDQE | model, config | model, config |
OVIS
| Name | Backbone | Frames | AP | Download |
|---|---|---|---|---|
| MDQE | R50 | f4+360p | 29.2 | model, config |
| MDQE | R50 | f4+720p | 33.0 | model, config |
| MDQE | Swin-L | f2+480p | 41.0 | model, config |
| MDQE | Swin-L | f2+640p | 42.6 | model, config |
YouTubeVIS-2021
| Name | Backbone | Frames | AP | Download |
|---|---|---|---|---|
| MDQE | R50 | f4+360p | 44.5 | model, config |
| MDQE | Swin-L | f3+360p | 56.2 | model, config |
YouTubeVIS-2019
| Name | Backbone | Frames | AP | Download |
|---|---|---|---|---|
| MDQE | R50 | f4+360p | 47.3 | model, config |
| MDQE | Swin-L | f3+360p | 63.0 | model, config |
License
The majority of MDQE is licensed under the Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), IFC (Apache-2.0 License), VITA (Apache-2.0 License), and Deformable-DETR (Apache-2.0 License).
Citing MDQE
If you use MDQE in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.
@misc{li2023mdqe,
title={MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos},
author={Minghan Li and Shuai Li and Wangmeng Xiang and Lei Zhang},
year={2023},
eprint={2303.14395},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Acknowledgement
Our code is largely based on Detectron2, IFC, Deformable DETR and VITA. We are truly grateful for their excellent work.

