Visual Representation
- Masked Autoencoders Are Scalable Vision Learners-2021<Paper>
- Efficient Self-supervised Vision Transformers for Representation Learning-2021<Paper>
- Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations-2021<Paper><Code-PyTorch>
Video Understanding
- ViViT: A Video Vision Transformer-Arxiv2021<Paper><Code-PyTorch>
- Video Swin Transformer-Arxiv2021<Paper><Code-PyTorch>
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale-ICLR2021<Paper><Code-PyTorch>
- A Survey of Transformers-Arxiv2021<Paper>
- Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features-CBMI2021<Paper>
- ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision-ICML2021<Paper><Code-PyTorch>
- Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers-CVPR2021<Paper>
- TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval-2021<Paper><Code-PyTorch>
- Cross-Modal Retrieval Augmentation for Multi-Modal Classification-2021<Paper>
- Continual Learning in Cross-Modal Retrieval-CVPR2021<Paper>
- Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning-CVPR2021<Paper><Code>
- Visual Semantic Role Labeling for Video Understanding-CVPR2021<Paper><Code-PyTorch>
- Perceiver: General Perception with Iterative Attention-2021<Paper>
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning-CVPR2021<Paper><Code-PyTorch>
- Hyperbolic Visual Embedding Learning for Zero-Shot Recognition-CVPR2020<Paper><Code-PyTorch>
- Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval-2021<Paper><Code-PyTorch>
- What is Multimodality?<Paper>
- Multi-modal Transformer for Video Retrieval-ECCV2020<Paper><Code-PyTorch>
- Support-set Bottlenecks for Video-text Representation Learning-ICLR2021<Paper>
- Dual Encoding for Video Retrieval by Text-TPAMI2021<Paper><Code-PyTorch>
- Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling-CVPR2021<Paper><Code>
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations-ICLR2020<Paper><Code-PyTorch>
- Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer-2021<Paper><Code>
- COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning-NeurIPS2020<Paper><Code-PyTorch>
- Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning-CVPR2020<Paper><Code-PyTorch>
- LXMERT: Learning Cross-Modality Encoder Representations from Transformers-EMNLP2019<Paper><Code-PyTorch>
- VisualBERT: A Simple and Performant Baseline for Vision and Language-2019<Paper><Code-PyTorch>
- Video SemNet: Memory-Augmented Video Semantic Network-NIPS2017<Paper>
- Self-Supervised Video Representation Learning by Pace Prediction-ECCV2020<Paper><Code-PyTorch>
- SalSum: Saliency-based Video Summarization using Generative Adversarial Networks-2020<Paper>
- Self-Supervised Temporal-Discriminative Representation Learning for Video Action Recognition-2020<Paper><Code-PyTorch><Zhihu>
- Classification of Important Segments in Educational Videos using Multimodal Features-CIKM2020<Paper><Code-Keras>
- Attentive and Adversarial Learning for Video Summarization-WACV2019<Paper><Code-PyTorch>
- Digital Video Summarization Techniques: A Survey-2020<Paper>
- Emerging Trends of Multimodal Research in Vision and Language-2020<Paper>
- Exploring Global Diverse Attention via Pairwise Temporal Relation for Video Summarization-2020<Paper>
- Multi-modal Dense Video Captioning-CVPR Workshops 2020<Paper><Code-PyTorch><Project>
- Accuracy and Performance Comparison of Video Action Recognition Approaches-HPEC2020<Paper>
- What Makes Training Multi-Modal Classification Networks Hard?-CVPR2020<Paper>
- [DMASum] Query Twice: Dual Mixture Attention Meta Learning for Video Summarization-ACM MM2020<Paper>
- [CHAN] Convolutional Hierarchical Attention Network for Query-Focused Video Summarization<Paper><Code-PyTorch>
- [ILS-SUMM] ILS-SUMM: Iterated Local Search for Unsupervised Video Summarization-ICPR2020<Paper><Code>
- Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward-AAAI2018<Paper><Code-PyTorch>
- [SUM-GAN] Unsupervised Video Summarization with Adversarial LSTM Networks-CVPR2017<Paper><Code-PyTorch>
- Enhancing Video Summarization via Vision-Language Embedding-CVPR2017<Paper>
- Query-adaptive Video Summarization via Quality-aware Relevance Estimation-ICCV2017<Paper><Code-Theano>
- Temporal Tessellation: A Unified Approach for Video Analysis-ICCV2017<Paper><Code-Tensorflow>
- Video Summarization with Long Short-term Memory-ECCV2016<Paper><Code-Theano><Code-Keras>
- Video Summarization using Deep Semantic Features-ACCV2016<Paper><Code-Chainer>
- [SA-LSTM] Describing Videos by Exploiting Temporal Structure-ICCV2015<Paper><Code-Theano><Code-PyTorch>
- [3D-ResNet] Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?-CVPR2018<Paper><Code-PyTorch>
- [Hidden Two-Stream] Hidden Two-Stream Convolutional Networks for Action Recognition-ACCV2018<Paper><Code-PyTorch>
- [FlowNet2] FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks<Paper(https://arxiv.org/pdf/1612.01925.pdf)><Code-PyTorch>
- [TSN] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition-ECCV2016<Paper><Code-Caffe><Code-PyTorch>
- Towards Good Practices for Very Deep Two-Stream ConvNets<Paper><Code-Caffe>
- [Two-Stream] Two-Stream Convolutional Networks for Action Recognition in Videos-NIPS2014<Paper>
- [C3D] Learning Spatiotemporal Features with 3D Convolutional Networks-ICCV2015<Paper><Code-Caffe><Code-Tensorflow><Code-PyTorch>
- [NetVLAD] NetVLAD: CNN architecture for weakly supervised place recognition-CVPR2016<Paper><Code-Matlab><Code-PyTorch>
Semantic Segmentation
1.[AdaptSegNet] Learning to Adapt Structured Output Space for Semantic Segmentation-CVPR2018<Paper><Code-PyTorch>
2.[DAM/DCM] Unsupervised Cross-Modality Domain Adaptation of ConvNets for Biomedical Image Segmentations with Adversarial Loss-IJCAI2018<Paper>
3.[FCAN] Fully Convolutional Adaptation Networks for Semantic Segmentation-CVPR2018<Paper>
4.[DenseASPP] DenseASPP for Semantic Segmentation in Street Scenes-CVPR2018<Paper><Code-PyTorch>
5.Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation-CVPR2018<Paper>
6.[AutofocusLayer] Autofocus Layer for Semantic Segmentation-MICCAI2018<Paper><Code-PyTorch>
7.[PDV-Net] Automatic Segmentation of Pulmonary Lobes Using a Progressive Dense V-Network-MICCAI2018<Paper>
8.[RR-SegSE] Adaptive feature recombination and recalibration for semantic segmentation: application to brain tumor segmentation in MRI-MICCAI2018<Paper><Code>
9.[HD-Net] Fine-Grained Segmentation Using Hierarchical Dilated Neural Networks-MICCAI2018<Paper>
10.[U-JAPA-Net] 3D U-JAPA-Net: Mixture of Convolutional Networks for Abdominal Multi-organ CT Segmentation-MICCAI2018<Paper>
11.[CompNet] CompNet: Complementary Segmentation Network for Brain MRI Extraction-MICCAI2018<Paper><Code-Keras>
12.Deep Learning-Based Boundary Detection for Model-Based Segmentation with Application to MR Prostate Segmentation-MICCAI2018<Paper>
13.[RS-Net] RS-Net: Regression-Segmentation 3D CNN for Synthesis of Full Resolution Missing Brain MRI in the Presence of Tumours-MICCAI2018<Paper><Code>
14.CT-Realistic Lung Nodule Simulation from 3D Conditional Generative Adversarial Networks for Robust Lung Segmentation-MICCAI2018<Paper>
15.[CB-GANs] Learning Data Augmentation for Brain Tumor Segmentation with Coarse-to-Fine Generative Adversarial Networks-MICCAI2018<Paper>
16.[FSENet] Focus, Segment and Erase: An Efficient Network for Multi-Label Brain Tumor Segmentation-ECCV2018<Paper><Code-PyTorch>
17.[DeepLabv3+] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation-ECCV2018<Paper><Code-Tensorflow>
18.[ExFuse] ExFuse: Enhancing Feature Fusion for Semantic Segmentation-ECCV2018<Paper>
19.[ESPNet] ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation-ECCV2018<Paper><Code-PyTorch>
20.[EncNet] Context Encoding for Semantic Segmentation-CVPR2018<Paper><Code-PyTorch>
21.[PSPNet] Pyramid Scene Parsing Network-CVPR2017<Paper><Code-Caffe>
22.[DANet] Dual Attention Network for Scene Segmentation-CVPR2019<Paper><Code-PyTorch>
23.[BiSeNet] BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation-ECCV2018<Paper><Code-PyTorch>
24.[Fast-SCNN] Fast-SCNN: Fast Semantic Segmentation Network-2019<Paper><Code-PyTorch>
25.[ICNet] ICNet for Real-Time Semantic Segmentation on High-Resolution Images-ECCV2018<Paper><Code-PyTorch><Code-Caffe>
26.[DUNet] Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation-CVPR2019<Paper><Code-PyTorch>
27.[ENet] ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation-2016<Paper><Code-Caffe><Code-PyTorch><Code-Tensorflow>
28.[CCNet] CCNet: Criss-Cross Attention for Semantic Segmentation-2018<Paper><Code-PyTorch>
29.[OCNet] OCNet: Object Context Network for Scene Parsing-2018<Paper><Code-PyTorch>
30.[HRNet] High-Resolution Representations for Labeling Pixels and Regions-2019<Paper><Code-PyTorch>
Panoptic Segmentation
1.[Panoptic FPN] Panoptic Feature Pyramid Networks-Arxiv2019<Paper>
Super-Resolution
1.[mDCSRN] Efficient and Accurate MRI Super-Resolution using a Generative Adversarial Network and 3D Multi-Level Densely Connected Network-MICCAI2018<Paper>
2.[RDN] Residual Dense Network for Image Super-Resolution-CVPR2018<Paper><Code-Torch><Code-PyTorch><Code-Tensorflow>
Network Architecture
1.[DLA] Deep Layer Aggregation-CVPR2018<Paper><Code-PyTorch>
2.[DualSkipNet] Dual Skipping Networks-CVPR2018<Paper>
3.[SkipNet] SkipNet: Learning Dynamic Routing in Convolutional Networks-ECCV2018<Paper><Code-PyTorch>
4.[DRN] Dilated Residual Networks-CVPR2017<Paper><Code-PyTorch>
5.[CapsNet] Dynamic Routing Between Capsules-NIPS2017<Paper><Code-Tensorflow>
6.[BlockQNN] Practical Block-wise Neural Network Architecture Generation-CVPR2018<Paper>
7.[MobileNetV2] MobileNetV2: Inverted Residuals and Linear Bottlenecks-CVPR2018<Paper><Code-Tensorflow>
8.[Non-Local] Non-local Neural Networks-CVPR2018<Paper><Code>
Loss Function
1.[FocalLoss] Focal Loss for Dense Object Detection-ICCV2017<Paper><Code-Caffe2>