Paper-reading List

Visual Representation

Masked Autoencoders Are Scalable Vision Learners-2021<Paper>
Efficient Self-supervised Vision Transformers for Representation Learning-2021<Paper>
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations-2021<Paper><Code-PyTorch>

Video Understanding

ViViT: A Video Vision Transformer-Arxiv2021<Paper><Code-PyTorch>
Video Swin Transformer-Arxiv2021<Paper><Code-PyTorch>
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale-ICLR2021<Paper><Code-PyTorch>
A Survey of Transformers-Arxiv2021<Paper>
Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features-CBMI2021<Paper>
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision-ICML2021<Paper><Code-PyTorch>
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers-CVPR2021<Paper>
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval-2021<Paper><Code-PyTorch>
Cross-Modal Retrieval Augmentation for Multi-Modal Classification-2021<Paper>
Continual Learning in Cross-Modal Retrieval-CVPR2021<Paper>
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning-CVPR2021<Paper><Code>
Visual Semantic Role Labeling for Video Understanding-CVPR2021<Paper><Code-PyTorch>
Perceiver: General Perception with Iterative Attention-2021<Paper>
Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning-CVPR2021<Paper><Code-PyTorch>
Hyperbolic Visual Embedding Learning for Zero-Shot Recognition-CVPR2020<Paper><Code-PyTorch>
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval-2021<Paper><Code-PyTorch>
What is Multimodality?<Paper>
Multi-modal Transformer for Video Retrieval-ECCV2020<Paper><Code-PyTorch>
Support-set Bottlenecks for Video-text Representation Learning-ICLR2021<Paper>
Dual Encoding for Video Retrieval by Text-TPAMI2021<Paper><Code-PyTorch>
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling-CVPR2021<Paper><Code>
VL-BERT: Pre-training of Generic Visual-Linguistic Representations-ICLR2020<Paper><Code-PyTorch>
Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer-2021<Paper><Code>
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning-NeurIPS2020<Paper><Code-PyTorch>
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning-CVPR2020<Paper><Code-PyTorch>
LXMERT: Learning Cross-Modality Encoder Representations from Transformers-EMNLP2019<Paper><Code-PyTorch>
VisualBERT: A Simple and Performant Baseline for Vision and Language-2019<Paper><Code-PyTorch>
Video SemNet: Memory-Augmented Video Semantic Network-NIPS2017<Paper>
Self-Supervised Video Representation Learning by Pace Prediction-ECCV2020<Paper><Code-PyTorch>
SalSum: Saliency-based Video Summarization using Generative Adversarial Networks-2020<Paper>
Self-Supervised Temporal-Discriminative Representation Learning for Video Action Recognition-2020<Paper><Code-PyTorch><Zhihu>
Classification of Important Segments in Educational Videos using Multimodal Features-CIKM2020<Paper><Code-Keras>
Attentive and Adversarial Learning for Video Summarization-WACV2019<Paper><Code-PyTorch>
Digital Video Summarization Techniques: A Survey-2020<Paper>
Emerging Trends of Multimodal Research in Vision and Language-2020<Paper>
Exploring Global Diverse Attention via Pairwise Temporal Relation for Video Summarization-2020<Paper>
Multi-modal Dense Video Captioning-CVPR Workshops 2020<Paper><Code-PyTorch><Project>
Accuracy and Performance Comparison of Video Action Recognition Approaches-HPEC2020<Paper>
What Makes Training Multi-Modal Classification Networks Hard?-CVPR2020<Paper>
[DMASum] Query Twice: Dual Mixture Attention Meta Learning for Video Summarization-ACM MM2020<Paper>
[CHAN] Convolutional Hierarchical Attention Network for Query-Focused Video Summarization<Paper><Code-PyTorch>
[ILS-SUMM] ILS-SUMM: Iterated Local Search for Unsupervised Video Summarization-ICPR2020<Paper><Code>
Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward-AAAI2018<Paper><Code-PyTorch>
[SUM-GAN] Unsupervised Video Summarization with Adversarial LSTM Networks-CVPR2017<Paper><Code-PyTorch>
Enhancing Video Summarization via Vision-Language Embedding-CVPR2017<Paper>
Query-adaptive Video Summarization via Quality-aware Relevance Estimation-ICCV2017<Paper><Code-Theano>
Temporal Tessellation: A Unified Approach for Video Analysis-ICCV2017<Paper><Code-Tensorflow>
Video Summarization with Long Short-term Memory-ECCV2016<Paper><Code-Theano><Code-Keras>
Video Summarization using Deep Semantic Features-ACCV2016<Paper><Code-Chainer>
[SA-LSTM] Describing Videos by Exploiting Temporal Structure-ICCV2015<Paper><Code-Theano><Code-PyTorch>
[3D-ResNet] Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?-CVPR2018<Paper><Code-PyTorch>
[Hidden Two-Stream] Hidden Two-Stream Convolutional Networks for Action Recognition-ACCV2018<Paper><Code-PyTorch>
[FlowNet2] FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks<Paper: https://arxiv.org/pdf/1612.01925.pdf><Code-PyTorch>
[TSN] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition-ECCV2016<Paper><Code-Caffe><Code-PyTorch>
Towards Good Practices for Very Deep Two-Stream ConvNets<Paper><Code-Caffe>
[Two-Stream] Two-Stream Convolutional Networks for Action Recognition in Videos-NIPS2014<Paper>
[C3D] Learning Spatiotemporal Features with 3D Convolutional Networks-ICCV2015<Paper><Code-Caffe><Code-Tensorflow><Code-PyTorch>
[NetVLAD] NetVLAD: CNN architecture for weakly supervised place recognition-CVPR2016<Paper><Code-Matlab><Code-PyTorch>

Semantic Segmentation

1.[AdaptSegNet] Learning to Adapt Structured Output Space for Semantic Segmentation-CVPR2018<Paper><Code-PyTorch>
2.[DAM/DCM] Unsupervised Cross-Modality Domain Adaptation of ConvNets for Biomedical Image Segmentations with Adversarial Loss-IJCAI2018<Paper>
3.[FCAN] Fully Convolutional Adaptation Networks for Semantic Segmentation-CVPR2018<Paper>
4.[DenseASPP] DenseASPP for Semantic Segmentation in Street Scenes-CVPR2018<Paper><Code-PyTorch>
5.Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation-CVPR2018<Paper>
6.[AutofocusLayer] Autofocus Layer for Semantic Segmentation-MICCAI2018<Paper><Code-PyTorch>
7.[PDV-Net] Automatic Segmentation of Pulmonary Lobes Using a Progressive Dense V-Network-MICCAI2018<Paper>
8.[RR-SegSE] Adaptive feature recombination and recalibration for semantic segmentation: application to brain tumor segmentation in MRI-MICCAI2018<Paper><Code>
9.[HD-Net] Fine-Grained Segmentation Using Hierarchical Dilated Neural Networks-MICCAI2018<Paper>
10.[U-JAPA-Net] 3D U-JAPA-Net: Mixture of Convolutional Networks for Abdominal Multi-organ CT Segmentation-MICCAI2018<Paper>
11.[CompNet] CompNet: Complementary Segmentation Network for Brain MRI Extraction-MICCAI2018<Paper><Code-Keras>
12.Deep Learning-Based Boundary Detection for Model-Based Segmentation with Application to MR Prostate Segmentation-MICCAI2018<Paper>
13.[RS-Net] RS-Net: Regression-Segmentation 3D CNN for Synthesis of Full Resolution Missing Brain MRI in the Presence of Tumours-MICCAI2018<Paper><Code>
14.CT-Realistic Lung Nodule Simulation from 3D Conditional Generative Adversarial Networks for Robust Lung Segmentation-MICCAI2018<Paper>
15.[CB-GANs] Learning Data Augmentation for Brain Tumor Segmentation with Coarse-to-Fine Generative Adversarial Networks-MICCAI2018<Paper>
16.[FSENet] Focus, Segment and Erase: An Efficient Network for Multi-Label Brain Tumor Segmentation-ECCV2018<Paper><Code-PyTorch>
17.[DeepLabv3+] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation-ECCV2018<Paper><Code-Tensorflow>
18.[ExFuse] ExFuse: Enhancing Feature Fusion for Semantic Segmentation-ECCV2018<Paper>
19.[ESPNet] ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation-ECCV2018<Paper><Code-PyTorch>
20.[EncNet] Context Encoding for Semantic Segmentation-CVPR2018<Paper><Code-PyTorch>
21.[PSPNet] Pyramid Scene Parsing Network-CVPR2017<Paper><Code-Caffe>
22.[DANet] Dual Attention Network for Scene Segmentation-CVPR2019<Paper><Code-PyTorch>
23.[BiSeNet] BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation-ECCV2018<Paper><Code-PyTorch>
24.[Fast-SCNN] Fast-SCNN: Fast Semantic Segmentation Network-2019<Paper><Code-PyTorch>
25.[ICNet] ICNet for Real-Time Semantic Segmentation on High-Resolution Images-ECCV2018<Paper><Code-PyTorch><Code-Caffe>
26.[DUNet] Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation-CVPR2019<Paper><Code-PyTorch>
27.[ENet] ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation-2016<Paper><Code-Caffe><Code-PyTorch><Code-Tensorflow>
28.[CCNet] CCNet: Criss-Cross Attention for Semantic Segmentation-2018<Paper><Code-PyTorch>
29.[OCNet] OCNet: Object Context Network for Scene Parsing-2018<Paper><Code-PyTorch>
30.[HRNet] High-Resolution Representations for Labeling Pixels and Regions-2019<Paper><Code-PyTorch>

Panoptic Segmentation

1.[Panoptic FPN] Panoptic Feature Pyramid Networks-Arxiv2019<Paper>

Super-Resolution

1.[mDCSRN] Efficient and Accurate MRI Super-Resolution using a Generative Adversarial Network and 3D Multi-Level Densely Connected Network-MICCAI2018<Paper>
2.[RDN] Residual Dense Network for Image Super-Resolution-CVPR2018<Paper><Code-Torch><Code-PyTorch><Code-Tensorflow>

Network Architecture

1.[DLA] Deep Layer Aggregation-CVPR2018<Paper><Code-PyTorch>
2.[DualSkipNet] Dual Skipping Networks-CVPR2018<Paper>
3.[SkipNet] SkipNet: Learning Dynamic Routing in Convolutional Networks-ECCV2018<Paper><Code-PyTorch>
4.[DRN] Dilated Residual Networks-CVPR2017<Paper><Code-PyTorch>
5.[CapsNet] Dynamic Routing Between Capsules-NIPS2017<Paper><Code-Tensorflow>
6.[BlockQNN] Practical Block-wise Neural Network Architecture Generation-CVPR2018<Paper>
7.[MobileNetV2] MobileNetV2: Inverted Residuals and Linear Bottlenecks-CVPR2018<Paper><Code-Tensorflow>
8.[Non-Local] Non-local Neural Networks-CVPR2018<Paper><Code>

Loss Function

1.[FocalLoss] Focal Loss for Dense Object Detection-ICCV2017<Paper><Code-Caffe2>