Seeing in the Dark: A Teacher–Student Framework for Dark Video Action Recognition via Knowledge Distillation and Contrastive Learning
This page contains all the datasets and code bases (experiments and evaluations) used to develop and evaluate our newly proposed ActLumos framework for video action recognition in the dark.
The official repository of the paper, with supplementary material: ActLumos
This project was carried out at the Monash University, Malaysia campus.
Project Members -
Sharana Dharshikgan Suresh Dass (Monash University, Malaysia)
Hrishav Bakul Barua (Monash University, Australia and TCS Research, Kolkata, India)
Ganesh Krishnasami (Monash University, Malaysia)
Raveendran Paramesran (University of Malaya, Malaysia)
Raphaël C.-W. Phan (Monash University, Malaysia)
This work is supported by the Global Research Excellence Scholarship, Monash University, Malaysia. This research is also supported, in part, by the prestigious Global Excellence and Mobility Scholarship (GEMS), Monash University (Malaysia & Melbourne, Australia).
Action recognition in dark or low-light videos is challenging due to severe visibility degradation that obscures critical spatiotemporal cues. This paper presents ActLumos, a teacher–student framework that achieves single-stream inference efficiency with multi-stream-level accuracy. The teacher network processes dual inputs—original dark frames and Retinex-enhanced frames—through weight-shared R(2+1)D-34 backbones and dynamically fuses them using a Dynamic Feature Fusion (DFF) module, guided by a supervised contrastive loss (SupCon) to enhance class separability. The student network, using only dark frames, is pretrained with self-supervision on unlabeled clips and fine-tuned via knowledge distillation from the teacher, inheriting its multi-stream knowledge. ActLumos achieves state-of-the-art results with 96.92% (Top-1) on ARID V1.0, 88.27% on ARID V1.5, and 48.96% on Dark48. Ablation studies confirm the effectiveness of each component, demonstrating superior dark-video recognition without additional inference cost.
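The teacher described above can be sketched roughly as follows. This is a minimal illustration, not the released implementation: it assumes the DFF module can be approximated by a learned channel-wise gate over the two feature streams, and it uses torchvision's `r2plus1d_18` as a stand-in for the paper's R(2+1)D-34 backbone.

```python
import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18  # stand-in for R(2+1)D-34

class DualStreamTeacher(nn.Module):
    """Hedged sketch: weight-shared backbone + gated fusion of two streams."""

    def __init__(self, num_classes: int, feat_dim: int = 512):
        super().__init__()
        backbone = r2plus1d_18(weights=None)
        backbone.fc = nn.Identity()            # expose 512-d clip features
        self.backbone = backbone               # shared across both inputs
        self.gate = nn.Sequential(             # hypothetical stand-in for DFF
            nn.Linear(2 * feat_dim, feat_dim),
            nn.Sigmoid(),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, dark: torch.Tensor, enhanced: torch.Tensor):
        # dark, enhanced: (B, 3, T, H, W) video clips
        f_dark = self.backbone(dark)           # (B, 512)
        f_enh = self.backbone(enhanced)        # (B, 512), same weights
        g = self.gate(torch.cat([f_dark, f_enh], dim=1))
        fused = g * f_dark + (1 - g) * f_enh   # dynamic, input-dependent mix
        return self.classifier(fused), fused   # logits + features for SupCon
```

Under this reading, the fused features feed both the classification head and the supervised contrastive loss, so the gate is trained end-to-end to weight whichever stream is more informative for a given clip.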
Self-supervised vs supervised contrastive learning. Left (self-supervised): the anchor clip (class Pick) has only its own augmented view as a positive; every other clip, even those from the same class, is treated as a negative. Right (supervised, SupCon): all clips sharing the anchor's class label count as positives, pulling same-class clips together in the embedding space.
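For readers unfamiliar with SupCon, below is a minimal sketch of the supervised contrastive loss (Khosla et al., 2020) that the figure contrasts with self-supervised learning; the temperature and batch conventions are illustrative, and the paper's exact variant may differ.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features: torch.Tensor, labels: torch.Tensor,
                temperature: float = 0.07) -> torch.Tensor:
    """features: (N, D) embeddings; labels: (N,) integer class ids."""
    features = F.normalize(features, dim=1)              # unit-norm embeddings
    sim = features @ features.T / temperature            # (N, N) similarities
    not_self = ~torch.eye(len(labels), dtype=torch.bool, device=sim.device)
    # Positives: every other sample with the same label -- the key difference
    # from self-supervised contrastive learning, where only the anchor's own
    # augmented view counts as a positive.
    pos_mask = (labels[:, None] == labels[None, :]) & not_self
    sim = sim.masked_fill(~not_self, float('-inf'))      # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(1).clamp(min=1)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_counts
    return loss[pos_mask.sum(1) > 0].mean()              # skip positive-free anchors
```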
We use gamma correction and Retinex-enhanced images alongside raw dark frames as input to our pipeline. Examples of dark frames (top), their Retinex-enhanced counterparts (middle), and gamma-corrected frames (bottom), across five actions (pour, pick, walk, stand, drink). The dark frames are from the ARID dataset.
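The two enhancements shown in the figure can be sketched per frame as below; the gamma value and Gaussian sigma are illustrative assumptions, not the parameters used in the paper.

```python
import cv2
import numpy as np

def gamma_correct(frame_bgr: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Brighten a uint8 BGR frame via out = in ** (1 / gamma)."""
    norm = frame_bgr.astype(np.float32) / 255.0
    return np.clip(norm ** (1.0 / gamma) * 255.0, 0, 255).astype(np.uint8)

def single_scale_retinex(frame_bgr: np.ndarray, sigma: float = 80.0) -> np.ndarray:
    """Single-scale Retinex: log(image) - log(blurred illumination estimate)."""
    img = frame_bgr.astype(np.float32) + 1.0             # avoid log(0)
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)
    reflectance = np.log(img) - np.log(illumination)
    # Stretch the reflectance back into a displayable 0..255 range.
    reflectance -= reflectance.min()
    reflectance *= 255.0 / (reflectance.max() + 1e-8)
    return reflectance.astype(np.uint8)
```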
| Venue | Method(s) | Paper | Code |
|-------|-----------|-------|------|
| ICLR 2015 | VGG-TS | Very Deep Convolutional Networks for Large-Scale Image Recognition | Code |
| ECCV 2016 | TSN | Temporal Segment Networks: Towards Good Practices for Deep Action Recognition | Code |
| CVPR 2017 | I3D-RGB, I3D Two-stream | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | Code |
| ICCV 2017 | Pseudo-3D-199 | Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks | Code |
| CVPR 2018 | R(2+1)D | A Closer Look at Spatiotemporal Convolutions for Action Recognition | Code |
| CVPR 2018 | 3D-ResNet-18, 3D-ResNet-50, 3D-ResNet-101 | Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? | Code |
| NAACL 2019 | BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Code |
| CVPRW 2021 | DarkLight-ResNeXt-101, DarkLight-R(2+1)D-34 | DarkLight Networks for Action Recognition in the Dark | Code |
| CVPRW 2021 | MRAN | Delta Sampling R-BERT for Limited Data and Low-Light Action Recognition | Code |
| IEEE TAI 2022 | R(2+1)D-GCN+BERT | Action Recognition in Dark Videos using Spatio-temporal Features and Bidirectional Encoder Representations from Transformers | Code |
| AAAI 2023 | SCI + R(2+1)D-GCN | Two-Streams: Dark and Light Networks with Graph Convolution for Action Recognition from Dark Videos | Code |
| IEEE TIP 2023 | DTCM | DTCM: Joint Optimization of Dark Enhancement and Action Recognition in Videos | Code |
| ECCV 2024 | WiiD | Watching it in Dark: A Target-aware Representation Learning Framework for High-Level Vision Tasks in Low Illumination | Code |
| ICASSP 2025 | MFDL | Advancing Dark Action Recognition via Modality Fusion and Dark-to-Light Diffusion Model | Code |
| Venue | Dataset | Paper | Link |
|-------|---------|-------|------|
| DL-HAR 2020 | ARID V1.0 & ARID V1.5 | ARID: A New Dataset for Recognizing Action in the Dark | Link1; Link2 |
| ACCV 2024 | ELLAR | An Action Recognition Dataset for Extremely Low-Light Conditions with Dual Gamma Adaptive Modulation | Link |
| IEEE TIP 2023 | Dark48 | Dark-48: A Dark Video Dataset for Action Recognition in the Dark | Link |
Table 1: Top-1 and Top-5 accuracy results on ARID V1.0 for several competitive methods and our proposed approach
Table 2: Top-1 and Top-5 accuracy results on ARID V1.5 for several competitive methods and our proposed approach
Table 3: Top-1 and Top-5 accuracy results on Dark48 for several competitive methods and our proposed approach
Effect of unlabeled SSL pretraining source on downstream Top-1 accuracy for ARID V1.0, ARID V1.5, and Dark48. Each group compares SSL on ARID-only, Dark48-only, and ARID+Dark48 (combined). Red dashed lines denote the KD-only (no SSL) baseline for that dataset. Numbers above bars show absolute accuracy, with the improvement over KD-only shown at the bottom (near the x-axis) of the chart. Combined pretraining is best across all datasets, and in-domain SSL consistently outperforms cross-domain SSL.
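For context, the KD-only baseline in this figure distills the teacher into the dark-frame student without SSL pretraining. A minimal sketch of a standard distillation objective (cross-entropy plus temperature-scaled KL to the teacher, after Hinton et al., 2015) is given below; the weighting `alpha` and temperature `T` are illustrative assumptions, not the paper's reported settings.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
            labels: torch.Tensor, T: float = 4.0, alpha: float = 0.5) -> torch.Tensor:
    """Hard-label cross-entropy blended with soft-label distillation."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),   # student's softened log-probs
        F.softmax(teacher_logits / T, dim=1),       # teacher's softened targets
        reduction="batchmean",
    ) * (T * T)                                     # rescale gradients (Hinton et al.)
    return alpha * ce + (1.0 - alpha) * kl
```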
For more details and experimental results, please check out the paper!
If you find our work (i.e. the code, the theory/concept, or the dataset) useful for your research or development activities, please consider citing our work as follows:
@article{dass2025md,
title={Seeing in the Dark: A Teacher-Student Framework for Dark Video Action Recognition via Knowledge Distillation and Contrastive Learning},
author={Dass, Sharana Dharshikgan Suresh and Barua, Hrishav Bakul and Krishnasamy, Ganesh and Paramesran, Raveendran and Phan, Raphael C-W},
journal={arXiv preprint arXiv:2502.03724},
year={2025}
}
----------------------------------------------------------------------------------------
Copyright 2024 | All the authors and contributors of this repository as mentioned above.
----------------------------------------------------------------------------------------
Please check the License Agreement.