We will release the code soon!!

Seeing in the Dark: A Teacher–Student Framework for Dark Video Action Recognition via Knowledge Distillation and Contrastive Learning

[Figure: ActLumos teaser]

This page contains all the datasets and code bases (experiments and evaluations) used to develop and evaluate our newly proposed ActLumos framework for video action recognition in the dark.

The official repository of the paper (with supplementary material): ActLumos

About the project

This project was carried out at Monash University, Malaysia campus.

Project Members:
Sharana Dharshikgan Suresh Dass (Monash University, Malaysia)
Hrishav Bakul Barua (Monash University, Australia and TCS Research, Kolkata, India)
Ganesh Krishnasamy (Monash University, Malaysia)
Raveendran Paramesran (University of Malaya, Malaysia)
Raphaël C.-W. Phan (Monash University, Malaysia)

Funding details

This work is supported by the Global Research Excellence Scholarship, Monash University, Malaysia, and in part by the Global Excellence and Mobility Scholarship (GEMS), Monash University (Malaysia & Melbourne, Australia).

Overview

Action recognition in dark or low-light videos is challenging due to severe visibility degradation that obscures critical spatiotemporal cues. This paper presents ActLumos, a teacher–student framework that achieves single-stream inference efficiency with multi-stream-level accuracy. The teacher network processes dual inputs (original dark frames and Retinex-enhanced frames) through weight-shared R(2+1)D-34 backbones and dynamically fuses them using a Dynamic Feature Fusion (DFF) module, guided by a supervised contrastive loss (SupCon) to enhance class separability. The student network, using only dark frames, is pretrained with self-supervision on unlabeled clips and fine-tuned via knowledge distillation from the teacher, inheriting its multi-stream knowledge. ActLumos achieves state-of-the-art results with 96.92% (Top-1) on ARID V1.0, 88.27% on ARID V1.5, and 48.96% on Dark48. Ablation studies confirm the effectiveness of each component, demonstrating superior dark-video recognition without additional inference cost.
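The code has not been released yet, but as a rough illustration, the distillation step described above can be sketched as a standard soft-target knowledge-distillation loss in PyTorch. The function name, temperature, and loss weighting below are our own assumptions, not the official ActLumos settings.

```python
# Minimal sketch of the teacher-student distillation step described above.
# The names, temperature, and loss weights are illustrative assumptions;
# the official ActLumos code has not been released yet.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    """Soft-target KD (Hinton et al.) plus hard-label cross-entropy."""
    # Softened teacher and student distributions.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    # The KL term is scaled by T^2 to keep gradient magnitudes comparable.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Usage sketch: the teacher sees dark + Retinex-enhanced frames,
# while the student sees dark frames only.
# with torch.no_grad():
#     teacher_logits = teacher(dark_clip, retinex_clip)
# loss = distillation_loss(student(dark_clip), teacher_logits, labels)
```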

Overall Architecture

[Figure: overall architecture of ActLumos]

Contrastive Learning

Self-supervised vs. supervised contrastive learning. Left (self-supervised): the anchor clip (class Pick) has only its own augmented view as a positive (pink edge); all other clips in the batch are treated as negatives (green). Right (supervised): with labels, every clip from the same class Pick, including dark and Retinex views of different instances, is a positive (pink), while clips from other classes are negatives (green).

[Figure: self-supervised vs. supervised contrastive learning]
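For readers unfamiliar with SupCon, here is a minimal, generic PyTorch sketch of the supervised contrastive loss (Khosla et al., 2020) that the figure illustrates. It is not the released ActLumos code, and it assumes L2-normalized clip embeddings.

```python
# Generic supervised contrastive (SupCon) loss, following Khosla et al.
# (2020). Assumes L2-normalized embeddings; not the official ActLumos code.
import torch

def supcon_loss(features, labels, temperature=0.07):
    """features: (N, D) L2-normalized embeddings; labels: (N,) class ids."""
    device = features.device
    n = features.size(0)
    logits = features @ features.T / temperature
    # Numerical stability: subtract the per-row maximum.
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()
    not_self = ~torch.eye(n, dtype=torch.bool, device=device)
    # Denominator sums over all other samples in the batch.
    exp_logits = logits.exp() * not_self
    log_prob = logits - exp_logits.sum(1, keepdim=True).log()
    # Positives: same label, different instance (e.g. dark/Retinex views).
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self
    mean_log_prob_pos = (log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()
```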

The Dynamic Feature Fusion (DFF) module proposed in our architecture. A rough sketch of a fusion block in this spirit follows the figure below.

[Figure: the Dynamic Feature Fusion (DFF) module]
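To make the idea concrete, below is an assumption-laden PyTorch sketch of an input-conditioned (gated) fusion block. The actual DFF design is specified in the paper; the layer sizes and sigmoid gating here are illustrative only.

```python
# Illustrative sketch of a dynamic (input-conditioned) fusion block that
# gates between dark-stream and Retinex-stream features. The gating design
# and layer sizes are assumptions, not the paper's exact DFF architecture.
import torch
import torch.nn as nn

class DynamicFeatureFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # Predict a per-channel gate from the concatenated streams.
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.Sigmoid(),
        )

    def forward(self, feat_dark, feat_retinex):
        g = self.gate(torch.cat([feat_dark, feat_retinex], dim=-1))
        # Convex, input-dependent combination of the two streams.
        return g * feat_dark + (1.0 - g) * feat_retinex

# Usage sketch: fused = DynamicFeatureFusion(512)(f_dark, f_retinex)
```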

Dataset samples

We use gamma-corrected and Retinex-enhanced images alongside raw dark frames as input to our pipeline. The figure shows examples of dark frames (top), their Retinex-enhanced counterparts (middle), and gamma-corrected frames (bottom), across actions (pour, pick, walk, stand, drink). The dark frames are from the ARID dataset.

[Figure: dark (top), Retinex-enhanced (middle), and gamma-corrected (bottom) frames across actions]
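As a minimal sketch of these two enhancement inputs, the snippet below implements per-frame gamma correction and single-scale Retinex with OpenCV and NumPy. The gamma value and Gaussian sigma are placeholder choices, not the paper's exact preprocessing settings.

```python
# Sketch of the two enhancement inputs mentioned above: gamma correction
# and single-scale Retinex. Parameter values (gamma, the Gaussian sigma)
# are illustrative, not the paper's exact settings.
import cv2
import numpy as np

def gamma_correct(frame_bgr, gamma=2.2):
    """Brighten a dark frame: out = in^(1/gamma) on [0, 1] intensities."""
    x = frame_bgr.astype(np.float32) / 255.0
    return (np.power(x, 1.0 / gamma) * 255.0).astype(np.uint8)

def single_scale_retinex(frame_bgr, sigma=80.0):
    """log(image) - log(Gaussian-blurred illumination estimate)."""
    x = frame_bgr.astype(np.float32) + 1.0          # avoid log(0)
    illumination = cv2.GaussianBlur(x, (0, 0), sigma)
    r = np.log(x) - np.log(illumination)
    # Rescale the reflectance to a displayable 8-bit range.
    r = (r - r.min()) / (r.max() - r.min() + 1e-8) * 255.0
    return r.astype(np.uint8)
```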

Our work utilizes the following:

Basic Neural Network and DL models

ICLR 2015 | VGG-TS - Very Deep Convolutional Networks for Large-Scale Image Recognition | Code

ECCV 2016 | TSN - Temporal Segment Networks: Towards Good Practices for Deep Action Recognition | Code

CVPR 2017 | I3D-RGB, I3D Two-stream - Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | Code

ICCV 2017 | Pseudo-3D-199 - Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks | Code

CVPR 2018 | R(2+1)D - A Closer Look at Spatiotemporal Convolutions for Action Recognition | Code

CVPR 2018 | 3D-ResNet-18, 3D-ResNet-50, 3D-ResNet-101 - Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? | Code

NAACL 2019 | BERT - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Code

State-of-the-art learning models for action/activity recognition in the dark

CVPRW 2021 | DarkLight-ResNeXt-101, DarkLight-R(2+1)D-34 - DarkLight Networks for Action Recognition in the Dark | Code

CVPRW 2021 | MRAN - Delta Sampling R-BERT for Limited Data and Low-Light Action Recognition | Code

IEEE TAI 2022 | R(2+1)D-GCN+BERT - Action Recognition in Dark Videos using Spatio-temporal Features and Bidirectional Encoder Representations from Transformers | Code

AAAI 2023 | SCI + R(2+1)D-GCN - Two-Streams: Dark and Light Networks with Graph Convolution for Action Recognition from Dark Videos | Code

IEEE TIP 2023 | DTCM - DTCM: Joint Optimization of Dark Enhancement and Action Recognition in Videos | Code

ECCV 2024 | WiiD - Watching it in Dark: A Target-aware Representation Learning Framework for High-Level Vision Tasks in Low Illumination | Code

ICASSP 2025 | MFDL - Advancing Dark Action Recognition via Modality Fusion and Dark-to-Light Diffusion Model | Code

Action recognition datasets for dark videos

DL-HAR 2020 | ARID V1.0 & ARID V1.5 | ARID: A New Dataset for Recognizing Action in the Dark | Link1; Link2

ACCV 2024 | ELLAR | An Action Recognition Dataset for Extremely Low-Light Conditions with Dual Gamma Adaptive Modulation | Link

IEEE TIP 2023 | Dark48 | Dark-48: a dark video dataset for action recognition in the dark | Link

Experiments and Results

Table 1: Top-1 and Top-5 accuracy results on ARID V1.0 for several competitive methods and our proposed approach

[Table 1 image]

Table 2: Top-1 and Top-5 accuracy results on ARID V1.5 for several competitive methods and our proposed approach

[Table 2 image]

Table 3: Top-1 and Top-5 accuracy results on Dark48 for several competitive methods and our proposed approach

[Table 3 image]

Effect of the unlabeled SSL pretraining source on downstream Top-1 accuracy for ARID V1.0, ARID V1.5, and Dark48. Each group compares SSL on ARID-only, Dark48-only, and ARID+Dark48 (combined). Red dashed lines denote the KD-only (no SSL) baseline for each dataset. Numbers above the bars show absolute accuracy, with the improvement over KD-only shown at the bottom of the chart (near the x-axis). Combined pretraining is best across all datasets, and in-domain SSL consistently outperforms cross-domain SSL.

[Figure: effect of SSL pretraining source on Top-1 accuracy]

For more details and experimental results, please check out the paper!

Citation

If you find our work (i.e., the code, the theory/concept, or the dataset) useful for your research or development activities, please consider citing it as follows:

@article{dass2025md,
  title={Seeing in the Dark: A Teacher-Student Framework for Dark Video Action Recognition via Knowledge Distillation and Contrastive Learning},
  author={Dass, Sharana Dharshikgan Suresh and Barua, Hrishav Bakul and Krishnasamy, Ganesh and Paramesran, Raveendran and Phan, Raphael C-W},
  journal={arXiv preprint arXiv:2502.03724},
  year={2025}
}

License and Copyright

----------------------------------------------------------------------------------------
Copyright 2024 | All the authors and contributors of this repository as mentioned above.
----------------------------------------------------------------------------------------

Please check the License Agreement.
