SimROD: A Simple Baseline for Raw Object Detection with Global and Local Enhancements

1. Wuhan University   2. Intellindust AI Lab
*Corresponding Author
arXiv 2025

Overview of our proposed SimROD. SimROD takes a packed RAW image as input and first learns a global gamma transformation through the Global Gamma Enhancement (GGE) module. The transformed data is then processed by Green-Guided Local Enhancement (GGLE) to enhance local details. SimROD outperforms the state-of-the-art method with just 1% of its parameters.

Abstract

Most visual models are designed for sRGB images, yet RAW data offers significant advantages for object detection by preserving sensor information before ISP processing. This enables improved detection accuracy and more efficient hardware designs by bypassing the ISP. However, RAW object detection is challenging due to limited training data, imbalanced pixel distributions, and sensor noise. To address this, we propose SimROD, a lightweight and effective approach for RAW object detection. We introduce a Global Gamma Enhancement (GGE) module, which applies a learnable global gamma transformation with only four parameters, improving feature representation while keeping the model efficient. Additionally, we leverage the green channel's richer signal to enhance local details, aligning with the human eye's sensitivity and the Bayer filter design. Extensive experiments on multiple RAW object detection datasets and detectors demonstrate that SimROD outperforms state-of-the-art methods such as RAW-Adapter and DIAP while maintaining efficiency. Our work highlights the potential of RAW data for real-world object detection.
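To make the GGE idea concrete, here is a minimal NumPy sketch, not the authors' implementation: it assumes the four learnable parameters are per-plane gamma exponents applied to the four packed RGGB Bayer planes, with the exponents parameterized through `exp` to keep them positive. All function and variable names are hypothetical.

```python
import numpy as np

def pack_bayer(raw):
    """Pack an H x W RGGB Bayer mosaic into four half-resolution planes."""
    return np.stack([raw[0::2, 0::2],   # R
                     raw[0::2, 1::2],   # G1
                     raw[1::2, 0::2],   # G2
                     raw[1::2, 1::2]],  # B
                    axis=-1)            # shape (H/2, W/2, 4)

def global_gamma_enhance(packed, log_gamma):
    """Hypothetical GGE-style transform: y = x ** exp(log_gamma).

    log_gamma holds four scalars (one per packed plane); exponentiating
    keeps each gamma positive so the mapping stays monotonic.
    """
    gamma = np.exp(log_gamma)                    # shape (4,)
    return np.clip(packed, 1e-6, 1.0) ** gamma   # broadcast over last axis

# Toy normalized mosaic; gamma < 1 brightens dark RAW pixels.
raw = np.random.rand(8, 8).astype(np.float32)
packed = pack_bayer(raw)
out = global_gamma_enhance(packed, np.array([-0.7, -0.7, -0.7, -0.7]))
```

In a real model the four `log_gamma` values would be trainable parameters optimized jointly with the detector; the sketch only illustrates the forward transform.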

Motivation

Key insights behind SimROD. The green channel in RAW data carries more detailed information. The percentages indicate the proportion of pixels whose highest intensity falls in each of the RGB channels; higher values indicate richer detail and lower noise under challenging lighting conditions.


Left: We evaluate RAW object detection on the LOD dataset using individual color channels—green (G), red (R), and blue (B)—with the state-of-the-art DIAP method. The results highlight the superior performance of the G channel. Right: G has a significantly higher SNR than R and B, suggesting it may be more resistant to noise in extreme lighting conditions, potentially improving robustness.
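The SNR comparison can be illustrated with a toy NumPy example. This is only a sketch under stated assumptions: SNR is estimated as mean signal over noise standard deviation (in dB), and the green planes are given a higher mean level to mimic the Bayer CFA sampling green at twice the rate of red and blue. The numbers are synthetic, not dataset measurements.

```python
import numpy as np

def channel_snr_db(plane):
    """Simple SNR estimate in dB: mean signal over noise std."""
    return 20.0 * np.log10(plane.mean() / (plane.std() + 1e-12))

# Synthetic RGGB mosaic: green collects more light per output pixel here,
# so its planes should come out with the highest SNR under equal noise.
rng = np.random.default_rng(0)
h, w = 64, 64
mosaic = np.empty((h, w))
mosaic[0::2, 0::2] = 0.2 + rng.normal(0, 0.05, (h // 2, w // 2))  # R
mosaic[0::2, 1::2] = 0.5 + rng.normal(0, 0.05, (h // 2, w // 2))  # G1
mosaic[1::2, 0::2] = 0.5 + rng.normal(0, 0.05, (h // 2, w // 2))  # G2
mosaic[1::2, 1::2] = 0.2 + rng.normal(0, 0.05, (h // 2, w // 2))  # B

g = np.concatenate([mosaic[0::2, 1::2].ravel(), mosaic[1::2, 0::2].ravel()])
print("R SNR:", channel_snr_db(mosaic[0::2, 0::2]))
print("G SNR:", channel_snr_db(g))
print("B SNR:", channel_snr_db(mosaic[1::2, 1::2]))
```

By construction the green planes win here; on real RAW captures the same comparison would be run on actual Bayer data.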

Experimental Results

Results with YoloX-Tiny, following DIAP's benchmark. Performance metrics (AP and AP50) for YoloX-Tiny across different methods. The best performance for each dataset is highlighted in bold. The table also includes the number of parameters (in millions). † indicates results reproduced using the official code. N/A means the model did not converge.


Results with RetinaNet-R50, following RAW-Adapter's benchmark. Performance metrics (AP and AP50) for RetinaNet-R50 across different methods. The best performance for each dataset is highlighted in bold. The table also includes the number of parameters (in millions). † indicates results reproduced using the official code. N/A means the model did not converge.

BibTeX

@misc{xie2025simr,
  title={SimROD: A Simple Baseline for Raw Object Detection with Global and Local Enhancements},
  author={Haiyang Xie and Xi Shen and Shihua Huang and Qirui Wang and Zheng Wang},
  archivePrefix={arXiv},
  year={2025},
}