We introduce OpenSDID, a large-scale dataset curated for the OpenSDI challenge. Its design addresses three requirements essential to open-world spotting of AI-generated content: user diversity, model innovation, and manipulation scope.
OpenSDID comprises 300,000 images, split evenly between real and fake samples and divided into training and testing sets. See the CVPR'25 paper [14] for further dataset details.
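For concreteness, below is a minimal PyTorch sketch of loading OpenSDID-style (image, mask, label) triplets. The directory layout and CSV column names are assumptions made for illustration, not the dataset's actual release format; adapt them to the released files.

```python
# Minimal sketch of an OpenSDID-style dataset loader.
# Assumed CSV columns: image_path, mask_path (empty for real images), label.
import csv
from pathlib import Path

import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class OpenSDIDLikeDataset(Dataset):
    def __init__(self, root, split_csv, image_size=512):
        self.root = Path(root)
        with open(split_csv) as f:
            self.records = list(csv.DictReader(f))
        self.to_tensor = transforms.Compose([
            transforms.Resize((image_size, image_size)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        image = self.to_tensor(Image.open(self.root / rec["image_path"]).convert("RGB"))
        label = torch.tensor(int(rec["label"]))  # 0 = real, 1 = fake
        if rec["mask_path"]:
            # Fake image: load its tampering mask.
            mask = self.to_tensor(Image.open(self.root / rec["mask_path"]).convert("L"))
        else:
            # Real image: all-zero mask (nothing manipulated).
            mask = torch.zeros(1, image.shape[1], image.shape[2])
        return image, mask, label
```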
The OpenSDID pipeline for local modification of real image content: (A) sampling real images from the Megalith-10M dataset, (B) generating textual editing instructions with Vision Language Models (VLMs), (C) creating visual masks for modification with segmentation models, and (D) producing AI-generated images with image generators conditioned on the instructions and masks. For global image content generation, OpenSDID uses only (B) and (D), with no real images or masks involved.
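As a concrete illustration of step (D) in isolation, the sketch below applies a diffusion inpainting model via Hugging Face diffusers, given a real image, an edit instruction (step B), and a mask (step C). The checkpoint name is only an example; OpenSDID uses several generators (SD1.5, SD2.1, SDXL, SD3, Flux.1), and in the actual pipeline the prompt comes from a VLM rather than being hand-written.

```python
# Sketch of step (D): regenerate the masked region of a real image
# according to a textual edit instruction.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # example SD1.5-family inpainting checkpoint
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("real.png").convert("RGB").resize((512, 512))
mask_image = Image.open("mask.png").convert("L").resize((512, 512))  # white = region to regenerate

# In the real pipeline the prompt is produced by a VLM (step B).
edited = pipe(
    prompt="replace the dog with a red fox",
    image=init_image,
    mask_image=mask_image,
).images[0]
edited.save("fake_local_edit.png")
```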
Sample images from OpenSDID.
Localization results on OpenSDID (pixel-level IoU and F1; higher is better):

Method | SD1.5 IoU | SD1.5 F1 | SD2.1 IoU | SD2.1 F1 | SDXL IoU | SDXL F1 | SD3 IoU | SD3 F1 | Flux.1 IoU | Flux.1 F1 | AVG IoU | AVG F1
---|---|---|---|---|---|---|---|---|---|---|---|---
MVSS-Net [1] | 0.5785 | 0.6533 | 0.4490 | 0.5176 | 0.1467 | 0.1851 | 0.2692 | 0.3271 | 0.0479 | 0.0636 | 0.2983 | 0.3493 |
CAT-Net [2] | 0.6636 | 0.7480 | 0.5458 | 0.6232 | 0.2550 | 0.3074 | 0.3555 | 0.4207 | 0.0497 | 0.0658 | 0.3739 | 0.4330 |
PSCC-Net [3] | 0.5470 | 0.6422 | 0.3667 | 0.4479 | 0.1973 | 0.2605 | 0.2926 | 0.3728 | 0.0816 | 0.1156 | 0.2970 | 0.3678 |
ObjectFormer [4] | 0.5119 | 0.6568 | 0.4739 | 0.4144 | 0.0741 | 0.0984 | 0.0941 | 0.1258 | 0.0529 | 0.0731 | 0.2414 | 0.2737 |
TruFor [5] | 0.6342 | 0.7100 | 0.5467 | 0.6188 | 0.2655 | 0.3185 | 0.3229 | 0.3852 | 0.0760 | 0.0970 | 0.3691 | 0.4259 |
DeCLIP [6] | 0.3718 | 0.4344 | 0.3569 | 0.4187 | 0.1459 | 0.1822 | 0.2734 | 0.3344 | 0.1121 | 0.1429 | 0.2520 | 0.3025 |
IML-ViT [7] | 0.6651 | 0.7362 | 0.4479 | 0.5063 | 0.2149 | 0.2597 | 0.2363 | 0.2835 | 0.0611 | 0.0791 | 0.3251 | 0.3730 |
MaskCLIP [14] | 0.6712 | 0.7563 | 0.5550 | 0.6289 | 0.3098 | 0.3700 | 0.4375 | 0.5121 | 0.1622 | 0.2034 | 0.4271 | 0.4941 |
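The localization numbers above are computed by scoring each predicted mask against its ground-truth mask and averaging over images. A minimal NumPy sketch follows; the binarization threshold and averaging details are assumptions and may differ from the paper's exact protocol.

```python
# Pixel-level IoU and F1 between a predicted mask and a ground-truth mask.
import numpy as np


def pixel_iou_f1(pred, gt, threshold=0.5):
    """pred: float array in [0, 1]; gt: binary array of the same shape."""
    p = pred >= threshold
    g = gt.astype(bool)
    tp = np.logical_and(p, g).sum()
    fp = np.logical_and(p, ~g).sum()
    fn = np.logical_and(~p, g).sum()
    iou = tp / (tp + fp + fn + 1e-8)
    f1 = 2 * tp / (2 * tp + fp + fn + 1e-8)
    return iou, f1
```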
Detection results on OpenSDID (image-level F1 and accuracy; higher is better):

Method | SD1.5 F1 | SD1.5 Acc | SD2.1 F1 | SD2.1 Acc | SDXL F1 | SDXL Acc | SD3 F1 | SD3 Acc | Flux.1 F1 | Flux.1 Acc | AVG F1 | AVG Acc
---|---|---|---|---|---|---|---|---|---|---|---|---
CNNDet [8] | 0.8460 | 0.8504 | 0.7156 | 0.7594 | 0.5970 | 0.6872 | 0.5627 | 0.6708 | 0.3572 | 0.5757 | 0.6157 | 0.7087 |
GramNet [9] | 0.8051 | 0.8035 | 0.7401 | 0.7666 | 0.6528 | 0.7076 | 0.6435 | 0.7029 | 0.5200 | 0.6337 | 0.6723 | 0.7229 |
FreqNet [10] | 0.7588 | 0.7770 | 0.6097 | 0.6837 | 0.5315 | 0.6402 | 0.5350 | 0.6437 | 0.3847 | 0.5708 | 0.5639 | 0.6631 |
NPR [11] | 0.7941 | 0.7928 | 0.8167 | 0.8184 | 0.7212 | 0.7428 | 0.7343 | 0.7547 | 0.6762 | 0.7136 | 0.7485 | 0.7645 |
UniFD [12] | 0.7745 | 0.7760 | 0.8062 | 0.8192 | 0.7074 | 0.7483 | 0.7109 | 0.7517 | 0.6110 | 0.6906 | 0.7220 | 0.7572 |
RINE [13] | 0.9108 | 0.9098 | 0.8747 | 0.8812 | 0.7343 | 0.7876 | 0.7205 | 0.7678 | 0.5586 | 0.6702 | 0.7598 | 0.8033 |
MVSS-Net [1] | 0.9347 | 0.9365 | 0.7927 | 0.8233 | 0.5985 | 0.7042 | 0.6280 | 0.7213 | 0.2759 | 0.5678 | 0.6460 | 0.7506 |
CAT-Net [2] | 0.9615 | 0.9615 | 0.7932 | 0.8246 | 0.6476 | 0.7334 | 0.6526 | 0.7361 | 0.2266 | 0.5526 | 0.6563 | 0.7616 |
PSCC-Net [3] | 0.9607 | 0.9614 | 0.7685 | 0.8094 | 0.5570 | 0.6881 | 0.5978 | 0.7089 | 0.5177 | 0.6704 | 0.6803 | 0.7676 |
ObjectFormer [4] | 0.7172 | 0.7522 | 0.6679 | 0.7255 | 0.4919 | 0.6292 | 0.4832 | 0.6254 | 0.3792 | 0.5805 | 0.5479 | 0.6626 |
TruFor [5] | 0.9012 | 0.9773 | 0.3593 | 0.5562 | 0.5804 | 0.6641 | 0.5973 | 0.6751 | 0.4912 | 0.6162 | 0.5859 | 0.6978 |
DeCLIP [6] | 0.8068 | 0.7831 | 0.8402 | 0.8277 | 0.7069 | 0.7055 | 0.6993 | 0.6840 | 0.5177 | 0.6561 | 0.7142 | 0.7313 |
IML-ViT [7] | 0.9447 | 0.7573 | 0.6970 | 0.6119 | 0.4098 | 0.4995 | 0.4469 | 0.5125 | 0.1820 | 0.4362 | 0.5361 | 0.5635 |
MaskCLIP [14] | 0.9264 | 0.9272 | 0.8871 | 0.8945 | 0.7802 | 0.8122 | 0.7307 | 0.7801 | 0.5649 | 0.6850 | 0.7779 | 0.8198 |
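The detection scores above are image-level: each image receives a real/fake prediction, and F1 and accuracy are computed over the test set. A short scikit-learn sketch follows; the 0.5 decision threshold is an assumption.

```python
# Image-level detection F1 and accuracy from per-image fake probabilities.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score


def detection_scores(fake_probs, labels, threshold=0.5):
    """fake_probs: per-image probability of being fake; labels: 1 = fake, 0 = real."""
    preds = (np.asarray(fake_probs) >= threshold).astype(int)
    return f1_score(labels, preds), accuracy_score(labels, preds)
```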
[1] Xinru Chen, Chengbo Dong, Jiaqi Ji, Juan Cao, and Xirong Li. Image manipulation detection by multi-view multi-scale supervision. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[2] Myung-Joon Kwon, Seung-Hun Nam, In-Jae Yu, Heung-Kyu Lee, and Changick Kim. Learning JPEG compression artifacts for image manipulation detection and localization. International Journal of Computer Vision (IJCV), 2022.
[3] Xiaohong Liu, Yaojie Liu, Jun Chen, and Xiaoming Liu. PSCC-Net: Progressive spatio-channel correlation network for image manipulation detection and localization. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2022.
[4] Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Ser-Nam Lim, and Yu-Gang Jiang. ObjectFormer for image manipulation detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[5] Fabrizio Guillaro, Davide Cozzolino, Avneesh Sud, Nicholas Dufour, and Luisa Verdoliva. TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[6] Stefan Smeu, Elisabeta Oneata, and Dan Oneata. DeCLIP: Decoding CLIP representations for deepfake localization. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025.
[7] Xiaochen Ma, Bo Du, Xianggen Liu, Ahmed Y. Al Hammadi, and Jizhe Zhou. IML-ViT: Image manipulation localization by vision transformer. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024.
[8] Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A. Efros. CNN-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[9] Zhengzhe Liu, Xiaojuan Qi, and Philip H. S. Torr. Global texture enhancement for fake face detection in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[10] Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Frequency-aware deepfake detection: Improving generalizability through frequency space learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024.
[11] Chuangchuang Tan, Huan Liu, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Rethinking the up-sampling operations in CNN-based generative network for generalizable deepfake detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[12] Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards universal fake image detectors that generalize across generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[13] Christos Koutlis and Symeon Papadopoulos. Leveraging representations from intermediate encoder-blocks for synthetic image detection. In European Conference on Computer Vision (ECCV), 2024.
[14] Yabin Wang, Zhiwu Huang, and Xiaopeng Hong. OpenSDI: Spotting diffusion-generated images in the open world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
The following are qualitative examples of MaskCLIP and other methods on the OpenSDID dataset, showcasing detection and localization performance on images generated by different diffusion models.
SD1.5 Result Example
SD2.1 Result Example
SDXL Result Example
SD3 Result Example
Flux.1 Result Example
The OpenSDID dataset and the code for MaskCLIP are open-sourced on GitHub:
If you use the OpenSDID dataset or MaskCLIP model in your research, please cite our paper:
@InProceedings{wang2025opensdi,
    author    = {Wang, Yabin and Huang, Zhiwu and Hong, Xiaopeng},
    title     = {OpenSDI: Spotting Diffusion-Generated Images in the Open World},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2025}
}