OpenSDI: Spotting Diffusion-Generated Images in the Open World

OpenSDID Dataset

We introduce OpenSDID, a large-scale dataset specifically curated for the OpenSDI challenge. Our dataset design addresses the three core requirements essential for open-world spotting of AI-generated content: user diversity, model innovation, and manipulation scope.

OpenSDID comprises 300,000 images, evenly split between real and fake samples and divided into training and testing sets. Refer to the paper for further dataset details.

OpenSDID Dataset Pipeline

Overview of the pipeline used to create the OpenSDID dataset.

Leaderboard

Pixel-level Localization Performance

| Method | SD1.5 IoU | SD1.5 F1 | SD2.1 IoU | SD2.1 F1 | SDXL IoU | SDXL F1 | SD3 IoU | SD3 F1 | Flux.1 IoU | Flux.1 F1 | AVG IoU | AVG F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MVSS-Net | 0.5785 | 0.6533 | 0.4490 | 0.5176 | 0.1467 | 0.1851 | 0.2692 | 0.3271 | 0.0479 | 0.0636 | 0.2983 | 0.3493 |
| CAT-Net | 0.6636 | 0.7480 | 0.5458 | 0.6232 | 0.2550 | 0.3074 | 0.3555 | 0.4207 | 0.0497 | 0.0658 | 0.3739 | 0.4330 |
| PSCC-Net | 0.5470 | 0.6422 | 0.3667 | 0.4479 | 0.1973 | 0.2605 | 0.2926 | 0.3728 | 0.0816 | 0.1156 | 0.2970 | 0.3678 |
| ObjectFormer | 0.5119 | 0.6568 | 0.4739 | 0.4144 | 0.0741 | 0.0984 | 0.0941 | 0.1258 | 0.0529 | 0.0731 | 0.2414 | 0.2737 |
| TruFor | 0.6342 | 0.7100 | 0.5467 | 0.6188 | 0.2655 | 0.3185 | 0.3229 | 0.3852 | 0.0760 | 0.0970 | 0.3691 | 0.4259 |
| DeCLIP | 0.3718 | 0.4344 | 0.3569 | 0.4187 | 0.1459 | 0.1822 | 0.2734 | 0.3344 | 0.1121 | 0.1429 | 0.2520 | 0.3025 |
| IML-ViT | 0.6651 | 0.7362 | 0.4479 | 0.5063 | 0.2149 | 0.2597 | 0.2363 | 0.2835 | 0.0611 | 0.0791 | 0.3251 | 0.3730 |
| MaskCLIP | 0.6712 | 0.7563 | 0.5550 | 0.6289 | 0.3098 | 0.3700 | 0.4375 | 0.5121 | 0.1622 | 0.2034 | 0.4271 | 0.4941 |
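The IoU and F1 numbers above are standard binary-mask metrics for manipulation localization. A minimal sketch of how such per-image scores can be computed with NumPy (an illustration, not the paper's official evaluation code):

```python
import numpy as np

def localization_scores(pred_mask, gt_mask):
    """IoU and pixel-level F1 (Dice) between binary manipulation masks.

    pred_mask, gt_mask: arrays of the same shape, nonzero = manipulated.
    """
    pred = np.asarray(pred_mask).astype(bool)
    gt = np.asarray(gt_mask).astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Convention assumption: two empty masks count as a perfect match.
    iou = inter / union if union else 1.0
    denom = pred.sum() + gt.sum()
    f1 = 2 * inter / denom if denom else 1.0  # Dice coefficient
    return iou, f1

# Toy example: prediction covers the top two rows,
# ground truth only the top-left 2x2 block.
pred = np.zeros((4, 4)); pred[:2, :] = 1
gt = np.zeros((4, 4)); gt[:2, :2] = 1
iou, f1 = localization_scores(pred, gt)  # iou = 0.5, f1 = 2/3
```

Dataset-level scores are then typically obtained by averaging these per-image values over the test set.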

Image-level Detection Performance

| Method | SD1.5 F1 | SD1.5 Acc | SD2.1 F1 | SD2.1 Acc | SDXL F1 | SDXL Acc | SD3 F1 | SD3 Acc | Flux.1 F1 | Flux.1 Acc | AVG F1 | AVG Acc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CNNDet | 0.8460 | 0.8504 | 0.7156 | 0.7594 | 0.5970 | 0.6872 | 0.5627 | 0.6708 | 0.3572 | 0.5757 | 0.6157 | 0.7087 |
| GramNet | 0.8051 | 0.8035 | 0.7401 | 0.7666 | 0.6528 | 0.7076 | 0.6435 | 0.7029 | 0.5200 | 0.6337 | 0.6723 | 0.7229 |
| FreqNet | 0.7588 | 0.7770 | 0.6097 | 0.6837 | 0.5315 | 0.6402 | 0.5350 | 0.6437 | 0.3847 | 0.5708 | 0.5639 | 0.6631 |
| NPR | 0.7941 | 0.7928 | 0.8167 | 0.8184 | 0.7212 | 0.7428 | 0.7343 | 0.7547 | 0.6762 | 0.7136 | 0.7485 | 0.7645 |
| UniFD | 0.7745 | 0.7760 | 0.8062 | 0.8192 | 0.7074 | 0.7483 | 0.7109 | 0.7517 | 0.6110 | 0.6906 | 0.7220 | 0.7572 |
| RINE | 0.9108 | 0.9098 | 0.8747 | 0.8812 | 0.7343 | 0.7876 | 0.7205 | 0.7678 | 0.5586 | 0.6702 | 0.7598 | 0.8033 |
| MVSS-Net | 0.9347 | 0.9365 | 0.7927 | 0.8233 | 0.5985 | 0.7042 | 0.6280 | 0.7213 | 0.2759 | 0.5678 | 0.6460 | 0.7506 |
| CAT-Net | 0.9615 | 0.9615 | 0.7932 | 0.8246 | 0.6476 | 0.7334 | 0.6526 | 0.7361 | 0.2266 | 0.5526 | 0.6563 | 0.7616 |
| PSCC-Net | 0.9607 | 0.9614 | 0.7685 | 0.8094 | 0.5570 | 0.6881 | 0.5978 | 0.7089 | 0.5177 | 0.6704 | 0.6803 | 0.7676 |
| ObjectFormer | 0.7172 | 0.7522 | 0.6679 | 0.7255 | 0.4919 | 0.6292 | 0.4832 | 0.6254 | 0.3792 | 0.5805 | 0.5479 | 0.6626 |
| TruFor | 0.9012 | 0.9773 | 0.3593 | 0.5562 | 0.5804 | 0.6641 | 0.5973 | 0.6751 | 0.4912 | 0.6162 | 0.5859 | 0.6978 |
| DeCLIP | 0.8068 | 0.7831 | 0.8402 | 0.8277 | 0.7069 | 0.7055 | 0.6993 | 0.6840 | 0.5177 | 0.6561 | 0.7142 | 0.7313 |
| IML-ViT | 0.9447 | 0.7573 | 0.6970 | 0.6119 | 0.4098 | 0.4995 | 0.4469 | 0.5125 | 0.1820 | 0.4362 | 0.5361 | 0.5635 |
| MaskCLIP | 0.9264 | 0.9272 | 0.8871 | 0.8945 | 0.7802 | 0.8122 | 0.7307 | 0.7801 | 0.5649 | 0.6850 | 0.7779 | 0.8198 |
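Image-level detection is a binary classification task (real vs. fake), so the F1 and Acc columns follow the usual definitions. A small self-contained sketch, assuming label 1 means "fake" is the positive class (an illustration, not the paper's evaluation code):

```python
def detection_scores(preds, labels):
    """Binary detection F1 and accuracy; 1 = fake (positive), 0 = real."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    correct = sum(p == y for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    acc = correct / len(labels)
    return f1, acc

# Toy example: one false positive on four images.
f1, acc = detection_scores(preds=[1, 1, 0, 0], labels=[1, 0, 0, 0])
# precision = 0.5, recall = 1.0 -> f1 = 2/3, acc = 0.75
```

In practice these metrics are computed per test subset (SD1.5, SD2.1, SDXL, SD3, Flux.1) and then averaged to produce the AVG columns.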

Results Showcase

The following qualitative examples compare MaskCLIP with other methods on the OpenSDID dataset, showcasing detection and localization performance on images generated by different diffusion models.

SD1.5 Result Example

SD2 Result Example

SDXL Result Example

SD3 Result Example

Flux.1 Result Example

Download & Code

The OpenSDID dataset and the code for MaskCLIP are open-sourced on GitHub:

https://github.com/iamwangyabin/OpenSDI

Citation

If you use the OpenSDID dataset or MaskCLIP model in your research, please cite our paper:

@article{wang2024opensdi,
  title={OpenSDI: Spotting Diffusion-Generated Images in the Open World},
  author={Wang, Yabin and Huang, Zhiwu and Hong, Xiaopeng},
  journal={arXiv preprint arXiv},
  year={2024}
}