GLASS is an object-centric representation learning method that supports multiple downstream tasks: object discovery, compositional generation, conditional generation, and property prediction. We show results for all of these tasks.
GLASS outperforms existing object-centric learning (OCL) methods on the task of object discovery.
GLASS obtains cleaner boundaries and better object-level segmentation compared to existing OCL methods.
GLASS is the first object-centric model to enable compositional generation (addition and removal of objects) for realistic scenes.
Object Removal: GLASS is able to remove the highlighted object (red) from the scene while preserving the rest of the scene.
Object Addition: GLASS is able to add the highlighted object (red) to a new scene while preserving the rest of the scene.
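The object removal and addition described above can be pictured as edits to a set of per-object slot vectors before decoding. This page does not say how GLASS implements these operations, so the following is only a minimal sketch of that general idea for slot-based models. The names `remove_slot` and `add_slot`, the two-dimensional toy slots, and the scenes are all hypothetical, and the decoding step that would turn the edited slots back into an image is omitted.

```python
from typing import List, Sequence

# Assumption: one slot is one object-level latent vector (toy 2-D values here).
Slot = List[float]


def remove_slot(slots: Sequence[Slot], index: int) -> List[Slot]:
    """Return a copy of the slot set without the slot at `index`."""
    if not 0 <= index < len(slots):
        raise IndexError(f"slot index {index} out of range")
    return [list(s) for i, s in enumerate(slots) if i != index]


def add_slot(slots: Sequence[Slot], new_slot: Slot) -> List[Slot]:
    """Return a copy of the slot set with `new_slot` appended."""
    if slots and len(new_slot) != len(slots[0]):
        raise ValueError("slot dimensionality mismatch")
    return [list(s) for s in slots] + [list(new_slot)]


# Hypothetical slot sets for two scenes; real slots would come from an encoder.
scene_a = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
scene_b = [[0.7, 0.8], [0.9, 1.0]]

# Object removal: drop the highlighted object's slot (index 1) from scene A.
removed = remove_slot(scene_a, 1)

# Object addition: place that same object's slot into scene B.
added = add_slot(scene_b, scene_a[1])
```

Both helpers return new lists and leave their inputs unchanged, so the original scene stays available for comparison with the edited one.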
GLASS outperforms StableLSD on the task of conditional generation, producing much higher-quality images.
GLASS outperforms StableLSD on the task of object-level property prediction. Note: StableLSD is the closest model in terms of downstream task capabilities.
@inproceedings{singh2025glass,
author = {Krishnakant Singh and Simone Schaub-Meyer and Stefan Roth},
title = {GLASS: Guided Latent Slot Diffusion for Object-Centric Learning},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2025},
}