# CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning

This work introduces CGIL, a simple baseline designed to address the challenge of learning from a stream of data while preserving the zero-shot capabilities of vision-language models. Our approach leverages feature-level generative replay: instead of storing raw images, it models the distribution of past classes in the frozen encoder's feature space and samples from it to learn class-specific prompts, enabling adaptation of the textual encoder without compromising zero-shot performance.
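The core idea can be sketched in a few lines. This is a minimal, heavily simplified illustration, not the CGIL implementation: it assumes toy random vectors in place of real CLIP image features, a diagonal Gaussian per class as the generative model, and a single learnable vector per class standing in for a class-specific prompt trained on replayed features.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_classes, n_per_class = 32, 3, 100

# Toy stand-ins for frozen-encoder image features (assumption: the real
# method would use actual CLIP encoder outputs here).
means = rng.normal(size=(n_classes, dim))
feats = np.concatenate([m + 0.1 * rng.normal(size=(n_per_class, dim))
                        for m in means])
labels = np.repeat(np.arange(n_classes), n_per_class)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

# 1) Fit one diagonal Gaussian per class in feature space
#    (the generative model that replaces a raw-data memory buffer).
stats = {c: (feats[labels == c].mean(0), feats[labels == c].std(0) + 1e-6)
         for c in range(n_classes)}

# 2) Generative replay: sample synthetic features for past classes.
def sample_replay(c, n):
    mu, sigma = stats[c]
    return rng.normal(mu, sigma, size=(n, dim))

replay = np.concatenate([sample_replay(c, 50) for c in range(n_classes)])
replay_y = np.repeat(np.arange(n_classes), 50)

# 3) Learn one "prompt" vector per class on the replayed features
#    (simplified to softmax cross-entropy over dot-product logits).
W = 0.01 * rng.normal(size=(n_classes, dim))
onehot = np.eye(n_classes)[replay_y]
for _ in range(200):
    logits = replay @ W.T
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)
    W -= 0.1 * (p - onehot).T @ replay / len(replay)  # gradient step

acc = (np.argmax(feats @ W.T, 1) == labels).mean()
```

Because only feature statistics are stored, memory cost is independent of the number of images seen, and the frozen backbone (and hence zero-shot behavior) is never modified.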