I wanted to experiment with combining image generation and vision models to see how images evolve when each description is used as the next prompt for the generator. My goal was to observe whether this feedback loop causes images to diverge in unexpected ways or if they remain focused on a few prominent features. Through this approach, I’m exploring the “memories” embedded in the images and prompts, examining how they preserve information and whether they fixate on specific details or gradually introduce new elements. This dataset, which I’m calling Synthetic-Cyclic-Perception, captures this process, revealing how model perception and creativity unfold over cycles.
Synthetic-Cyclic-Perception is a synthetic visual dataset created by iteratively generating and describing images in a cyclic fashion. Each cycle begins with a seed prompt that feeds into a diffusion model to generate an initial image. A vision model then provides a detailed description of the generated image, which becomes the prompt for the next cycle. This process is repeated across multiple cycles and batches, creating a dataset where images and their descriptions evolve progressively.
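In code terms, the loop has the following shape. This is my own framing of the process as a sketch; `run_batch` and its callable parameters are placeholders, not code lifted from the generation scripts:

```python
from typing import Any, Callable

def run_batch(seed_prompt: str, num_cycles: int,
              generate_image: Callable[[str], Any],
              describe_image: Callable[[Any], str]) -> list[dict]:
    """One batch of the feedback loop: each description becomes the next prompt."""
    records, prompt = [], seed_prompt
    for cycle in range(num_cycles):
        image = generate_image(prompt)       # diffusion model renders the prompt
        description = describe_image(image)  # vision model describes the result
        records.append({"cycle": cycle, "prompt": prompt,
                        "description": description})
        prompt = description                 # the description seeds the next cycle
    return records
```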
Dataset Composition
- Seed Prompts: The dataset generation starts with carefully selected, action-oriented prompts. Samples:
  - “A person kayaking on a calm river, surrounded by lush greenery, capturing the essence of a peaceful afternoon in nature.”
  - “A chef in a bustling kitchen, expertly preparing a gourmet dish, with vibrant colors of fresh vegetables and aromatic herbs, creating a lively and energetic atmosphere.”
  - “Two children playing in a sunny park, laughing and running around, with colorful kites flying high in the blue sky, evoking a sense of joy and freedom.”
  - “A group of friends gathered around a campfire at night, sharing stories and roasting marshmallows, illuminated by the warm glow of the flames, creating a cozy and adventurous vibe.”
  - “A whimsical cartoon-style illustration of a cat wearing a wizard hat, casting spells with a wand, surrounded by floating magical creatures, depicting a funny and enchanting scene.”
- Cycles: Each batch begins with a unique seed prompt and progresses through a fixed number of cycles, regenerating the image from the most recent description at each step.
- Metadata: Each generated image is accompanied by metadata, including the prompt, the vision model’s description, the filename, and batch details, all saved in a centralized metadata JSON file for easy reference and analysis.
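For orientation, a single record in that file might look like the sketch below; the field names are my shorthand for the fields described above, not a dump from the released file:

```python
# Illustrative metadata record; field names and values are assumed, not exact.
record = {
    "batch": 0,
    "cycle": 7,
    "prompt": "A serene river scene with a lone kayaker ...",
    "description": "The image shows a person paddling a red kayak ...",
    "filename": "batch_00_cycle_07.png",
}
```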
Methodology
- Image Generation: The stabilityai/stable-diffusion-3-medium model generates an image from the current prompt.
- Vision Description: The meta-llama/Llama-3.2-11B-Vision-Instruct model describes the generated image, offering a nuanced textual interpretation.
- Cyclic Prompt Update: The description is parsed and used as the prompt for the next cycle, creating an evolving image-description sequence; a sketch of the full loop follows this list.
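As a concrete reference, here is a minimal sketch of one batch. It assumes the diffusers-format SD3 checkpoint (stabilityai/stable-diffusion-3-medium-diffusers) and the standard transformers API for Llama 3.2 Vision; the sampling parameters, cycle count, and description prompt are illustrative choices, not the exact settings used to build the dataset:

```python
import torch
from diffusers import StableDiffusion3Pipeline
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Diffusion model (note: the diffusers-format repo name is an assumption here).
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
).to("cuda")

# Vision model used to describe each generated image.
vision_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
vision_model = MllamaForConditionalGeneration.from_pretrained(
    vision_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(vision_id)

def describe(image) -> str:
    """Ask the vision model for a detailed description of a PIL image."""
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in detail."},
    ]}]
    text = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(image, text, add_special_tokens=False,
                       return_tensors="pt").to(vision_model.device)
    output = vision_model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the echoed prompt.
    return processor.decode(output[0][inputs["input_ids"].shape[-1]:],
                            skip_special_tokens=True)

prompt = ("A person kayaking on a calm river, surrounded by lush greenery, "
          "capturing the essence of a peaceful afternoon in nature.")
for cycle in range(10):  # illustrative cycle count
    image = pipe(prompt, num_inference_steps=28, guidance_scale=7.0).images[0]
    image.save(f"batch_00_cycle_{cycle:02d}.png")
    # (A metadata record like the example above would be written here.)
    prompt = describe(image)  # the description becomes the next prompt
```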