Multi-modal - Strands Agents for Image Generation and Evaluation
This example demonstrates how to create a multi-agent system for generating and evaluating images. It shows how Strands agents can work with multimodal content through a workflow between specialized agents.
Overview
| Feature | Description |
|---|---|
| Tools Used | generate_image, image_reader |
| Complexity | Intermediate |
| Agent Type | Multi-Agent System (2 Agents) |
| Interaction | Command Line Interface |
| Key Focus | Multimodal Content Processing |
Tool Overview
The multimodal example uses two tools to work with image content.
- The `generate_image` tool creates images from text prompts, allowing the agent to generate visual content from textual descriptions.
- The `image_reader` tool analyzes and interprets image content, enabling the agent to "see" and describe what's in the images.
Together, these tools create a complete pipeline for both generating and evaluating visual content through natural language interactions.
Code Structure and Implementation
Agent Initialization
The example creates two specialized agents, each with a specific role in the image generation and evaluation process.

```python
from strands import Agent, tool
from strands_tools import generate_image, image_reader

# Artist agent that generates images based on prompts
artist = Agent(
    tools=[generate_image],
    system_prompt=(
        # Trailing spaces keep the concatenated sentences from running together.
        "You will be instructed to generate a number of images of a given subject. "
        "Vary the prompt for each generated image to create a variety of options. "
        "Your final output must contain ONLY a comma-separated list of the "
        "filesystem paths of generated images."
    ),
)

# Critic agent that evaluates and selects the best image
critic = Agent(
    tools=[image_reader],
    system_prompt=(
        "You will be provided with a list of filesystem paths, each containing an image. "
        "Describe each image, and then choose which one is best. "
        "Your final line of output must be as follows:\n"
        "FINAL DECISION: <path to final decision image>"
    ),
)
```

Using the Multimodal Agents
The example demonstrates a simple workflow in which the agents collaborate to generate and select images:

```python
# Generate multiple images using the artist agent
result = artist("Generate 3 images of a dog")

# Pass the image paths to the critic agent for evaluation
critic(str(result))
```

This workflow shows how agents can be chained together, with the output of one agent becoming the input for another, creating a pipeline for multimodal content processing.
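
Because the hand-off between the two agents is plain text, it can help to validate that text before passing it on. The following is a minimal sketch, not part of the original example: `parse_image_paths` and `run_pipeline` are hypothetical helper names, and the sketch assumes only what the example above shows, namely that an agent is callable with a string and that `str(result)` yields its text output.

```python
from typing import Callable, List


def parse_image_paths(artist_output: str) -> List[str]:
    """Split the artist's comma-separated reply into clean path strings."""
    # Use the last non-empty line, in case the model adds text before the list.
    lines = [line.strip() for line in artist_output.splitlines() if line.strip()]
    if not lines:
        return []
    return [part.strip() for part in lines[-1].split(",") if part.strip()]


def run_pipeline(
    artist: Callable[[str], object],
    critic: Callable[[str], object],
    request: str,
) -> str:
    """Chain two callables: the artist's output becomes the critic's input."""
    paths = parse_image_paths(str(artist(request)))
    if not paths:
        # Fail early rather than asking the critic to evaluate nothing.
        raise ValueError("artist returned no image paths")
    return str(critic(", ".join(paths)))
```

With the agents defined earlier, a call might look like `run_pipeline(artist, critic, "Generate 3 images of a dog")`. Accepting plain callables rather than `Agent` instances keeps the helper easy to exercise with stand-in functions.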
Key Features and Capabilities
1. Image Generation
The artist agent can generate multiple variations of images based on a text prompt:
- Basic Generation: `Generate 3 images of a dog`
- Styled Generation: `Generate 2 images of a mountain landscape in watercolor style`
- Conceptual Generation: `Generate 4 images representing the concept of freedom`
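
The basic and styled requests above follow a regular pattern, so they can be assembled programmatically. This is only an illustrative sketch: `build_request` is a hypothetical helper, not something the example or the Strands SDK provides, and conceptual requests with freer wording would still be written by hand.

```python
from typing import Optional


def build_request(count: int, subject: str, style: Optional[str] = None) -> str:
    """Compose a request string for the artist agent."""
    if count < 1:
        raise ValueError("count must be at least 1")
    noun = "image" if count == 1 else "images"
    request = f"Generate {count} {noun} of {subject}"
    if style:
        # Append the optional style clause, as in the styled example above.
        request += f" in {style} style"
    return request
```

For instance, `build_request(2, "a mountain landscape", "watercolor")` reproduces the styled request shown in the list.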
2. Image Evaluation
The critic agent analyzes each image, describes and evaluates it, and ends its output with a final choice.
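
Since the critic's system prompt asks for a final line of the form `FINAL DECISION: <path>`, downstream code can recover the chosen path with simple string handling. The sketch below is an assumption about how one might do that; `extract_final_decision` is a hypothetical helper, and it returns `None` when the model does not follow the requested format.

```python
from typing import Optional

FINAL_PREFIX = "FINAL DECISION:"


def extract_final_decision(critic_output: str) -> Optional[str]:
    """Return the path after 'FINAL DECISION:', or None if it is absent."""
    # Scan from the bottom, since the prompt asks for the decision on the final line.
    for line in reversed(critic_output.splitlines()):
        stripped = line.strip()
        if stripped.startswith(FINAL_PREFIX):
            path = stripped[len(FINAL_PREFIX):].strip()
            return path or None
    return None
```

A caller would typically pass `str(critic(...))` to this function and handle the `None` case explicitly, for example by re-running the critic.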
Example Command and Results
Command:

```
Generate 3 images of a whale
```

Response:

```
I'll generate 3 different images of whales for you with varied prompts.
Tool #1: generate_image

Tool #2: generate_image

Tool #3: generate_image
output/a_majestic_blue_whale_swimming.png, output/humpback_whale_breaching_from_the.png, output/orca_whale_pod_swimming_together.png

I'll describe each of these whale images and determine which is best. Let me examine them one by one.
Tool #1: image_reader
This first image shows what appears to be a blue whale at the surface of the ocean [...]
Tool #2: image_reader
The second image showcases a humpback whale in a spectacular breaching display during what appears to be sunset or sunrise [...]
Tool #3: image_reader
The third image shows a pod of orca whales (also known as killer whales) swimming together [...]

Comparison and final decision:
All three images are remarkable whale photographs, each with unique strengths:
1. The blue whale image offers an intimate, close-up perspective with beautiful atmospheric lighting and a serene mood.
2. The humpback whale breaching image captures a dynamic, powerful moment with spectacular sunset colors creating an extraordinary visual impact.
3. The orca pod image showcases multiple whales in their social grouping against a dramatic arctic backdrop, emphasizing their habitat and community.
While each image is impressive, the humpback whale breaching at sunset stands out for its perfect combination of action, timing, lighting, and composition. The contrast between the dark whale and the golden sky, the dynamic motion captured at precisely the right moment, and the breathtaking sunset setting make this image particularly remarkable.
FINAL DECISION: output/humpback_whale_breaching_from_the.png
```

During its execution, the artist agent used the following prompts (which can be seen in traces or logs) to generate each image:
"A majestic blue whale swimming in deep ocean waters, sunlight filtering through the surface, photorealistic"

"Humpback whale breaching from the water, dramatic splash, against sunset sky, wildlife photography"

"Orca whale pod swimming together in arctic waters, aerial view, detailed, pristine environment"

The critic agent selected the humpback whale as the best image.

Extending the Example
Here are some ways you could extend this example:
- Workflows: This example features a very simple workflow; you could use Strands Workflow capabilities for more elaborate media production pipelines.
- Image Editing: Extend the `generate_image` tool to accept and modify input images.
- User Feedback Loop: Allow users to provide feedback on the selection to improve future generations.
- Integration with Other Media: Extend the system to work with other media types, such as video with Amazon Nova models.