Quickstart
This quickstart guide shows you how to create your first bidirectional streaming agent for real-time audio and text conversations. You’ll learn how to set up audio I/O, handle streaming events, use tools during conversations, and work with different model providers.
After completing this guide, you’ll be able to build voice assistants, interactive chatbots, and multi-modal applications, and to integrate bidirectional streaming with web servers or custom I/O channels.
Prerequisites
Before starting, ensure you have:
- Python 3.10+ installed (3.12+ required for Nova Sonic)
- Audio hardware (microphone and speakers) for voice conversations
- Model provider credentials configured (AWS, OpenAI, or Google)
Install the SDK
Bidirectional streaming is included in the Strands Agents SDK as an experimental feature. Install the SDK with bidirectional streaming support:
For All Providers
To install with support for all bidirectional streaming providers:
```bash
pip install "strands-agents[bidi-all]"
```

This installs PyAudio for audio I/O and all three supported providers (Nova Sonic, OpenAI Realtime, and Gemini Live).
For Specific Providers
You can also install support for specific providers only:
```bash
# Base bidirectional streaming support
pip install "strands-agents[bidi]"

# Add OpenAI Realtime support
pip install "strands-agents[bidi,bidi-openai]"

# Add Gemini Live support
pip install "strands-agents[bidi,bidi-gemini]"
```

Platform-Specific Audio Setup
macOS:

```bash
brew install portaudio
pip install "strands-agents[bidi-all]"
```

Linux:

```bash
sudo apt-get install portaudio19-dev python3-pyaudio
pip install "strands-agents[bidi-all]"
```

Windows: PyAudio typically installs without additional dependencies.

```bash
pip install "strands-agents[bidi-all]"
```

Configuring Credentials
Bidirectional streaming supports multiple model providers. Choose one based on your needs:
Nova Sonic is Amazon’s bidirectional streaming model. Configure AWS credentials:
```bash
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=us-east-1
```

Enable Nova Sonic model access in the Amazon Bedrock console.
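Before running the voice examples, it can help to confirm the credentials are actually visible to Python. The following stdlib-only check is a convenience sketch (the `missing_credentials` helper is our own, not part of the SDK); the variable names match the exports above:

```python
import os

# Required AWS variables, matching the exports above
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")

def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("AWS credentials configured")
```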
For OpenAI’s Realtime API, set your API key:
```bash
export OPENAI_API_KEY=your_api_key
```

For the Gemini Live API, set your API key:
```bash
export GOOGLE_API_KEY=your_api_key
```

Your First Voice Conversation
Now let’s create a simple voice-enabled agent that can have real-time conversations:
```python
import asyncio

from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel

# Create a bidirectional streaming model
model = BidiNovaSonicModel()

# Create the agent
agent = BidiAgent(
    model=model,
    system_prompt="You are a helpful voice assistant. Keep responses concise and natural.",
)

# Setup audio I/O for microphone and speakers
audio_io = BidiAudioIO()

# Run the conversation
async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()],
    )

asyncio.run(main())
```

And that’s it! We now have a voice-enabled agent that can:
- Listen to your voice through the microphone
- Process speech in real-time
- Respond with natural voice output
- Handle interruptions when you start speaking
Adding Text I/O
Combine audio with text input/output for debugging or multi-modal interactions:
```python
import asyncio

from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.io import BidiTextIO
from strands.experimental.bidi.models import BidiNovaSonicModel

model = BidiNovaSonicModel()
agent = BidiAgent(
    model=model,
    system_prompt="You are a helpful assistant.",
)

# Setup both audio and text I/O
audio_io = BidiAudioIO()
text_io = BidiTextIO()

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output(), text_io.output()],  # Both audio and text
    )

asyncio.run(main())
```

Now you’ll see transcripts printed to the console while audio plays through your speakers.
Controlling Conversation Lifecycle
The run() method runs indefinitely by default. The simplest way to stop a conversation is with Ctrl+C:
```python
import asyncio

from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel

async def main():
    model = BidiNovaSonicModel()
    agent = BidiAgent(model=model)
    audio_io = BidiAudioIO()

    try:
        # Runs indefinitely until interrupted
        await agent.run(
            inputs=[audio_io.input()],
            outputs=[audio_io.output()],
        )
    except asyncio.CancelledError:
        print("\nConversation cancelled by user")
    finally:
        # stop() should only be called after run() exits
        await agent.stop()

asyncio.run(main())
```

Adding Tools to Your Agent
Just like standard Strands agents, bidirectional agents can use tools during conversations:
```python
import asyncio

from strands import tool
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands_tools import calculator, current_time

# Define a custom tool
@tool
def get_weather(location: str) -> str:
    """Get the current weather for a location.

    Args:
        location: City name or location

    Returns:
        Weather information
    """
    # In a real application, call a weather API
    return f"The weather in {location} is sunny and 72°F"

# Create agent with tools
model = BidiNovaSonicModel()
agent = BidiAgent(
    model=model,
    tools=[calculator, current_time, get_weather],
    system_prompt="You are a helpful assistant with access to tools.",
)

audio_io = BidiAudioIO()

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()],
    )

asyncio.run(main())
```

You can now ask questions like:
- “What time is it?”
- “Calculate 25 times 48”
- “What’s the weather in San Francisco?”
The agent automatically determines when to use tools and executes them concurrently without blocking the conversation.
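The non-blocking behavior boils down to a standard asyncio pattern: a tool call runs as a separate task while the event loop keeps streaming audio. The snippet below is a general sketch of that idiom, not the SDK's internal implementation:

```python
import asyncio

async def run_tool(name: str) -> str:
    """Stand-in for a real tool call (e.g. a weather API request)."""
    await asyncio.sleep(0.01)
    return f"{name}: done"

async def conversation_loop() -> tuple[str, int]:
    # Launch the tool as a task so the conversation is not blocked
    tool_task = asyncio.create_task(run_tool("get_weather"))
    frames_streamed = 0
    while not tool_task.done():
        # Audio keeps flowing while the tool runs in the background
        frames_streamed += 1
        await asyncio.sleep(0.002)
    return tool_task.result(), frames_streamed

result, frames = asyncio.run(conversation_loop())
print(result)  # get_weather: done
```

Because the tool runs in its own task, the loop continues to stream frames until the result is ready, which is the behavior you observe in a live conversation.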
Model Providers
Strands supports three bidirectional streaming providers:
- Nova Sonic - Amazon’s bidirectional streaming model via Amazon Bedrock
- OpenAI Realtime - OpenAI’s Realtime API for voice conversations
- Gemini Live - Google’s multimodal streaming API
Each provider has different features, timeout limits, and audio quality. See the individual provider documentation for detailed configuration options.
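Since the three model classes share the same agent interface, switching providers is just a matter of which class you construct. A minimal selection helper might look like this (the `BIDI_PROVIDER` variable and `resolve_provider`/`create_model` helpers are our own convention, not part of the SDK; the module paths follow the imports used in this guide):

```python
import os

# Map short names to (module_path, class_name), using the import
# paths shown elsewhere in this guide
PROVIDERS = {
    "nova": ("strands.experimental.bidi.models", "BidiNovaSonicModel"),
    "openai": ("strands.experimental.bidi.models", "BidiOpenAIRealtimeModel"),
    "gemini": ("strands.experimental.bidi.models.gemini_live", "BidiGeminiLiveModel"),
}

def resolve_provider(name=None):
    """Return (module_path, class_name) for the requested provider."""
    key = (name or os.environ.get("BIDI_PROVIDER", "nova")).lower()
    if key not in PROVIDERS:
        raise ValueError(f"Unknown provider: {key!r}")
    return PROVIDERS[key]

def create_model(name=None):
    """Lazily import and instantiate the selected model class."""
    module_path, class_name = resolve_provider(name)
    module = __import__(module_path, fromlist=[class_name])
    return getattr(module, class_name)()
```

The lazy import keeps the helper usable even when only one provider's optional dependencies are installed.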
Configuring Audio Settings
Customize audio configuration for both the model and I/O:
```python
import asyncio

from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models.gemini_live import BidiGeminiLiveModel

# Configure model audio settings
model = BidiGeminiLiveModel(
    provider_config={
        "audio": {
            "input_rate": 48000,   # Higher quality input
            "output_rate": 24000,  # Standard output
            "voice": "Puck",
        }
    }
)

# Configure I/O buffer settings
audio_io = BidiAudioIO(
    input_buffer_size=10,          # Max input queue size
    output_buffer_size=20,         # Max output queue size
    input_frames_per_buffer=512,   # Input chunk size
    output_frames_per_buffer=512,  # Output chunk size
)

agent = BidiAgent(model=model)

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()],
    )

asyncio.run(main())
```

The I/O automatically configures hardware to match the model’s audio requirements.
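One way to reason about the frames_per_buffer values is chunk duration: a buffer of N frames at sample rate R lasts N / R seconds. A quick calculation for the settings above:

```python
def chunk_latency_ms(frames_per_buffer: int, sample_rate: int) -> float:
    """Duration of one audio chunk in milliseconds."""
    return frames_per_buffer / sample_rate * 1000

# 512 frames at the 48 kHz input rate configured above
print(round(chunk_latency_ms(512, 48000), 1))  # 10.7

# 512 frames at the 24 kHz output rate
print(round(chunk_latency_ms(512, 24000), 1))  # 21.3
```

As a general rule, smaller buffers reduce per-chunk latency but increase callback overhead and the risk of underruns, so treat these numbers as a starting point for tuning.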
Handling Interruptions
Bidirectional agents automatically handle interruptions when users start speaking:
```python
import asyncio

from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands.experimental.bidi.types.events import BidiInterruptionEvent

model = BidiNovaSonicModel()
agent = BidiAgent(model=model)
audio_io = BidiAudioIO()

async def main():
    await agent.start()

    # Start receiving events
    async for event in agent.receive():
        if isinstance(event, BidiInterruptionEvent):
            print(f"User interrupted: {event.reason}")
            # Audio output automatically cleared
            # Model stops generating
            # Ready for new input

asyncio.run(main())
```

Interruptions are detected via voice activity detection (VAD) and handled automatically:
- User starts speaking
- Model stops generating
- Audio output buffer cleared
- Model ready for new input
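The buffer-clearing step can be pictured with a plain asyncio.Queue. This is a sketch of the general pattern, not the SDK's internals:

```python
import asyncio

def clear_queue(q: asyncio.Queue) -> int:
    """Drain all pending items; return how many were dropped."""
    dropped = 0
    while not q.empty():
        q.get_nowait()
        dropped += 1
    return dropped

async def demo() -> int:
    q = asyncio.Queue()
    for chunk in (b"audio-1", b"audio-2", b"audio-3"):
        q.put_nowait(chunk)
    # On interruption: drop audio the user no longer wants to hear
    return clear_queue(q)

print(asyncio.run(demo()))  # 3
```

Dropping queued output immediately is what makes the agent feel responsive: stale audio is never played after the user starts speaking.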
Manual Start and Stop
If you need more control over the agent lifecycle, you can manually call start() and stop():
```python
import asyncio

from strands.experimental.bidi import BidiAgent
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands.experimental.bidi.types.events import BidiResponseCompleteEvent

async def main():
    model = BidiNovaSonicModel()
    agent = BidiAgent(model=model)

    # Manually start the agent
    await agent.start()

    try:
        await agent.send("What is Python?")

        async for event in agent.receive():
            if isinstance(event, BidiResponseCompleteEvent):
                break
    finally:
        # Always stop after exiting the receive loop
        await agent.stop()

asyncio.run(main())
```

See Controlling Conversation Lifecycle for more patterns and best practices.
Graceful Shutdown
Use the experimental stop_conversation tool to allow users to end conversations naturally:
```python
import asyncio

from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands.experimental.bidi.tools import stop_conversation

model = BidiNovaSonicModel()
agent = BidiAgent(
    model=model,
    tools=[stop_conversation],
    system_prompt="You are a helpful assistant. When the user says 'stop conversation', use the stop_conversation tool.",
)

audio_io = BidiAudioIO()

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()],
    )
    # Conversation ends when user says "stop conversation"

asyncio.run(main())
```

The agent will gracefully close the connection when the user explicitly requests it.
Debug Logs
To enable debug logs in your agent, configure the strands logger:
```python
import asyncio
import logging

from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel

# Enable debug logs
logging.getLogger("strands").setLevel(logging.DEBUG)
logging.basicConfig(
    format="%(levelname)s | %(name)s | %(message)s",
    handlers=[logging.StreamHandler()],
)

model = BidiNovaSonicModel()
agent = BidiAgent(model=model)
audio_io = BidiAudioIO()

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()],
    )

asyncio.run(main())
```

Debug logs show:
- Connection lifecycle events
- Audio buffer operations
- Tool execution details
- Event processing flow
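Because Python loggers form a dot-separated hierarchy, you can keep the full debug stream for most of the SDK while quieting a noisy subtree. The child logger name below is illustrative; the actual logger names depend on the SDK's module layout:

```python
import logging

logging.basicConfig(format="%(levelname)s | %(name)s | %(message)s")

# Everything under "strands" at DEBUG...
logging.getLogger("strands").setLevel(logging.DEBUG)

# ...except this (hypothetical) subtree, capped at INFO
logging.getLogger("strands.experimental.bidi").setLevel(logging.INFO)

# Child loggers inherit the nearest configured ancestor's level
print(logging.getLogger("strands.experimental.bidi.models").getEffectiveLevel() == logging.INFO)  # True
```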
Common Issues
Audio Feedback Loop in a Python Console
BidiAudioIO uses PyAudio, which does not support echo cancellation. A headset is required to prevent audio feedback loops.
No Audio Output
If you don’t hear audio:
```python
# List available audio devices
import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(f"{i}: {info['name']}")
```

```python
# Specify output device explicitly
audio_io = BidiAudioIO(output_device_index=2)
```

Microphone Not Working
If the agent doesn’t respond to speech:
```python
# Specify input device explicitly
audio_io = BidiAudioIO(input_device_index=1)

# Check system permissions (macOS):
# System Preferences → Security & Privacy → Microphone
```

Connection Timeouts
If you experience frequent disconnections:
```python
# Use OpenAI for a longer timeout (60 min vs Nova Sonic's 8 min)
from strands.experimental.bidi.models import BidiOpenAIRealtimeModel

model = BidiOpenAIRealtimeModel()
```

```python
# Or handle restarts gracefully
from strands.experimental.bidi.types.events import BidiConnectionRestartEvent

async for event in agent.receive():
    if isinstance(event, BidiConnectionRestartEvent):
        print("Reconnecting...")
        continue
```

Next Steps
Ready to learn more? Check out these resources:
- Agent - Deep dive into BidiAgent configuration and lifecycle
- Events - Complete guide to bidirectional streaming events
- I/O Channels - Understanding and customizing input/output channels
- Model Providers:
- Nova Sonic - Amazon Bedrock’s bidirectional streaming model
- OpenAI Realtime - OpenAI’s Realtime API
- Gemini Live - Google’s Gemini Live API
- API Reference - Complete API documentation