
Quickstart

This quickstart guide shows you how to create your first bidirectional streaming agent for real-time audio and text conversations. You’ll learn how to set up audio I/O, handle streaming events, use tools during conversations, and work with different model providers.

After completing this guide, you'll be able to build voice assistants, interactive chatbots, and multi-modal applications, and to integrate bidirectional streaming with web servers or custom I/O channels.

Before starting, ensure you have:

  • Python 3.10+ installed (3.12+ required for Nova Sonic)
  • Audio hardware (microphone and speakers) for voice conversations
  • Model provider credentials configured (AWS, OpenAI, or Google)

Bidirectional streaming is included in the Strands Agents SDK as an experimental feature. Install the SDK with bidirectional streaming support:

To install with support for all bidirectional streaming providers:

pip install "strands-agents[bidi-all]"

This will install PyAudio for audio I/O and all three supported providers (Nova Sonic, OpenAI Realtime, and Gemini Live).

You can also install support for specific providers only:

pip install "strands-agents[bidi]"

On macOS, PyAudio depends on the PortAudio library, so install it with Homebrew first:

brew install portaudio
pip install "strands-agents[bidi-all]"

Bidirectional streaming supports multiple model providers. Choose one based on your needs:

Nova Sonic is Amazon’s bidirectional streaming model. Configure AWS credentials:

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=us-east-1

Enable Nova Sonic model access in the Amazon Bedrock console.

Now let’s create a simple voice-enabled agent that can have real-time conversations:

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel

# Create a bidirectional streaming model
model = BidiNovaSonicModel()

# Create the agent
agent = BidiAgent(
    model=model,
    system_prompt="You are a helpful voice assistant. Keep responses concise and natural."
)

# Setup audio I/O for microphone and speakers
audio_io = BidiAudioIO()

# Run the conversation
async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()]
    )

asyncio.run(main())

And that’s it! We now have a voice-enabled agent that can:

  • Listen to your voice through the microphone
  • Process speech in real-time
  • Respond with natural voice output
  • Handle interruptions when you start speaking

Combine audio with text input/output for debugging or multi-modal interactions:

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.io import BidiTextIO
from strands.experimental.bidi.models import BidiNovaSonicModel

model = BidiNovaSonicModel()
agent = BidiAgent(
    model=model,
    system_prompt="You are a helpful assistant."
)

# Setup both audio and text I/O
audio_io = BidiAudioIO()
text_io = BidiTextIO()

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output(), text_io.output()]  # Both audio and text
    )

asyncio.run(main())

Now you’ll see transcripts printed to the console while audio plays through your speakers.

The run() method runs indefinitely by default. The simplest way to stop a conversation is to press Ctrl+C:

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel

async def main():
    model = BidiNovaSonicModel()
    agent = BidiAgent(model=model)
    audio_io = BidiAudioIO()
    try:
        # Runs indefinitely until interrupted
        await agent.run(
            inputs=[audio_io.input()],
            outputs=[audio_io.output()]
        )
    except asyncio.CancelledError:
        print("\nConversation cancelled by user")
    finally:
        # stop() should only be called after run() exits
        await agent.stop()

asyncio.run(main())

Just like standard Strands agents, bidirectional agents can use tools during conversations:

import asyncio
from strands import tool
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands_tools import calculator, current_time

# Define a custom tool
@tool
def get_weather(location: str) -> str:
    """Get the current weather for a location.

    Args:
        location: City name or location

    Returns:
        Weather information
    """
    # In a real application, call a weather API
    return f"The weather in {location} is sunny and 72°F"

# Create agent with tools
model = BidiNovaSonicModel()
agent = BidiAgent(
    model=model,
    tools=[calculator, current_time, get_weather],
    system_prompt="You are a helpful assistant with access to tools."
)

audio_io = BidiAudioIO()

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()]
    )

asyncio.run(main())

You can now ask questions like:

  • “What time is it?”
  • “Calculate 25 times 48”
  • “What’s the weather in San Francisco?”

The agent automatically determines when to use tools and executes them concurrently without blocking the conversation.
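
The non-blocking behavior comes down to ordinary asyncio task scheduling. The sketch below is an illustration of the pattern, not Strands internals: a slow tool (the `slow_tool` and `stream_audio` names are invented for this example) runs as a background task while the loop keeps streaming.

```python
# Illustration (not Strands internals): running a slow tool as a background
# task so the event loop can keep streaming audio while the tool works.
import asyncio

async def slow_tool() -> str:
    await asyncio.sleep(0.1)  # stands in for a weather API call
    return "sunny and 72°F"

async def stream_audio(chunks: list) -> None:
    for _ in range(5):
        await asyncio.sleep(0.02)  # stands in for forwarding an audio chunk
        chunks.append("chunk")

async def main() -> None:
    chunks: list = []
    # Schedule the tool without awaiting it immediately...
    tool_task = asyncio.create_task(slow_tool())
    # ...so streaming continues while the tool runs.
    await stream_audio(chunks)
    result = await tool_task
    print(f"streamed {len(chunks)} chunks while tool returned: {result}")

asyncio.run(main())
# → streamed 5 chunks while tool returned: sunny and 72°F
```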

Strands supports three bidirectional streaming providers:

  • Nova Sonic - Amazon’s bidirectional streaming model via AWS Bedrock
  • OpenAI Realtime - OpenAI’s Realtime API for voice conversations
  • Gemini Live - Google’s multimodal streaming API

Each provider has different features, timeout limits, and audio quality. See the individual provider documentation for detailed configuration options.

Customize audio configuration for both the model and I/O:

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models.gemini_live import BidiGeminiLiveModel

# Configure model audio settings
model = BidiGeminiLiveModel(
    provider_config={
        "audio": {
            "input_rate": 48000,   # Higher quality input
            "output_rate": 24000,  # Standard output
            "voice": "Puck"
        }
    }
)

# Configure I/O buffer settings
audio_io = BidiAudioIO(
    input_buffer_size=10,          # Max input queue size
    output_buffer_size=20,         # Max output queue size
    input_frames_per_buffer=512,   # Input chunk size
    output_frames_per_buffer=512   # Output chunk size
)

agent = BidiAgent(model=model)

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()]
    )

asyncio.run(main())

The I/O automatically configures hardware to match the model’s audio requirements.
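
To get a feel for how these numbers interact: each buffer of N frames at sample rate R covers N/R seconds of audio, so smaller buffers mean lower latency but more frequent wakeups. A rough calculation (illustrative arithmetic only, not an SDK API):

```python
# Rough per-buffer latency: frames_per_buffer / sample_rate
def buffer_latency_ms(frames_per_buffer: int, sample_rate_hz: int) -> float:
    return 1000.0 * frames_per_buffer / sample_rate_hz

# 512-frame buffers at the 48 kHz input rate configured above
print(round(buffer_latency_ms(512, 48000), 1))  # → 10.7 (ms per input buffer)

# 512-frame buffers at the 24 kHz output rate
print(round(buffer_latency_ms(512, 24000), 1))  # → 21.3 (ms per output buffer)
```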

Bidirectional agents automatically handle interruptions when users start speaking:

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands.experimental.bidi.types.events import BidiInterruptionEvent

model = BidiNovaSonicModel()
agent = BidiAgent(model=model)
audio_io = BidiAudioIO()

async def main():
    await agent.start()
    # Start receiving events
    async for event in agent.receive():
        if isinstance(event, BidiInterruptionEvent):
            print(f"User interrupted: {event.reason}")
            # Audio output automatically cleared
            # Model stops generating
            # Ready for new input

asyncio.run(main())

Interruptions are detected via voice activity detection (VAD) and handled automatically:

  1. User starts speaking
  2. Model stops generating
  3. Audio output buffer cleared
  4. Model ready for new input

If you need more control over the agent lifecycle, you can manually call start() and stop():

import asyncio
from strands.experimental.bidi import BidiAgent
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands.experimental.bidi.types.events import BidiResponseCompleteEvent

async def main():
    model = BidiNovaSonicModel()
    agent = BidiAgent(model=model)

    # Manually start the agent
    await agent.start()
    try:
        await agent.send("What is Python?")
        async for event in agent.receive():
            if isinstance(event, BidiResponseCompleteEvent):
                break
    finally:
        # Always stop after exiting receive loop
        await agent.stop()

asyncio.run(main())

See Controlling Conversation Lifecycle for more patterns and best practices.

Use the experimental stop_conversation tool to allow users to end conversations naturally:

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands.experimental.bidi.tools import stop_conversation

model = BidiNovaSonicModel()
agent = BidiAgent(
    model=model,
    tools=[stop_conversation],
    system_prompt="You are a helpful assistant. When the user says 'stop conversation', use the stop_conversation tool."
)

audio_io = BidiAudioIO()

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()]
    )
    # Conversation ends when user says "stop conversation"

asyncio.run(main())

The agent will gracefully close the connection when the user explicitly requests it.

To enable debug logs in your agent, configure the strands logger:

import asyncio
import logging
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel

# Enable debug logs
logging.getLogger("strands").setLevel(logging.DEBUG)
logging.basicConfig(
    format="%(levelname)s | %(name)s | %(message)s",
    handlers=[logging.StreamHandler()]
)

model = BidiNovaSonicModel()
agent = BidiAgent(model=model)
audio_io = BidiAudioIO()

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()]
    )

asyncio.run(main())

Debug logs show:

  • Connection lifecycle events
  • Audio buffer operations
  • Tool execution details
  • Event processing flow
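
Since debug output can drown out the text transcript in the console, it can help to route the strands logger to a file instead. This uses only the standard logging module, nothing SDK-specific; the filename is arbitrary.

```python
import logging

logger = logging.getLogger("strands")
logger.setLevel(logging.DEBUG)

# Write debug output to a file so the console stays readable
handler = logging.FileHandler("strands_debug.log")
handler.setFormatter(logging.Formatter("%(levelname)s | %(name)s | %(message)s"))
logger.addHandler(handler)

logger.debug("logging configured")
```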

BidiAudioIO uses PyAudio, which does not support echo cancellation. A headset is required to prevent audio feedback loops.

If you don’t hear audio:

# List available audio devices
import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(f"{i}: {info['name']}")

# Specify output device explicitly
audio_io = BidiAudioIO(output_device_index=2)

If the agent doesn’t respond to speech:

# Specify input device explicitly
audio_io = BidiAudioIO(input_device_index=1)
# Check system permissions (macOS)
# System Preferences → Security & Privacy → Microphone

If you experience frequent disconnections:

# Use OpenAI for a longer timeout (60 min vs Nova Sonic's 8 min)
from strands.experimental.bidi.models import BidiOpenAIRealtimeModel

model = BidiOpenAIRealtimeModel()

# Or handle restarts gracefully
from strands.experimental.bidi.types.events import BidiConnectionRestartEvent

async for event in agent.receive():
    if isinstance(event, BidiConnectionRestartEvent):
        print("Reconnecting...")
        continue

Ready to learn more? Check out these resources:

  • Agent - Deep dive into BidiAgent configuration and lifecycle
  • Events - Complete guide to bidirectional streaming events
  • I/O Channels - Understanding and customizing input/output channels
  • Model Providers - Configuration guides for Nova Sonic, OpenAI Realtime, and Gemini Live
  • API Reference - Complete API documentation