
Quickstart

This quickstart guide shows you how to create your first bidirectional streaming agent for real-time audio and text conversations. You’ll learn how to set up audio I/O, handle streaming events, use tools during conversations, and work with different model providers.

After completing this guide, you'll be able to build voice assistants, interactive chatbots, and multi-modal applications, and to integrate bidirectional streaming with web servers or custom I/O channels.

Before starting, ensure you have:

  • Python 3.10+ installed (3.12+ required for Nova Sonic)
  • Audio hardware (microphone and speakers) for voice conversations
  • Model provider credentials configured (AWS, OpenAI, or Google)

Bidirectional streaming is included in the Strands Agents SDK as an experimental feature. Install the SDK with bidirectional streaming support:

To install with support for all bidirectional streaming providers:

pip install "strands-agents[bidi-all]"

This will install PyAudio for audio I/O and all three supported providers (Nova Sonic, OpenAI Realtime, and Gemini Live).

You can also install support for specific providers only:

pip install "strands-agents[bidi]"

On macOS, PyAudio depends on the PortAudio library, so install it with Homebrew first:

brew install portaudio
pip install "strands-agents[bidi-all]"

Bidirectional streaming supports multiple model providers. Choose one based on your needs:

Nova Sonic is Amazon’s bidirectional streaming model. Configure AWS credentials:

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=us-east-1

Enable Nova Sonic model access in the Amazon Bedrock console.

Now let’s create a simple voice-enabled agent that can have real-time conversations:

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel

# Create a bidirectional streaming model
model = BidiNovaSonicModel()

# Create the agent
agent = BidiAgent(
    model=model,
    system_prompt="You are a helpful voice assistant. Keep responses concise and natural."
)

# Setup audio I/O for microphone and speakers
audio_io = BidiAudioIO()

# Run the conversation
async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()]
    )

asyncio.run(main())

And that’s it! We now have a voice-enabled agent that can:

  • Listen to your voice through the microphone
  • Process speech in real-time
  • Respond with natural voice output
  • Handle interruptions when you start speaking

Combine audio with text input/output for debugging or multi-modal interactions:

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.io import BidiTextIO
from strands.experimental.bidi.models import BidiNovaSonicModel

model = BidiNovaSonicModel()
agent = BidiAgent(
    model=model,
    system_prompt="You are a helpful assistant."
)

# Setup both audio and text I/O
audio_io = BidiAudioIO()
text_io = BidiTextIO()

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output(), text_io.output()]  # Both audio and text
    )

asyncio.run(main())

Now you’ll see transcripts printed to the console while audio plays through your speakers.

The run() method runs indefinitely by default. The simplest way to stop a conversation is to press Ctrl+C:

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel

async def main():
    model = BidiNovaSonicModel()
    agent = BidiAgent(model=model)
    audio_io = BidiAudioIO()
    try:
        # Runs indefinitely until interrupted
        await agent.run(
            inputs=[audio_io.input()],
            outputs=[audio_io.output()]
        )
    except asyncio.CancelledError:
        print("\nConversation cancelled by user")
    finally:
        # stop() should only be called after run() exits
        await agent.stop()

asyncio.run(main())

Just like standard Strands agents, bidirectional agents can use tools during conversations:

import asyncio
from strands import tool
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands_tools import calculator, current_time

# Define a custom tool
@tool
def get_weather(location: str) -> str:
    """Get the current weather for a location.

    Args:
        location: City name or location

    Returns:
        Weather information
    """
    # In a real application, call a weather API
    return f"The weather in {location} is sunny and 72°F"

# Create agent with tools
model = BidiNovaSonicModel()
agent = BidiAgent(
    model=model,
    tools=[calculator, current_time, get_weather],
    system_prompt="You are a helpful assistant with access to tools."
)

audio_io = BidiAudioIO()

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()]
    )

asyncio.run(main())

You can now ask questions like:

  • “What time is it?”
  • “Calculate 25 times 48”
  • “What’s the weather in San Francisco?”

The agent automatically determines when to use tools and executes them concurrently without blocking the conversation.
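
The non-blocking behavior comes down to ordinary asyncio task scheduling. The sketch below is an illustration of the pattern, not Strands internals: a slow tool (the `slow_tool` and `stream_audio` names are invented for this example) runs as a background task while the loop keeps streaming.

```python
# Illustration (not Strands internals): running a slow tool as a background
# task so the event loop can keep streaming audio while the tool works.
import asyncio

async def slow_tool() -> str:
    await asyncio.sleep(0.1)  # stands in for a weather API call
    return "sunny and 72°F"

async def stream_audio(chunks: list) -> None:
    for _ in range(5):
        await asyncio.sleep(0.02)  # stands in for forwarding an audio chunk
        chunks.append("chunk")

async def main() -> None:
    chunks: list = []
    # Schedule the tool without awaiting it immediately...
    tool_task = asyncio.create_task(slow_tool())
    # ...so streaming continues while the tool runs.
    await stream_audio(chunks)
    result = await tool_task
    print(f"streamed {len(chunks)} chunks while tool returned: {result}")

asyncio.run(main())
# → streamed 5 chunks while tool returned: sunny and 72°F
```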

Strands supports three bidirectional streaming providers:

  • Nova Sonic - Amazon’s bidirectional streaming model via AWS Bedrock
  • OpenAI Realtime - OpenAI’s Realtime API for voice conversations
  • Gemini Live - Google’s multimodal streaming API

Each provider has different features, timeout limits, and audio quality. See the individual provider documentation for detailed configuration options.

Customize audio configuration for both the model and I/O:

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models.gemini_live import BidiGeminiLiveModel

# Configure model audio settings
model = BidiGeminiLiveModel(
    provider_config={
        "audio": {
            "input_rate": 48000,   # Higher quality input
            "output_rate": 24000,  # Standard output
            "voice": "Puck"
        }
    }
)

# Configure I/O buffer settings
audio_io = BidiAudioIO(
    input_buffer_size=10,          # Max input queue size
    output_buffer_size=20,         # Max output queue size
    input_frames_per_buffer=512,   # Input chunk size
    output_frames_per_buffer=512   # Output chunk size
)

agent = BidiAgent(model=model)

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()]
    )

asyncio.run(main())

The I/O automatically configures hardware to match the model’s audio requirements.
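
To get a feel for how these numbers interact: each buffer of N frames at sample rate R covers N/R seconds of audio, so smaller buffers mean lower latency but more frequent wakeups. A rough calculation (illustrative arithmetic only, not an SDK API):

```python
# Rough per-buffer latency: frames_per_buffer / sample_rate
def buffer_latency_ms(frames_per_buffer: int, sample_rate_hz: int) -> float:
    return 1000.0 * frames_per_buffer / sample_rate_hz

# 512-frame buffers at the 48 kHz input rate configured above
print(round(buffer_latency_ms(512, 48000), 1))  # → 10.7 (ms per input buffer)

# 512-frame buffers at the 24 kHz output rate
print(round(buffer_latency_ms(512, 24000), 1))  # → 21.3 (ms per output buffer)
```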

Bidirectional agents automatically handle interruptions when users start speaking:

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands.experimental.bidi.types.events import BidiInterruptionEvent

model = BidiNovaSonicModel()
agent = BidiAgent(model=model)
audio_io = BidiAudioIO()

async def main():
    await agent.start()
    # Start receiving events
    async for event in agent.receive():
        if isinstance(event, BidiInterruptionEvent):
            print(f"User interrupted: {event.reason}")
            # Audio output automatically cleared
            # Model stops generating
            # Ready for new input

asyncio.run(main())

Interruptions are detected via voice activity detection (VAD) and handled automatically:

  1. User starts speaking
  2. Model stops generating
  3. Audio output buffer cleared
  4. Model ready for new input

If you need more control over the agent lifecycle, you can manually call start() and stop():

import asyncio
from strands.experimental.bidi import BidiAgent
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands.experimental.bidi.types.events import BidiResponseCompleteEvent

async def main():
    model = BidiNovaSonicModel()
    agent = BidiAgent(model=model)

    # Manually start the agent
    await agent.start()
    try:
        await agent.send("What is Python?")
        async for event in agent.receive():
            if isinstance(event, BidiResponseCompleteEvent):
                break
    finally:
        # Always stop after exiting receive loop
        await agent.stop()

asyncio.run(main())

See Controlling Conversation Lifecycle for more patterns and best practices.

Use the experimental stop_conversation tool to allow users to end conversations naturally:

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands.experimental.bidi.tools import stop_conversation

model = BidiNovaSonicModel()
agent = BidiAgent(
    model=model,
    tools=[stop_conversation],
    system_prompt="You are a helpful assistant. When the user says 'stop conversation', use the stop_conversation tool."
)

audio_io = BidiAudioIO()

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()]
    )
    # Conversation ends when user says "stop conversation"

asyncio.run(main())

The agent will gracefully close the connection when the user explicitly requests it.

To enable debug logs in your agent, configure the strands logger:

import asyncio
import logging
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel

# Enable debug logs
logging.getLogger("strands").setLevel(logging.DEBUG)
logging.basicConfig(
    format="%(levelname)s | %(name)s | %(message)s",
    handlers=[logging.StreamHandler()]
)

model = BidiNovaSonicModel()
agent = BidiAgent(model=model)
audio_io = BidiAudioIO()

async def main():
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()]
    )

asyncio.run(main())

Debug logs show:

  • Connection lifecycle events
  • Audio buffer operations
  • Tool execution details
  • Event processing flow
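
Since debug output can drown out the text transcript in the console, it can help to route the strands logger to a file instead. This uses only the standard logging module, nothing SDK-specific; the filename is arbitrary.

```python
import logging

logger = logging.getLogger("strands")
logger.setLevel(logging.DEBUG)

# Write debug output to a file so the console stays readable
handler = logging.FileHandler("strands_debug.log")
handler.setFormatter(logging.Formatter("%(levelname)s | %(name)s | %(message)s"))
logger.addHandler(handler)

logger.debug("logging configured")
```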

BidiAudioIO uses PyAudio, which does not support echo cancellation. A headset is required to prevent audio feedback loops.

If you don’t hear audio:

# List available audio devices
import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(f"{i}: {info['name']}")

# Specify output device explicitly
audio_io = BidiAudioIO(output_device_index=2)

If the agent doesn’t respond to speech:

# Specify input device explicitly
audio_io = BidiAudioIO(input_device_index=1)
# Check system permissions (macOS)
# System Preferences → Security & Privacy → Microphone

If you experience frequent disconnections:

# Use OpenAI for a longer timeout (60 min vs Nova Sonic's 8 min)
from strands.experimental.bidi.models import BidiOpenAIRealtimeModel

model = BidiOpenAIRealtimeModel()

# Or handle restarts gracefully
from strands.experimental.bidi.types.events import BidiConnectionRestartEvent

async for event in agent.receive():
    if isinstance(event, BidiConnectionRestartEvent):
        print("Reconnecting...")
        continue

Ready to learn more? Check out these resources:

  • Agent - Deep dive into BidiAgent configuration and lifecycle
  • Events - Complete guide to bidirectional streaming events
  • I/O Channels - Understanding and customizing input/output channels
  • Model Providers - Configuration guides for Nova Sonic, OpenAI Realtime, and Gemini Live
  • API Reference - Complete API documentation