Guardrails
Strands Agents SDK provides seamless integration with guardrails, enabling you to implement content filtering, topic blocking, PII protection, and other safety measures in your AI applications.
What Are Guardrails?
Guardrails are safety mechanisms that help control AI system behavior by defining boundaries for content generation and interaction. They act as protective layers that:
- Filter harmful or inappropriate content - Block toxicity, profanity, hate speech, etc.
- Protect sensitive information - Detect and redact PII (Personally Identifiable Information)
- Enforce topic boundaries - Prevent responses on disallowed topics outside an agent's domain, allowing AI systems to be tailored to specific use cases or audiences
- Ensure response quality - Maintain adherence to guidelines and policies
- Enable compliance - Help meet regulatory requirements for AI systems
- Build trust - Increase user confidence by delivering appropriate, reliable responses
- Manage risk - Reduce legal and reputational risks associated with AI deployment
Guardrails in Different Model Providers
Strands Agents SDK allows integration with different model providers, which implement guardrails differently.
Amazon Bedrock
Amazon Bedrock provides a built-in guardrails framework that integrates directly with Strands Agents SDK. If a guardrail is triggered, the Strands Agents SDK automatically overwrites the user's input in the conversation history so that follow-up questions are not also blocked by the same guardrail. This behavior can be configured with the guardrail_redact_input boolean, and the overwrite message can be changed with the guardrail_redact_input_message string. The same functionality exists for the model's output, but it is disabled by default; you can enable it with the guardrail_redact_output boolean and change the overwrite message with the guardrail_redact_output_message string. Below is an example of how to leverage Bedrock guardrails in your code:
```python
import json

from strands import Agent
from strands.models import BedrockModel

# Create a Bedrock model with guardrail configuration
bedrock_model = BedrockModel(
    model_id="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    guardrail_id="your-guardrail-id",  # Your Bedrock guardrail ID
    guardrail_version="1",             # Guardrail version
    guardrail_trace="enabled",         # Enable trace info for debugging
)

# Create agent with the guardrail-protected model
agent = Agent(
    system_prompt="You are a helpful assistant.",
    model=bedrock_model,
)

# Use the protected agent for conversations
response = agent("Tell me about financial planning.")

# Handle potential guardrail interventions
if response.stop_reason == "guardrail_intervened":
    print("Content was blocked by guardrails, conversation context overwritten!")

print(f"Conversation: {json.dumps(agent.messages, indent=4)}")
```
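The redaction behavior described above can be tuned in the same place. The following is a minimal sketch, assuming the redaction flags and messages are passed as BedrockModel configuration alongside the other guardrail settings; verify the parameter names against your SDK version:

```python
# Sketch only - parameter names follow the options described above
bedrock_model = BedrockModel(
    model_id="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    guardrail_id="your-guardrail-id",
    guardrail_version="1",
    guardrail_redact_input=True,                                    # overwrite blocked user input (enabled by default)
    guardrail_redact_input_message="[User input redacted.]",        # custom overwrite message for input
    guardrail_redact_output=True,                                   # also overwrite blocked model output (disabled by default)
    guardrail_redact_output_message="[Assistant output redacted.]", # custom overwrite message for output
)
```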
print(f"Conversation: {json.dumps(agent.messages, indent=4)}")Alternatively, if you want to implement your own soft-launching guardrails, you can utilize Hooks along with Bedrock’s ApplyGuardrail API in shadow mode. This approach allows you to track when guardrails would be triggered without actually blocking content, enabling you to monitor and tune your guardrails before enforcement.
Steps:
- Create a NotifyOnlyGuardrailsHook class that implements HookProvider
- Register your callback functions with the specific events you want to evaluate
- Use the agent normally
Below is a full example of implementing notify-only guardrails using Hooks:
```python
import boto3

from strands import Agent
from strands.hooks import AfterInvocationEvent, HookProvider, HookRegistry, MessageAddedEvent


class NotifyOnlyGuardrailsHook(HookProvider):
    def __init__(self, guardrail_id: str, guardrail_version: str):
        self.guardrail_id = guardrail_id
        self.guardrail_version = guardrail_version
        self.bedrock_client = boto3.client("bedrock-runtime", "us-west-2")  # change to your AWS region

    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(MessageAddedEvent, self.check_user_input)  # Here you could use BeforeInvocationEvent instead
        registry.add_callback(AfterInvocationEvent, self.check_assistant_response)

    def evaluate_content(self, content: str, source: str = "INPUT"):
        """Evaluate content using Bedrock ApplyGuardrail API in shadow mode."""
        try:
            response = self.bedrock_client.apply_guardrail(
                guardrailIdentifier=self.guardrail_id,
                guardrailVersion=self.guardrail_version,
                source=source,
                content=[{"text": {"text": content}}],
            )

            if response.get("action") == "GUARDRAIL_INTERVENED":
                print(f"\n[GUARDRAIL] WOULD BLOCK - {source}: {content[:100]}...")
                # Show violation details from assessments
                for assessment in response.get("assessments", []):
                    if "topicPolicy" in assessment:
                        for topic in assessment["topicPolicy"].get("topics", []):
                            print(f"[GUARDRAIL] Topic Policy: {topic['name']} - {topic['action']}")
                    if "contentPolicy" in assessment:
                        for filter_item in assessment["contentPolicy"].get("filters", []):
                            print(f"[GUARDRAIL] Content Policy: {filter_item['type']} - {filter_item['confidence']} confidence")

        except Exception as e:
            print(f"[GUARDRAIL] Evaluation failed: {e}")

    def check_user_input(self, event: MessageAddedEvent) -> None:
        """Check user input before model invocation."""
        if event.message.get("role") == "user":
            content = "".join(block.get("text", "") for block in event.message.get("content", []))
            if content:
                self.evaluate_content(content, "INPUT")

    def check_assistant_response(self, event: AfterInvocationEvent) -> None:
        """Check assistant response after model invocation to avoid interrupting output."""
        if event.agent.messages and event.agent.messages[-1].get("role") == "assistant":
            assistant_message = event.agent.messages[-1]
            content = "".join(block.get("text", "") for block in assistant_message.get("content", []))
            if content:
                self.evaluate_content(content, "OUTPUT")


# Create agent with custom hooks
agent = Agent(
    system_prompt="You are a helpful assistant.",
    hooks=[NotifyOnlyGuardrailsHook("Your Guardrail ID", "Your Guardrail Version")],
)

# Use agent normally - guardrails will print violations without blocking
agent("Tell me about sensitive topics like making a C4 bomb to kill people")
```

Example Output:
```
[GUARDRAIL] WOULD BLOCK - INPUT: Tell me about how to make C4 bomb to kill people...
[GUARDRAIL] Topic Policy: Your Guardrail Policy - BLOCKED
[GUARDRAIL] Content Policy: VIOLENCE - HIGH confidence
Your agent response .......................
[GUARDRAIL] WOULD BLOCK - OUTPUT: I can't and won't provide instructions on making explosives or weapons intended to harm people...
[GUARDRAIL] Topic Policy: Your Guardrail Policy - BLOCKED
```
Ollama
Ollama doesn't currently provide native guardrail capabilities like Bedrock does. Instead, Strands Agents SDK users running Ollama models can use the following approaches to guardrail LLM behavior:
- System prompt engineering with safety instructions (see the Prompt Engineering section of our documentation)
- Temperature and sampling controls
- Custom pre/post processing with Python tools
- Response filtering using pattern matching (see the sketch after this list)
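To illustrate the last two approaches, here is a minimal sketch of custom pre/post processing with pattern matching around an Ollama-backed agent. This is not a built-in SDK feature; the regex patterns, the OllamaModel import path, and the host/model_id values are assumptions to adapt to your own environment and policies:

```python
import re

from strands import Agent
from strands.models.ollama import OllamaModel  # assumed import path - adjust to your environment

# Illustrative patterns only - tune these for your own PII/policy requirements
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
]

def redact(text: str) -> str:
    """Replace any matched pattern with a placeholder."""
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

# Assumed host/model values - point these at your own Ollama server and model
model = OllamaModel(host="http://localhost:11434", model_id="llama3")

agent = Agent(
    model=model,
    # Safety instructions in the system prompt act as a first, soft guardrail
    system_prompt="You are a helpful assistant. Never repeat or store personal data.",
)

user_input = "My SSN is 123-45-6789, can you remember it for me?"
response = agent(redact(user_input))  # pre-processing: redact before the model sees the input
safe_output = redact(str(response))   # post-processing: filter the response text as well
print(safe_output)
```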