MLX
strands-mlx is an MLX model provider for Strands Agents SDK that enables running AI agents locally on Apple Silicon. It supports inference, fine-tuning with LoRA, and vision models.
Features:
- Apple Silicon Native: Optimized for M1/M2/M3/M4 chips using Apple’s MLX framework
- LoRA Fine-tuning: Train custom adapters from agent conversations
- Vision Support: Process images, audio, and video with multimodal models
- Local Inference: Run agents completely offline without API calls
- Training Pipeline: Collect data → Split → Train → Deploy workflow
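To make the Split step of the training pipeline concrete, here is a minimal sketch of dividing collected conversation records into train, validation, and test sets. It is not the package's own splitter: the function names, the `train`/`valid`/`test` split names, the 80/10/10 ratios, and the record schema are all assumptions chosen for illustration.

```python
import json
import random


def split_records(records, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle records deterministically and split them into train/valid/test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_valid = int(len(shuffled) * valid_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "valid": shuffled[:n_valid],
        "test": shuffled[n_valid:n_valid + n_test],
        "train": shuffled[n_valid + n_test:],
    }


def to_jsonl(records):
    """Serialize records as JSON Lines, one object per line."""
    return "".join(json.dumps(r) + "\n" for r in records)


# Hypothetical conversation records; the real on-disk schema is not specified here.
conversations = [{"text": f"example conversation {i}"} for i in range(20)]
splits = split_records(conversations)
print({name: len(rows) for name, rows in splits.items()})
```

Seeding the shuffle means repeated runs over the same data produce the same split, which keeps evaluation numbers comparable between training runs.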
Installation
Install strands-mlx along with the Strands Agents SDK:
    pip install strands-mlx strands-agents-tools
Requirements
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python ≤3.13
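The requirements above can be checked before installing. The following is a small sketch using only the standard library; `check_requirements` is a hypothetical helper, not part of strands-mlx. It relies on a native Apple Silicon Python reporting `Darwin` and `arm64` through the `platform` module.

```python
import platform
import sys


def check_requirements(system, machine, version):
    """Return a list of human-readable problems; an empty list means the host qualifies."""
    problems = []
    # A native Python on an Apple Silicon Mac reports "Darwin" and "arm64".
    if system != "Darwin" or machine != "arm64":
        problems.append(f"needs macOS on Apple Silicon, found {system}/{machine}")
    # The stated requirement is Python <= 3.13, so anything above 3.13 is rejected.
    if tuple(version[:2]) > (3, 13):
        problems.append(f"needs Python <= 3.13, found {version[0]}.{version[1]}")
    return problems


if __name__ == "__main__":
    issues = check_requirements(platform.system(), platform.machine(), sys.version_info)
    print("OK" if not issues else "\n".join(issues))
```

Passing the platform values in as arguments, rather than reading them inside the function, makes the check easy to exercise on any machine.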
Basic Agent
    from strands import Agent
    from strands_mlx import MLXModel
    from strands_tools import calculator

    model = MLXModel(model_id="mlx-community/Qwen3-1.7B-4bit")
    agent = Agent(model=model, tools=[calculator])

    agent("What is 29 * 42?")
Vision Model
    from strands import Agent
    from strands_mlx import MLXVisionModel

    model = MLXVisionModel(model_id="mlx-community/Qwen2-VL-2B-Instruct-4bit")
    agent = Agent(model=model)

    agent("Describe: <image>photo.jpg</image>")
Fine-tuning with LoRA
Collect training data from agent conversations and fine-tune:
    from strands import Agent
    from strands_mlx import MLXModel, MLXSessionManager, dataset_splitter, mlx_trainer

    # Collect training data
    agent = Agent(
        model=MLXModel(model_id="mlx-community/Qwen3-1.7B-4bit"),
        session_manager=MLXSessionManager(session_id="training", storage_dir="./dataset"),
        tools=[dataset_splitter, mlx_trainer],
    )

    # Have conversations (auto-saved)
    agent("Teach me about quantum computing")

    # Split and train
    agent.tool.dataset_splitter(input_path="./dataset/training.jsonl")
    agent.tool.mlx_trainer(
        action="train",
        config={
            "model": "mlx-community/Qwen3-1.7B-4bit",
            "data": "./dataset/training",
            "adapter_path": "./adapter",
            "iters": 200,
        },
    )

    # Use trained model
    trained = MLXModel("mlx-community/Qwen3-1.7B-4bit", adapter_path="./adapter")
    expert_agent = Agent(model=trained)
Configuration
Model Configuration
The MLXModel accepts the following parameters:
| Parameter | Description | Example | Required |
|---|---|---|---|
| model_id | HuggingFace model ID | "mlx-community/Qwen3-1.7B-4bit" | Yes |
| adapter_path | Path to LoRA adapter | "./adapter" | No |
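As a sketch of how the required and optional parameters in the table might be handled in calling code, the helper below assembles keyword arguments and fails early when an adapter path does not exist. `build_model_kwargs` is a hypothetical helper, not part of strands-mlx, and the existence check is an added assumption rather than documented behavior.

```python
from pathlib import Path


def build_model_kwargs(model_id, adapter_path=None):
    """Assemble keyword arguments for MLXModel from the two documented parameters."""
    if not model_id:
        raise ValueError("model_id is required")
    kwargs = {"model_id": model_id}
    if adapter_path is not None:
        # Fail early with a clear message instead of at model load time.
        if not Path(adapter_path).exists():
            raise FileNotFoundError(f"adapter not found: {adapter_path}")
        kwargs["adapter_path"] = str(adapter_path)
    return kwargs


base = build_model_kwargs("mlx-community/Qwen3-1.7B-4bit")
# On a machine with strands-mlx installed, this would be used as:
#   model = MLXModel(**base)
print(base)
```

Omitting `adapter_path` leaves the key out of the dictionary entirely, so the base model is loaded without an adapter.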
Recommended Models
Text:
- mlx-community/Qwen3-1.7B-4bit (recommended for agents)
- mlx-community/Qwen3-4B-4bit
- mlx-community/Llama-3.2-1B-4bit
Vision:
- mlx-community/Qwen2-VL-2B-Instruct-4bit (recommended)
- mlx-community/llava-v1.6-mistral-7b-4bit
Browse more models at mlx-community on HuggingFace.
Troubleshooting
Out of memory
Use a smaller quantized model, or lower training memory use by enabling gradient checkpointing and reducing the batch size and maximum sequence length:
    config = {
        "grad_checkpoint": True,
        "batch_size": 1,
        "max_seq_length": 1024,
    }
Model not found
Ensure you’re using a valid mlx-community model ID. Models are automatically downloaded from HuggingFace on first use.
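A quick shape check can catch typos in a model ID before any download is attempted. The sketch below only verifies that the string looks like `mlx-community/<name>`; it does not confirm that the repository exists. The function name and the exact character rules in the pattern are assumptions for illustration.

```python
import re

# Repo IDs take the form "<namespace>/<repo_name>"; this pattern is a
# conservative approximation of the allowed characters, not an official rule.
_REPO_ID = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*$")


def looks_like_mlx_community_id(model_id):
    """Return True if model_id is shaped like an mlx-community repo ID."""
    return bool(_REPO_ID.match(model_id)) and model_id.startswith("mlx-community/")


for candidate in ("mlx-community/Qwen3-1.7B-4bit", "Qwen3-1.7B-4bit"):
    print(candidate, looks_like_mlx_community_id(candidate))
```

A missing namespace, as in the second candidate, is one of the easier mistakes to make when copying a model name by hand.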