MLX

strands-mlx is an MLX model provider for Strands Agents SDK that enables running AI agents locally on Apple Silicon. It supports inference, fine-tuning with LoRA, and vision models.

Features:

  • Apple Silicon Native: Optimized for M1/M2/M3/M4 chips using Apple’s MLX framework
  • LoRA Fine-tuning: Train custom adapters from agent conversations
  • Vision Support: Process images, audio, and video with multimodal models
  • Local Inference: Run agents completely offline without API calls
  • Training Pipeline: Collect data → Split → Train → Deploy workflow

Install strands-mlx together with strands-agents-tools, which provides the calculator tool used in the first example:

pip install strands-mlx strands-agents-tools

Requirements:

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python ≤3.13

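The two requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the helper name `meets_requirements` is hypothetical and not part of strands-mlx. It assumes a native (non-Rosetta) Python, where `platform.machine()` reports `arm64` on Apple Silicon.

```python
import platform
import sys


def meets_requirements(system, machine, version):
    """Return True for macOS on Apple Silicon with Python 3.13 or older."""
    on_apple_silicon = system == "Darwin" and machine == "arm64"
    supported_python = tuple(version[:2]) <= (3, 13)
    return on_apple_silicon and supported_python


if __name__ == "__main__":
    ok = meets_requirements(platform.system(), platform.machine(), sys.version_info)
    print("requirements met" if ok else "unsupported platform or Python version")
```

Passing the platform values in as arguments keeps the check easy to test on any machine.
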
# Basic agent: a local text model with a calculator tool
from strands import Agent
from strands_mlx import MLXModel
from strands_tools import calculator

model = MLXModel(model_id="mlx-community/Qwen3-1.7B-4bit")
agent = Agent(model=model, tools=[calculator])
agent("What is 29 * 42?")

# Vision agent: reference an image inline with <image> tags
from strands import Agent
from strands_mlx import MLXVisionModel

model = MLXVisionModel(model_id="mlx-community/Qwen2-VL-2B-Instruct-4bit")
agent = Agent(model=model)
agent("Describe: <image>photo.jpg</image>")
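
The vision example embeds the image path in the prompt with `<image>` tags. As a rough sketch, a small helper can build such prompts from a list of paths. The name `image_prompt` is hypothetical, and whether the provider accepts more than one tag per prompt is an assumption that the example above does not confirm.

```python
def image_prompt(instruction, image_paths):
    """Return the instruction followed by one <image> tag per path."""
    tags = "".join(f"<image>{path}</image>" for path in image_paths)
    return f"{instruction} {tags}".strip()


if __name__ == "__main__":
    # Reproduces the prompt string used in the vision example.
    print(image_prompt("Describe:", ["photo.jpg"]))
```

The resulting string can be passed to the agent in place of a hand-written prompt.
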

Collect training data from agent conversations and fine-tune:

from strands import Agent
from strands_mlx import MLXModel, MLXSessionManager, dataset_splitter, mlx_trainer

# Collect training data
agent = Agent(
    model=MLXModel(model_id="mlx-community/Qwen3-1.7B-4bit"),
    session_manager=MLXSessionManager(session_id="training", storage_dir="./dataset"),
    tools=[dataset_splitter, mlx_trainer],
)

# Have conversations (auto-saved)
agent("Teach me about quantum computing")

# Split and train
agent.tool.dataset_splitter(input_path="./dataset/training.jsonl")
agent.tool.mlx_trainer(
    action="train",
    config={
        "model": "mlx-community/Qwen3-1.7B-4bit",
        "data": "./dataset/training",
        "adapter_path": "./adapter",
        "iters": 200,
    },
)

# Use trained model
trained = MLXModel("mlx-community/Qwen3-1.7B-4bit", adapter_path="./adapter")
expert_agent = Agent(model=trained)
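
To make the split step more concrete, here is a sketch of what splitting a JSONL file of conversations into training and validation sets can look like. This is not the `dataset_splitter` implementation. The function `split_jsonl`, its parameters, and the output file names are assumptions, chosen to match the `train.jsonl` / `valid.jsonl` layout that mlx-lm's LoRA tooling reads from a data directory.

```python
import json
import random
from pathlib import Path


def _write_jsonl(path, records):
    """Write one JSON record per line."""
    path.write_text("".join(r + "\n" for r in records), encoding="utf-8")


def split_jsonl(input_path, output_dir, valid_fraction=0.1, seed=0):
    """Shuffle JSONL records and split them into train.jsonl and valid.jsonl.

    Returns (n_train, n_valid).
    """
    text = Path(input_path).read_text(encoding="utf-8")
    records = [line for line in text.splitlines() if line.strip()]
    if len(records) < 2:
        raise ValueError("need at least two records to build train and valid sets")
    for record in records:
        json.loads(record)  # raises ValueError on a malformed line

    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(records)
    n_valid = max(1, int(len(records) * valid_fraction))

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    _write_jsonl(out / "valid.jsonl", records[:n_valid])
    _write_jsonl(out / "train.jsonl", records[n_valid:])
    return len(records) - n_valid, n_valid
```

Holding out a validation set lets the training loss be compared against loss on unseen conversations, which helps spot overfitting on small datasets.
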

The MLXModel accepts the following parameters:

Parameter      Description            Example                            Required
model_id       HuggingFace model ID   "mlx-community/Qwen3-1.7B-4bit"    Yes
adapter_path   Path to LoRA adapter   "./adapter"                        No

Text:

  • mlx-community/Qwen3-1.7B-4bit (recommended for agents)
  • mlx-community/Qwen3-4B-4bit
  • mlx-community/Llama-3.2-1B-4bit

Vision:

  • mlx-community/Qwen2-VL-2B-Instruct-4bit (recommended)
  • mlx-community/llava-v1.6-mistral-7b-4bit

Browse more models at mlx-community on HuggingFace.

If training runs out of memory, use a smaller quantized model or reduce the batch size and sequence length:

config = {
    "grad_checkpoint": True,
    "batch_size": 1,
    "max_seq_length": 1024,
}
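
One way to apply these settings is to layer them over an existing training config. The sketch below assumes the memory-related keys belong in the same `config` dict that is passed to `mlx_trainer`. The names `BASE_CONFIG`, `LOW_MEMORY_OVERRIDES`, and `low_memory_config` are hypothetical.

```python
# Training config taken from the fine-tuning example.
BASE_CONFIG = {
    "model": "mlx-community/Qwen3-1.7B-4bit",
    "data": "./dataset/training",
    "adapter_path": "./adapter",
    "iters": 200,
}

# Memory-saving settings from the snippet above.
LOW_MEMORY_OVERRIDES = {
    "grad_checkpoint": True,
    "batch_size": 1,
    "max_seq_length": 1024,
}


def low_memory_config(base, overrides=LOW_MEMORY_OVERRIDES):
    """Return a new config with the overrides applied on top of base."""
    return {**base, **overrides}


if __name__ == "__main__":
    for key, value in low_memory_config(BASE_CONFIG).items():
        print(f"{key}: {value}")
```

Building a new dict leaves the base config untouched, so the same base can be reused for a full-size run on a machine with more memory.
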

If a model fails to load, check that you are using a valid mlx-community model ID. Models are downloaded automatically from HuggingFace on first use, so the first load needs a network connection.
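
A quick local check can catch typos in a model ID before a download is attempted. This is only a sketch: the helper `looks_like_mlx_community_id` is hypothetical, and it checks the shape of the ID (`mlx-community/<name>`), not whether the repository exists.

```python
import re

# Shape check only: "mlx-community/" followed by a non-empty repository name.
_MODEL_ID = re.compile(r"^mlx-community/[A-Za-z0-9][A-Za-z0-9._-]*$")


def looks_like_mlx_community_id(model_id):
    """Return True if model_id has the form mlx-community/<name>."""
    return bool(_MODEL_ID.match(model_id))


if __name__ == "__main__":
    for candidate in ["mlx-community/Qwen3-1.7B-4bit", "Qwen3-1.7B-4bit"]:
        print(candidate, looks_like_mlx_community_id(candidate))
```
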