SLM Agents
Build intelligent agents with small language models
TL;DR
Build agents with structured output (Pydantic schemas + low temperature 0.1-0.3), implement ReAct (Thought→Action→Observation loop), or use function calling with JSON format. Key: explicit prompts, limited tool sets, and retry logic for parsing. Qwen2.5 excels at tool calling.
Build intelligent, tool-using agents powered by small language models for local, privacy-preserving automation.
Project Overview
| Aspect | Details |
|---|---|
| Difficulty | Intermediate |
| Time | 6-8 hours |
| Prerequisites | Python, SLM basics, prompt engineering |
| Learning Outcomes | Tool calling, structured output, ReAct pattern, agent loops |
What You'll Learn
- Implement function calling with local SLMs
- Generate structured outputs using Pydantic
- Build ReAct (Reasoning + Acting) agents
- Create multi-step reasoning pipelines
- Design tool libraries for SLM agents
- Handle errors and edge cases gracefully
Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ SLM Agent Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ │
│ │ User Query │ │
│ └───────┬───────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ SLM AGENT │ │
│ │ ┌─────────────┐ ┌──────────────┐ ┌──────────────────┐ │ │
│ │ │ Local SLM │───►│Output Parser │───►│ Tool Executor │ │ │
│ │ └──────▲──────┘ └──────┬───────┘ └────────┬─────────┘ │ │
│ │ │ │ │ │ │
│ │ │ │ Final │ │ │
│ │ │ │ Answer ▼ │ │
│ │ ┌──────┴──────┐ │ ┌──────────────────┐ │ │
│ │ │Conversation │◄──────────┼────────────│ Tool Library │ │ │
│ │ │ Memory │ │ ├──────────────────┤ │ │
│ │ └─────────────┘ │ │ • Calculator │ │ │
│ │ │ │ • Web Search │ │ │
│ │ │ │ • Code Runner │ │ │
│ │ │ │ • File Ops │ │ │
│ │ │ │ • API Calls │ │ │
│ │ │ └──────────────────┘ │ │
│ └────────────────────────────┼─────────────────────────────────────┘ │
│ ▼ │
│ ┌───────────────┐ │
│ │ Response │ │
│ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
ReAct Loop: Thought ──► Action ──► Observation ──► (repeat until answer)
Project Setup
Dependencies
# Create project directory
mkdir slm-agents && cd slm-agents
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install ollama llama-cpp-python
pip install pydantic instructor
pip install fastapi uvicorn
pip install httpx aiohttp
pip install rich # For pretty console output
Model Setup
# Pull recommended models for agent tasks
# Qwen2.5 has excellent function calling capabilities
ollama pull qwen2.5:3b-instruct
# Phi-3 also works well for structured tasks
ollama pull phi3:mini
# For very constrained environments
ollama pull qwen2.5:0.5b-instruct
Part 1: Structured Output Generation
The foundation of SLM agents is reliable structured output.
# core/structured_output.py
"""
Structured output generation using Pydantic and local SLMs.
"""
import json
import re
from typing import TypeVar, Type, Optional, Any
from pydantic import BaseModel, Field, ValidationError
import ollama
T = TypeVar('T', bound=BaseModel)
class StructuredOutputGenerator:
"""
Generate structured outputs from SLMs using Pydantic schemas.
"""
def __init__(self, model: str = "qwen2.5:3b-instruct"):
self.model = model
self.client = ollama.Client()
def generate(
self,
prompt: str,
output_schema: Type[T],
max_retries: int = 3,
temperature: float = 0.1
) -> Optional[T]:
"""
Generate structured output matching the Pydantic schema.
Args:
prompt: The user prompt
output_schema: Pydantic model class
max_retries: Number of retry attempts
temperature: Sampling temperature (lower = more deterministic)
Returns:
Parsed Pydantic model instance or None
"""
# Generate JSON schema from Pydantic model
schema = output_schema.model_json_schema()
schema_str = json.dumps(schema, indent=2)
# Build the system prompt
system_prompt = f"""You are a helpful assistant that outputs JSON.
You must respond with valid JSON that matches this schema:
{schema_str}
Important:
- Output ONLY valid JSON, no other text
- Follow the schema exactly
- Use null for optional fields you can't fill
- Ensure all required fields are present"""
for attempt in range(max_retries):
try:
response = self.client.chat(
model=self.model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": prompt}
],
options={"temperature": temperature}
)
content = response["message"]["content"]
# Extract JSON from response
json_str = self._extract_json(content)
# Parse and validate
data = json.loads(json_str)
return output_schema.model_validate(data)
except (json.JSONDecodeError, ValidationError) as e:
if attempt < max_retries - 1:
print(f"Attempt {attempt + 1} failed: {e}. Retrying...")
temperature += 0.1 # Slightly increase temperature
else:
print(f"All attempts failed: {e}")
return None
return None
def _extract_json(self, text: str) -> str:
"""Extract JSON from model response."""
# Try to find JSON in code blocks
code_block_match = re.search(r'```(?:json)?\s*([\s\S]*?)\s*```', text)
if code_block_match:
return code_block_match.group(1)
# Try to find raw JSON object
json_match = re.search(r'\{[\s\S]*\}', text)
if json_match:
return json_match.group(0)
# Return as-is if nothing found
return text.strip()
# Example schemas for common agent tasks
class ToolCall(BaseModel):
"""Represents a tool call decision."""
tool_name: str = Field(description="Name of the tool to call")
arguments: dict = Field(description="Arguments to pass to the tool")
reasoning: str = Field(description="Why this tool is being called")
class ThoughtAction(BaseModel):
"""ReAct-style thought and action."""
thought: str = Field(description="Reasoning about what to do next")
action: str = Field(description="The action to take: 'tool' or 'answer'")
tool_name: Optional[str] = Field(default=None, description="Tool to call if action is 'tool'")
tool_args: Optional[dict] = Field(default=None, description="Arguments for the tool")
final_answer: Optional[str] = Field(default=None, description="Final answer if action is 'answer'")
class TaskDecomposition(BaseModel):
"""Break down a complex task into steps."""
task: str = Field(description="The original task")
steps: list[str] = Field(description="Ordered list of steps to complete the task")
complexity: str = Field(description="Complexity level: simple, medium, complex")
# Example usage
if __name__ == "__main__":
generator = StructuredOutputGenerator()
# Test task decomposition
result = generator.generate(
"How do I make a cup of coffee?",
TaskDecomposition
)
if result:
print(f"Task: {result.task}")
print(f"Complexity: {result.complexity}")
print("Steps:")
for i, step in enumerate(result.steps, 1):
print(f"  {i}. {step}")
★ Insight ─────────────────────────────────────
JSON Schema Prompting: By including the Pydantic JSON schema directly in the prompt, SLMs can reliably generate structured output. The key is using low temperature (0.1-0.3) for consistency and implementing retry logic for edge cases. Models like Qwen2.5 are particularly good at this pattern.
─────────────────────────────────────────────────
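The JSON-extraction fallback chain is worth exercising in isolation, since it runs before any validation. A minimal standalone sketch using the same regexes as `_extract_json` (no model required; the fence string is built with `chr(96)` purely so this example stays self-contained):

```python
import re

FENCE = chr(96) * 3  # a run of three backticks

def extract_json(text: str) -> str:
    """Mirror of the fallback chain in _extract_json."""
    # 1. Prefer JSON inside a fenced code block
    m = re.search(FENCE + r'(?:json)?\s*([\s\S]*?)\s*' + FENCE, text)
    if m:
        return m.group(1)
    # 2. Fall back to the first {...} span in the text
    m = re.search(r'\{[\s\S]*\}', text)
    if m:
        return m.group(0)
    # 3. Give up: return the text as-is
    return text.strip()

fenced = "Here you go:\n" + FENCE + 'json\n{"a": 1}\n' + FENCE
chatty = 'Sure! The result is {"a": 1} as requested.'
```

Both the fenced and the chatty response yield the same `{"a": 1}` payload, which is exactly why the fallback order matters: models sometimes wrap JSON in prose even when told not to.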
Part 2: Tool Library Design
Create a flexible tool system for SLM agents.
# tools/base.py
"""
Tool infrastructure for SLM agents.
"""
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional
from dataclasses import dataclass, field
from pydantic import BaseModel, Field
import json
@dataclass
class ToolParameter:
"""Describes a tool parameter."""
name: str
type: str
description: str
required: bool = True
default: Any = None
@dataclass
class Tool:
"""A callable tool for the agent."""
name: str
description: str
parameters: list[ToolParameter]
func: Callable
category: str = "general"
def get_schema(self) -> dict:
"""Get JSON schema for this tool."""
properties = {}
required = []
for param in self.parameters:
properties[param.name] = {
"type": param.type,
"description": param.description
}
if param.required:
required.append(param.name)
return {
"name": self.name,
"description": self.description,
"parameters": {
"type": "object",
"properties": properties,
"required": required
}
}
def call(self, **kwargs) -> Any:
"""Execute the tool with given arguments."""
return self.func(**kwargs)
class ToolRegistry:
"""Registry of available tools."""
def __init__(self):
self.tools: dict[str, Tool] = {}
def register(self, tool: Tool):
"""Register a tool."""
self.tools[tool.name] = tool
def get(self, name: str) -> Optional[Tool]:
"""Get a tool by name."""
return self.tools.get(name)
def list_tools(self) -> list[str]:
"""List all registered tool names."""
return list(self.tools.keys())
def get_tools_prompt(self) -> str:
"""Generate a prompt describing all tools."""
tool_descriptions = []
for name, tool in self.tools.items():
params_desc = ", ".join(
f"{p.name}: {p.type}" for p in tool.parameters
)
tool_descriptions.append(
f"- {name}({params_desc}): {tool.description}"
)
return "\n".join(tool_descriptions)
def get_tools_schema(self) -> list[dict]:
"""Get JSON schemas for all tools."""
return [tool.get_schema() for tool in self.tools.values()]
# Built-in tools
def create_calculator_tool() -> Tool:
"""Create a calculator tool."""
def calculate(expression: str) -> str:
"""Safely evaluate a mathematical expression."""
try:
# Whitelist characters; note eval is never fully safe, so prefer a dedicated expression parser in production
allowed_chars = set("0123456789+-*/().^ ")
if not all(c in allowed_chars for c in expression):
return "Error: Invalid characters in expression"
# Replace ^ with ** for exponentiation
expression = expression.replace("^", "**")
result = eval(expression, {"__builtins__": {}}, {})
return str(result)
except Exception as e:
return f"Error: {str(e)}"
return Tool(
name="calculator",
description="Evaluate mathematical expressions. Supports +, -, *, /, ^, and parentheses.",
parameters=[
ToolParameter(
name="expression",
type="string",
description="The mathematical expression to evaluate"
)
],
func=calculate,
category="math"
)
def create_datetime_tool() -> Tool:
"""Create a datetime tool."""
from datetime import datetime
def get_datetime(format: str = "%Y-%m-%d %H:%M:%S") -> str:
"""Get current date and time."""
return datetime.now().strftime(format)
return Tool(
name="get_datetime",
description="Get the current date and time.",
parameters=[
ToolParameter(
name="format",
type="string",
description="Date format string (default: %Y-%m-%d %H:%M:%S)",
required=False,
default="%Y-%m-%d %H:%M:%S"
)
],
func=get_datetime,
category="utility"
)
def create_web_search_tool() -> Tool:
"""Create a web search tool (mock for demo)."""
def web_search(query: str, num_results: int = 3) -> str:
"""Search the web for information."""
# In production, integrate with a real search API
# This is a mock for demonstration
return json.dumps({
"query": query,
"results": [
{"title": f"Result {i+1} for: {query}", "snippet": f"Information about {query}..."}
for i in range(num_results)
],
"note": "This is mock data. Integrate with a real search API for production."
})
return Tool(
name="web_search",
description="Search the web for information on a topic.",
parameters=[
ToolParameter(
name="query",
type="string",
description="The search query"
),
ToolParameter(
name="num_results",
type="integer",
description="Number of results to return",
required=False,
default=3
)
],
func=web_search,
category="search"
)
def create_code_runner_tool() -> Tool:
"""Create a Python code execution tool."""
import sys
from io import StringIO
def run_python(code: str) -> str:
"""Execute Python code and return output."""
# Capture stdout
old_stdout = sys.stdout
sys.stdout = captured_output = StringIO()
try:
# Restrict builtins; exec "sandboxes" are bypassable, so use a subprocess or container for real isolation
safe_builtins = {
"print": print,
"len": len,
"range": range,
"int": int,
"float": float,
"str": str,
"list": list,
"dict": dict,
"sum": sum,
"min": min,
"max": max,
"sorted": sorted,
"enumerate": enumerate,
"zip": zip,
}
exec(code, {"__builtins__": safe_builtins}, {})
output = captured_output.getvalue()
return output if output else "Code executed successfully (no output)"
except Exception as e:
return f"Error: {str(e)}"
finally:
sys.stdout = old_stdout
return Tool(
name="run_python",
description="Execute Python code and return the output. Limited to safe operations.",
parameters=[
ToolParameter(
name="code",
type="string",
description="Python code to execute"
)
],
func=run_python,
category="code"
)
def create_default_registry() -> ToolRegistry:
"""Create a registry with default tools."""
registry = ToolRegistry()
registry.register(create_calculator_tool())
registry.register(create_datetime_tool())
registry.register(create_web_search_tool())
registry.register(create_code_runner_tool())
return registry
if __name__ == "__main__":
# Test tools
registry = create_default_registry()
print("Available tools:")
print(registry.get_tools_prompt())
print("\nTesting calculator:")
calc = registry.get("calculator")
print(calc.call(expression="2 + 3 * 4"))
print("\nTesting datetime:")
dt = registry.get("get_datetime")
print(dt.call())
print("\nTesting code runner:")
code = registry.get("run_python")
print(code.call(code="print('Hello from SLM agent!')"))
Understanding the Tool Library Design:
┌─────────────────────────────────────────────────────────────────────────────┐
│ TOOL REGISTRY ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ ToolRegistry │ │
│ │ tools: dict[str, Tool] │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ "calculator"│ │"get_datetime"│ │ "web_search"│ │ "run_python"│ │ │
│ │ │ Tool(...) │ │ Tool(...) │ │ Tool(...) │ │ Tool(...) │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ get_tools_prompt() ────► Human-readable tool list for prompts │ │
│ │ get_tools_schema() ────► JSON schemas for function calling │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Why This Design Matters for SLMs:
| Design Choice | Benefit for SLMs |
|---|---|
| Simple schema format | Easier for small models to parse |
| Category field | Can filter tools by context (math vs search) |
| get_tools_prompt() | Generates concise text-based tool list |
| Safe builtins only | Restricted execution reduces (but does not eliminate) security risk |
| Explicit parameters | SLMs need clear parameter descriptions |
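The calculator's whitelist-then-eval pattern can be tested without the rest of the tool machinery; a minimal standalone sketch of the same logic:

```python
# Character whitelist: digits, operators, parentheses, caret, space
ALLOWED = set("0123456789+-*/().^ ")

def safe_calculate(expression: str) -> str:
    # Reject anything outside the whitelist; this blocks names
    # like __import__ or open before eval ever sees them
    if not all(c in ALLOWED for c in expression):
        return "Error: Invalid characters in expression"
    try:
        # Users write ^ for exponentiation; Python uses **
        result = eval(expression.replace("^", "**"), {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"
```

Because the whitelist contains no letters or underscores, any attempt to smuggle in a name fails before evaluation; only arithmetic survives the filter.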
Tool Safety Pattern:
┌────────────────────────────────────────────┐
│ Calculator: Safe eval() │
│ │
│ allowed_chars = "0123456789+-*/().^ " │
│ │
│ ✓ "2 + 3 * 4" → 14 │
│ ✓ "(10 + 5) ^ 2" → 225 │
│ ✗ "__import__('os')" → Error: Invalid │
│ ✗ "open('file')" → Error: Invalid │
└────────────────────────────────────────────┘
Part 3: ReAct Agent Implementation
Build a ReAct (Reasoning + Acting) agent.
# agents/react_agent.py
"""
ReAct agent implementation using local SLMs.
"""
import json
import re
from typing import Optional, Generator
from dataclasses import dataclass
from pydantic import BaseModel, Field
import ollama
from tools.base import ToolRegistry, create_default_registry
@dataclass
class AgentStep:
"""A single step in the agent's execution."""
step_num: int
thought: str
action: str
action_input: Optional[dict]
observation: Optional[str]
is_final: bool = False
class ReActAgent:
"""
ReAct agent that reasons and acts iteratively.
Uses the Thought -> Action -> Observation loop.
"""
SYSTEM_PROMPT = """You are a helpful AI assistant that can use tools to answer questions.
Available tools:
{tools}
To use a tool, respond in this EXACT format:
Thought: [Your reasoning about what to do]
Action: [tool_name]
Action Input: {{"param1": "value1", "param2": "value2"}}
When you have enough information to answer, respond:
Thought: [Your final reasoning]
Action: answer
Action Input: {{"response": "Your final answer here"}}
Important rules:
1. Always start with a Thought
2. Use exactly one Action per response
3. Action Input must be valid JSON
4. Only use the tools listed above
5. When you have the answer, use action "answer"
Previous conversation:
{history}
Now respond to the user's query."""
def __init__(
self,
model: str = "qwen2.5:3b-instruct",
tools: ToolRegistry = None,
max_steps: int = 10,
verbose: bool = True
):
self.model = model
self.tools = tools or create_default_registry()
self.max_steps = max_steps
self.verbose = verbose
self.client = ollama.Client()
def run(self, query: str) -> Generator[AgentStep, None, str]:
"""
Run the agent on a query, yielding steps.
Args:
query: User query
Yields:
AgentStep objects for each step
Returns:
Final answer string
"""
history = []
step_num = 0
while step_num < self.max_steps:
step_num += 1
# Build prompt
system = self.SYSTEM_PROMPT.format(
tools=self.tools.get_tools_prompt(),
history=self._format_history(history)
)
# Get response from model
response = self.client.chat(
model=self.model,
messages=[
{"role": "system", "content": system},
{"role": "user", "content": query}
],
options={"temperature": 0.1}
)
content = response["message"]["content"]
# Parse the response
thought, action, action_input = self._parse_response(content)
if self.verbose:
print(f"\n--- Step {step_num} ---")
print(f"Thought: {thought}")
print(f"Action: {action}")
print(f"Action Input: {action_input}")
# Check if this is the final answer
if action.lower() == "answer":
final_answer = action_input.get("response", str(action_input))
step = AgentStep(
step_num=step_num,
thought=thought,
action=action,
action_input=action_input,
observation=None,
is_final=True
)
yield step
return final_answer
# Execute the tool
observation = self._execute_tool(action, action_input)
if self.verbose:
print(f"Observation: {observation}")
# Create step
step = AgentStep(
step_num=step_num,
thought=thought,
action=action,
action_input=action_input,
observation=observation
)
yield step
# Add to history
history.append({
"thought": thought,
"action": action,
"action_input": action_input,
"observation": observation
})
# Max steps reached
return "I was unable to find a complete answer within the step limit."
def _parse_response(self, content: str) -> tuple[str, str, dict]:
"""Parse thought, action, and action input from response."""
thought = ""
action = ""
action_input = {}
# Extract thought
thought_match = re.search(r'Thought:\s*(.+?)(?=Action:|$)', content, re.DOTALL)
if thought_match:
thought = thought_match.group(1).strip()
# Extract action
action_match = re.search(r'Action:\s*(\w+)', content)
if action_match:
action = action_match.group(1).strip()
# Extract action input
input_match = re.search(r'Action Input:\s*(\{.*?\})', content, re.DOTALL)
if input_match:
try:
action_input = json.loads(input_match.group(1))
except json.JSONDecodeError:
# Try to fix common JSON issues
json_str = input_match.group(1)
json_str = re.sub(r"'", '"', json_str) # Replace single quotes
try:
action_input = json.loads(json_str)
except json.JSONDecodeError:
pass
return thought, action, action_input
def _execute_tool(self, action: str, action_input: dict) -> str:
"""Execute a tool and return the observation."""
tool = self.tools.get(action)
if tool is None:
return f"Error: Unknown tool '{action}'. Available tools: {self.tools.list_tools()}"
try:
result = tool.call(**action_input)
return str(result)
except Exception as e:
return f"Error executing tool: {str(e)}"
def _format_history(self, history: list[dict]) -> str:
"""Format conversation history."""
if not history:
return "No previous steps."
formatted = []
for i, step in enumerate(history, 1):
formatted.append(f"""Step {i}:
Thought: {step['thought']}
Action: {step['action']}
Action Input: {json.dumps(step['action_input'])}
Observation: {step['observation']}""")
return "\n\n".join(formatted)
# Example usage
if __name__ == "__main__":
from rich.console import Console
from rich.panel import Panel
from rich.markdown import Markdown
console = Console()
agent = ReActAgent(verbose=False)
queries = [
"What is 25 * 48 + 100?",
"What is the current date and time?",
"Calculate the sum of squares from 1 to 5 using Python code."
]
for query in queries:
console.print(Panel(f"[bold blue]Query:[/bold blue] {query}"))
final_answer = None
for step in agent.run(query):
console.print(f"\n[yellow]Step {step.step_num}[/yellow]")
console.print(f"[dim]Thought:[/dim] {step.thought}")
console.print(f"[dim]Action:[/dim] {step.action}")
if step.observation:
console.print(f"[dim]Observation:[/dim] {step.observation}")
if step.is_final:
final_answer = step.action_input.get("response", str(step.action_input))
if final_answer:
console.print(Panel(f"[bold green]Answer:[/bold green] {final_answer}"))
console.print("\n" + "="*50 + "\n")
★ Insight ─────────────────────────────────────
ReAct Pattern with SLMs: The Thought-Action-Observation loop works well with SLMs when prompts are explicit about the expected format. Key tricks: (1) use low temperature for consistent parsing, (2) provide clear examples in the system prompt, (3) validate JSON with retry logic, (4) limit the tool set to reduce confusion.
─────────────────────────────────────────────────
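The most fragile part of the loop is parsing the model's text, and the extraction regexes can be tested without a model. A standalone sketch using the same patterns as `_parse_response`:

```python
import json
import re

def parse_react(content: str) -> tuple[str, str, dict]:
    # Thought: everything up to the next "Action:" label (or end of text)
    thought_m = re.search(r'Thought:\s*(.+?)(?=Action:|$)', content, re.DOTALL)
    # Action: a single word ("Action Input:" does not match, its colon differs)
    action_m = re.search(r'Action:\s*(\w+)', content)
    # Action Input: first {...} span after the label
    input_m = re.search(r'Action Input:\s*(\{.*?\})', content, re.DOTALL)
    args: dict = {}
    if input_m:
        try:
            args = json.loads(input_m.group(1))
        except json.JSONDecodeError:
            pass  # the full agent applies quote repairs here
    return (
        thought_m.group(1).strip() if thought_m else "",
        action_m.group(1) if action_m else "",
        args,
    )

sample = (
    "Thought: I should compute this.\n"
    "Action: calculator\n"
    'Action Input: {"expression": "25 * 48 + 100"}'
)
```

One caveat of the non-greedy `\{.*?\}` pattern: it stops at the first closing brace, so deeply nested argument objects can be truncated; keeping tool arguments flat avoids this.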
Part 4: Function Calling Agent
A cleaner approach using function calling style.
# agents/function_calling_agent.py
"""
Function calling agent using structured output.
"""
import json
from typing import Optional, Any
from pydantic import BaseModel, Field
from enum import Enum
import ollama
from tools.base import ToolRegistry, create_default_registry
class ActionType(str, Enum):
TOOL = "tool"
ANSWER = "answer"
class FunctionCall(BaseModel):
"""A function call decision."""
reasoning: str = Field(description="Step-by-step reasoning for this decision")
action: ActionType = Field(description="Whether to call a tool or provide final answer")
function_name: Optional[str] = Field(default=None, description="Name of function to call")
arguments: Optional[dict] = Field(default=None, description="Arguments for the function")
answer: Optional[str] = Field(default=None, description="Final answer if action is 'answer'")
class FunctionCallingAgent:
"""
Agent that uses structured function calling.
"""
SYSTEM_PROMPT = """You are a helpful AI assistant that answers questions using available tools.
Available functions:
{functions}
For each user query, you must:
1. Reason step-by-step about how to answer
2. Decide whether to call a function or provide a final answer
3. If calling a function, specify the function name and arguments
4. If answering, provide a complete answer based on gathered information
Respond with a JSON object containing:
- reasoning: Your step-by-step thought process
- action: Either "tool" or "answer"
- function_name: Name of function to call (if action is "tool")
- arguments: Function arguments as object (if action is "tool")
- answer: Your final answer (if action is "answer")
Previous function calls and results:
{history}
Respond ONLY with valid JSON."""
def __init__(
self,
model: str = "qwen2.5:3b-instruct",
tools: ToolRegistry = None,
max_iterations: int = 10
):
self.model = model
self.tools = tools or create_default_registry()
self.max_iterations = max_iterations
self.client = ollama.Client()
def run(self, query: str) -> str:
"""
Run the agent on a query.
Args:
query: User query
Returns:
Final answer string
"""
history = []
iteration = 0
while iteration < self.max_iterations:
iteration += 1
# Build prompt
system = self.SYSTEM_PROMPT.format(
functions=self._format_functions(),
history=self._format_history(history)
)
# Get response
response = self.client.chat(
model=self.model,
messages=[
{"role": "system", "content": system},
{"role": "user", "content": query}
],
options={"temperature": 0.1},
format="json"
)
content = response["message"]["content"]
# Parse response
try:
data = json.loads(content)
call = FunctionCall.model_validate(data)
except Exception as e:
print(f"Parse error: {e}")
continue
print(f"\n[Iteration {iteration}]")
print(f"Reasoning: {call.reasoning}")
print(f"Action: {call.action}")
# Check for final answer
if call.action == ActionType.ANSWER:
return call.answer or "No answer provided"
# Execute function
if call.function_name and call.arguments is not None:
result = self._execute_function(call.function_name, call.arguments)
print(f"Function: {call.function_name}")
print(f"Arguments: {call.arguments}")
print(f"Result: {result}")
history.append({
"function": call.function_name,
"arguments": call.arguments,
"result": result
})
return "Unable to complete the task within iteration limit."
def _format_functions(self) -> str:
"""Format available functions."""
schemas = self.tools.get_tools_schema()
return json.dumps(schemas, indent=2)
def _format_history(self, history: list[dict]) -> str:
"""Format function call history."""
if not history:
return "No previous function calls."
formatted = []
for i, h in enumerate(history, 1):
formatted.append(
f"{i}. {h['function']}({json.dumps(h['arguments'])}) -> {h['result']}"
)
return "\n".join(formatted)
def _execute_function(self, name: str, arguments: dict) -> str:
"""Execute a function."""
tool = self.tools.get(name)
if not tool:
return f"Error: Unknown function '{name}'"
try:
result = tool.call(**arguments)
return str(result)
except Exception as e:
return f"Error: {str(e)}"
if __name__ == "__main__":
agent = FunctionCallingAgent()
# Test queries
queries = [
"What is 15% of 850?",
"Write Python code to find the factorial of 6 and run it.",
"What time is it right now?"
]
for query in queries:
print(f"\n{'='*60}")
print(f"Query: {query}")
print("="*60)
answer = agent.run(query)
print(f"\nFinal Answer: {answer}")
Understanding Function Calling vs ReAct:
┌─────────────────────────────────────────────────────────────────────────────┐
│ ReAct vs Function Calling Comparison │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ReAct (Text-Based): Function Calling (JSON-Based): │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Thought: I need to │ │ { │ │
│ │ calculate 15% of 850│ │ "reasoning": "...",│ │
│ │ │ │ "action": "tool", │ │
│ │ Action: calculator │ │ "function_name": │ │
│ │ Action Input: │ │ "calculator", │ │
│ │ {"expression": │ │ "arguments": { │ │
│ │ "850 * 0.15"} │ │ "expression": │ │
│ └─────────────────────┘ │ "850 * 0.15" │ │
│ │ } │ │
│ • Regex parsing required │ } │ │
│ • More error-prone │ │ │
│ • Better for explanation │ • Native JSON parsing│ │
│ │ • More structured │ │
│ │ • Pydantic validation│ │
│ └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Key Differences:
| Aspect | ReAct | Function Calling |
|---|---|---|
| Parsing | Regex-based extraction | Native JSON parsing |
| Validation | Manual checking | Pydantic model validation |
| Error Handling | Multiple fallback attempts | Single parse with clear errors |
| Format Mode | Standard text | format="json" in Ollama |
| Model Support | Any model | Works best with Qwen2.5 |
| Verbosity | More text in output | Compact JSON |
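The "multiple fallback attempts" in the Error Handling row refers to repairs like the single-quote fix used in the ReAct parser; the same idea works as a standalone lenient parser (a minimal sketch, not the library's API):

```python
import json
import re

def parse_json_lenient(raw: str) -> dict:
    # First try strict JSON
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Common SLM slip: single quotes instead of double quotes
    repaired = re.sub(r"'", '"', raw)
    return json.loads(repaired)
```

The blanket quote substitution is crude (it would corrupt values containing apostrophes), but for short tool-argument objects it recovers a large share of otherwise-lost responses.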
Why ActionType Enum?
class ActionType(str, Enum):
TOOL = "tool"
ANSWER = "answer"
Using an enum instead of raw strings:
- Pydantic validates the value automatically
- IDE autocomplete for action types
- Type-safe comparisons in code
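Because `ActionType` subclasses `str`, members compare equal to the raw strings the model emits, while invalid values fail loudly; a quick sketch:

```python
from enum import Enum

class ActionType(str, Enum):
    TOOL = "tool"
    ANSWER = "answer"

# Coercion from the model's raw output
parsed = ActionType("answer")

# str mixin: equality with plain strings still works
assert parsed is ActionType.ANSWER
assert parsed == "answer"

# Invalid values raise instead of slipping through silently
try:
    ActionType("reply")
    unreachable = True
except ValueError:
    unreachable = False
```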
Part 5: Multi-Step Planning Agent
Handle complex tasks with planning.
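The core execution idea is simple: run steps in order, handing each one the outputs of the steps it lists in `depends_on`. It can be sketched without any model calls (the step dicts here are hypothetical stand-ins for the `Step` model):

```python
# Hypothetical plan: ids and depends_on mirror the Step schema
steps = [
    {"id": 1, "description": "gather data", "depends_on": []},
    {"id": 2, "description": "summarize", "depends_on": [1]},
    {"id": 3, "description": "final answer", "depends_on": [1, 2]},
]

results: dict[int, str] = {}
for step in steps:
    # Collect only the outputs this step declared as dependencies
    dep_outputs = {d: results[d] for d in step["depends_on"] if d in results}
    # A real agent would prompt the SLM here; we fake the output
    results[step["id"]] = f"output-{step['id']} (saw deps {sorted(dep_outputs)})"

final = results[steps[-1]["id"]]
```

Since the plan is executed strictly in id order, a dependency is always computed before it is consumed; an out-of-order plan would simply surface an empty `dep_outputs` rather than crash.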
# agents/planning_agent.py
"""
Planning agent that breaks down complex tasks.
"""
import json
from typing import Optional
from pydantic import BaseModel, Field
import ollama
from tools.base import ToolRegistry, create_default_registry
class Step(BaseModel):
"""A single step in a plan."""
id: int = Field(description="Step number")
description: str = Field(description="What this step does")
tool: Optional[str] = Field(default=None, description="Tool to use, if any")
depends_on: list[int] = Field(default_factory=list, description="Step IDs this depends on")
class Plan(BaseModel):
"""A plan to accomplish a task."""
goal: str = Field(description="The overall goal")
steps: list[Step] = Field(description="Ordered steps to achieve the goal")
reasoning: str = Field(description="Why this plan makes sense")
class StepResult(BaseModel):
"""Result of executing a step."""
step_id: int
success: bool
output: str
needs_replanning: bool = False
class PlanningAgent:
"""
Agent that creates and executes plans.
"""
PLANNING_PROMPT = """You are a planning AI that breaks down tasks into steps.
Available tools:
{tools}
Given a task, create a plan with these properties:
1. Break the task into small, executable steps
2. Each step should use at most one tool
3. Steps can depend on previous steps
4. Include a final step that synthesizes the answer
Respond with a JSON object:
{{
"goal": "The overall task",
"steps": [
{{"id": 1, "description": "What to do", "tool": "tool_name or null", "depends_on": []}},
...
],
"reasoning": "Why this plan works"
}}
Task: {task}
Respond ONLY with valid JSON."""
EXECUTION_PROMPT = """You are executing step {step_id} of a plan.
Goal: {goal}
Current Step: {step_description}
Tool to use: {tool}
Previous results:
{previous_results}
{tool_instruction}
Respond with a JSON object:
{{
"reasoning": "Your thought process",
"tool_call": {{"name": "tool_name", "arguments": {{}}}}, // or null if no tool needed
"output": "The result or answer for this step"
}}
Respond ONLY with valid JSON."""
def __init__(
self,
model: str = "qwen2.5:3b-instruct",
tools: ToolRegistry = None
):
self.model = model
self.tools = tools or create_default_registry()
self.client = ollama.Client()
def run(self, task: str) -> str:
"""
Plan and execute a task.
Args:
task: The task to accomplish
Returns:
Final result string
"""
# Create plan
print("Creating plan...")
plan = self._create_plan(task)
if not plan:
return "Failed to create a plan for this task."
print(f"\nPlan for: {plan.goal}")
print(f"Reasoning: {plan.reasoning}")
print(f"\nSteps:")
for step in plan.steps:
deps = f" (depends on: {step.depends_on})" if step.depends_on else ""
tool = f" [using {step.tool}]" if step.tool else ""
print(f" {step.id}. {step.description}{tool}{deps}")
# Execute plan
print("\nExecuting plan...")
results = {}
for step in plan.steps:
# Check dependencies
dep_results = {
dep_id: results[dep_id].output
for dep_id in step.depends_on
if dep_id in results
}
# Execute step
result = self._execute_step(step, plan.goal, dep_results)
results[step.id] = result
status = "✓" if result.success else "✗"
print(f" {status} Step {step.id}: {result.output[:100]}...")
if result.needs_replanning:
print(" ⚠ Replanning needed (not implemented in this demo)")
# Return final result
final_step = plan.steps[-1]
return results[final_step.id].output
def _create_plan(self, task: str) -> Optional[Plan]:
"""Create a plan for the task."""
prompt = self.PLANNING_PROMPT.format(
tools=self.tools.get_tools_prompt(),
task=task
)
try:
response = self.client.chat(
model=self.model,
messages=[{"role": "user", "content": prompt}],
options={"temperature": 0.2},
format="json"
)
data = json.loads(response["message"]["content"])
return Plan.model_validate(data)
except Exception as e:
print(f"Planning error: {e}")
return None
def _execute_step(
self,
step: Step,
goal: str,
previous_results: dict[int, str]
) -> StepResult:
"""Execute a single step."""
# Format previous results
prev_str = "\n".join(
f"Step {sid}: {result}"
for sid, result in previous_results.items()
) if previous_results else "No previous results"
# Tool instruction
if step.tool:
tool = self.tools.get(step.tool)
if tool:
tool_instruction = f"Use the {step.tool} tool. Schema: {json.dumps(tool.get_schema())}"
else:
tool_instruction = f"Tool {step.tool} not found. Proceed without it."
else:
tool_instruction = "No tool needed for this step. Just reason and provide the output."
prompt = self.EXECUTION_PROMPT.format(
step_id=step.id,
goal=goal,
step_description=step.description,
tool=step.tool or "None",
previous_results=prev_str,
tool_instruction=tool_instruction
)
try:
response = self.client.chat(
model=self.model,
messages=[{"role": "user", "content": prompt}],
options={"temperature": 0.1},
format="json"
)
data = json.loads(response["message"]["content"])
# Execute tool if needed
output = data.get("output", "")
if step.tool and data.get("tool_call"):
tool_call = data["tool_call"]
tool = self.tools.get(tool_call["name"])
if tool:
tool_result = tool.call(**tool_call.get("arguments", {}))
output = f"{output}\nTool result: {tool_result}"
return StepResult(
step_id=step.id,
success=True,
output=output
)
except Exception as e:
return StepResult(
step_id=step.id,
success=False,
output=f"Error: {str(e)}",
needs_replanning=True
)
if __name__ == "__main__":
agent = PlanningAgent()
task = "Calculate the average of the squares of numbers 1 through 5"
print(f"Task: {task}")
print("="*60)
result = agent.run(task)
print("\n" + "="*60)
print(f"Final Result: {result}")
Understanding the Planning Agent Architecture:
┌─────────────────────────────────────────────────────────────────────────────┐
│ PLANNING AGENT EXECUTION │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Task: "Calculate average of squares from 1 to 5" │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ PLANNING PHASE │ │
│ │ │ │
│ │ Input: Task + Available Tools │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ Plan: │ │ │
│ │ │ Step 1: Square each number 1-5 [run_python] │ │ │
│ │ │ Step 2: Sum the squares [calculator] │ │ │
│ │ │ Step 3: Divide by 5 for average [calculator] │ │ │
│ │ │ Step 4: Format final answer [none] │ │ │
│ │ └────────────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ EXECUTION PHASE │ │
│ │ │ │
│ │ Step 1 ──► run_python: "print([x**2 for x in range(1,6)])" │ │
│ │ ◄── Result: "[1, 4, 9, 16, 25]" │ │
│ │ │ │
│ │ Step 2 ──► calculator: "1 + 4 + 9 + 16 + 25" │ │
│ │ ◄── Result: "55" │ │
│ │ │ │
│ │ Step 3 ──► calculator: "55 / 5" │ │
│ │ ◄── Result: "11.0" │ │
│ │ │ │
│ │ Step 4 ──► (no tool) Format: "The average is 11.0" │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Final Result: "The average of the squares │
│ of 1-5 is 11.0" │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Step Dependencies Explained:
┌─────────────────────────────────────────────┐
│ Step Dependencies Enable Parallelism │
│ │
│ Step 1: Get stock price depends_on: [] │
│ Step 2: Get exchange rate depends_on: [] │
│ Step 3: Convert currency depends_on: [1,2]│
│ │
│ Steps 1 and 2 can run in parallel! │
│ Step 3 waits for both to complete │
└─────────────────────────────────────────────┘
Step 1 ─────┐
├──► Step 3
Step 2 ─────┘
Planning vs ReAct Trade-offs:
| Aspect | Planning Agent | ReAct Agent |
|---|---|---|
| Upfront cost | Higher (creates full plan) | Lower (step-by-step) |
| Parallelism | Possible with dependencies | Sequential only |
| Failure recovery | needs_replanning flag | Natural loop continuation |
| Token usage | More efficient for complex tasks | More efficient for simple tasks |
| Explainability | Full plan visible upfront | Reasoning visible per step |
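The dependency structure above can be exploited for parallel execution. A minimal sketch (helper names are illustrative, and steps are plain dicts rather than the tutorial's `Step` model): group steps into levels where each level depends only on earlier levels, then run each level's steps concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def execution_levels(steps):
    """Group steps into levels; each level's steps depend only on earlier levels."""
    done, levels = set(), []
    remaining = list(steps)
    while remaining:
        ready = [s for s in remaining if set(s["depends_on"]) <= done]
        if not ready:
            raise ValueError("Cyclic or unsatisfiable dependencies")
        levels.append(ready)
        done |= {s["id"] for s in ready}
        remaining = [s for s in remaining if s["id"] not in done]
    return levels

def run_plan_parallel(steps, execute):
    """Run each level's steps concurrently; levels themselves run in order."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for level in execution_levels(steps):
            # Same-level steps never depend on each other, so sharing
            # `results` (read-only for their dependencies) is safe here.
            futures = {s["id"]: pool.submit(execute, s, results) for s in level}
            for sid, fut in futures.items():
                results[sid] = fut.result()
    return results
```

For the stock-price example above, steps 1 and 2 land in the first level and run in parallel; step 3 runs in a second level once both results are available.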
Part 6: Agent FastAPI Server
Deploy agents as a REST API.
# server/agent_server.py
"""
FastAPI server for SLM agents.
"""
import time
import asyncio
from typing import Optional, Literal
from contextlib import asynccontextmanager
from fastapi import FastAPI, HTTPException, BackgroundTasks
from fastapi.responses import StreamingResponse
from pydantic import BaseModel, Field
import ollama
from tools.base import ToolRegistry, create_default_registry
from agents.react_agent import ReActAgent
from agents.function_calling_agent import FunctionCallingAgent
from agents.planning_agent import PlanningAgent
# Request/Response models
class AgentRequest(BaseModel):
query: str = Field(..., description="The user's query")
agent_type: Literal["react", "function_calling", "planning"] = Field(
default="react",
description="Type of agent to use"
)
model: str = Field(default="qwen2.5:3b-instruct", description="Model to use")
max_steps: int = Field(default=10, le=20, description="Maximum agent steps")
class AgentResponse(BaseModel):
answer: str
steps_taken: int
execution_time_s: float
agent_type: str
model: str
class AgentStep(BaseModel):
step_num: int
thought: str
action: str
observation: Optional[str]
class StreamingAgentResponse(BaseModel):
type: Literal["step", "answer"]
data: dict
# Async agent wrapper
class AsyncReActAgent:
"""Async wrapper for ReActAgent."""
def __init__(self, model: str = "qwen2.5:3b-instruct", max_steps: int = 10):
self.model = model
self.max_steps = max_steps
self.tools = create_default_registry()
async def run(self, query: str):
"""Run agent asynchronously."""
# Use sync agent in thread pool
agent = ReActAgent(
model=self.model,
tools=self.tools,
max_steps=self.max_steps,
verbose=False
)
        loop = asyncio.get_running_loop()
steps = []
final_answer = None
def run_agent():
nonlocal final_answer
for step in agent.run(query):
steps.append(step)
if step.is_final:
final_answer = step.action_input.get("response", str(step.action_input))
return final_answer
await loop.run_in_executor(None, run_agent)
return steps, final_answer
async def run_stream(self, query: str):
"""Stream agent steps."""
agent = ReActAgent(
model=self.model,
tools=self.tools,
max_steps=self.max_steps,
verbose=False
)
        loop = asyncio.get_running_loop()
def get_steps():
results = []
for step in agent.run(query):
results.append(step)
return results
steps = await loop.run_in_executor(None, get_steps)
for step in steps:
yield step
# Global state
tools: Optional[ToolRegistry] = None
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Initialize tools on startup."""
global tools
tools = create_default_registry()
print("Agent server initialized with tools:", tools.list_tools())
yield
tools = None
app = FastAPI(
title="SLM Agent Server",
description="Local AI agents powered by small language models",
version="1.0.0",
lifespan=lifespan
)
@app.get("/health")
async def health_check():
"""Health check endpoint."""
return {
"status": "healthy",
"tools_available": tools.list_tools() if tools else []
}
@app.get("/tools")
async def list_tools():
"""List available tools."""
if not tools:
raise HTTPException(status_code=503, detail="Tools not initialized")
return {
"tools": [
{
"name": name,
"schema": tools.get(name).get_schema()
}
for name in tools.list_tools()
]
}
@app.post("/run", response_model=AgentResponse)
async def run_agent(request: AgentRequest):
"""Run an agent on a query."""
start_time = time.time()
try:
if request.agent_type == "react":
agent = AsyncReActAgent(
model=request.model,
max_steps=request.max_steps
)
steps, answer = await agent.run(request.query)
steps_taken = len(steps)
elif request.agent_type == "function_calling":
agent = FunctionCallingAgent(
model=request.model,
tools=tools,
max_iterations=request.max_steps
)
            loop = asyncio.get_running_loop()
            answer = await loop.run_in_executor(None, agent.run, request.query)
            steps_taken = request.max_steps  # upper bound; this agent does not report its step count
elif request.agent_type == "planning":
agent = PlanningAgent(
model=request.model,
tools=tools
)
            loop = asyncio.get_running_loop()
            answer = await loop.run_in_executor(None, agent.run, request.query)
            steps_taken = request.max_steps  # upper bound; this agent does not report its step count
else:
raise HTTPException(status_code=400, detail=f"Unknown agent type: {request.agent_type}")
return AgentResponse(
answer=answer or "No answer generated",
steps_taken=steps_taken,
execution_time_s=time.time() - start_time,
agent_type=request.agent_type,
model=request.model
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.post("/run/stream")
async def run_agent_stream(request: AgentRequest):
"""Run agent with streaming output."""
if request.agent_type != "react":
raise HTTPException(
status_code=400,
detail="Streaming only supported for 'react' agent"
)
agent = AsyncReActAgent(
model=request.model,
max_steps=request.max_steps
)
    async def generate():
        import json  # serialize each SSE frame as JSON, not a Python dict repr
        async for step in agent.run_stream(request.query):
            data = {
                "type": "step" if not step.is_final else "answer",
                "data": {
                    "step_num": step.step_num,
                    "thought": step.thought,
                    "action": step.action,
                    "observation": step.observation
                }
            }
            if step.is_final:
                data["data"]["answer"] = step.action_input.get("response", "")
            yield f"data: {json.dumps(data)}\n\n"
return StreamingResponse(
generate(),
media_type="text/event-stream"
)
@app.post("/tools/{tool_name}")
async def call_tool(tool_name: str, arguments: dict):
    """Directly call a tool."""
    if not tools:
        raise HTTPException(status_code=503, detail="Tools not initialized")
    tool = tools.get(tool_name)
    if not tool:
        raise HTTPException(status_code=404, detail=f"Tool not found: {tool_name}")
try:
result = tool.call(**arguments)
return {"tool": tool_name, "result": result}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
if __name__ == "__main__":
import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
Docker Configuration
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Ollama
RUN curl -fsSL https://ollama.ai/install.sh | sh
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Expose port
EXPOSE 8000
# Start script
COPY start.sh /start.sh
RUN chmod +x /start.sh
CMD ["/start.sh"]
#!/bin/bash
# start.sh
# Start Ollama in background
ollama serve &
# Wait for Ollama to be ready (poll the API rather than sleeping blindly)
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  sleep 1
done
# Pull the model
ollama pull qwen2.5:3b-instruct
# Start the server
python server/agent_server.py
# docker-compose.yml
version: '3.8'
services:
slm-agent:
build: .
ports:
- "8000:8000"
volumes:
- ollama-data:/root/.ollama
environment:
- OLLAMA_HOST=0.0.0.0
deploy:
resources:
limits:
memory: 8G
volumes:
  ollama-data:
Exercises
Exercise 1: Custom Tool Creation
Create a custom tool that:
- Fetches weather data (mock or real API)
- Has proper parameter validation
- Handles errors gracefully
Then test it with the ReAct agent.
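A possible starting point: a mock weather tool, sketched here as a standalone class so it runs on its own (in the full project it would subclass the tutorial's `Tool` base and register with the `ToolRegistry`). The schema shape mirrors the ones used throughout this guide; the canned data stands in for a real API call.

```python
class WeatherTool:
    """Mock weather lookup; swap the canned data for a real API call."""
    name = "get_weather"
    description = "Get current weather for a city"

    # Canned responses standing in for a real weather API
    _MOCK = {
        "london": {"temp_c": 12, "condition": "cloudy"},
        "tokyo": {"temp_c": 22, "condition": "clear"},
    }

    def get_schema(self) -> dict:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string", "description": "City name"}},
                "required": ["city"],
            },
        }

    def call(self, city: str = "") -> str:
        # Validate parameters before doing any work
        if not isinstance(city, str) or not city.strip():
            return "Error: 'city' must be a non-empty string"
        data = self._MOCK.get(city.strip().lower())
        if data is None:
            return f"Error: no weather data for '{city}'"
        return f"{city}: {data['temp_c']}°C, {data['condition']}"
```

Returning error strings (rather than raising) matches the agent loop's expectation that every observation is plain text the model can reason about.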
Exercise 2: Conversation Memory
Extend the agents to:
- Maintain conversation history across queries
- Reference previous answers
- Handle follow-up questions
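One way to start this exercise: a rolling message buffer that the agent prepends to every model call, so follow-up questions can reference earlier answers (names are illustrative; a fuller version might summarize old turns instead of dropping them).

```python
class ConversationMemory:
    """Rolling chat history with a simple turn budget."""

    def __init__(self, max_turns: int = 10):
        self.max_turns = max_turns
        self.messages: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Keep only the most recent turns (one turn = user + assistant message)
        excess = len(self.messages) - 2 * self.max_turns
        if excess > 0:
            self.messages = self.messages[excess:]

    def as_context(self) -> list[dict]:
        """Messages to prepend to the next model call."""
        return list(self.messages)
```

The agent would call `memory.add("user", query)` before each run and `memory.add("assistant", answer)` after, passing `memory.as_context()` as the message history.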
Exercise 3: Tool Chain Optimization
Implement a system that:
- Detects when tools can be called in parallel
- Caches repeated tool calls
- Measures and optimizes tool execution time
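The caching part of this exercise can be sketched as a wrapper around any tool's `call` function (illustrative helper; a production version would also bound the cache size):

```python
import json
import time

def cached_tool(call, ttl_s: float = 60.0):
    """Wrap a tool's call function with a TTL cache keyed on its arguments."""
    cache: dict[str, tuple[float, str]] = {}

    def wrapper(**kwargs) -> str:
        key = json.dumps(kwargs, sort_keys=True)  # stable key for identical calls
        hit = cache.get(key)
        if hit and time.monotonic() - hit[0] < ttl_s:
            return hit[1]  # cache hit: skip the real tool call
        result = call(**kwargs)
        cache[key] = (time.monotonic(), result)
        return result

    wrapper.cache = cache  # exposed for inspection/metrics
    return wrapper
```

Wrapping a tool is then a one-liner, e.g. `tool.call = cached_tool(tool.call)`; repeated identical calls within the TTL return instantly.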
Exercise 4: Agent Evaluation
Build an evaluation framework that:
- Tests agents on a benchmark dataset
- Measures success rate, steps taken, and time
- Compares different agent types and models
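A minimal harness for this exercise, assuming the agent under test is exposed as a function returning `(answer, steps_taken)` — all names here are illustrative:

```python
import time
from dataclasses import dataclass

@dataclass
class EvalResult:
    success_rate: float
    avg_steps: float
    avg_time_s: float

def evaluate_agent(run_fn, cases, check=None):
    """run_fn(query) -> (answer, steps); cases: [{"query": ..., "expected": ...}].

    `check` compares answer vs expected; the default is a case-insensitive
    substring match, which is crude but workable for short factual answers.
    """
    check = check or (lambda answer, expected: expected.lower() in str(answer).lower())
    successes, total_steps, total_time = 0, 0, 0.0
    for case in cases:
        start = time.perf_counter()
        answer, steps = run_fn(case["query"])
        total_time += time.perf_counter() - start
        total_steps += steps
        if check(answer, case["expected"]):
            successes += 1
    n = len(cases)
    return EvalResult(successes / n, total_steps / n, total_time / n)
```

Running the same `cases` through each agent type (and model) gives directly comparable success/step/latency numbers.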
Summary
You've learned to build intelligent agents with SLMs:
- Structured Output: Reliable JSON generation with Pydantic schemas
- Tool Library: Flexible, extensible tool system
- ReAct Agent: Reasoning and acting in a loop
- Function Calling: Clean structured approach to tool use
- Planning Agent: Multi-step task decomposition
Key insights:
- Lower temperature (0.1-0.3) improves parsing reliability
- Clear, explicit prompts are crucial for SLM agents
- Retry logic handles occasional parsing failures
- Limit tools to reduce complexity and errors
- Start with simple tasks and gradually increase complexity
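The retry-logic insight above can be sketched as a small helper (names are illustrative; `generate` stands in for any function that calls the model and returns raw text):

```python
import json
from typing import Callable, Optional

def generate_json_with_retry(generate: Callable[[str], str], prompt: str,
                             max_retries: int = 3) -> Optional[dict]:
    """Call `generate` until its output parses as a JSON object, or give up."""
    for _ in range(max_retries):
        raw = generate(prompt)
        try:
            data = json.loads(raw)
            if isinstance(data, dict):
                return data
        except json.JSONDecodeError:
            pass
        # Remind the model of the contract before the next attempt
        prompt += "\n\nYour previous reply was not valid JSON. Respond with a single JSON object only."
    return None
```

Combined with a low temperature and JSON format mode, two or three retries handle most of the occasional parsing failures SLMs produce.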
Key Concepts Recap
| Concept | What It Is | Why It Matters |
|---|---|---|
| Structured Output | JSON from LLM matching Pydantic schema | Reliable tool calls and parsing |
| ReAct Pattern | Thought→Action→Observation loop | Enables multi-step reasoning |
| Function Calling | LLM selects and parameterizes tools | Cleaner than free-form tool use |
| Tool Registry | Collection of callable tools with schemas | Easy tool management and discovery |
| Low Temperature | 0.1-0.3 for deterministic output | More consistent JSON parsing |
| JSON Format Mode | Force model to output valid JSON | Reduces parsing failures |
| Retry Logic | Re-attempt on parsing failure | Handles occasional errors gracefully |
| Tool Schema | JSON description of tool parameters | LLM knows how to call the tool |
| Planning Agent | Decompose task before execution | Handles complex multi-step tasks |
| Action Input | Arguments passed to selected tool | Must validate against tool schema |
Next Steps
- Training SLM from Scratch - Build your own small model
- Production SLM System - Deploy agents at scale