SLM Agents
Build intelligent agents with small language models
TL;DR
Build agents with structured output (Pydantic schemas + low temperature 0.1-0.3), implement ReAct (Thought→Action→Observation loop), or use function calling with JSON format. Key: explicit prompts, limited tool sets, and retry logic for parsing. Qwen2.5 excels at tool calling.
Build intelligent, tool-using agents powered by small language models for local, privacy-preserving automation.
Project Overview
| Aspect | Details |
|---|---|
| Difficulty | Intermediate |
| Time | 6-8 hours |
| Prerequisites | Python, SLM basics, prompt engineering |
| Learning Outcomes | Tool calling, structured output, ReAct pattern, agent loops |
What You'll Learn
- Implement function calling with local SLMs
- Generate structured outputs using Pydantic
- Build ReAct (Reasoning + Acting) agents
- Create multi-step reasoning pipelines
- Design tool libraries for SLM agents
- Handle errors and edge cases gracefully
Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ SLM Agent Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ │
│ │ User Query │ │
│ └───────┬───────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ SLM AGENT │ │
│ │ ┌─────────────┐ ┌──────────────┐ ┌──────────────────┐ │ │
│ │ │ Local SLM │───►│Output Parser │───►│ Tool Executor │ │ │
│ │ └──────▲──────┘ └──────┬───────┘ └────────┬─────────┘ │ │
│ │ │ │ │ │ │
│ │ │ │ Final │ │ │
│ │ │ │ Answer ▼ │ │
│ │ ┌──────┴──────┐ │ ┌──────────────────┐ │ │
│ │ │Conversation │◄──────────┼────────────│ Tool Library │ │ │
│ │ │ Memory │ │ ├──────────────────┤ │ │
│ │ └─────────────┘ │ │ • Calculator │ │ │
│ │ │ │ • Web Search │ │ │
│ │ │ │ • Code Runner │ │ │
│ │ │ │ • File Ops │ │ │
│ │ │ │ • API Calls │ │ │
│ │ │ └──────────────────┘ │ │
│ └────────────────────────────┼─────────────────────────────────────┘ │
│ ▼ │
│ ┌───────────────┐ │
│ │ Response │ │
│ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
ReAct Loop: Thought ──► Action ──► Observation ──► (repeat until answer)
Project Setup
Dependencies
# Create project directory
mkdir slm-agents && cd slm-agents
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install ollama llama-cpp-python
pip install pydantic instructor
pip install fastapi uvicorn
pip install httpx aiohttp
pip install rich # For pretty console output
Model Setup
# Pull recommended models for agent tasks
# Qwen2.5 has excellent function calling capabilities
ollama pull qwen2.5:3b-instruct
# Phi-3 also works well for structured tasks
ollama pull phi3:mini
# For very constrained environments
ollama pull qwen2.5:0.5b-instruct
Part 1: Structured Output Generation
The foundation of SLM agents is reliable structured output.
# core/structured_output.py
"""
Structured output generation using Pydantic and local SLMs.
"""
import json
import re
from typing import TypeVar, Type, Optional, Any
from pydantic import BaseModel, Field, ValidationError
import ollama
T = TypeVar('T', bound=BaseModel)
class StructuredOutputGenerator:
"""
Generate structured outputs from SLMs using Pydantic schemas.
"""
def __init__(self, model: str = "qwen2.5:3b-instruct"):
self.model = model
self.client = ollama.Client()
def generate(
self,
prompt: str,
output_schema: Type[T],
max_retries: int = 3,
temperature: float = 0.1
) -> Optional[T]:
"""
Generate structured output matching the Pydantic schema.
Args:
prompt: The user prompt
output_schema: Pydantic model class
max_retries: Number of retry attempts
temperature: Sampling temperature (lower = more deterministic)
Returns:
Parsed Pydantic model instance or None
"""
# Generate JSON schema from Pydantic model
schema = output_schema.model_json_schema()
schema_str = json.dumps(schema, indent=2)
# Build the system prompt
system_prompt = f"""You are a helpful assistant that outputs JSON.
You must respond with valid JSON that matches this schema:
{schema_str}
Important:
- Output ONLY valid JSON, no other text
- Follow the schema exactly
- Use null for optional fields you can't fill
- Ensure all required fields are present"""
for attempt in range(max_retries):
try:
response = self.client.chat(
model=self.model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": prompt}
],
options={"temperature": temperature}
)
content = response["message"]["content"]
# Extract JSON from response
json_str = self._extract_json(content)
# Parse and validate
data = json.loads(json_str)
return output_schema.model_validate(data)
except (json.JSONDecodeError, ValidationError) as e:
if attempt < max_retries - 1:
print(f"Attempt {attempt + 1} failed: {e}. Retrying...")
temperature += 0.1 # Slightly increase temperature
else:
print(f"All attempts failed: {e}")
return None
return None
def _extract_json(self, text: str) -> str:
"""Extract JSON from model response."""
# Try to find JSON in code blocks
code_block_match = re.search(r'```(?:json)?\s*([\s\S]*?)\s*```', text)
if code_block_match:
return code_block_match.group(1)
# Try to find raw JSON object
json_match = re.search(r'\{[\s\S]*\}', text)
if json_match:
return json_match.group(0)
# Return as-is if nothing found
return text.strip()
# Example schemas for common agent tasks
class ToolCall(BaseModel):
"""Represents a tool call decision."""
tool_name: str = Field(description="Name of the tool to call")
arguments: dict = Field(description="Arguments to pass to the tool")
reasoning: str = Field(description="Why this tool is being called")
class ThoughtAction(BaseModel):
"""ReAct-style thought and action."""
thought: str = Field(description="Reasoning about what to do next")
action: str = Field(description="The action to take: 'tool' or 'answer'")
tool_name: Optional[str] = Field(default=None, description="Tool to call if action is 'tool'")
tool_args: Optional[dict] = Field(default=None, description="Arguments for the tool")
final_answer: Optional[str] = Field(default=None, description="Final answer if action is 'answer'")
class TaskDecomposition(BaseModel):
"""Break down a complex task into steps."""
task: str = Field(description="The original task")
steps: list[str] = Field(description="Ordered list of steps to complete the task")
complexity: str = Field(description="Complexity level: simple, medium, complex")
# Example usage
if __name__ == "__main__":
generator = StructuredOutputGenerator()
# Test task decomposition
result = generator.generate(
"How do I make a cup of coffee?",
TaskDecomposition
)
if result:
print(f"Task: {result.task}")
print(f"Complexity: {result.complexity}")
print("Steps:")
for i, step in enumerate(result.steps, 1):
print(f"  {i}. {step}")
★ Insight ─────────────────────────────────────
JSON Schema Prompting: By including the Pydantic JSON schema directly in the prompt, SLMs can reliably generate structured output. The key is using low temperature (0.1-0.3) for consistency and implementing retry logic for edge cases. Models like Qwen2.5 are particularly good at this pattern.
─────────────────────────────────────────────────
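The JSON-extraction fallback chain is worth exercising in isolation, since it runs before any validation. A minimal standalone sketch using the same regexes as `_extract_json` (no model required; the fence string is built with `chr(96)` purely so this example stays self-contained):

```python
import re

FENCE = chr(96) * 3  # a run of three backticks

def extract_json(text: str) -> str:
    """Mirror of the fallback chain in _extract_json."""
    # 1. Prefer JSON inside a fenced code block
    m = re.search(FENCE + r'(?:json)?\s*([\s\S]*?)\s*' + FENCE, text)
    if m:
        return m.group(1)
    # 2. Fall back to the first {...} span in the text
    m = re.search(r'\{[\s\S]*\}', text)
    if m:
        return m.group(0)
    # 3. Give up: return the text as-is
    return text.strip()

fenced = "Here you go:\n" + FENCE + 'json\n{"a": 1}\n' + FENCE
chatty = 'Sure! The result is {"a": 1} as requested.'
```

Both the fenced and the chatty response yield the same `{"a": 1}` payload, which is exactly why the fallback order matters: models sometimes wrap JSON in prose even when told not to.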
Part 2: Tool Library Design
Create a flexible tool system for SLM agents.
# tools/base.py
"""
Tool infrastructure for SLM agents.
"""
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional
from dataclasses import dataclass, field
from pydantic import BaseModel, Field
import json
@dataclass
class ToolParameter:
"""Describes a tool parameter."""
name: str
type: str
description: str
required: bool = True
default: Any = None
@dataclass
class Tool:
"""A callable tool for the agent."""
name: str
description: str
parameters: list[ToolParameter]
func: Callable
category: str = "general"
def get_schema(self) -> dict:
"""Get JSON schema for this tool."""
properties = {}
required = []
for param in self.parameters:
properties[param.name] = {
"type": param.type,
"description": param.description
}
if param.required:
required.append(param.name)
return {
"name": self.name,
"description": self.description,
"parameters": {
"type": "object",
"properties": properties,
"required": required
}
}
def call(self, **kwargs) -> Any:
"""Execute the tool with given arguments."""
return self.func(**kwargs)
class ToolRegistry:
"""Registry of available tools."""
def __init__(self):
self.tools: dict[str, Tool] = {}
def register(self, tool: Tool):
"""Register a tool."""
self.tools[tool.name] = tool
def get(self, name: str) -> Optional[Tool]:
"""Get a tool by name."""
return self.tools.get(name)
def list_tools(self) -> list[str]:
"""List all registered tool names."""
return list(self.tools.keys())
def get_tools_prompt(self) -> str:
"""Generate a prompt describing all tools."""
tool_descriptions = []
for name, tool in self.tools.items():
params_desc = ", ".join(
f"{p.name}: {p.type}" for p in tool.parameters
)
tool_descriptions.append(
f"- {name}({params_desc}): {tool.description}"
)
return "\n".join(tool_descriptions)
def get_tools_schema(self) -> list[dict]:
"""Get JSON schemas for all tools."""
return [tool.get_schema() for tool in self.tools.values()]
# Built-in tools
def create_calculator_tool() -> Tool:
"""Create a calculator tool."""
def calculate(expression: str) -> str:
"""Safely evaluate a mathematical expression."""
try:
# Whitelist characters; note eval is never fully safe, so prefer a dedicated expression parser in production
allowed_chars = set("0123456789+-*/().^ ")
if not all(c in allowed_chars for c in expression):
return "Error: Invalid characters in expression"
# Replace ^ with ** for exponentiation
expression = expression.replace("^", "**")
result = eval(expression, {"__builtins__": {}}, {})
return str(result)
except Exception as e:
return f"Error: {str(e)}"
return Tool(
name="calculator",
description="Evaluate mathematical expressions. Supports +, -, *, /, ^, and parentheses.",
parameters=[
ToolParameter(
name="expression",
type="string",
description="The mathematical expression to evaluate"
)
],
func=calculate,
category="math"
)
def create_datetime_tool() -> Tool:
"""Create a datetime tool."""
from datetime import datetime
def get_datetime(format: str = "%Y-%m-%d %H:%M:%S") -> str:
"""Get current date and time."""
return datetime.now().strftime(format)
return Tool(
name="get_datetime",
description="Get the current date and time.",
parameters=[
ToolParameter(
name="format",
type="string",
description="Date format string (default: %Y-%m-%d %H:%M:%S)",
required=False,
default="%Y-%m-%d %H:%M:%S"
)
],
func=get_datetime,
category="utility"
)
def create_web_search_tool() -> Tool:
"""Create a web search tool (mock for demo)."""
def web_search(query: str, num_results: int = 3) -> str:
"""Search the web for information."""
# In production, integrate with a real search API
# This is a mock for demonstration
return json.dumps({
"query": query,
"results": [
{"title": f"Result {i+1} for: {query}", "snippet": f"Information about {query}..."}
for i in range(num_results)
],
"note": "This is mock data. Integrate with a real search API for production."
})
return Tool(
name="web_search",
description="Search the web for information on a topic.",
parameters=[
ToolParameter(
name="query",
type="string",
description="The search query"
),
ToolParameter(
name="num_results",
type="integer",
description="Number of results to return",
required=False,
default=3
)
],
func=web_search,
category="search"
)
def create_code_runner_tool() -> Tool:
"""Create a Python code execution tool."""
import sys
from io import StringIO
def run_python(code: str) -> str:
"""Execute Python code and return output."""
# Capture stdout
old_stdout = sys.stdout
sys.stdout = captured_output = StringIO()
try:
# Restrict builtins; exec "sandboxes" are bypassable, so use a subprocess or container for real isolation
safe_builtins = {
"print": print,
"len": len,
"range": range,
"int": int,
"float": float,
"str": str,
"list": list,
"dict": dict,
"sum": sum,
"min": min,
"max": max,
"sorted": sorted,
"enumerate": enumerate,
"zip": zip,
}
exec(code, {"__builtins__": safe_builtins}, {})
output = captured_output.getvalue()
return output if output else "Code executed successfully (no output)"
except Exception as e:
return f"Error: {str(e)}"
finally:
sys.stdout = old_stdout
return Tool(
name="run_python",
description="Execute Python code and return the output. Limited to safe operations.",
parameters=[
ToolParameter(
name="code",
type="string",
description="Python code to execute"
)
],
func=run_python,
category="code"
)
def create_default_registry() -> ToolRegistry:
"""Create a registry with default tools."""
registry = ToolRegistry()
registry.register(create_calculator_tool())
registry.register(create_datetime_tool())
registry.register(create_web_search_tool())
registry.register(create_code_runner_tool())
return registry
if __name__ == "__main__":
# Test tools
registry = create_default_registry()
print("Available tools:")
print(registry.get_tools_prompt())
print("\nTesting calculator:")
calc = registry.get("calculator")
print(calc.call(expression="2 + 3 * 4"))
print("\nTesting datetime:")
dt = registry.get("get_datetime")
print(dt.call())
print("\nTesting code runner:")
code = registry.get("run_python")
print(code.call(code="print('Hello from SLM agent!')"))
Understanding the Tool Library Design:
┌─────────────────────────────────────────────────────────────────────────────┐
│ TOOL REGISTRY ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ ToolRegistry │ │
│ │ tools: dict[str, Tool] │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ "calculator"│ │"get_datetime"│ │ "web_search"│ │ "run_python"│ │ │
│ │ │ Tool(...) │ │ Tool(...) │ │ Tool(...) │ │ Tool(...) │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ get_tools_prompt() ────► Human-readable tool list for prompts │ │
│ │ get_tools_schema() ────► JSON schemas for function calling │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Why This Design Matters for SLMs:
| Design Choice | Benefit for SLMs |
|---|---|
| Simple schema format | Easier for small models to parse |
| Category field | Can filter tools by context (math vs search) |
| get_tools_prompt() | Generates concise text-based tool list |
| Safe builtins only | Restricted execution reduces (but does not eliminate) security risk |
| Explicit parameters | SLMs need clear parameter descriptions |
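The calculator's whitelist-then-eval pattern can be tested without the rest of the tool machinery; a minimal standalone sketch of the same logic:

```python
# Character whitelist: digits, operators, parentheses, caret, space
ALLOWED = set("0123456789+-*/().^ ")

def safe_calculate(expression: str) -> str:
    # Reject anything outside the whitelist; this blocks names
    # like __import__ or open before eval ever sees them
    if not all(c in ALLOWED for c in expression):
        return "Error: Invalid characters in expression"
    try:
        # Users write ^ for exponentiation; Python uses **
        result = eval(expression.replace("^", "**"), {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"
```

Because the whitelist contains no letters or underscores, any attempt to smuggle in a name fails before evaluation; only arithmetic survives the filter.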
Tool Safety Pattern:
┌────────────────────────────────────────────┐
│ Calculator: Safe eval() │
│ │
│ allowed_chars = "0123456789+-*/().^ " │
│ │
│ ✓ "2 + 3 * 4" → 14 │
│ ✓ "(10 + 5) ^ 2" → 225 │
│ ✗ "__import__('os')" → Error: Invalid │
│ ✗ "open('file')" → Error: Invalid │
└────────────────────────────────────────────┘
Part 3: ReAct Agent Implementation
Build a ReAct (Reasoning + Acting) agent.
# agents/react_agent.py
"""
ReAct agent implementation using local SLMs.
"""
import json
import re
from typing import Optional, Generator
from dataclasses import dataclass
from pydantic import BaseModel, Field
import ollama
from tools.base import ToolRegistry, create_default_registry
@dataclass
class AgentStep:
"""A single step in the agent's execution."""
step_num: int
thought: str
action: str
action_input: Optional[dict]
observation: Optional[str]
is_final: bool = False
class ReActAgent:
"""
ReAct agent that reasons and acts iteratively.
Uses the Thought -> Action -> Observation loop.
"""
SYSTEM_PROMPT = """You are a helpful AI assistant that can use tools to answer questions.
Available tools:
{tools}
To use a tool, respond in this EXACT format:
Thought: [Your reasoning about what to do]
Action: [tool_name]
Action Input: {{"param1": "value1", "param2": "value2"}}
When you have enough information to answer, respond:
Thought: [Your final reasoning]
Action: answer
Action Input: {{"response": "Your final answer here"}}
Important rules:
1. Always start with a Thought
2. Use exactly one Action per response
3. Action Input must be valid JSON
4. Only use the tools listed above
5. When you have the answer, use action "answer"
Previous conversation:
{history}
Now respond to the user's query."""
def __init__(
self,
model: str = "qwen2.5:3b-instruct",
tools: ToolRegistry = None,
max_steps: int = 10,
verbose: bool = True
):
self.model = model
self.tools = tools or create_default_registry()
self.max_steps = max_steps
self.verbose = verbose
self.client = ollama.Client()
def run(self, query: str) -> Generator[AgentStep, None, str]:
"""
Run the agent on a query, yielding steps.
Args:
query: User query
Yields:
AgentStep objects for each step
Returns:
Final answer string
"""
history = []
step_num = 0
while step_num < self.max_steps:
step_num += 1
# Build prompt
system = self.SYSTEM_PROMPT.format(
tools=self.tools.get_tools_prompt(),
history=self._format_history(history)
)
# Get response from model
response = self.client.chat(
model=self.model,
messages=[
{"role": "system", "content": system},
{"role": "user", "content": query}
],
options={"temperature": 0.1}
)
content = response["message"]["content"]
# Parse the response
thought, action, action_input = self._parse_response(content)
if self.verbose:
print(f"\n--- Step {step_num} ---")
print(f"Thought: {thought}")
print(f"Action: {action}")
print(f"Action Input: {action_input}")
# Check if this is the final answer
if action.lower() == "answer":
final_answer = action_input.get("response", str(action_input))
step = AgentStep(
step_num=step_num,
thought=thought,
action=action,
action_input=action_input,
observation=None,
is_final=True
)
yield step
return final_answer
# Execute the tool
observation = self._execute_tool(action, action_input)
if self.verbose:
print(f"Observation: {observation}")
# Create step
step = AgentStep(
step_num=step_num,
thought=thought,
action=action,
action_input=action_input,
observation=observation
)
yield step
# Add to history
history.append({
"thought": thought,
"action": action,
"action_input": action_input,
"observation": observation
})
# Max steps reached
return "I was unable to find a complete answer within the step limit."
def _parse_response(self, content: str) -> tuple[str, str, dict]:
"""Parse thought, action, and action input from response."""
thought = ""
action = ""
action_input = {}
# Extract thought
thought_match = re.search(r'Thought:\s*(.+?)(?=Action:|$)', content, re.DOTALL)
if thought_match:
thought = thought_match.group(1).strip()
# Extract action
action_match = re.search(r'Action:\s*(\w+)', content)
if action_match:
action = action_match.group(1).strip()
# Extract action input
input_match = re.search(r'Action Input:\s*(\{.*?\})', content, re.DOTALL)
if input_match:
try:
action_input = json.loads(input_match.group(1))
except json.JSONDecodeError:
# Try to fix common JSON issues
json_str = input_match.group(1)
json_str = re.sub(r"'", '"', json_str) # Replace single quotes
try:
action_input = json.loads(json_str)
except json.JSONDecodeError:
pass
return thought, action, action_input
def _execute_tool(self, action: str, action_input: dict) -> str:
"""Execute a tool and return the observation."""
tool = self.tools.get(action)
if tool is None:
return f"Error: Unknown tool '{action}'. Available tools: {self.tools.list_tools()}"
try:
result = tool.call(**action_input)
return str(result)
except Exception as e:
return f"Error executing tool: {str(e)}"
def _format_history(self, history: list[dict]) -> str:
"""Format conversation history."""
if not history:
return "No previous steps."
formatted = []
for i, step in enumerate(history, 1):
formatted.append(f"""Step {i}:
Thought: {step['thought']}
Action: {step['action']}
Action Input: {json.dumps(step['action_input'])}
Observation: {step['observation']}""")
return "\n\n".join(formatted)
# Example usage
if __name__ == "__main__":
from rich.console import Console
from rich.panel import Panel
from rich.markdown import Markdown
console = Console()
agent = ReActAgent(verbose=False)
queries = [
"What is 25 * 48 + 100?",
"What is the current date and time?",
"Calculate the sum of squares from 1 to 5 using Python code."
]
for query in queries:
console.print(Panel(f"[bold blue]Query:[/bold blue] {query}"))
final_answer = None
for step in agent.run(query):
console.print(f"\n[yellow]Step {step.step_num}[/yellow]")
console.print(f"[dim]Thought:[/dim] {step.thought}")
console.print(f"[dim]Action:[/dim] {step.action}")
if step.observation:
console.print(f"[dim]Observation:[/dim] {step.observation}")
if step.is_final:
final_answer = step.action_input.get("response", str(step.action_input))
if final_answer:
console.print(Panel(f"[bold green]Answer:[/bold green] {final_answer}"))
console.print("\n" + "="*50 + "\n")
★ Insight ─────────────────────────────────────
ReAct Pattern with SLMs: The Thought-Action-Observation loop works well with SLMs when prompts are explicit about the expected format. Key tricks: (1) use low temperature for consistent parsing, (2) provide clear examples in the system prompt, (3) validate JSON with retry logic, (4) limit the tool set to reduce confusion.
─────────────────────────────────────────────────
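The most fragile part of the loop is parsing the model's text, and the extraction regexes can be tested without a model. A standalone sketch using the same patterns as `_parse_response`:

```python
import json
import re

def parse_react(content: str) -> tuple[str, str, dict]:
    # Thought: everything up to the next "Action:" label (or end of text)
    thought_m = re.search(r'Thought:\s*(.+?)(?=Action:|$)', content, re.DOTALL)
    # Action: a single word ("Action Input:" does not match, its colon differs)
    action_m = re.search(r'Action:\s*(\w+)', content)
    # Action Input: first {...} span after the label
    input_m = re.search(r'Action Input:\s*(\{.*?\})', content, re.DOTALL)
    args: dict = {}
    if input_m:
        try:
            args = json.loads(input_m.group(1))
        except json.JSONDecodeError:
            pass  # the full agent applies quote repairs here
    return (
        thought_m.group(1).strip() if thought_m else "",
        action_m.group(1) if action_m else "",
        args,
    )

sample = (
    "Thought: I should compute this.\n"
    "Action: calculator\n"
    'Action Input: {"expression": "25 * 48 + 100"}'
)
```

One caveat of the non-greedy `\{.*?\}` pattern: it stops at the first closing brace, so deeply nested argument objects can be truncated; keeping tool arguments flat avoids this.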
Part 4: Function Calling Agent
A cleaner approach using function calling style.
# agents/function_calling_agent.py
"""
Function calling agent using structured output.
"""
import json
from typing import Optional, Any
from pydantic import BaseModel, Field
from enum import Enum
import ollama
from tools.base import ToolRegistry, create_default_registry
class ActionType(str, Enum):
TOOL = "tool"
ANSWER = "answer"
class FunctionCall(BaseModel):
"""A function call decision."""
reasoning: str = Field(description="Step-by-step reasoning for this decision")
action: ActionType = Field(description="Whether to call a tool or provide final answer")
function_name: Optional[str] = Field(default=None, description="Name of function to call")
arguments: Optional[dict] = Field(default=None, description="Arguments for the function")
answer: Optional[str] = Field(default=None, description="Final answer if action is 'answer'")
class FunctionCallingAgent:
"""
Agent that uses structured function calling.
"""
SYSTEM_PROMPT = """You are a helpful AI assistant that answers questions using available tools.
Available functions:
{functions}
For each user query, you must:
1. Reason step-by-step about how to answer
2. Decide whether to call a function or provide a final answer
3. If calling a function, specify the function name and arguments
4. If answering, provide a complete answer based on gathered information
Respond with a JSON object containing:
- reasoning: Your step-by-step thought process
- action: Either "tool" or "answer"
- function_name: Name of function to call (if action is "tool")
- arguments: Function arguments as object (if action is "tool")
- answer: Your final answer (if action is "answer")
Previous function calls and results:
{history}
Respond ONLY with valid JSON."""
def __init__(
self,
model: str = "qwen2.5:3b-instruct",
tools: ToolRegistry = None,
max_iterations: int = 10
):
self.model = model
self.tools = tools or create_default_registry()
self.max_iterations = max_iterations
self.client = ollama.Client()
def run(self, query: str) -> str:
"""
Run the agent on a query.
Args:
query: User query
Returns:
Final answer string
"""
history = []
iteration = 0
while iteration < self.max_iterations:
iteration += 1
# Build prompt
system = self.SYSTEM_PROMPT.format(
functions=self._format_functions(),
history=self._format_history(history)
)
# Get response
response = self.client.chat(
model=self.model,
messages=[
{"role": "system", "content": system},
{"role": "user", "content": query}
],
options={"temperature": 0.1},
format="json"
)
content = response["message"]["content"]
# Parse response
try:
data = json.loads(content)
call = FunctionCall.model_validate(data)
except Exception as e:
print(f"Parse error: {e}")
continue
print(f"\n[Iteration {iteration}]")
print(f"Reasoning: {call.reasoning}")
print(f"Action: {call.action}")
# Check for final answer
if call.action == ActionType.ANSWER:
return call.answer or "No answer provided"
# Execute function
if call.function_name and call.arguments is not None:
result = self._execute_function(call.function_name, call.arguments)
print(f"Function: {call.function_name}")
print(f"Arguments: {call.arguments}")
print(f"Result: {result}")
history.append({
"function": call.function_name,
"arguments": call.arguments,
"result": result
})
return "Unable to complete the task within iteration limit."
def _format_functions(self) -> str:
"""Format available functions."""
schemas = self.tools.get_tools_schema()
return json.dumps(schemas, indent=2)
def _format_history(self, history: list[dict]) -> str:
"""Format function call history."""
if not history:
return "No previous function calls."
formatted = []
for i, h in enumerate(history, 1):
formatted.append(
f"{i}. {h['function']}({json.dumps(h['arguments'])}) -> {h['result']}"
)
return "\n".join(formatted)
def _execute_function(self, name: str, arguments: dict) -> str:
"""Execute a function."""
tool = self.tools.get(name)
if not tool:
return f"Error: Unknown function '{name}'"
try:
result = tool.call(**arguments)
return str(result)
except Exception as e:
return f"Error: {str(e)}"
if __name__ == "__main__":
agent = FunctionCallingAgent()
# Test queries
queries = [
"What is 15% of 850?",
"Write Python code to find the factorial of 6 and run it.",
"What time is it right now?"
]
for query in queries:
print(f"\n{'='*60}")
print(f"Query: {query}")
print("="*60)
answer = agent.run(query)
print(f"\nFinal Answer: {answer}")
Understanding Function Calling vs ReAct:
┌─────────────────────────────────────────────────────────────────────────────┐
│ ReAct vs Function Calling Comparison │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ReAct (Text-Based): Function Calling (JSON-Based): │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Thought: I need to │ │ { │ │
│ │ calculate 15% of 850│ │ "reasoning": "...",│ │
│ │ │ │ "action": "tool", │ │
│ │ Action: calculator │ │ "function_name": │ │
│ │ Action Input: │ │ "calculator", │ │
│ │ {"expression": │ │ "arguments": { │ │
│ │ "850 * 0.15"} │ │ "expression": │ │
│ └─────────────────────┘ │ "850 * 0.15" │ │
│ │ } │ │
│ • Regex parsing required │ } │ │
│ • More error-prone │ │ │
│ • Better for explanation │ • Native JSON parsing│ │
│ │ • More structured │ │
│ │ • Pydantic validation│ │
│ └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Key Differences:
| Aspect | ReAct | Function Calling |
|---|---|---|
| Parsing | Regex-based extraction | Native JSON parsing |
| Validation | Manual checking | Pydantic model validation |
| Error Handling | Multiple fallback attempts | Single parse with clear errors |
| Format Mode | Standard text | format="json" in Ollama |
| Model Support | Any model | Works best with Qwen2.5 |
| Verbosity | More text in output | Compact JSON |
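The "multiple fallback attempts" in the Error Handling row refers to repairs like the single-quote fix used in the ReAct parser; the same idea works as a standalone lenient parser (a minimal sketch, not the library's API):

```python
import json
import re

def parse_json_lenient(raw: str) -> dict:
    # First try strict JSON
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Common SLM slip: single quotes instead of double quotes
    repaired = re.sub(r"'", '"', raw)
    return json.loads(repaired)
```

The blanket quote substitution is crude (it would corrupt values containing apostrophes), but for short tool-argument objects it recovers a large share of otherwise-lost responses.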
Why ActionType Enum?
class ActionType(str, Enum):
TOOL = "tool"
ANSWER = "answer"
Using an enum instead of raw strings:
- Pydantic validates the value automatically
- IDE autocomplete for action types
- Type-safe comparisons in code
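Because `ActionType` subclasses `str`, members compare equal to the raw strings the model emits, while invalid values fail loudly; a quick sketch:

```python
from enum import Enum

class ActionType(str, Enum):
    TOOL = "tool"
    ANSWER = "answer"

# Coercion from the model's raw output
parsed = ActionType("answer")

# str mixin: equality with plain strings still works
assert parsed is ActionType.ANSWER
assert parsed == "answer"

# Invalid values raise instead of slipping through silently
try:
    ActionType("reply")
    unreachable = True
except ValueError:
    unreachable = False
```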
Part 5: Multi-Step Planning Agent
Handle complex tasks with planning.
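The core execution idea is simple: run steps in order, handing each one the outputs of the steps it lists in `depends_on`. It can be sketched without any model calls (the step dicts here are hypothetical stand-ins for the `Step` model):

```python
# Hypothetical plan: ids and depends_on mirror the Step schema
steps = [
    {"id": 1, "description": "gather data", "depends_on": []},
    {"id": 2, "description": "summarize", "depends_on": [1]},
    {"id": 3, "description": "final answer", "depends_on": [1, 2]},
]

results: dict[int, str] = {}
for step in steps:
    # Collect only the outputs this step declared as dependencies
    dep_outputs = {d: results[d] for d in step["depends_on"] if d in results}
    # A real agent would prompt the SLM here; we fake the output
    results[step["id"]] = f"output-{step['id']} (saw deps {sorted(dep_outputs)})"

final = results[steps[-1]["id"]]
```

Since the plan is executed strictly in id order, a dependency is always computed before it is consumed; an out-of-order plan would simply surface an empty `dep_outputs` rather than crash.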
# agents/planning_agent.py
"""
Planning agent that breaks down complex tasks.
"""
import json
from typing import Optional
from pydantic import BaseModel, Field
import ollama
from tools.base import ToolRegistry, create_default_registry
class Step(BaseModel):
"""A single step in a plan."""
id: int = Field(description="Step number")
description: str = Field(description="What this step does")
tool: Optional[str] = Field(default=None, description="Tool to use, if any")
depends_on: list[int] = Field(default_factory=list, description="Step IDs this depends on")
class Plan(BaseModel):
"""A plan to accomplish a task."""
goal: str = Field(description="The overall goal")
steps: list[Step] = Field(description="Ordered steps to achieve the goal")
reasoning: str = Field(description="Why this plan makes sense")
class StepResult(BaseModel):
"""Result of executing a step."""
step_id: int
success: bool
output: str
needs_replanning: bool = False
class PlanningAgent:
"""
Agent that creates and executes plans.
"""
PLANNING_PROMPT = """You are a planning AI that breaks down tasks into steps.
Available tools:
{tools}
Given a task, create a plan with these properties:
1. Break the task into small, executable steps
2. Each step should use at most one tool
3. Steps can depend on previous steps
4. Include a final step that synthesizes the answer
Respond with a JSON object:
{{
"goal": "The overall task",
"steps": [
{{"id": 1, "description": "What to do", "tool": "tool_name or null", "depends_on": []}},
...
],
"reasoning": "Why this plan works"
}}
Task: {task}
Respond ONLY with valid JSON."""
EXECUTION_PROMPT = """You are executing step {step_id} of a plan.
Goal: {goal}
Current Step: {step_description}
Tool to use: {tool}
Previous results:
{previous_results}
{tool_instruction}
Respond with a JSON object:
{{
"reasoning": "Your thought process",
"tool_call": {{"name": "tool_name", "arguments": {{}}}}, // or null if no tool needed
"output": "The result or answer for this step"
}}
Respond ONLY with valid JSON."""
def __init__(
self,
model: str = "qwen2.5:3b-instruct",
tools: ToolRegistry = None
):
self.model = model
self.tools = tools or create_default_registry()
self.client = ollama.Client()
def run(self, task: str) -> str:
"""
Plan and execute a task.
Args:
task: The task to accomplish
Returns:
Final result string
"""
# Create plan
print("Creating plan...")
plan = self._create_plan(task)
if not plan:
return "Failed to create a plan for this task."
print(f"\nPlan for: {plan.goal}")
print(f"Reasoning: {plan.reasoning}")
print(f"\nSteps:")
for step in plan.steps:
deps = f" (depends on: {step.depends_on})" if step.depends_on else ""
tool = f" [using {step.tool}]" if step.tool else ""
print(f" {step.id}. {step.description}{tool}{deps}")
# Execute plan
print("\nExecuting plan...")
results = {}
for step in plan.steps:
# Check dependencies
dep_results = {
dep_id: results[dep_id].output
for dep_id in step.depends_on
if dep_id in results
}
# Execute step
result = self._execute_step(step, plan.goal, dep_results)
results[step.id] = result
status = "✓" if result.success else "✗"
print(f" {status} Step {step.id}: {result.output[:100]}...")
if result.needs_replanning:
print(" ⚠ Replanning needed (not implemented in this demo)")
# Return final result
final_step = plan.steps[-1]
return results[final_step.id].output
def _create_plan(self, task: str) -> Optional[Plan]:
"""Create a plan for the task."""
prompt = self.PLANNING_PROMPT.format(
tools=self.tools.get_tools_prompt(),
task=task
)
try:
response = self.client.chat(
model=self.model,
messages=[{"role": "user", "content": prompt}],
options={"temperature": 0.2},
format="json"
)
data = json.loads(response["message"]["content"])
return Plan.model_validate(data)
except Exception as e:
print(f"Planning error: {e}")
return None
def _execute_step(
self,
step: Step,
goal: str,
previous_results: dict[int, str]
) -> StepResult:
"""Execute a single step."""
# Format previous results
prev_str = "\n".join(
f"Step {sid}: {result}"
for sid, result in previous_results.items()
) if previous_results else "No previous results"
# Tool instruction
if step.tool:
tool = self.tools.get(step.tool)
if tool:
tool_instruction = f"Use the {step.tool} tool. Schema: {json.dumps(tool.get_schema())}"
else:
tool_instruction = f"Tool {step.tool} not found. Proceed without it."
else:
tool_instruction = "No tool needed for this step. Just reason and provide the output."
prompt = self.EXECUTION_PROMPT.format(
step_id=step.id,
goal=goal,
step_description=step.description,
tool=step.tool or "None",
previous_results=prev_str,
tool_instruction=tool_instruction
)
try:
response = self.client.chat(
model=self.model,
messages=[{"role": "user", "content": prompt}],
options={"temperature": 0.1},
format="json"
)
data = json.loads(response["message"]["content"])
# Execute tool if needed
output = data.get("output", "")
if step.tool and data.get("tool_call"):
tool_call = data["tool_call"]
tool = self.tools.get(tool_call["name"])
if tool:
tool_result = tool.call(**tool_call.get("arguments", {}))
output = f"{output}\nTool result: {tool_result}"
return StepResult(
step_id=step.id,
success=True,
output=output
)
except Exception as e:
return StepResult(
step_id=step.id,
success=False,
output=f"Error: {str(e)}",
needs_replanning=True
)
if __name__ == "__main__":
agent = PlanningAgent()
task = "Calculate the average of the squares of numbers 1 through 5"
print(f"Task: {task}")
print("="*60)
result = agent.run(task)
print("\n" + "="*60)
print(f"Final Result: {result}")
Understanding the Planning Agent Architecture:
┌─────────────────────────────────────────────────────────────────────────────┐
│ PLANNING AGENT EXECUTION │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Task: "Calculate average of squares from 1 to 5" │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ PLANNING PHASE │ │
│ │ │ │
│ │ Input: Task + Available Tools │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ Plan: │ │ │
│ │ │ Step 1: Square each number 1-5 [run_python] │ │ │
│ │ │ Step 2: Sum the squares [calculator] │ │ │
│ │ │ Step 3: Divide by 5 for average [calculator] │ │ │
│ │ │ Step 4: Format final answer [none] │ │ │
│ │ └────────────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ EXECUTION PHASE │ │
│ │ │ │
│ │ Step 1 ──► run_python: "print([x**2 for x in range(1,6)])" │ │
│ │ ◄── Result: "[1, 4, 9, 16, 25]" │ │
│ │ │ │
│ │ Step 2 ──► calculator: "1 + 4 + 9 + 16 + 25" │ │
│ │ ◄── Result: "55" │ │
│ │ │ │
│ │ Step 3 ──► calculator: "55 / 5" │ │
│ │ ◄── Result: "11.0" │ │
│ │ │ │
│ │ Step 4 ──► (no tool) Format: "The average is 11.0" │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Final Result: "The average of the squares │
│ of 1-5 is 11.0" │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Step Dependencies Explained:
┌─────────────────────────────────────────────┐
│ Step Dependencies Enable Parallelism │
│ │
│ Step 1: Get stock price depends_on: [] │
│ Step 2: Get exchange rate depends_on: [] │
│ Step 3: Convert currency depends_on: [1,2]│
│ │
│ Steps 1 and 2 can run in parallel! │
│ Step 3 waits for both to complete │
└─────────────────────────────────────────────┘
Step 1 ─────┐
├──► Step 3
Step 2 ─────┘
Planning vs ReAct Trade-offs:
| Aspect | Planning Agent | ReAct Agent |
|---|---|---|
| Upfront cost | Higher (creates full plan) | Lower (step-by-step) |
| Parallelism | Possible with dependencies | Sequential only |
| Failure recovery | needs_replanning flag | Natural loop continuation |
| Token usage | More efficient for complex tasks | More efficient for simple tasks |
| Explainability | Full plan visible upfront | Reasoning visible per step |
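The dependency structure above can be exploited for parallel execution. A minimal sketch (helper names are illustrative, and steps are plain dicts rather than the tutorial's `Step` model): group steps into levels where each level depends only on earlier levels, then run each level's steps concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def execution_levels(steps):
    """Group steps into levels; each level's steps depend only on earlier levels."""
    done, levels = set(), []
    remaining = list(steps)
    while remaining:
        ready = [s for s in remaining if set(s["depends_on"]) <= done]
        if not ready:
            raise ValueError("Cyclic or unsatisfiable dependencies")
        levels.append(ready)
        done |= {s["id"] for s in ready}
        remaining = [s for s in remaining if s["id"] not in done]
    return levels

def run_plan_parallel(steps, execute):
    """Run each level's steps concurrently; levels themselves run in order."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for level in execution_levels(steps):
            # Same-level steps never depend on each other, so sharing
            # `results` (read-only for their dependencies) is safe here.
            futures = {s["id"]: pool.submit(execute, s, results) for s in level}
            for sid, fut in futures.items():
                results[sid] = fut.result()
    return results
```

For the stock-price example above, steps 1 and 2 land in the first level and run in parallel; step 3 runs in a second level once both results are available.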
Part 6: Agent FastAPI Server
Deploy agents as a REST API.
# server/agent_server.py
"""
FastAPI server for SLM agents.
"""
import time
import asyncio
from typing import Optional, Literal
from contextlib import asynccontextmanager
from fastapi import FastAPI, HTTPException, BackgroundTasks
from fastapi.responses import StreamingResponse
from pydantic import BaseModel, Field
import ollama
from tools.base import ToolRegistry, create_default_registry
from agents.react_agent import ReActAgent
from agents.function_calling_agent import FunctionCallingAgent
from agents.planning_agent import PlanningAgent
# Request/Response models
class AgentRequest(BaseModel):
query: str = Field(..., description="The user's query")
agent_type: Literal["react", "function_calling", "planning"] = Field(
default="react",
description="Type of agent to use"
)
model: str = Field(default="qwen2.5:3b-instruct", description="Model to use")
max_steps: int = Field(default=10, le=20, description="Maximum agent steps")
class AgentResponse(BaseModel):
answer: str
steps_taken: int
execution_time_s: float
agent_type: str
model: str
class AgentStep(BaseModel):
step_num: int
thought: str
action: str
observation: Optional[str]
class StreamingAgentResponse(BaseModel):
type: Literal["step", "answer"]
data: dict
# Async agent wrapper
class AsyncReActAgent:
"""Async wrapper for ReActAgent."""
def __init__(self, model: str = "qwen2.5:3b-instruct", max_steps: int = 10):
self.model = model
self.max_steps = max_steps
self.tools = create_default_registry()
async def run(self, query: str):
"""Run agent asynchronously."""
# Use sync agent in thread pool
agent = ReActAgent(
model=self.model,
tools=self.tools,
max_steps=self.max_steps,
verbose=False
)
        loop = asyncio.get_running_loop()
steps = []
final_answer = None
def run_agent():
nonlocal final_answer
for step in agent.run(query):
steps.append(step)
if step.is_final:
final_answer = step.action_input.get("response", str(step.action_input))
return final_answer
await loop.run_in_executor(None, run_agent)
return steps, final_answer
async def run_stream(self, query: str):
"""Stream agent steps."""
agent = ReActAgent(
model=self.model,
tools=self.tools,
max_steps=self.max_steps,
verbose=False
)
        loop = asyncio.get_running_loop()
def get_steps():
results = []
for step in agent.run(query):
results.append(step)
return results
steps = await loop.run_in_executor(None, get_steps)
for step in steps:
yield step
# Global state
tools: Optional[ToolRegistry] = None
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Initialize tools on startup."""
global tools
tools = create_default_registry()
print("Agent server initialized with tools:", tools.list_tools())
yield
tools = None
app = FastAPI(
title="SLM Agent Server",
description="Local AI agents powered by small language models",
version="1.0.0",
lifespan=lifespan
)
@app.get("/health")
async def health_check():
"""Health check endpoint."""
return {
"status": "healthy",
"tools_available": tools.list_tools() if tools else []
}
@app.get("/tools")
async def list_tools():
"""List available tools."""
if not tools:
raise HTTPException(status_code=503, detail="Tools not initialized")
return {
"tools": [
{
"name": name,
"schema": tools.get(name).get_schema()
}
for name in tools.list_tools()
]
}
@app.post("/run", response_model=AgentResponse)
async def run_agent(request: AgentRequest):
"""Run an agent on a query."""
start_time = time.time()
try:
if request.agent_type == "react":
agent = AsyncReActAgent(
model=request.model,
max_steps=request.max_steps
)
steps, answer = await agent.run(request.query)
steps_taken = len(steps)
elif request.agent_type == "function_calling":
agent = FunctionCallingAgent(
model=request.model,
tools=tools,
max_iterations=request.max_steps
)
            loop = asyncio.get_running_loop()
            answer = await loop.run_in_executor(None, agent.run, request.query)
            steps_taken = request.max_steps  # upper bound; this agent does not report its step count
elif request.agent_type == "planning":
agent = PlanningAgent(
model=request.model,
tools=tools
)
            loop = asyncio.get_running_loop()
            answer = await loop.run_in_executor(None, agent.run, request.query)
            steps_taken = request.max_steps  # upper bound; this agent does not report its step count
else:
raise HTTPException(status_code=400, detail=f"Unknown agent type: {request.agent_type}")
return AgentResponse(
answer=answer or "No answer generated",
steps_taken=steps_taken,
execution_time_s=time.time() - start_time,
agent_type=request.agent_type,
model=request.model
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.post("/run/stream")
async def run_agent_stream(request: AgentRequest):
"""Run agent with streaming output."""
if request.agent_type != "react":
raise HTTPException(
status_code=400,
detail="Streaming only supported for 'react' agent"
)
agent = AsyncReActAgent(
model=request.model,
max_steps=request.max_steps
)
    async def generate():
        import json  # serialize each SSE frame as JSON, not a Python dict repr
        async for step in agent.run_stream(request.query):
            data = {
                "type": "step" if not step.is_final else "answer",
                "data": {
                    "step_num": step.step_num,
                    "thought": step.thought,
                    "action": step.action,
                    "observation": step.observation
                }
            }
            if step.is_final:
                data["data"]["answer"] = step.action_input.get("response", "")
            yield f"data: {json.dumps(data)}\n\n"
return StreamingResponse(
generate(),
media_type="text/event-stream"
)
@app.post("/tools/{tool_name}")
async def call_tool(tool_name: str, arguments: dict):
    """Directly call a tool."""
    if not tools:
        raise HTTPException(status_code=503, detail="Tools not initialized")
    tool = tools.get(tool_name)
    if not tool:
        raise HTTPException(status_code=404, detail=f"Tool not found: {tool_name}")
try:
result = tool.call(**arguments)
return {"tool": tool_name, "result": result}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
if __name__ == "__main__":
import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
Docker Configuration
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Ollama
RUN curl -fsSL https://ollama.ai/install.sh | sh
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Expose port
EXPOSE 8000
# Start script
COPY start.sh /start.sh
RUN chmod +x /start.sh
CMD ["/start.sh"]
#!/bin/bash
# start.sh
# Start Ollama in background
ollama serve &
# Wait for Ollama to be ready (poll the API rather than sleeping blindly)
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  sleep 1
done
# Pull the model
ollama pull qwen2.5:3b-instruct
# Start the server
python server/agent_server.py
# docker-compose.yml
version: '3.8'
services:
slm-agent:
build: .
ports:
- "8000:8000"
volumes:
- ollama-data:/root/.ollama
environment:
- OLLAMA_HOST=0.0.0.0
deploy:
resources:
limits:
memory: 8G
volumes:
  ollama-data:
Exercises
Exercise 1: Custom Tool Creation
Create a custom tool that:
- Fetches weather data (mock or real API)
- Has proper parameter validation
- Handles errors gracefully
Then test it with the ReAct agent.
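A possible starting point: a mock weather tool, sketched here as a standalone class so it runs on its own (in the full project it would subclass the tutorial's `Tool` base and register with the `ToolRegistry`). The schema shape mirrors the ones used throughout this guide; the canned data stands in for a real API call.

```python
class WeatherTool:
    """Mock weather lookup; swap the canned data for a real API call."""
    name = "get_weather"
    description = "Get current weather for a city"

    # Canned responses standing in for a real weather API
    _MOCK = {
        "london": {"temp_c": 12, "condition": "cloudy"},
        "tokyo": {"temp_c": 22, "condition": "clear"},
    }

    def get_schema(self) -> dict:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string", "description": "City name"}},
                "required": ["city"],
            },
        }

    def call(self, city: str = "") -> str:
        # Validate parameters before doing any work
        if not isinstance(city, str) or not city.strip():
            return "Error: 'city' must be a non-empty string"
        data = self._MOCK.get(city.strip().lower())
        if data is None:
            return f"Error: no weather data for '{city}'"
        return f"{city}: {data['temp_c']}°C, {data['condition']}"
```

Returning error strings (rather than raising) matches the agent loop's expectation that every observation is plain text the model can reason about.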
Exercise 2: Conversation Memory
Extend the agents to:
- Maintain conversation history across queries
- Reference previous answers
- Handle follow-up questions
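One way to start this exercise: a rolling message buffer that the agent prepends to every model call, so follow-up questions can reference earlier answers (names are illustrative; a fuller version might summarize old turns instead of dropping them).

```python
class ConversationMemory:
    """Rolling chat history with a simple turn budget."""

    def __init__(self, max_turns: int = 10):
        self.max_turns = max_turns
        self.messages: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Keep only the most recent turns (one turn = user + assistant message)
        excess = len(self.messages) - 2 * self.max_turns
        if excess > 0:
            self.messages = self.messages[excess:]

    def as_context(self) -> list[dict]:
        """Messages to prepend to the next model call."""
        return list(self.messages)
```

The agent would call `memory.add("user", query)` before each run and `memory.add("assistant", answer)` after, passing `memory.as_context()` as the message history.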
Exercise 3: Tool Chain Optimization
Implement a system that:
- Detects when tools can be called in parallel
- Caches repeated tool calls
- Measures and optimizes tool execution time
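The caching part of this exercise can be sketched as a wrapper around any tool's `call` function (illustrative helper; a production version would also bound the cache size):

```python
import json
import time

def cached_tool(call, ttl_s: float = 60.0):
    """Wrap a tool's call function with a TTL cache keyed on its arguments."""
    cache: dict[str, tuple[float, str]] = {}

    def wrapper(**kwargs) -> str:
        key = json.dumps(kwargs, sort_keys=True)  # stable key for identical calls
        hit = cache.get(key)
        if hit and time.monotonic() - hit[0] < ttl_s:
            return hit[1]  # cache hit: skip the real tool call
        result = call(**kwargs)
        cache[key] = (time.monotonic(), result)
        return result

    wrapper.cache = cache  # exposed for inspection/metrics
    return wrapper
```

Wrapping a tool is then a one-liner, e.g. `tool.call = cached_tool(tool.call)`; repeated identical calls within the TTL return instantly.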
Exercise 4: Agent Evaluation
Build an evaluation framework that:
- Tests agents on a benchmark dataset
- Measures success rate, steps taken, and time
- Compares different agent types and models
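A minimal harness for this exercise, assuming the agent under test is exposed as a function returning `(answer, steps_taken)` — all names here are illustrative:

```python
import time
from dataclasses import dataclass

@dataclass
class EvalResult:
    success_rate: float
    avg_steps: float
    avg_time_s: float

def evaluate_agent(run_fn, cases, check=None):
    """run_fn(query) -> (answer, steps); cases: [{"query": ..., "expected": ...}].

    `check` compares answer vs expected; the default is a case-insensitive
    substring match, which is crude but workable for short factual answers.
    """
    check = check or (lambda answer, expected: expected.lower() in str(answer).lower())
    successes, total_steps, total_time = 0, 0, 0.0
    for case in cases:
        start = time.perf_counter()
        answer, steps = run_fn(case["query"])
        total_time += time.perf_counter() - start
        total_steps += steps
        if check(answer, case["expected"]):
            successes += 1
    n = len(cases)
    return EvalResult(successes / n, total_steps / n, total_time / n)
```

Running the same `cases` through each agent type (and model) gives directly comparable success/step/latency numbers.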
Summary
You've learned to build intelligent agents with SLMs:
- Structured Output: Reliable JSON generation with Pydantic schemas
- Tool Library: Flexible, extensible tool system
- ReAct Agent: Reasoning and acting in a loop
- Function Calling: Clean structured approach to tool use
- Planning Agent: Multi-step task decomposition
Key insights:
- Lower temperature (0.1-0.3) improves parsing reliability
- Clear, explicit prompts are crucial for SLM agents
- Retry logic handles occasional parsing failures
- Limit tools to reduce complexity and errors
- Start with simple tasks and gradually increase complexity
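The retry-logic insight above can be sketched as a small helper (names are illustrative; `generate` stands in for any function that calls the model and returns raw text):

```python
import json
from typing import Callable, Optional

def generate_json_with_retry(generate: Callable[[str], str], prompt: str,
                             max_retries: int = 3) -> Optional[dict]:
    """Call `generate` until its output parses as a JSON object, or give up."""
    for _ in range(max_retries):
        raw = generate(prompt)
        try:
            data = json.loads(raw)
            if isinstance(data, dict):
                return data
        except json.JSONDecodeError:
            pass
        # Remind the model of the contract before the next attempt
        prompt += "\n\nYour previous reply was not valid JSON. Respond with a single JSON object only."
    return None
```

Combined with a low temperature and JSON format mode, two or three retries handle most of the occasional parsing failures SLMs produce.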
Key Concepts Recap
| Concept | What It Is | Why It Matters |
|---|---|---|
| Structured Output | JSON from LLM matching Pydantic schema | Reliable tool calls and parsing |
| ReAct Pattern | Thought→Action→Observation loop | Enables multi-step reasoning |
| Function Calling | LLM selects and parameterizes tools | Cleaner than free-form tool use |
| Tool Registry | Collection of callable tools with schemas | Easy tool management and discovery |
| Low Temperature | 0.1-0.3 for deterministic output | More consistent JSON parsing |
| JSON Format Mode | Force model to output valid JSON | Reduces parsing failures |
| Retry Logic | Re-attempt on parsing failure | Handles occasional errors gracefully |
| Tool Schema | JSON description of tool parameters | LLM knows how to call the tool |
| Planning Agent | Decompose task before execution | Handles complex multi-step tasks |
| Action Input | Arguments passed to selected tool | Must validate against tool schema |
Next Steps
- Training SLM from Scratch - Build your own small model
- Production SLM System - Deploy agents at scale