Build an LLM-powered system that generates comprehensive business reports from structured data and natural language inputs

Automated Enterprise Reporting

Build a production-grade reporting system that transforms raw data into polished, narrative business reports with insights, visualizations, and recommendations.


Industry	Enterprise / Business Intelligence
Difficulty	Advanced
Time	1 week
Code	~1100 lines

TL;DR

Build an automated reporting system that ingests data from multiple sources (SQL, CSV, APIs), performs statistical analysis (trends, anomalies, seasonality), generates LLM-powered narratives (executive summaries, insights, recommendations), creates Plotly visualizations, and exports to PDF with WeasyPrint. Uses YAML templates for report configuration.

Why This Case Study?

Enterprise teams spend 20-40 hours per month manually creating recurring reports: pulling data from databases, calculating metrics, writing narratives, creating charts, and formatting PDFs. Most of this work is repetitive and follows the same template every week.

Metric	Manual Reporting	Automated with LLM
Time per report	4-8 hours	2-5 minutes
Report frequency	Weekly (limited by bandwidth)	On-demand or scheduled
Data freshness	Hours old by delivery	Real-time at generation
Narrative quality	Varies by analyst fatigue	Consistent with trend analysis
Annual time saved (10 reports/week)	Baseline	2,000-4,000 analyst hours

This case study teaches the full pipeline: data ingestion from multiple sources, statistical analysis, LLM-generated narratives, Plotly visualizations, and PDF export -- all configurable via YAML templates.

What You'll Build

An intelligent reporting system that:

Ingests multiple data sources - Databases, spreadsheets, APIs
Analyzes trends - Statistical analysis and anomaly detection
Generates narratives - Natural language explanations of data
Creates visualizations - Charts and graphs from data
Produces formatted reports - PDF, HTML, and email-ready outputs

Architecture

Automated Reporting Architecture

Data Sources

Database

External APIs

Spreadsheets

Previous Reports

Data Processing

ETL Pipeline

Statistical Analysis

Anomaly Detection

Report Generation

Template Selection

Narrative Writing

Chart Generation

Insight Extraction

Report Delivery

PDF Export

Web Dashboard

Email Distribution

Slack Integration

Project Structure

automated-reporting/
├── src/
│   ├── __init__.py
│   ├── config.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── connectors.py       # Data source connections
│   │   ├── etl.py              # Extract-Transform-Load
│   │   └── validators.py       # Data validation
│   ├── analysis/
│   │   ├── __init__.py
│   │   ├── statistics.py       # Statistical analysis
│   │   ├── trends.py           # Trend detection
│   │   └── anomalies.py        # Anomaly detection
│   ├── generation/
│   │   ├── __init__.py
│   │   ├── templates.py        # Report templates
│   │   ├── narrative.py        # LLM narrative generation
│   │   ├── insights.py         # Insight extraction
│   │   └── charts.py           # Visualization generation
│   ├── output/
│   │   ├── __init__.py
│   │   ├── pdf_export.py       # PDF generation
│   │   ├── html_export.py      # HTML output
│   │   └── distribution.py     # Email/Slack delivery
│   └── api/
│       ├── __init__.py
│       └── main.py             # FastAPI endpoints
├── templates/
│   ├── weekly_sales.yaml
│   ├── monthly_finance.yaml
│   └── quarterly_review.yaml
├── tests/
└── requirements.txt

Tech Stack

Technology	Purpose	Why
LangChain	LLM orchestration	Chain data analysis with narrative generation
OpenAI GPT-4o	Narrative generation	Strong at writing professional business prose
Pandas	Data manipulation	Standard for tabular data ETL
Plotly	Interactive charts	Publication-quality charts exportable to PNG
WeasyPrint	PDF generation	CSS-based PDF rendering for polished reports
SQLAlchemy	Database connections	Supports PostgreSQL, MySQL, SQLite uniformly
FastAPI	API endpoints	Async for long-running report generation
Jinja2	Template rendering	Compose HTML reports with dynamic sections

Implementation

Configuration

# src/config.py
from pydantic_settings import BaseSettings
from typing import Optional, List

class Settings(BaseSettings):
    # LLM Settings
    openai_api_key: str
    openai_model: str = "gpt-4o"

    # Database
    database_url: str = "postgresql://user:pass@localhost/reporting"

    # Report Settings
    company_name: str = "Acme Corp"
    report_output_dir: str = "./reports"
    default_currency: str = "USD"

    # Email Settings
    smtp_host: Optional[str] = None
    smtp_port: int = 587
    smtp_user: Optional[str] = None
    smtp_password: Optional[str] = None

    # Slack Settings
    slack_webhook_url: Optional[str] = None

    class Config:
        env_file = ".env"

settings = Settings()

Data Connectors

# src/data/connectors.py
from typing import Dict, Any, List
import pandas as pd
from sqlalchemy import create_engine, text
from abc import ABC, abstractmethod
from ..config import settings

class DataConnector(ABC):
    """Base class for data connectors."""

    @abstractmethod
    def fetch(self, query: str) -> pd.DataFrame:
        pass

class SQLConnector(DataConnector):
    """SQL database connector."""

    def __init__(self, connection_string: str = None):
        self.engine = create_engine(
            connection_string or settings.database_url
        )

    def fetch(self, query: str) -> pd.DataFrame:
        """Execute SQL query and return DataFrame."""
        with self.engine.connect() as conn:
            return pd.read_sql(text(query), conn)

    def fetch_table(self, table: str, limit: int = None) -> pd.DataFrame:
        """Fetch entire table."""
        query = f"SELECT * FROM {table}"
        if limit:
            query += f" LIMIT {limit}"
        return self.fetch(query)

class CSVConnector(DataConnector):
    """CSV file connector."""

    def __init__(self, base_path: str = "./data"):
        self.base_path = base_path

    def fetch(self, filename: str) -> pd.DataFrame:
        """Load CSV file."""
        import os
        filepath = os.path.join(self.base_path, filename)
        return pd.read_csv(filepath)

class APIConnector(DataConnector):
    """REST API connector."""

    def __init__(self, base_url: str, headers: Dict = None):
        self.base_url = base_url
        self.headers = headers or {}

    def fetch(self, endpoint: str) -> pd.DataFrame:
        """Fetch data from API endpoint."""
        import requests
        response = requests.get(
            f"{self.base_url}/{endpoint}",
            headers=self.headers
        )
        response.raise_for_status()
        return pd.DataFrame(response.json())


class DataManager:
    """Manages multiple data sources."""

    def __init__(self):
        self.connectors: Dict[str, DataConnector] = {}

    def register(self, name: str, connector: DataConnector):
        """Register a data connector."""
        self.connectors[name] = connector

    def fetch(self, source: str, query: str) -> pd.DataFrame:
        """Fetch data from a registered source."""
        if source not in self.connectors:
            raise ValueError(f"Unknown data source: {source}")
        return self.connectors[source].fetch(query)

Why Pluggable Data Connectors:

Data Connector Pattern

DataManager (registry)

'main' -> SQLConnector

PostgreSQL. fetch(query) returns DataFrame

'sheets' -> CSVConnector

Google Sheets. fetch(filename) returns DataFrame

'crm' -> APIConnector

Salesforce. fetch(endpoint) returns DataFrame

Template config selects connector by source name (e.g., source: main picks SQLConnector)

Connector	Use Case	Authentication
SQLConnector	Data warehouse, production DB	Connection string
CSVConnector	Spreadsheet exports, historical data	File path
APIConnector	CRM, marketing tools, external services	API key/OAuth

All connectors return pandas DataFrame - unified interface regardless of source.

Statistical Analysis

# src/analysis/statistics.py
from typing import Dict, List, Any, Optional
import pandas as pd
import numpy as np
from dataclasses import dataclass

@dataclass
class MetricSummary:
    """Summary statistics for a metric."""
    name: str
    current: float
    previous: float
    change: float
    change_percent: float
    trend: str  # up, down, stable
    is_anomaly: bool

class StatisticalAnalyzer:
    """Performs statistical analysis on data."""

    def __init__(self, significance_threshold: float = 0.05):
        self.significance_threshold = significance_threshold

    def calculate_summary(
        self,
        df: pd.DataFrame,
        metric_col: str,
        period_col: str
    ) -> MetricSummary:
        """Calculate summary statistics for a metric."""
        sorted_df = df.sort_values(period_col)

        current = sorted_df[metric_col].iloc[-1]
        previous = sorted_df[metric_col].iloc[-2] if len(sorted_df) > 1 else current

        change = current - previous
        change_percent = (change / previous * 100) if previous != 0 else 0

        # Determine trend
        if abs(change_percent) < 1:
            trend = "stable"
        elif change > 0:
            trend = "up"
        else:
            trend = "down"

        # Check for anomaly using z-score
        mean = sorted_df[metric_col].mean()
        std = sorted_df[metric_col].std()
        z_score = abs((current - mean) / std) if std > 0 else 0
        is_anomaly = z_score > 2

        return MetricSummary(
            name=metric_col,
            current=current,
            previous=previous,
            change=change,
            change_percent=change_percent,
            trend=trend,
            is_anomaly=is_anomaly
        )

    def calculate_growth_rate(
        self,
        values: List[float],
        periods: int = 1
    ) -> float:
        """Calculate compound growth rate."""
        if len(values) < 2:
            return 0.0

        start = values[0]
        end = values[-1]

        if start <= 0:
            return 0.0

        return ((end / start) ** (1 / periods) - 1) * 100

    def detect_seasonality(
        self,
        df: pd.DataFrame,
        value_col: str,
        date_col: str
    ) -> Dict[str, Any]:
        """Detect seasonal patterns in data."""
        df = df.copy()
        df[date_col] = pd.to_datetime(df[date_col])
        df['month'] = df[date_col].dt.month
        df['quarter'] = df[date_col].dt.quarter
        df['dayofweek'] = df[date_col].dt.dayofweek

        # Monthly seasonality
        monthly_avg = df.groupby('month')[value_col].mean()
        monthly_variation = monthly_avg.std() / monthly_avg.mean()

        # Quarterly seasonality
        quarterly_avg = df.groupby('quarter')[value_col].mean()
        quarterly_variation = quarterly_avg.std() / quarterly_avg.mean()

        return {
            "has_monthly_pattern": monthly_variation > 0.1,
            "monthly_averages": monthly_avg.to_dict(),
            "has_quarterly_pattern": quarterly_variation > 0.1,
            "quarterly_averages": quarterly_avg.to_dict(),
            "strongest_month": monthly_avg.idxmax(),
            "weakest_month": monthly_avg.idxmin()
        }

Understanding Statistical Analysis:

Metric Summary Calculation

Input: Revenue data (12 months)[100K, 105K, 98K, 110K, 115K, 108K, 120K, 125K, 118K, 130K, 135K, 150K]

MetricSummarycurrent: 150K (last value), previous: 135K, change: +15K, change_percent: +11.1%, trend: 'up' (change > 1%), is_anomaly: true (z-score > 2)

Anomaly Detection (z-score)mean = 117.8K, std = 14.5K. z = |150K - 117.8K| / 14.5K = 2.22 > 2 = ANOMALY

Analysis Type	What It Detects	Used For
Summary stats	Current vs previous period	Quick performance snapshot
Z-score anomaly	Unusual values (> 2 std devs)	Alert highlighting
Seasonality	Monthly/quarterly patterns	Contextualize comparisons
Growth rate	Compound growth over periods	Long-term trend analysis

Why z-score = 2: Captures values outside 95% of normal distribution - unusual enough to mention, not so strict that everything flags.

Narrative Generation

# src/generation/narrative.py
from typing import List, Dict, Any
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from ..analysis.statistics import MetricSummary
from ..config import settings

class ReportNarrative(BaseModel):
    """Structured narrative output."""
    executive_summary: str = Field(description="2-3 sentence overview")
    key_findings: List[str] = Field(description="3-5 bullet points")
    detailed_analysis: str = Field(description="2-3 paragraphs")
    recommendations: List[str] = Field(description="2-4 action items")
    outlook: str = Field(description="Forward-looking statement")

class NarrativeGenerator:
    """Generates natural language narratives from data analysis."""

    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.openai_model,
            api_key=settings.openai_api_key,
            temperature=0.3
        )

    async def generate_metric_narrative(
        self,
        summary: MetricSummary,
        context: str = ""
    ) -> str:
        """Generate narrative for a single metric."""
        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a business analyst writing reports.
Write clear, professional narratives about business metrics.
Use specific numbers and percentages. Be concise."""),
            ("human", """Metric: {name}
Current Value: {current:,.2f}
Previous Value: {previous:,.2f}
Change: {change:+,.2f} ({change_percent:+.1f}%)
Trend: {trend}
Is Anomaly: {is_anomaly}

Context: {context}

Write a 2-3 sentence narrative about this metric's performance.""")
        ])

        chain = prompt | self.llm
        result = await chain.ainvoke({
            "name": summary.name,
            "current": summary.current,
            "previous": summary.previous,
            "change": summary.change,
            "change_percent": summary.change_percent,
            "trend": summary.trend,
            "is_anomaly": summary.is_anomaly,
            "context": context
        })
        return result.content

    async def generate_full_narrative(
        self,
        metrics: List[MetricSummary],
        report_type: str,
        period: str,
        additional_context: Dict = None
    ) -> ReportNarrative:
        """Generate comprehensive report narrative."""
        structured_llm = self.llm.with_structured_output(ReportNarrative)

        metrics_text = "\n".join([
            f"- {m.name}: {m.current:,.2f} ({m.change_percent:+.1f}% {m.trend})"
            + (" [ANOMALY]" if m.is_anomaly else "")
            for m in metrics
        ])

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a senior business analyst writing {report_type} reports
for {company}. Write professional, insightful narratives that:
- Lead with the most important findings
- Explain what the numbers mean for the business
- Provide actionable recommendations
- Use confident, clear language"""),
            ("human", """Period: {period}

Key Metrics:
{metrics}

Additional Context:
{context}

Generate a comprehensive report narrative.""")
        ])

        chain = prompt | structured_llm
        result = await chain.ainvoke({
            "report_type": report_type,
            "company": settings.company_name,
            "period": period,
            "metrics": metrics_text,
            "context": str(additional_context or {})
        })
        return result

    async def generate_insight(
        self,
        data_description: str,
        question: str
    ) -> str:
        """Generate specific insight from data."""
        prompt = ChatPromptTemplate.from_messages([
            ("system", "You are a data analyst. Answer questions about data concisely."),
            ("human", "Data: {data}\n\nQuestion: {question}")
        ])

        chain = prompt | self.llm
        result = await chain.ainvoke({
            "data": data_description,
            "question": question
        })
        return result.content

Why LLM for Narratives:

Narrative Generation Flow

Input: MetricSummary objectsRevenue: $150K (+11.1% up) [ANOMALY], Orders: 1,234 (+5.2% up), AOV: $121.55 (+5.6% up)

LLM with structured outputSystem: 'You are a senior business analyst...', Context: Company name, report type, period. Output: ReportNarrative (Pydantic model)

ReportNarrative outputexecutive_summary: 'Revenue hit an all-time high...', key_findings: ['Revenue grew 11.1%...', 'Order volume up 5.2%...'], recommendations: ['Investigate drivers of revenue spike...'], outlook: 'Momentum suggests strong Q4...'

Narrative Component	Purpose	Example
`executive_summary`	2-3 sentences for busy executives	"Revenue hit $150K, up 11%..."
`key_findings`	3-5 bullet points for quick scan	"• Order volume grew 5.2%..."
`detailed_analysis`	2-3 paragraphs with context	Explains WHY metrics moved
`recommendations`	2-4 actionable items	"Investigate Q4 marketing ROI"
`outlook`	Forward-looking statement	"Momentum suggests strong Q4"

Temperature = 0.3: Low enough for consistency, high enough for natural language variety.

Chart Generation

# src/generation/charts.py
from typing import List, Dict, Any, Optional
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import base64
from io import BytesIO

class ChartGenerator:
    """Generates charts and visualizations."""

    def __init__(self, theme: str = "plotly_white"):
        self.theme = theme
        self.colors = px.colors.qualitative.Set2

    def line_chart(
        self,
        df: pd.DataFrame,
        x: str,
        y: str,
        title: str,
        color: Optional[str] = None
    ) -> go.Figure:
        """Generate line chart."""
        fig = px.line(
            df, x=x, y=y, color=color,
            title=title,
            template=self.theme
        )
        fig.update_layout(
            hovermode='x unified',
            legend=dict(orientation="h", yanchor="bottom", y=1.02)
        )
        return fig

    def bar_chart(
        self,
        df: pd.DataFrame,
        x: str,
        y: str,
        title: str,
        orientation: str = "v"
    ) -> go.Figure:
        """Generate bar chart."""
        fig = px.bar(
            df, x=x, y=y,
            title=title,
            template=self.theme,
            orientation=orientation,
            color_discrete_sequence=self.colors
        )
        return fig

    def kpi_card(
        self,
        value: float,
        title: str,
        change: float = None,
        format_str: str = "{:,.0f}"
    ) -> go.Figure:
        """Generate KPI indicator card."""
        fig = go.Figure()

        fig.add_trace(go.Indicator(
            mode="number+delta" if change else "number",
            value=value,
            title={"text": title},
            delta={"reference": value - change} if change else None,
            number={"valueformat": ",.0f"}
        ))

        fig.update_layout(
            height=200,
            margin=dict(l=20, r=20, t=60, b=20)
        )
        return fig

    def comparison_chart(
        self,
        df: pd.DataFrame,
        category: str,
        current_col: str,
        previous_col: str,
        title: str
    ) -> go.Figure:
        """Generate comparison bar chart."""
        fig = go.Figure()

        fig.add_trace(go.Bar(
            name='Previous',
            x=df[category],
            y=df[previous_col],
            marker_color='lightgray'
        ))

        fig.add_trace(go.Bar(
            name='Current',
            x=df[category],
            y=df[current_col],
            marker_color='steelblue'
        ))

        fig.update_layout(
            title=title,
            barmode='group',
            template=self.theme
        )
        return fig

    def to_image(self, fig: go.Figure, format: str = "png") -> bytes:
        """Convert figure to image bytes."""
        return fig.to_image(format=format, scale=2)

    def to_base64(self, fig: go.Figure) -> str:
        """Convert figure to base64 string for embedding."""
        img_bytes = self.to_image(fig)
        return base64.b64encode(img_bytes).decode()

Chart Types for Reports:

Visualization Selection

line_chart

Time series and trends. Example: Revenue Over Time with peaks and valleys.

bar_chart

Comparisons and categories. Example: Sales by Region (East, West, North).

kpi_card

Single metric with delta. Example: $150,000 Revenue +11.1%.

comparison_chart

Period-over-period. Example: Previous vs Current bars for Jan, Feb, Mar.

Method	Output Format	Use In Report
`to_image()`	PNG bytes	Embedded in PDF
`to_base64()`	Base64 string	Inline in HTML
Direct `fig`	Plotly figure	Interactive web dashboard

Report Templates

# src/generation/templates.py
from typing import Dict, List, Any
import yaml
from pathlib import Path
from pydantic import BaseModel
from enum import Enum

class ReportFrequency(str, Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"
    QUARTERLY = "quarterly"

class MetricConfig(BaseModel):
    name: str
    source: str
    query: str
    display_name: str
    format: str = "{:,.2f}"
    show_trend: bool = True
    chart_type: str = "line"

class SectionConfig(BaseModel):
    title: str
    metrics: List[str]
    narrative_prompt: str
    charts: List[str] = []

class ReportTemplate(BaseModel):
    name: str
    frequency: ReportFrequency
    description: str
    data_sources: List[str]
    metrics: Dict[str, MetricConfig]
    sections: List[SectionConfig]
    recipients: List[str] = []

class TemplateManager:
    """Manages report templates."""

    def __init__(self, template_dir: str = "./templates"):
        self.template_dir = Path(template_dir)
        self.templates: Dict[str, ReportTemplate] = {}
        self._load_templates()

    def _load_templates(self):
        """Load all templates from directory."""
        for file in self.template_dir.glob("*.yaml"):
            with open(file) as f:
                data = yaml.safe_load(f)
                template = ReportTemplate(**data)
                self.templates[template.name] = template

    def get_template(self, name: str) -> ReportTemplate:
        """Get a template by name."""
        if name not in self.templates:
            raise ValueError(f"Template not found: {name}")
        return self.templates[name]

    def list_templates(self) -> List[str]:
        """List available templates."""
        return list(self.templates.keys())

Template-Driven Report Generation:

YAML Template Structure — weekly_sales.yaml

name + frequency'Weekly Sales Report', frequency: weekly (scheduler uses this)

metricstotal_revenue: source: main (DataManager looks up connector), query: 'SELECT date, SUM(amount)...', chart_type: line

sectionstitle: 'Revenue Overview', metrics: ['total_revenue'], narrative_prompt: 'Analyze revenue trends'

recipients[sales@company.com] — Benefits: non-engineers create reports via YAML, consistent structure, version-controlled, auto-scheduled

Template Field	Purpose
`frequency`	When to auto-generate (daily/weekly/monthly/quarterly)
`metrics.*.source`	Which DataConnector to use
`metrics.*.query`	Data retrieval query/endpoint
`sections`	Report structure with metrics and narratives
`recipients`	Auto-distribution email list

PDF Export

# src/output/pdf_export.py
from typing import Dict, List, Any
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML, CSS
from pathlib import Path
import tempfile
from datetime import datetime

class PDFExporter:
    """Exports reports to PDF format."""

    def __init__(self, template_dir: str = "./templates/pdf"):
        self.env = Environment(loader=FileSystemLoader(template_dir))
        self.css = self._load_styles()

    def _load_styles(self) -> CSS:
        """Load CSS styles for PDF."""
        css_content = """
        @page {
            size: A4;
            margin: 2cm;
            @top-right { content: "Page " counter(page) " of " counter(pages); }
        }
        body { font-family: 'Helvetica', sans-serif; line-height: 1.6; }
        h1 { color: #2c3e50; border-bottom: 2px solid #3498db; }
        h2 { color: #34495e; margin-top: 1.5em; }
        .kpi-grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 20px; }
        .kpi-card {
            background: #f8f9fa; padding: 20px; border-radius: 8px;
            text-align: center; border-left: 4px solid #3498db;
        }
        .kpi-value { font-size: 2em; font-weight: bold; color: #2c3e50; }
        .kpi-change.positive { color: #27ae60; }
        .kpi-change.negative { color: #e74c3c; }
        .chart { margin: 20px 0; text-align: center; }
        .insight-box {
            background: #e8f4f8; padding: 15px; border-radius: 8px;
            margin: 15px 0; border-left: 4px solid #17a2b8;
        }
        table { width: 100%; border-collapse: collapse; margin: 20px 0; }
        th, td { padding: 12px; text-align: left; border-bottom: 1px solid #ddd; }
        th { background: #3498db; color: white; }
        tr:nth-child(even) { background: #f8f9fa; }
        """
        return CSS(string=css_content)

    def render(
        self,
        template_name: str,
        data: Dict[str, Any],
        output_path: str = None
    ) -> bytes:
        """Render report to PDF."""
        template = self.env.get_template(f"{template_name}.html")

        # Add common data
        data["generated_at"] = datetime.now().strftime("%Y-%m-%d %H:%M")
        data["year"] = datetime.now().year

        html_content = template.render(**data)

        # Generate PDF
        html = HTML(string=html_content)
        pdf_bytes = html.write_pdf(stylesheets=[self.css])

        if output_path:
            Path(output_path).write_bytes(pdf_bytes)

        return pdf_bytes

    def create_simple_report(
        self,
        title: str,
        sections: List[Dict[str, Any]],
        charts: List[str] = None
    ) -> bytes:
        """Create a simple report without template."""
        html = f"""
        <!DOCTYPE html>
        <html>
        <head><title>{title}</title></head>
        <body>
            <h1>{title}</h1>
            <p class="date">Generated: {datetime.now().strftime('%B %d, %Y')}</p>
        """

        for section in sections:
            html += f"""
            <h2>{section['title']}</h2>
            <div class="content">{section['content']}</div>
            """

        if charts:
            html += '<div class="charts">'
            for chart_b64 in charts:
                html += f'<img src="data:image/png;base64,{chart_b64}" />'
            html += '</div>'

        html += "</body></html>"

        return HTML(string=html).write_pdf(stylesheets=[self.css])

PDF Generation Pipeline:

HTML to PDF Conversion

Data + NarrativeMetricSummary objects + ReportNarrative from LLM

Jinja2 Template Renderingreport_template.html + data = filled HTML. Variables: {{narrative.executive_summary}}. Loops: {% for finding in key_findings %}

WeasyPrint PDF GenerationCSS: Professional styling (grid, colors, fonts). @page: A4 size, margins, page numbers. Charts: Embedded as base64 images

PDF Outputbytes or file — ready for delivery

CSS Class	Visual Effect
`.kpi-grid`	3-column layout for KPI cards
`.kpi-card`	Styled box with left blue border
`.kpi-change.positive`	Green text for positive change
`.insight-box`	Blue-bordered callout for insights
`@page @top-right`	"Page 1 of 5" in header

Why WeasyPrint: Python-native, CSS-based styling, no external dependencies like wkhtmltopdf.

FastAPI Application

# src/api/main.py
from fastapi import FastAPI, HTTPException, BackgroundTasks
from fastapi.responses import FileResponse, StreamingResponse
from pydantic import BaseModel
from typing import Optional, List
from datetime import datetime
import io

from ..data.connectors import DataManager, SQLConnector
from ..analysis.statistics import StatisticalAnalyzer
from ..generation.narrative import NarrativeGenerator
from ..generation.charts import ChartGenerator
from ..generation.templates import TemplateManager
from ..output.pdf_export import PDFExporter
from ..config import settings

app = FastAPI(
    title="Automated Reporting API",
    description="Generate business reports with AI-powered narratives"
)

# Initialize components
data_manager = DataManager()
data_manager.register("main", SQLConnector())

analyzer = StatisticalAnalyzer()
narrator = NarrativeGenerator()
charts = ChartGenerator()
templates = TemplateManager()
pdf_exporter = PDFExporter()

class ReportRequest(BaseModel):
    template: str
    period_start: str
    period_end: str
    output_format: str = "pdf"  # pdf, html, json

class ReportResponse(BaseModel):
    report_id: str
    status: str
    download_url: Optional[str] = None

@app.post("/reports/generate")
async def generate_report(
    request: ReportRequest,
    background_tasks: BackgroundTasks
):
    """Generate a report from template."""
    # Validate template
    try:
        template = templates.get_template(request.template)
    except ValueError as e:
        raise HTTPException(status_code=404, detail=str(e))

    # Generate report ID
    report_id = f"{request.template}_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

    # For demo, generate synchronously
    # In production, use background task

    # Fetch data for each metric
    metrics_data = {}
    for name, config in template.metrics.items():
        df = data_manager.fetch(config.source, config.query)
        summary = analyzer.calculate_summary(df, name, "date")
        metrics_data[name] = summary

    # Generate narratives
    narrative = await narrator.generate_full_narrative(
        list(metrics_data.values()),
        template.name,
        f"{request.period_start} to {request.period_end}"
    )

    # Generate PDF
    report_data = {
        "title": template.name,
        "period": f"{request.period_start} to {request.period_end}",
        "metrics": metrics_data,
        "narrative": narrative
    }

    pdf_bytes = pdf_exporter.create_simple_report(
        template.name,
        [
            {"title": "Executive Summary", "content": narrative.executive_summary},
            {"title": "Key Findings", "content": "<ul>" + "".join(f"<li>{f}</li>" for f in narrative.key_findings) + "</ul>"},
            {"title": "Detailed Analysis", "content": narrative.detailed_analysis},
            {"title": "Recommendations", "content": "<ul>" + "".join(f"<li>{r}</li>" for r in narrative.recommendations) + "</ul>"},
        ]
    )

    return StreamingResponse(
        io.BytesIO(pdf_bytes),
        media_type="application/pdf",
        headers={"Content-Disposition": f"attachment; filename={report_id}.pdf"}
    )

@app.get("/templates")
async def list_templates():
    """List available report templates."""
    return {"templates": templates.list_templates()}

@app.get("/health")
async def health():
    return {"status": "healthy"}

Example Template

# templates/weekly_sales.yaml
name: "Weekly Sales Report"
frequency: weekly
description: "Weekly sales performance summary"

data_sources:
  - main

metrics:
  total_revenue:
    name: total_revenue
    source: main
    query: "SELECT date, SUM(amount) as total_revenue FROM sales GROUP BY date"
    display_name: "Total Revenue"
    format: "${:,.2f}"
    chart_type: line

  order_count:
    name: order_count
    source: main
    query: "SELECT date, COUNT(*) as order_count FROM orders GROUP BY date"
    display_name: "Order Count"
    format: "{:,.0f}"
    chart_type: bar

  avg_order_value:
    name: avg_order_value
    source: main
    query: "SELECT date, AVG(amount) as avg_order_value FROM orders GROUP BY date"
    display_name: "Average Order Value"
    format: "${:,.2f}"
    chart_type: line

sections:
  - title: "Revenue Overview"
    metrics: ["total_revenue"]
    narrative_prompt: "Analyze revenue performance and trends"
    charts: ["revenue_trend"]

  - title: "Order Analysis"
    metrics: ["order_count", "avg_order_value"]
    narrative_prompt: "Analyze order volume and value trends"
    charts: ["orders_comparison"]

recipients:
  - sales@company.com
  - leadership@company.com

Business Impact

Metric	Before	After	Improvement
Report creation time	4 hours	15 minutes	94% reduction
Analyst hours/week	20 hours	3 hours	85% reduction
Report consistency	Variable	100% standardized	Complete uniformity
Insights per report	3-5	8-12	2-3x more insights
Report frequency	Monthly	Weekly	4x more frequent

Key Learnings

Templates enable scale - Well-designed templates allow non-technical users to create reports
LLM narratives add value - AI-generated insights catch patterns humans miss
Structured output is critical - Pydantic models ensure consistent report structure
Caching matters - Cache data queries to enable fast iteration on narratives

Key Concepts Recap

Concept	What It Is	Why It Matters
Data Connectors	Pluggable interfaces for SQL, CSV, APIs	Single template can pull from multiple sources
MetricSummary	Current, previous, change, trend, anomaly flag	Structured input for both charts and narratives
Z-score Anomaly	Flag values > 2 standard deviations from mean	Auto-highlights unusual metrics for attention
Seasonality Detection	Monthly/quarterly pattern analysis	Contextualizes comparisons (December always high)
ReportNarrative	Structured LLM output with Pydantic	Ensures consistent sections in every report
YAML Templates	Declarative report configuration	Non-engineers can create new reports
Plotly Charts	Interactive visualizations, export to PNG	Same charts for web dashboard and PDF
WeasyPrint	HTML+CSS to PDF conversion	Professional formatting without external tools
Jinja2 Templates	HTML templating with variables and loops	Clean separation of content and presentation
Structured Output	LLM returns Pydantic model, not free text	Guaranteed fields for executive_summary, recommendations, etc.

Next Steps

Add scheduled report generation with Celery
Implement interactive dashboards alongside PDFs
Add multi-language support for global teams
Build feedback loop to improve narratives

Automated Enterprise Reporting

Build a production-grade reporting system that transforms raw data into polished, narrative business reports with insights, visualizations, and recommendations.


Industry	Enterprise / Business Intelligence
Difficulty	Advanced
Time	1 week
Code	~1100 lines

TL;DR

Why This Case Study?

Metric	Manual Reporting	Automated with LLM
Time per report	4-8 hours	2-5 minutes
Report frequency	Weekly (limited by bandwidth)	On-demand or scheduled
Data freshness	Hours old by delivery	Real-time at generation
Narrative quality	Varies by analyst fatigue	Consistent with trend analysis
Annual time saved (10 reports/week)	Baseline	2,000-4,000 analyst hours

What You'll Build

An intelligent reporting system that:

Ingests multiple data sources - Databases, spreadsheets, APIs
Analyzes trends - Statistical analysis and anomaly detection
Generates narratives - Natural language explanations of data
Creates visualizations - Charts and graphs from data
Produces formatted reports - PDF, HTML, and email-ready outputs

Architecture

Automated Reporting Architecture

Data Sources

Database

External APIs

Spreadsheets

Previous Reports

Data Processing

ETL Pipeline

Statistical Analysis

Anomaly Detection

Report Generation

Template Selection

Narrative Writing

Chart Generation

Insight Extraction

Report Delivery

PDF Export

Web Dashboard

Email Distribution

Slack Integration

Project Structure

automated-reporting/
├── src/
│   ├── __init__.py
│   ├── config.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── connectors.py       # Data source connections
│   │   ├── etl.py              # Extract-Transform-Load
│   │   └── validators.py       # Data validation
│   ├── analysis/
│   │   ├── __init__.py
│   │   ├── statistics.py       # Statistical analysis
│   │   ├── trends.py           # Trend detection
│   │   └── anomalies.py        # Anomaly detection
│   ├── generation/
│   │   ├── __init__.py
│   │   ├── templates.py        # Report templates
│   │   ├── narrative.py        # LLM narrative generation
│   │   ├── insights.py         # Insight extraction
│   │   └── charts.py           # Visualization generation
│   ├── output/
│   │   ├── __init__.py
│   │   ├── pdf_export.py       # PDF generation
│   │   ├── html_export.py      # HTML output
│   │   └── distribution.py     # Email/Slack delivery
│   └── api/
│       ├── __init__.py
│       └── main.py             # FastAPI endpoints
├── templates/
│   ├── weekly_sales.yaml
│   ├── monthly_finance.yaml
│   └── quarterly_review.yaml
├── tests/
└── requirements.txt

Tech Stack

Technology	Purpose	Why
LangChain	LLM orchestration	Chain data analysis with narrative generation
OpenAI GPT-4o	Narrative generation	Strong at writing professional business prose
Pandas	Data manipulation	Standard for tabular data ETL
Plotly	Interactive charts	Publication-quality charts exportable to PNG
WeasyPrint	PDF generation	CSS-based PDF rendering for polished reports
SQLAlchemy	Database connections	Supports PostgreSQL, MySQL, SQLite uniformly
FastAPI	API endpoints	Async for long-running report generation
Jinja2	Template rendering	Compose HTML reports with dynamic sections

Implementation

Configuration

# src/config.py
from pydantic_settings import BaseSettings
from typing import Optional, List

class Settings(BaseSettings):
    # LLM Settings
    openai_api_key: str
    openai_model: str = "gpt-4o"

    # Database
    database_url: str = "postgresql://user:pass@localhost/reporting"

    # Report Settings
    company_name: str = "Acme Corp"
    report_output_dir: str = "./reports"
    default_currency: str = "USD"

    # Email Settings
    smtp_host: Optional[str] = None
    smtp_port: int = 587
    smtp_user: Optional[str] = None
    smtp_password: Optional[str] = None

    # Slack Settings
    slack_webhook_url: Optional[str] = None

    class Config:
        env_file = ".env"

settings = Settings()

Data Connectors

# src/data/connectors.py
from typing import Dict, Any, List
import pandas as pd
from sqlalchemy import create_engine, text
from abc import ABC, abstractmethod
from ..config import settings

class DataConnector(ABC):
    """Base class for data connectors."""

    @abstractmethod
    def fetch(self, query: str) -> pd.DataFrame:
        pass

class SQLConnector(DataConnector):
    """SQL database connector."""

    def __init__(self, connection_string: str = None):
        self.engine = create_engine(
            connection_string or settings.database_url
        )

    def fetch(self, query: str) -> pd.DataFrame:
        """Execute SQL query and return DataFrame."""
        with self.engine.connect() as conn:
            return pd.read_sql(text(query), conn)

    def fetch_table(self, table: str, limit: int = None) -> pd.DataFrame:
        """Fetch entire table."""
        query = f"SELECT * FROM {table}"
        if limit:
            query += f" LIMIT {limit}"
        return self.fetch(query)

class CSVConnector(DataConnector):
    """CSV file connector."""

    def __init__(self, base_path: str = "./data"):
        self.base_path = base_path

    def fetch(self, filename: str) -> pd.DataFrame:
        """Load CSV file."""
        import os
        filepath = os.path.join(self.base_path, filename)
        return pd.read_csv(filepath)

class APIConnector(DataConnector):
    """REST API connector."""

    def __init__(self, base_url: str, headers: Dict = None):
        self.base_url = base_url
        self.headers = headers or {}

    def fetch(self, endpoint: str) -> pd.DataFrame:
        """Fetch data from API endpoint."""
        import requests
        response = requests.get(
            f"{self.base_url}/{endpoint}",
            headers=self.headers
        )
        response.raise_for_status()
        return pd.DataFrame(response.json())


class DataManager:
    """Manages multiple data sources."""

    def __init__(self):
        self.connectors: Dict[str, DataConnector] = {}

    def register(self, name: str, connector: DataConnector):
        """Register a data connector."""
        self.connectors[name] = connector

    def fetch(self, source: str, query: str) -> pd.DataFrame:
        """Fetch data from a registered source."""
        if source not in self.connectors:
            raise ValueError(f"Unknown data source: {source}")
        return self.connectors[source].fetch(query)

Why Pluggable Data Connectors:

Data Connector Pattern

DataManager (registry)

'main' -> SQLConnector

PostgreSQL. fetch(query) returns DataFrame

'sheets' -> CSVConnector

Google Sheets. fetch(filename) returns DataFrame

'crm' -> APIConnector

Salesforce. fetch(endpoint) returns DataFrame

Template config selects connector by source name (e.g., source: main picks SQLConnector)

Connector	Use Case	Authentication
SQLConnector	Data warehouse, production DB	Connection string
CSVConnector	Spreadsheet exports, historical data	File path
APIConnector	CRM, marketing tools, external services	API key/OAuth

All connectors return pandas DataFrame - unified interface regardless of source.

Statistical Analysis

# src/analysis/statistics.py
from typing import Dict, List, Any, Optional
import pandas as pd
import numpy as np
from dataclasses import dataclass

@dataclass
class MetricSummary:
    """Summary statistics for a metric."""
    name: str
    current: float
    previous: float
    change: float
    change_percent: float
    trend: str  # up, down, stable
    is_anomaly: bool

class StatisticalAnalyzer:
    """Performs statistical analysis on data."""

    def __init__(self, significance_threshold: float = 0.05):
        self.significance_threshold = significance_threshold

    def calculate_summary(
        self,
        df: pd.DataFrame,
        metric_col: str,
        period_col: str
    ) -> MetricSummary:
        """Calculate summary statistics for a metric."""
        sorted_df = df.sort_values(period_col)

        current = sorted_df[metric_col].iloc[-1]
        previous = sorted_df[metric_col].iloc[-2] if len(sorted_df) > 1 else current

        change = current - previous
        change_percent = (change / previous * 100) if previous != 0 else 0

        # Determine trend
        if abs(change_percent) < 1:
            trend = "stable"
        elif change > 0:
            trend = "up"
        else:
            trend = "down"

        # Check for anomaly using z-score
        mean = sorted_df[metric_col].mean()
        std = sorted_df[metric_col].std()
        z_score = abs((current - mean) / std) if std > 0 else 0
        is_anomaly = z_score > 2

        return MetricSummary(
            name=metric_col,
            current=current,
            previous=previous,
            change=change,
            change_percent=change_percent,
            trend=trend,
            is_anomaly=is_anomaly
        )

    def calculate_growth_rate(
        self,
        values: List[float],
        periods: int = 1
    ) -> float:
        """Calculate compound growth rate."""
        if len(values) < 2:
            return 0.0

        start = values[0]
        end = values[-1]

        if start <= 0:
            return 0.0

        return ((end / start) ** (1 / periods) - 1) * 100

    def detect_seasonality(
        self,
        df: pd.DataFrame,
        value_col: str,
        date_col: str
    ) -> Dict[str, Any]:
        """Detect seasonal patterns in data."""
        df = df.copy()
        df[date_col] = pd.to_datetime(df[date_col])
        df['month'] = df[date_col].dt.month
        df['quarter'] = df[date_col].dt.quarter
        df['dayofweek'] = df[date_col].dt.dayofweek

        # Monthly seasonality
        monthly_avg = df.groupby('month')[value_col].mean()
        monthly_variation = monthly_avg.std() / monthly_avg.mean()

        # Quarterly seasonality
        quarterly_avg = df.groupby('quarter')[value_col].mean()
        quarterly_variation = quarterly_avg.std() / quarterly_avg.mean()

        return {
            "has_monthly_pattern": monthly_variation > 0.1,
            "monthly_averages": monthly_avg.to_dict(),
            "has_quarterly_pattern": quarterly_variation > 0.1,
            "quarterly_averages": quarterly_avg.to_dict(),
            "strongest_month": monthly_avg.idxmax(),
            "weakest_month": monthly_avg.idxmin()
        }

Understanding Statistical Analysis:

Metric Summary Calculation

Input: Revenue data (12 months)[100K, 105K, 98K, 110K, 115K, 108K, 120K, 125K, 118K, 130K, 135K, 150K]

MetricSummarycurrent: 150K (last value), previous: 135K, change: +15K, change_percent: +11.1%, trend: 'up' (change > 1%), is_anomaly: true (z-score > 2)

Anomaly Detection (z-score)mean = 117.8K, std = 14.5K. z = |150K - 117.8K| / 14.5K = 2.22 > 2 = ANOMALY

Analysis Type	What It Detects	Used For
Summary stats	Current vs previous period	Quick performance snapshot
Z-score anomaly	Unusual values (> 2 std devs)	Alert highlighting
Seasonality	Monthly/quarterly patterns	Contextualize comparisons
Growth rate	Compound growth over periods	Long-term trend analysis

Why z-score = 2: Captures values outside 95% of normal distribution - unusual enough to mention, not so strict that everything flags.

Narrative Generation

# src/generation/narrative.py
from typing import List, Dict, Any
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from ..analysis.statistics import MetricSummary
from ..config import settings

class ReportNarrative(BaseModel):
    """Structured narrative output."""
    executive_summary: str = Field(description="2-3 sentence overview")
    key_findings: List[str] = Field(description="3-5 bullet points")
    detailed_analysis: str = Field(description="2-3 paragraphs")
    recommendations: List[str] = Field(description="2-4 action items")
    outlook: str = Field(description="Forward-looking statement")

class NarrativeGenerator:
    """Generates natural language narratives from data analysis."""

    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.openai_model,
            api_key=settings.openai_api_key,
            temperature=0.3
        )

    async def generate_metric_narrative(
        self,
        summary: MetricSummary,
        context: str = ""
    ) -> str:
        """Generate narrative for a single metric."""
        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a business analyst writing reports.
Write clear, professional narratives about business metrics.
Use specific numbers and percentages. Be concise."""),
            ("human", """Metric: {name}
Current Value: {current:,.2f}
Previous Value: {previous:,.2f}
Change: {change:+,.2f} ({change_percent:+.1f}%)
Trend: {trend}
Is Anomaly: {is_anomaly}

Context: {context}

Write a 2-3 sentence narrative about this metric's performance.""")
        ])

        chain = prompt | self.llm
        result = await chain.ainvoke({
            "name": summary.name,
            "current": summary.current,
            "previous": summary.previous,
            "change": summary.change,
            "change_percent": summary.change_percent,
            "trend": summary.trend,
            "is_anomaly": summary.is_anomaly,
            "context": context
        })
        return result.content

    async def generate_full_narrative(
        self,
        metrics: List[MetricSummary],
        report_type: str,
        period: str,
        additional_context: Dict = None
    ) -> ReportNarrative:
        """Generate comprehensive report narrative."""
        structured_llm = self.llm.with_structured_output(ReportNarrative)

        metrics_text = "\n".join([
            f"- {m.name}: {m.current:,.2f} ({m.change_percent:+.1f}% {m.trend})"
            + (" [ANOMALY]" if m.is_anomaly else "")
            for m in metrics
        ])

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a senior business analyst writing {report_type} reports
for {company}. Write professional, insightful narratives that:
- Lead with the most important findings
- Explain what the numbers mean for the business
- Provide actionable recommendations
- Use confident, clear language"""),
            ("human", """Period: {period}

Key Metrics:
{metrics}

Additional Context:
{context}

Generate a comprehensive report narrative.""")
        ])

        chain = prompt | structured_llm
        result = await chain.ainvoke({
            "report_type": report_type,
            "company": settings.company_name,
            "period": period,
            "metrics": metrics_text,
            "context": str(additional_context or {})
        })
        return result

    async def generate_insight(
        self,
        data_description: str,
        question: str
    ) -> str:
        """Generate specific insight from data."""
        prompt = ChatPromptTemplate.from_messages([
            ("system", "You are a data analyst. Answer questions about data concisely."),
            ("human", "Data: {data}\n\nQuestion: {question}")
        ])

        chain = prompt | self.llm
        result = await chain.ainvoke({
            "data": data_description,
            "question": question
        })
        return result.content

Why LLM for Narratives:

Narrative Generation Flow

Input: MetricSummary objectsRevenue: $150K (+11.1% up) [ANOMALY], Orders: 1,234 (+5.2% up), AOV: $121.55 (+5.6% up)

LLM with structured outputSystem: 'You are a senior business analyst...', Context: Company name, report type, period. Output: ReportNarrative (Pydantic model)

Narrative Component	Purpose	Example
`executive_summary`	2-3 sentences for busy executives	"Revenue hit $150K, up 11%..."
`key_findings`	3-5 bullet points for quick scan	"• Order volume grew 5.2%..."
`detailed_analysis`	2-3 paragraphs with context	Explains WHY metrics moved
`recommendations`	2-4 actionable items	"Investigate Q4 marketing ROI"
`outlook`	Forward-looking statement	"Momentum suggests strong Q4"

Temperature = 0.3: Low enough for consistency, high enough for natural language variety.

Chart Generation

# src/generation/charts.py
from typing import List, Dict, Any, Optional
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import base64
from io import BytesIO

class ChartGenerator:
    """Generates charts and visualizations."""

    def __init__(self, theme: str = "plotly_white"):
        self.theme = theme
        self.colors = px.colors.qualitative.Set2

    def line_chart(
        self,
        df: pd.DataFrame,
        x: str,
        y: str,
        title: str,
        color: Optional[str] = None
    ) -> go.Figure:
        """Generate line chart."""
        fig = px.line(
            df, x=x, y=y, color=color,
            title=title,
            template=self.theme
        )
        fig.update_layout(
            hovermode='x unified',
            legend=dict(orientation="h", yanchor="bottom", y=1.02)
        )
        return fig

    def bar_chart(
        self,
        df: pd.DataFrame,
        x: str,
        y: str,
        title: str,
        orientation: str = "v"
    ) -> go.Figure:
        """Generate bar chart."""
        fig = px.bar(
            df, x=x, y=y,
            title=title,
            template=self.theme,
            orientation=orientation,
            color_discrete_sequence=self.colors
        )
        return fig

    def kpi_card(
        self,
        value: float,
        title: str,
        change: float = None,
        format_str: str = "{:,.0f}"
    ) -> go.Figure:
        """Generate KPI indicator card."""
        fig = go.Figure()

        fig.add_trace(go.Indicator(
            mode="number+delta" if change else "number",
            value=value,
            title={"text": title},
            delta={"reference": value - change} if change else None,
            number={"valueformat": ",.0f"}
        ))

        fig.update_layout(
            height=200,
            margin=dict(l=20, r=20, t=60, b=20)
        )
        return fig

    def comparison_chart(
        self,
        df: pd.DataFrame,
        category: str,
        current_col: str,
        previous_col: str,
        title: str
    ) -> go.Figure:
        """Generate comparison bar chart."""
        fig = go.Figure()

        fig.add_trace(go.Bar(
            name='Previous',
            x=df[category],
            y=df[previous_col],
            marker_color='lightgray'
        ))

        fig.add_trace(go.Bar(
            name='Current',
            x=df[category],
            y=df[current_col],
            marker_color='steelblue'
        ))

        fig.update_layout(
            title=title,
            barmode='group',
            template=self.theme
        )
        return fig

    def to_image(self, fig: go.Figure, format: str = "png") -> bytes:
        """Convert figure to image bytes."""
        return fig.to_image(format=format, scale=2)

    def to_base64(self, fig: go.Figure) -> str:
        """Convert figure to base64 string for embedding."""
        img_bytes = self.to_image(fig)
        return base64.b64encode(img_bytes).decode()

Chart Types for Reports:

Visualization Selection

line_chart

Time series and trends. Example: Revenue Over Time with peaks and valleys.

bar_chart

Comparisons and categories. Example: Sales by Region (East, West, North).

kpi_card

Single metric with delta. Example: $150,000 Revenue +11.1%.

comparison_chart

Period-over-period. Example: Previous vs Current bars for Jan, Feb, Mar.

Method	Output Format	Use In Report
`to_image()`	PNG bytes	Embedded in PDF
`to_base64()`	Base64 string	Inline in HTML
Direct `fig`	Plotly figure	Interactive web dashboard

Report Templates

# src/generation/templates.py
from typing import Dict, List, Any
import yaml
from pathlib import Path
from pydantic import BaseModel
from enum import Enum

class ReportFrequency(str, Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"
    QUARTERLY = "quarterly"

class MetricConfig(BaseModel):
    name: str
    source: str
    query: str
    display_name: str
    format: str = "{:,.2f}"
    show_trend: bool = True
    chart_type: str = "line"

class SectionConfig(BaseModel):
    title: str
    metrics: List[str]
    narrative_prompt: str
    charts: List[str] = []

class ReportTemplate(BaseModel):
    name: str
    frequency: ReportFrequency
    description: str
    data_sources: List[str]
    metrics: Dict[str, MetricConfig]
    sections: List[SectionConfig]
    recipients: List[str] = []

class TemplateManager:
    """Manages report templates."""

    def __init__(self, template_dir: str = "./templates"):
        self.template_dir = Path(template_dir)
        self.templates: Dict[str, ReportTemplate] = {}
        self._load_templates()

    def _load_templates(self):
        """Load all templates from directory."""
        for file in self.template_dir.glob("*.yaml"):
            with open(file) as f:
                data = yaml.safe_load(f)
                template = ReportTemplate(**data)
                self.templates[template.name] = template

    def get_template(self, name: str) -> ReportTemplate:
        """Get a template by name."""
        if name not in self.templates:
            raise ValueError(f"Template not found: {name}")
        return self.templates[name]

    def list_templates(self) -> List[str]:
        """List available templates."""
        return list(self.templates.keys())

Template-Driven Report Generation:

YAML Template Structure — weekly_sales.yaml

name + frequency'Weekly Sales Report', frequency: weekly (scheduler uses this)

metricstotal_revenue: source: main (DataManager looks up connector), query: 'SELECT date, SUM(amount)...', chart_type: line

sectionstitle: 'Revenue Overview', metrics: ['total_revenue'], narrative_prompt: 'Analyze revenue trends'

recipients[sales@company.com] — Benefits: non-engineers create reports via YAML, consistent structure, version-controlled, auto-scheduled

Template Field	Purpose
`frequency`	When to auto-generate (daily/weekly/monthly/quarterly)
`metrics.*.source`	Which DataConnector to use
`metrics.*.query`	Data retrieval query/endpoint
`sections`	Report structure with metrics and narratives
`recipients`	Auto-distribution email list

PDF Export

# src/output/pdf_export.py
from typing import Dict, List, Any
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML, CSS
from pathlib import Path
import tempfile
from datetime import datetime

class PDFExporter:
    """Exports reports to PDF format."""

    def __init__(self, template_dir: str = "./templates/pdf"):
        self.env = Environment(loader=FileSystemLoader(template_dir))
        self.css = self._load_styles()

    def _load_styles(self) -> CSS:
        """Load CSS styles for PDF."""
        css_content = """
        @page {
            size: A4;
            margin: 2cm;
            @top-right { content: "Page " counter(page) " of " counter(pages); }
        }
        body { font-family: 'Helvetica', sans-serif; line-height: 1.6; }
        h1 { color: #2c3e50; border-bottom: 2px solid #3498db; }
        h2 { color: #34495e; margin-top: 1.5em; }
        .kpi-grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 20px; }
        .kpi-card {
            background: #f8f9fa; padding: 20px; border-radius: 8px;
            text-align: center; border-left: 4px solid #3498db;
        }
        .kpi-value { font-size: 2em; font-weight: bold; color: #2c3e50; }
        .kpi-change.positive { color: #27ae60; }
        .kpi-change.negative { color: #e74c3c; }
        .chart { margin: 20px 0; text-align: center; }
        .insight-box {
            background: #e8f4f8; padding: 15px; border-radius: 8px;
            margin: 15px 0; border-left: 4px solid #17a2b8;
        }
        table { width: 100%; border-collapse: collapse; margin: 20px 0; }
        th, td { padding: 12px; text-align: left; border-bottom: 1px solid #ddd; }
        th { background: #3498db; color: white; }
        tr:nth-child(even) { background: #f8f9fa; }
        """
        return CSS(string=css_content)

    def render(
        self,
        template_name: str,
        data: Dict[str, Any],
        output_path: str = None
    ) -> bytes:
        """Render report to PDF."""
        template = self.env.get_template(f"{template_name}.html")

        # Add common data
        data["generated_at"] = datetime.now().strftime("%Y-%m-%d %H:%M")
        data["year"] = datetime.now().year

        html_content = template.render(**data)

        # Generate PDF
        html = HTML(string=html_content)
        pdf_bytes = html.write_pdf(stylesheets=[self.css])

        if output_path:
            Path(output_path).write_bytes(pdf_bytes)

        return pdf_bytes

    def create_simple_report(
        self,
        title: str,
        sections: List[Dict[str, Any]],
        charts: List[str] = None
    ) -> bytes:
        """Create a simple report without template."""
        html = f"""
        <!DOCTYPE html>
        <html>
        <head><title>{title}</title></head>
        <body>
            <h1>{title}</h1>
            <p class="date">Generated: {datetime.now().strftime('%B %d, %Y')}</p>
        """

        for section in sections:
            html += f"""
            <h2>{section['title']}</h2>
            <div class="content">{section['content']}</div>
            """

        if charts:
            html += '<div class="charts">'
            for chart_b64 in charts:
                html += f'<img src="data:image/png;base64,{chart_b64}" />'
            html += '</div>'

        html += "</body></html>"

        return HTML(string=html).write_pdf(stylesheets=[self.css])

PDF Generation Pipeline:

HTML to PDF Conversion

Data + NarrativeMetricSummary objects + ReportNarrative from LLM

Jinja2 Template Renderingreport_template.html + data = filled HTML. Variables: {{narrative.executive_summary}}. Loops: {% for finding in key_findings %}

WeasyPrint PDF GenerationCSS: Professional styling (grid, colors, fonts). @page: A4 size, margins, page numbers. Charts: Embedded as base64 images

PDF Outputbytes or file — ready for delivery

CSS Class	Visual Effect
`.kpi-grid`	3-column layout for KPI cards
`.kpi-card`	Styled box with left blue border
`.kpi-change.positive`	Green text for positive change
`.insight-box`	Blue-bordered callout for insights
`@page @top-right`	"Page 1 of 5" in header

Why WeasyPrint: Python-native, CSS-based styling, no external dependencies like wkhtmltopdf.

FastAPI Application

# src/api/main.py
from fastapi import FastAPI, HTTPException, BackgroundTasks
from fastapi.responses import FileResponse, StreamingResponse
from pydantic import BaseModel
from typing import Optional, List
from datetime import datetime
import io

from ..data.connectors import DataManager, SQLConnector
from ..analysis.statistics import StatisticalAnalyzer
from ..generation.narrative import NarrativeGenerator
from ..generation.charts import ChartGenerator
from ..generation.templates import TemplateManager
from ..output.pdf_export import PDFExporter
from ..config import settings

app = FastAPI(
    title="Automated Reporting API",
    description="Generate business reports with AI-powered narratives"
)

# Initialize components
data_manager = DataManager()
data_manager.register("main", SQLConnector())

analyzer = StatisticalAnalyzer()
narrator = NarrativeGenerator()
charts = ChartGenerator()
templates = TemplateManager()
pdf_exporter = PDFExporter()

class ReportRequest(BaseModel):
    template: str
    period_start: str
    period_end: str
    output_format: str = "pdf"  # pdf, html, json

class ReportResponse(BaseModel):
    report_id: str
    status: str
    download_url: Optional[str] = None

@app.post("/reports/generate")
async def generate_report(
    request: ReportRequest,
    background_tasks: BackgroundTasks
):
    """Generate a report from template."""
    # Validate template
    try:
        template = templates.get_template(request.template)
    except ValueError as e:
        raise HTTPException(status_code=404, detail=str(e))

    # Generate report ID
    report_id = f"{request.template}_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

    # For demo, generate synchronously
    # In production, use background task

    # Fetch data for each metric
    metrics_data = {}
    for name, config in template.metrics.items():
        df = data_manager.fetch(config.source, config.query)
        summary = analyzer.calculate_summary(df, name, "date")
        metrics_data[name] = summary

    # Generate narratives
    narrative = await narrator.generate_full_narrative(
        list(metrics_data.values()),
        template.name,
        f"{request.period_start} to {request.period_end}"
    )

    # Generate PDF
    report_data = {
        "title": template.name,
        "period": f"{request.period_start} to {request.period_end}",
        "metrics": metrics_data,
        "narrative": narrative
    }

    pdf_bytes = pdf_exporter.create_simple_report(
        template.name,
        [
            {"title": "Executive Summary", "content": narrative.executive_summary},
            {"title": "Key Findings", "content": "<ul>" + "".join(f"<li>{f}</li>" for f in narrative.key_findings) + "</ul>"},
            {"title": "Detailed Analysis", "content": narrative.detailed_analysis},
            {"title": "Recommendations", "content": "<ul>" + "".join(f"<li>{r}</li>" for r in narrative.recommendations) + "</ul>"},
        ]
    )

    return StreamingResponse(
        io.BytesIO(pdf_bytes),
        media_type="application/pdf",
        headers={"Content-Disposition": f"attachment; filename={report_id}.pdf"}
    )

@app.get("/templates")
async def list_templates():
    """List available report templates."""
    return {"templates": templates.list_templates()}

@app.get("/health")
async def health():
    return {"status": "healthy"}

Example Template

# templates/weekly_sales.yaml
name: "Weekly Sales Report"
frequency: weekly
description: "Weekly sales performance summary"

data_sources:
  - main

metrics:
  total_revenue:
    name: total_revenue
    source: main
    query: "SELECT date, SUM(amount) as total_revenue FROM sales GROUP BY date"
    display_name: "Total Revenue"
    format: "${:,.2f}"
    chart_type: line

  order_count:
    name: order_count
    source: main
    query: "SELECT date, COUNT(*) as order_count FROM orders GROUP BY date"
    display_name: "Order Count"
    format: "{:,.0f}"
    chart_type: bar

  avg_order_value:
    name: avg_order_value
    source: main
    query: "SELECT date, AVG(amount) as avg_order_value FROM orders GROUP BY date"
    display_name: "Average Order Value"
    format: "${:,.2f}"
    chart_type: line

sections:
  - title: "Revenue Overview"
    metrics: ["total_revenue"]
    narrative_prompt: "Analyze revenue performance and trends"
    charts: ["revenue_trend"]

  - title: "Order Analysis"
    metrics: ["order_count", "avg_order_value"]
    narrative_prompt: "Analyze order volume and value trends"
    charts: ["orders_comparison"]

recipients:
  - sales@company.com
  - leadership@company.com

Business Impact

Metric	Before	After	Improvement
Report creation time	4 hours	15 minutes	94% reduction
Analyst hours/week	20 hours	3 hours	85% reduction
Report consistency	Variable	100% standardized	Complete uniformity
Insights per report	3-5	8-12	2-3x more insights
Report frequency	Monthly	Weekly	4x more frequent

Key Learnings

Templates enable scale - Well-designed templates allow non-technical users to create reports
LLM narratives add value - AI-generated insights catch patterns humans miss
Structured output is critical - Pydantic models ensure consistent report structure
Caching matters - Cache data queries to enable fast iteration on narratives

Key Concepts Recap

Concept	What It Is	Why It Matters
Data Connectors	Pluggable interfaces for SQL, CSV, APIs	Single template can pull from multiple sources
MetricSummary	Current, previous, change, trend, anomaly flag	Structured input for both charts and narratives
Z-score Anomaly	Flag values > 2 standard deviations from mean	Auto-highlights unusual metrics for attention
Seasonality Detection	Monthly/quarterly pattern analysis	Contextualizes comparisons (December always high)
ReportNarrative	Structured LLM output with Pydantic	Ensures consistent sections in every report
YAML Templates	Declarative report configuration	Non-engineers can create new reports
Plotly Charts	Interactive visualizations, export to PNG	Same charts for web dashboard and PDF
WeasyPrint	HTML+CSS to PDF conversion	Professional formatting without external tools
Jinja2 Templates	HTML templating with variables and loops	Clean separation of content and presentation
Structured Output	LLM returns Pydantic model, not free text	Guaranteed fields for executive_summary, recommendations, etc.

Next Steps

Add scheduled report generation with Celery
Implement interactive dashboards alongside PDFs
Add multi-language support for global teams
Build feedback loop to improve narratives

Automated Enterprise Reporting

On this page

Automated Enterprise Reporting

On this page