This is the full technical breakdown. Architecture evolution, code snippets, database schema, and every pit from 97 commits — all laid out. If you're building something similar, this should save you a few weeks.

The Real Pain Points

The client is a custom aluminum window and door manufacturer in Sydney. The boss is Chinese.

Most people's first assumption would be "he's drowning in emails." That's wrong. The real pain points were two things:

1. Cross-country workflow gap. Australian B2B runs on email. Customers send inquiries, floor plans, quote confirmations, and progress updates — all via email. But the boss spent years doing business in China, where everything runs through WeChat — voice messages, screenshots, phone calls. Email wasn't "too much" for him — it was foreign. He didn't know when to reply, what format to use, or how to chase a two-week-old email thread.

2. Language barrier. The boss can hold a casual English conversation, but can't write professional business emails. "Please confirm the quote at your convenience," "Attached is the revised drawing for your review" — these sentences don't come naturally. His previous workflow was typing content into ChatGPT one message at a time, copy-pasting back to Outlook. It worked, but a single email took 10-15 minutes.

So the requirement wasn't "automate my email." It was: let me drive an unfamiliar workflow (English email + file management) from a familiar interface (Chinese, messaging platform).

This requirement definition is crucial — it shaped every architecture decision that followed. The agent's core value isn't "automation," it's "bridging" — bridging language, bridging work habits, bridging platforms.

Architecture Overview

                    ┌─────────────┐
                    │   Outlook   │
                    │  (emails)   │
                    └──────┬──────┘
                           │ Microsoft Graph API
                           ▼
              ┌────────────────────────┐
              │   Email Pipeline       │
              │  (every 5 min)         │
              │                        │
              │  1. Incremental fetch  │
              │  2. Rule pre-filter    │
              │  3. LLM classify       │
              │  4. Extract fields     │
              │  5. Dedup & merge      │
              │  6. Store → Supabase   │
              │  7. Notify boss        │
              └────────────┬───────────┘
                           │
              ┌────────────▼───────────┐
              │   Supabase PostgreSQL   │
              │                        │
              │  inquiries             │
              │  bot_sessions          │
              │  memories_global       │
              │  customer_memories     │
              └────────────┬───────────┘
                           │
         ┌─────────────────┼─────────────────┐
         │                 │                 │
    ┌────▼────┐      ┌────▼────┐      ┌────▼────┐
    │ Discord │      │Telegram │      │WhatsApp │
    │  Bot    │      │  Bot    │      │ Webhook │
    └────┬────┘      └────┬────┘      └────┬────┘
         │                 │                 │
         └────────┬────────┘─────────┬───────┘
                  │                  │
           ┌──────▼──────┐   ┌──────▼──────┐
           │ Chat Agent  │   │  OneDrive   │
           │   (LLM)    │   │  Watcher    │
           │ 20+ tools   │   │ (every 10m) │
           └─────────────┘   └─────────────┘

Three layers:

Data ingestion — Email pipeline + OneDrive watcher pull external data automatically
Storage — Supabase holds all inquiry state, session history, and extracted memories
Interaction — The boss talks to the agent through any messaging platform; the agent executes through 20+ tools

Architecture Evolution: Three Phases, Three Rewrites

I didn't plan it all out upfront. The project went through three architecture-level rewrites.

Phase 1: Classifier + Router (v1, lasted one day)

The v1 approach was textbook:

Boss sends message → Classifier determines intent → Route to handler → Return result

The classifier output a JSON: {"action": "stats"} or {"action": "advance"}. Hardcoded handlers executed the corresponding operation.

# v1: classifier + router (deprecated)
async def handle_message(message):
    intent = await classify_intent(message.content)
    handler = ROUTE_MAP.get(intent["action"])
    if handler:
        return await handler(message, intent)
    return "Unsupported operation"

ROUTE_MAP = {
    "stats": handle_stats,
    "advance": handle_advance,
    "draft": handle_draft,
    "files": handle_files,
}

The problem? Real scenarios aren't single-step.

The boss says "check Steward's floor plan." The classifier routes to "files," searches OneDrive — nothing found. Done.

But the correct approach is: search customer → check OneDrive → not there → search email attachments → found the PDF → return it. That's a multi-step reasoning chain. A classifier can't do that.

V1 broke on day one. Commit 6a3df39 ripped out the entire classifier architecture.

Phase 2: Agentic Tool-Calling (still running)

Replaced with a ReAct pattern: the agent gets a set of tools and decides the calling order itself.

Same "check Steward's floor plan" scenario:

Agent → search_customer("Steward") → found customer
Agent → list_customer_files(customer_id) → nothing on OneDrive
Agent → search_email_attachments(customer_id) → found PDF in email
Agent → download_and_show_attachments(attachment_id) → sent image to boss

Four autonomous steps. No hardcoded routing needed.

The core loop:

async def agent_loop(user_message, session, channel):
    messages = build_context(session, user_message)
    
    for round_num in range(MAX_ROUNDS := 15):
        response = await llm_call(
            MODEL_TIER_3,
            messages=messages,
            tools=ALL_TOOL_DEFINITIONS
        )
        
        if response.has_tool_calls:
            for tool_call in response.tool_calls:
                result = await execute_tool(tool_call, channel)
                messages.append({"role": "tool", "content": result,
                                "tool_call_id": tool_call.id})
            messages.append(response.assistant_message)
        else:
            await channel.send_text(response.content)
            return
    
    await channel.send_text("⚠️ Too many steps — please simplify the request")

The trade-off: cost and safety. Every LLM call costs money, and one task calling 4-5 tools adds up. And once the agent has "free will," the ways it can go wrong multiply — more on that below.

Phase 3: Platform Expansion (Discord → Telegram → WhatsApp)

The boss initially used Discord (because I developed on Discord). But he actually uses Telegram and WeChat day-to-day.

This led to the messaging abstraction layer:

class MessageChannel(ABC):
    """Unified interface for all messaging platforms"""
    
    @abstractmethod
    async def send_text(self, text: str) -> None: ...
    
    @abstractmethod
    async def send_file(self, file_path: str, filename: str) -> None: ...
    
    @abstractmethod
    async def send_preview(self, long_text: str) -> None: ...

Three implementations:

Platform	Char limit	Key implementation details
Discord	1990 chars	Native threads, emoji formatting
Telegram	4096 chars	Group chat support, boss-ID whitelist for approvals, HTML formatting
WhatsApp	4096 chars	Meta Cloud API webhook, media upload, phone verification

send_preview() chunks long responses at paragraph boundaries so nothing gets cut mid-sentence. Chunking logic adapts per platform's character limit.

Adding a new platform (Slack, Teams) only requires implementing this interface — zero changes to agent logic.

Lesson: build the MVP on the platform your user actually uses, not the one you're comfortable with.

Three-Tier Model Routing: The Key to Cost Control

The first month I used the most capable model for everything. The bill was a wake-up call.

Core principle: every task has a minimum intelligence threshold. If a cheap model can handle it, don't use an expensive one.

Tier	Role	Use case
Tier 1	Fast & cheap small model	Email classification, memory dedup, rule filtering
Tier 2	Mid-tier model	Field extraction, email composition
Tier 3	Full reasoning large model	Agent reasoning, complex decisions

# config.py
MODEL_TIER_1 = "..."   # fast & cheap
MODEL_TIER_2 = "..."   # mid-tier
MODEL_TIER_3 = "..."   # full reasoning

All calls go through a unified LLM routing API. Shared httpx connection pool, async semaphore (max 10 concurrent), exponential backoff on 429/5xx (3 retries).

# llm_client.py
_semaphore = asyncio.Semaphore(10)

async def llm_call(model, messages, tools=None, max_retries=3):
    async with _semaphore:
        for attempt in range(max_retries):
            try:
                resp = await _http_client.post(
                    LLM_API_ENDPOINT,
                    json={"model": model, "messages": messages, "tools": tools},
                    headers={"Authorization": f"Bearer {API_KEY}"}
                )
                if resp.status_code == 429:
                    await asyncio.sleep(2 ** attempt)
                    continue
                return resp.json()
            except httpx.TimeoutException:
                if attempt == max_retries - 1:
                    raise

Classifying whether an email is a real inquiry or spam — a small model with a good prompt is just as accurate as a large model, at a fraction of the cost.

JSONB vs Relational Tables

The inquiries table's most important column is context_data (JSONB):

CREATE TABLE inquiries (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email_id TEXT UNIQUE NOT NULL,
  classification TEXT CHECK (classification IN ('A', 'B', 'C')),
  confidence FLOAT,
  status TEXT CHECK (status IN ('new', 'pending', 'scheduled', 'quoted', 'closed')),
  customer_name TEXT,
  customer_email TEXT,
  customer_phone TEXT,
  address TEXT,
  suburb TEXT,
  product_types TEXT[],
  product_description TEXT,
  dimensions TEXT,
  urgency TEXT DEFAULT 'medium',
  notes TEXT,
  raw_subject TEXT,
  raw_body TEXT,
  context_data JSONB,
  created_at TIMESTAMPTZ DEFAULT now(),
  updated_at TIMESTAMPTZ DEFAULT now()
);

context_data structure:

{
  "emails": [
    {
      "email_id": "AAMkAGI...",
      "subject": "Quote request - sliding doors",
      "body": "Hi, I need 3 sliding doors...",
      "received_at": "2024-03-15T10:00:00Z"
    }
  ],
  "pipeline": {
    "current_stage": "inquiry",
    "stage_updated_at": "2024-03-15T10:00:00Z",
    "history": [
      {
        "from": "new",
        "to": "inquiry",
        "at": "2024-03-15T10:00:00Z",
        "reason": "Auto-classified as A-type",
        "triggered_by": "pipeline"
      }
    ]
  },
  "memories": [
    {"type": "customer_note", "content": "Budget is limited, wants basic spec"}
  ],
  "has_dimensions": true,
  "glass_type": "double IGU",
  "language": "en"
}

Why not relational? I started with a normalized design — emails table, pipeline_history table, memories table, foreign keys. After a week I realized the problem: every time the agent needs to reply to a customer, it needs the full email thread, pipeline state, and memories loaded into the prompt. That means JOINing 3-4 tables every time.

With JSONB, one query gets the full context. The trade-off: you lose cross-customer relational queries ("show me all emails with subject X"). For this business, that query never exists — we always operate on a single customer dimension.

Design your database around access patterns, not normal forms.

Email Pipeline: Step by Step

Runs every 5 minutes:

Step 1: Incremental Fetch

messages = await graph_client.get(
    "/me/mailFolders/inbox/messages",
    params={
        "$filter": "isRead eq false",
        "$orderby": "receivedDateTime desc",
        "$top": 50,
        "$select": "id,subject,body,from,receivedDateTime,hasAttachments"
    }
)

Pit: Graph API pagination uses @odata.nextLink — must handle explicitly. $top defaults to 10, easy to miss emails. Timezone: API returns UTC, boss thinks in AEST. Convert at the storage layer.

Step 2: Rule Pre-filter (Zero LLM Cost)

def pre_filter(email):
    if email.headers.get("X-Auto-Reply"):
        return "skip"
    if email.from_domain in KNOWN_SPAM_DOMAINS:
        return "skip"
    if "out of office" in email.subject.lower():
        return "skip"
    return "needs_llm"

Step 3: LLM Classification

The Tier 1 small model classifies what gets through:

A: Needs physical measurement (has address, custom product)
B: Can quote remotely (has dimensions, standard product)
R: Reply to existing thread
C: Not an inquiry (marketing, supplier, internal)

Pit: R vs A boundary. An existing customer replying to an old thread with a new project gets classified as R → merged into the old record → almost missed as a new lead. Fix: secondary check — if the reply body contains new addresses or product types, treat as A regardless.

Step 4: Field Extraction

The Tier 2 mid-tier model parses: name, email, phone, address, suburb, product types, dimensions, urgency, language.

Pit: Australian phone numbers. Mobile: 04xx. Landline: (02) xxxx. International: +61 4xx. Same number, three formats. Normalize to E.164 before storage, or dedup treats one customer as three.

Step 5: Dedup & Merge

existing = await db.search_by_email(sender_email)
if existing:
    existing.context_data["emails"].append(new_email_data)
    await db.update_inquiry(existing.id, context_data=existing.context_data)
else:
    await db.create_inquiry(extracted_fields)

Step 6: Strip Reply Chains

Incoming emails often carry the full reply history. Passing all of it to the LLM means paying for tokens already stored in context_data.emails[].

REPLY_PATTERNS = [
    r"^>.*$",                          # Gmail style
    r"^From:.*\nSent:.*\nTo:.*",       # Outlook style
    r"^-+\s*Original Message\s*-+",     # Generic
    r"^On .+ wrote:$",                  # Apple Mail
]

No universal standard. My multi-pattern stripper covers the major cases; edge cases still slip through. Good enough > perfect.

Lessons from 97 Git Commits

These aren't retrospective "best practices." They're fix records from actual production failures.

Lesson 1: The Agent Lies — "Email Sent" Hallucination

Commit bed80c7: prevent false send status reports

The agent drafted an email. The tool returned draft content. Then the agent told the boss: "Email has been sent to the customer."

It hadn't been sent. The agent confused "drafted" with "sent."

In B2B, if the boss thinks an email was sent, he won't follow up. A $50,000 order could be lost to this single hallucination.

The fix wasn't adding "don't lie" to the prompt. It was forcing the tool return value:

# draft_email tool return
return "✅ DRAFT_PENDING_APPROVAL — Email drafted, NOT yet sent. Boss must reply '发送' to confirm."

This text isn't for the human — it's for the agent. On its next reasoning step, it reads this tool result and won't misjudge the state.

Additionally, send_email only executes when matching a valid draft_id:

async def send_email(draft_id: str):
    draft = await get_draft(draft_id)
    if not draft or draft.status != "PENDING_APPROVAL":
        return "❌ No pending draft found"
    
    await outlook.send(draft.message_id)
    draft.status = "SENT"
    return f"✅ Email sent to {draft.to_email}"

Lesson 2: Ambiguous Confirmation = Accidental Send

Commit 7399e7b: tighten send trigger words

The boss says "好" (ok), "可以" (sure), "ok" — in Chinese context, this might just mean "I see," not "confirm send." But the agent interpreted it as a send command.

Fix: send triggers restricted to "发送", "发", "send", "确认发送". Removed "好", "可以", "ok", "yes" entirely.

SEND_TRIGGERS = {"发送", "发", "send", "确认发送"}
# Removed: "好", "可以", "ok", "yes", "行"

Also added sent-items dedup: if an email was sent to this customer in the last 14 days, draft_email warns proactively.

Lesson 3: Agent Amnesia — Forgetting Which Customer

Commit 7195bf8: inject current customer context

Boss asks "how's Billy doing?" Agent checks Billy's record, replies. Then boss says "draft him a reply." Agent: "who's 'him'?"

Root cause: context was rebuilt each turn without carrying over "the customer currently under discussion."

Fix:

def build_context(session, new_message):
    context = []
    
    if session.current_customer:
        context.append({
            "role": "system",
            "content": f"[Current customer: {session.current_customer.name}]"
        })
    
    context.extend(format_memories(session.memories))
    context.extend(compress_session(session.messages))
    context.append({"role": "user", "content": new_message})
    
    return context

Lesson 4: Notion Was Dead Weight — Removing an Entire Layer

Commit 486fc1a: remove Notion dependency

V1 used Discord + OneDrive + Supabase + Notion. Notion served as a CRM dashboard.

The problem: the boss never opened Notion. All his interactions happened on Discord/Telegram. Notion became a sync target that only I maintained and nobody read. Every pipeline state change triggered a Notion sync, and the sync logic was complex due to Notion API limitations.

Removed the entire Notion layer on day three. Database = Supabase only. UI = messaging platforms only.

Lesson: don't add a component because it "looks professional." If the user doesn't use it, the maintenance cost is pure liability.

Lesson 5: Context Management Is a Continuous Battle

There are a dozen+ commits related to context, spanning the entire project:

286e30c: P0+P1 context optimization

Duplicate boss messages in context (build_context timing bug)
Customer search returning full context_data JSONB (several KB) when only name and email were needed
Session history with no time limit — month-old conversations still in context

Estimated savings: ~200 tokens per call + 80% reduction in database transfer.

889c09c: increase context retention limit

Over-compressed context to save tokens → agent responses became "brainless" — not enough context for good decisions
Found the balance: last 20 messages in full + older ones as compressed summary

d433fed: preserve last agent response in full

Boss often sends very short follow-ups: "and then?", "how much?"
If the agent's own previous response was truncated, it can't connect these short messages

This isn't a one-time fix. Context management is something you rebalance every time you add a feature.

Lesson 6: Batch Operations Can't Go Through the Agent

Commit cfe0e41: split batch_draft_emails

The boss wanted to send emails to 92 contacts. I initially built this as an agent tool — agent reads CSV, checks dedup, generates emails, batch sends.

The agent, processing 92 rows of CSV in its context, started fabricating email addresses. Not malformed — plausible-looking addresses that simply didn't exist.

Fix: extract batch operations into deterministic Python workflows:

# workflows/batch_email.py
async def batch_email_workflow(csv_path, template):
    # Step 1: Parse CSV — pure Python, zero LLM
    contacts = parse_csv(csv_path)
    
    # Step 2: Check sent emails — pure API call, zero LLM
    sent = await get_sent_emails_last_n_days(14)
    sent_addresses = {e.to_email for e in sent}
    
    # Step 3: Filter
    to_send = [c for c in contacts if c.email not in sent_addresses]
    
    # Step 4: Generate emails — only this step uses LLM, one call per email
    for contact in to_send:
        draft = await llm_call(MODEL_TIER_2, compose_prompt(template, contact))
        await save_draft(contact, draft)
    
    return f"Generated {len(to_send)} drafts, pending approval"

The agent initiates the workflow, but the data pipeline is deterministic Python — never passes through LLM context.

Core principle: LLM makes decisions, Python moves data. Any step that doesn't need "creativity" must never touch the LLM.

Lesson 7: Message Formatting Can't Depend on Prompts

Commit e1917d1: three-layer Telegram formatting

I initially told the system prompt: "Reply in Telegram HTML format." Sometimes the agent output Markdown, sometimes HTML, sometimes a hybrid. Telegram's HTML parser is strict — one unclosed <b> tag and the entire message fails to send.

Final solution: three-layer fallback, completely prompt-independent:

def _md_to_tg_html(text: str) -> str:
    """Markdown → Telegram HTML, done in code, not LLM"""
    # 1. Protect code blocks
    code_blocks = []
    text = re.sub(r'```(\w*)\n(.*?)```',
                  lambda m: _save_code_block(m, code_blocks), text, flags=re.DOTALL)
    
    # 2. Escape HTML special chars
    text = text.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
    
    # 3. Convert Markdown syntax
    text = re.sub(r'\*\*(.+?)\*\*', r'<b>\1</b>', text)
    text = re.sub(r'\*(.+?)\*', r'<i>\1</i>', text)
    text = re.sub(r'`(.+?)`', r'<code>\1</code>', text)
    
    # 4. Tables → mobile-friendly format
    text = convert_tables_to_dots(text)
    
    # 5. Restore code blocks
    text = restore_code_blocks(text, code_blocks)
    
    return text

async def send_text(self, text: str):
    html = _md_to_tg_html(text)
    try:
        await self.bot.send_message(self.chat_id, html, parse_mode="HTML")
    except TelegramError:
        plain = strip_all_tags(html)
        await self.bot.send_message(self.chat_id, plain)

Lesson: never rely on the LLM's output format. Let the LLM output freely; handle formatting in code.

Lesson 8: Discord Concurrency Bug Chain

The densest fix sequence in Git history (commits 2b8011c → eb0bf9c → 4f03d6f) — three bugs in three days, all Discord bot concurrency issues:

Added a lock to prevent concurrent processing → lock blocked all @mentions
Switched to _should_respond flag → race condition, all messages flagged as "don't process"
Switched to _imlang_handled attribute → discord.py Message objects don't support custom attributes → crash

Final fix: removed all the "clever" concurrency control. Simple approach — each message processed independently, concurrency limited by asyncio.Semaphore.

_processing_semaphore = asyncio.Semaphore(3)

async def on_message(message):
    if not should_respond(message):
        return
    async with _processing_semaphore:
        await process_message(message)

Lesson: in concurrent systems, "simple and brutal" often beats "clever and elegant."

The Complete Tool Set

20+ tools in four categories:

Query: search_customer (fuzzy search), query_stats (today's numbers), get_pending (pending items), scan_mailbox (full email thread analysis), search_emails (conditional search)

Files: list_customer_files (OneDrive listing), search_email_attachments (email attachment search), download_and_show_attachments (download + preview), read_onedrive_file (read Excel/PDF/text), create_customer_folder

Pipeline: advance_stage (progress state), audit_pipeline (consistency check)

Email: draft_email (bilingual draft), send_email (send approved draft), forward_to_factory (package and forward), batch_draft_emails, create_calendar_event

Tools defined in OpenAI function-calling format:

{
    "name": "search_customer",
    "description": "Fuzzy search customers by name, email, or phone",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Customer name, email, or phone to search"
            }
        },
        "required": ["query"]
    }
}

Tool count grew from 0 (hardcoded actions in v1) to 20+ incrementally. Each tool was added because the boss hit a need that existing tools couldn't serve.

The Draft-Approval Mechanism

The most critical safety design in the entire system:

1. Boss: "Reply to Billy, quote $15,000"
2. Agent → search_customer("Billy") → found
3. Agent → draft_email(to=billy@..., body=...) → bilingual preview
4. Agent shows boss:
   [Chinese translation] — for boss to review content
   [English body] — what actually gets sent
   ⚠️ NOT SENT — reply "发送" to confirm
5. Status → DRAFT_PENDING_APPROVAL
6. Boss says "发送" → send_email(draft_id=xxx)
   Boss says "change it" → back to step 3

Never auto-send. In B2B manufacturing, one wrong email can lose a $50,000 order.

Memory System

After every conversation, MemoryWatcher runs asynchronously (doesn't block the response):

async def extract_and_store_memories(conversation, customer_id=None):
    memories = await llm_call(
        MODEL_TIER_1,  # cheapest model for extraction
        f"Extract long-term facts from this conversation:\n{conversation}"
    )
    
    for memory in memories:
        if await is_semantic_duplicate(memory):
            continue
            
        if memory.type == "boss_pref":
            # "Always include GST in quotes" → global
            await store_global_memory(memory)
        elif memory.type == "customer_note":
            # "Billy's budget is limited" → customer-specific
            await store_customer_memory(customer_id, memory)
        elif memory.type == "business_rule":
            # "Melbourne shipping +$200" → global
            await store_global_memory(memory)

Memories inject into the next session's system prompt. The agent accumulates business knowledge over time.

Semantic dedup uses the small model — "Billy's budget is limited" and "Billy has a tight budget" are the same memory. Not perfect, but prevents memory bloat.

I added this in week three. For the first two weeks, the boss repeated the same preferences 20+ times, and the agent acted like it was day one every time. If I started over, memory system goes in on day one.

Session Management: Progressive Compression

session = {
    "messages": deque(maxlen=40),
    "summary": "",
    "customer_context": {}
}

def compress_session(messages):
    msgs = list(messages)
    
    # Oldest 10 → summary (~200 tokens)
    old = msgs[:10]
    summary = await summarize(old)
    
    # Middle 20 → keep text but truncate tool returns
    middle = msgs[10:30]
    for msg in middle:
        if msg["role"] == "tool":
            msg["content"] = truncate(msg["content"], max_chars=500)
    
    # Newest 10 → full fidelity (including agent's own complete response)
    recent = msgs[30:]
    
    return [{"role": "system", "content": summary}] + middle + recent

Pit: Compression is lossy. Important info can get summarized away. Mitigation: MemoryWatcher extracts key facts into persistent memories before the compression window moves. Not perfect, but a practical trade-off.

Pipeline Tracking & OneDrive Sync

8-stage state machine with full audit trail:

inquiry → measure → quoting → follow_up → ordered → production → install → done

Timeout thresholds per stage:

Stage	Timeout	Action
inquiry	2 days	Remind boss
measure	5 days	Schedule measurement
quoting	7 days	Follow up on quote
follow_up	14 days	Suggest re-contact
ordered	3 days	Confirm factory received
production	21 days	Check production progress
install	7 days	Schedule installation

OneDrive folders organized by stage:

OneDrive/
├── 未报价顾客/           ← inquiry, measure
│   ├── Billy Smith/
│   └── John Lee/
├── 厂家已报价/           ← quoting
│   └── Sarah Chen/
├── 已报价顾客/           ← follow_up
│   └── Mike Wang/
└── 已确定并付定金顾客/    ← ordered through done
    └── David Liu/

OneDrive Watcher checks delta API every 10 minutes. If a folder moves to a new stage directory, the database pipeline status updates automatically.

Pit: OneDrive delta API can detect "something changed" but can't distinguish "file rename" from "file move" — they look identical in the response. I wrote path-comparison logic: extract parent folder name; if it changed from 未报价顾客/ to 厂家已报价/, that's a stage transition. If the parent didn't change, it's a rename — no state update triggered.

Voice Message Transcription

The boss prefers sending voice messages over typing. Added voice transcription (commits bd8bcdc → 5605df3):

Using the Whisper model for speech recognition, connected via a free API endpoint.

Flow: receive voice message → download audio → Whisper → text → process as regular text message. The boss doesn't notice the difference — voice and text work identically.

The Bilingual Factor

This isn't one module's problem — it permeates everything:

Two sets of prompts: Chinese for understanding, English for email generation
Error messages in Chinese: English errors confuse the boss
Bilingual email previews: Chinese for boss to review, English for the customer
Business culture differences: "麻烦您尽快回复" ≠ "Please reply at your earliest convenience"
Product terminology mapping: 推拉门 = sliding door, 平开窗 = casement window

System prompt includes product knowledge (AS2047 standards, product specs, building codes) so the agent can reference correct technical parameters.

Budget 2-3x the time of a monolingual project.

Scheduling

scheduler = AsyncIOScheduler(timezone="Australia/Sydney")

scheduler.add_job(run_email_check, "interval", minutes=5)
scheduler.add_job(run_onedrive_watch, "interval", minutes=10)
scheduler.add_job(run_evening_schedule, "cron", hour=20)  # 8 PM briefing + route

scheduler.start()

Every evening at 8 PM AEST:

Query all pending A-type inquiries
Generate optimized measurement route from addresses
Compile briefing: customer name, address, product interest, notes
Push to boss's active messaging platform

The boss starts each morning knowing exactly where to go.

Deployment

Component	Choice
Agent runtime	Railway (Python worker)
Database	Supabase
LLM calls	Unified routing API
Voice transcription	Whisper API
Website	Vercel (Next.js)

Railway deployment: worker: python main.py. Nixpacks build with CJK font support (agent generates Chinese PDFs).

Pit: Railway cold start — 10-15 second wait after idle. Scheduled jobs aren't affected, but if the boss messages during cold start, he waits. Fix: heartbeat keepalive, worker never sleeps.

Tech Stack

Component	Technology
Runtime	Python 3.12, asyncio
LLM routing	Unified routing API (three-tier models)
Database	Supabase PostgreSQL
Email	Microsoft Graph API (Outlook)
Files	OneDrive API
Messaging	Discord.py / python-telegram-bot / Meta WhatsApp Cloud API
Voice	Whisper API
Scheduling	APScheduler
HTTP	httpx (async, connection pooling)
Deployment	Railway
Website	Next.js, React, Tailwind CSS, Vercel

If I Started Over

Start with Telegram/WhatsApp. Build where the user is, not where you're comfortable.
Memory system on day one. An agent that can't remember is an agent users lose trust in.
JSONB from the start. Wasted a week on relational tables.
Batch operations as workflows from day one. LLMs fabricate data at scale.
Format in code, not prompts. Never rely on LLM output format.
Context management is continuous. Every new feature requires rebalancing.
Budget 2.5x for bilingual. It's not just translation — it's dual-track interaction at every layer.

Conclusion

97 commits, 12 days (from v1 on March 25 to the latest commit on April 6). From a classifier-router toy to a production system the boss uses daily.

The core takeaway: the hard part of building an AI agent isn't the model calls. It's:

Understanding the user's real pain (not "too many emails" — it's "unfamiliar workflow + language barrier")
Handling LLM unreliability (hallucination, format inconsistency, context bloat)
Balancing agent autonomy against system safety
Continuous iteration (in 97 commits, fixes outnumber features)

Questions welcome.

Parker Kuang | QKAi Studio | qkai.com.au

AI agentbuild AI agent from scratchmulti-channel agentbusiness automationAI architecture

Place this article inside the wider QKAI perspective

AI Agent14 Apr 2026

From First Principles to an Agent Harness

I did not start with LangGraph or any agent framework. I started with a real business system, and by solving scheduling, tool orchestration, messaging abstraction, and external APIs from first principles, I ended up building my own agent harness.

Robotics Strategy22 Apr 2026

The Robotics Flywheel Starts Before the Robot

The real robotics flywheel starts long before the robot — with workflow redesign, data structure, AI agents, and operational discipline. The hardware comes last.

Automation Strategy21 Apr 2026

Why High Labour Costs in Australia Are Making AI Automation a Business Priority

High labour costs in Australia are pushing SMEs to rethink admin automation, operations automation, lead qualification, quote automation, and AI agent workflows before adding more headcount.

Next step

If the problem is workflow, not just content

QKAI can help assess which enquiry, quoting, follow-up, CRM, or support workflows should be redesigned before another manual layer is added.

See services Book a call

← Back to Blog

This is the full technical breakdown. Architecture evolution, code snippets, database schema, and every pit from 97 commits — all laid out. If you're building something similar, this should save you a few weeks.

The Real Pain Points

The client is a custom aluminum window and door manufacturer in Sydney. The boss is Chinese.

Most people's first assumption would be "he's drowning in emails." That's wrong. The real pain points were two things:

So the requirement wasn't "automate my email." It was: let me drive an unfamiliar workflow (English email + file management) from a familiar interface (Chinese, messaging platform).

Architecture Overview

                    ┌─────────────┐
                    │   Outlook   │
                    │  (emails)   │
                    └──────┬──────┘
                           │ Microsoft Graph API
                           ▼
              ┌────────────────────────┐
              │   Email Pipeline       │
              │  (every 5 min)         │
              │                        │
              │  1. Incremental fetch  │
              │  2. Rule pre-filter    │
              │  3. LLM classify       │
              │  4. Extract fields     │
              │  5. Dedup & merge      │
              │  6. Store → Supabase   │
              │  7. Notify boss        │
              └────────────┬───────────┘
                           │
              ┌────────────▼───────────┐
              │   Supabase PostgreSQL   │
              │                        │
              │  inquiries             │
              │  bot_sessions          │
              │  memories_global       │
              │  customer_memories     │
              └────────────┬───────────┘
                           │
         ┌─────────────────┼─────────────────┐
         │                 │                 │
    ┌────▼────┐      ┌────▼────┐      ┌────▼────┐
    │ Discord │      │Telegram │      │WhatsApp │
    │  Bot    │      │  Bot    │      │ Webhook │
    └────┬────┘      └────┬────┘      └────┬────┘
         │                 │                 │
         └────────┬────────┘─────────┬───────┘
                  │                  │
           ┌──────▼──────┐   ┌──────▼──────┐
           │ Chat Agent  │   │  OneDrive   │
           │   (LLM)    │   │  Watcher    │
           │ 20+ tools   │   │ (every 10m) │
           └─────────────┘   └─────────────┘

Three layers:

Data ingestion — Email pipeline + OneDrive watcher pull external data automatically
Storage — Supabase holds all inquiry state, session history, and extracted memories
Interaction — The boss talks to the agent through any messaging platform; the agent executes through 20+ tools

Architecture Evolution: Three Phases, Three Rewrites

I didn't plan it all out upfront. The project went through three architecture-level rewrites.

Phase 1: Classifier + Router (v1, lasted one day)

The v1 approach was textbook:

Boss sends message → Classifier determines intent → Route to handler → Return result

The classifier output a JSON: {"action": "stats"} or {"action": "advance"}. Hardcoded handlers executed the corresponding operation.

# v1: classifier + router (deprecated)
async def handle_message(message):
    intent = await classify_intent(message.content)
    handler = ROUTE_MAP.get(intent["action"])
    if handler:
        return await handler(message, intent)
    return "Unsupported operation"

ROUTE_MAP = {
    "stats": handle_stats,
    "advance": handle_advance,
    "draft": handle_draft,
    "files": handle_files,
}

The problem? Real scenarios aren't single-step.

The boss says "check Steward's floor plan." The classifier routes to "files," searches OneDrive — nothing found. Done.

V1 broke on day one. Commit 6a3df39 ripped out the entire classifier architecture.

Phase 2: Agentic Tool-Calling (still running)

Replaced with a ReAct pattern: the agent gets a set of tools and decides the calling order itself.

Same "check Steward's floor plan" scenario:

Agent → search_customer("Steward") → found customer
Agent → list_customer_files(customer_id) → nothing on OneDrive
Agent → search_email_attachments(customer_id) → found PDF in email
Agent → download_and_show_attachments(attachment_id) → sent image to boss

Four autonomous steps. No hardcoded routing needed.

The core loop:

async def agent_loop(user_message, session, channel):
    messages = build_context(session, user_message)
    
    for round_num in range(MAX_ROUNDS := 15):
        response = await llm_call(
            MODEL_TIER_3,
            messages=messages,
            tools=ALL_TOOL_DEFINITIONS
        )
        
        if response.has_tool_calls:
            for tool_call in response.tool_calls:
                result = await execute_tool(tool_call, channel)
                messages.append({"role": "tool", "content": result,
                                "tool_call_id": tool_call.id})
            messages.append(response.assistant_message)
        else:
            await channel.send_text(response.content)
            return
    
    await channel.send_text("⚠️ Too many steps — please simplify the request")

The trade-off: cost and safety. Every LLM call costs money, and one task calling 4-5 tools adds up. And once the agent has "free will," the ways it can go wrong multiply — more on that below.

Phase 3: Platform Expansion (Discord → Telegram → WhatsApp)

The boss initially used Discord (because I developed on Discord). But he actually uses Telegram and WeChat day-to-day.

This led to the messaging abstraction layer:

class MessageChannel(ABC):
    """Unified interface for all messaging platforms"""
    
    @abstractmethod
    async def send_text(self, text: str) -> None: ...
    
    @abstractmethod
    async def send_file(self, file_path: str, filename: str) -> None: ...
    
    @abstractmethod
    async def send_preview(self, long_text: str) -> None: ...

Three implementations:

Platform	Char limit	Key implementation details
Discord	1990 chars	Native threads, emoji formatting
Telegram	4096 chars	Group chat support, boss-ID whitelist for approvals, HTML formatting
WhatsApp	4096 chars	Meta Cloud API webhook, media upload, phone verification

send_preview() chunks long responses at paragraph boundaries so nothing gets cut mid-sentence. Chunking logic adapts per platform's character limit.

Adding a new platform (Slack, Teams) only requires implementing this interface — zero changes to agent logic.

Lesson: build the MVP on the platform your user actually uses, not the one you're comfortable with.

Three-Tier Model Routing: The Key to Cost Control

The first month I used the most capable model for everything. The bill was a wake-up call.

Core principle: every task has a minimum intelligence threshold. If a cheap model can handle it, don't use an expensive one.

Tier	Role	Use case
Tier 1	Fast & cheap small model	Email classification, memory dedup, rule filtering
Tier 2	Mid-tier model	Field extraction, email composition
Tier 3	Full reasoning large model	Agent reasoning, complex decisions

# config.py
MODEL_TIER_1 = "..."   # fast & cheap
MODEL_TIER_2 = "..."   # mid-tier
MODEL_TIER_3 = "..."   # full reasoning

All calls go through a unified LLM routing API. Shared httpx connection pool, async semaphore (max 10 concurrent), exponential backoff on 429/5xx (3 retries).

# llm_client.py
_semaphore = asyncio.Semaphore(10)

async def llm_call(model, messages, tools=None, max_retries=3):
    async with _semaphore:
        for attempt in range(max_retries):
            try:
                resp = await _http_client.post(
                    LLM_API_ENDPOINT,
                    json={"model": model, "messages": messages, "tools": tools},
                    headers={"Authorization": f"Bearer {API_KEY}"}
                )
                if resp.status_code == 429:
                    await asyncio.sleep(2 ** attempt)
                    continue
                return resp.json()
            except httpx.TimeoutException:
                if attempt == max_retries - 1:
                    raise

Classifying whether an email is a real inquiry or spam — a small model with a good prompt is just as accurate as a large model, at a fraction of the cost.

JSONB vs Relational Tables

The inquiries table's most important column is context_data (JSONB):

CREATE TABLE inquiries (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email_id TEXT UNIQUE NOT NULL,
  classification TEXT CHECK (classification IN ('A', 'B', 'C')),
  confidence FLOAT,
  status TEXT CHECK (status IN ('new', 'pending', 'scheduled', 'quoted', 'closed')),
  customer_name TEXT,
  customer_email TEXT,
  customer_phone TEXT,
  address TEXT,
  suburb TEXT,
  product_types TEXT[],
  product_description TEXT,
  dimensions TEXT,
  urgency TEXT DEFAULT 'medium',
  notes TEXT,
  raw_subject TEXT,
  raw_body TEXT,
  context_data JSONB,
  created_at TIMESTAMPTZ DEFAULT now(),
  updated_at TIMESTAMPTZ DEFAULT now()
);

context_data structure:

{
  "emails": [
    {
      "email_id": "AAMkAGI...",
      "subject": "Quote request - sliding doors",
      "body": "Hi, I need 3 sliding doors...",
      "received_at": "2024-03-15T10:00:00Z"
    }
  ],
  "pipeline": {
    "current_stage": "inquiry",
    "stage_updated_at": "2024-03-15T10:00:00Z",
    "history": [
      {
        "from": "new",
        "to": "inquiry",
        "at": "2024-03-15T10:00:00Z",
        "reason": "Auto-classified as A-type",
        "triggered_by": "pipeline"
      }
    ]
  },
  "memories": [
    {"type": "customer_note", "content": "Budget is limited, wants basic spec"}
  ],
  "has_dimensions": true,
  "glass_type": "double IGU",
  "language": "en"
}

Design your database around access patterns, not normal forms.

Email Pipeline: Step by Step

Runs every 5 minutes:

Step 1: Incremental Fetch

messages = await graph_client.get(
    "/me/mailFolders/inbox/messages",
    params={
        "$filter": "isRead eq false",
        "$orderby": "receivedDateTime desc",
        "$top": 50,
        "$select": "id,subject,body,from,receivedDateTime,hasAttachments"
    }
)

Step 2: Rule Pre-filter (Zero LLM Cost)

def pre_filter(email):
    if email.headers.get("X-Auto-Reply"):
        return "skip"
    if email.from_domain in KNOWN_SPAM_DOMAINS:
        return "skip"
    if "out of office" in email.subject.lower():
        return "skip"
    return "needs_llm"

Step 3: LLM Classification

The Tier 1 small model classifies what gets through:

A: Needs physical measurement (has address, custom product)
B: Can quote remotely (has dimensions, standard product)
R: Reply to existing thread
C: Not an inquiry (marketing, supplier, internal)

Step 4: Field Extraction

The Tier 2 mid-tier model parses: name, email, phone, address, suburb, product types, dimensions, urgency, language.

Pit: Australian phone numbers. Mobile: 04xx. Landline: (02) xxxx. International: +61 4xx. Same number, three formats. Normalize to E.164 before storage, or dedup treats one customer as three.

Step 5: Dedup & Merge

existing = await db.search_by_email(sender_email)
if existing:
    existing.context_data["emails"].append(new_email_data)
    await db.update_inquiry(existing.id, context_data=existing.context_data)
else:
    await db.create_inquiry(extracted_fields)

Step 6: Strip Reply Chains

Incoming emails often carry the full reply history. Passing all of it to the LLM means paying for tokens already stored in context_data.emails[].

REPLY_PATTERNS = [
    r"^>.*$",                          # Gmail style
    r"^From:.*\nSent:.*\nTo:.*",       # Outlook style
    r"^-+\s*Original Message\s*-+",     # Generic
    r"^On .+ wrote:$",                  # Apple Mail
]

No universal standard. My multi-pattern stripper covers the major cases; edge cases still slip through. Good enough > perfect.

Lessons from 97 Git Commits

These aren't retrospective "best practices." They're fix records from actual production failures.

Lesson 1: The Agent Lies — "Email Sent" Hallucination

Commit bed80c7: prevent false send status reports

The agent drafted an email. The tool returned draft content. Then the agent told the boss: "Email has been sent to the customer."

It hadn't been sent. The agent confused "drafted" with "sent."

In B2B, if the boss thinks an email was sent, he won't follow up. A $50,000 order could be lost to this single hallucination.

The fix wasn't adding "don't lie" to the prompt. It was forcing the tool return value:

# draft_email tool return
return "✅ DRAFT_PENDING_APPROVAL — Email drafted, NOT yet sent. Boss must reply '发送' to confirm."

This text isn't for the human — it's for the agent. On its next reasoning step, it reads this tool result and won't misjudge the state.

Additionally, send_email only executes when matching a valid draft_id:

async def send_email(draft_id: str):
    draft = await get_draft(draft_id)
    if not draft or draft.status != "PENDING_APPROVAL":
        return "❌ No pending draft found"
    
    await outlook.send(draft.message_id)
    draft.status = "SENT"
    return f"✅ Email sent to {draft.to_email}"

Lesson 2: Ambiguous Confirmation = Accidental Send

Commit 7399e7b: tighten send trigger words

The boss says "好" (ok), "可以" (sure), "ok" — in Chinese context, this might just mean "I see," not "confirm send." But the agent interpreted it as a send command.

Fix: send triggers restricted to "发送", "发", "send", "确认发送". Removed "好", "可以", "ok", "yes" entirely.

SEND_TRIGGERS = {"发送", "发", "send", "确认发送"}
# Removed: "好", "可以", "ok", "yes", "行"

Also added sent-items dedup: if an email was sent to this customer in the last 14 days, draft_email warns proactively.

Lesson 3: Agent Amnesia — Forgetting Which Customer

Commit 7195bf8: inject current customer context

Boss asks "how's Billy doing?" Agent checks Billy's record, replies. Then boss says "draft him a reply." Agent: "who's 'him'?"

Root cause: context was rebuilt each turn without carrying over "the customer currently under discussion."

Fix:

def build_context(session, new_message):
    context = []
    
    if session.current_customer:
        context.append({
            "role": "system",
            "content": f"[Current customer: {session.current_customer.name}]"
        })
    
    context.extend(format_memories(session.memories))
    context.extend(compress_session(session.messages))
    context.append({"role": "user", "content": new_message})
    
    return context

Lesson 4: Notion Was Dead Weight — Removing an Entire Layer

Commit 486fc1a: remove Notion dependency

V1 used Discord + OneDrive + Supabase + Notion. Notion served as a CRM dashboard.

Removed the entire Notion layer on day three. Database = Supabase only. UI = messaging platforms only.

Lesson: don't add a component because it "looks professional." If the user doesn't use it, the maintenance cost is pure liability.

Lesson 5: Context Management Is a Continuous Battle

There are a dozen+ commits related to context, spanning the entire project:

286e30c: P0+P1 context optimization

Duplicate boss messages in context (build_context timing bug)
Customer search returning full context_data JSONB (several KB) when only name and email were needed
Session history with no time limit — month-old conversations still in context

Estimated savings: ~200 tokens per call + 80% reduction in database transfer.

889c09c: increase context retention limit

Over-compressed context to save tokens → agent responses became "brainless" — not enough context for good decisions
Found the balance: last 20 messages in full + older ones as compressed summary

d433fed: preserve last agent response in full

Boss often sends very short follow-ups: "and then?", "how much?"
If the agent's own previous response was truncated, it can't connect these short messages

This isn't a one-time fix. Context management is something you rebalance every time you add a feature.

Lesson 6: Batch Operations Can't Go Through the Agent

Commit cfe0e41: split batch_draft_emails

The boss wanted to send emails to 92 contacts. I initially built this as an agent tool — agent reads CSV, checks dedup, generates emails, batch sends.

The agent, processing 92 rows of CSV in its context, started fabricating email addresses. Not malformed — plausible-looking addresses that simply didn't exist.

Fix: extract batch operations into deterministic Python workflows:

# workflows/batch_email.py
async def batch_email_workflow(csv_path, template):
    # Step 1: Parse CSV — pure Python, zero LLM
    contacts = parse_csv(csv_path)
    
    # Step 2: Check sent emails — pure API call, zero LLM
    sent = await get_sent_emails_last_n_days(14)
    sent_addresses = {e.to_email for e in sent}
    
    # Step 3: Filter
    to_send = [c for c in contacts if c.email not in sent_addresses]
    
    # Step 4: Generate emails — only this step uses LLM, one call per email
    for contact in to_send:
        draft = await llm_call(MODEL_TIER_2, compose_prompt(template, contact))
        await save_draft(contact, draft)
    
    return f"Generated {len(to_send)} drafts, pending approval"

The agent initiates the workflow, but the data pipeline is deterministic Python — never passes through LLM context.

Core principle: LLM makes decisions, Python moves data. Any step that doesn't need "creativity" must never touch the LLM.

Lesson 7: Message Formatting Can't Depend on Prompts

Commit e1917d1: three-layer Telegram formatting

Final solution: three-layer fallback, completely prompt-independent:

def _md_to_tg_html(text: str) -> str:
    """Markdown → Telegram HTML, done in code, not LLM"""
    # 1. Protect code blocks
    code_blocks = []
    text = re.sub(r'```(\w*)\n(.*?)```',
                  lambda m: _save_code_block(m, code_blocks), text, flags=re.DOTALL)
    
    # 2. Escape HTML special chars
    text = text.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
    
    # 3. Convert Markdown syntax
    text = re.sub(r'\*\*(.+?)\*\*', r'<b>\1</b>', text)
    text = re.sub(r'\*(.+?)\*', r'<i>\1</i>', text)
    text = re.sub(r'`(.+?)`', r'<code>\1</code>', text)
    
    # 4. Tables → mobile-friendly format
    text = convert_tables_to_dots(text)
    
    # 5. Restore code blocks
    text = restore_code_blocks(text, code_blocks)
    
    return text

async def send_text(self, text: str):
    html = _md_to_tg_html(text)
    try:
        await self.bot.send_message(self.chat_id, html, parse_mode="HTML")
    except TelegramError:
        plain = strip_all_tags(html)
        await self.bot.send_message(self.chat_id, plain)

Lesson: never rely on the LLM's output format. Let the LLM output freely; handle formatting in code.

Lesson 8: Discord Concurrency Bug Chain

The densest fix sequence in Git history (commits 2b8011c → eb0bf9c → 4f03d6f) — three bugs in three days, all Discord bot concurrency issues:

Added a lock to prevent concurrent processing → lock blocked all @mentions
Switched to _should_respond flag → race condition, all messages flagged as "don't process"
Switched to _imlang_handled attribute → discord.py Message objects don't support custom attributes → crash

Final fix: removed all the "clever" concurrency control. Simple approach — each message processed independently, concurrency limited by asyncio.Semaphore.

_processing_semaphore = asyncio.Semaphore(3)

async def on_message(message):
    if not should_respond(message):
        return
    async with _processing_semaphore:
        await process_message(message)

Lesson: in concurrent systems, "simple and brutal" often beats "clever and elegant."

The Complete Tool Set

20+ tools in four categories:

Query: search_customer (fuzzy search), query_stats (today's numbers), get_pending (pending items), scan_mailbox (full email thread analysis), search_emails (conditional search)

Pipeline: advance_stage (progress state), audit_pipeline (consistency check)

Email: draft_email (bilingual draft), send_email (send approved draft), forward_to_factory (package and forward), batch_draft_emails, create_calendar_event

Tools defined in OpenAI function-calling format:

{
    "name": "search_customer",
    "description": "Fuzzy search customers by name, email, or phone",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Customer name, email, or phone to search"
            }
        },
        "required": ["query"]
    }
}

Tool count grew from 0 (hardcoded actions in v1) to 20+ incrementally. Each tool was added because the boss hit a need that existing tools couldn't serve.

The Draft-Approval Mechanism

The most critical safety design in the entire system:

1. Boss: "Reply to Billy, quote $15,000"
2. Agent → search_customer("Billy") → found
3. Agent → draft_email(to=billy@..., body=...) → bilingual preview
4. Agent shows boss:
   [Chinese translation] — for boss to review content
   [English body] — what actually gets sent
   ⚠️ NOT SENT — reply "发送" to confirm
5. Status → DRAFT_PENDING_APPROVAL
6. Boss says "发送" → send_email(draft_id=xxx)
   Boss says "change it" → back to step 3

Never auto-send. In B2B manufacturing, one wrong email can lose a $50,000 order.

Memory System

After every conversation, MemoryWatcher runs asynchronously (doesn't block the response):

async def extract_and_store_memories(conversation, customer_id=None):
    memories = await llm_call(
        MODEL_TIER_1,  # cheapest model for extraction
        f"Extract long-term facts from this conversation:\n{conversation}"
    )
    
    for memory in memories:
        if await is_semantic_duplicate(memory):
            continue
            
        if memory.type == "boss_pref":
            # "Always include GST in quotes" → global
            await store_global_memory(memory)
        elif memory.type == "customer_note":
            # "Billy's budget is limited" → customer-specific
            await store_customer_memory(customer_id, memory)
        elif memory.type == "business_rule":
            # "Melbourne shipping +$200" → global
            await store_global_memory(memory)

Memories inject into the next session's system prompt. The agent accumulates business knowledge over time.

Semantic dedup uses the small model — "Billy's budget is limited" and "Billy has a tight budget" are the same memory. Not perfect, but prevents memory bloat.

Session Management: Progressive Compression

session = {
    "messages": deque(maxlen=40),
    "summary": "",
    "customer_context": {}
}

def compress_session(messages):
    msgs = list(messages)
    
    # Oldest 10 → summary (~200 tokens)
    old = msgs[:10]
    summary = await summarize(old)
    
    # Middle 20 → keep text but truncate tool returns
    middle = msgs[10:30]
    for msg in middle:
        if msg["role"] == "tool":
            msg["content"] = truncate(msg["content"], max_chars=500)
    
    # Newest 10 → full fidelity (including agent's own complete response)
    recent = msgs[30:]
    
    return [{"role": "system", "content": summary}] + middle + recent

Pipeline Tracking & OneDrive Sync

8-stage state machine with full audit trail:

inquiry → measure → quoting → follow_up → ordered → production → install → done

Timeout thresholds per stage:

Stage	Timeout	Action
inquiry	2 days	Remind boss
measure	5 days	Schedule measurement
quoting	7 days	Follow up on quote
follow_up	14 days	Suggest re-contact
ordered	3 days	Confirm factory received
production	21 days	Check production progress
install	7 days	Schedule installation

OneDrive folders organized by stage:

OneDrive/
├── 未报价顾客/           ← inquiry, measure
│   ├── Billy Smith/
│   └── John Lee/
├── 厂家已报价/           ← quoting
│   └── Sarah Chen/
├── 已报价顾客/           ← follow_up
│   └── Mike Wang/
└── 已确定并付定金顾客/    ← ordered through done
    └── David Liu/

OneDrive Watcher checks delta API every 10 minutes. If a folder moves to a new stage directory, the database pipeline status updates automatically.

Voice Message Transcription

The boss prefers sending voice messages over typing. Added voice transcription (commits bd8bcdc → 5605df3):

Using the Whisper model for speech recognition, connected via a free API endpoint.

Flow: receive voice message → download audio → Whisper → text → process as regular text message. The boss doesn't notice the difference — voice and text work identically.

The Bilingual Factor

This isn't one module's problem — it permeates everything:

Two sets of prompts: Chinese for understanding, English for email generation
Error messages in Chinese: English errors confuse the boss
Bilingual email previews: Chinese for boss to review, English for the customer
Business culture differences: "麻烦您尽快回复" ≠ "Please reply at your earliest convenience"
Product terminology mapping: 推拉门 = sliding door, 平开窗 = casement window

System prompt includes product knowledge (AS2047 standards, product specs, building codes) so the agent can reference correct technical parameters.

Budget 2-3x the time of a monolingual project.

Scheduling

scheduler = AsyncIOScheduler(timezone="Australia/Sydney")

scheduler.add_job(run_email_check, "interval", minutes=5)
scheduler.add_job(run_onedrive_watch, "interval", minutes=10)
scheduler.add_job(run_evening_schedule, "cron", hour=20)  # 8 PM briefing + route

scheduler.start()

Every evening at 8 PM AEST:

Query all pending A-type inquiries
Generate optimized measurement route from addresses
Compile briefing: customer name, address, product interest, notes
Push to boss's active messaging platform

The boss starts each morning knowing exactly where to go.

Deployment

Component	Choice
Agent runtime	Railway (Python worker)
Database	Supabase
LLM calls	Unified routing API
Voice transcription	Whisper API
Website	Vercel (Next.js)

Railway deployment: worker: python main.py. Nixpacks build with CJK font support (agent generates Chinese PDFs).

Pit: Railway cold start — 10-15 second wait after idle. Scheduled jobs aren't affected, but if the boss messages during cold start, he waits. Fix: heartbeat keepalive, worker never sleeps.

Tech Stack

Component	Technology
Runtime	Python 3.12, asyncio
LLM routing	Unified routing API (three-tier models)
Database	Supabase PostgreSQL
Email	Microsoft Graph API (Outlook)
Files	OneDrive API
Messaging	Discord.py / python-telegram-bot / Meta WhatsApp Cloud API
Voice	Whisper API
Scheduling	APScheduler
HTTP	httpx (async, connection pooling)
Deployment	Railway
Website	Next.js, React, Tailwind CSS, Vercel

If I Started Over

Start with Telegram/WhatsApp. Build where the user is, not where you're comfortable.
Memory system on day one. An agent that can't remember is an agent users lose trust in.
JSONB from the start. Wasted a week on relational tables.
Batch operations as workflows from day one. LLMs fabricate data at scale.
Format in code, not prompts. Never rely on LLM output format.
Context management is continuous. Every new feature requires rebalancing.
Budget 2.5x for bilingual. It's not just translation — it's dual-track interaction at every layer.

Conclusion

97 commits, 12 days (from v1 on March 25 to the latest commit on April 6). From a classifier-router toy to a production system the boss uses daily.

The core takeaway: the hard part of building an AI agent isn't the model calls. It's:

Understanding the user's real pain (not "too many emails" — it's "unfamiliar workflow + language barrier")
Handling LLM unreliability (hallucination, format inconsistency, context bloat)
Balancing agent autonomy against system safety
Continuous iteration (in 97 commits, fixes outnumber features)

Questions welcome.

Parker Kuang | QKAi Studio | qkai.com.au

AI agentbuild AI agent from scratchmulti-channel agentbusiness automationAI architecture

Place this article inside the wider QKAI perspective

AI Agent14 Apr 2026

If the problem is workflow, not just content

QKAI can help assess which enquiry, quoting, follow-up, CRM, or support workflows should be redesigned before another manual layer is added.

See services Book a call

← Back to Blog

How I Built an AI Agent from Scratch

The Real Pain Points

Architecture Overview

Architecture Evolution: Three Phases, Three Rewrites

Phase 1: Classifier + Router (v1, lasted one day)

Phase 2: Agentic Tool-Calling (still running)

Phase 3: Platform Expansion (Discord → Telegram → WhatsApp)

Three-Tier Model Routing: The Key to Cost Control

JSONB vs Relational Tables

Email Pipeline: Step by Step

Step 1: Incremental Fetch

Step 2: Rule Pre-filter (Zero LLM Cost)

Step 3: LLM Classification

Step 4: Field Extraction

Step 5: Dedup & Merge

Step 6: Strip Reply Chains

Lessons from 97 Git Commits

Lesson 1: The Agent Lies — "Email Sent" Hallucination

Lesson 2: Ambiguous Confirmation = Accidental Send

Lesson 3: Agent Amnesia — Forgetting Which Customer

Lesson 4: Notion Was Dead Weight — Removing an Entire Layer

Lesson 5: Context Management Is a Continuous Battle

Lesson 6: Batch Operations Can't Go Through the Agent

Lesson 7: Message Formatting Can't Depend on Prompts

Lesson 8: Discord Concurrency Bug Chain

The Complete Tool Set

The Draft-Approval Mechanism

Memory System

Session Management: Progressive Compression

Pipeline Tracking & OneDrive Sync

Voice Message Transcription

The Bilingual Factor

Scheduling

Deployment

Tech Stack

If I Started Over

Conclusion

Place this article inside the wider QKAI perspective

From First Principles to an Agent Harness

The Robotics Flywheel Starts Before the Robot

Why High Labour Costs in Australia Are Making AI Automation a Business Priority

If the problem is workflow, not just content

How I Built an AI Agent from Scratch

The Real Pain Points

Architecture Overview

Architecture Evolution: Three Phases, Three Rewrites

Phase 1: Classifier + Router (v1, lasted one day)

Phase 2: Agentic Tool-Calling (still running)

Phase 3: Platform Expansion (Discord → Telegram → WhatsApp)

Three-Tier Model Routing: The Key to Cost Control

JSONB vs Relational Tables

Email Pipeline: Step by Step

Step 1: Incremental Fetch

Step 2: Rule Pre-filter (Zero LLM Cost)

Step 3: LLM Classification

Step 4: Field Extraction

Step 5: Dedup & Merge

Step 6: Strip Reply Chains

Lessons from 97 Git Commits

Lesson 1: The Agent Lies — "Email Sent" Hallucination

Lesson 2: Ambiguous Confirmation = Accidental Send

Lesson 3: Agent Amnesia — Forgetting Which Customer

Lesson 4: Notion Was Dead Weight — Removing an Entire Layer

Lesson 5: Context Management Is a Continuous Battle

Lesson 6: Batch Operations Can't Go Through the Agent

Lesson 7: Message Formatting Can't Depend on Prompts

Lesson 8: Discord Concurrency Bug Chain

The Complete Tool Set

The Draft-Approval Mechanism

Memory System

Session Management: Progressive Compression

Pipeline Tracking & OneDrive Sync

Voice Message Transcription

The Bilingual Factor

Scheduling

Deployment

Tech Stack

If I Started Over

Conclusion

Place this article inside the wider QKAI perspective