Architecture & Retrospective

The fund analyzes markets, debates, trades a simulated $1M book, and justifies itself — every run. Deterministic guardrails (position sizing, turnover, sector limits, stop-loss/take-profit) sit outside the model; the LLM must argue its case and respond to the bear thesis before anything executes. A LangGraph cycle sits on a hardened LLM gateway, with idempotent stores as the system of record, Qdrant as long-term memory, and evals in CI. The differentiator isn't returns — it's radical transparency.
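A minimal sketch of what "guardrails outside the model" could look like. The `Order` record, the `check_orders` helper, and both thresholds are hypothetical; the project's real limits are not stated here. The point is only that approval depends on fixed arithmetic, never on the model's rationale.

```python
from dataclasses import dataclass

# Hypothetical limits for illustration only.
MAX_POSITION_WEIGHT = 0.10   # no single position above 10% of equity
MAX_SECTOR_WEIGHT = 0.30     # no sector above 30% of equity


@dataclass(frozen=True)
class Order:
    ticker: str
    sector: str
    target_value: float  # desired dollar value of the position after the trade


def check_orders(orders, equity, sector_values):
    """Split proposed orders into (approved, rejected) using fixed rules.

    sector_values maps sector -> current dollar exposure. The LLM's
    argument for a trade is never consulted here.
    """
    approved, rejected = [], []
    running = dict(sector_values)  # copy so the caller's dict is not mutated
    for order in orders:
        if order.target_value > MAX_POSITION_WEIGHT * equity:
            rejected.append((order, "position size limit"))
            continue
        new_sector_value = running.get(order.sector, 0.0) + order.target_value
        if new_sector_value > MAX_SECTOR_WEIGHT * equity:
            rejected.append((order, "sector limit"))
            continue
        running[order.sector] = new_sector_value
        approved.append(order)
    return approved, rejected


orders = [
    Order("AAA", "tech", 80_000),    # fits both limits
    Order("BBB", "tech", 150_000),   # exceeds 10% of a $1M book
    Order("CCC", "tech", 90_000),    # would push tech above 30%
]
approved, rejected = check_orders(
    orders, equity=1_000_000, sector_values={"tech": 200_000}
)
```

Here only `AAA` is approved; `BBB` fails the position cap and `CCC` fails the sector cap once `AAA` has been counted.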

System architecture

One LLM gateway is the single choke point — routing, retries, tracing, and cost tracking are one-time costs, not per-agent ones.
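As a rough illustration of why one choke point pays off, here is a minimal sketch of a gateway that validates structured output, retries with a repair prompt, backs off on transient errors, and falls back to a second provider. The `Gateway` and `Decision` names and the provider callables are assumptions, and the hand-rolled validation stands in for the Pydantic schemas so the sketch stays dependency-free. Tier routing, the tool loop, and tracing are left out.

```python
import json
import time
from dataclasses import dataclass


@dataclass
class Decision:
    """Stand-in for a Pydantic schema; validation is done by hand here."""
    ticker: str
    action: str

    @classmethod
    def parse(cls, raw: str) -> "Decision":
        data = json.loads(raw)  # JSONDecodeError is a ValueError subclass
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        if data.get("action") not in {"buy", "sell", "hold"}:
            raise ValueError(f"invalid action: {data.get('action')!r}")
        return cls(ticker=str(data["ticker"]), action=data["action"])


class Gateway:
    """Single entry point: validation, repair retry, fallback, call log."""

    def __init__(self, providers, max_repairs=1, sleep=time.sleep):
        self.providers = providers    # ordered list of (name, callable(prompt) -> str)
        self.max_repairs = max_repairs
        self.sleep = sleep            # injectable so tests do not actually wait
        self.log = []                 # one record per provider call

    def _record(self, provider, attempt, start, ok):
        self.log.append({"provider": provider, "attempt": attempt, "ok": ok,
                         "latency_s": time.perf_counter() - start})

    def call(self, prompt: str) -> Decision:
        last_error = None
        for name, complete in self.providers:       # primary first, then fallback
            attempt_prompt = prompt
            for attempt in range(self.max_repairs + 1):
                start = time.perf_counter()
                try:
                    result = Decision.parse(complete(attempt_prompt))
                    self._record(name, attempt, start, ok=True)
                    return result
                except (ValueError, KeyError) as err:   # schema failure: ask for a repair
                    self._record(name, attempt, start, ok=False)
                    last_error = err
                    attempt_prompt = (f"{prompt}\n\nYour last reply was invalid "
                                      f"({err}). Return valid JSON.")
                except ConnectionError as err:          # transient failure: back off
                    self._record(name, attempt, start, ok=False)
                    last_error = err
                    self.sleep(2 ** attempt)
        raise RuntimeError("all providers failed") from last_error


def broken_primary(prompt):
    return "not json"  # never passes validation


def working_fallback(prompt):
    return '{"ticker": "AAA", "action": "hold"}'


gw = Gateway([("primary", broken_primary), ("fallback", working_fallback)],
             sleep=lambda seconds: None)
decision = gw.call("Decide on AAA")
```

Because every agent would call `gw.call`, the retry policy and the call log exist once; the log here records two failed primary attempts followed by one successful fallback call.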

Orchestrate

LangGraph daily cycle: conditional routing · SQLite resume · HITL approval

Agents

Research: tool-calling

Debate: bull · bear · risk

PM synthesis: strong tier

Risk engine: deterministic

Reflection: weekly

Investor letter: weekly

Gateway

LLM Gateway — Pydantic schemas · repair retry · backoff · tier routing + fallback · tool loop · cost/latency log

Providers

OpenAI

Fallback route

Langfuse: tracing (opt-in)

Data

SQLite: run-progress / resume

CSV · JSONL: idempotent by run_id

Qdrant: chunked RAG · sector metadata

Durable run history

Assure

Decision evals in CI

Grounding judge

Calibration: Brier · curves

Retrieval eval

Surface

GitHub Pages dashboard

MCP server: 7 read-only tools

Weekly investor letter
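The calibration item in the Assure layer mentions Brier scores and curves. A minimal sketch of both, assuming each decision carries a predicted probability that is later scored against a 0/1 outcome (that framing and the function names are assumptions, not the project's code):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally sized, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def calibration_curve(forecasts, outcomes, n_bins=5):
    """Return (mean predicted probability, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, o))
    curve = []
    for bucket in bins:
        if bucket:
            mean_p = sum(p for p, _ in bucket) / len(bucket)
            freq = sum(o for _, o in bucket) / len(bucket)
            curve.append((mean_p, freq, len(bucket)))
    return curve


forecasts = [0.9, 0.8, 0.3, 0.1]   # made-up probabilities for illustration
outcomes = [1, 1, 0, 0]            # made-up realized outcomes
score = brier_score(forecasts, outcomes)   # about 0.0375
curve = calibration_curve(forecasts, outcomes)
```

A lower Brier score is better, and a well-calibrated forecaster's curve sits near the diagonal where observed frequency matches mean predicted probability.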

Run state machine

Each daily run is a guarded LangGraph state machine. Guardrails run before execution; an empty or fully-rejected decision skips execution; any node error finalizes with diagnostics; a killed run resumes by reusing its run_id.
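The branching rules above can be sketched as a plain routing function. The node names and state keys are hypothetical; in LangGraph a function of this shape is typically what gets passed to `add_conditional_edges`, but the sketch avoids the dependency and shows only the decision logic.

```python
def route_after_guardrails(state: dict) -> str:
    """Pick the next node name from the run state.

    Errors finalize with diagnostics, and a decision with no surviving
    orders (empty or fully rejected) skips execution.
    """
    if state.get("error"):
        return "finalize_with_diagnostics"
    approved = state.get("approved_orders") or []
    if not approved:
        return "skip_execution"
    return "execute"


# Three illustrative states, one per branch.
next_on_error = route_after_guardrails({"error": "provider timeout"})
next_on_empty = route_after_guardrails({"approved_orders": []})
next_on_orders = route_after_guardrails({"approved_orders": [{"ticker": "AAA"}]})
```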

Idempotent: re-executing a resumed run overwrites by run_id — no duplicate trades or journal entries

Non-idempotent guard: an already-sent tweet isn't re-posted on resume
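Both properties can be sketched with SQLite's standard library bindings. The table names, the `record_trade` and `post_once` helpers, and the `run-001` id are hypothetical. Writes keyed by `run_id` overwrite on replay, while a marker row makes a non-idempotent side effect run at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trades (
        run_id   TEXT NOT NULL,
        ticker   TEXT NOT NULL,
        quantity INTEGER NOT NULL,
        PRIMARY KEY (run_id, ticker)
    );
    CREATE TABLE side_effects (
        run_id TEXT NOT NULL,
        kind   TEXT NOT NULL,
        PRIMARY KEY (run_id, kind)
    );
""")


def record_trade(run_id, ticker, quantity):
    """Overwrite-by-key write: replaying the same run cannot add a second row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO trades (run_id, ticker, quantity) VALUES (?, ?, ?)",
            (run_id, ticker, quantity),
        )


def post_once(run_id, kind, send):
    """Run a non-idempotent side effect at most once per (run_id, kind).

    The marker is written before sending, so a crash mid-send errs on the
    side of not posting twice.
    """
    with conn:
        cursor = conn.execute(
            "INSERT OR IGNORE INTO side_effects (run_id, kind) VALUES (?, ?)",
            (run_id, kind),
        )
    if cursor.rowcount == 0:  # marker already existed: skip the side effect
        return False
    send()
    return True


sent = []
for _ in range(2):  # the second pass simulates a resumed run
    record_trade("run-001", "AAA", 10)
    post_once("run-001", "tweet", lambda: sent.append("tweet"))

trade_count = conn.execute("SELECT COUNT(*) FROM trades").fetchone()[0]
```

After both passes there is still one trade row and one sent tweet.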

One daily cycle — sequence

How the components talk during a single run. Solid arrows are calls; dashed are returns.

Key engineering decisions

The calls that shaped the project — foundational bets first, then the choices a constraint forced later.

Decision	Why
One LLM gateway as the single choke point	The highest-leverage refactor in the repo: routing, retries, structured-output validation, tracing, and cost tracking become one-time costs instead of per-agent ones. Every agent flows through it — the later provider seam and tool loop were built on it.
Deterministic guardrails outside the LLM	The portfolio manager can be creative; the risk layer is boring on purpose. Position sizing, turnover, sector limits, and stop-loss/take-profit are enforced in code the model can't argue past.
LangGraph as the only runner	The legacy linear orchestrator was deleted once the graph reached parity. Two orchestrators is debt, not safety.
SQLite, not Postgres, as the system of record	Transactions, upserts, and queryability without running a server. Raw state is no longer committed to `main`; exports are still committed to Pages.
Langfuse over LangSmith for tracing	Self-hostable and open source — a better story for an in-public project, and a no-op unless keys are set.
Idempotency-based resume over LangGraph's native `SqliteSaver`	The graph state carries non-serializable live handles (engine, clients). Persisting progress + reusing the `run_id` leans on the idempotent stores for duplicate-free re-execution — correct, without a large risky state refactor.
Chunking eval with a deterministic hashing embedder + in-memory Qdrant	Runs in CI with no API key, yet does real vector search. Cosine over TF vectors rewards term concentration — the actual reason chunking helps — so the 0.15 → 1.00 hit@1 win is honest, not rigged.
Grounding gate before publish	Both the daily decision and the weekly letter are checked against the facts they were given; unsupported claims are blocked and never published or tweeted.
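To make the chunking-eval row concrete, here is a minimal sketch of a deterministic hashing embedder with cosine similarity. It uses plain Python lists instead of in-memory Qdrant, and the vector size, the MD5-based bucketing, and the sample texts are all assumptions. It illustrates the term-concentration effect only and does not reproduce the project's eval or its hit@1 numbers.

```python
import hashlib
import math
import re

DIM = 4096  # hypothetical vector size, large enough to make collisions rare


def embed(text: str) -> list[float]:
    """Term-frequency vector with tokens hashed into DIM buckets.

    hashlib is used instead of the built-in hash() so results are
    identical across processes and CI runs.
    """
    vector = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(token.encode()).digest()
        vector[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    return vector


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up texts: one relevant sentence, then the same sentence buried in filler.
relevant = "Semiconductor export controls tightened this quarter."
filler = "The letter also covers housing, airlines, retail, banks, and utilities in detail."
whole_doc = relevant + " " + filler
query = "semiconductor export controls"

score_chunk = cosine(embed(query), embed(relevant))
score_doc = cosine(embed(query), embed(whole_doc))
```

The query terms match the same buckets in both cases, but the unchunked document's extra terms inflate its norm, so the focused chunk scores higher.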

Deliberately not built

Scope is a feature — what was left out on purpose.

Live brokerage / real money — transparency works because it's simulated

Fine-tuning / DPO — too few decisions, no reward signal

Knowledge graph (Neo4j) — vectors get 90% at 10% of the cost

Market microstructure — quant infra, not AI

Multi-user SaaS — effort without AI learning

React SPA — HTML surfaces the AI work fine

Hand-rolled agent framework — LangGraph is the point

Naive backtest — lookahead contamination