Glasshouse Fund

The fund analyzes markets, debates, trades a simulated $1M book, and justifies itself — every run. Deterministic guardrails (position sizing, turnover, sector limits, stop-loss/take-profit) sit outside the model; the LLM must argue its case and respond to the bear thesis before anything executes. A LangGraph cycle sits on a hardened LLM gateway, with idempotent stores as the system of record, Qdrant as long-term memory, and evals in CI. The differentiator isn't returns — it's radical transparency.
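A minimal sketch of what deterministic guardrails outside the model can look like. It assumes the PM proposes target weights per ticker. `Limits`, `review`, `exit_signal` and every threshold below are hypothetical and illustrative, not the fund's actual configuration. The checks are plain code, so no argument from the LLM can change their outcome.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Hypothetical thresholds for illustration; not the fund's actual configuration.
    max_position_weight: float = 0.10  # max fraction of the book in one ticker
    max_sector_weight: float = 0.30    # max fraction of the book in one sector
    max_turnover: float = 0.25         # max fraction of the book traded in one run
    stop_loss: float = -0.08           # exit when return since entry <= -8%
    take_profit: float = 0.20          # exit when return since entry >= +20%


def review(proposed: dict[str, float], current: dict[str, float],
           sectors: dict[str, str], limits: Limits) -> tuple[dict[str, float], list[str]]:
    """Apply proposed target weights one ticker at a time; return (approved book, rejection reasons)."""
    book = dict(current)
    reasons: list[str] = []
    turnover = 0.0
    for ticker in sorted(proposed):  # sorted, so the outcome never depends on dict order
        target = proposed[ticker]
        if target > limits.max_position_weight:
            reasons.append(f"{ticker}: position {target:.0%} exceeds {limits.max_position_weight:.0%}")
            continue
        delta = abs(target - book.get(ticker, 0.0))
        if turnover + delta > limits.max_turnover:
            reasons.append(f"{ticker}: would push turnover past {limits.max_turnover:.0%}")
            continue
        trial = {**book, ticker: target}
        sector = sectors[ticker]
        sector_weight = sum(w for t, w in trial.items() if sectors[t] == sector)
        if sector_weight > limits.max_sector_weight:
            reasons.append(f"{ticker}: sector {sector} would reach {sector_weight:.0%}")
            continue
        book, turnover = trial, turnover + delta
    return book, reasons


def exit_signal(entry_price: float, price: float, limits: Limits) -> str | None:
    """Deterministic stop-loss / take-profit check on an open position."""
    ret = price / entry_price - 1.0
    if ret <= limits.stop_loss:
        return "stop_loss"
    if ret >= limits.take_profit:
        return "take_profit"
    return None
```

Rejections come back as plain strings. That fits the document's theme: the journal can record exactly why a trade the LLM wanted did not execute.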

System architecture

One LLM gateway is the single choke point — routing, retries, tracing, and cost tracking are one-time costs, not per-agent ones.
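The sketch below shows how one gateway can own retries, repair, backoff, and tier fallback for every agent. It is a stdlib-only illustration, not the repo's gateway:

- A required-keys check on parsed JSON stands in for Pydantic schema validation.
- `provider` is a hypothetical `(tier, prompt) -> text` callable in place of a real SDK client.
- `TransientError` is a hypothetical retryable error.
- Only latency is logged; cost would need token counts.

```python
from __future__ import annotations

import json
import time
from dataclasses import dataclass, field
from typing import Callable


class TransientError(Exception):
    """Hypothetical: raised by a provider for retryable failures (rate limit, timeout)."""


@dataclass
class CallLog:
    tier: str
    attempt: int
    latency_s: float
    ok: bool


@dataclass
class Gateway:
    provider: Callable[[str, str], str]  # hypothetical (tier, prompt) -> raw text
    routes: dict[str, list[str]]         # route name -> tiers in fallback order
    max_attempts: int = 3
    base_delay_s: float = 0.5
    sleep: Callable[[float], None] = time.sleep
    log: list[CallLog] = field(default_factory=list)

    def complete_json(self, route: str, prompt: str, required: set[str]) -> dict:
        """Return a validated JSON object, trying each tier of the route in order."""
        last_error: Exception | None = None
        for tier in self.routes[route]:
            attempt_prompt = prompt
            for attempt in range(1, self.max_attempts + 1):
                start = time.perf_counter()
                try:
                    raw = self.provider(tier, attempt_prompt)
                except TransientError as exc:
                    self._record(tier, attempt, start, ok=False)
                    last_error = exc
                    self.sleep(self.base_delay_s * 2 ** (attempt - 1))  # exponential backoff
                    continue
                try:
                    data = json.loads(raw)  # json.JSONDecodeError is a ValueError
                    if not isinstance(data, dict):
                        raise ValueError("expected a JSON object")
                    missing = required - set(data)
                    if missing:
                        raise ValueError(f"missing keys: {sorted(missing)}")
                except ValueError as exc:
                    self._record(tier, attempt, start, ok=False)
                    last_error = exc
                    # Repair retry: resend the original prompt plus the validation error.
                    attempt_prompt = f"{prompt}\n\nYour last reply was invalid ({exc}). Reply with JSON only."
                    continue
                self._record(tier, attempt, start, ok=True)
                return data
        raise RuntimeError(f"all tiers failed for route {route!r}") from last_error

    def _record(self, tier: str, attempt: int, start: float, ok: bool) -> None:
        # Latency only; cost tracking would need token counts from the provider response.
        self.log.append(CallLog(tier, attempt, time.perf_counter() - start, ok))
```

Because every agent calls `complete_json`, a change to retry or fallback policy lands in one place rather than in each agent.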

Orchestrate: LangGraph daily cycle (conditional routing · SQLite resume · HITL approval)
Agents: Research (tool-calling) · Debate (bull · bear · risk) · PM synthesis (strong tier) · Risk engine (deterministic) · Reflection (weekly) · Investor letter (weekly)
Gateway: LLM Gateway (Pydantic schemas · repair retry · backoff · tier routing + fallback · tool loop · cost/latency log)
Providers: OpenAI · Fallback route · Langfuse tracing (opt-in)
Data: SQLite (run progress / resume) · CSV · JSONL (idempotent by run_id) · Qdrant (chunked RAG · sector metadata) · Durable run history
Assure: Decision evals in CI · Grounding judge · Calibration (Brier · curves) · Retrieval eval
Surface: GitHub Pages dashboard · MCP server (7 read-only tools) · Weekly investor letter
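The Assure layer lists calibration via Brier score and calibration curves. Below is a minimal sketch of both. It assumes each forecast is a probability in [0, 1], for example a stated confidence that a call works out, paired with a 0/1 outcome. The function names and the 5-bin default are illustrative choices.

```python
from __future__ import annotations


def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def calibration_curve(probs: list[float], outcomes: list[int],
                      n_bins: int = 5) -> list[tuple[float, float, int]]:
    """Per non-empty bin: (mean predicted probability, observed frequency, count)."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, o))
    curve = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(o for _, o in members) / len(members)
            curve.append((mean_p, freq, len(members)))
    return curve
```

A well-calibrated forecaster has points near the diagonal: bins where it says 0.8 come true about 80% of the time.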

Run state machine

Each daily run is a guarded LangGraph state machine. Guardrails run before execution; an empty or fully-rejected decision skips execution; any node error finalizes with diagnostics; a killed run resumes by reusing its run_id.
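The routing rules can be sketched in plain Python. This is not LangGraph code. The node names mirror the diagram, and the handlers are hypothetical callables that take and return a state dict. One assumption is flagged in the comments: a run whose execution is skipped still goes on to persist and publish.

```python
from __future__ import annotations

from typing import Callable

Handler = Callable[[dict], dict]

# Fixed edges; "guardrails" is the only conditional branch.
EDGES = {
    "contextualize": "decide",
    "decide": "guardrails",
    "approve_execute": "persist_publish",
    "persist_publish": "done",
}


def next_node(node: str, state: dict) -> str:
    if node == "guardrails":
        # Empty or fully rejected decision: skip execution.
        # Assumption: the run still persists and publishes its journal.
        return "approve_execute" if state.get("approved") else "persist_publish"
    return EDGES[node]


def run_cycle(handlers: dict[str, Handler], state: dict) -> dict:
    node = "contextualize"
    while node != "done":
        try:
            state = handlers[node](state)
        except Exception as exc:  # any node error finalizes the run with diagnostics
            state["status"] = "failed"
            state["diagnostics"] = f"{node}: {exc!r}"
            return state
        state.setdefault("completed", []).append(node)
        node = next_node(node, state)
    state["status"] = "done"
    return state
```

Keeping the failure path in the runner means no individual node needs its own error-reporting code.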

Contextualize (mark-to-market · research · memory) → Decide (debate → PM synthesis → grounding) → Guardrails (risk review · rebalance)
approved? yes → Approve & Execute (HITL · execute · track) → Persist & Publish (journal · report · tweet · export · ingest) → Done (status recorded)
approved? none → skip execution
on error → Failed (finalize · diagnostics)
on resume → reuse run_id
Idempotent: re-executing a resumed run overwrites by run_id — no duplicate trades or journal entries
Non-idempotent guard: an already-sent tweet isn't re-posted on resume
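Both properties can be sketched with the stdlib `sqlite3` module. The table layout, `record_trades`, and `tweet_once` are hypothetical, and `post` stands in for whatever client sends the tweet. Trades are upserted on the key `(run_id, ticker)`, so replaying a resumed run overwrites rows instead of adding duplicates. The tweet is guarded by a per-run marker row.

```python
from __future__ import annotations

import sqlite3
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS trades (
            run_id TEXT NOT NULL,
            ticker TEXT NOT NULL,
            qty    REAL NOT NULL,
            PRIMARY KEY (run_id, ticker)
        );
        CREATE TABLE IF NOT EXISTS sent_tweets (run_id TEXT PRIMARY KEY);
        """
    )
    return conn


def record_trades(conn: sqlite3.Connection, run_id: str, trades: dict[str, float]) -> None:
    """Upsert keyed by (run_id, ticker): replaying a resumed run overwrites instead of duplicating."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO trades (run_id, ticker, qty) VALUES (?, ?, ?) "
            "ON CONFLICT(run_id, ticker) DO UPDATE SET qty = excluded.qty",  # SQLite >= 3.24
            [(run_id, ticker, qty) for ticker, qty in trades.items()],
        )


def tweet_once(conn: sqlite3.Connection, run_id: str, text: str,
               post: Callable[[str], None]) -> bool:
    """Guard a non-idempotent side effect: post at most once per run_id."""
    if conn.execute("SELECT 1 FROM sent_tweets WHERE run_id = ?", (run_id,)).fetchone():
        return False  # already sent before the run was killed; do not re-post
    post(text)
    with conn:
        conn.execute("INSERT INTO sent_tweets (run_id) VALUES (?)", (run_id,))
    return True
```

One gap remains. A crash after `post` but before the marker is written would still re-post on resume. Writing the marker first flips the risk to a missed tweet instead of a duplicate one.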

One daily cycle — sequence

How the components talk during a single run, step by step.

Participants: Graph · Agents · Gateway · Risk · Simulator · Stores
research + memory
debate → decide
validated JSON (return)
grounding check
grounded? (return)
review + stop-loss / take-profit
approved trades (return)
execute approved
upsert trades / journal
export dashboard · publish (if grounded)
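The final step publishes only when the output is grounded. Below is a minimal sketch of such a gate. The project's grounding judge is presumably model-based. Here a toy `numeric_judge` stands in for it: a claim passes only if every number it cites also appears in the facts. That check catches numeric drift and nothing else. `publish_if_grounded` and `publish` are hypothetical names.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def numeric_judge(claim: str, facts: str) -> bool:
    """Toy stand-in for a grounding judge: every number in the claim must appear in the facts."""
    fact_numbers = set(NUMBER.findall(facts))
    return all(n in fact_numbers for n in NUMBER.findall(claim))


def publish_if_grounded(claims: Iterable[str], facts: str,
                        publish: Callable[[list[str]], None],
                        judge: Callable[[str, str], bool] = numeric_judge) -> list[str]:
    """Publish only if every claim is supported; return the unsupported claims (empty if published)."""
    claims = list(claims)
    unsupported = [c for c in claims if not judge(c, facts)]
    if not unsupported:
        publish(claims)
    return unsupported
```

The gate is all-or-nothing. One unsupported claim blocks the whole report or tweet rather than publishing a partially checked version.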

Key engineering decisions

The calls that shaped the project — foundational bets first, then the choices a constraint forced later.

Decision | Why
One LLM gateway as the single choke point | The highest-leverage refactor in the repo: routing, retries, structured-output validation, tracing, and cost tracking become one-time costs instead of per-agent ones. Every agent flows through it, and the later provider seam and tool loop were built on top of it.
Deterministic guardrails outside the LLM | The portfolio manager can be creative; the risk layer is boring on purpose. Position sizing, turnover, sector limits, and stop-loss/take-profit are enforced in code the model can't argue past.
LangGraph as the only runner | The legacy linear orchestrator was deleted once the graph reached parity. Two orchestrators is debt, not safety.
SQLite, not Postgres, as the system of record | Transactions, upserts, and queryability without running a server. Raw state is no longer committed to main; exports are still committed for GitHub Pages.
Langfuse over LangSmith for tracing | Self-hostable and open source: a better story for an in-public project, and a no-op unless keys are set.
Idempotency-based resume over LangGraph's native SqliteSaver | The graph state carries non-serializable live handles (engine, clients). Persisting progress and reusing the run_id relies on the idempotent stores for duplicate-free re-execution: correct, without a large, risky state refactor.
Chunking eval with a deterministic hashing embedder + in-memory Qdrant | Runs in CI with no API key, yet performs real vector search. Cosine similarity over term-frequency (TF) vectors rewards term concentration, which is the actual reason chunking helps, so the 0.15 → 1.00 hit@1 improvement is honest, not rigged.
Grounding gate before publish | Both the daily decision and the weekly letter are checked against the facts available when they were written; unsupported claims are blocked and never published or tweeted.
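Below is a minimal sketch of the hashing-embedder idea behind the chunking eval. It makes several assumptions:

- Brute-force cosine search over a dict replaces the in-memory Qdrant index, so the sketch needs no dependencies.
- `blake2b` is an illustrative choice of stable hash. Python's built-in `hash()` is salted per process, which would break determinism.
- `DIM`, `embed`, `chunk_docs`, and `hit_at_1` are hypothetical names.

```python
from __future__ import annotations

import hashlib
import math
import re
from collections import Counter

DIM = 1024  # hypothetical embedding width


def embed(text: str, dim: int = DIM) -> list[float]:
    """Hashing embedder: term frequencies hashed into `dim` buckets, then L2-normalized."""
    vec = [0.0] * dim
    for token, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        digest = hashlib.blake2b(token.encode(), digest_size=8).digest()  # stable across runs
        vec[int.from_bytes(digest, "big") % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def chunk_docs(docs: dict[str, str], size: int) -> dict[str, str]:
    """Split each doc into `size`-word windows with ids like 'doc#0', 'doc#1', ..."""
    chunks = {}
    for doc_id, text in docs.items():
        words = text.split()
        for start in range(0, len(words), size):
            chunks[f"{doc_id}#{start // size}"] = " ".join(words[start:start + size])
    return chunks


def hit_at_1(queries: list[tuple[str, str]], docs: dict[str, str], dim: int = DIM) -> float:
    """Fraction of queries whose top-scoring entry belongs to the expected document."""
    index = {doc_id: embed(text, dim) for doc_id, text in docs.items()}
    hits = 0
    for query, expected in queries:
        q = embed(query, dim)
        # Vectors are unit length, so the dot product is the cosine similarity.
        best = max(index, key=lambda doc_id: sum(a * b for a, b in zip(q, index[doc_id])))
        hits += best.split("#", 1)[0] == expected
    return hits / len(queries)
```

A small, hypothetical example shows the mechanism, though it does not reproduce the repo's numbers. A long document can mention the query terms once and then repeat unrelated terms many times. That dilutes its TF vector, so a short, loosely related document outranks it. Once the long document is chunked, the chunk holding the query terms is dense in them and ranks first.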

Deliberately not built

Scope is a feature — what was left out on purpose.

Live brokerage / real money — transparency works because it's simulated
Fine-tuning / DPO — too few decisions, no reward signal
Knowledge graph (Neo4j) — vectors get 90% at 10% of the cost
Market microstructure — quant infra, not AI
Multi-user SaaS — effort without AI learning
React SPA — HTML surfaces the AI work fine
Hand-rolled agent framework — LangGraph is the point
Naive backtest — lookahead contamination