Skip to content

08 — The Big Swing (Weeks 9–12)

Mission

One product you genuinely care about, four weeks, real users. It should integrate the stack you now own — agentic core, retrieval where it needs knowledge, evals and traces from day one — and end with a public launch write-up. In parallel, the visibility loop runs weekly, because weeks 9–12 are also where the portfolio gets seen.

Why this rung

Seven one-week ships prove range; the big swing proves you can go deep: hold scope for a month, survive contact with users, iterate on evidence instead of vibes. It's the difference between "has built many demos" and "has shipped a product," and it's what interviews, clients, and cofounders actually probe. Picking something you care about isn't sentimental advice — it's what keeps week 11 from collapsing when the novelty is gone.

Choosing it: by now the reps have given you taste — trust it. Strong candidates, in this program's domain: the Week 1–7 ship that most refused to leave your head, now taken seriously (the log-triage agent grown into a real on-call assistant; the CVE parser grown into a stack-aware advisory pipeline; the runbook RAG grown into a team's ops copilot); a real pain in your own infra/security/ops work; or the tool you wanted mid-program and nobody had built. It should be ambitious enough to scare you slightly and small enough that a brutal v1 cut ships in two weeks.

The mental model

A product is a loop with users inside it. For seven weeks your feedback came from eval sets you wrote — a closed loop you controlled. Real users break that closure: they bring inputs you never imagined, workflows you didn't design for, and indifference you can't argue with. The month's core discipline is letting evidence replace intuition at exactly the moment intuition gets loud: traces of real usage tell you what to fix, and your opinion of what's interesting stops being an input. This is also where your Week-8 instrumentation pays for itself twice — the same traces that debug the product are the user research.

Scope is the only resource you actually control. Time is fixed (four weeks), quality is non-negotiable (evals in CI, like every ship), so scope is the release valve — always scope, never quality, never the deadline. The one-page spec's most important section is what v1 excludes, and the week-11 instruction to kill a feature isn't hazing: something in your plan is dead weight, and finding it while it's cheap is the skill. The counterintuitive part is that ruthless cutting is what makes the product feel finished — five things that work beat twelve that almost do.

And the honest metric hierarchy: signups flatter, usage informs, retention decides. A spike of curious visitors is weather; the five people who come back in week three are climate. Your own daily use counts — but only if you'd genuinely miss it, and you have to audit yourself for polite lies. The launch write-up reports retention-shaped numbers because they're the only ones a reader (or a future interviewer) can't discount.

The gotcha — the week-11 trough is structural, not personal. Novelty is gone, users are lukewarm about the feature you loved, the backlog looks infinite, and the quiet failure mode is retreating to what feels productive: polishing code nobody asked for. The escape is mechanical, not motivational: watch five real sessions, fix the top thing that stopped them, ship it, repeat. Momentum in week 11 comes from the loop, not from mood.

The four weeks

Start here (Monday morning, week 9): write the one-page spec before touching code — the user, the job, the single core loop, what v1 excludes — then create the repo from your now-standard skeleton the same day: eval toolkit in, tracing on, CI green on a hello-world slice. Default pick if you're still torn by noon: take the ship you've used most yourself since building it and make it real for one specific person you can name. Caring is load-bearing, and your own usage is the evidence you already care.

  • Week 9 — spec + skeleton. A one-page spec: the user, the job, the single core loop, what v1 ruthlessly excludes. Walking skeleton deployed by Sunday: thinnest end-to-end slice, evals and tracing wired from the start (you have a toolkit; use it from commit 1).
  • Week 10 — the core loop, made good. Build the eval set alongside the product. Ship to 3–5 real humans by Sunday — recruited from your community/network; "real" means they can hurt your feelings.
  • Week 11 — iterate on evidence. Watch traces of real usage; fix what users actually hit, not what you find interesting. Kill one feature (there's always one). Grow to ≥5 active users or one genuinely retained daily use — including your own honest daily use.
  • Week 12 — harden, operate, launch. This is the week the program's lightest dimension — running it in production — gets real. Three passes:
  • Ops (LLMOps): the things that turn a demo into a service — failure paths, retries, rate limits, a hard cost ceiling with alerting, request/response logging, and monitoring: watch quality, cost, and latency over time (your traces feed this) and a plan for the day a provider silently changes the model under you. Write a one-page incident runbook (what breaks, how you'd know, what you'd do).
  • Security (full pass): walk the OWASP LLM Top 10 against your product — prompt injection, insecure output handling, excessive agency, sensitive-info disclosure — and fix or explicitly accept each. This is your Week-3/4 security checkpoints, graduated to a whole product.
  • Launch: the write-up — what it is, who it's for, the numbers (users, usage, evals, cost), the security posture, and what building it taught you that the seven ships didn't. Post it where its users live.

Marketable stretch (optional, if you have credits or a job that uses one): deploy the product behind a cloud AI platform — AWS Bedrock, Azure AI, or Google Vertex. It's on-theme with the cloud domain and the single most marketable enterprise line on your résumé — but earn it by actually shipping on it (routing, IAM, cost controls), not by listing it. litellm already lets you point at Bedrock/Vertex with a one-line change.

The visibility loop (parallel, ~half a day per week)

  • [ ] Weekly public write-up — the build-log habit, aimed outward: progress, one real number, one honest lesson. Same day, every week.
  • [ ] One OSS contribution during the four weeks, to a tool you actually used this program (Unsloth, Langfuse, browser-use, an MCP SDK, Gradio…): a real bug fix, a docs improvement from your own confusion, or an adapter/example. Engage the maintainers properly. One merged PR outweighs a hundred followers.
  • [ ] Be early once. When a model or capability drops during these weeks — one will — publish a same-week demo or measured take through your stack ("ran the new model through my gauntlet; here's the table"). Your Week-8 toolkit makes you faster at this than almost anyone; that speed is the point.

Eval bar

  • ≥5 real users (or 1 measured, retained daily-use workflow), with usage evidenced from traces, not memory.
  • The product's eval suite runs in CI; quality/cost/latency in the README like every ship.
  • Production posture shipped: cost ceiling + monitoring live, an incident runbook written, and an OWASP LLM Top 10 pass documented (fixed or consciously accepted, item by item).
  • Launch write-up published; 4 weekly write-ups shipped on schedule.
  • One OSS PR submitted (merged is the goal; a substantive PR under review counts at day 90).
  • One "early" artifact published within a week of some release.

JIT learning — pull when stuck

By design, almost everything this month needs is already in your heads and repos. When something new blocks you, reference/jit-map.md is the index. The one genuinely new muscle is talking to users — keep it lightweight: watch 5 sessions (traces count), ask 3 users what almost stopped them, fix that.

Key ideas

  • A product is a loop with users inside; traces replace intuition the day real usage starts.
  • Scope is the only free variable — cut features, never quality, never the date.
  • What v1 excludes is the spec's most load-bearing section.
  • Signups flatter, usage informs, retention decides; audit your own daily use for polite lies.
  • The week-11 trough is structural; the exit is mechanical (watch sessions → fix → ship).
  • Working in public compounds: write-ups, OSS, and being early are distribution you own.

Check yourself

  • What distinguishes a retention-shaped number from a launch-shaped one in your write-up?
  • Users ignore your favorite feature and lean hard on a side utility. What does the top-1% move look like?
  • It's day 78, you're behind, and two features remain. Walk the decision.

Publish

Everything. That's the month.

Proof

"In 90 days I shipped eight measured AI products, published the evals to prove them, contributed to the tools I build on, and ran a real product from spec to users — here's the portfolio."

Day 91

Rest. Then reread your Week-1 write-up and marvel. Where to deepen next — serving and performance, data engineering, research reproduction, or doubling down on the big swing — is a decision you're now qualified to make from taste. If the big swing has a pulse: follow it.