Groundtruth — The 90-Day Blueprint¶
Top-1% AI user & builder, in 90 days, without the pedagogy. The program is a ship ladder: one real, public, measured project per week, ascending in difficulty, with learning pulled in just-in-time by whatever the current ship demands.
(v1 of this file was a 13-track curriculum modeled on Plaintext. Deliberately retired —
it optimized for comprehensiveness, not speed. It lives in git history; the useful residue
is reference/jit-map.md.)
The thesis¶
The top 1% of AI users/builders is a lower bar than it looks, because the median is terrible: most engineers use AI as autocomplete, have never written an eval, never built an agent loop, never shipped an LLM feature to a real user. The gap is closed by reps at the frontier + shipped artifacts + measurement + visibility — not by knowing more.
Four leverage points, in order:
- Reps — use frontier tools at the edge, daily, on real work. Taste comes from volume.
- Ships — a deployed, public artifact per week. After ~10, the portfolio is the signal.
- Measurement — evals are the most underpriced skill in the field. Eval everything.
- Visibility — write-ups, OSS contributions, being demonstrably early on new capability.
What gets deliberately deferred (not in v1; revisited on day 91, not skipped by accident): the math ladder, classical ML, and Kaggle competitions; training models from scratch and deep serving/inference internals (vLLM internals, kernels); heavy MLOps platform work and multi-agent systems at scale. Each is named here so its absence is a choice, not a blind spot — the program trains the builder, and these are the model- researcher and platform-engineer adjacencies to grow into next. The pedagogy that is here isn't gone from the schedule — it's demand-driven: when the RAG week goes badly, that's when you learn embeddings and reranking, and it sticks because it cost you something.
Two things that are emphatically not deferred, because a builder who skips them ships liabilities: security (every ship has a "Secure it" checkpoint scaled to its attack surface — see the rules) and evaluation (rule 3). They are woven through, not saved for a track that never comes.
The ladder¶
| When | Ship | What it proves |
|---|---|---|
| Days 1–7 | 00 — Arm yourself | Your own workflow is AI-native: agentic coding, a personal MCP server, an eval template |
| Week 2 | 01 — LLM feature | You can ship a model-powered feature users touch, with structured outputs and a golden-set eval |
| Week 3 | 02 — RAG | You can build retrieval that measurably retrieves — baseline → hybrid → rerank, with numbers |
| Week 4 | 03 — Agent | You can build the loop, design tools, trace runs, and report a task completion rate |
| Week 5 | 04 — Browser agent | Your agents can operate software made for humans, safely, with a measured success rate |
| Week 6 | 05 — Fine-tune | You can adapt an open model that beats its base (and challenges a frontier model) on your eval, published to the Hub |
| Week 7 | 06 — Multimodal | You can ship beyond text — voice or vision — with latency numbers |
| Week 8 | 07 — Eval gauntlet | The differentiator: tracing + evals retrofitted across every ship, one reusable toolkit, one flagship write-up |
| Weeks 9–12 | 08 — Big swing | One product you care about, with real users — plus the visibility loop (write-ups, OSS, day-one demos) |
Each brief states: Mission (what ships by Sunday) · Why this rung (the so-what) · The mental model (the bridge — see below) · The path (start-here first hour, a default pick so project selection can't stall you, and a Mon–Sun build order of objectives-with-hints where each step feeds the next) · Spec (must-haves, including a Secure it checkpoint) · Eval bar (the numbers that count as done) · JIT learning (pull when stuck, not before) · Key ideas (the recap) · Check yourself (three questions with no artifact to hide behind) · Publish (what goes public) · Stretch · Proof (the sentence you can say after).
Every ship is one turn of the same loop — spec → build → measure → iterate — which is the outer, human-run counterpart to the inner agentic loop Week 0 teaches. "The path" build orders are that loop written out: the spec is day 0, the eval is the measure, the week is the iterate. Naming it is the point: the meta-skill the whole program trains is running that loop faster and more honestly than anyone else, at every altitude from a single agent task to a four-week product.
Domain focus for idea starters (binding)¶
The skills on the ladder are general AI-engineering skills — but every idea starter (the example shapes and the default pick in each brief) is drawn from one domain family: networks, operating systems, automation, security, and cloud. This is deliberate. It plays to the intended learner's existing ground, keeps the portfolio coherent (a body of infra/security AI tooling reads as a specialty, not a scatter of demos), and dovetails with Plaintext. It is a constraint on examples only — a learner with a real project in another domain should take it; the defaults exist to kill selection-stall, not to fence the field. When adding or revising a brief, its default pick and "good shapes" must stay in this family; the mental model, spec, and eval bar stay domain-neutral.
Current defaults, for reference: host-recon MCP server + advisory-watch workflow (00); CVE-advisory → structured record (01); man-page/runbook RAG (02); dependency-vuln triage agent (03); patch-and-advisory browser brief (04); the Week-2 CVE parser, fine-tuned (05); network-diagram → topology extractor (06); portfolio-wide, no theme (07); an infra/security product grown from an earlier ship (08).
The prose model (inherited from Plaintext, binding)¶
Curate the raw explanation; write original prose for the bridge. The step-by-step "how attention works / how LoRA trains" is covered by people who did it better — the JIT learning links carry that, each with a why-line and a time-box. The original value is The mental model: the compressed frame ("RAG is two systems wearing one trenchcoat"), the practitioner translation, the judgment no single link states, and one honest gotcha per ship. Write the bridge; never re-teach the basics.
Two failure modes stay banned: the thin brief (a spec and some links — a lab, not a learning unit) and the regurgitation brief (re-deriving what a linked resource already nails). The bridge frames; the links explain; the ship proves.
Rules of the game¶
- Ship every week. Done and public beats complete and private. Scope down, never slip.
- Everything public. Each ship is its own public repo (+ a deployed demo where it fits). This repo holds the program and your build log; ships link back here.
- Evals on everything. No ship counts without its numbers. "It seems good" is not a number.
- AI writes most of the code; you review every line. That's the job now. Directing and verifying at high velocity is the top-1% skill being trained.
- JIT learning only. No study weeks. Hit the wall first, then pull from
reference/jit-map.md. Timebox rabbit holes to 90 minutes. - Log every Sunday. What shipped (link), what you measured (numbers), what broke, what
you learned, what's next. Template in
log/TEMPLATE.md. - Stay at the frontier. Release notes and model cards over courses. When something new drops mid-program, being early with a demo is worth more than that week's stretch goals.
- Secure what you ship. Every ship carries a Secure it checkpoint scaled to its attack surface: untrusted input is untrusted (prompt injection), model output never reaches a dangerous sink unvalidated, agents get least privilege and the lethal trifecta (untrusted input + private data + outbound channel) is broken, and the big swing gets a full OWASP LLM Top 10 pass. Security is a builder skill, not a track — and, given this program's domain focus, your sharpest differentiator.
- License & data note. Any ship that ingests data, scrapes, or publishes a model/dataset states provenance and license and scrubs PII/secrets before indexing or committing. Weights and datasets live on the Hub/Kaggle and are linked, never committed.
- Build the raw loop; no cm-deep tool coverage. Agent loops, RAG orchestration, and
the eval harness are hand-written against the model SDK — the durable skill is the loop,
tool design, and context flow, which a framework hides. We name a tool only if a ship
digs into it; marketable modules we don't use (MLflow, Pinecone, Airflow, LangChain…)
aren't listed as keywords. The two worth one genuine rep — LangGraph and a cloud AI
platform — appear as optional stretches, done after the raw build so you leave with an
opinion, not a name-drop. Full stack, stance, and rationale in
TOOLKIT.md.
Definition of done (per ship)¶
- Deployed / runnable by a stranger from the README in ≤5 minutes.
- Eval bar met, with the numbers in the README (accuracy/success-rate + cost + latency).
- Its own public repo with an honest README: what it does, what it can't, what you measured.
- Build-log entry committed here, same Sunday.
A ship missing its numbers is a demo, not a ship.
Cost & accounts¶
Runs near-$0 by design: agentic coding on whatever plan you already have; free-tier GPUs (Colab/Kaggle) for the fine-tune; HF Spaces/Hub for deploys and publishing; Ollama for local models. A frontier-API budget of ~$20–40 across the 90 days removes friction but every ship has a local-model fallback. Accounts needed by Week 2: GitHub (public), Hugging Face, one frontier API key, Ollama installed.
After day 90¶
The repo's endgame: 8+ public ships, a measured eval story, a flagship write-up, one OSS contribution, and a big-swing product with users — the portfolio that is the top-1% claim. Where it goes next (deepening into serving/perf, data engineering, or research reproduction) is a decision for day 91, made from taste you didn't have on day 1.