Groundtruth — The 90-Day Blueprint¶

Top-1% AI user & builder, in 90 days, without the pedagogy. The program is a ship ladder: one real, public, measured project per week, ascending in difficulty, with learning pulled in just-in-time by whatever the current ship demands.

(v1 of this file was a 13-track curriculum modeled on Plaintext. Deliberately retired — it optimized for comprehensiveness, not speed. It lives in git history; the useful residue is reference/jit-map.md.)

The thesis¶

The top 1% of AI users/builders is a lower bar than it looks, because the median is terrible: most engineers use AI as autocomplete, have never written an eval, never built an agent loop, never shipped an LLM feature to a real user. The gap is closed by reps at the frontier + shipped artifacts + measurement + visibility — not by knowing more.

Four leverage points, in order:

Reps — use frontier tools at the edge, daily, on real work. Taste comes from volume.
Ships — a deployed, public artifact per week. After ~10, the portfolio is the signal.
Measurement — evals are the most underpriced skill in the field. Eval everything.
Visibility — write-ups, OSS contributions, being demonstrably early on new capability.

What gets deliberately deferred (not in v1; revisited on day 91, not skipped by accident): the math ladder, classical ML, and Kaggle competitions; training models from scratch and deep serving/inference internals (vLLM internals, kernels); heavy MLOps platform work and multi-agent systems at scale. Each is named here so its absence is a choice, not a blind spot — the program trains the builder, and these are the model- researcher and platform-engineer adjacencies to grow into next. The pedagogy that is here isn't gone from the schedule — it's demand-driven: when the RAG week goes badly, that's when you learn embeddings and reranking, and it sticks because it cost you something.

Two things that are emphatically not deferred, because a builder who skips them ships liabilities: security (every ship has a "Secure it" checkpoint scaled to its attack surface — see the rules) and evaluation (rule 3). They are woven through, not saved for a track that never comes.

The ladder¶

When	Ship	What it proves
Days 1–7	00 — Arm yourself	Your own workflow is AI-native: agentic coding, a personal MCP server, an eval template
Week 2	01 — LLM feature	You can ship a model-powered feature users touch, with structured outputs and a golden-set eval
Week 3	02 — RAG	You can build retrieval that measurably retrieves — baseline → hybrid → rerank, with numbers
Week 4	03 — Agent	You can build the loop, design tools, trace runs, and report a task completion rate
Week 5	04 — Browser agent	Your agents can operate software made for humans, safely, with a measured success rate
Week 6	05 — Fine-tune	You can adapt an open model that beats its base (and challenges a frontier model) on your eval, published to the Hub
Week 7	06 — Multimodal	You can ship beyond text — voice or vision — with latency numbers
Week 8	07 — Eval gauntlet	The differentiator: tracing + evals retrofitted across every ship, one reusable toolkit, one flagship write-up
Weeks 9–12	08 — Big swing	One product you care about, with real users — plus the visibility loop (write-ups, OSS, day-one demos)

Each brief states: Mission (what ships by Sunday) · Why this rung (the so-what) · The mental model (the bridge — see below) · The path (start-here first hour, a default pick so project selection can't stall you, and a Mon–Sun build order of objectives-with-hints where each step feeds the next) · Spec (must-haves, including a Secure it checkpoint) · Eval bar (the numbers that count as done) · JIT learning (pull when stuck, not before) · Key ideas (the recap) · Check yourself (three questions with no artifact to hide behind) · Publish (what goes public) · Stretch · Proof (the sentence you can say after).

Every ship is one turn of the same loop — spec → build → measure → iterate — which is the outer, human-run counterpart to the inner agentic loop Week 0 teaches. "The path" build orders are that loop written out: the spec is day 0, the eval is the measure, the week is the iterate. Naming it is the point: the meta-skill the whole program trains is running that loop faster and more honestly than anyone else, at every altitude from a single agent task to a four-week product.

Domain focus for idea starters (binding)¶

The skills on the ladder are general AI-engineering skills — but every idea starter (the example shapes and the default pick in each brief) is drawn from one domain family: networks, operating systems, automation, security, and cloud. This is deliberate. It plays to the intended learner's existing ground, keeps the portfolio coherent (a body of infra/security AI tooling reads as a specialty, not a scatter of demos), and dovetails with Plaintext. It is a constraint on examples only — a learner with a real project in another domain should take it; the defaults exist to kill selection-stall, not to fence the field. When adding or revising a brief, its default pick and "good shapes" must stay in this family; the mental model, spec, and eval bar stay domain-neutral.

Current defaults, for reference: host-recon MCP server + advisory-watch workflow (00); CVE-advisory → structured record (01); man-page/runbook RAG (02); dependency-vuln triage agent (03); patch-and-advisory browser brief (04); the Week-2 CVE parser, fine-tuned (05); network-diagram → topology extractor (06); portfolio-wide, no theme (07); an infra/security product grown from an earlier ship (08).

The prose model (inherited from Plaintext, binding)¶

Curate the raw explanation; write original prose for the bridge. The step-by-step "how attention works / how LoRA trains" is covered by people who did it better — the JIT learning links carry that, each with a why-line and a time-box. The original value is The mental model: the compressed frame ("RAG is two systems wearing one trenchcoat"), the practitioner translation, the judgment no single link states, and one honest gotcha per ship. Write the bridge; never re-teach the basics.

Two failure modes stay banned: the thin brief (a spec and some links — a lab, not a learning unit) and the regurgitation brief (re-deriving what a linked resource already nails). The bridge frames; the links explain; the ship proves.

Rules of the game¶

Ship every week. Done and public beats complete and private. Scope down, never slip.
Everything public. Each ship is its own public repo (+ a deployed demo where it fits). This repo holds the program and your build log; ships link back here.
Evals on everything. No ship counts without its numbers. "It seems good" is not a number.
AI writes most of the code; you review every line. That's the job now. Directing and verifying at high velocity is the top-1% skill being trained.
JIT learning only. No study weeks. Hit the wall first, then pull from reference/jit-map.md. Timebox rabbit holes to 90 minutes.
Log every Sunday. What shipped (link), what you measured (numbers), what broke, what you learned, what's next. Template in log/TEMPLATE.md.
Stay at the frontier. Release notes and model cards over courses. When something new drops mid-program, being early with a demo is worth more than that week's stretch goals.
Secure what you ship. Every ship carries a Secure it checkpoint scaled to its attack surface: untrusted input is untrusted (prompt injection), model output never reaches a dangerous sink unvalidated, agents get least privilege and the lethal trifecta (untrusted input + private data + outbound channel) is broken, and the big swing gets a full OWASP LLM Top 10 pass. Security is a builder skill, not a track — and, given this program's domain focus, your sharpest differentiator.
License & data note. Any ship that ingests data, scrapes, or publishes a model/dataset states provenance and license and scrubs PII/secrets before indexing or committing. Weights and datasets live on the Hub/Kaggle and are linked, never committed.
Build the raw loop; no cm-deep tool coverage. Agent loops, RAG orchestration, and the eval harness are hand-written against the model SDK — the durable skill is the loop, tool design, and context flow, which a framework hides. We name a tool only if a ship digs into it; marketable modules we don't use (MLflow, Pinecone, Airflow, LangChain…) aren't listed as keywords. The two worth one genuine rep — LangGraph and a cloud AI platform — appear as optional stretches, done after the raw build so you leave with an opinion, not a name-drop. Full stack, stance, and rationale in TOOLKIT.md.

Definition of done (per ship)¶

Deployed / runnable by a stranger from the README in ≤5 minutes.
Eval bar met, with the numbers in the README (accuracy/success-rate + cost + latency).
Its own public repo with an honest README: what it does, what it can't, what you measured.
Build-log entry committed here, same Sunday.

A ship missing its numbers is a demo, not a ship.

Cost & accounts¶

Runs near-$0 by design: agentic coding on whatever plan you already have; free-tier GPUs (Colab/Kaggle) for the fine-tune; HF Spaces/Hub for deploys and publishing; Ollama for local models. A frontier-API budget of ~$20–40 across the 90 days removes friction but every ship has a local-model fallback. Accounts needed by Week 2: GitHub (public), Hugging Face, one frontier API key, Ollama installed.

After day 90¶

The repo's endgame: 8+ public ships, a measured eval story, a flagship write-up, one OSS contribution, and a big-swing product with users — the portfolio that is the top-1% claim. Where it goes next (deepening into serving/perf, data engineering, or research reproduction) is a decision for day 91, made from taste you didn't have on day 1.