Skip to content

Groundtruth — The 90-Day Blueprint

Top-1% AI user & builder, in 90 days, without the pedagogy. The program is a ship ladder: one real, public, measured project per week, ascending in difficulty, with learning pulled in just-in-time by whatever the current ship demands.

(v1 of this file was a 13-track curriculum modeled on Plaintext. Deliberately retired — it optimized for comprehensiveness, not speed. It lives in git history; the useful residue is reference/jit-map.md.)

The thesis

The top 1% of AI users/builders is a lower bar than it looks, because the median is terrible: most engineers use AI as autocomplete, have never written an eval, never built an agent loop, never shipped an LLM feature to a real user. The gap is closed by reps at the frontier + shipped artifacts + measurement + visibility — not by knowing more.

Four leverage points, in order:

  1. Reps — use frontier tools at the edge, daily, on real work. Taste comes from volume.
  2. Ships — a deployed, public artifact per week. After ~10, the portfolio is the signal.
  3. Measurement — evals are the most underpriced skill in the field. Eval everything.
  4. Visibility — write-ups, OSS contributions, being demonstrably early on new capability.

What gets deliberately deferred (not in v1; revisited on day 91, not skipped by accident): the math ladder, classical ML, and Kaggle competitions; training models from scratch and deep serving/inference internals (vLLM internals, kernels); heavy MLOps platform work and multi-agent systems at scale. Each is named here so its absence is a choice, not a blind spot — the program trains the builder, and these are the model- researcher and platform-engineer adjacencies to grow into next. The pedagogy that is here isn't gone from the schedule — it's demand-driven: when the RAG week goes badly, that's when you learn embeddings and reranking, and it sticks because it cost you something.

Two things that are emphatically not deferred, because a builder who skips them ships liabilities: security (every ship has a "Secure it" checkpoint scaled to its attack surface — see the rules) and evaluation (rule 3). They are woven through, not saved for a track that never comes.

The ladder

When Ship What it proves
Days 1–7 00 — Arm yourself Your own workflow is AI-native: agentic coding, a personal MCP server, an eval template
Week 2 01 — LLM feature You can ship a model-powered feature users touch, with structured outputs and a golden-set eval
Week 3 02 — RAG You can build retrieval that measurably retrieves — baseline → hybrid → rerank, with numbers
Week 4 03 — Agent You can build the loop, design tools, trace runs, and report a task completion rate
Week 5 04 — Browser agent Your agents can operate software made for humans, safely, with a measured success rate
Week 6 05 — Fine-tune You can adapt an open model that beats its base (and challenges a frontier model) on your eval, published to the Hub
Week 7 06 — Multimodal You can ship beyond text — voice or vision — with latency numbers
Week 8 07 — Eval gauntlet The differentiator: tracing + evals retrofitted across every ship, one reusable toolkit, one flagship write-up
Weeks 9–12 08 — Big swing One product you care about, with real users — plus the visibility loop (write-ups, OSS, day-one demos)

Each brief states: Mission (what ships by Sunday) · Why this rung (the so-what) · The mental model (the bridge — see below) · The path (start-here first hour, a default pick so project selection can't stall you, and a Mon–Sun build order of objectives-with-hints where each step feeds the next) · Spec (must-haves, including a Secure it checkpoint) · Eval bar (the numbers that count as done) · JIT learning (pull when stuck, not before) · Key ideas (the recap) · Check yourself (three questions with no artifact to hide behind) · Publish (what goes public) · Stretch · Proof (the sentence you can say after).

Every ship is one turn of the same loopspec → build → measure → iterate — which is the outer, human-run counterpart to the inner agentic loop Week 0 teaches. "The path" build orders are that loop written out: the spec is day 0, the eval is the measure, the week is the iterate. Naming it is the point: the meta-skill the whole program trains is running that loop faster and more honestly than anyone else, at every altitude from a single agent task to a four-week product.

Domain focus for idea starters (binding)

The skills on the ladder are general AI-engineering skills — but every idea starter (the example shapes and the default pick in each brief) is drawn from one domain family: networks, operating systems, automation, security, and cloud. This is deliberate. It plays to the intended learner's existing ground, keeps the portfolio coherent (a body of infra/security AI tooling reads as a specialty, not a scatter of demos), and dovetails with Plaintext. It is a constraint on examples only — a learner with a real project in another domain should take it; the defaults exist to kill selection-stall, not to fence the field. When adding or revising a brief, its default pick and "good shapes" must stay in this family; the mental model, spec, and eval bar stay domain-neutral.

Current defaults, for reference: host-recon MCP server + advisory-watch workflow (00); CVE-advisory → structured record (01); man-page/runbook RAG (02); dependency-vuln triage agent (03); patch-and-advisory browser brief (04); the Week-2 CVE parser, fine-tuned (05); network-diagram → topology extractor (06); portfolio-wide, no theme (07); an infra/security product grown from an earlier ship (08).

The prose model (inherited from Plaintext, binding)

Curate the raw explanation; write original prose for the bridge. The step-by-step "how attention works / how LoRA trains" is covered by people who did it better — the JIT learning links carry that, each with a why-line and a time-box. The original value is The mental model: the compressed frame ("RAG is two systems wearing one trenchcoat"), the practitioner translation, the judgment no single link states, and one honest gotcha per ship. Write the bridge; never re-teach the basics.

Two failure modes stay banned: the thin brief (a spec and some links — a lab, not a learning unit) and the regurgitation brief (re-deriving what a linked resource already nails). The bridge frames; the links explain; the ship proves.

Rules of the game

  1. Ship every week. Done and public beats complete and private. Scope down, never slip.
  2. Everything public. Each ship is its own public repo (+ a deployed demo where it fits). This repo holds the program and your build log; ships link back here.
  3. Evals on everything. No ship counts without its numbers. "It seems good" is not a number.
  4. AI writes most of the code; you review every line. That's the job now. Directing and verifying at high velocity is the top-1% skill being trained.
  5. JIT learning only. No study weeks. Hit the wall first, then pull from reference/jit-map.md. Timebox rabbit holes to 90 minutes.
  6. Log every Sunday. What shipped (link), what you measured (numbers), what broke, what you learned, what's next. Template in log/TEMPLATE.md.
  7. Stay at the frontier. Release notes and model cards over courses. When something new drops mid-program, being early with a demo is worth more than that week's stretch goals.
  8. Secure what you ship. Every ship carries a Secure it checkpoint scaled to its attack surface: untrusted input is untrusted (prompt injection), model output never reaches a dangerous sink unvalidated, agents get least privilege and the lethal trifecta (untrusted input + private data + outbound channel) is broken, and the big swing gets a full OWASP LLM Top 10 pass. Security is a builder skill, not a track — and, given this program's domain focus, your sharpest differentiator.
  9. License & data note. Any ship that ingests data, scrapes, or publishes a model/dataset states provenance and license and scrubs PII/secrets before indexing or committing. Weights and datasets live on the Hub/Kaggle and are linked, never committed.
  10. Build the raw loop; no cm-deep tool coverage. Agent loops, RAG orchestration, and the eval harness are hand-written against the model SDK — the durable skill is the loop, tool design, and context flow, which a framework hides. We name a tool only if a ship digs into it; marketable modules we don't use (MLflow, Pinecone, Airflow, LangChain…) aren't listed as keywords. The two worth one genuine rep — LangGraph and a cloud AI platform — appear as optional stretches, done after the raw build so you leave with an opinion, not a name-drop. Full stack, stance, and rationale in TOOLKIT.md.

Definition of done (per ship)

  • Deployed / runnable by a stranger from the README in ≤5 minutes.
  • Eval bar met, with the numbers in the README (accuracy/success-rate + cost + latency).
  • Its own public repo with an honest README: what it does, what it can't, what you measured.
  • Build-log entry committed here, same Sunday.

A ship missing its numbers is a demo, not a ship.

Cost & accounts

Runs near-$0 by design: agentic coding on whatever plan you already have; free-tier GPUs (Colab/Kaggle) for the fine-tune; HF Spaces/Hub for deploys and publishing; Ollama for local models. A frontier-API budget of ~$20–40 across the 90 days removes friction but every ship has a local-model fallback. Accounts needed by Week 2: GitHub (public), Hugging Face, one frontier API key, Ollama installed.

After day 90

The repo's endgame: 8+ public ships, a measured eval story, a flagship write-up, one OSS contribution, and a big-swing product with users — the portfolio that is the top-1% claim. Where it goes next (deepening into serving/perf, data engineering, or research reproduction) is a decision for day 91, made from taste you didn't have on day 1.