Transparent by design. Every choice — sources, vocabulary, analysis rules, plan logic — is explained here.
Three questions Strata Mundo answers
I
Where is the learner, really?
The condition observed
No tool reads how a learner reasons
Schools push a fixed curriculum, ignoring what each learner has mastered
Khan-style probes measure performance, not understanding
Without the full picture, tailored learning is guesswork
Strata Mundo · the remedy
A telemetry-based diagnostic, with a probe loop
Reads the full trajectory, not just the final answer
Names specific misconceptions with traceable evidence
Categorical states — never percentages
Loop: assess → diagnose → plan → probe → declare
II
What should they work on next?
The condition observed
Knowing the gaps doesn’t tell you the order
Concept dependencies are real but invisible
Boxed curricula assume linear order
Guides re-cover mastered material or skip foundational gaps
Strata Mundo · the remedy
A mastery atlas grounded in published progressions
Every standard, every prerequisite, in one view
Skip what is mastered; focus where it is needed
Built on the Coherence Map and the IM Sections
III
What tools will actually work for them?
The condition observed
Hours hunting for the right activity
Boxed curricula offer one type of practice
Hands-on, real-world activities are hard to find
Math learned in isolation gets forgotten
Strata Mundo · the remedy
A tailored plan from a curated, multimodal library
Concrete → representational → abstract per gap
On-screen + off-screen + hands-on per concept
Library grown by the community: AI-vetted, human-approved
Authoritative sources
Strata Mundo doesn't invent terminology, groupings, or sequencing. Every level of the hierarchy comes from a published, authoritative source.
Progression
The full vertical arc of one mathematical domain across grades — every standard a 4th grader must master in that domain, plus the prerequisite standards (often from neighboring domains) that the published Progressions document explicitly cites as foundations. The 5 strata on the voyage page are the 5 progressions.
Source: Progressions for the Common Core State Standards in Mathematics
Bill McCallum, Hung-Hsi Wu, Phil Daro et al. (University of Arizona, Institute for Mathematics and Education)
Misconception taxonomy
The 8 named misconceptions Strata Mundo detects (e.g., 'Bigger denominator means bigger fraction', 'Notational confusion'). Each carries diagnostic signals, manifestation patterns, and prerequisite links used by the analyzer and Plan Architect.
Source: Synthesized from Van de Walle's Elementary and Middle School Mathematics, the K-5 CCSS-M Progressions documents (3-5 Number and Operations — Fractions), and rational-number misconception research
Behr, Lesh, Post (rational-number quartet); Mack (informal knowledge of fractions); Siegler (overlapping waves theory of strategy use)
The hierarchy: Progression → Section → Standard. Each level uses its source's actual published terminology.
Glossary
Mastery map — the structured output of analysis. Every standard gets one of four states (sketched in code after this glossary).
Mastered (green, emerald-600) — reliably understood with clear reasoning across multiple problems. Meets analysis rule R10.
Building the skill (amber) — reasoning is on the right track but not yet reliable. Some right, some wrong, OR right only after multiple attempts. More varied practice needed.
Misconception detected (red, with brass warning cartouche) — a specific named wrong mental model is firing. Targeted intervention required.
Not yet probed (stone-400) — this standard hasn't been touched in any completed assessment yet. Neither known nor unknown.
Telemetry — every interaction during an assessment recorded as a timestamped event: placement, removal, commit_attempt, reset.
Focused probe — a narrow re-assessment of one standard (4–6 problems, ~10 min). Run after the recommended activities to verify a misconception has resolved.
Plan Architect — Anthropic Managed Agent that reads a mastery map and writes a tailored plan with 2–3 activities per priority gap.
Smart-skip — when generating a plan, the Plan Architect skips Sections that are already mastered and starts at the first Section with any flagged standard.
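To make the glossary concrete, here is a minimal TypeScript sketch of a mastery-map entry. The type and field names (MasteryState, MasteryMapEntry, evidenceProblemIds) are illustrative assumptions, not the actual schema.

```typescript
// Illustrative sketch only; names and shapes are assumptions, not the real schema.

/** The four categorical states a standard can hold. Never a percentage. */
type MasteryState =
  | "mastered"                // green: reliable, with clear reasoning across problems (rule R10)
  | "building_the_skill"      // amber: on the right track, not yet reliable
  | "misconception_detected"  // red: a specific named wrong mental model is firing
  | "not_yet_probed";         // stone: untouched by any completed assessment

/** One entry in the mastery map, keyed by CCSS-M standard code. */
interface MasteryMapEntry {
  standard: string;              // e.g. "3.NF.A.1"
  state: MasteryState;
  misconceptionId?: string;      // present only when state is "misconception_detected"
  evidenceProblemIds: string[];  // audit field: problem IDs live here, not in prose (rule R9)
  summary: string;               // plain-language explanation for the educator
}

type MasteryMap = Record<string, MasteryMapEntry>;
```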
How the assessment works
~10 minutes, 7 problems. Drag-and-build mechanic: the learner drags unit fraction pieces (1/2, 1/3, 1/4, 1/6, 1/8) onto a target bar to construct the requested fraction.
Some problems force equivalence reasoning by restricting the palette (e.g., "build 2/3 using only sixths").
No typed answers. No multiple choice. The mechanic asks the learner to show, not tell.
Every interaction is recorded as process telemetry — drags, removals, commits, resets, timing.
v1 covers 11 standards across grades 2–4 (the part of the K-5 Fractions Progression we currently probe, plus 2 prerequisite Geometry standards on partitioning).
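As a rough illustration of what "process telemetry" means in practice, here is a sketch of one event record in TypeScript. The field names (kind, piece, barState) are assumptions for the example, not the stored schema.

```typescript
// Illustrative only; the real event schema may differ.

type TelemetryEventKind = "placement" | "removal" | "commit_attempt" | "reset";

interface TelemetryEvent {
  kind: TelemetryEventKind;
  timestamp: string;     // ISO 8601, e.g. "2025-01-15T10:32:07.412Z"
  problemId: string;     // which of the 7 problems the learner is working on
  piece?: "1/2" | "1/3" | "1/4" | "1/6" | "1/8";  // for placement / removal events
  barState?: string[];   // pieces on the target bar at the moment of the event
}

// An assessment is the ordered stream of these events (drags, removals,
// commits, resets, and their timing), not just the final answers.
type AssessmentTelemetry = TelemetryEvent[];
```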
How analysis works
A single Claude Opus 4.7 call reads the telemetry and produces the mastery map. Analysis follows ten reasoning rules (R1–R10) that prioritize process over outcome.
R1 — Process over outcome. Don't infer mastery from a correct final answer alone.
R2 — First-commit-success with deliberate pacing is a strong "demonstrated" signal.
R3 — Strategy-switching on reset (different denominators on the second attempt) is comparably strong evidence — self-correction is one of the strongest mastery signals research has (Rittle-Johnson 2017, Siegler's overlapping-waves theory).
R4 — Same-strategy resets = guessing/fiddling, not reasoning.
R5 — Three or more commit attempts with the same composition = working, not mastered.
R6 — Rapid commits AND wrong = guessing (Wise 2017). Speed alone is not a guessing signal.
R7 — Specific wrong-commit content maps to specific named misconceptions, declared in each problem's response map.
R8 — No commit attempt → not_assessed.
R9 — Evidence in data, not narrative. Use plain language; problem IDs go in audit fields, not in prose.
R10 — "Mastered" requires success across multiple problems for a standard, with clear reasoning.
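The rules are applied by the model reading raw telemetry, not by hand-written heuristics, but a short sketch can make a few of the signals concrete. The helper name, thresholds, and the CommitAttempt shape below are assumptions for illustration only.

```typescript
// Illustrates the kinds of signals R2, R3, R5, R6, and R8 describe.
// Not the real analyzer, which is a single model call applying R1-R10 to raw telemetry.

interface CommitAttempt {
  composition: string[];        // e.g. ["1/6", "1/6", "1/6", "1/6"] for 2/3 built from sixths
  correct: boolean;
  msSinceProblemStart: number;
}

function describeSignals(commits: CommitAttempt[]): string[] {
  const signals: string[] = [];
  const first = commits[0];
  if (!first) return ["no commit attempt, so not_assessed (R8)"];

  // R2: first-commit success with deliberate pacing (the 8-second threshold is an assumption)
  if (first.correct && first.msSinceProblemStart > 8000) {
    signals.push("first-commit success with deliberate pacing (R2)");
  }
  // R6: rapid AND wrong looks like guessing; speed alone is not a guessing signal
  if (!first.correct && first.msSinceProblemStart < 3000) {
    signals.push("rapid wrong commit, a guessing signal (R6)");
  }
  const allSameComposition = commits.every(
    (c) => c.composition.join(",") === first.composition.join(",")
  );
  // R5: three or more attempts with the same composition means working, not mastered
  if (commits.length >= 3 && allSameComposition) {
    signals.push("repeated identical attempts (R5)");
  }
  // R3: a changed composition on a later attempt is strong self-correction evidence
  if (commits.length >= 2 && !allSameComposition) {
    signals.push("strategy switch across attempts (R3)");
  }
  return signals;
}
```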
How the plan is generated
The Plan Architect is an Anthropic Managed Agent running on Claude Opus 4.7. It reads the mastery map and writes a tailored plan in 1–3 minutes. The agent is given the mastery map, a curated resource library, the misconception taxonomy, the Coherence Map of CCSS prerequisites, and any prior plans for this learner.
Which standards make the cut
Red and amber only. Activities are planned only for standards in misconception detected or building the skill. Mastered and not-yet-probed standards get no activities.
Smart-skip the curriculum. The 7 IM fractions sections are evaluated in order; the FIRST section with any flagged standard becomes “now”. Standards in later sections are deferred — the plan stays focused on one section, not scattered across the whole progression.
Cap of 5 priority gaps. If more standards qualify, misconceptions outrank building-the-skill, and prerequisites outrank downstream standards.
Differential diagnosis. For each priority gap, the agent checks whether the real issue is the standard itself or an earlier prerequisite. If prerequisite, activities target the prerequisite — not the downstream standard.
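A minimal TypeScript sketch of the selection logic just described (red/amber filter, smart-skip, cap of 5, priority ordering), assuming invented names like StandardStatus and coherenceDepth; the real logic lives in the Plan Architect's instructions, not in code like this.

```typescript
// Illustrative sketch of "which standards make the cut"; not the agent's actual logic.

interface StandardStatus {
  standard: string;
  sectionIndex: number;     // position of its IM section in the 7-section sequence (0-6)
  coherenceDepth: number;   // lower for prerequisites, higher for downstream standards
  state: "mastered" | "building_the_skill" | "misconception_detected" | "not_yet_probed";
}

function selectPriorityGaps(statuses: StandardStatus[]): StandardStatus[] {
  // Red and amber only: mastered and not-yet-probed standards get no activities.
  const flagged = statuses.filter(
    (s) => s.state === "misconception_detected" || s.state === "building_the_skill"
  );
  if (flagged.length === 0) return [];

  // Smart-skip: the first section with any flagged standard becomes "now";
  // flagged standards in later sections are deferred.
  const nowSection = Math.min(...flagged.map((s) => s.sectionIndex));
  const inScope = flagged.filter((s) => s.sectionIndex === nowSection);

  // Cap of 5: misconceptions outrank building-the-skill,
  // and prerequisites outrank downstream standards.
  return inScope
    .sort((a, b) => {
      if (a.state !== b.state) return a.state === "misconception_detected" ? -1 : 1;
      return a.coherenceDepth - b.coherenceDepth;
    })
    .slice(0, 5);
}
```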
Which activities, and how many
2–3 activities per priority gap, picked from a curated resource library — never generated. Each activity in the library is pre-tagged with the misconceptions it addresses and its modality.
Modality spread. Each gap gets at least one hands-on activity (manipulative or physical) and at least one visual/digital (video, app), optionally one symbolic (worksheet). Different children land at different doors.
Avoids failed resources. If a prior plan already tried a resource and the same misconception is still flagged, the agent picks a different one — same modality is fine, but a different resource.
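A sketch, under assumed field names (modality, addressesMisconceptions, previouslyTriedIds), of how the modality spread and the failed-resource rule might constrain selection; the agent's actual choice is judgment over the tagged library, not this code.

```typescript
// Illustrative constraints only: at least one hands-on and one visual/digital activity,
// optionally one symbolic, and never a resource that already failed for this gap.

type Modality = "hands_on" | "visual_digital" | "symbolic";

interface LibraryActivity {
  id: string;
  modality: Modality;
  addressesMisconceptions: string[];
  standards: string[];
}

function pickActivities(
  gapMisconceptionId: string,
  library: LibraryActivity[],
  previouslyTriedIds: Set<string>  // resources a prior plan tried without resolving this gap
): LibraryActivity[] {
  const candidates = library.filter(
    (a) =>
      a.addressesMisconceptions.includes(gapMisconceptionId) &&
      !previouslyTriedIds.has(a.id)  // same modality is fine, but a different resource
  );
  const handsOn = candidates.find((a) => a.modality === "hands_on");
  const visual = candidates.find((a) => a.modality === "visual_digital");
  const symbolic = candidates.find((a) => a.modality === "symbolic");

  // 2-3 activities per priority gap.
  return [handsOn, visual, symbolic]
    .filter((a): a is LibraryActivity => a !== undefined)
    .slice(0, 3);
}
```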
In what order
Within a gap: Concrete → Representational → Abstract. Hands-on first, then app/video, then worksheet (Van de Walle 2014).
Across gaps: severity (misconceptions before building-the-skill), then by Coherence Map layer (prerequisites before downstream).
The voyage view deduplicates. If one activity helps two standards, it shows once on the page; the “Why this activity?” disclosure tells you which standards and misconceptions it serves.
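For the within-gap ordering, a tiny sketch of the Concrete → Representational → Abstract sort, with the modality-to-CRA mapping assumed rather than taken from the codebase:

```typescript
// Illustrative only: order activities within one gap as concrete, then representational, then abstract.

type ActivityModality = "hands_on" | "visual_digital" | "symbolic";

const craRank: Record<ActivityModality, number> = {
  hands_on: 0,        // concrete: manipulative or physical activity first
  visual_digital: 1,  // representational: app or video next
  symbolic: 2,        // abstract: worksheet last
};

const orderWithinGap = <T extends { modality: ActivityModality }>(activities: T[]): T[] =>
  [...activities].sort((a, b) => craRank[a.modality] - craRank[b.modality]);
```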
What every activity carries
Plain-language rationale. A 1–2 sentence explanation of why this resource was chosen for this learner's specific misconception, hidden behind the “Why this activity?” disclosure.
Tags. The misconception(s) and CCSS standard(s) it addresses, also inside “Why this activity?”.
The probe loop
The general assessment maps the broad mastery picture across many standards.
The plan prescribes activities for the flagged standards.
After the learner does the activities, a focused probe runs on one standard — ~4–6 problems, ~10 minutes — to verify the misconception has resolved.
If resolved → the standard moves to Mastered in the overall mastery map.
If not resolved → the Plan Architect re-plans with options: same activities + more time, different modality, or escalate to a prerequisite.
This loop is what distinguishes a diagnosis of current misconceptions from proof of mastery. Mastery is earned through the loop over time, not claimed from a single assessment.
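The branch points of the loop, sketched in TypeScript with assumed names (ProbeOutcome, ReplanOption); the real decisions are made by the Plan Architect and a human reviewer, not by this code.

```typescript
// Illustrative sketch of what happens after a focused probe.

type ProbeOutcome = "resolved" | "not_resolved";

type ReplanOption =
  | { kind: "same_activities_more_time" }
  | { kind: "different_modality" }
  | { kind: "escalate_to_prerequisite"; prerequisiteStandard: string };

function afterFocusedProbe(standard: string, outcome: ProbeOutcome): string {
  if (outcome === "resolved") {
    // The standard moves to Mastered in the overall mastery map.
    return `${standard}: move to mastered`;
  }
  // Otherwise the Plan Architect re-plans along one of the ReplanOption paths.
  return `${standard}: re-plan (more time, different modality, or a prerequisite)`;
}
```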
How the diagnosis is grounded
Named misconception detection with traceable evidence. Wrong-answer patterns are mapped to misconceptions from the literature, citing the problems where they fired. Educators see a specific cognitive error, not a percentage.
Strategy-switching on reset is positive evidence of mastery. A learner who tries one approach, stalls, resets, and tries another successfully is demonstrating self-correction — one of the strongest mastery signals research has (Rittle-Johnson 2017, Siegler's overlapping-waves theory).
Community contributions: AI-vetted, human-approved
The library of activities grows the way good teaching practice has always grown — through the contributions of many practitioners. Anyone can submit a new activity for any standard via the Contribute page, or directly from a learner's plan via the "Suggest an activity for this standard" link next to each gap.
Every submission goes through a two-stage review: an AI reviewer (Claude Opus 4.7) first, then a human reviewer. The AI never approves directly — it only passes, flags, or rejects. Final approval is always human. Both sets of criteria are documented below.
Stage 1 — AI vetting criteria
The AI applies the criteria in order. Each criterion has an ID; when a submission is flagged or rejected, the specific IDs are cited so the contributor knows exactly what to address.
Section 1 — Completeness (must pass all)
1.1 Title is a specific name, not a generic phrase. ✗ "Math activity" ✓ "Build-a-fraction interactive — PhET"
1.2 Description explains what the learner does (action + concept), not just what they learn.
1.3 Modality matches the description.
1.4 At least one CCSS-M standard is selected, plausibly related to the description.
Section 2 — Pedagogical fit (the project's hard rules; reject if violated)
2.1 NOT a learner-facing chatbot or AI tutor.
2.2 NOT primarily gamified with tokens, coins, streaks, or leaderboards.
2.3 Grade band fits 3rd–4th grade (or a Coherence Map prerequisite like 2.G.A.3).
2.4 Activity actually teaches the standards selected, not adjacent ones.
Section 3 — Source quality (borderline if any concern)
3.1 URL (if provided) is from a recognizable educational source OR is specific enough to verify.
3.2 Source/vendor name matches the URL's domain.
3.3 For physical materials, brand or vendor identifiable.
3.4 No obvious blocklist domains (gambling, ads, content farms).
Section 4 — Safety + appropriateness (reject if violated)
4.1 Description is on-topic for math education.
4.2 No promotional/advertorial language.
4.3 No personally identifying info about specific children.
4.4 Language appropriate for an educational context.
Section 5 — Non-duplication (best-effort, flag for human)
5.1 If submission appears identical to a known curated resource, flag.
Section 6 — Instructions to the AI itself
6.1 Never approve. Only humans approve.
6.2 Never reject for stylistic preferences.
6.3 Never reject "different from typical" approaches that meet pedagogical fit. Distinctive approaches are valuable.
6.4 When uncertain, prefer "borderline" over "rejected."
6.5 Always cite the specific criterion ID(s) violated.
6.6 Respond only with valid JSON.
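Criterion 6.6 says the AI responds only with valid JSON. A plausible shape for that response, sketched as a TypeScript type; the actual shape is defined in lib/ai-vet-activity.ts and may differ.

```typescript
// A plausible response shape only; lib/ai-vet-activity.ts is the source of truth.

type VetVerdict = "pass" | "borderline" | "rejected";  // never "approved": only humans approve (6.1)

interface VetResult {
  verdict: VetVerdict;
  citedCriteriaIds: string[];    // e.g. ["2.2", "3.4"]: the specific criteria at issue (6.5)
  notesForHumanReviewer: string; // context the Stage 2 human reviewer can act on
}
```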
Stage 2 — Human review criteria
The human reviewer applies all of the AI criteria above plus the following judgments, which require human discernment:
Human-only judgments
H1 Does the activity exemplify quality teaching practice — does it model the kind of learning we want children to have?
H2 Is the activity additive to the existing library, or is it materially redundant with what we already have?
H3 If a URL is provided, the human verifies it actually points to the activity described (the AI cannot fetch URLs).
H4 For physical materials, the human verifies the material is purchasable/findable.
H5 Does the description set realistic expectations? (Misleading promises about what a learner will achieve are rejected.)
H6 Final pedagogical judgment: does this belong in a Strata Mundo learner's plan? When the human says yes, the activity is approved.
The criteria are versioned with the codebase and revised as we learn what works. The current source of truth lives in lib/ai-vet-activity.ts.
What we deliberately don't do
No learner-facing chatbot. All learner-facing interactions are structured: forms, problems, visual feedback. The LLM does cognitive work behind the scenes, never as a chat with a child.
No percentage scores. Categorical states only. Percentages collapse different mastery realities (fluent guessing, slow reasoning, partial understanding) into one number that hides the diagnosis.
No gamification. No XP, streaks, leaderboards, badges, or extrinsic rewards. Mastery-based settings reject these mechanics; we honor that.
No gated progression. The system suggests; the human decides. Mastery is declared by a human reviewer, supported by the evidence we surface.
No selling of learner data. Ever.
What v1 doesn't yet do
v1 only renders build_fraction problems. Problem types for number-line placement, comparison, identification, and partitioning are in the bank but not yet rendered. Each focused probe currently varies in surface features (denominators, palettes, magnitudes) but not across representation types. That arrives in v1.1.
v1 covers fractions only. v1.5 extends to all of 4th-grade math (Operations, Place Value, Measurement, Geometry).
Multi-curriculum resource picker (Beast Academy / Saxon / Singapore / Math-U-See / Montessori) is post-v2.
Privacy and data
Learner data lives in a Supabase Postgres database with row-level security.
Email is collected at setup so a learner can return to the same voyage by clicking a link in their inbox. No password.
Email is also used by contributors when proposing an activity (so the human reviewer can follow up if needed).
Telemetry events (drags, commits, etc.) are stored alongside the assessment row. No third-party analytics.
The Plan Architect agent runs on Anthropic's Managed Agents infrastructure. No learner names or PII are sent to the agent — only mastery-map states + standard codes.
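As an illustration of that boundary, a sketch of the kind of payload the agent might receive, with assumed type names; the point is what it contains (states and standard codes) and what it never contains (names, emails, or other PII).

```typescript
// Illustrative only: what crosses the boundary to the Plan Architect agent.

interface AgentMasteryEntry {
  standard: string;          // CCSS-M code, e.g. "3.NF.A.3"
  state: "mastered" | "building_the_skill" | "misconception_detected" | "not_yet_probed";
  misconceptionId?: string;
}

interface PlanRequestPayload {
  masteryMap: AgentMasteryEntry[];
  priorPlanResourceIds: string[];  // so already-tried resources can be avoided
  // The resource library, misconception taxonomy, and Coherence Map are provided
  // as reference material; no learner names, emails, or other PII are included.
}
```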
License
Code: MIT licensed.
Illustrative Mathematics K-5 Section structure: CC BY 4.0. We use IM section names verbatim with attribution.
PhET Interactive Simulations referenced as a resource: CC BY 4.0. Attribution: "PhET Interactive Simulations, University of Colorado Boulder."
CCSS-M and the Coherence Map: referenced for the standards taxonomy and prerequisite structure.