Get the raw study data.
Every number on the study page is reproducible. Enter your email and we'll send you direct links to the full dataset — 1,747 curated profiles, 70,192 adversarial profiles, and everything else.
Request data access
We'll email you the download links. No spam, no newsletter — just the data.
Check your inbox
We've sent download links to . Check spam if you don't see it within a few minutes.
Download files directly:
Something went wrong
Please try again or grab the data directly from GitHub.
What's in the dataset
Two datasets produced by the autoresearch validation pipeline, plus a manifest and checksums. JSON and gzipped JSON, all licensed CC BY-SA 4.0. Use them however you want.
Curated Profiles
1,747 hand-labeled developmental profiles, plus high-confidence adversarial profiles promoted into the curated set. Each profile simulates a child's milestone responses across 8 domains (1–24 months) with ground-truth labels and the engine's classification.
Adversarial Profiles
70,192 profiles specifically designed to break the engine: regression patterns, sparse milestone data, contradictory answers, preterm edge cases. Includes per-profile disagreement analysis and failure mode classification. Gzip-compressed (~3.7 MB, ~124 MB uncompressed).
Data Manifest
Package manifest with dataset descriptions, file sizes, SHA-256 checksums for verification, and data provenance. Available as PDF.
Complete Package (ZIP)
Everything in one download — curated profiles, adversarial profiles, manifest PDF, and SHA-256 checksums. Easiest way to get all the data at once.
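The manifest's SHA-256 checksums let you confirm a download arrived intact. A minimal sketch using only the Python standard library (the filename and digest below are placeholders; use the actual values listed in the manifest):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage (hypothetical filename and digest -- take both from the manifest):
# assert sha256_of("adversarial_profiles.json.gz") == "<digest from manifest>"
```

Streaming in chunks keeps memory flat even for the ~124 MB uncompressed file.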
How the data was generated
A plain-language explanation of the data pipeline.
Synthetic profile generation
An AI system generates developmental profiles that simulate realistic milestone response patterns. Each profile represents a hypothetical child at a specific age (1–24 months) with answers across 129 evidence-weighted questions spanning 8 developmental domains. Profiles include both "on track" children and children with simulated delays of varying severity.
Ground-truth labeling
Each profile is independently evaluated by an AI clinical evaluator that assigns a ground-truth label: "no concern," "monitor," or "refer." This evaluator uses the same developmental milestone guidelines (CDC 2022, WHO) but applies them through a different reasoning pathway than the engine. It's not a gold standard — a licensed clinician would be — but it provides a consistent, reproducible benchmark.
Engine classification
The MyChild engine processes each profile through its rule-based scoring system and outputs its own classification. The engine's thresholds are deterministic and inspectable — every score is traceable to specific question weights and domain rules.
Comparison and analysis
The engine's classifications are compared against the ground-truth labels to compute Cohen's kappa, sensitivity, specificity, and other metrics. Disagreements are analyzed to identify failure modes: under-flagging (engine missed a concern), over-flagging (false alarm), and threshold boundary cases.
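These metrics can be recomputed from the released label pairs with the standard library alone. A sketch; note that treating "monitor" and "refer" together as the positive class is our assumption for the binary sensitivity/specificity split, not necessarily the study's definition:

```python
from collections import Counter

def cohens_kappa(truth: list, pred: list) -> float:
    """Agreement beyond chance between two label sequences."""
    n = len(truth)
    p_observed = sum(t == p for t, p in zip(truth, pred)) / n
    t_counts, p_counts = Counter(truth), Counter(pred)
    p_expected = sum(t_counts[c] * p_counts[c] for c in set(truth) | set(pred)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

def sens_spec(truth: list, pred: list,
              positive=frozenset({"monitor", "refer"})) -> tuple:
    """Sensitivity and specificity, with 'any concern' as the positive class."""
    tp = sum(t in positive and p in positive for t, p in zip(truth, pred))
    fn = sum(t in positive and p not in positive for t, p in zip(truth, pred))
    tn = sum(t not in positive and p not in positive for t, p in zip(truth, pred))
    fp = sum(t not in positive and p in positive for t, p in zip(truth, pred))
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity here corresponds to the under-flagging failure mode (a low value means the engine misses concerns) and specificity to over-flagging.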
Adversarial stress testing
The autoresearch system generates increasingly difficult edge cases: children who regress after meeting milestones, profiles with sparse or contradictory data, preterm-adjusted ages, and boundary conditions where the correct classification is genuinely ambiguous. This is where most of the 781 disagreements come from.
Data format
Each synthetic profile is a JSON object with this structure:
{
  "id": "syn-00042",
  "age_months": 14,
  "preterm_weeks": 0,
  "domains": {
    "gross_motor": { "score": 0.82, "flags": [] },
    "fine_motor": { "score": 0.65, "flags": ["pincer_grasp_absent"] },
    "communication": { "score": 0.91, "flags": [] },
    "cognitive": { "score": 0.78, "flags": [] },
    "social_emotional": { "score": 0.88, "flags": [] },
    "adaptive": { "score": 0.73, "flags": [] },
    "sensory": { "score": 0.95, "flags": [] },
    "feeding": { "score": 0.80, "flags": [] }
  },
  "engine_result": "monitor",
  "ground_truth": "monitor",
  "agreement": true,
  "generation_type": "adversarial"
}
engine_result — What the MyChild engine classified: "no_concern", "monitor", or "refer"
ground_truth — What the AI clinical evaluator assigned as the expected label
agreement — Whether engine_result matches ground_truth
generation_type — Either "hand_verified" (Phase 1) or "adversarial" (Phase 2)
domains.*.flags — Specific developmental red flags detected in that domain
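Both files can be loaded with the Python standard library. This sketch assumes each file holds a top-level JSON array of profile objects and that the gzipped file keeps a .gz suffix (check the manifest for the actual names and layout):

```python
import gzip
import json

def load_profiles(path: str) -> list:
    """Load a profile dataset from plain or gzip-compressed JSON."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)  # assumes a top-level JSON array of profiles

# Usage (hypothetical filename):
# profiles = load_profiles("adversarial_profiles.json.gz")
# disagreements = [p for p in profiles if not p["agreement"]]
# print(len(disagreements), "of", len(profiles), "profiles disagree")
```

Filtering on the agreement field is enough to isolate the disagreement cases for your own failure-mode analysis.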
License & attribution
Data: CC BY-SA 4.0. Use it for research, products, publications — just attribute and share derivatives under the same license.
Engine: Apache-2.0. Use it commercially, modify it, distribute it. The engine code is in a separate repository.
Citation: If you use this data in academic work, please cite as: Songra, H. & Ansari, A. (2026). "MyChild Engine Validation Study: Synthetic Developmental Screening Data." Available at mychild.app/study-data.