Get the raw study data.
Every number on the study page is reproducible. Enter your email and we'll send you direct links to the full dataset — 1,747 curated profiles, 70,192 adversarial profiles, and everything else.
Request data access
We'll email you the download links. No spam, no newsletter — just the data.
Check your inbox
We've sent download links to . Check spam if you don't see it within a few minutes.
Download files directly:
Something went wrong
Please try again or grab the data directly from GitHub.
What's in the dataset
Two datasets produced by the autoresearch validation pipeline, plus a manifest and checksums. JSON and gzipped JSON, all licensed CC BY-SA 4.0. Use them however you want.
Curated Profiles
1,747 hand-labeled developmental profiles, plus high-confidence adversarial profiles promoted into the curated set. Each profile simulates a child's milestone responses across 8 domains (1–24 months) with ground-truth labels and the engine's classification.
Adversarial Profiles
70,192 profiles specifically designed to break the engine: regression patterns, sparse milestone data, contradictory answers, preterm edge cases. Includes per-profile disagreement analysis and failure mode classification. Gzip-compressed (~3.7 MB, ~124 MB uncompressed).
Data Manifest
Package manifest with dataset descriptions, file sizes, SHA-256 checksums for verification, and data provenance. Available as PDF.
Complete Package (ZIP)
Everything in one download — curated profiles, adversarial profiles, manifest PDF, and SHA-256 checksums. Easiest way to get all the data at once.
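The manifest's SHA-256 checksums let you confirm a download arrived intact. A minimal sketch using only the Python standard library (the filename and digest below are placeholders; use the actual values listed in the manifest):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage (hypothetical filename and digest -- take both from the manifest):
# assert sha256_of("adversarial_profiles.json.gz") == "<digest from manifest>"
```

Streaming in chunks keeps memory flat even for the ~124 MB uncompressed file.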
How the data was generated
A plain-language explanation of the data pipeline.
Synthetic profile generation
An AI system generates developmental profiles that simulate realistic milestone response patterns. Each profile represents a hypothetical child at a specific age (1–24 months) with answers across 129 evidence-weighted questions spanning 8 developmental domains. Profiles include both "on track" children and children with simulated delays of varying severity.
Ground-truth labeling
Each profile is independently evaluated by an AI clinical evaluator that assigns a ground-truth label: "no concern," "monitor," or "refer." This evaluator uses the same developmental milestone guidelines (CDC 2022, WHO) but applies them through a different reasoning pathway than the engine. It's not a gold standard — a licensed clinician would be — but it provides a consistent, reproducible benchmark.
Engine classification
The MyChild engine processes each profile through its rule-based scoring system and outputs its own classification. The engine's thresholds are deterministic and inspectable — every score is traceable to specific question weights and domain rules.
Comparison and analysis
The engine's classifications are compared against the ground-truth labels to compute Cohen's kappa, sensitivity, specificity, and other metrics. Disagreements are analyzed to identify failure modes: under-flagging (engine missed a concern), over-flagging (false alarm), and threshold boundary cases.
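These metrics can be recomputed from the released label pairs with the standard library alone. A sketch; note that treating "monitor" and "refer" together as the positive class is our assumption for the binary sensitivity/specificity split, not necessarily the study's definition:

```python
from collections import Counter

def cohens_kappa(truth: list, pred: list) -> float:
    """Agreement beyond chance between two label sequences."""
    n = len(truth)
    p_observed = sum(t == p for t, p in zip(truth, pred)) / n
    t_counts, p_counts = Counter(truth), Counter(pred)
    p_expected = sum(t_counts[c] * p_counts[c] for c in set(truth) | set(pred)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

def sens_spec(truth: list, pred: list,
              positive=frozenset({"monitor", "refer"})) -> tuple:
    """Sensitivity and specificity, with 'any concern' as the positive class."""
    tp = sum(t in positive and p in positive for t, p in zip(truth, pred))
    fn = sum(t in positive and p not in positive for t, p in zip(truth, pred))
    tn = sum(t not in positive and p not in positive for t, p in zip(truth, pred))
    fp = sum(t not in positive and p in positive for t, p in zip(truth, pred))
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity here corresponds to the under-flagging failure mode (a low value means the engine misses concerns) and specificity to over-flagging.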
Adversarial stress testing
The autoresearch system generates increasingly difficult edge cases: children who regress after meeting milestones, profiles with sparse or contradictory data, preterm-adjusted ages, and boundary conditions where the correct classification is genuinely ambiguous. This is where most of the 781 disagreements come from.
Data format
Each synthetic profile is a JSON object with this structure:
{
  "id": "syn-00042",
  "age_months": 14,
  "preterm_weeks": 0,
  "domains": {
    "gross_motor": { "score": 0.82, "flags": [] },
    "fine_motor": { "score": 0.65, "flags": ["pincer_grasp_absent"] },
    "communication": { "score": 0.91, "flags": [] },
    "cognitive": { "score": 0.78, "flags": [] },
    "social_emotional": { "score": 0.88, "flags": [] },
    "adaptive": { "score": 0.73, "flags": [] },
    "sensory": { "score": 0.95, "flags": [] },
    "feeding": { "score": 0.80, "flags": [] }
  },
  "engine_result": "monitor",
  "ground_truth": "monitor",
  "agreement": true,
  "generation_type": "adversarial"
}
engine_result — What the MyChild engine classified: "no_concern", "monitor", or "refer"
ground_truth — What the AI clinical evaluator assigned as the expected label
agreement — Whether engine_result matches ground_truth
generation_type — Either "hand_verified" (Phase 1) or "adversarial" (Phase 2)
domains.*.flags — Specific developmental red flags detected in that domain
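Both files can be loaded with the Python standard library. This sketch assumes each file holds a top-level JSON array of profile objects and that the gzipped file keeps a .gz suffix (check the manifest for the actual names and layout):

```python
import gzip
import json

def load_profiles(path: str) -> list:
    """Load a profile dataset from plain or gzip-compressed JSON."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)  # assumes a top-level JSON array of profiles

# Usage (hypothetical filename):
# profiles = load_profiles("adversarial_profiles.json.gz")
# disagreements = [p for p in profiles if not p["agreement"]]
# print(len(disagreements), "of", len(profiles), "profiles disagree")
```

Filtering on the agreement field is enough to isolate the disagreement cases for your own failure-mode analysis.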
License & attribution
Data: CC BY-SA 4.0. Use it for research, products, publications — just attribute and share derivatives under the same license.
Engine: Apache-2.0. Use it commercially, modify it, distribute it. The engine code is in a separate repository.
Citation: If you use this data in academic work, please cite as: Songra, H. & Ansari, A. (2026). "MyChild Engine Validation Study: Synthetic Developmental Screening Data." Available at mychild.app/study-data.