Trilingual R, Python, and Stata library for downloading UNICEF child welfare indicators via SDMX API
The unicefData package provides lightweight, consistent interfaces to the UNICEF SDMX Data Warehouse in R, Python, and Stata. Fetch any indicator series by specifying its SDMX key, date range, and optional filters.
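As a rough illustration of what "SDMX key, date range, and optional filters" translate to on the wire, the sketch below assembles a data query in the standard SDMX 2.1 REST shape (`/data/{flow}/{key}?startPeriod=...&endPeriod=...`). The base URL path, the dataflow reference `UNICEF,CME,1.0`, and the dimension order in the key are assumptions made for this example; the helper `build_data_url` is hypothetical and is not part of the package.

```python
from urllib.parse import urlencode

# Assumed base URL of the public SDMX REST endpoint; verify against the SDMX API docs.
BASE_URL = "https://sdmx.data.unicef.org/ws/public/sdmxapi/rest"


def build_data_url(dataflow, key, start_year=None, end_year=None):
    """Build an SDMX 2.1 REST data query URL (hypothetical helper)."""
    params = {}
    if start_year is not None:
        params["startPeriod"] = str(start_year)
    if end_year is not None:
        params["endPeriod"] = str(end_year)
    url = f"{BASE_URL}/data/{dataflow}/{key}"
    # Only append a query string when at least one filter was given.
    return f"{url}?{urlencode(params)}" if params else url


# Assumed dataflow reference and key layout (countries.indicator.sex).
url = build_data_url("UNICEF,CME,1.0", "ALB+USA.CME_MRY0T4._T", 2015, 2023)
print(url)
```

In normal use the package builds and sends such requests for you; the sketch only shows the moving parts.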
unicefData-dev/
├── r/ # R package (CRAN) - branch here for devtools::release()
│ ├── R/ # 16 source files
│ ├── tests/ # R package tests
│ ├── NEWS.md # R-specific changelog
│ ├── DESCRIPTION # Package metadata
│ └── ...other R package files
├── python/ # Python package (PyPI)
│ ├── CHANGELOG.md # Python-specific changelog
│ └── ...Python package files
├── stata/ # Stata package (SSC)
│ ├── CHANGELOG.md # Stata-specific changelog
│ └── ...Stata package files
├── paper/ # Academic documentation (LaTeX)
├── tests/ # Cross-language validation tests
├── metadata/ # Shared YAML/CSV metadata
├── docs/ # Technical documentation
├── README.md # This file (repository overview)
├── CHANGELOG.md # Multi-language changelog (overview)
└── CONTRIBUTING.md # Developer guidelines
| Platform | README | Changelog | Version |
|---|---|---|---|
| R | r/README.md | r/NEWS.md | 2.3.0 (CRAN) |
| Python | python/README.md | python/CHANGELOG.md | 2.1.0 |
| Stata | stata/README.md | stata/CHANGELOG.md | 2.3.0 |
| Document | Purpose |
|---|---|
| CHANGELOG.md | Multi-language changelog overview |
| CONTRIBUTING.md | How to contribute (all languages) |
| docs/ | Technical documentation index |
All three platforms expose the same operations with nearly identical parameters.
Python:
from unicef_api import unicefData, search_indicators, list_categories
# Search for indicators
search_indicators("mortality")
list_categories()
# Fetch data (dataflow auto-detected)
df = unicefData(
indicator="CME_MRY0T4",
countries=["ALB", "USA", "BRA"],
year="2015:2023"
)
R:
library(unicefData)
# Search for indicators
search_indicators("mortality")
list_categories()
# Fetch data (dataflow auto-detected)
df <- unicefData(
indicator = "CME_MRY0T4",
countries = c("ALB", "USA", "BRA"),
year = "2015:2023"
)
Stata:
* Search for indicators
unicefdata, search(mortality)
unicefdata, flows
* Fetch data (dataflow auto-detected)
unicefdata, indicator(CME_MRY0T4) countries(ALB USA BRA) year(2015:2023) clear
Installation
R:
devtools::install_github("unicef-drp/unicefData")
library(unicefData)
Python:
git clone https://github.com/unicef-drp/unicefData.git
cd unicefData/python
pip install -e .
Stata:
* Using github package (recommended)
net install github, from("https://haghish.github.io/github/")
github install unicef-drp/unicefData, package(stata)
See platform-specific READMEs for detailed installation options.
| Feature | R | Python | Stata |
|---|---|---|---|
| Unified API | unicefData() | unicefData() | unicefdata |
| Search indicators | search_indicators() | search_indicators() | unicefdata, search() |
| List categories | list_categories() | list_categories() | unicefdata, categories |
| Auto dataflow detection | ✅ | ✅ | ✅ |
| Filter by country, year, sex | ✅ | ✅ | ✅ |
| Wide/long formats | ✅ | ✅ | ✅ |
| Latest value per country | ✅ | ✅ | ✅ |
| MRV (most recent values) | ✅ | ✅ | ✅ |
| Circa (nearest year) | ✅ | ✅ | ✅ |
| Add metadata (region, income) | ✅ | ✅ | 🔜 |
| 700+ indicators | ✅ | ✅ | ✅ |
| Automatic retries | ✅ | ✅ | ✅ |
| Cache management | clear_unicef_cache() | clear_cache() | unicefdata, clearcache |
| Timeout exceptions | ✅ | SDMXTimeoutError | ✅ |
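To make the "latest", "MRV", and "circa" rows concrete, here is a minimal pure-Python sketch of what those filters plausibly do to a long-format result. It is an illustration under assumptions, not the package's implementation: the helpers `most_recent` and `circa` are hypothetical, rows are plain dictionaries, and the values are placeholders rather than real data.

```python
from collections import defaultdict


def most_recent(rows, n=1):
    """Keep the n most recent observations per country (n=1 acts like 'latest')."""
    by_country = defaultdict(list)
    for row in rows:
        by_country[row["country"]].append(row)
    kept = []
    for country in sorted(by_country):
        ordered = sorted(by_country[country], key=lambda r: r["year"], reverse=True)
        kept.extend(ordered[:n])
    return kept


def circa(rows, target_year):
    """Per country, keep the observation closest to target_year (first seen wins ties)."""
    best = {}
    for row in rows:
        current = best.get(row["country"])
        if current is None or abs(row["year"] - target_year) < abs(current["year"] - target_year):
            best[row["country"]] = row
    return [best[c] for c in sorted(best)]


# Placeholder observations, not real UNICEF values.
rows = [
    {"country": "ALB", "year": 2015, "value": 1.0},
    {"country": "ALB", "year": 2019, "value": 2.0},
    {"country": "ALB", "year": 2022, "value": 3.0},
    {"country": "BRA", "year": 2017, "value": 4.0},
]

print([(r["country"], r["year"]) for r in most_recent(rows)])
print([(r["country"], r["year"]) for r in circa(rows, 2018)])
```

How the real functions break ties or treat missing values may differ; check the platform READMEs before relying on edge-case behavior.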
| Parameter | Type | Description |
|---|---|---|
| indicator | string/vector | Indicator code(s), e.g., "CME_MRY0T4" |
| countries | vector | ISO3 codes, e.g., ["ALB", "USA"] |
| year | int/string | Single (2020), range ("2015:2023"), or list |
| sex | string | "_T" (total), "F", "M", or "ALL" |
| format | string | "long", "wide", or "wide_indicators" |
| latest | boolean | Keep only most recent value per country |
| mrv | integer | Keep N most recent values per country |
| circa | boolean | Find closest available year |
See platform READMEs for complete parameter documentation.
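The three accepted shapes of the year parameter (single year, "start:end" range, or list) can be normalized to one form before a request is built. The sketch below shows one way to do that; `parse_year_spec` is a hypothetical helper written for this README, and the package may handle the parameter differently internally.

```python
def parse_year_spec(year):
    """Normalize a year argument to a sorted list of integer years (hypothetical helper)."""
    if isinstance(year, int):
        return [year]
    if isinstance(year, str):
        if ":" in year:
            # Range form such as "2015:2023", inclusive on both ends.
            start, end = (int(part) for part in year.split(":", 1))
            if start > end:
                raise ValueError(f"range start {start} is after end {end}")
            return list(range(start, end + 1))
        return [int(year)]
    # Anything else is treated as an iterable of years.
    return sorted(int(y) for y in year)


print(parse_year_spec(2020))
print(parse_year_spec("2015:2018"))
print(parse_year_spec([2020, 2015]))
```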
| Category | Count | Description |
|---|---|---|
| NUTRITION | 112 | Stunting, wasting, etc. |
| CAUSE_OF_DEATH | 83 | Causes of death |
| CHILD_RELATED_SDG | 77 | SDG targets |
| WASH_HOUSEHOLDS | 57 | Water & Sanitation |
| PT | 43 | Child Protection |
| CHLD_PVTY | 43 | Child Poverty |
| CME | 39 | Child Mortality |
| EDUCATION | 38 | Education |
| HIV_AIDS | 38 | HIV/AIDS |
| MNCH | 38 | Maternal & Child Health |
| IMMUNISATION | 18 | Immunization |
Use list_categories() for the complete list (733 indicators across 22 categories).
| Indicator | Dataflow | Description |
|---|---|---|
| CME_MRY0T4 | CME | Under-5 mortality rate |
| CME_MRM0 | CME | Neonatal mortality rate |
| NT_ANT_HAZ_NE2_MOD | NUTRITION | Stunting prevalence |
| IM_DTP3 | IMMUNISATION | DTP3 coverage |
| IM_MCV1 | IMMUNISATION | Measles coverage |
| WS_PPL_W-SM | WASH | Safely managed water |
| PT_CHLD_Y0T4_REG | PT | Birth registration |
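The indicator codes above share a prefix with their dataflow, which suggests how auto-detection could work in the simplest case. The sketch below guesses a dataflow from the code prefix using only the pairs listed in the table. It is an assumption-laden illustration: `guess_dataflow` and `PREFIX_TO_DATAFLOW` are hypothetical names, and the package may resolve dataflows from its cached metadata instead.

```python
# Prefix -> dataflow pairs read off the indicator table; illustrative, not exhaustive.
PREFIX_TO_DATAFLOW = {
    "CME": "CME",
    "NT": "NUTRITION",
    "IM": "IMMUNISATION",
    "WS": "WASH",
    "PT": "PT",
}


def guess_dataflow(indicator):
    """Guess the dataflow from the text before the first underscore (hypothetical helper)."""
    prefix = indicator.split("_", 1)[0]
    try:
        return PREFIX_TO_DATAFLOW[prefix]
    except KeyError:
        raise ValueError(f"no dataflow known for prefix {prefix!r}") from None


print(guess_dataflow("CME_MRY0T4"))
print(guess_dataflow("WS_PPL_W-SM"))
```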
unicefData/
├── R/ # R package source
├── python/ # Python package source
├── stata/ # Stata package source
│ └── qa/ # Stata QA test suite (63 tests)
├── tests/
│ ├── fixtures.zip # Authoritative test fixtures (single source)
│ ├── fixtures/ # Unpacked fixtures (auto-extracted)
│ └── testthat/ # R unit tests (105 tests)
├── scripts/
│ ├── generate_fixtures.py # Download + pack fixtures from API
│ └── unpack_fixtures.py # Extract ZIP to all platform dirs
├── .githooks/ # Auto-unpack on clone/pull
├── validation/ # Cross-platform validation
├── internal/docs/ # Dev-only documentation
│ ├── TEST_REFERENCE.md # Full cross-platform test map
│ ├── QA_SETUP.md # Environment setup guide
│ └── FIXTURE_INFRASTRUCTURE.md # ZIP fixture system
├── DESCRIPTION # R package metadata
├── NEWS.md # Changelog
└── README.md # This file
The R package is in the r/ directory and ready for CRAN submission:
# Navigate to the R package (adjust the path to your local clone)
setwd("C:/GitHub/myados/unicefData-dev/r")
# Final validation
devtools::check(args = "--as-cran")
# Expected: 0 errors | 0 warnings | 2 notes (acceptable for new submission)
# Test on win-builder (Windows; R-devel and R-release)
devtools::check_win_devel()
devtools::check_win_release()
# Build package
devtools::build()
# Submit to CRAN (interactive)
devtools::release()
See r/cran-comments.md for current submission status and reviewer responses.
- Frame-based discovery caching: indicator search parsed once per session, subsequent search() calls near-instantaneous (Stata 16+)
- New nocache option and automatic cache invalidation on unicefdata_sync
- Archived vestigial wbopendata helper programs
- Bug fixes from v2.2.1 code review (4 high-priority fixes, 7 cleanup items)
Cross-Platform Testing Infrastructure
- 328+ automated tests across 11 test families (unit, integration, deterministic, sync-pipeline, discovery, error-conditions, transformations, cross-language, API-mock, regression, smoke)
- Deterministic fixture system: Single tests/fixtures.zip source with automated extraction via git hooks
- Full CI matrix: R (devel/release/oldrel × Ubuntu/macOS/Windows), Python 3.9-3.11, YAML validation
- R testthat suite: 5 new test files (transformations, deterministic, discovery, sync-pipeline, error-conditions)
- Bug fixes: Category resolution fallback in R, indicator validation in Python, retry logic improvements
- Documentation: Cross-platform testing framework paper, test audit, versioning policy
Cross-Language Quality & Testing
- Cache management APIs: clear_cache() (Python), clear_unicef_cache() (R), clearcache (Stata)
- Error handling improvements: Configurable timeouts with SDMXTimeoutError (Python), fixed apply_circa() NA handling (R)
- Portability: Removed all hardcoded paths; R uses system.file(), Stata uses 3-tier resolution
- Error context: All 404 errors now show which dataflows were tried
- Cross-language test suite: 39 shared fixture tests (Python 14, R 13, Stata 12)
- YAML schema documentation: Comprehensive format reference for all 7 YAML file types
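The "error context" item above (404 errors reporting which dataflows were tried) can be illustrated with a small fallback loop. This is a sketch under assumptions rather than the package's code: `DataflowNotFound`, `fetch_with_fallback`, and `fake_fetch` are hypothetical names, and `LookupError` stands in for an HTTP 404 so the example runs offline.

```python
class DataflowNotFound(Exception):
    """Hypothetical error type whose message lists the dataflows that were tried."""


def fetch_with_fallback(indicator, candidates, fetch):
    """Try each candidate dataflow in order; report all attempts if none succeeds."""
    tried = []
    for dataflow in candidates:
        tried.append(dataflow)
        try:
            return fetch(dataflow, indicator)
        except LookupError:  # stand-in for an HTTP 404 from the API
            continue
    raise DataflowNotFound(f"{indicator} not found; dataflows tried: {', '.join(tried)}")


def fake_fetch(dataflow, indicator):
    """Offline stand-in for a network call: only the CME dataflow 'exists'."""
    if dataflow != "CME":
        raise LookupError(dataflow)
    return f"{dataflow}/{indicator}"


print(fetch_with_fallback("CME_MRY0T4", ["NUTRITION", "CME"], fake_fetch))
```

Passing the fetch function in as an argument keeps the fallback logic testable without network access.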
Major Quality Milestone
- SYNC-02 fix: Resolved critical metadata enrichment bug
- 100% test coverage: R (26), Python (28), Stata (38/38)
- Cross-platform parity: All platforms aligned
- Fixed 404 fallback behavior
- Added dynamic User-Agent strings
- Added comprehensive test coverage
See NEWS.md for complete changelog.
The package automatically downloads and caches indicator metadata on first use. Cache refreshes every 30 days.
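A 30-day refresh policy like the one described above usually reduces to a timestamp comparison. The sketch below shows that check in isolation; `is_stale` and `MAX_AGE` are hypothetical names, and whether the package treats exactly 30 days as stale is an assumption made here (this version does not).

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=30)  # refresh interval stated in the text


def is_stale(last_refreshed, now=None):
    """Return True when cached metadata was never fetched or is older than MAX_AGE."""
    if last_refreshed is None:
        return True
    now = now or datetime.now()
    return now - last_refreshed > MAX_AGE


print(is_stale(datetime(2026, 1, 1), now=datetime(2026, 2, 15)))
print(is_stale(datetime(2026, 2, 1), now=datetime(2026, 2, 15)))
```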
To refresh the metadata cache manually:
Stata:
unicefdata_refresh_all, verbose
Python:
from unicef_api import refresh_indicator_cache
refresh_indicator_cache()
R:
refresh_indicator_cache()
To clear all cached data:
Python:
from unicefdata import clear_cache
clear_cache() # Clears all 5 cache layers, reloads YAML
R:
clear_unicef_cache() # Clears all 6 cache layers, reloads YAML
Stata:
unicefdata, clearcache
# Sync metadata across all platforms
.\scripts\sync_metadata_cross_language.ps1
See docs/METADATA_GENERATION_GUIDE.md for detailed metadata sync documentation.
443 automated tests across all three platforms (63 Stata, 160 Python, 220 R).
All tests run offline using frozen fixtures from tests/fixtures.zip.
Full suite executes in under 14 minutes (12m 13s Stata, 34s Python, 7s R).
Python (160 tests):
cd python && pytest tests/ -v
R (220 expectations):
testthat::test_dir("tests/testthat/")
Stata (63 tests):
cd stata/qa
do run_tests.do
Tests are organized into 16 families aligned across platforms:
| Family | Stata | Python | R | Description |
|---|---|---|---|---|
| DET | 11 | 37 | 32 | Deterministic / offline (frozen CSV) |
| SYNC | 4 | 12 | 12 | Metadata sync (XML → YAML) |
| DISC | 3 | 24 | 24 | Discovery (YAML → output) |
| DL | 9 | 15 | 8 | Download / API fetch |
| ERR | 8 | 18 | 6 | Error handling / input validation |
| TRANS | 2 | 23 | 14 | Transformations (wide, latest, MRV) |
| REGR | 1 | 4 | 2 | Regression baselines (value pinning) |
| Other | 25 | 27 | 122 | DATA, TIER, META, MULTI, EDGE, EXT, PERF, ENV, XPLAT |
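As a quick consistency check, the per-family counts in the table can be summed and compared with the platform totals quoted earlier (63 Stata, 160 Python, 220 R, 443 overall). The snippet copies the numbers from the table; nothing else is assumed.

```python
# (Stata, Python, R) counts per family, copied from the table above.
FAMILIES = {
    "DET": (11, 37, 32),
    "SYNC": (4, 12, 12),
    "DISC": (3, 24, 24),
    "DL": (9, 15, 8),
    "ERR": (8, 18, 6),
    "TRANS": (2, 23, 14),
    "REGR": (1, 4, 2),
    "Other": (25, 27, 122),
}

# Transpose rows into per-platform columns and sum each column.
stata, python, r = (sum(column) for column in zip(*FAMILIES.values()))
total = stata + python + r
print(stata, python, r, total)  # 63 160 220 443
```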
See internal/docs/TEST_REFERENCE.md for the complete cross-platform test map.
Test fixtures are stored in a single ZIP file (tests/fixtures.zip) and
auto-extracted by git hooks on clone and pull:
git config core.hooksPath .githooks # one-time setup
python scripts/unpack_fixtures.py # manual alternative
See internal/docs/FIXTURE_INFRASTRUCTURE.md for the full extraction map and regeneration workflow.
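The single-ZIP approach amounts to extracting one archive into several platform directories. The sketch below shows that core step with the standard library only. It is not the repository's scripts/unpack_fixtures.py: the function `unpack_fixtures` and the target directory names in the usage note are assumptions for illustration.

```python
import zipfile
from pathlib import Path


def unpack_fixtures(zip_path, target_dirs):
    """Extract every member of zip_path into each directory in target_dirs."""
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for target in target_dirs:
            target = Path(target)
            target.mkdir(parents=True, exist_ok=True)  # create missing directories
            archive.extractall(target)
            extracted.append(target)
    return extracted
```

A call such as `unpack_fixtures("tests/fixtures.zip", ["tests/fixtures"])` would unpack into a single directory; the actual extraction map is documented in FIXTURE_INFRASTRUCTURE.md.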
cd validation
python run_validation.py --limit 10 --languages python r stata
See validation/ for validation documentation, including the Quick Start, Indicator Testing Guide, and Documentation Index.
See CONTRIBUTING.md for full guidelines.
- Report bugs — Open an issue
- Request features — Suggest new indicators or functionality
- Submit code — Fork, create branch, open pull request
git clone https://github.com/unicef-drp/unicefData.git
cd unicefData
git config core.hooksPath .githooks # enable auto-unpack fixtures
# Python
cd python && pip install -e ".[dev]"
# R (in RStudio)
devtools::load_all()
# Stata
cd stata
do install_local.do
See internal/docs/QA_SETUP.md for detailed setup instructions across all three platforms.
- UNICEF Data Portal: https://data.unicef.org/
- SDMX API Docs: https://data.unicef.org/sdmx-api-documentation/
- GitHub: https://github.com/unicef-drp/unicefData
- Issues: https://github.com/unicef-drp/unicefData/issues
This trilingual package ecosystem was developed at the UNICEF Data and Analytics Section. The author gratefully acknowledges the collaboration of Lucas Rodrigues, Yang Liu, and Karen Avanesian, whose technical contributions and feedback were instrumental in the development of this comprehensive data access library.
Special thanks to Yves Jaques, Alberto Sibileau, and Daniele Olivotti for designing and maintaining the UNICEF SDMX data warehouse infrastructure that makes this package possible.
The author also acknowledges the UNICEF database managers and technical teams who ensure data quality, as well as the country office staff and National Statistical Offices whose data collection efforts make this work possible.
Development of this package was supported by UNICEF institutional funding for data infrastructure and statistical capacity building. The author also acknowledges UNICEF colleagues who provided testing and feedback during development, as well as the broader open-source communities across R, Python, and Stata.
This package is provided for research and analytical purposes.
The unicefData package provides programmatic access to UNICEF's public data warehouse. While the author is affiliated with UNICEF, this package is not an official UNICEF product and the statements in this documentation are the views of the author and do not necessarily reflect the policies or views of UNICEF.
Data accessed through this package comes from the UNICEF Data Warehouse. Users should verify critical data points against official UNICEF publications at data.unicef.org.
This software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or UNICEF be liable for any claim, damages or other liability arising from the use of this software.
The designations employed and the presentation of material in this package do not imply the expression of any opinion whatsoever on the part of UNICEF concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries.
Important Note on Data Vintages
Official statistics are subject to revisions as new information becomes available and estimation methodologies improve. UNICEF indicators are regularly updated based on new surveys, censuses, and improved modeling techniques. Historical values may be revised retroactively to reflect better information or methodological improvements.
For reproducible research and proper data attribution, users should:
- Document the indicator code: specify the exact SDMX indicator code(s) used (e.g., CME_MRY0T4)
- Record the download date: note when data was accessed (e.g., "Data downloaded: 2026-02-09")
- Cite the data source: reference both the package and the UNICEF Data Warehouse
- Archive your dataset: save a copy of the exact data used in your analysis
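One way to follow these steps is to store a small provenance record next to the archived dataset. The sketch below builds such a record with the standard library; `provenance_record` and its field names are hypothetical, the CSV bytes are a placeholder rather than real data, and the version string is whatever package version you actually used.

```python
import hashlib
import json
from datetime import date


def provenance_record(indicator, data_bytes, package_version, accessed=None):
    """Describe an archived download: what was fetched, when, and a checksum of the file."""
    accessed = accessed or date.today()
    return {
        "indicator": indicator,
        "accessed": accessed.isoformat(),
        "package_version": package_version,
        "source": "UNICEF Data Warehouse",
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
    }


# Placeholder bytes stand in for the archived CSV; not real data.
record = provenance_record("CME_MRY0T4", b"placeholder,csv\n", "2.3.0", date(2026, 2, 9))
print(json.dumps(record, indent=2))
```

The checksum lets a later reader confirm that the archived file is the one the analysis used, even if the warehouse values have since been revised.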
Example citations for data used in research:
- R: Under-5 mortality data (indicator: CME_MRY0T4) accessed from UNICEF Data Warehouse via unicefData R package (v2.2.0) on 2026-02-15. Data available at: https://sdmx.data.unicef.org/
- Python: Under-5 mortality data (indicator: CME_MRY0T4) accessed from UNICEF Data Warehouse via unicefData Python package (v2.2.0) on 2026-02-15. Data available at: https://sdmx.data.unicef.org/
- Stata: Under-5 mortality data (indicator: CME_MRY0T4) accessed from UNICEF Data Warehouse via unicefData Stata package (v2.3.0) on 2026-02-18. Data available at: https://sdmx.data.unicef.org/
This practice ensures that others can verify your results and understand any differences that may arise from data updates. For official UNICEF statistics in publications, always cross-reference with the current version at data.unicef.org.
If you use this package in published work, please cite:
Azevedo, J.P. (2026). "unicefdata: Unified access to UNICEF indicators across R, Python, and Stata." Working paper. URL: https://github.com/unicef-drp/unicefData
@misc{azevedo2026unicefdata,
title = {unicefdata: Unified access to {UNICEF} indicators across {R}, {Python}, and {Stata}},
author = {Azevedo, Joao Pedro},
year = {2026},
note = {Working paper},
url = {https://github.com/unicef-drp/unicefData}
}
Development assisted by AI coding tools (GitHub Copilot, Claude). All code reviewed and validated by maintainers.
Joao Pedro Azevedo (@jpazvd) Chief Statistician, UNICEF Data and Analytics Section
MIT License — See LICENSE