Synthetic Sickle Cell Anaemia data — scd

This data is a transformed version of the SCD data from Al-Dhamari et al., "Synthetic datasets for open software development in rare disease research", Orphanet J Rare Dis 19, 265 (2024). We retained the subset of columns relevant to our model and transformed the data into a representative cohort: SCD cases are kept at an expected prevalence of 0.3%, and the remaining rows are converted to non-SCD patients by distributing their biomarker values around a healthy value. The retained columns are described below.
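The transformation described above can be illustrated with a minimal sketch on toy data. This is not the authors' actual procedure: the cohort size, the normal distribution, and the names `healthy_cbc_mean`, `scd_cbc_mean` and `cbc_sd` with their values are all assumptions made for illustration. Only the 0.3% prevalence comes from the description.

```r
# Hypothetical sketch, not the procedure used to build scd_cohort.
set.seed(42)                 # make the toy example reproducible

n          <- 10000          # toy cohort size (assumption)
prevalence <- 0.003          # expected SCD prevalence (0.3%), from the description

# Keep a prevalence-sized share of rows flagged as SCD
n_scd    <- round(n * prevalence)
scd_flag <- c(rep(1L, n_scd), rep(0L, n - n_scd))

# Assumed biomarker centres and spread in g/dL, for illustration only
healthy_cbc_mean <- 14
scd_cbc_mean     <- 8
cbc_sd           <- 1

# Non-SCD rows: values distributed around the assumed healthy value
cbc <- rnorm(n, mean = healthy_cbc_mean, sd = cbc_sd)
# SCD rows: values distributed around the assumed SCD value
cbc[scd_flag == 1L] <- rnorm(n_scd, mean = scd_cbc_mean, sd = cbc_sd)

toy <- data.frame(CBC = cbc, SCD = scd_flag)

# Share of SCD rows in the toy cohort
mean(toy$SCD)
```

Drawing from a normal distribution is only one way to distribute values around a healthy value; the description does not say which distribution was used.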

Usage

scd_cohort

Format

`scd_cohort`

A data frame with 100,403 rows and 9 columns:

age: Patient age
sex: Patient sex, recorded as Male or Female only
race: Patient race. One of "Others", "African-American", "European-American"
birthDate: Patient birth date
diagDate: Patient diagnosis date
CBC: Complete Blood Count biomarker test, in g/dL
RC: Reticulocyte count biomarker test, in % reticulocytes
highrisk: Flag for high-risk ethnicity
SCD: Flag marking SCD observations, used to test model performance
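
As a rough illustration of how these columns might be used, the sketch below builds a four-row stand-in with the documented column names and runs a few basic checks. Every value is made up, and the column types are assumptions (0/1 integer flags, `Date` columns, character `sex` and `race`), as are the missing `diagDate` for non-SCD rows and the choice of which rows are flagged `highrisk`. The real `scd_cohort` may differ.

```r
# Toy stand-in for scd_cohort: column names follow the documentation,
# all values and column types are hypothetical.
toy_cohort <- data.frame(
  age       = c(34L, 7L, 52L, 19L),
  sex       = c("Female", "Male", "Female", "Male"),
  race      = c("Others", "African-American",
                "European-American", "African-American"),
  birthDate = as.Date(c("1990-03-14", "2017-08-02",
                        "1972-11-25", "2005-01-09")),
  diagDate  = as.Date(c(NA, "2018-02-10", NA, NA)),  # assumed NA when undiagnosed
  CBC       = c(13.9, 8.1, 14.6, 13.2),              # g/dL
  RC        = c(1.1, 9.4, 0.9, 1.3),                 # % reticulocytes
  highrisk  = c(0L, 1L, 0L, 1L),                     # assumed 0/1 coding
  SCD       = c(0L, 1L, 0L, 0L),                     # assumed 0/1 coding
  stringsAsFactors = FALSE
)

# Share of rows flagged as SCD in the toy data
toy_prevalence <- mean(toy_cohort$SCD == 1L)

# Cross-tabulate the high-risk flag against the SCD flag
flag_table <- table(highrisk = toy_cohort$highrisk, SCD = toy_cohort$SCD)

# Number of SCD rows implied by a 0.3% prevalence in 100,403 rows
expected_scd_rows <- round(0.003 * 100403)
```

The same calls should work on the real data by replacing `toy_cohort` with `scd_cohort`, provided the flags are coded as assumed here.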

Source

Al-Dhamari et al. (2024), doi:10.1186/s13023-024-03254-2.