SDTM Programming in R using {sdtm.oak}
EDC-agnostic SDTM dataset creation with reusable algorithms
Overview
Intermediate SDTM CDISC
{sdtm.oak} is an EDC and data standard-agnostic solution designed to empower pharmaceutical programmers to develop CDISC SDTM datasets using R. The package offers a modular programming framework with reusable algorithms that can potentially automate SDTM creation based on standard specifications.
What You’ll Learn
- 🌳 {sdtm.oak} fundamentals - V0.1 package overview
- 🔧 Modular programming - Reusable algorithm approach
- 📊 SDTM creation - From raw data to standard domains
- 📝 EDC-agnostic design - Works with any data source
- 🎯 Automation potential - Standards-based generation
Prerequisites
Required Knowledge:
- Intermediate R programming
- CDISC SDTM standards familiarity
- Clinical trial data structure understanding
Helpful:
- Experience with SDTM programming
- Knowledge of EDC systems
Key Package
{sdtm.oak}
Pharmaverse
Workshop Materials
Workshop Slides: pharmaverse.github.io/rinpharma-SDTM-workshop
R Environment: Provided for workshop participants (no pre-installation needed)
Package Philosophy
EDC-Agnostic Approach
Works with raw data from any source, once it is loaded into R as a data frame:
- ✅ Medidata Rave
- ✅ Oracle Clinical
- ✅ Veeva Vault
- ✅ Custom EDC systems
- ✅ CSV files
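Because {sdtm.oak} algorithms work on R data frames, the EDC-specific work reduces to getting each export into a common raw layout. The base-R sketch below assumes the exports differ only in column names. `harmonize_raw()`, the export layouts and the rename maps are hypothetical and not part of {sdtm.oak}.

```r
# Hypothetical helper (not part of {sdtm.oak}): rename EDC-specific raw columns
# to a common layout so downstream mapping code is identical for every source
harmonize_raw <- function(raw, rename_map) {
  # rename_map: named character vector, names = EDC column, values = common name
  hits <- names(raw) %in% names(rename_map)
  names(raw)[hits] <- unname(rename_map[names(raw)[hits]])
  raw
}

# The same information exported from two different (illustrative) sources
rave_export <- data.frame(Subject = c("001", "002"), SEX_STD = c("M", "F"))
csv_export  <- data.frame(subj_id = "003", gender = "Female")

rave_map <- c(Subject = "PATNUM", SEX_STD = "SEX")
csv_map  <- c(subj_id = "PATNUM", gender = "SEX")

# After harmonization the exports share column names and can be stacked
raw_dm <- rbind(
  harmonize_raw(rave_export, rave_map),
  harmonize_raw(csv_export, csv_map)
)
print(raw_dm)
```

Value differences such as "F" versus "Female" are deliberately left alone here. They belong to the controlled-terminology step.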
Reusable Algorithms
Instead of custom code for each study:
- Define once, use many times
- Standards-based transformations
- Metadata-driven approach
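A reusable algorithm is a function written once and parameterised per study. As a hedged base-R illustration, the hypothetical `to_iso8601_date()` below converts complete collected dates to ISO 8601 for two studies with different raw date layouts. The study data and column names are made up.

```r
# Hypothetical reusable algorithm: convert complete collected dates to
# ISO 8601 (YYYY-MM-DD); the raw layout is a parameter, not hard-coded
to_iso8601_date <- function(x, raw_fmt) {
  format(as.Date(x, format = raw_fmt), "%Y-%m-%d")
}

# Study A collects day/month/year for AE; Study B collects month/day/year for CM
study_a_ae <- data.frame(AE_START = c("05/01/2024", "17/02/2024"))
study_b_cm <- data.frame(CM_START = c("01/05/2024", NA))

# Same algorithm, different parameters; missing dates stay missing
study_a_ae$AESTDTC <- to_iso8601_date(study_a_ae$AE_START, "%d/%m/%Y")
study_b_cm$CMSTDTC <- to_iso8601_date(study_b_cm$CM_START, "%m/%d/%Y")
print(study_a_ae)
print(study_b_cm)
```

This sketch only handles complete dates. Partial dates with unknown components are what {sdtm.oak}'s `create_iso8601()` and `assign_datetime()` are designed for.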
Modular Framework
Break SDTM creation into components:
- Data reading
- Domain mapping
- Variable derivation
- Standardization
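The four components can be chained as small functions that each take the previous step's output. This is a minimal base-R sketch: the step functions, `run_pipeline()`, the raw column names and the study ID are all hypothetical.

```r
# Hypothetical pipeline components, one per responsibility
read_step <- function(txt) read.csv(text = txt)               # data reading
map_step <- function(raw) {                                   # domain mapping
  data.frame(USUBJID = raw$subject, AETERM = raw$ae_term)
}
derive_step <- function(ae) {                                 # variable derivation
  ae$STUDYID <- "STUDY01"
  ae$DOMAIN <- "AE"
  ae
}
standardize_step <- function(ae) {                            # standardization
  ae$AETERM <- toupper(trimws(ae$AETERM))
  ae
}

# Apply the steps left to right, feeding each result into the next step
run_pipeline <- function(input, steps) {
  Reduce(function(acc, step) step(acc), steps, input)
}

raw_txt <- "subject,ae_term\nSTUDY01-001,Headache\nSTUDY01-002,nausea"
ae <- run_pipeline(raw_txt, list(read_step, map_step, derive_step, standardize_step))
print(ae)
```

Adding or swapping a component changes only the list passed to `run_pipeline()`, which is the benefit of keeping each component small.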
Key Features
1. Domain Creation
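```r
library(sdtm.oak)

# Raw column names (PATNUM, SEX) and the CT file path are study-specific assumptions
study_ct <- read_ct_spec("sdtm_ct.csv")
raw_subjects <- generate_oak_id_vars(raw_subjects, pat_var = "PATNUM", raw_src = "raw_subjects")

# Start the Demographics (DM) domain: map collected SEX to codelist C66731 (Sex)
dm <- assign_ct(
  raw_dat = raw_subjects, raw_var = "SEX", tgt_var = "SEX",
  ct_spec = study_ct, ct_clst = "C66731"
)
```
2. Variable Mapping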
Automatic mapping from raw to SDTM:
- USUBJID derivation
- Date standardization
- Controlled terminology
- Unit conversions
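Two of these mappings, USUBJID derivation and unit conversion, reduce to small vectorised functions. A base-R sketch, assuming the common STUDYID-SITEID-SUBJID convention for USUBJID and a pounds-to-kilograms conversion for weight; `make_usubjid()`, `lb_to_kg()` and the raw columns are hypothetical.

```r
# Hypothetical helpers, not {sdtm.oak} functions
make_usubjid <- function(studyid, siteid, subjid) {
  paste(studyid, siteid, subjid, sep = "-")   # STUDYID-SITEID-SUBJID convention
}
lb_to_kg <- function(lb) {
  round(lb * 0.45359237, 1)                  # exact lb-to-kg factor, 1 decimal
}

raw_vs <- data.frame(SITEID = c("101", "102"), SUBJID = c("001", "007"),
                     WEIGHT_LB = c(150, 200))

# Standardized result and unit for a weight measurement
vs <- data.frame(
  USUBJID  = make_usubjid("STUDY01", raw_vs$SITEID, raw_vs$SUBJID),
  VSSTRESN = lb_to_kg(raw_vs$WEIGHT_LB),
  VSSTRESU = "kg"
)
print(vs)
```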
3. Validation
Built-in checks for:
- Required variables
- Data types
- Value constraints
- SDTM conformance
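A check for required variables and data types can be sketched in a few lines of base R. `check_domain()` is hypothetical and covers only the first two bullets; full SDTM conformance checking needs a dedicated validator.

```r
# Hypothetical check: returns a character vector of problems (empty if none)
check_domain <- function(df, required, char_vars) {
  problems <- character(0)
  # Required variables that are absent from the dataset
  missing_vars <- setdiff(required, names(df))
  if (length(missing_vars) > 0) {
    problems <- c(problems, paste("Missing required variable:", missing_vars))
  }
  # Variables expected to be character but stored with another type
  for (v in intersect(char_vars, names(df))) {
    if (!is.character(df[[v]])) {
      problems <- c(problems, paste("Expected character type:", v))
    }
  }
  problems
}

ae <- data.frame(STUDYID = "STUDY01", DOMAIN = "AE", USUBJID = "STUDY01-001", AESEQ = 1L)
issues <- check_domain(
  ae,
  required  = c("STUDYID", "DOMAIN", "USUBJID", "AESEQ", "AETERM"),
  char_vars = c("STUDYID", "USUBJID", "AETERM")
)
print(issues)
```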
4. Metadata-Driven
Use SDTM specifications as input:
- Excel specification sheets
- Standard templates
- Reusable across studies
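The metadata-driven idea is that a specification table, not code, decides which raw variable feeds which SDTM variable. A minimal base-R sketch: the spec is inline here, whereas in practice it might be read from an Excel sheet (for example with `readxl::read_excel()`). `apply_spec()` and the column names are hypothetical.

```r
# Hypothetical mapping specification: one row per target SDTM variable
spec <- data.frame(
  target_var = c("AETERM", "AESEV"),
  raw_var    = c("ae_term", "ae_sev")
)

# Build the target dataset purely from the spec rows
apply_spec <- function(raw, spec) {
  stopifnot(all(spec$raw_var %in% names(raw)))   # every spec'd raw column must exist
  cols <- lapply(spec$raw_var, function(v) raw[[v]])
  names(cols) <- spec$target_var
  as.data.frame(cols, stringsAsFactors = FALSE)
}

raw_ae <- data.frame(
  ae_term = c("Headache", "Rash"),
  ae_sev  = c("MILD", "SEVERE"),
  edc_internal_flag = c(TRUE, FALSE)            # not in the spec, so not mapped
)
ae <- apply_spec(raw_ae, spec)
print(ae)
```

Mapping a new variable means adding a spec row; `apply_spec()` itself does not change.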
Workshop Content
Module 1: Package Introduction
- {sdtm.oak} architecture
- Installation and setup
- Key functions overview
- Integration with pharmaverse
Module 2: Simple Domain Creation
Hands-on: Demographics (DM)
Create DM domain from raw subject data:
- Subject identifiers
- Demographics variables
- Date formatting
- Controlled terminology
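Subject identifiers and controlled terminology for DM can be sketched with a lookup table that maps collected values to submission values. Everything here (`map_ct()`, the codelist rows, raw columns, study ID) is hypothetical. Unmapped values such as "Unknown" come back as `NA` so they can be reviewed rather than silently passed through.

```r
# Hypothetical codelist: collected value -> CDISC submission value
sex_ct <- data.frame(
  collected_value = c("Male", "M", "Female", "F"),
  term_value      = c("M", "M", "F", "F")
)

# Hypothetical case-insensitive CT lookup; unmatched values become NA
map_ct <- function(x, ct) {
  ct$term_value[match(toupper(x), toupper(ct$collected_value))]
}

raw_dm <- data.frame(PATNUM = c("001", "002", "003"), SEX = c("Male", "f", "Unknown"))

dm <- data.frame(
  STUDYID = "STUDY01",
  DOMAIN  = "DM",
  USUBJID = paste0("STUDY01-", raw_dm$PATNUM),
  SEX     = map_ct(raw_dm$SEX, sex_ct)
)
print(dm)
```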
Module 3: Complex Domains
Hands-on: Adverse Events (AE)
- Multiple records per subject
- Start/end dates
- Severity grading
- Relationship coding
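Multiple records per subject need a sequence number (AESEQ) that restarts for each USUBJID. A base-R sketch with made-up records: sort by subject, start date and term, then number rows within each subject. Complete ISO 8601 date strings sort correctly as text, so no date parsing is needed here.

```r
ae <- data.frame(
  USUBJID = c("STUDY01-002", "STUDY01-001", "STUDY01-001", "STUDY01-002"),
  AETERM  = c("NAUSEA", "HEADACHE", "FATIGUE", "RASH"),
  AESTDTC = c("2024-03-02", "2024-02-10", "2024-01-15", "2024-02-28")
)

# Deterministic record order: subject, then start date, then term
ae <- ae[order(ae$USUBJID, ae$AESTDTC, ae$AETERM), ]

# Number records 1, 2, ... within each subject
ae$AESEQ <- ave(seq_len(nrow(ae)), ae$USUBJID, FUN = seq_along)
rownames(ae) <- NULL
print(ae)
```

{sdtm.oak} provides `derive_seq()` for this step.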
Module 4: Automation
- Using specifications to drive creation
- Batch processing multiple domains
- Quality checks
- Documentation generation
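Batch processing amounts to looping one build-and-check routine over several domains. A base-R sketch: the per-domain builder functions, raw data and CSV output are hypothetical stand-ins (a real workflow would typically write XPT, for example with `haven::write_xpt()`).

```r
# Hypothetical raw inputs, one per domain
raw_data <- list(
  DM = data.frame(PATNUM = c("001", "002"), SEX = c("M", "F")),
  AE = data.frame(PATNUM = c("001", "001", "002"), term = c("headache", "rash", "nausea"))
)

# Hypothetical builders: one function per domain, same interface
builders <- list(
  DM = function(raw) data.frame(
    STUDYID = "STUDY01", DOMAIN = "DM",
    USUBJID = paste0("STUDY01-", raw$PATNUM), SEX = raw$SEX
  ),
  AE = function(raw) data.frame(
    STUDYID = "STUDY01", DOMAIN = "AE",
    USUBJID = paste0("STUDY01-", raw$PATNUM), AETERM = toupper(raw$term)
  )
)

# Build every domain from its own raw data; result is named like `builders`
domains <- Map(function(build, raw) build(raw), builders, raw_data[names(builders)])

out_dir <- tempdir()
for (dom in names(domains)) {
  # Minimal quality gate: DOMAIN matches and USUBJID has no missing values
  stopifnot(all(domains[[dom]]$DOMAIN == dom), !anyNA(domains[[dom]]$USUBJID))
  write.csv(domains[[dom]], file.path(out_dir, paste0(tolower(dom), ".csv")),
            row.names = FALSE)
}
```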
Practical Example
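```r
library(sdtm.oak)
library(dplyr)
library(haven)

# Load raw data (file name, raw column names and study ID are study-specific)
raw_ae <- read.csv("adverse_events.csv", na.strings = c("", "NA"))
# Load the study controlled-terminology (CT) specification
study_ct <- read_ct_spec("sdtm_ct.csv")
# Add the oak ID variables that the algorithms use to join their results
raw_ae <- generate_oak_id_vars(raw_ae, pat_var = "SUBJID", raw_src = "raw_ae")

# Create AE domain: one mapping algorithm per target variable
ae <- assign_no_ct(raw_dat = raw_ae, raw_var = "AE_TERM", tgt_var = "AETERM") %>%
  # Assumes MedDRA coding was done upstream and stored in raw column AE_DECOD
  assign_no_ct(raw_dat = raw_ae, raw_var = "AE_DECOD", tgt_var = "AEDECOD") %>%
  # raw_fmt must describe the collected date layout (day-month-year assumed here)
  assign_datetime(raw_dat = raw_ae, raw_var = "AE_START_DATE", tgt_var = "AESTDTC", raw_fmt = "d-m-y") %>%
  assign_datetime(raw_dat = raw_ae, raw_var = "AE_END_DATE", tgt_var = "AEENDTC", raw_fmt = "d-m-y") %>%
  # Map collected severity to CDISC terms in codelist C66769 (AESEV)
  assign_ct(
    raw_dat = raw_ae, raw_var = "AE_SEVERITY", tgt_var = "AESEV",
    ct_spec = study_ct, ct_clst = "C66769"
  ) %>%
  mutate(
    STUDYID = "STUDY01", DOMAIN = "AE",
    USUBJID = paste0(STUDYID, "-", patient_number)
  ) %>%
  # Number records within each subject
  derive_seq(tgt_var = "AESEQ", rec_vars = c("USUBJID", "AETERM")) %>%
  select(STUDYID, DOMAIN, USUBJID, AESEQ, AETERM, AEDECOD, AESEV, AESTDTC, AEENDTC)

# Export to SAS transport (XPT) format with {haven}
write_xpt(ae, "ae.xpt")
```
Benefits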
For Programmers
- Less custom coding - Reuse algorithms
- Faster development - Metadata-driven
- Fewer errors - Automated validation
- Better documentation - Standards-based
For Organizations
- Consistency across studies
- Efficiency gains
- Quality improvements
- Compliance with CDISC
For Industry
- Open-source collaboration
- Shared algorithms
- Best practices dissemination
Learning Outcomes
✅ Understand {sdtm.oak} architecture
✅ Create simple SDTM domains
✅ Handle complex domain mappings
✅ Use specifications to drive automation
✅ Validate SDTM datasets
✅ Integrate with pharmaverse ecosystem
Future Roadmap
Planned Features:
- More domain templates
- Enhanced automation
- Integration with {admiral}
- Specification validator
- Machine learning-assisted mapping
Getting Help
- GitHub Issues: Report bugs and feature requests
- Pharmaverse Slack: #sdtm-oak channel
- Documentation: Package website
- Examples: Vignettes and demos
Similar Workshops
- datasetjson - Modern CDISC formats
- Building R Packages - Package structure
Next Steps
- After SDTM: Learn Cardinal for TFLs
Last updated: November 2025 | R/Pharma 2025 Conference