SDTM Programming in R using {sdtm.oak}

EDC-agnostic SDTM dataset creation with reusable algorithms

CDISC
SDTM
Intermediate
Author

Rammprasad Ganapathy (Principal Data Scientist)

Overview

{sdtm.oak} is an EDC- and data standard-agnostic solution that enables pharmaceutical programmers to develop CDISC SDTM datasets in R. The package offers a modular programming framework built on reusable algorithms, which can potentially automate SDTM creation from standard specifications.

What You’ll Learn

  • 🌳 {sdtm.oak} fundamentals - V0.1 package overview
  • 🔧 Modular programming - Reusable algorithm approach
  • 📊 SDTM creation - From raw data to standard domains
  • 📝 EDC-agnostic design - Works with any data source
  • 🎯 Automation potential - Standards-based generation

Prerequisites

Required Knowledge:

  • Intermediate R programming
  • CDISC SDTM standards familiarity
  • Clinical trial data structure understanding

Helpful:

  • Experience with SDTM programming
  • Knowledge of EDC systems

Key Package

{sdtm.oak}

Pharmaverse

Workshop Materials

Resources

Workshop Slides: pharmaverse.github.io/rinpharma-SDTM-workshop

R Environment: Provided for workshop participants (no pre-installation needed)

Package Philosophy

EDC-Agnostic Approach

Works with data from any source:

  • ✅ Medidata Rave
  • ✅ Oracle Clinical
  • ✅ Veeva Vault
  • ✅ Custom EDC systems
  • ✅ CSV files

Reusable Algorithms

Instead of custom code for each study:

  • Define once, use many times
  • Standards-based transformations
  • Metadata-driven approach
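
The "define once, use many times" idea can be sketched in base R. This is a minimal illustration, not the {sdtm.oak} API: the function `recode_with_lookup()`, the lookup tables, and the raw column names are all hypothetical.

```r
# Hypothetical reusable algorithm: recode raw values through a lookup table.
# Values that are missing from the lookup become NA so they can be reviewed.
recode_with_lookup <- function(raw_values, lookup) {
  unname(lookup[as.character(raw_values)])
}

# Assumed lookup tables and raw data, for illustration only
sex_lookup <- c(Male = "M", Female = "F")
yn_lookup <- c(Yes = "Y", No = "N")

raw <- data.frame(
  SEX_RAW = c("Male", "Female", "Other"),
  SERIOUS_RAW = c("No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# The same function serves two different target variables.
sex <- recode_with_lookup(raw$SEX_RAW, sex_lookup)
aeser <- recode_with_lookup(raw$SERIOUS_RAW, yn_lookup)
```

Writing the transformation once and passing the lookup in as data is what lets the same code be reused across variables and studies.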

Modular Framework

Break SDTM creation into components:

  • Data reading
  • Domain mapping
  • Variable derivation
  • Standardization

Key Features

1. Domain Creation

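library(sdtm.oak)

# Start the Demographics (DM) domain by mapping one raw column.
# raw_subjects and its SUBJID and SEX columns are assumed names.
raw_subjects <- generate_oak_id_vars(raw_subjects, pat_var = "SUBJID", raw_src = "subjects")
dm <- assign_no_ct(
  raw_dat = raw_subjects,
  raw_var = "SEX",
  tgt_var = "SEX"
)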

2. Variable Mapping

Automatic mapping from raw to SDTM:

  • USUBJID derivation
  • Date standardization
  • Controlled terminology
  • Unit conversions
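
To make the mapping steps above concrete, here is a base R sketch of three of them: USUBJID derivation, date standardization to ISO 8601, and a unit conversion. It does not use {sdtm.oak} functions. The study ID, the raw column names, the raw date format, and the STUDYID-SITEID-SUBJID convention for USUBJID are assumptions.

```r
# Assumed raw subject data, for illustration only
raw_subjects <- data.frame(
  SUBJID = c("001", "002"),
  SITEID = c("101", "102"),
  BIRTH_DATE = c("15/03/1980", "02/11/1975"),  # assumed dd/mm/yyyy
  HEIGHT_IN = c(70, 65),                       # assumed inches
  stringsAsFactors = FALSE
)
study_id <- "STUDY01"  # assumed study identifier

# USUBJID: one common convention is STUDYID-SITEID-SUBJID
usubjid <- paste(study_id, raw_subjects$SITEID, raw_subjects$SUBJID, sep = "-")

# Date standardization: parse the raw format, then write ISO 8601 (YYYY-MM-DD)
brthdtc <- format(as.Date(raw_subjects$BIRTH_DATE, format = "%d/%m/%Y"), "%Y-%m-%d")

# Unit conversion: inches to centimetres
height_cm <- raw_subjects$HEIGHT_IN * 2.54
```

Real studies often contain partial dates, which `as.Date()` turns into NA. Handling them needs a dedicated date algorithm rather than this simple parse.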

3. Validation

Built-in checks for:

  • Required variables
  • Data types
  • Value constraints
  • SDTM conformance
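
The kinds of checks listed above can be illustrated with a small base R function. This is a sketch under stated assumptions, not the package's validation API: `check_domain()` is a hypothetical helper, and the required-variable list and allowed SEX values are an illustrative subset rather than the full standard.

```r
# Hypothetical checker: returns a character vector of findings (empty if clean)
check_domain <- function(data, required_vars, expected_modes, allowed_values) {
  findings <- character(0)

  # Required variables
  missing_vars <- setdiff(required_vars, names(data))
  if (length(missing_vars) > 0) {
    findings <- c(findings, paste("Missing required variable:", missing_vars))
  }

  # Data types, compared by storage mode ("numeric" or "character")
  for (var in intersect(names(expected_modes), names(data))) {
    actual <- mode(data[[var]])
    if (actual != expected_modes[[var]]) {
      findings <- c(findings, paste0(var, " should be ", expected_modes[[var]], " but is ", actual))
    }
  }

  # Value constraints
  for (var in intersect(names(allowed_values), names(data))) {
    bad <- setdiff(unique(data[[var]]), allowed_values[[var]])
    if (length(bad) > 0) {
      findings <- c(findings, paste0(var, " has unexpected value: ", bad))
    }
  }
  findings
}

# Assumed DM data with three deliberate problems
dm <- data.frame(
  STUDYID = "STUDY01",
  DOMAIN = "DM",
  USUBJID = c("STUDY01-101-001", "STUDY01-102-002"),
  SEX = c("M", "X"),
  AGE = c("45", "51"),
  stringsAsFactors = FALSE
)

findings <- check_domain(
  dm,
  required_vars = c("STUDYID", "DOMAIN", "USUBJID", "SUBJID", "SEX"),
  expected_modes = c(AGE = "numeric"),
  allowed_values = list(SEX = c("M", "F", "U"))  # illustrative subset
)
```

Returning findings as data, rather than stopping at the first error, makes it easier to report every issue in a domain at once.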

4. Metadata-Driven

Use SDTM specifications as input:

  • Excel specification sheets
  • Standard templates
  • Reusable across studies
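
A metadata-driven mapping can be sketched as a specification table that names, for each target variable, the raw column and the transformation to apply. The example below is base R only and uses hypothetical names throughout (`apply_spec()`, the spec columns, the transform labels). In practice the specification would be read from a file rather than typed inline.

```r
# Assumed specification: one row per SDTM target variable
spec <- data.frame(
  target = c("AETERM", "AESEV"),
  raw_column = c("AE_TERM", "AE_SEVERITY"),
  transform = c("none", "upper"),
  stringsAsFactors = FALSE
)

# Assumed raw data
raw_ae <- data.frame(
  AE_TERM = c("Headache", "Nausea"),
  AE_SEVERITY = c("Mild", "Severe"),
  stringsAsFactors = FALSE
)

# Registry of reusable transformations, keyed by the labels used in the spec
transforms <- list(none = identity, upper = toupper)

# Hypothetical driver: builds the target dataset row by row of the spec
apply_spec <- function(raw, spec, transforms) {
  out <- data.frame(row.names = seq_len(nrow(raw)))
  for (i in seq_len(nrow(spec))) {
    fn <- transforms[[spec$transform[i]]]
    out[[spec$target[i]]] <- fn(raw[[spec$raw_column[i]]])
  }
  out
}

ae <- apply_spec(raw_ae, spec, transforms)
```

Because the code never mentions AE-specific names, a different study or domain only needs a different specification table.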

Workshop Content

Module 1: Package Introduction

  • {sdtm.oak} architecture
  • Installation and setup
  • Key functions overview
  • Integration with pharmaverse

Module 2: Simple Domain Creation

Hands-on: Demographics (DM)

Create DM domain from raw subject data:

  • Subject identifiers
  • Demographics variables
  • Date formatting
  • Controlled terminology

Module 3: Complex Domains

Hands-on: Adverse Events (AE)

  • Multiple records per subject
  • Start/end dates
  • Severity grading
  • Relationship coding
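
Because AE holds multiple records per subject, each record needs a sequence number (AESEQ) that is unique within the subject. {sdtm.oak} provides `derive_seq()` for this. The base R sketch below only illustrates the underlying logic, with made-up subject IDs and terms, and assumes records are ordered by start date and then term.

```r
# Assumed AE records, deliberately out of order
ae <- data.frame(
  USUBJID = c("S-002", "S-001", "S-001", "S-002"),
  AETERM = c("Nausea", "Headache", "Dizziness", "Fatigue"),
  AESTDTC = c("2025-02-10", "2025-01-05", "2025-01-20", "2025-01-15"),
  stringsAsFactors = FALSE
)

# Sort by subject, then start date, then term.
# ISO 8601 dates sort correctly as plain text.
ae <- ae[order(ae$USUBJID, ae$AESTDTC, ae$AETERM), ]
rownames(ae) <- NULL

# Number the records 1, 2, ... within each subject
ae$AESEQ <- ave(seq_len(nrow(ae)), ae$USUBJID, FUN = seq_along)
```

Sorting before numbering keeps AESEQ stable across reruns, which matters when other datasets reference a record by its sequence number.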

Module 4: Automation

  • Using specifications to drive creation
  • Batch processing multiple domains
  • Quality checks
  • Documentation generation

Practical Example

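library(sdtm.oak)
library(dplyr)
library(haven) # provides write_xpt()

# Illustrative sketch: file names, raw column names, the study ID, and the
# raw date format are assumptions, not workshop data.

# Load raw data and add the oak ID variables that link raw records to mappings
raw_ae <- read.csv("adverse_events.csv") %>%
  generate_oak_id_vars(pat_var = "SUBJID", raw_src = "adverse_events")

# Load the study controlled terminology specification
study_ct <- read_ct_spec("study_ct.csv")

# Create AE domain by chaining one mapping algorithm per variable.
# MedDRA coding (AEDECOD) is assumed to happen outside this step.
ae <- assign_no_ct(raw_dat = raw_ae, raw_var = "AE_TERM", tgt_var = "AETERM") %>%
  assign_ct(
    raw_dat = raw_ae, raw_var = "AE_SEVERITY", tgt_var = "AESEV",
    ct_spec = study_ct, ct_clst = "C66769"
  ) %>%
  assign_datetime(
    raw_dat = raw_ae, raw_var = "AE_START_DATE", tgt_var = "AESTDTC",
    raw_fmt = "y-m-d"
  ) %>%
  assign_datetime(
    raw_dat = raw_ae, raw_var = "AE_END_DATE", tgt_var = "AEENDTC",
    raw_fmt = "y-m-d"
  ) %>%
  mutate(
    STUDYID = "STUDY01",
    DOMAIN = "AE",
    USUBJID = paste(STUDYID, patient_number, sep = "-")
  ) %>%
  derive_seq(tgt_var = "AESEQ", rec_vars = c("USUBJID", "AETERM"))

# Export to XPT
write_xpt(ae, "ae.xpt")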

Benefits

For Programmers

  • Less custom coding - Reuse algorithms
  • Faster development - Metadata-driven
  • Fewer errors - Automated validation
  • Better documentation - Standards-based

For Organizations

  • Consistency across studies
  • Efficiency gains
  • Quality improvements
  • Compliance with CDISC

For Industry

  • Open-source collaboration
  • Shared algorithms
  • Best practices dissemination

Learning Outcomes

✅ Understand {sdtm.oak} architecture
✅ Create simple SDTM domains
✅ Handle complex domain mappings
✅ Use specifications to drive automation
✅ Validate SDTM datasets
✅ Integrate with pharmaverse ecosystem

Future Roadmap

Planned Features:

  • More domain templates
  • Enhanced automation
  • Integration with {admiral}
  • Specification validator
  • Machine learning-assisted mapping

Getting Help

  • GitHub Issues: Report bugs and requests
  • Pharmaverse Slack: #sdtm-oak channel
  • Documentation: Package website
  • Examples: Vignettes and demos


Last updated: November 2025 | R/Pharma 2025 Conference