Python for Clinical Study Report and Submission

Modern Python toolchain for TLFs and eCTD packages

Python
Clinical Reporting
Intermediate
Authors

Nan Xiao (Statistician, Merck)

Yilong Zhang (Biostatistician, Meta)

Overview

Open-source Python offers powerful capabilities for clinical trial analysis and reporting. This workshop introduces practical strategies for preparing tables, listings, and figures (TLFs) in a Clinical Study Report (CSR) and assembling submission-ready eCTD packages.

What You'll Learn

  • 🐍 Python environment setup with uv
  • 📊 Clinical data engineering with polars
  • 📈 TLF creation with plotnine and rtflite
  • 📦 eCTD packages with py-pkglite
  • 🔄 Reproducible workflows end-to-end

Prerequisites

Required Knowledge:

  • Basic Python programming
  • Understanding of clinical trial analysis
  • Familiarity with TLFs

Helpful:

  • R experience (for comparison)
  • CDISC standards knowledge

Key Tools

Python

uv

polars

plotnine

rtflite

py-pkglite

Workshop Materials

Resources

Workshop Slides: pycsr.org/slides/workshop-slides.html

Online Book: Python for Clinical Study Reports and Submission - pycsr.org

GitHub: github.com/nanxstats/pycsr

Development: GitHub Codespaces, VS Code, or Positron

Workshop Modules

Module 1: Python Environment Setup

Using uv for project management:

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create project
uv init my-clinical-trial
cd my-clinical-trial

# Add dependencies
uv add polars plotnine rtflite

# Run scripts
uv run analysis.py

Benefits:

  • ✅ Fast dependency resolution
  • ✅ Reproducible environments
  • ✅ No manual conda/virtualenv management (uv creates and manages the project environment)
  • ✅ Lock files for exact versions

Module 2: Clinical Reporting Packages

polars - High-Performance DataFrames

import polars as pl

# Read CDISC data
adsl = pl.read_parquet("data/adsl.parquet")

# Fast data manipulation
demographics = (
    adsl
    .filter(pl.col("SAFFL") == "Y")
    .group_by("ARM")
    .agg([
        pl.len().alias("N"),  # pl.count() is deprecated for row counts
        pl.col("AGE").mean().alias("Age_Mean"),
        pl.col("AGE").std().alias("Age_SD")
    ])
)

Why polars?

  • Often much faster than pandas on large datasets (the gain depends on the workload)
  • Lazy evaluation
  • Better memory efficiency
  • Expressive API

plotnine - Grammar of Graphics

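from plotnine import (
    aes, geom_ribbon, geom_step, ggplot, labs, theme_minimal,
)

# Kaplan-Meier plot
# survival_data is assumed to be a DataFrame with the columns
# time, survival, lower, upper, and arm
(
    ggplot(survival_data, aes(x="time", y="survival", color="arm"))
    + geom_step(size=1)
    + geom_ribbon(aes(ymin="lower", ymax="upper", fill="arm"), alpha=0.2)
    + labs(title="Overall Survival", x="Time (months)", y="Probability")
    + theme_minimal()
)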

plotnine brings the ggplot2 grammar of graphics to Python.

rtflite - RTF Generation

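import rtflite as rtf

# demographics_df is assumed to be a polars DataFrame holding the summary table.
# Class and method names follow the rtflite documentation;
# verify them against your installed version.

# Create RTF document with a table built from the data frame
doc = rtf.RTFDocument(df=demographics_df)

# Save
doc.write_rtf("demographics.rtf")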

Module 3: Complete Project Management

Project Structure:

my-trial/
├── data/           # CDISC datasets
├── src/            # Python scripts
│   ├── tables/
│   ├── listings/
│   └── figures/
├── outputs/        # Generated TLFs
├── pyproject.toml  # Dependencies
└── README.md

Execution:

# main.py
from src.tables import demographics, adverse_events
from src.figures import km_plot

# Generate all outputs
demographics.create()
adverse_events.create()
km_plot.create()

Module 4: eCTD Submission Packages

py-pkglite for packaging:

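import pkglite as pkg

# Create submission package: pack the directories into one text file.
# Function names follow the py-pkglite documentation; verify them against
# your installed version. File selection is governed by a .pkgliteignore
# file in each directory rather than by a pattern argument.
pkg.pack(["src", "outputs"], output_file="submission.txt")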

# Includes source code + outputs
# Aligned with eCTD requirements

Practical Exercises

Exercise 1: Demographics Table

Create a standard demographics table:

  • Age (mean, SD)
  • Sex (n, %)
  • Race (n, %)
  • By treatment arm

Exercise 2: Adverse Events Listing

Generate an AE listing with:

  • Subject ID
  • AE term
  • Start/end dates
  • Severity
  • Relationship

Exercise 3: Survival Analysis Figure

Create a Kaplan-Meier plot:

  • Survival curves by arm
  • Confidence intervals
  • Risk table
  • Publication quality
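
The survival probabilities behind such a plot come from the Kaplan-Meier (product-limit) estimator. The sketch below uses only the standard library to show the arithmetic for one arm; `kaplan_meier` is a hypothetical helper, the data are invented, and confidence intervals and the risk table are left out. For real analyses a validated survival library would normally be used.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times:  follow-up time per subject
    events: 1 if the event was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # subjects leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after time t
        at_risk -= exits[t]
    return curve


# Toy data for a single arm (months); invented for illustration only
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 1, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Running the helper once per arm and stacking the results gives the `time`, `survival`, and `arm` columns that a step plot needs.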

Exercise 4: Full Submission Package

Assemble an eCTD package:

  • All TLFs
  • Source code
  • Documentation
  • Validation artifacts
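
One possible validation artifact is a checksum manifest of the files that go into the package. The sketch below is an assumption about what such an artifact could look like, not a py-pkglite feature or an eCTD requirement; `build_manifest` is a hypothetical helper and the demo files are created in a temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root: Path, patterns=("*.py", "*.rtf")) -> dict[str, str]:
    """Map each matching file's relative path to its SHA-256 hex digest."""
    manifest = {}
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            key = path.relative_to(root).as_posix()
            manifest[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


# Demo on a throwaway project layout (file contents are placeholders)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "outputs").mkdir()
    (root / "src" / "demographics.py").write_bytes(b"print('demo')\n")
    (root / "outputs" / "demographics.rtf").write_bytes(b"{\\rtf1 demo}\n")

    manifest = build_manifest(root)
    for name, digest in manifest.items():
        print(digest[:12], name)
```

Regenerating the manifest after unpacking and comparing digests is a simple way to confirm that the files survived the round trip unchanged.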

Data Sources

CDISC Pilot Study:

  • Publicly available
  • Standard structure (SDTM/ADaM)
  • Realistic scenarios
  • Pre-converted to Parquet

Location: github.com/nanxstats/pycsr/tree/main/data

Python vs R

When to Use Python

✅ Large datasets (polars performance)
✅ ML/AI integration needed
✅ Team already uses Python
✅ Cloud-native deployments

When to Use R

✅ Statistical depth required
✅ Established R workflows
✅ Pharmaverse ecosystem
✅ Regulatory precedent

Best Approach

Use both! Many organizations are adopting a hybrid approach:

  • R for statistical analysis
  • Python for data engineering
  • Shared CDISC data formats

Learning Outcomes

✅ Set up reproducible Python projects with uv
✅ Process clinical data efficiently with polars
✅ Create TLFs with plotnine and rtflite
✅ Manage analysis and reporting (A&R) projects professionally
✅ Prepare eCTD submission packages
✅ Understand Python's role in clinical trials

Resources

Book Chapters:

  1. Python Setup and Environment
  2. Essential Packages for Clinical Reporting
  3. Project Structure and Workflow
  4. Creating Tables
  5. Creating Listings
  6. Creating Figures
  7. Submission Package Assembly

Community:

  • pharmaverse-py initiative
  • Python in Pharma meetups
  • Stack Overflow [python] + [clinical-trials]

Next Steps

  • Complete pycsr.org tutorials
  • Try on your own data
  • Explore pharmaverse-py
  • Contribute to open-source Python clinical tools

Last updated: November 2025 | R/Pharma 2025 Conference