Python for Clinical Study Reports and Submission
Modern Python toolchain for TFLs and eCTD packages
Overview
Intermediate Python Clinical Reporting
Open-source Python offers powerful capabilities for clinical trial analysis and reporting. This workshop introduces practical strategies for preparing tables, listings, and figures (TLFs, also written TFLs) in a Clinical Study Report (CSR) and assembling submission-ready eCTD packages.
What You'll Learn
- Python environment setup with uv
- Clinical data engineering with polars
- TLF creation with plotnine and rtflite
- eCTD packages with py-pkglite
- Reproducible workflows end-to-end
Prerequisites
Required Knowledge:
- Basic Python programming
- Understanding of clinical trial analysis
- Familiarity with TFLs
Helpful:
- R experience (for comparison)
- CDISC standards knowledge
Key Tools
Python
uv
polars
plotnine
rtflite
py-pkglite
Workshop Materials
Workshop Slides: pycsr.org/slides/workshop-slides.html
Online Book: Python for Clinical Study Reports and Submission - pycsr.org
GitHub: github.com/nanxstats/pycsr
Development: GitHub Codespaces, VS Code, or Positron
Workshop Modules
Module 1: Python Environment Setup
Using uv for project management:
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create project
uv init my-clinical-trial
cd my-clinical-trial
# Add dependencies
uv add polars plotnine rtflite
# Run scripts
uv run analysis.py
Benefits:
- Fast dependency resolution
- Reproducible environments
- No conda/virtualenv complexity
- Lock files for exact versions
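One way to make the reproducibility point concrete is to record the versions of the key packages next to the generated outputs. The sketch below uses only the standard library. The helper name `snapshot_versions` and the package list are illustrative assumptions, not part of uv itself.

```python
from importlib import metadata


def snapshot_versions(packages):
    """Return {package: version or None} for the given distribution names."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = snapshot_versions(["polars", "plotnine", "rtflite"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Run inside the project with `uv run`, this reports the versions resolved in the project environment, which should agree with the lock file.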
Module 2: Clinical Reporting Packages
polars - High-Performance DataFrames
import polars as pl
# Read CDISC data
adsl = pl.read_parquet("data/adsl.parquet")
# Fast data manipulation
demographics = (
    adsl
    .filter(pl.col("SAFFL") == "Y")
    .group_by("ARM")
    .agg([
        pl.len().alias("N"),  # pl.len() replaces the deprecated pl.count()
        pl.col("AGE").mean().alias("Age_Mean"),
        pl.col("AGE").std().alias("Age_SD"),
    ])
)
Why polars?
- Often substantially faster than pandas, especially on large datasets (gains depend on the workload)
- Lazy evaluation
- Better memory efficiency
- Expressive API
plotnine - Grammar of Graphics
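from plotnine import aes, geom_ribbon, geom_step, ggplot, labs, theme_minimal
# Kaplan-Meier plot
# survival_data: a DataFrame with columns time, survival, lower, upper, arm
(
    ggplot(survival_data, aes(x="time", y="survival", color="arm"))
    + geom_step(size=1)
    + geom_ribbon(aes(ymin="lower", ymax="upper", fill="arm"), alpha=0.2)
    + labs(title="Overall Survival", x="Time (months)", y="Probability")
    + theme_minimal()
)
plotnine brings the ggplot2 grammar of graphics to Python.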
rtflite - RTF Generation
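import rtflite as rtf
# Create an RTF document from the summary DataFrame
# (demographics_df is assumed to be a polars DataFrame; confirm the
# RTFDocument arguments against the rtflite documentation)
doc = rtf.RTFDocument(df=demographics_df)
# Save
doc.write_rtf("demographics.rtf")
Module 3: Complete Project Management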
Project Structure:
my-trial/
├── data/              # CDISC datasets
├── src/               # Python scripts
│   ├── tables/
│   ├── listings/
│   └── figures/
├── outputs/           # Generated TFLs
├── pyproject.toml     # Dependencies
└── README.md
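The layout above can be created by hand or with a small script. The following is a minimal sketch using only the standard library. The function name `scaffold` is hypothetical, and `pyproject.toml` and `README.md` are left to `uv init`.

```python
from pathlib import Path

# Directory names follow the project layout shown above
SUBDIRS = ["data", "src/tables", "src/listings", "src/figures", "outputs"]


def scaffold(root):
    """Create the project directories under root and return their paths."""
    root = Path(root)
    created = []
    for sub in SUBDIRS:
        path = root / sub
        # parents=True creates src/ as needed; exist_ok makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for path in scaffold("my-trial"):
        print(path)
```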
Execution:
# main.py
from src.tables import demographics, adverse_events
from src.figures import km_plot
# Generate all outputs
demographics.create()
adverse_events.create()
km_plot.create()
Module 4: eCTD Submission Packages
py-pkglite for packaging:
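from pkglite import pack, use_pkglite
# Assumed py-pkglite API (use_pkglite, pack); confirm against the package documentation
# Directories to include in the submission package
dirs = ["src", "outputs"]
# Write a default .pkgliteignore file into each directory
use_pkglite(dirs)
# Pack the directories into a single text file
pack(dirs, output_file="submission.txt")
# Includes source code + outputs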
# Aligned with eCTD requirements
Practical Exercises
Exercise 1: Demographics Table
Create standard demographics table:
- Age (mean, SD)
- Sex (n, %)
- Race (n, %)
- By treatment arm
Exercise 2: Adverse Events Listing
Generate AE listing with:
- Subject ID
- AE term
- Start/end dates
- Severity
- Relationship
Exercise 3: Survival Analysis Figure
Create Kaplan-Meier plot:
- Survival curves by arm
- Confidence intervals
- Risk table
- Publication quality
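Before plotting, the survival curve itself has to be estimated. The sketch below implements the Kaplan-Meier (product-limit) estimate in plain Python to show the arithmetic behind the curve. It is illustrative only: it omits confidence intervals and the risk table, and a validated survival analysis package would normally be used in practice. The function name `kaplan_meier` is an assumption for this example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each subject
    events: 1 if the event was observed, 0 if the subject was censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every subject leaves the risk set at their own time

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Remove events and censored subjects at t from the risk set
        at_risk -= exits[t]
    return curve


curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

Computed separately per arm, the resulting pairs can fill the `time` and `survival` columns of a plotting data frame.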
Exercise 4: Full Submission Package
Assemble eCTD package:
- All TFLs
- Source code
- Documentation
- Validation artifacts
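One simple validation artifact is a checksum manifest of the generated outputs, so reviewers can confirm that files were not altered after generation. The sketch below is a hypothetical helper built on the standard library. The name `build_manifest` and the choice of SHA-256 are assumptions, not a regulatory requirement.

```python
import hashlib
from pathlib import Path


def build_manifest(directory, pattern="*.rtf"):
    """Map each matching file name to the SHA-256 hex digest of its bytes."""
    manifest = {}
    for path in sorted(Path(directory).glob(pattern)):
        manifest[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


if __name__ == "__main__":
    for name, digest in build_manifest("outputs").items():
        print(f"{digest}  {name}")
```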
Data Sources
CDISC Pilot Study:
- Publicly available
- Standard structure (SDTM/ADaM)
- Realistic scenarios
- Pre-converted to Parquet
Python vs R
When to Use Python
- Large datasets (polars performance)
- ML/AI integration needed
- Team already uses Python
- Cloud-native deployments
When to Use R
- Statistical depth required
- Established R workflows
- Pharmaverse ecosystem
- Regulatory precedent
Best Approach
Use both! Many organizations are adopting a hybrid approach:
- R for statistical analysis
- Python for data engineering
- Shared CDISC data formats
Learning Outcomes
- Set up reproducible Python projects with uv
- Process clinical data efficiently with polars
- Create TFLs with plotnine and rtflite
- Manage A&R projects professionally
- Prepare eCTD submission packages
- Understand Python's role in clinical trials
Resources
Book Chapters:
- Python Setup and Environment
- Essential Packages for Clinical Reporting
- Project Structure and Workflow
- Creating Tables
- Creating Listings
- Creating Figures
- Submission Package Assembly
Community:
- pharmaverse-py initiative
- Python in Pharma meetups
- Stack Overflow [python] + [clinical-trials]
Next Steps
- Complete pycsr.org tutorials
- Try on your own data
- Explore pharmaverse-py
- Contribute to open-source Python clinical tools
Similar Workshops
- Polars: Python Framework - Deep dive on polars
- datasetjson - Data exchange
- R equivalent: See officer/flextable and Cardinal
Last updated: November 2025 | R/Pharma 2025 Conference