Europe/US Sessions
8 thematic sessions with 30+ presentations
Session 1: Practical AI & Industry Adoption
Practical AI for Data Science
Simon Couch (Posit)
Focus on unglamorous but practical AI uses in pharma: structured data extraction, tool calling, and coding assistance. Using {ellmer} for secure, on-premise deployments that protect proprietary information.
Key Points:
- Most AI discourse focuses on flashy applications, but practical uses dominate daily work
- Secure deployments enable AI even with confidential data
- Three core use cases: extraction, tool calling, coding
Resources: github.com/simonpcouch
Session 2: Data Engineering & Training
duckplyr: Analyze Large Data with DuckDB
Kirill Muller (Cynkra)
Stable release (v1.1.2) of {duckplyr} brings DuckDB’s performance to dplyr syntax. Handle larger-than-memory data from disk or cloud with familiar tidyverse semantics.
Key Features:
- Speed up existing dplyr code
- Analyze Parquet/CSV directly
- Access DuckDB functionality
- Maintain R/tidyverse compatibility
Beyond Training: Teaching R Adoption at GSK
Alanah Jonas (GSK)
GSK’s evolution from basic R courses to Rburst - a comprehensive support model embedding R into daily workflows across Biostatistics.
Evolution:
- Two core courses in bookdown
- Self-certification and Resource Hub
- AccelerateR for early adopters
- Rburst for cross-functional integration
Validating Shiny Apps in Regulated Environments
Pedro Silva (Jumping Rivers)
Practical validation approaches for Shiny in clinical/healthcare settings using the Litmusverse suite.
Key Topics:
- Traceability and documentation
- Risk-based validation
- Versioning strategies
- Code quality assessment tools
Beyond {gtsummary}: The {crane} Package
Daniel Sjoberg, Davide Garolini (Genentech)
{crane} extends {gtsummary} for pharmaceutical reporting with ARD-based QC and LLM summarization.
Advantages:
- Instant upgrades when {crane} is loaded
- ARD-based QC is straightforward
- LLMs summarize results for medical writers
- Adapt for your needs or build your own
Resources: danieldsjoberg.com/RinPharma-crane-2025
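
ARD-based QC comes down to comparing two long-format result sets statistic by statistic. The following is a minimal Python sketch of that idea only; it is not {crane} or {cards} code. The record layout (`variable`, `group`, `stat`, `value`), the `compare_ards` helper, the tolerance, and the sample values are all assumptions made for illustration.

```python
import math

def compare_ards(production, qc, rel_tol=1e-8):
    """Return discrepancies between two ARD-like record lists.

    Each record is a dict with hypothetical keys: variable, group, stat, value.
    """
    def index(records):
        # Key every statistic by (variable, group, stat) so records can be matched.
        return {(r["variable"], r["group"], r["stat"]): r["value"] for r in records}

    prod, ref = index(production), index(qc)
    issues = []
    for key in sorted(set(prod) | set(ref)):
        if key not in prod:
            issues.append((key, "missing in production"))
        elif key not in ref:
            issues.append((key, "missing in QC"))
        elif not math.isclose(prod[key], ref[key], rel_tol=rel_tol):
            issues.append((key, f"value mismatch: {prod[key]} vs {ref[key]}"))
    return issues

# Made-up values for illustration; the two sides disagree on n.
production = [
    {"variable": "AGE", "group": "Placebo", "stat": "mean", "value": 54.2},
    {"variable": "AGE", "group": "Placebo", "stat": "n", "value": 86},
]
qc = [
    {"variable": "AGE", "group": "Placebo", "stat": "mean", "value": 54.2},
    {"variable": "AGE", "group": "Placebo", "stat": "n", "value": 85},
]

issues = compare_ards(production, qc)
for key, message in issues:
    print(key, message)
```

Because every statistic is a keyed row rather than a formatted table cell, a mismatch points directly at the variable, group, and statistic that differ.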
Session 3: Advanced AI Systems
{llumen}: Agentic LLM Framework for Biomedical Documents
Sven-Eric Schelhorn (Merck KGaA)
Internal package at Merck enabling pharmaceutical researchers to work with biomedical documents, databases, and foundation models using LLMs as orchestrators.
Capabilities:
- Multi-source LLMs (Azure, OpenAI, local)
- Vector DB with office document support (Word, Excel, PDF)
- RAG with PaperQA2 approach
- Database querying (SQL, Cypher, GraphQL)
- Foundation models (e.g., TxGemma)
- Histopathology image analysis
Use Cases:
- Extract results from clinical trial documents
- Query tabular databases and knowledge graphs
- Biological foundation models integration
- Automated literature review
- Histopathology analysis
- Drug discovery support
Resources: Drive presentation
Build Model Context Protocol Servers in R
John Coene (Opifex)
{mcpr} provides an R implementation of the Model Context Protocol (MCP), a standardized JSON-RPC interface for AI models.
Features:
- Schema-based tool definitions
- Multi-modal responses
- Integration with Claude Code, Cursor, VS Code
- Server and client functionality
Impact: Democratizes AI-R integration with standards-based approach.
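
To make "schema-based tool definitions" over JSON-RPC concrete, here is a minimal Python sketch of a server-side dispatcher. It does not use {mcpr} and does not reproduce its API. The method names `tools/list` and `tools/call` and the `inputSchema` field follow the public MCP specification as understood here; the `summarize_column` tool and the `handle_request` function are hypothetical, and a real server also needs the initialization handshake and a transport layer.

```python
import json

# Hypothetical tool definition: a name, a description, and a JSON Schema for inputs.
TOOL = {
    "name": "summarize_column",
    "description": "Return the mean of a numeric column (illustrative only).",
    "inputSchema": {
        "type": "object",
        "properties": {"values": {"type": "array", "items": {"type": "number"}}},
        "required": ["values"],
    },
}

def handle_request(raw):
    """Dispatch one JSON-RPC 2.0 request string and return the response string."""
    request = json.loads(raw)
    method, params = request.get("method"), request.get("params", {})
    if method == "tools/list":
        result = {"tools": [TOOL]}
    elif method == "tools/call" and params.get("name") == TOOL["name"]:
        values = params["arguments"]["values"]
        mean = sum(values) / len(values)
        result = {"content": [{"type": "text", "text": f"mean = {mean}"}]}
    else:
        # Simplification: anything unrecognized is reported as "method not found".
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": request.get("id"), "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": request.get("id"), "result": result})

call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "summarize_column", "arguments": {"values": [1.0, 2.0, 6.0]}},
}
response = json.loads(handle_request(json.dumps(call)))
print(response)
```

The schema is what lets a client such as an AI coding assistant discover the tool and build valid arguments without any R-specific knowledge.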
Session 4: Automation & Innovation
Mosaic: ARS-Driven Automation of Standard TFLs
Conor Moloney (Novartis)
CDISC Analysis Results Standard (ARS) coupled with open-source stack for automated TFL generation.
Architecture:
- YAML captures ARD requirements (ARS-aligned)
- LinkML validation
- Python storage via SQLAlchemy
- R derives ARD (language-agnostic rules)
- React UI for customization and export
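
The specification-driven flow in the architecture above (declare the requirement, validate it, derive long-format results) can be sketched in a few lines. This is an illustration under stated assumptions, not Mosaic code. A Python dict stands in for the YAML file, a key check stands in for LinkML validation, and the derivation runs in Python here even though the talk describes R doing that step. All field names and data values are hypothetical.

```python
from statistics import mean

# Stand-in for a YAML specification; every field name here is hypothetical.
SPEC = {
    "analysis_id": "AN01",
    "variable": "AGE",
    "group_by": "ARM",
    "statistics": ["n", "mean"],
}

REQUIRED_KEYS = {"analysis_id", "variable", "group_by", "statistics"}
STAT_FUNCS = {"n": len, "mean": mean}

def validate_spec(spec):
    """Minimal check standing in for schema validation."""
    missing = REQUIRED_KEYS - set(spec)
    if missing:
        raise ValueError(f"spec is missing keys: {sorted(missing)}")
    unknown = [s for s in spec["statistics"] if s not in STAT_FUNCS]
    if unknown:
        raise ValueError(f"unsupported statistics: {unknown}")

def derive_ard(spec, rows):
    """Turn subject-level rows into one record per group and statistic."""
    validate_spec(spec)
    groups = {}
    for row in rows:
        groups.setdefault(row[spec["group_by"]], []).append(row[spec["variable"]])
    return [
        {"analysis_id": spec["analysis_id"], "group": group,
         "stat": stat, "value": STAT_FUNCS[stat](values)}
        for group, values in sorted(groups.items())
        for stat in spec["statistics"]
    ]

# Made-up subject-level data for illustration.
rows = [
    {"ARM": "Placebo", "AGE": 50},
    {"ARM": "Placebo", "AGE": 60},
    {"ARM": "Active", "AGE": 40},
]
ard = derive_ard(SPEC, rows)
for record in ard:
    print(record)
```

Keeping the requirement in data rather than in code is what allows the same specification to be validated, stored, and executed by different languages.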
Benefits:
- Replaces ad-hoc programming
- Standards-based pipeline
- Accelerates delivery
- Safeguards traceability
Integrating Collaborative Programming with Traceability
Jennifer Dusendang, Sundeep Bath (Graticule Inc)
Adapting DevOps best practices for epidemiological studies and RWD projects.
Implementation:
- Parameterized pipelines in Docker containers
- CI/CD for automatic reanalysis
- Code review with validated outputs
- Automated traceability and audit trails
Tools: Git, GitHub Actions, SQL, Python, R, Docker, AWS S3
Resources: PharmaSUG 2025 Paper
Session 5: Machine Learning & AI Tools
TabPFN: Deep Learning for Tabular Data
Max Kuhn (Posit)
Version 2 of TabPFN offers Bayesian-like approach for tabular data with significant advantages.
Key Points:
- Trained on simulated tabular datasets
- Fast inference (no training needed)
- Emulates Bayesian posterior
- Notable trade-offs to consider
Resources: topepo.github.io/2025-r-pharma
The LLM Lounge: Live Coding with Databot
Joe Cheng, Eric Nantz
Interactive demonstration of Databot for exploratory data analysis.
Topics:
- Databot origin story
- AI trends in life sciences
- Guiding principles for AI tools
- Audience Q&A format
Resources: github.com/posit-dev/querychat
LLM-Powered {gtsummary}: QC-Ready Tables
Davide Garolini (Roche/NEST)
Workflow fusing {gtsummary}, {cards} validation, and offline LLM helper.
Process:
- Create table with {gtsummary}
- Validate with {cards}
- LLM explains steps and results
- Generate descriptive log
- Rerun with real data unchanged
Benefits: Submission-ready tables in minutes with enhanced clarity.
Post-Approval Drug Exposure Estimation
Feifei Yang, Yu Zhang (AstraZeneca)
R Shiny app automating post-marketing exposure calculations.
Results:
- Report generation: 1 week → 1 day
- Built-in QC functions
- Trend visualization
- Team accessibility without training
Implementing End-to-End NCA Software
Gerardo Rodriguez, Jana Spinner (Lucid Analytics)
aNCA - open-source Non-Compartmental Analysis app within Pharmaverse.
Features:
- Interactive plots for exploration
- Half-life customization
- TLGs and report generation
- 100% test coverage
- Uses PKNCA (200+ PK parameters)
- Industry-standard validation (±0.1%)
Resources: github.com/pharmaverse/aNCA
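
For readers new to NCA, two of the core calculations are the area under the concentration-time curve and the terminal half-life. The sketch below is a simplified Python illustration, not aNCA or PKNCA code. It uses the linear trapezoidal rule and fits the log-linear slope over every point, whereas real NCA software selects the terminal phase more carefully. The `within_tolerance` helper reads the ±0.1% figure as a relative agreement check against a reference value, which is an assumption, and the data are made up.

```python
import math

def auc_linear_trapezoid(times, concs):
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    points = list(zip(times, concs))
    return sum(
        (t2 - t1) * (c1 + c2) / 2
        for (t1, c1), (t2, c2) in zip(points, points[1:])
    )

def terminal_half_life(times, concs):
    """Half-life from a least-squares fit of log(concentration) against time."""
    logs = [math.log(c) for c in concs]
    t_bar = sum(times) / len(times)
    y_bar = sum(logs) / len(logs)
    slope = (
        sum((t - t_bar) * (y - y_bar) for t, y in zip(times, logs))
        / sum((t - t_bar) ** 2 for t in times)
    )
    lambda_z = -slope  # elimination rate constant
    return math.log(2) / lambda_z

def within_tolerance(value, reference, rel_tol=0.001):
    """True when value agrees with the reference to within 0.1% (assumed meaning)."""
    return abs(value - reference) <= rel_tol * abs(reference)

# Made-up profile: concentration halves every 2 time units.
times = [0.0, 2.0, 4.0, 6.0]
concs = [100.0, 50.0, 25.0, 12.5]

auc = auc_linear_trapezoid(times, concs)
half_life = terminal_half_life(times, concs)
print(auc, half_life, within_tolerance(half_life, 2.0))
```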
Integrating LLM for Clinical Data Review
Zhen Wu, Peng Zhang (CIMS Global)
{DataChat} - conversational interface for clinical data with an emphasis on privacy and validity.
Technologies: {ellmer}, {shinychat}, {ragnar}, RAG capabilities
R We There Yet? {admiral}’s Journey to Stability
Edoardo Mancini (Roche)
Discussion on transitioning mature packages from active development to maintenance.
Questions Addressed:
- Sustaining team momentum at stability
- New priorities post-feature completeness
- Lessons from {admiral} evolution
Session 6: Enterprise Transformation
GSK’s Journey to Clinical Study Reporting Using Open Source
Sam Warden, Tim Colman (GSK)
Chronicles GSK’s transformation from SAS-dominated era to open-source innovation.
Key Milestones:
- FDA clarification on software neutrality
- Rise of R and open-source platforms
- GSK’s 50%+ open-source code commitment
- COVID-19 acceleration
Challenges:
- Technical validation
- Regulatory uncertainty
- Cultural resistance
- Governance and training needs
Success Factors: Growth mindset, adaptability, collaborative vision
Session 7: AI Integration Case Studies
Leveraging ellmer and GPT in Shiny for Trials
Xing Chen, Xiaolin Chang (Moderna)
AI-enhanced Shiny app for CMI data across mRNA infectious disease programs.
Features:
- Natural language queries → R operations
- Interactive data exploration
- Customizable visualizations
- No manual coding required
Results:
- Faster insight extraction
- Reduced ad-hoc programming
- Enhanced cross-functional collaboration
autoslideR: Streamlining Slide Deck Generation
Yolanda Zhou, Joe Zhu (Roche)
R package automating slide decks for clinical reporting events.
Benefits:
- 0.5-4 days saved per deck
- Customized layout from templates
- Placeholder slides for rapid prep
- Multiple event types supported
Resources: pharmaverse.github.io/examples/digit_files/autoslider.html
Putting the ‘R’ in RWD
Sachin Heerah, Darren Jeng (Pfizer)
Pfizer RWD team’s R package and tools for programmers with varying backgrounds.
Tools:
- Custom R package for database queries
- Dual R/SAS syntax support
- Shiny apps for workflow support
- Code snippets in RStudio
- Quarto website for documentation
Generating Synthetic Data with synthpop
Sophie Furlow (Abbott Diagnostics)
Introduction to synthetic data generation for pharma and diagnostics.
Topics:
- Synthetic vs simulated vs resampled data
- Healthcare applications
- {synthpop} machine learning algorithms
- Quality evaluation features
- Generation caveats
Session 8: Innovation & Advanced Methods
GenAI in Production
Devin Pastoor (A2-AI)
Moving beyond prototypes to production AI applications in GxP contexts.
Topics:
- Testing approaches for GenAI
- Validation strategies
- Handling user interaction flexibility
- Release and maintenance
Building the Ultimate R AI Assistant
Pawel Rucki (Roche)
Multi-agent R co-pilot with LangGraph framework.
Architecture:
- Each R package gets its own AI agent
- Network of specialized agents
- Context-aware code generation
- Integration with Cursor via MCP
Features: Write, debug, and explain complex R code for clinical trials
The Dependency Whisperer: AI for Impact Analysis
Ming Yan, Vina Ro (Eli Lilly)
AI-powered tool for identifying dependencies in clinical programming.
Problem: Traditional tools can’t distinguish active code from comments or identify indirect dependencies.
Solution:
- Parses SDTM, ADaM, TFL specs/programs
- Learns dependency structure
- Generates graphical impact reports
- Ensures accurate refreshes
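
The two problems named above, commented-out code and indirect dependencies, can be illustrated with a small rule-based sketch. This is a deterministic stand-in written in Python, not the AI-powered tool from the talk. The `read("...")` and `write(obj, "...")` call shapes, the program names, and the dataset names are hypothetical.

```python
import re
from collections import deque

READ_RE = re.compile(r'read\("([^"]+)"\)')
WRITE_RE = re.compile(r'write\(\w+,\s*"([^"]+)"\)')

# Hypothetical programs; adae.R contains a commented-out read of sdtm.dm.
PROGRAMS = {
    "adsl.R": 'dm <- read("sdtm.dm")\nwrite(adsl, "adam.adsl")\n',
    "adae.R": 'adsl <- read("adam.adsl")\n# old <- read("sdtm.dm")\nwrite(adae, "adam.adae")\n',
    "t_ae.R": 'adae <- read("adam.adae")\nwrite(tbl, "tfl.t_ae")\n',
}

def active_lines(source):
    """Drop text after '#' so commented-out reads are ignored (naive: no string handling)."""
    return "\n".join(line.split("#", 1)[0] for line in source.splitlines())

def build_graph(programs):
    """Map each input dataset to the datasets written by programs that read it."""
    graph = {}
    for source in programs.values():
        code = active_lines(source)
        outputs = WRITE_RE.findall(code)
        for src in READ_RE.findall(code):
            graph.setdefault(src, set()).update(outputs)
    return graph

def impacted(graph, changed):
    """Breadth-first search for everything downstream of a changed dataset."""
    seen, queue = set(), deque([changed])
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

graph = build_graph(PROGRAMS)
print("direct:", sorted(graph["sdtm.dm"]))
print("all impacted:", sorted(impacted(graph, "sdtm.dm")))
```

With comments stripped, only adam.adsl depends directly on sdtm.dm; the traversal then reaches adam.adae and tfl.t_ae as indirect dependents.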
BayesERtools: Bayesian Exposure-Response Analysis
Kenta Yoshida (Genentech)
New R package for Bayesian ER analysis with user-friendly interface.
Features:
- Linear and Emax models
- Continuous and binary endpoints
- Based on Stan ecosystem
- Comprehensive online book (BayesERbook)
Resources: genentech.github.io/BayesERtools
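
As background for the Emax models listed above, the standard Emax mean function is E(C) = E0 + Emax * C / (EC50 + C). The Python sketch below only evaluates that curve; it performs no Bayesian fitting and does not use BayesERtools or Stan. The binary-endpoint version, which applies an inverse logit to the same predictor, is one common formulation and an assumption here rather than something confirmed from the package. Parameter values are made up.

```python
import math

def emax_response(exposure, e0, emax, ec50):
    """Emax mean function: baseline plus a saturating exposure effect."""
    return e0 + emax * exposure / (ec50 + exposure)

def emax_probability(exposure, e0, emax, ec50):
    """Binary endpoint (assumed form): inverse logit of the Emax predictor."""
    return 1 / (1 + math.exp(-emax_response(exposure, e0, emax, ec50)))

# Made-up parameters for illustration.
e0, emax, ec50 = 1.0, 4.0, 10.0

for exposure in (0.0, 10.0, 1000.0):
    print(exposure, emax_response(exposure, e0, emax, ec50))
```

At zero exposure the response equals the baseline E0, at an exposure equal to EC50 it reaches half of the maximal effect above baseline, and at high exposure it approaches E0 + Emax.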
Adapting to Regulatory Guidance
Alex Przybylski (Novartis)
FDA 2023 guidance on covariate adjustment and R-enabled submissions.
Case Study:
- Strategic response to FDA feedback
- Lightweight R solutions ({beeca})
- ASA-BIOP collaboration ({RobinCar2})
- Benefits of open-source community
R Library Validation using ATDD
Brian Repko (ex-Novartis)
Acceptance-Test Driven Development for R package libraries.
Approach:
- Tests written in Quarto markdown
- “Given-when-then” format
- Regex-matched against annotated functions
- Can drive Shiny apps with {chromote}
Benefits: Plain-language tests shareable with regulators
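
The approach above, plain-language "given-when-then" steps matched by regex to annotated functions, can be sketched as a small step registry. This is a generic Python illustration of the pattern, not the speaker's implementation. A decorator stands in for the function annotations, a string stands in for the Quarto document, and the dose-doubling scenario is invented.

```python
import re

STEPS = []  # (compiled pattern, function) pairs, filled in by the decorator

def step(pattern):
    """Annotate a function with the sentence pattern it implements."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register

@step(r"Given a dose of (\d+) mg")
def given_dose(ctx, dose):
    ctx["dose"] = int(dose)

@step(r"When the dose is doubled")
def when_doubled(ctx):
    ctx["result"] = ctx["dose"] * 2

@step(r"Then the result is (\d+) mg")
def then_result(ctx, expected):
    assert ctx["result"] == int(expected), f"expected {expected}, got {ctx['result']}"

def run_scenario(text):
    """Run each non-empty line through the first step whose pattern matches it."""
    ctx = {}
    for line in filter(None, (raw.strip() for raw in text.splitlines())):
        for pattern, func in STEPS:
            match = pattern.fullmatch(line)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no step matches: {line!r}")
    return ctx

scenario = """
Given a dose of 50 mg
When the dose is doubled
Then the result is 100 mg
"""
ctx = run_scenario(scenario)
print(ctx)
```

The scenario text stays readable by non-programmers, while any sentence without a matching step fails loudly instead of being silently skipped.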
🔑 Key Themes Across Sessions
1. AI Integration
- Practical, secure implementations
- Agentic systems and MCP
- Privacy-preserving approaches
- Production-ready validation
2. Open Source Adoption
- GSK leading with 50%+ open-source code commitment
- Industry-wide shift from SAS
- Collaborative development (pharmaverse)
- FDA acceptance growing
3. Automation & Efficiency
- ARS-driven TFL generation
- Automated slide decks
- One-day turnaround for reports
- CI/CD for clinical workflows
4. Validation & Compliance
- GxP-ready AI applications
- Risk-based validation approaches
- Shared validation repositories
- ATDD for package libraries
5. Advanced Analytics
- Bayesian methods with Stan
- Machine learning (TabPFN)
- Synthetic data generation
- High-performance computing
Europe/US Sessions from R/Pharma 2025 Conference | Last updated: November 2025