Trends & Insights

Key themes and future directions from R/Pharma 2025

Executive Summary

R/Pharma 2025 marked a pivotal moment in pharmaceutical R adoption, with three dominant themes emerging: AI/LLM integration, open-source transformation, and regulatory acceptance. The conference showcased 19 workshops and 30+ presentations revealing an industry in rapid transition from proprietary tools to collaborative, standards-based approaches.

Key Statistics:

  • 🤖 8+ sessions on AI/LLM (50% increase from previous years)
  • 📊 GSK achieved 50%+ R code across Biostatistics
  • ✅ FDA neutrality on statistical software increasingly normalized
  • 🌍 CDISC ARS/ARM driving automation and standardization

1. The AI/LLM Revolution 🤖

Emergence as Dominant Theme

AI and Large Language Models were the most discussed topic, appearing in:

  • 3 dedicated workshops (ellmer basics, enterprise tooling, clinical data privacy)
  • 8+ presentations across all sessions
  • Multiple vendor tools ({ellmer}, {llumen}, {mcpr}, Databot, {meRlin})

From Prototype to Production

The conversation has matured from “Can we use AI?” to “How do we deploy it safely?”

Key Developments:

1. Enterprise-Ready Frameworks

Organizations are building production AI systems:

  • Merck’s {llumen} - Comprehensive agentic framework used internally
    • Supports multiple LLM sources (Azure, OpenAI, local)
    • RAG with vector databases for documents
    • Database querying (SQL, Cypher, GraphQL)
    • Foundation model integration (TxGemma)
  • Roche’s Multi-Agent Copilot - Package-specific AI agents
    • Each internal package gets its own expert agent
    • LangGraph orchestration
    • Integration with Cursor via MCP
  • A2-AI’s GxP Solutions - Validated AI workflows
    • AWS Bedrock integration
    • MCP server implementations
    • Compliance documentation

2. Privacy-Preserving Approaches

Critical for pharma with sensitive clinical data:

  • On-premise LLM deployments (llama.cpp)
  • Query sanitization removing PII
  • Audit logging for compliance
  • Role-based access controls
  • {DataChat} example - Chat interface with data never leaving secure environment
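Query sanitization and audit logging can be illustrated with a minimal sketch. It is written in Python with the standard library only, and it is not taken from {DataChat} or any tool presented at the conference. The regular expressions, the subject ID format, and the function names are all assumptions made for illustration.

```python
import re
from datetime import datetime, timezone

# Hypothetical patterns for illustration only; a real deployment would need a
# reviewed and much broader set (names, addresses, site-specific identifiers).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "SUBJECT_ID": re.compile(r"\b[A-Z]{2,5}-\d{3}-\d{4}\b"),
}


def sanitize_query(text):
    """Replace each PII match with a placeholder; return the text and hit counts."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts


def audit_record(user, sanitized_query, counts):
    """Build a log entry that stores only the sanitized query."""
    return {
        "user": user,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": sanitized_query,
        "redactions": counts,
    }


raw = "Summarize AEs for ABC-001-0042 reported to jane.doe@example.com on 2025-03-14"
clean, counts = sanitize_query(raw)
entry = audit_record("analyst01", clean, counts)
print(clean)
```

Only the sanitized text would be forwarded to the model, and the audit entry records who asked and how many values were redacted without retaining the raw identifiers.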

3. Practical Applications

Moving beyond novelty to genuine productivity:

Data Exploration:

  • Natural language queries on CDISC data
  • Conversational interfaces for non-programmers
  • RAG-powered document search

Code Assistance:

  • Context-aware programming help
  • Debugging complex statistical code
  • Documentation generation

Report Automation:

  • LLM-powered table summarization
  • Narrative generation from results
  • QC log generation

Standards Emerging

Model Context Protocol (MCP):

  • Standardized interface for AI tools
  • R implementation via {mcpr}
  • Cross-language interoperability
  • Growing ecosystem (Claude Code, Cursor, VS Code)
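To make the "standardized interface" concrete, the sketch below shows the shape of an MCP tool call. MCP messages follow JSON-RPC 2.0, and a client invokes a tool with the tools/call method. The code is Python with the standard library only and does not use the {mcpr} API. The count_subjects tool, the toy dataset, and the dispatch function are hypothetical, and the transport and initialization handshake are omitted.

```python
import json

# Toy in-memory dataset standing in for a subject-level table (hypothetical).
ADSL = [
    {"USUBJID": "01", "ARM": "Placebo"},
    {"USUBJID": "02", "ARM": "Drug"},
    {"USUBJID": "03", "ARM": "Drug"},
]


def count_subjects(arm):
    """Hypothetical tool: count subjects in one treatment arm."""
    n = sum(1 for row in ADSL if row["ARM"] == arm)
    return f"{n} subjects in arm {arm}"


def make_tool_call(request_id, tool_name, arguments):
    """Build a tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def handle_request(request, tools):
    """Toy server-side dispatch: look up the tool and wrap its text result."""
    params = request["params"]
    func = tools.get(params["name"])
    if func is None:
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32602, "message": f"Unknown tool: {params['name']}"},
        }
    text = func(**params["arguments"])
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    }


TOOLS = {"count_subjects": count_subjects}
request = make_tool_call(1, "count_subjects", {"arm": "Drug"})
response = handle_request(request, TOOLS)
print(json.dumps(response, indent=2))
```

Because the request and response are plain JSON, the same tool can be served from R and called from a client written in any other language, which is the interoperability point made above.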

Challenges Identified

Despite enthusiasm, several barriers remain:

  • ⚠️ Validation complexity - How to validate non-deterministic outputs
  • ⚠️ Cost management - API costs at scale
  • ⚠️ Regulatory uncertainty - Limited guidance on AI in submissions
  • ⚠️ Skills gap - Prompt engineering and AI orchestration expertise

2. Open-Source Transformation 🌍

GSK’s Landmark Achievement

Sam Warden’s presentation on GSK’s journey was a conference highlight:

Timeline:

  • 2018-2020: Pilot programs and early adopters
  • 2020-2022: COVID acceleration and platform adoption
  • 2023-2024: 50%+ R code target achieved
  • 2025: Rburst initiative for full integration

Success Factors:

  1. Executive commitment - Clear target setting (50%+ R)
  2. Multi-wave approach - Gradual transformation
  3. Training investment - Bookdown courses, Resource Hub, AccelerateR
  4. Platform deployment - Posit Workbench for infrastructure
  5. Cultural change - Growth mindset and adaptability

Industry-Wide Movement

GSK is not alone - the tipping point has been reached:

Novartis:

  • Mosaic platform for ARS-driven TFL automation
  • Standards-based, language-agnostic approach
  • React UI with R backend

Roche/Genentech:

  • Leading {admiral}, {teal}, {gtsummary} development
  • autoslideR for slide automation
  • {crane} for pharma-specific reporting

Pfizer:

  • Custom R packages for RWD programming
  • Dual R/SAS syntax support
  • Quarto documentation sites

Moderna:

  • AI-enhanced Shiny apps for trial data
  • ellmer/GPT integration
  • Accelerating exploratory analysis

Pharmaverse Ecosystem

The collaborative open-source movement is thriving:

Key Packages:

  • {admiral} - Transitioning to stable maintenance (feature complete)
  • {teal} - Version 1.0 released (interactive apps)
  • {gtsummary}/{crane} - Clinical table generation
  • {cardinal} - Standardized TLG templates
  • {sdtm.oak} - SDTM programming framework

Benefits Realized:

  • ✅ Reduced duplication across companies
  • ✅ Shared validation burden
  • ✅ Faster innovation cycles
  • ✅ Transparent, auditable code

3. Regulatory Evolution ✅

FDA Software Neutrality

The 2015 FDA clarification on software neutrality is now fully operationalized:

  • No preference for SAS vs R vs Python
  • Focus on methodology and validation, not tools
  • R-based submissions increasingly routine

New FDA Guidance Impact

2023 Covariate Adjustment Guidance driving change:

  • Novartis developed {beeca} in response
  • ASA-BIOP collaboration on {RobinCar2}
  • Academic methods reaching practice faster
  • R enabling rapid response to guidance

Validation Maturity

Key Developments:

1. Risk-Based Approaches

  • {riskmetric} for package assessment
  • Shared metric repositories (R Validation Hub)
  • Automated quality scoring
  • Focus on critical packages
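A minimal sketch of automated quality scoring is shown below. It is written in Python with the standard library only and is not the {riskmetric} API or its scoring formula. The metric names, weights, and category thresholds are assumptions chosen for illustration.

```python
def risk_score(metrics, weights):
    """Return 1 minus the weighted mean of metric scores.

    Each metric is assumed to lie in [0, 1], where higher means better quality,
    so the returned risk also lies in [0, 1].
    """
    missing = set(weights) - set(metrics)
    if missing:
        raise ValueError(f"Missing metrics: {sorted(missing)}")
    total = sum(weights.values())
    quality = sum(metrics[name] * w for name, w in weights.items()) / total
    return 1.0 - quality


def risk_category(score, low=0.25, high=0.5):
    """Map a risk score to a label; the thresholds are hypothetical."""
    if score < low:
        return "low"
    if score < high:
        return "medium"
    return "high"


# Hypothetical weights: test coverage counts most, documentation least.
WEIGHTS = {"test_coverage": 3, "has_vignettes": 1, "has_news": 1, "bugs_closed_ratio": 2}
pkg = {"test_coverage": 0.9, "has_vignettes": 1.0, "has_news": 1.0, "bugs_closed_ratio": 0.8}

score = risk_score(pkg, WEIGHTS)
print(round(score, 3), risk_category(score))
```

A score like this only ranks packages for attention. Packages in the higher categories, or those used in critical analyses, would still go through additional review.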

2. Acceptance-Test Driven Development (ATDD)

  • Plain-language tests in Quarto
  • Shareable with regulators
  • “Given-when-then” format
  • Brian Repko bringing JBehave concepts to R
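The given-when-then format can be sketched as follows. This is a Python illustration using the standard library only, not the Quarto-based approach from the talk or any JBehave port. The function under test, its 65-year cut point, and the scenario structure are hypothetical.

```python
def derive_age_group(age):
    """Function under test; the 65-year cut point is a hypothetical rule."""
    return "<65" if age < 65 else ">=65"


# Each scenario pairs a plain-language description with a concrete check.
SCENARIOS = [
    {
        "given": "a subject aged 64",
        "when": "the age group is derived",
        "then": "the group is '<65'",
        "input": 64,
        "expected": "<65",
    },
    {
        "given": "a subject aged 65",
        "when": "the age group is derived",
        "then": "the group is '>=65'",
        "input": 65,
        "expected": ">=65",
    },
]


def run_scenarios(scenarios, func):
    """Run each scenario and return (plain-language sentence, passed) pairs."""
    results = []
    for s in scenarios:
        sentence = f"Given {s['given']}, when {s['when']}, then {s['then']}."
        results.append((sentence, func(s["input"]) == s["expected"]))
    return results


for sentence, passed in run_scenarios(SCENARIOS, derive_age_group):
    print("PASS" if passed else "FAIL", "-", sentence)
```

The printed sentences form a report that a reviewer without programming experience can read, while the same scenarios remain executable tests.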

3. Shiny App Validation

  • Litmusverse suite (Jumping Rivers)
  • Risk-based validation frameworks
  • Code quality assessment
  • Traceability and documentation tools

GxP-Ready AI

Pioneering work on validating AI applications:

  • Devin Pastoor (A2-AI): Production AI in GxP contexts
  • Testing strategies for non-deterministic outputs
  • Audit trails and logging
  • Version control and rollback procedures

4. Automation & Efficiency 📊

ARS/ARM as Game Changers

CDISC Analysis Results Standard driving automation:

Mosaic (Novartis):

  • YAML captures ARD requirements
  • Language-agnostic rules (metadata → R or any language)
  • React UI for customization
  • Push-button TFL generation

{cards} Ecosystem:

  • Analysis Results Data objects
  • QC becomes straightforward (compare ARDs)
  • LLMs can summarize language-agnostic results
  • {crane} and {gtsummary} building on this foundation
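The claim that QC reduces to comparing ARDs can be sketched as below. The code is Python with the standard library only and does not use {cards}. The long-format layout and the field names (group, variable, stat_name, stat) are assumptions loosely modeled on a one-statistic-per-row table, not the actual {cards} schema.

```python
import math

# Fields that together identify one statistic (hypothetical layout).
KEY_FIELDS = ("group", "variable", "stat_name")


def index_ard(rows):
    """Index ARD rows by their identifying fields."""
    return {tuple(r[k] for k in KEY_FIELDS): r["stat"] for r in rows}


def compare_ards(prod, qc, rel_tol=1e-8):
    """Return a list of (key, issue) pairs; an empty list means the ARDs agree."""
    p, q = index_ard(prod), index_ard(qc)
    issues = []
    for key in sorted(set(p) | set(q)):
        if key not in p:
            issues.append((key, "missing in production"))
        elif key not in q:
            issues.append((key, "missing in QC"))
        elif not math.isclose(p[key], q[key], rel_tol=rel_tol):
            issues.append((key, f"value mismatch: {p[key]} vs {q[key]}"))
    return issues


production = [
    {"group": "Drug", "variable": "AGE", "stat_name": "N", "stat": 50},
    {"group": "Drug", "variable": "AGE", "stat_name": "mean", "stat": 61.2},
]
qc = [
    {"group": "Drug", "variable": "AGE", "stat_name": "N", "stat": 50},
    {"group": "Drug", "variable": "AGE", "stat_name": "mean", "stat": 61.3},
]

for key, issue in compare_ards(production, qc):
    print(key, issue)
```

Because both sides are plain data rather than formatted tables, the comparison does not depend on which language produced each ARD or on how the final output is laid out.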

Report Automation

Time Savings Achieved:

  • autoslideR (Roche): 0.5-4 days saved per slide deck
  • DMC materials automation: Week → Day for exposure reports (AstraZeneca)
  • LLM-powered {gtsummary}: Submission-ready tables in minutes vs hours

Technologies Enabling This:

  1. officer/flextable - Programmatic Word reports
  2. Quarto - Multi-format publishing (PDF, HTML, presentations)
  3. Template systems - Reusable structures (Cardinal)
  4. CI/CD pipelines - Automated reanalysis on code changes

5. Advanced Analytics & Methods 📈

Bayesian Methods Maturing

Tools Reaching Production:

  • BayesERtools (Genentech) - Exposure-response with Stan
  • bmstate (Generable) - Multistate survival models
  • Stan debugging workshop - Making Bayesian accessible

Advantages in Pharma:

  • Prior information integration
  • Uncertainty quantification
  • Small sample performance
  • Complex model flexibility

High-Performance Computing

Polars for Clinical Data:

  • 10-100x faster than pandas
  • Apache Arrow native
  • Lazy evaluation
  • Parallel processing out-of-the-box

HPC Integration:

  • Offloading Shiny computations to clusters
  • Maintaining interactive UX
  • Resource optimization

Machine Learning

TabPFN - Novel deep learning for tabular data:

  • No training required (pre-trained on priors)
  • Fast inference
  • Bayesian-like uncertainty
  • Promising for exploratory analysis

Synthetic Data:

  • {synthpop} for privacy-preserving data sharing
  • Software testing without PHI
  • Model training on synthetic patients

6. Data Quality & Validation 🔍

{pointblank} Adoption

Comprehensive data validation framework:

Use Cases:

  • Quick dataset understanding (scan_data())
  • Validation at scale (35+ tables daily)
  • Beautiful automated documentation
  • Integration with databases, Arrow, Shiny

Pharmaceutical Applications:

  • SDTM compliance checks
  • ADaM dataset verification
  • Cross-domain consistency
  • Longitudinal data integrity
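The kinds of checks listed above can be illustrated with a small rule-based sketch. It is written in Python with the standard library only and is not the {pointblank} API. The helper names, the toy DM and AE records, and the allowed SEX values are assumptions for illustration.

```python
def check_not_missing(rows, column):
    """Return indices of rows where `column` is missing or empty."""
    return [i for i, r in enumerate(rows) if r.get(column) in (None, "")]


def check_in_set(rows, column, allowed):
    """Return indices of rows whose `column` value is not in `allowed`."""
    return [i for i, r in enumerate(rows) if r.get(column) not in allowed]


def check_cross_domain(child_rows, parent_rows, key="USUBJID"):
    """Return indices of child rows whose key has no match in the parent domain."""
    parent_keys = {r[key] for r in parent_rows}
    return [i for i, r in enumerate(child_rows) if r.get(key) not in parent_keys]


# Toy demographics (DM) and adverse event (AE) records, invented for this sketch.
dm = [
    {"USUBJID": "S1", "SEX": "M"},
    {"USUBJID": "S2", "SEX": "X"},
]
ae = [
    {"USUBJID": "S1", "AETERM": "Headache"},
    {"USUBJID": "S3", "AETERM": "Nausea"},
    {"USUBJID": "S2", "AETERM": ""},
]

report = {
    "DM.SEX in allowed values": check_in_set(dm, "SEX", {"M", "F", "U"}),
    "AE.USUBJID present in DM": check_cross_domain(ae, dm),
    "AE.AETERM not missing": check_not_missing(ae, "AETERM"),
}

for rule, failing_rows in report.items():
    status = "OK" if not failing_rows else f"FAIL at rows {failing_rows}"
    print(f"{rule}: {status}")
```

Each rule returns the failing row positions rather than a single pass/fail flag, so the output can feed an automated report that points reviewers to the exact records.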

Automated Traceability

DevOps Principles in Pharma:

Graticule’s approach:

  • Docker containers for reproducibility
  • CI/CD for automatic reanalysis
  • Version control with validated outputs
  • Cloud storage (AWS S3) for results

Benefits:

  • Clear audit trail
  • Reproducible analyses
  • Automated documentation
  • Integrated QC

7. Specialized Applications 🧬

Real-World Evidence

Pfizer’s RWD Programming:

  • Custom R package for database queries
  • Dual R/SAS syntax support
  • Shiny apps for workflow support
  • Quarto documentation

Niche Medical Devices

Abbott’s Synthetic Data:

  • {synthpop} for diagnostics
  • Privacy-preserving test data
  • Validation datasets

Oncology Innovations

PMDA Inquiries:

  • {cards} for rapid response (Japan regulatory)
  • Structured ARD approach
  • Accelerated turnaround

Pharmacokinetics

aNCA (Pharmaverse):

  • Open-source NCA software
  • PKNCA backend (200+ parameters)
  • 100% test coverage
  • Validated against commercial tools (±0.1%)
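A tolerance-based comparison of this kind can be sketched as follows. The code is Python with the standard library only and is not taken from aNCA or PKNCA. It computes an AUC with the linear trapezoidal rule, one standard NCA method, and checks it against a reference value within a relative tolerance of 0.1%. The concentration-time data and the reference values are invented for illustration.

```python
def auc_linear_trapezoid(times, concs):
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    if len(times) != len(concs) or len(times) < 2:
        raise ValueError("need at least two matched time/concentration points")
    if any(t1 <= t0 for t0, t1 in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")
    points = list(zip(times, concs))
    return sum(
        (t1 - t0) * (c0 + c1) / 2
        for (t0, c0), (t1, c1) in zip(points, points[1:])
    )


def within_tolerance(value, reference, rel_tol=0.001):
    """True if `value` is within rel_tol (0.001 = 0.1%) of `reference`."""
    return abs(value - reference) <= rel_tol * abs(reference)


# Invented profile: time in hours, concentration in ng/mL.
times = [0, 1, 2, 4, 8]
concs = [0, 10, 8, 4, 1]

auc = auc_linear_trapezoid(times, concs)
print(auc)

# Hypothetical reference values standing in for a commercial tool's output.
print(within_tolerance(auc, 36.02))
print(within_tolerance(auc, 36.10))
```

In a real validation exercise, the same comparison would run over every parameter and every test profile, with the tolerance fixed in advance in the validation plan.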

8. Regional Insights 🌍

Asia/Pacific Innovations

Strengths:

  • Strong PMDA regulatory focus
  • Practical automation tools
  • Under-represented population research
  • Genetic diversity considerations

Notable Presentations:

  • SHIONOGI’s open-source culture
  • {meRlin} AI assistant
  • PMDA e-CRT automation
  • Vibe Coding approaches

Global Collaboration

The conference demonstrated a truly global R pharma community:

  • Workshops from US, Europe, Asia
  • Shared challenges and solutions
  • Pharmaverse as unifying force

9. Future Directions 🔮

Short-Term (2025-2026)

AI Integration:

  • ✅ More validated AI applications in production
  • ✅ MCP ecosystem expansion
  • ✅ Regulatory guidance on AI in submissions
  • ✅ Cost-effective on-premise LLM solutions

Automation:

  • ✅ ARS/ARM adoption across industry
  • ✅ Push-button TFL generation becomes standard
  • ✅ Automated slide decks and CSRs routine

Validation:

  • ✅ Shared metric repositories operational
  • ✅ Risk-based validation industry standard
  • ✅ Shiny app validation frameworks mature

Medium-Term (2027-2028)

Tool Consolidation:

  • Language-agnostic analysis platforms
  • R/Python/Julia interoperability
  • Cloud-native pharma analytics

Regulatory:

  • AI-specific guidance documents
  • Electronic submissions with code
  • Real-time regulatory review platforms

Skills:

  • AI/LLM literacy standard for programmers
  • Bayesian methods more accessible
  • DevOps practices ubiquitous

Long-Term (2029+)

Transformative Possibilities:

  1. AI-Designed Clinical Trials

    • LLMs suggesting optimal designs
    • Automated SAP generation
    • Real-time adaptation
  2. Fully Automated TFLs

    • From data lock to submission package
    • Human review only, no coding
    • Multi-language outputs for global submissions
  3. Real-Time Safety Monitoring

    • Continuous analysis during trials
    • AI-detected signals
    • Predictive adverse event modeling
  4. Personalized Trial Design

    • Bayesian adaptive with real-time learning
    • Biomarker-driven enrollment
    • Precision medicine integration

10. Challenges & Barriers ⚠️

Technical Challenges

Validation Complexity:

  • Non-deterministic AI outputs
  • Rapidly evolving tools
  • Keeping pace with innovation

Integration Issues:

  • Legacy system compatibility
  • Data silos
  • IT security constraints

Organizational Challenges

Cultural Resistance:

  • “SAS is what we’ve always used”
  • Fear of change
  • Risk aversion in regulated environment

Resource Constraints:

  • Training investment required
  • Dual maintenance (SAS + R) during transition
  • Validation documentation burden

Regulatory Challenges

Uncertainty:

  • Limited AI/LLM guidance
  • Evolving expectations
  • Regional variations (FDA vs EMA vs PMDA)

Documentation:

  • What level of detail for AI systems?
  • How to handle model updates?
  • Third-party API dependencies

11. Success Patterns 🎯

What Works

Based on successful implementations presented:

1. Executive Sponsorship

  • Clear targets (GSK’s 50%)
  • Resource commitment
  • Long-term vision

2. Gradual Transformation

  • Multi-wave approach
  • Pilot programs
  • Learn and adapt

3. Support Infrastructure

  • Training programs (not just one-time)
  • Help desks and office hours
  • Documentation and resources

4. Community Building

  • Internal R user groups
  • External pharmaverse participation
  • Shared learning culture

5. Platform Investment

  • Posit Workbench
  • Version control (Git/GitHub)
  • CI/CD pipelines
  • Cloud infrastructure

6. Standards Adoption

  • CDISC ARS/ARM
  • Pharmaverse conventions
  • Style guides and linters

12. Actionable Insights 💡

For Organizations Starting R Adoption

  1. Start with low-risk projects (exploratory analysis, visualizations)
  2. Invest in training early and continuously
  3. Build internal community (lunch & learns, Slack channels)
  4. Adopt pharmaverse packages (don’t reinvent)
  5. Establish validation framework from day one
  6. Use Posit products (Workbench, Connect, Package Manager)

For Organizations Scaling R

  1. Embrace automation (ARS/ARM, template systems)
  2. Implement CI/CD for reproducibility
  3. Explore AI carefully (privacy first, validate thoroughly)
  4. Contribute to pharmaverse (share internal packages)
  5. Formalize support model (beyond training)
  6. Plan the SAS sunset (don’t maintain dual systems indefinitely)

For R Professionals

  1. Learn AI/LLM basics (will be essential skill)
  2. Master one reporting framework (officer/flextable or Quarto)
  3. Understand validation (not just coding)
  4. Contribute to open source (career differentiator)
  5. Stay current (R/Pharma, webinars, blogs)
  6. Network (pharmaverse Slack, conferences)

13. The Bigger Picture 🌍

R/Pharma 2025 in Context

This conference revealed an industry at an inflection point:

From:

  • Proprietary, siloed tools
  • Manual, repetitive coding
  • Conservative, risk-averse culture
  • Isolated company solutions

To:

  • Open-source, collaborative development
  • Automated, standards-driven workflows
  • Innovative, evidence-based adoption
  • Shared industry platforms

Why This Matters

For Patients:

  • Faster drug development
  • More rigorous analysis
  • Transparent, reproducible science
  • Precision medicine enablement

For Industry:

  • Reduced costs
  • Faster time to market
  • Better talent recruitment
  • Competitive advantage

For Science:

  • Reproducibility crisis addressed
  • Method innovation accelerated
  • Global collaboration enabled
  • Democratized advanced analytics

14. How R/Pharma Has Evolved 📈

Conference Evolution (2018-2025)

Based on publicly available information and this year’s content, here’s how the conference themes have shifted:

2018-2020: Foundation Years

Key Themes:

  • 📦 Package development basics - Building pharma packages
  • 📊 Basic Shiny apps - Interactive visualizations
  • 🔄 SAS vs R debates - “Should we switch?”
  • 📝 Simple reporting - Basic TFLs

Sentiment: Cautious optimism, early adopters sharing success stories

Notable:

  • Pharmaverse not yet established
  • Validation was major concern
  • FDA software neutrality clarification (2015) still fresh

2021-2022: Acceleration Phase

Key Themes:

  • 🚀 COVID acceleration - Remote work drove R adoption
  • 🤝 Pharmaverse launch - Collaborative ecosystem born
  • ✅ Validation frameworks - {riskmetric}, R Validation Hub
  • 📦 Production packages - {admiral}, {teal} gaining traction

Sentiment: Growing confidence, momentum building

Milestones:

  • Major pharma (Roche, Novartis) sharing production solutions
  • Regulatory submissions with R increasing
  • Training programs formalized (GSK example)

2023-2024: Mainstream Adoption

Key Themes:

  • 📊 CDISC integration - Analysis Results Standard (ARS/ARM)
  • 🔧 Automation focus - Template-based TFL generation
  • 📈 Advanced analytics - Bayesian methods, ML
  • 🌍 Multi-language - R + Python workflows

Sentiment: R is mainstream, focus shifts to optimization

Evidence:

  • Multiple talks on automation (not just feasibility)
  • Validation becoming standardized, less controversial
  • Advanced topics (multistate models, Stan) gaining space

2025: AI Revolution Year

Dominant Themes:

  • 🤖 AI/LLM explosion - 8+ sessions (50% increase)
  • 🏢 Enterprise transformation - GSK’s 50%+ achievement
  • ✅ Regulatory maturity - GxP-ready AI, validated Shiny
  • 🚀 Production at scale - From “can we?” to “how fast?”

Shift in Discourse:

  • 2018: “Is R viable for pharma?”
  • 2022: “How do we migrate from SAS?”
  • 2025: “How do we integrate AI with R?”

What’s New in 2025:

  1. AI Dominance

    • 2018-2022: Barely mentioned
    • 2023: Experimental projects
    • 2024: Pilot implementations
    • 2025: Production systems ({llumen}, multi-agent copilots, enterprise frameworks)
  2. Regulatory Confidence

    • 2018-2020: “Will FDA accept this?”
    • 2021-2022: “Here’s how we validated”
    • 2023-2024: “Validation frameworks established”
    • 2025: “Validated AI in GxP” - previously unthinkable
  3. Speed of Innovation

    • 2018: Yearly package updates
    • 2022: Quarterly releases (pharmaverse)
    • 2025: Real-time AI integration - tools released months ago
  4. Collaboration Level

    • 2018: Individual company solutions
    • 2020: Pharmaverse begins
    • 2023: Cross-company working groups
    • 2025: Shared AI infrastructure (MCP, metric repos)

Topic Frequency Evolution

Topic | 2020 | 2022 | 2024 | 2025
Package Development | 🔥🔥🔥 | 🔥🔥 | 🔥 | 🔥
Validation | 🔥🔥🔥 | 🔥🔥🔥 | 🔥🔥 | 🔥
Shiny Apps | 🔥🔥 | 🔥🔥🔥 | 🔥🔥 | 🔥🔥
Clinical Reporting | 🔥🔥 | 🔥🔥 | 🔥🔥🔥 | 🔥🔥🔥
Bayesian Methods | 🔥 | 🔥 | 🔥🔥 | 🔥🔥
AI/LLM | - | - | 🔥 | 🔥🔥🔥🔥
Automation | 🔥 | 🔥🔥 | 🔥🔥 | 🔥🔥🔥
Python Integration | 🔥 | 🔥 | 🔥🔥 | 🔥🔥

Interpretation:

  • Traditional topics (package dev, validation) declining as they become solved problems
  • AI/LLM emerged from nowhere to dominate the conversation
  • Automation shifted from “nice to have” to strategic imperative
  • Python integration normalized (not R vs Python, but R and Python)

What Changed Between 2024 and 2025?

Major Shifts:

  1. {ellmer} Package Launch (2024)

    • Made LLM integration trivial in R
    • Multiple workshops/presentations built on it
    • Enterprise adoption accelerated
  2. Claude/ChatGPT API Maturity

    • Function calling standardized
    • MCP emerged
    • Privacy-preserving deployments viable
  3. GSK Milestone

    • 50%+ R code achieved
    • Proof that full transformation possible
    • Other pharma following suit
  4. CDISC ARS Adoption

    • Analysis Results Standard gaining traction
    • Mosaic (Novartis) and others automating TFLs
    • Machine-readable results enabling AI
  5. Posit Product Evolution

    • Positron IDE launched (VS Code + R)
    • Better Python support
    • AI assistants integrated

What Stayed Constant:

  • ✅ Pharmaverse continues thriving
  • ✅ Validation remains priority (but less controversial)
  • ✅ Collaboration over competition
  • ✅ Open-source commitment

Predictions for R/Pharma 2026

Based on the trajectory, expect:

Likely Topics:

  1. “AI in Submissions” - First AI-powered analysis in regulatory filing
  2. “Real-time Clinical Trial Analysis” - Adaptive designs with AI
  3. “Synthetic Patient Generation” - Privacy-preserving AI for trials
  4. “Quantum ML for Drug Discovery” - Cutting edge
  5. “Fully Automated Study Reports” - From data lock to CSR in hours

Maturing Topics:

  • LLM integration will be assumed (not taught from scratch)
  • Validation frameworks standardized across industry
  • Python-R seamless interop (not separate tracks)

Declining Topics:

  • “Intro to R packages” (basics widely known)
  • “Why leave SAS?” (decision already made)
  • “Can we validate R?” (settled question)

Bold Prediction:

R/Pharma 2026 will feature the first fully AI-generated clinical study report accepted by a regulatory agency. The discussion will shift from “Is this possible?” to “What should humans still review?”


15. Conclusion

R/Pharma 2025 demonstrated that open-source pharmaceutical analytics is not the future; it is the present. With AI integration, regulatory acceptance, and enterprise adoption converging, the question is no longer “Should we adopt R?” but “How fast can we transform?”

The conference showcased:

  • ✅ Production-ready AI systems
  • ✅ Validated, GxP-compliant workflows
  • ✅ Major pharma success stories (GSK 50%+)
  • ✅ Thriving ecosystem (pharmaverse)
  • ✅ Regulatory confidence (FDA neutrality)

The momentum is undeniable. The future is open source. The time is now.



Analysis compiled from R/Pharma 2025 Conference materials | Last updated: November 2025