
Introduction

Clinical trial pipelines are repeatable but rarely identical. A typical clinical pipeline may include dozens of studies, regular snapshots, and multiple metrics. Studies might share 90 percent of their logic, with a few study-specific tweaks. Maintaining one-off scripts for each study does not scale, yet many pipeline frameworks impose heavy abstractions (or custom code) that are harder to audit in regulated settings.

{workr} is an open-source R package that was created as a deliberately minimal alternative: a small set of functions for describing and running data workflows as ordered function calls. Workflows are defined in YAML with meta and steps. Each step runs a function, resolves parameters from workflow metadata or shared data, and stores the output back into the shared list. This approach grew out of the {gsm} framework for risk-based quality monitoring (RBQM), where the same metric logic must be re-used across many studies and snapshots. The simplicity proved robust enough that we extracted it into a standalone package.

Objective

We introduce {workr} as a simple, auditable workflow engine for R, and demonstrate how it can orchestrate domain-specific pipelines without imposing a heavy framework. As a concrete example, we use a pharmaverse-based reporting workflow (SDTM to ADaM to TFLs and ARS) to show how established clinical reporting packages can be composed into a reproducible pipeline using {workr}.

Methods

Workflow model

{workr} workflows are YAML files with two sections:

  1. Meta: workflow metadata and configuration
  2. Steps: an ordered list of function calls

Each step is written as an {output, name, params} block:

- output: # key under which the step's result is stored in lData
  name:   # function to call (optionally namespaced, e.g. pkg::fn)
  params: # named arguments, resolved from meta, lData, or literal values

At runtime, RunWorkflow() and RunWorkflows() resolve parameters from meta, the shared data list (lData), or literal values, then execute steps sequentially. The result of each step is stored in lData under output, making it available to downstream steps.
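This resolution rule can be sketched in a few lines of base R. The sketch below is an illustrative re-implementation of the documented lookup order (meta first, then lData, then literal), not {workr}'s actual internals; RunStepSketch is a name invented for the illustration.

```r
# Illustrative sketch (not workr internals): run one step by resolving
# each parameter from meta, then lData, then treating it as a literal.
RunStepSketch <- function(step, meta, lData) {
  args <- lapply(step$params, function(v) {
    if (is.character(v) && v %in% names(meta))       meta[[v]]
    else if (is.character(v) && v %in% names(lData)) lData[[v]]
    else                                             v   # literal value
  })
  fn <- match.fun(step$name)
  lData[[step$output]] <- do.call(fn, args)  # store for downstream steps
  lData
}

meta  <- list(ID = "hello_cars", col = "speed")
lData <- list(df = cars)

# Two steps mirroring the Hello Cars workflow below, using base-R functions
steps <- list(
  list(name = "[[",   output = "speed",  params = list("df", "col")),
  list(name = "mean", output = "result", params = list(x = "speed"))
)
for (s in steps) lData <- RunStepSketch(s, meta, lData)
lData$result  # 15.4
```

Note how the second step consumes the first step's output purely by naming its output key, which is what makes the shared lData list the only state the engine needs.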

Mini-example: Hello Cars

The workflow below computes the mean of the speed column from the built-in cars dataset in two steps: pull a column, then apply mean().

# hello_cars.yaml
meta:
  ID: hello_cars
  col: speed
steps:
  - name: dplyr::pull
    output: speed
    params:
      .data: df
      var: col
  - name: mean
    output: result
    params:
      x: speed

# run_hello_cars.R
wf <- yaml::read_yaml("hello_cars.yaml")
lData <- list(df = cars)

workr::RunWorkflow(
  lWorkflow = wf,
  lData = lData
)
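For reference, the two steps above reduce to the following plain R, which makes the expected result easy to verify:

```r
# Equivalent to the workflow: pull the column named in meta$col, then average it
speed  <- cars[["speed"]]
result <- mean(speed)
result
#> [1] 15.4
```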

{gsm} usage

{gsm} is a standardized RBQM framework that pairs a flexible data pipeline with robust reporting. {gsm.core} (https://gilead-biostats.github.io/gsm.core/) provides the analytics foundation and workflow utilities, while {gsm.kri} (https://gilead-biostats.github.io/gsm.kri/) focuses on metric visualizations and reporting outputs. A broader RBQM overview is available in the {gsm} documentation. {workr} functions were originally developed and released as part of the core {gsm} packages and are still used to execute YAML-defined steps that orchestrate reusable pipelines across studies and snapshots.

In practice, a typical {gsm} pipeline follows a consistent progression from source data to reports:

  • Data mapping to standardized domains via {gsm.mapping}
  • Metric computation using {gsm.core} and {gsm.kri}
  • Report-ready data models produced by {gsm.reporting}
  • Interactive HTML reports and widgets generated by {gsm.kri}

All {gsm.core} assessments follow a standardized six-step pipeline: Input, Transform, Analyze, Threshold, Flag, and Summarize. This structure makes metric logic auditable and consistent across domains.
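Expressed as a {workr}-style YAML, such a metric pipeline might look like the sketch below. The function and parameter names here are schematic assumptions for illustration only; consult the {gsm.core} reference for the exact API behind each stage.

```yaml
# Schematic six-step metric workflow (function/parameter names illustrative)
meta:
  ID: kri0001
steps:
  - name: Input_Rate              # Input
    output: dfInput
    params:
      dfSubjects: Mapped_SUBJ
      dfNumerator: Mapped_AE
  - name: Transform_Rate          # Transform
    output: dfTransformed
    params:
      dfInput: dfInput
  - name: Analyze_NormalApprox    # Analyze
    output: dfAnalyzed
    params:
      dfTransformed: dfTransformed
  - name: Flag_NormalApprox       # Threshold + Flag
    output: dfFlagged
    params:
      dfAnalyzed: dfAnalyzed
      vThreshold: [-3, -2, 2, 3]
  - name: Summarize               # Summarize
    output: dfSummary
    params:
      dfFlagged: dfFlagged
```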


Pharmaverse case study

To ground the discussion, we implement a clinical reporting workflow using pharmaverse packages, with {workr} providing the orchestration layer:

  • Raw to SDTM: workflows invoke {sdtm.oak} functions aligned with the VS domain example from the official vignette.
  • SDTM to ADaM: {admiral} derivations (for example, derive_vars_merged() and derive_param_map()) construct analysis-ready datasets with transparent derivation logic expressed via !expr tags.
  • ADaM to TFLs: {gtsummary} and {safetyCharts} generate tables and safety visualizations in the same pipeline, enabling publishable outputs and optional refresh of HTML or web-based review artifacts.
  • ADaM to ARS: {cards} prototypes ARS-aligned outputs to illustrate how emerging standards can be introduced without retooling the workflow engine.
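As an illustration of the SDTM-to-ADaM step, a {workr} step can call an {admiral} derivation directly, with !expr tags carrying quoted arguments. The dataset names and parameter values below are schematic; see the derive_vars_merged() documentation for exact usage.

```yaml
# Schematic SDTM-to-ADaM step (dataset names and values illustrative)
steps:
  - name: admiral::derive_vars_merged
    output: adsl
    params:
      dataset: dm                                # SDTM DM, taken from lData
      dataset_add: ex                            # SDTM EX, taken from lData
      by_vars: !expr exprs(STUDYID, USUBJID)
      new_vars: !expr exprs(TRTSDTM = EXSTDTM)
```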

Results and Conclusion

The pharmaverse example demonstrates that {workr} can orchestrate end-to-end clinical reporting workflows while keeping the execution model simple, transparent, and audit-friendly. Each package remains modular, and the workflow definition stays readable and easy to customize for study-specific variations.

This paper positions {workr} as a minimal workflow backbone for clinical reporting. The pharmaverse pipeline illustrates how domain-specific tools can plug into that backbone, enabling reproducible pipelines that scale across studies without sacrificing clarity or auditability.