Introduction
Clinical trial pipelines are repeatable but rarely identical. A typical clinical pipeline may include dozens of studies, regular snapshots, and multiple metrics. Studies might share 90 percent of the same logic with a few study-specific tweaks. Maintaining one-off scripts for each study does not scale, yet many pipeline frameworks impose heavy abstractions (or custom code) that are harder to audit in regulated settings.
{workr} is an
open-source R package that was created as a deliberately minimal
alternative: a small set of functions for describing and running data
workflows as ordered function calls. Workflows are defined in YAML with
meta and steps. Each step runs a function,
resolves parameters from workflow metadata or shared data, and stores
the output back into the shared list. This approach grew out of the
{gsm} framework for risk-based quality monitoring (RBQM), where the same
metric logic must be re-used across many studies and snapshots. The
simplicity proved robust enough that we extracted it into a standalone
package.
Objective
We introduce {workr} as a simple, auditable workflow engine for R, and demonstrate how it can orchestrate domain-specific pipelines without imposing a heavy framework. As a concrete example, we use a pharmaverse-based reporting workflow (SDTM to ADaM to TFLs and ARS) to show how established clinical reporting packages can be composed into a reproducible pipeline using {workr}.
Methods
Workflow model
{workr} workflows are YAML files with two sections:
- Meta: workflow metadata and configuration
- Steps: an ordered list of function calls
Each step is written as an {output, name, params} block:
- output:
  name:
  params:
At runtime, RunWorkflow() and
RunWorkflows() resolve parameters from meta,
the shared data list (lData), or literal values, then
execute steps sequentially. The result of each step is stored in
lData under output, making it available to
downstream steps.
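This execution model can be sketched in a few lines of base R. The sketch below is illustrative only, not {workr}'s actual implementation, and the resolution order shown (meta first, then lData, then literal) is an assumption based on the description above:

```r
# Illustrative sketch of the step loop (not {workr}'s internal code).
# Each step's params are resolved against meta, then lData, then taken
# as literals; the step's return value is stored under its output name.
run_steps <- function(steps, meta, lData) {
  for (step in steps) {
    fn <- eval(parse(text = step$name))
    args <- lapply(step$params, function(p) {
      if (is.character(p) && p %in% names(meta)) meta[[p]]
      else if (is.character(p) && p %in% names(lData)) lData[[p]]
      else p
    })
    lData[[step$output]] <- do.call(fn, args)
  }
  lData
}

# Two chained steps: the second consumes the first step's output.
steps <- list(
  list(name = "sum",  output = "total", params = list(x = "values")),
  list(name = "sqrt", output = "root",  params = list(x = "total"))
)
out <- run_steps(steps, meta = list(), lData = list(values = c(9, 16)))
out$root  # sqrt(sum(9, 16)) = 5
```

Because every step writes back into the shared list, downstream steps can reference earlier outputs by name with no additional wiring.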
Mini-example: Hello Cars
The workflow below computes the mean speed from cars
using two steps: pull a column, then apply mean.
# hello_cars.yaml
meta:
  ID: hello_cars
  col: speed
steps:
  - name: dplyr::pull
    output: speed
    params:
      df: df
      col: col
  - name: mean
    output: result
    params:
      lData: speed
wf <- yaml::read_yaml("hello_cars.yaml")
lData <- list(df = cars)
workr::RunWorkflow(
  lWorkflow = wf,
  lData = lData
)
{gsm} usage
{gsm} is a standardized RBQM framework that pairs a flexible data pipeline with robust reporting. {gsm.core} (https://gilead-biostats.github.io/gsm.core/) provides the analytics foundation and workflow utilities, while {gsm.kri} (https://gilead-biostats.github.io/gsm.kri/) focuses on metric visualizations and reporting outputs. {workr} functions were originally developed and released as part of the core {gsm} packages and are still used to execute YAML-defined steps that orchestrate reusable pipelines across studies and snapshots.
In practice, a typical {gsm} pipeline follows a consistent progression from source data to reports:
- Data mapping to standardized domains via {gsm.mapping}
- Metric computation using {gsm.core} and {gsm.kri}
- Report-ready data models produced by {gsm.reporting}
- Interactive HTML reports and widgets generated by {gsm.kri}
All {gsm.core} assessments follow a standardized six-step pipeline: Input_Rate, Transform, Analyze, Threshold, Flag, and Summarize. This structure makes metric logic auditable and consistent across domains.
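Expressed as a {workr}-style workflow, this six-step structure might be sketched as below. The step and output names are placeholders mirroring the labels above; the actual {gsm.core} function names and parameters differ:

```yaml
# Illustrative only: step names mirror the six-step labels above,
# not the actual {gsm.core} function signatures.
meta:
  ID: kri_example
steps:
  - name: Input_Rate
    output: dfInput
  - name: Transform
    output: dfTransformed
  - name: Analyze
    output: dfAnalyzed
  - name: Threshold
    output: vThreshold
  - name: Flag
    output: dfFlagged
  - name: Summarize
    output: dfSummary
```

Because each stage writes a named output into the shared list, the full chain from raw input to summary remains inspectable at every step.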
Example resources and implementations:
- Adverse Event workflow example: https://gilead-biostats.github.io/gsm.kri/examples/Cookbook_AdverseEventWorkflow.html
- Reporting workflow example: https://gilead-biostats.github.io/gsm.kri/examples/Cookbook_ReportingWorkflow.html
- Extensions vignette: https://gilead-biostats.github.io/gsm.core/articles/gsmExtensions.html
- Data model and pipeline overview: https://gilead-biostats.github.io/gsm.core/articles/DataModel.html
Pharmaverse case study
To ground the discussion, we implement a clinical reporting workflow using pharmaverse packages, with {workr} providing the orchestration layer:
- Raw to SDTM: workflows invoke {sdtm.oak} functions aligned with the VS domain example from the official vignette.
- SDTM to ADaM: {admiral} derivations (for example, derive_vars_merged() and derive_param_map()) construct analysis-ready datasets with transparent derivation logic expressed via !expr tags.
- ADaM to TFLs: {gtsummary} and {safetyCharts} generate tables and safety visualizations in the same pipeline, enabling publishable outputs and optional refresh of HTML or web-based review artifacts.
- ADaM to ARS: {cards} prototypes ARS-aligned outputs to illustrate how emerging standards can be introduced without retooling the workflow engine.
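A {workr} workflow for the SDTM-to-ADaM and TFL stages might be sketched as follows. The dataset names and parameter values are illustrative assumptions, not the exact vignette code, though derive_vars_merged() and tbl_summary() are real {admiral} and {gtsummary} functions:

```yaml
# adam_advs.yaml -- illustrative sketch; dataset and parameter
# values are assumptions, not the vignette's exact code
meta:
  ID: adam_advs
steps:
  - name: admiral::derive_vars_merged
    output: advs
    params:
      dataset: vs
      dataset_add: adsl
      by_vars: !expr exprs(STUDYID, USUBJID)
  - name: gtsummary::tbl_summary
    output: tbl_vs
    params:
      data: advs
```

The point of the sketch is that swapping in a different derivation or table function changes one step block, not the orchestration code.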
Results and Conclusion
The pharmaverse example demonstrates that {workr} can orchestrate end-to-end clinical reporting workflows while keeping the execution model simple, transparent, and audit-friendly. Each package remains modular, and the workflow definition stays readable and easy to customize for study-specific variations.
This paper positions {workr} as a minimal workflow backbone for clinical reporting. The pharmaverse pipeline illustrates how domain-specific tools can plug into that backbone, enabling reproducible pipelines that scale across studies without sacrificing clarity or auditability.