Risk-Based Quality Monitoring (RBQM) is a proactive approach to clinical trial monitoring that focuses on identifying and addressing the most critical risks to the integrity of study data and patient safety. This approach aims to ensure that study data are accurate, reliable, and credible while optimizing the use of resources and minimizing the burden on study sites.

The gsm R package supports RBQM by performing risk assessments that focus primarily on detecting differences in quality at the site level. These assessments are intended to detect potential issues with critical data or processes across the major risk categories of safety, efficacy, disposition, treatment, and general quality. Each category consists of one or more risk assessments. Each risk assessment analyzes the data to flag sites with potential outliers and provides a visualization to help the user understand the issue.
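
To make the idea of flagging sites with potential outliers concrete, the sketch below computes a z-score for each site's event rate against the pooled study rate using a normal approximation, then maps each score to a flag with a threshold vector. It is a simplified, hypothetical illustration: the function name flag_sites, the column names, the threshold values, and the toy data are all assumptions and do not reproduce gsm's actual implementation.

```r
# Hypothetical sketch: flag sites whose event rate deviates from the pooled rate.
# Not gsm's implementation; names, thresholds, and data are illustrative assumptions.
flag_sites <- function(dfInput, vThreshold = c(-3, -2, 2, 3)) {
  # Pooled rate across all sites
  dPooled <- sum(dfInput$Numerator) / sum(dfInput$Denominator)

  # Normal approximation to a proportion: z = (p_i - p) / sqrt(p * (1 - p) / n_i)
  dfInput$Metric <- dfInput$Numerator / dfInput$Denominator
  dfInput$Score <- (dfInput$Metric - dPooled) /
    sqrt(dPooled * (1 - dPooled) / dfInput$Denominator)

  # findInterval() counts how many thresholds each score meets or exceeds (0 to 4);
  # subtracting 2 maps that count to the flags -2, -1, 0, 1, 2
  dfInput$Flag <- findInterval(dfInput$Score, vThreshold) - 2
  dfInput
}

# Toy data: four sites with equal enrollment and different event counts
dfSites <- data.frame(
  GroupID = c("S01", "S02", "S03", "S04"),
  Numerator = c(10, 12, 40, 9),
  Denominator = c(100, 100, 100, 100)
)

dfFlagged <- flag_sites(dfSites)
print(dfFlagged[, c("GroupID", "Metric", "Score", "Flag")])
```

In this toy data, site S03 has a much higher rate than the pooled rate and receives the most extreme flag, while S02 sits close to the pooled rate and is not flagged.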


Qualification testing ensures that core functions execute as expected on a system-wide scale. Qualification includes functional, performance, and usability testing. Qualification tests are designed to give developers a repeatable process that is easy to update and document. This document summarizes the qualification testing performed on the gsm functions that are essential to the analysis workflow.

Process Overview

Each essential gsm workflow function is independently qualified using specifications and test cases compiled in this report. Details are provided below.


Specifications

Specifications capture the most critical use cases for a given function. Each function must have at least one (1) specification, and each specification must have at least one (1) associated test case. Multiple specifications may exist for a function, and multiple test cases may exist for a specification.

Each specification includes the following components:

  • Description: outlines the use case for the specification

  • Risk Assessment

    • Risk Level: assigned a value of “Low”, “Medium”, or “High”, corresponding to the risk associated with the specification failing

    • Risk Impact: assigned a value of “Low”, “Medium”, or “High”, corresponding to the severity of the impact associated with the specification failing

  • Test Cases: lists measurable test cases associated with the specification
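
The components above, together with the rule that every function needs at least one specification and every specification at least one test case, can be sketched as a small data structure with a consistency check. This is only an illustration: the list layout, the field names, and the helper check_specs are hypothetical and are not part of gsm. The two example records reuse spec and test IDs that appear later in this report.

```r
# Hypothetical representation of specifications as a list of records
lSpecs <- list(
  list(ID = "S4_1", Function = "Flag_NormalApprox",
       RiskLevel = "High", RiskImpact = "High", Tests = c("T4_1")),
  list(ID = "S4_2", Function = "Analyze_NormalApprox_PredictBounds",
       RiskLevel = "High", RiskImpact = "High", Tests = c("T4_2"))
)

# Allowed values for Risk Level and Risk Impact
vLevels <- c("Low", "Medium", "High")

# Hypothetical helper: return a character vector of problems (empty if none)
check_specs <- function(lSpecs, vFunctions) {
  vProblems <- character(0)
  for (spec in lSpecs) {
    if (!all(c(spec$RiskLevel, spec$RiskImpact) %in% vLevels)) {
      vProblems <- c(vProblems, paste(spec$ID, "has an invalid risk level or impact"))
    }
    if (length(spec$Tests) < 1) {
      vProblems <- c(vProblems, paste(spec$ID, "has no associated test case"))
    }
  }
  # Every function to be qualified needs at least one specification
  vSpecified <- vapply(lSpecs, function(spec) spec$Function, character(1))
  for (strFunction in setdiff(vFunctions, vSpecified)) {
    vProblems <- c(vProblems, paste(strFunction, "has no specification"))
  }
  vProblems
}

vProblems <- check_specs(
  lSpecs,
  vFunctions = c("Flag_NormalApprox", "Analyze_NormalApprox_PredictBounds")
)
print(vProblems)
```

An empty result means every listed function has a specification and every specification has valid risk values and at least one test case.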

Test Cases

Test cases translate specifications into testable scripts that confirm the package functions meet the established requirements. Each test case mirrors how a user might call the function, which helps identify gaps in the code and supports test automation.

Test cases for gsm are written using the standard testthat workflow. A single test script is saved for each test case and is named following the convention test_qual_{TestID}.R, where TestID is the test case number. Test code within these scripts is written clearly and concisely to facilitate quick execution and interpretability. Note that a single test case may be associated with multiple specifications.
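
A minimal sketch of what such a script might look like is shown below. The helper qual_test_filename and the function under test, count_flagged, are hypothetical stand-ins written for this example; they are not gsm functions, and the real qualification scripts test gsm's own workflow functions.

```r
library(testthat)

# Hypothetical helper: build a script name following test_qual_{TestID}.R
qual_test_filename <- function(strTestID) {
  sprintf("test_qual_%s.R", strTestID)
}

# Hypothetical function under test: count sites with a non-zero flag,
# ignoring sites whose flag is NA
count_flagged <- function(vFlag) {
  sum(vFlag != 0, na.rm = TRUE)
}

# Contents one might place in a single qualification test script
test_that("Flagged sites are counted and NA flags are ignored", {
  vFlag <- c(-2, 0, 1, NA, 0)
  expect_equal(count_flagged(vFlag), 2)
})

# The file for test case T1_1 would be named as follows
print(qual_test_filename("T1_1"))
```

Keeping one test case per file makes it straightforward to trace a result in the tables below back to the script that produced it.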

Test Results: Overview

Function Name Number of Tests Number Passed Number Failed Number Skipped
Adverse Event Assessment 52 52 0 0
Analysis workflow 22 22 0 0
Analyze_NormalApprox_PredictBounds 19 19 0 0
Data Change Rate Assessment 19 19 0 0
Data Entry Lag Assessment 19 19 0 0
Disposition Assessment 35 35 0 0
Flag_NormalApprox 7 7 0 0
Labs Assessment 19 19 0 0
Mapping workflow 4 3 0 0
Protocol Deviation Assessment 30 30 0 0
Query Age Assessment 35 35 0 0
Query Rate Assessment 35 35 0 0
Summarize 16 16 0 0

Test Results: Detailed

One Row Per Specification

Spec ID Spec Description Risk Level Risk Impact Associated Test IDs
S1_1 Given raw participant-level data, all necessary data.frame transformations are made to create input data for all workflows High High T1_1
S2_1 Given raw participant-level data, a properly specified Workflow for a KRI creates summarized and flagged data High High T2_1
S2_2 Given raw participant-level data with missingness, a properly specified Workflow for a KRI creates summarized and flagged data High High T2_2
S3_1 Given pre-processed input data, a properly specified Workflow for a KRI creates summarized and flagged data High High T3_1
S4_1 Given appropriate metadata (i.e. vThresholds), flagged observations are properly marked in summary data High High T4_1
S4_2 Given appropriate metadata (i.e. vThresholds), data.frame of bounds can be created High High T4_2
S5_1 Given appropriate raw participant-level data, flag values can be correctly assigned to records that meet flagging criteria, including custom thresholding. High High T5_1
S5_2 Given appropriate raw participant-level data, flag values are correctly assigned as NA for sites with low enrollment. High High T5_2
S6_1 Given appropriate raw participant-level data, an Adverse Event Assessment can be done using the Normal Approximation method. High High T6_1
S6_2 Adverse Event Assessments can be done correctly using a grouping variable, such as Site or Country for KRIs, and Study for QTLs, when applicable. High High T6_2
S7_1 Given appropriate raw participant-level data, a Protocol Deviation Assessment can be done using the Normal Approximation method. High High T7_1
S7_2 Protocol Deviation Assessments can be done correctly using a grouping variable, such as Site or Country for KRIs, and Study for QTLs, when applicable. High High T7_2
S8_1 Given appropriate raw participant-level data, a Disposition Assessment can be done using the Normal Approximation method. High High T8_1
S8_2 Disposition Assessments can be done correctly using a grouping variable, such as Site or Country for KRIs, and Study for QTLs, when applicable. High High T8_2
S9_1 Given appropriate raw participant-level data, a Labs Assessment can be done using the Normal Approximation method. High High T9_1
S9_2 Labs Assessments can be done correctly using a grouping variable, such as Site or Country for KRIs, and Study for QTLs, when applicable. High High T9_2
S10_1 Given appropriate raw participant-level data, a Data Change Rate Assessment can be done using the Normal Approximation method. High High T10_1
S10_2 Data Change Rate Assessments can be done correctly using a grouping variable, such as Site, Country, or Study, when applicable. High High T10_2
S11_1 Given appropriate raw participant-level data, a Data Entry Lag Assessment can be done using the Normal Approximation method. High High T11_1
S11_2 Data Entry Lag Assessments can be done correctly using a grouping variable, such as Site, Country, or Study, when applicable. High High T11_2
S12_1 Given appropriate raw participant-level data, a Query Age Assessment can be done using the Normal Approximation method. High High T12_1
S12_2 Query Age Assessments can be done correctly using a grouping variable, such as Site, Country, or Study, when applicable. High High T12_2
S13_1 Given appropriate raw participant-level data, a Query Rate Assessment can be done using the Normal Approximation method. High High T13_1
S13_2 Query Rate Assessments can be done correctly using a grouping variable, such as Site, Country, or Study, when applicable. High High T13_2

One Row Per Test

Function Spec ID Test ID Test Description Test Result
Mapping workflow S1_1 T1_1 lData is correctly mapped for processing using mapping.yaml in conjunction with MakeWorkflowList() and RunWorkflow Pass
Analysis workflow S2_1 T2_1 Given raw participant-level data, a properly specified Workflow for a KRI creates summarized and flagged data Pass
Analysis workflow S2_2 T2_2 Given raw participant-level data with missingness, a properly specified Workflow for a KRI creates summarized and flagged data Pass
Analysis workflow S3_1 T3_1 Given pre-processed input data, a properly specified Workflow for a KRI creates summarized and flagged data Pass
Flag_NormalApprox S4_1 T4_1 Given appropriate metadata (i.e. vThresholds), flagged observations are properly marked in summary data Pass
Analyze_NormalApprox_PredictBounds S4_2 T4_2 Given appropriate metadata (i.e. vThresholds), bounds are properly applied to generate flags Pass
Summarize S5_1 T5_1 Given appropriate raw participant-level data, flag values can be correctly assigned to records that meet flagging criteria, including custom thresholding. Pass
Summarize S5_2 T5_2 Given appropriate raw participant-level data, flag values are correctly assigned as NA for sites with low enrollment. Pass
Adverse Event Assessment S6_1 T6_1 Given appropriate raw participant-level data, an Adverse Event Assessment can be done using the Normal Approximation method. Pass
Adverse Event Assessment S6_2 T6_2 Adverse Event Assessments can be done correctly using a grouping variable, such as Site or Country for KRIs, and Study for QTLs, when applicable. Pass
Protocol Deviation Assessment S7_1 T7_1 Given appropriate raw participant-level data, a Protocol Deviation Assessment can be done using the Normal Approximation method. Pass
Protocol Deviation Assessment S7_2 T7_2 Protocol Deviation Assessments can be done correctly using a grouping variable, such as Site or Country for KRIs, and Study for QTLs, when applicable. Pass
Disposition Assessment S8_1 T8_1 Given appropriate raw participant-level data, a Disposition Assessment can be done using the Normal Approximation method. Pass
Disposition Assessment S8_2 T8_2 Disposition Assessments can be done correctly using a grouping variable, such as Site or Country for KRIs, and Study for QTLs, when applicable. Pass
Labs Assessment S9_1 T9_1 Given appropriate raw participant-level data, a Labs Assessment can be done using the Normal Approximation method. Pass
Labs Assessment S9_2 T9_2 Labs Assessments can be done correctly using a grouping variable, such as Site or Country for KRIs, and Study for QTLs, when applicable. Pass
Data Change Rate Assessment S10_1 T10_1 Given appropriate raw participant-level data, a Data Change Rate Assessment can be done using the Normal Approximation method. Pass
Data Change Rate Assessment S10_2 T10_2 Data Change Rate Assessments can be done correctly using a grouping variable, such as Site, Country, or Study, when applicable. Pass
Data Entry Lag Assessment S11_1 T11_1 Given appropriate raw participant-level data, a Data Entry Lag Assessment can be done using the Normal Approximation method. Pass
Data Entry Lag Assessment S11_2 T11_2 Data Entry Lag Assessments can be done correctly using a grouping variable, such as Site, Country, or Study, when applicable. Pass
Query Age Assessment S12_1 T12_1 Given appropriate raw participant-level data, a Query Age Assessment can be done using the Normal Approximation method. Pass
Query Age Assessment S12_2 T12_2 Query Age Assessments can be done correctly using a grouping variable, such as Site, Country, or Study, when applicable. Pass
Query Rate Assessment S13_1 T13_1 Given appropriate raw participant-level data, a Query Rate Assessment can be done using the Normal Approximation method. Pass
Query Rate Assessment S13_2 T13_2 Query Rate Assessments can be done correctly using a grouping variable, such as Site, Country, or Study, when applicable. Pass

Unit Tests

Unit Testing Overview

Unit testing was performed in addition to qualification testing to help ensure that individual pieces of code within the R package function correctly and produce the expected results. By testing individual units of code in isolation, developers can identify and fix issues early in the development process before more significant and scaled problems arise.

Unit Test Coverage

The table below summarizes unit test coverage for each gsm source file. Coverage is the percentage of code in a file that is executed by the unit tests; the closer the value is to 100%, the more thoroughly that file is tested.

Function File Coverage
R/aaa-shared.R 0.00 %
R/Analyze_Fisher.R 100.00 %
R/Analyze_Identity.R 100.00 %
R/Analyze_NormalApprox_PredictBounds.R 100.00 %
R/Analyze_NormalApprox.R 100.00 %
R/Analyze_Poisson_PredictBounds.R 97.92 %
R/Analyze_Poisson.R 100.00 %
R/Flag_Fisher.R 100.00 %
R/Flag_NormalApprox.R 100.00 %
R/Flag_Poisson.R 100.00 %
R/Flag.R 100.00 %
R/Input_Rate.R 94.92 %
R/Report_FlagOverTime.R 94.68 %
R/Report_FormatFlag.R 100.00 %
R/Report_KRI.R 0.00 %
R/Report_MetricCharts.R 100.00 %
R/Report_MetricTable.R 97.50 %
R/Report_OverviewText.R 100.00 %
R/Report_Setup.R 93.75 %
R/Report_StudyInfo.R 100.00 %
R/Report_Timeline.R 84.92 %
R/RunQuery.R 100.00 %
R/RunStep.R 100.00 %
R/RunWorkflow.R 88.89 %
R/RunWorkflows.R 0.00 %
R/Summarize.R 100.00 %
R/Transform_Count.R 100.00 %
R/Transform_Rate.R 100.00 %
R/util-BindResults.R 100.00 %
R/util-clindata.R 72.73 %
R/util-gt.R 100.00 %
R/util-MakeBounds.R 100.00 %
R/util-MakeCharts.R 100.00 %
R/util-MakeLongMeta.R 100.00 %
R/util-MakeMetric.R 100.00 %
R/util-MakeWideGroups.R 100.00 %
R/util-MakeWorkflowList.R 96.77 %
R/util-ParseThreshold.R 100.00 %
R/util-RenderRmd.R 0.00 %
R/util-Report.R 72.88 %
R/Visualize_Metric.R 94.38 %
R/Visualize_Scatter.R 100.00 %
R/Visualize_Score.R 98.28 %
R/Widget_BarChart.R 76.32 %
R/Widget_FlagOverTime.R 51.61 %
R/Widget_GroupOverview.R 78.95 %
R/Widget_ScatterPlot.R 75.00 %
R/Widget_TimeSeries.R 76.32 %
Total Coverage 90.70 %
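
The arithmetic behind per-file and total coverage can be sketched as follows, assuming the total is weighted by the amount of code in each file (as coverage tools commonly do) rather than being a simple average of the per-file percentages. The file names and line counts are hypothetical and are not taken from the table above.

```r
# Hypothetical per-file line counts (not taken from this report)
dfCoverage <- data.frame(
  File = c("R/a.R", "R/b.R", "R/c.R"),
  LinesCovered = c(50, 0, 45),
  LinesTotal = c(50, 10, 60)
)

# Per-file coverage: covered lines as a percentage of all lines in the file
dfCoverage$Percent <- 100 * dfCoverage$LinesCovered / dfCoverage$LinesTotal

# Line-weighted total: all covered lines over all lines
dTotal <- 100 * sum(dfCoverage$LinesCovered) / sum(dfCoverage$LinesTotal)

# Unweighted mean of the per-file percentages, shown only for contrast
dMean <- mean(dfCoverage$Percent)

print(dfCoverage)
print(round(c(Total = dTotal, UnweightedMean = dMean), 2))
```

Under this assumption, a small untested file lowers the total far less than it lowers the unweighted mean, which is why a total can stay high even when a few files show 0.00 %.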

Qualification Testing Environment

Session Information

R version 4.4.1 (2024-06-14)

Platform: x86_64-pc-linux-gnu


attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: RSQLite(v.2.3.7), gsm(v.2.0.1), testthat(v., riskmetric(v.0.2.4), stringr(v.1.5.1), gh(v.1.4.1), pander(v.0.6.5), purrr(v.1.0.2), dplyr(v.1.1.4) and knitr(v.1.48)

loaded via a namespace (and not attached): tidyselect(v.1.2.1), viridisLite(v.0.4.2), blob(v.1.2.4), urltools(v.1.7.3), fastmap(v.1.2.0), lazyeval(v.0.2.2), promises(v.1.3.0), rex(v.1.2.1), digest(v.0.6.36), mime(v.0.12), lifecycle(v.1.0.4), waldo(v.0.5.2), ellipsis(v.0.3.2), magrittr(v.2.0.3), compiler(v.4.4.1), rlang(v.1.1.4), sass(v.0.4.9), tools(v.4.4.1), utf8(v.1.2.4), yaml(v.2.3.10), htmlwidgets(v.1.6.4), bit(v.4.0.5), pkgbuild(v.1.4.4), curl(v.5.2.1), here(v.1.0.1), xml2(v.1.3.6), cranlogs(v.2.1.1), pkgload(v.1.4.0), miniUI(v., covr(v.3.6.4), withr(v.3.0.1), desc(v.1.4.3), triebeard(v.0.4.1), grid(v.4.4.1), fansi(v.1.0.6), urlchecker(v.1.0.1), profvis(v.0.3.8), xtable(v.1.8-4), colorspace(v.2.1-1), ggplot2(v.3.5.1), scales(v.1.3.0), cli(v.3.6.3), chron(v.2.3-61), rmarkdown(v.2.27), ragg(v.1.3.2), generics(v.0.1.3), remotes(v.2.5.0), rstudioapi(v.0.16.0), httr(v.1.4.7), sessioninfo(v.1.2.2), DBI(v.1.2.3), cachem(v.1.1.0), BiocManager(v.1.30.23), vctrs(v.0.6.5), devtools(v.2.4.5), jsonlite(v.1.8.8), bit64(v.4.0.5), systemfonts(v.1.1.0), tidyr(v.1.3.1), jquerylib(v.0.1.4), proto(v.1.0.0), glue(v.1.7.0), clindata(v.1.0.5), pkgdown(v.2.1.0), stringi(v.1.8.4), gtable(v.0.3.5), later(v.1.3.2), munsell(v.0.5.1), tibble(v.3.2.1), pillar(v.1.9.0), htmltools(v., brio(v.1.1.5), R6(v.2.5.1), tcltk(v.4.4.1), textshaping(v.0.4.0), rprojroot(v.2.0.4), kableExtra(v.1.4.0), evaluate(v.0.24.0), shiny(v.1.9.1), highr(v.0.11), gsubfn(v.0.7), backports(v.1.5.0), memoise(v.2.0.1), broom(v.1.0.6), httpuv(v.1.6.15), bslib(v.0.8.0), Rcpp(v.1.0.13), svglite(v.2.1.3), sqldf(v.0.4-11), xfun(v.0.46), fs(v.1.6.4), usethis(v.3.0.0) and pkgconfig(v.2.0.3)

Package List

The pkg_score values in the table below were computed with the riskmetric package, which quantifies the robustness of an R package. The pkg_score column captures the risk involved in using a package and ranges from 0 (low risk) to 1 (high risk).

package version pkg_score
broom 1.0.6 0.373
cli 3.6.3 0.406
DBI 1.2.3 0.345
dplyr 1.1.4 0.392
fs 1.6.4 0.336
ggplot2 3.5.1 0.404
glue 1.7.0 0.287
htmltools 0.404
htmlwidgets 1.6.4 0.316
jsonlite 1.8.8 0.474
magrittr 2.0.3 0.284
purrr 1.0.2 0.369
rlang 1.1.4 0.423
stats 4.4.1 0.704
stringr 1.5.1 0.364
tibble 3.2.1 0.357
tidyr 1.3.1 0.385
tools 4.4.1 0.734
utils 4.4.1 0.687
withr 3.0.1 0.324
yaml 2.3.10 0.429
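
One way to read this table is to sort the packages by pkg_score and list those above a chosen cutoff. The sketch below does this for a few rows copied from the table; the cutoff of 0.5 is a hypothetical value chosen for illustration and is not a threshold defined by this report or by riskmetric.

```r
# A few rows copied from the table above
dfPackages <- data.frame(
  package = c("glue", "magrittr", "stats", "tools", "utils"),
  pkg_score = c(0.287, 0.284, 0.704, 0.734, 0.687)
)

# Hypothetical cutoff for closer review (an assumption, not a defined threshold)
dCutoff <- 0.5

# Keep packages above the cutoff, highest score first
dfReview <- dfPackages[dfPackages$pkg_score > dCutoff, ]
dfReview <- dfReview[order(dfReview$pkg_score, decreasing = TRUE), ]
print(dfReview)
```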

