[Stable]

Add columns flagging sites that represent possible statistical outliers when the Identity statistical method is used.

Usage

Flag(dfAnalyzed, strColumn = "Score", vThreshold = NULL, strValueColumn = NULL)

Arguments

dfAnalyzed

data.frame where flags should be added.

strColumn

character Name of the column to use for thresholding. Default: "Score"

vThreshold

numeric Vector of 2 numeric values giving the lower and upper thresholds. Default: NULL. Every value in strColumn is compared to vThreshold using strict comparisons: values less than the lower threshold or greater than the upper threshold are flagged, and values equal to a threshold are set to 0 (i.e., not flagged). If NA is provided for either threshold, that threshold is ignored and no values are flagged against it. NA and NaN values in strColumn are given NA flag values.
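The comparison rules above can be sketched in base R. This is a minimal illustration, not the package's implementation: flag_by_threshold is a hypothetical helper, and coding values below the lower threshold as -1 and values above the upper threshold as 1 is an assumption made for the example.

```r
# Hypothetical helper (not part of the package): flags values that fall
# strictly outside c(lower, upper).
flag_by_threshold <- function(x, vThreshold) {
  stopifnot(is.numeric(x), length(vThreshold) == 2)
  flag <- rep(0, length(x))
  # Strict comparisons: a value equal to a threshold keeps flag 0.
  # An NA threshold is skipped, so nothing is flagged on that side.
  if (!is.na(vThreshold[1])) flag[!is.na(x) & x < vThreshold[1]] <- -1 # assumed coding
  if (!is.na(vThreshold[2])) flag[!is.na(x) & x > vThreshold[2]] <- 1  # assumed coding
  # is.na() is TRUE for both NA and NaN, so both receive an NA flag.
  flag[is.na(x)] <- NA
  flag
}

score <- c(0.0005, 0.001, 0.005, 0.01, 0.02, NA, NaN)
flag_by_threshold(score, c(0.001, 0.01))
#> [1] -1  0  0  0  1 NA NA
```

The two boundary values, 0.001 and 0.01, stay at 0 because the comparisons are strict.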

strValueColumn

character Name of the column used to set the sign of Flag. Optional; default: NULL. For a row that is already flagged, Flag is set to 1 if the row's value is higher than the median of strValueColumn and to -1 if it is lower than the median.
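As a rough sketch of the median rule, assuming it applies only to rows whose flag is already non-zero: sign_flag_by_median is a hypothetical helper, not a package function. Dropping missing values when computing the median, and leaving the flag unchanged when a value equals the median, are assumptions, since the description above does not cover those cases.

```r
# Hypothetical helper (not part of the package): re-signs existing flags
# by comparing a value column to its median.
sign_flag_by_median <- function(flag, value) {
  nMedian <- stats::median(value, na.rm = TRUE) # na.rm = TRUE is an assumption
  # Only rows that are already flagged (non-zero, non-missing) can change.
  flagged <- !is.na(flag) & flag != 0 & !is.na(value)
  flag[flagged & value > nMedian] <- 1
  flag[flagged & value < nMedian] <- -1
  # Values equal to the median are left as they were (assumption).
  flag
}

flag  <- c(-1, 0, 1, 1, 0)
value <- c(10, 2, 1, 8, 5)   # median is 5
sign_flag_by_median(flag, value)
#> [1]  1  0 -1  1  0
```

Rows with a flag of 0 are never changed, so the value column affects direction only and does not decide which rows are flagged.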

Value

data.frame with one row per site with columns: GroupID, TotalCount, Metric, Score, Flag

Details

This function provides a generalized framework for flagging sites as part of the GSM data model (see vignette("DataModel")).

Data Specification

Flag is designed to support the input data (dfAnalyzed) from the Analyze_Identity() function. At a minimum, the input data must contain a group column (GroupID) and the numeric column named by strColumn. The values in strColumn are compared to the thresholds in vThreshold to create a new Flag column, which identifies possible statistical outliers. To see the direction of the flagged points, set the strValueColumn parameter, which assigns a positive or negative sign to points that are already flagged.

The following columns are considered required:

  • GroupID - Group ID; default is SiteID

  • GroupLevel - Group Type

  • strColumn - A column to use for thresholding

The following column is considered optional:

  • strValueColumn - A column to be used for the sign/directionality of the flagging
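The column requirements listed above could be checked before flagging with a few lines of base R. This is only a sketch under stated assumptions: check_flag_input is a hypothetical helper, and the package may validate its input differently.

```r
# Hypothetical helper (not part of the package): verifies that the columns
# listed as required, plus the optional strValueColumn if given, are present.
check_flag_input <- function(df, strColumn = "Score", strValueColumn = NULL) {
  stopifnot(is.data.frame(df))
  required <- c("GroupID", "GroupLevel", strColumn)
  # c(required, NULL) is just `required`, so the optional column is only
  # checked when it is supplied.
  missing_cols <- setdiff(c(required, strValueColumn), names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(df[[strColumn]])) {
    stop("`", strColumn, "` must be numeric.")
  }
  invisible(TRUE)
}

# Made-up illustration data, not package output.
dfExample <- data.frame(GroupID = c("A", "B"), GroupLevel = "Site", Score = c(0.5, 0.002))
check_flag_input(dfExample)                             # passes silently
# check_flag_input(dfExample, strValueColumn = "Metric") # would stop: Metric is missing
```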

Examples

dfTransformed <- Transform_Count(analyticsInput, strCountCol = "Numerator")

dfAnalyzed <- Analyze_Identity(dfTransformed)
#> `Score` column created from `Metric`.

dfFlagged <- Flag(dfAnalyzed, vThreshold = c(0.001, 0.01))