Thresholds needed to create the extended confusion matrix

Calculate the two thresholds distinguishing certain negatives/positives from uncertain predictions. The thresholds are needed to create the extended confusion matrix and are further used for confidence calculation.
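As a rough illustration of how the two thresholds split predictions into three groups, the sketch below labels each prediction as a certain negative, an uncertain prediction or a certain positive. The helper classify_with_thresholds() is hypothetical and not part of the documented API, and the handling of values exactly equal to a threshold is an assumption made only for this example.

```r
# Hypothetical helper for illustration only; not part of the documented API.
classify_with_thresholds <- function(predictions, th) {
  # Assumption: predictions exactly equal to a threshold count as uncertain.
  ifelse(predictions < th[["threshold1"]], "certain negative",
         ifelse(predictions > th[["threshold2"]], "certain positive",
                "uncertain"))
}

# Made-up threshold values in the same named format that thresholds() returns.
th <- c(threshold1 = 0.37, threshold2 = 0.65)
classify_with_thresholds(predictions = c(0.10, 0.50, 0.90), th = th)
# "certain negative" "uncertain" "certain positive"
```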

Usage

thresholds(observations, predictions = NULL, type = "mean", range = 0.5)

Arguments

observations: Either an integer or logical vector containing the binary observations where presences are encoded as 1s/TRUEs and absences as 0s/FALSEs.
predictions: A numeric vector containing the predicted probabilities of occurrence, typically within the [0, 1] interval. length(predictions) should be equal to length(observations) and the order of the elements should match. predictions is optional: it is required and used only if type is 'mean', and is ignored otherwise.
type: A character vector of length one containing the value 'mean' (for calculating the mean of the predictions within known presences and absences) or 'information' (for calculating thresholds based on relative information gain). Defaults to 'mean'.
range: A numeric vector of length one containing a value from the ]0, 0.5] interval. It is the parameter of the information-based method and is used only if type is 'information'. The larger the range is, the more predictions are treated as uncertain. Defaults to 0.5.

Value

A named numeric vector of length 2. The first element ('threshold1') is the mean of the probabilities predicted for the absence locations; it distinguishes certain negatives (certain absences) from uncertain predictions. The second element ('threshold2') is the mean of the probabilities predicted for the presence locations; it distinguishes certain positives (certain presences) from uncertain predictions. For a typical model that performs better than random guessing, the first element is smaller than the second one. The returned value might contain NaN(s) if the number of observed presences and/or absences is 0.
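The description above can be sketched in a few lines of base R. This is a minimal sketch of the 'mean' variant only, based solely on the wording of this page; mean_thresholds_sketch() is a hypothetical name, and the real function additionally validates its inputs and issues warnings.

```r
# Hypothetical re-implementation of the 'mean' variant, for illustration only.
mean_thresholds_sketch <- function(observations, predictions) {
  # Treat 1/TRUE as presence and 0/FALSE as absence.
  is_presence <- as.logical(observations)
  c(threshold1 = mean(predictions[!is_presence]),  # mean over absences
    threshold2 = mean(predictions[is_presence]))   # mean over presences
}

mean_thresholds_sketch(observations = c(0L, 0L, 1L, 1L),
                       predictions = c(0.1, 0.3, 0.6, 0.8))
# threshold1 = 0.2, threshold2 = 0.7

# With no observed presences, the mean of an empty vector is NaN:
mean_thresholds_sketch(observations = c(0L, 0L),
                       predictions = c(0.1, 0.3))
```

In base R, mean() of a zero-length numeric vector yields NaN, which is consistent with the NaN(s) mentioned above for datasets lacking presences or absences.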

Note

thresholds() should be called using the whole dataset containing both training and evaluation locations.

Examples

set.seed(12345)

# Using logical observations:
observations_1000_logical <- c(rep(x = FALSE, times = 500),
                               rep(x = TRUE, times = 500))
predictions_1000 <- c(runif(n = 500, min = 0, max = 0.7),
                      runif(n = 500, min = 0.3, max = 1))
thresholds(observations = observations_1000_logical,
           predictions = predictions_1000) # 0.370 0.649
#> threshold1 threshold2 
#>  0.3703913  0.6492754 

# Using integer observations:
observations_4000_integer <- c(rep(x = 0L, times = 3000),
                               rep(x = 1L, times = 1000))
predictions_4000 <- c(runif(n = 3000, min = 0, max = 0.8),
                      runif(n = 1000, min = 0.2, max = 0.9))
thresholds(observations = observations_4000_integer,
           predictions = predictions_4000) # 0.399 0.545
#> threshold1 threshold2 
#>  0.3988011  0.5445960 

# Wrong parameterization:
try(thresholds(observations = observations_1000_logical,
               predictions = predictions_4000)) # error
#> Error in thresholds(observations = observations_1000_logical, predictions = predictions_4000) : 
#>   The length of the two parameters ('observations' and 'predictions') should be the same.

# Numeric (non-integer) observations and predictions outside [0, 1]:
set.seed(12345)
observations_4000_numeric <- c(rep(x = 0, times = 3000),
                               rep(x = 1, times = 1000))
predictions_4000_strange <- c(runif(n = 3000, min = -0.3, max = 0.4),
                              runif(n = 1000, min = 0.6, max = 1.5))
try(thresholds(observations = observations_4000_numeric,
               predictions = predictions_4000_strange)) # multiple warnings
#> Warning: I found that parameter 'observations' is not an integer or logical vector. Coercion is done.
#> Warning: Strange predicted values found. Parameter 'predictions' preferably contains numbers falling within the [0,1] interval.
#> threshold1 threshold2 
#> 0.05447816 1.04132376 
# Keep only the predictions that fall within the [0, 1] interval:
mask_of_normal_predictions <- predictions_4000_strange >= 0 &
                              predictions_4000_strange <= 1
thresholds(observations = as.integer(observations_4000_numeric)[mask_of_normal_predictions],
           predictions = predictions_4000_strange[mask_of_normal_predictions]) # OK
#> threshold1 threshold2 
#>  0.2006408  0.7979505