Calculate the two thresholds distinguishing certain negatives/positives from uncertain predictions. The thresholds are needed to create the extended confusion matrix and are further used for confidence calculation.
Arguments
- observations
Either an integer or logical vector containing the binary observations where presences are encoded as 1s/TRUEs and absences as 0s/FALSEs.
- predictions
A numeric vector containing the predicted probabilities of occurrence typically within the [0, 1] interval. length(predictions) should be equal to length(observations) and the order of the elements should match. predictions is optional: needed and used only if type is 'mean' and ignored otherwise.
- type
A character vector of length one containing the value 'mean' (for calculating mean of the predictions within known presences and absences) or 'information' (for calculating thresholds based on relative information gain). Defaults to 'mean'.
- range
A numeric vector of length one containing a value from the ]0, 0.5] interval. It is the parameter of the information-based method and is used only if type is 'information'. The larger the range is, the more predictions are treated as uncertain. Defaults to 0.5.
Value
A named numeric vector of length 2. The first element ('threshold1') is the mean of the probabilities predicted for the absence locations; it distinguishes certain negatives (certain absences) from uncertain predictions. The second element ('threshold2') is the mean of the probabilities predicted for the presence locations; it distinguishes certain positives (certain presences) from uncertain predictions. For a typical model that performs better than a random guess, the first element is smaller than the second one. The returned value might contain NaN(s) if the number of observed presences and/or absences is 0.
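The mean-based calculation described above can be sketched in a few lines of base R. This is a minimal illustration only, not the package's implementation: the helper name mean_thresholds_sketch is hypothetical, it covers only the 'mean' type, and it omits the warnings and coercion messages that thresholds() emits.

```r
# Hypothetical helper (not part of the package): illustrates the 'mean' type only.
mean_thresholds_sketch <- function(observations, predictions) {
  # Both vectors must describe the same locations, in the same order.
  stopifnot(length(observations) == length(predictions))
  # Treat 1/TRUE as presence and 0/FALSE as absence.
  is_presence <- as.logical(observations)
  c(threshold1 = mean(predictions[!is_presence]),  # mean prediction over absences
    threshold2 = mean(predictions[is_presence]))   # mean prediction over presences
}

# Two absences (0.1, 0.3) and two presences (0.6, 0.8):
# threshold1 is 0.2 and threshold2 is 0.7, up to floating-point rounding.
mean_thresholds_sketch(observations = c(0L, 0L, 1L, 1L),
                       predictions = c(0.1, 0.3, 0.6, 0.8))
```

Because mean() of a zero-length vector is NaN in base R, the sketch also reproduces the NaN result when no presences or no absences are observed.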
Note
thresholds() should be called using the whole dataset containing both training and evaluation locations.
See also
confidence for calculating confidence, consistency for calculating consistency
Examples
set.seed(12345)
# Using logical observations:
observations_1000_logical <- c(rep(x = FALSE, times = 500),
                               rep(x = TRUE, times = 500))
predictions_1000 <- c(runif(n = 500, min = 0, max = 0.7),
                      runif(n = 500, min = 0.3, max = 1))
thresholds(observations = observations_1000_logical,
           predictions = predictions_1000) # 0.370 0.649
#> threshold1 threshold2
#> 0.3703913 0.6492754
# Using integer observations:
observations_4000_integer <- c(rep(x = 0L, times = 3000),
                               rep(x = 1L, times = 1000))
predictions_4000 <- c(runif(n = 3000, min = 0, max = 0.8),
                      runif(n = 1000, min = 0.2, max = 0.9))
thresholds(observations = observations_4000_integer,
           predictions = predictions_4000) # 0.399 0.545
#> threshold1 threshold2
#> 0.3988011 0.5445960
# Wrong parameterization:
try(thresholds(observations = observations_1000_logical,
               predictions = predictions_4000)) # error
#> Error in thresholds(observations = observations_1000_logical, predictions = predictions_4000) :
#> The length of the two parameters ('observations' and 'predictions') should be the same.
set.seed(12345)
observations_4000_numeric <- c(rep(x = 0, times = 3000),
                               rep(x = 1, times = 1000))
predictions_4000_strange <- c(runif(n = 3000, min = -0.3, max = 0.4),
                              runif(n = 1000, min = 0.6, max = 1.5))
try(thresholds(observations = observations_4000_numeric,
               predictions = predictions_4000_strange)) # multiple warnings
#> Warning: I found that parameter 'observations' is not an integer or logical vector. Coercion is done.
#> Warning: Strange predicted values found. Parameter 'predictions' preferably contains numbers falling within the [0,1] interval.
#> threshold1 threshold2
#> 0.05447816 1.04132376
mask_of_normal_predictions <- predictions_4000_strange >= 0 & predictions_4000_strange <= 1
thresholds(observations = as.integer(observations_4000_numeric)[mask_of_normal_predictions],
           predictions = predictions_4000_strange[mask_of_normal_predictions]) # OK
#> threshold1 threshold2
#> 0.2006408 0.7979505