F1 Score
The F1 score is a model performance metric that combines precision and recall into a single balanced measure, calculated as the harmonic mean of the two.
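The harmonic-mean calculation can be sketched directly from confusion-matrix counts. This is a minimal illustration; the counts below are invented for the example, not drawn from any real model.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall,
    computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts: 80 true positives, 20 false positives, 40 false negatives.
# Precision = 0.80, recall ~= 0.667, so F1 ~= 0.727.
print(round(f1_score(tp=80, fp=20, fn=40), 3))
```

Because the harmonic mean is dominated by the smaller of the two inputs, a model cannot achieve a high F1 by excelling at precision while neglecting recall, or vice versa.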
What this means in practice
F1 scores range from 0 to 1, with 1 representing perfect precision and recall. The metric is useful when you need a single number to evaluate model quality and when false positives and false negatives carry roughly equal cost. When the costs differ substantially, weighted variants or separate evaluation of precision and recall may be more appropriate.
Why it matters
The F1 score provides a balanced single metric combining precision and recall, but its usefulness depends on whether false positives and false negatives carry roughly equal cost. In many real-world applications, they do not: a missed cancer diagnosis is far costlier than an unnecessary follow-up test. Understanding when F1 is appropriate and when weighted alternatives are needed prevents organizations from optimizing for the wrong metric.
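One standard weighted alternative is the F-beta score, which generalizes F1 by letting beta control how much recall is weighted relative to precision (beta > 1 favors recall, beta < 1 favors precision). The values below are illustrative, not tied to any COMPEL evaluation.

```python
def fbeta_score(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta > 1 weights recall more heavily,
    beta < 1 weights precision more heavily, beta = 1 gives F1."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A model with high precision (0.9) but low recall (0.5), as in a
# screening setting where missed cases are costly:
p, r = 0.9, 0.5
print(round(fbeta_score(p, r, beta=1), 3))  # F1  ~= 0.643
print(round(fbeta_score(p, r, beta=2), 3))  # F2  ~= 0.549, penalizes low recall
```

The F2 score is noticeably lower than F1 here, reflecting that recall deficits matter more when beta = 2, which is the behavior you want when false negatives carry the higher cost.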
How COMPEL uses it
F1 scores are commonly reported in COMPEL Model stage use case evaluations when assessing candidate models, and in Evaluate stage performance assessments when measuring production model quality. During Model, the AITP ensures metric selection is appropriate for each use case context. The Evaluate stage tracks F1 trends over time as part of the Quality KPI tier in the four-level KPI hierarchy.
Related Terms
Other glossary terms mentioned in this entry's definition and context.