

F1 Score

The F1 score is a model performance metric that combines precision and recall into a single balanced measure, calculated as the harmonic mean of the two.
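The harmonic mean can be sketched in a few lines. This is a minimal illustration (not a production implementation; the function name and example values are chosen here for demonstration):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    Returns 0.0 when both inputs are 0 to avoid division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: precision = 0.8, recall = 0.6
print(f1_score(0.8, 0.6))  # ≈ 0.686
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot achieve a high F1 score by excelling at precision while neglecting recall, or vice versa.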

What this means in practice

F1 scores range from 0 to 1, with 1 representing perfect precision and recall. The F1 score is useful when you need a single number to evaluate model quality and when false positives and false negatives carry roughly equal cost. When the costs are unequal, weighted variations such as the Fβ score, or separate evaluation of precision and recall, may be more appropriate.

Why it matters

The F1 score provides a balanced single metric combining precision and recall, but its usefulness depends on whether false positives and false negatives carry roughly equal cost. In many real-world applications, they do not: a missed cancer diagnosis is far costlier than an unnecessary follow-up test. Understanding when F1 is appropriate and when weighted alternatives are needed prevents organizations from optimizing for the wrong metric.
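When costs are unequal, the standard weighted alternative is the Fβ score, which generalizes F1 by weighting recall β times as heavily as precision. A minimal sketch (function name and example values are illustrative, not from COMPEL):

```python
def fbeta_score(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean: recall is weighted beta times precision.

    beta > 1 favors recall (misses are costlier, e.g., medical screening);
    beta < 1 favors precision; beta = 1 reduces to the standard F1 score.
    """
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom

# F2 emphasizes recall, suited to cases like cancer screening where
# a false negative is far costlier than a false positive.
print(fbeta_score(0.8, 0.6, beta=2.0))  # ≈ 0.632
print(fbeta_score(0.8, 0.6, beta=1.0))  # ≈ 0.686 (same as F1)
```

Note that the same precision/recall pair yields a lower F2 than F1 here because recall (0.6) is the weaker of the two values and F2 penalizes weak recall more heavily.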

How COMPEL uses it

F1 scores are commonly reported in COMPEL Model stage use case evaluations when assessing candidate models, and in Evaluate stage performance assessments when measuring production model quality. During Model, the AITP ensures metric selection is appropriate for each use case context. The Evaluate stage tracks F1 trends over time as part of the Quality KPI tier in the four-level KPI hierarchy.
