When evaluating deepfake voice detectors, you will find vendor marketing filled with accuracy claims like "99% detection." In biometrics and signal detection theory, such percentages are meaningless unless you know the test data, the decision threshold, and which kind of error is being counted. The standard metric for validating verification systems is the Equal Error Rate (EER).
To build defensible controls, risk managers and compliance officers must understand what EER represents and how it translates to their actual operational environments.
What is Equal Error Rate?
A spoof detector makes a binary classification: is this audio authentic (genuine human) or synthetic (spoof)? To do this, the system reduces each audio sample to a score and compares that score against a threshold. Two kinds of error can result:
- False Acceptance Rate (FAR): The probability that a synthetic or spoofed voice is incorrectly classified as authentic. In fraud terms, this is a false negative: a fake gets through.
- False Rejection Rate (FRR): The probability that a genuine human voice is incorrectly flagged as a spoof. In operational terms, this is a false positive: a legitimate customer or employee call is blocked or delayed.
As you adjust the threshold to be more sensitive, you catch more fakes (FAR decreases), but you flag more genuine calls (FRR increases). If you make the system less sensitive, the opposite occurs.
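The trade-off above can be made concrete with a minimal Python sketch. It assumes a scoring convention that the text does not specify: a higher score means "more likely genuine," and a call is accepted as authentic when its score is at or above the threshold. The function name `far_frr` and the score lists are hypothetical, made up for illustration only.

```python
def far_frr(genuine_scores, spoof_scores, threshold):
    """Return (FAR, FRR) at one threshold.

    Assumed convention (not from any vendor): higher score means
    "more likely genuine"; accept as authentic when score >= threshold.
    """
    # Spoofs that score high enough to be accepted as authentic.
    false_accepts = sum(1 for s in spoof_scores if s >= threshold)
    # Genuine calls that score too low and get flagged as spoofs.
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(spoof_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


# Made-up scores for illustration only, not real detector output.
genuine = [0.91, 0.85, 0.78, 0.66, 0.52]
spoof = [0.71, 0.43, 0.35, 0.22, 0.10]

for t in (0.3, 0.6, 0.8):
    far, frr = far_frr(genuine, spoof, t)
    print(f"threshold={t:.1f}  FAR={far:.2f}  FRR={frr:.2f}")
```

With these made-up scores, raising the threshold from 0.3 to 0.8 moves FAR from 0.60 down to 0.00 while FRR climbs from 0.00 to 0.60, which is the trade-off described above.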
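The Equal Error Rate (EER) is the error rate at the specific threshold setting where the False Acceptance Rate matches the False Rejection Rate (FAR = FRR). It is a rate, not the threshold itself: an EER of 5% means that at that threshold, 5% of spoofs are accepted and 5% of genuine calls are rejected.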
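One simple way to estimate the EER from a set of labelled scores is to try every observed score as a threshold and keep the one where FAR and FRR are closest. The sketch below does that in plain Python. It is an approximation under the same assumed convention as before (higher score means "more likely genuine"), not the scoring procedure of any particular benchmark, and the function name `equal_error_rate` and the data are hypothetical.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumed convention: accept as authentic when score >= threshold.
    Returns (eer, threshold), where eer is the mean of FAR and FRR at the
    threshold that brings the two rates closest together.
    """
    n_gen, n_spf = len(genuine_scores), len(spoof_scores)
    best = None  # (gap between FAR and FRR, eer estimate, threshold)
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        far = sum(s >= t for s in spoof_scores) / n_spf
        frr = sum(s < t for s in genuine_scores) / n_gen
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up scores for illustration only, not real detector output.
genuine = [0.91, 0.85, 0.78, 0.66, 0.52]
spoof = [0.71, 0.43, 0.35, 0.22, 0.10]

eer, threshold = equal_error_rate(genuine, spoof)
print(f"EER={eer:.2f} at threshold={threshold:.2f}")
```

On these toy scores the sweep lands on a threshold of 0.66, where both rates are 20%. With real data the two rates rarely cross exactly on an observed score, which is why the sketch reports the average of FAR and FRR at the closest point.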
Why EER is the Correct Baseline
EER is the industry-standard benchmark because it condenses the FAR/FRR trade-off into a single number that does not depend on any vendor-chosen threshold. On the same test data, a model with a lower EER is better at separating fakes from genuine speech.
However, EER is only a baseline. In production, banks and regulated firms rarely run at the EER point. For a model with a 10% EER, operating there would mean flagging 10% of all legitimate calls (a 10% FRR) to catch fakes, which is operationally unacceptable. Instead, firms select a threshold that keeps FRR low to avoid operational friction, accept a higher FAR as the cost, and rely on step-up verification controls to handle the residual risk.
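Choosing an operating point away from the EER can be sketched the same way: fix the highest FRR the business will tolerate, pick the strictest threshold that stays within it, and read off the FAR that results. The function name `operating_point`, the `max_frr` parameter, and the scores below are hypothetical, and the scoring convention (higher score means "more likely genuine") is an assumption.

```python
def operating_point(genuine_scores, spoof_scores, max_frr):
    """Pick the strictest threshold whose FRR stays within max_frr.

    Assumed convention: accept as authentic when score >= threshold.
    Returns (threshold, far, frr) at the chosen operating point.
    """
    n_gen, n_spf = len(genuine_scores), len(spoof_scores)
    chosen = None
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        frr = sum(s < t for s in genuine_scores) / n_gen
        if frr > max_frr:
            break  # FRR never decreases as the threshold rises
        far = sum(s >= t for s in spoof_scores) / n_spf
        chosen = (t, far, frr)
    return chosen


# Made-up scores for illustration only, not real detector output.
genuine = [0.95, 0.92, 0.88, 0.84, 0.80, 0.75, 0.70, 0.62, 0.55, 0.40]
spoof = [0.72, 0.60, 0.58, 0.45, 0.38, 0.30, 0.25, 0.20, 0.15, 0.05]

for max_frr in (0.10, 0.00):
    t, far, frr = operating_point(genuine, spoof, max_frr)
    print(f"max FRR={max_frr:.2f}: threshold={t:.2f}  FAR={far:.2f}  FRR={frr:.2f}")
```

For this toy data, whose EER works out to 20%, capping FRR at 10% gives a threshold of 0.55 and a FAR of 30%, and capping FRR at 0% gives a threshold of 0.40 and a FAR of 40%. The residual FAR at the chosen point is the risk that step-up verification has to absorb.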
Evaluating vendor models on public benchmarks, such as the ASVspoof 5 evaluation partition, gives risk teams a transparent baseline they can audit, replacing opaque marketing numbers with verifiable performance metrics.