OPEN_SOURCE
REDDIT // BENCHMARK RESULT
OpenAI Privacy Filter Tops GLiNER on PII Eval
A 600-sample PII benchmark suggests OpenAI Privacy Filter outperforms GLiNER under boundary-overlap scoring, even though its strict exact-match score looks far worse because tokenizer offsets shift its predicted spans by one character. The result flips the usual intuition: the faster-on-CPU model is also the more accurate one, but only if you score spans more forgivingly.
// ANALYSIS
The big takeaway is that this is less a model-vs-model knockout than a reminder that eval methodology can swamp the headline numbers. If you score character spans the wrong way, you can make a strong redaction model look broken.
- On strict exact match, GLiNER looks better; on boundary overlap, OpenAI Privacy Filter wins overall, and overlap is the metric that better matches redaction workflows.
- The reported gap is driven by tokenization and span reconstruction, not by the model failing to find the entities in the first place.
- GLiNER still has a real advantage for custom schemas because you can pass arbitrary entity types at inference.
- OpenAI Privacy Filter’s CPU throughput is materially better here, so it is the more practical option when you need fast local redaction at scale.
- The threshold sweep matters: GLiNER at 0.7 beats the default 0.5, which means out-of-the-box comparisons can be misleading.
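The scoring distinction driving the headline gap can be sketched in a few lines. This is not the benchmark's actual harness; the spans and the one-character offset are illustrative assumptions showing how a shifted span is a miss under exact match but a hit under overlap.

```python
# Minimal sketch of strict exact-match vs boundary-overlap span scoring.
# Spans are (start, end) character offsets, end-exclusive; values are made up.

def exact_hits(pred, gold):
    """A predicted span counts only if it equals a gold span exactly."""
    return len(set(pred) & set(gold))

def overlap_hits(pred, gold):
    """A predicted span counts if it overlaps any gold span at all."""
    return sum(
        1 for ps, pe in pred
        if any(ps < ge and gs < pe for gs, ge in gold)
    )

gold = [(10, 18), (42, 55)]   # true PII character spans
pred = [(11, 19), (42, 55)]   # first span shifted by one character

print(exact_hits(pred, gold))    # prints 1: the shifted span is a miss
print(overlap_hits(pred, gold))  # prints 2: both spans count
```

Under exact match the off-by-one span drags precision and recall down even though the entity was found; overlap scoring credits it, which is closer to what a redaction pipeline cares about.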
// TAGS
openai-privacy-filter · gliner · evaluation · benchmark · open-weights · moe · local-first · inference · llm
DISCOVERED
2026-05-01
PUBLISHED
2026-05-01
RELEVANCE
8/10
AUTHOR
gvij