OPEN_SOURCE
REDDIT // BENCHMARK RESULT
OpenAI Privacy Filter Tops GLiNER on PII Eval
A 600-sample PII benchmark suggests OpenAI Privacy Filter outperforms GLiNER under boundary-overlap scoring, even though its strict exact-match score looks far worse because tokenizer offsets shift its predicted spans by one character. The result flips the usual intuition: the faster-on-CPU model is also the more accurate one, but only if you score spans more forgivingly.
// ANALYSIS
The big takeaway is that this is less a model-vs-model knockout than a reminder that eval methodology can swamp the headline numbers. If you score character spans the wrong way, you can make a strong redaction model look broken.
- On strict exact match, GLiNER looks better; on boundary overlap, OpenAI Privacy Filter wins overall, and overlap is the metric that better matches redaction workflows.
- The reported gap is driven by tokenization and span reconstruction, not by the model failing to find the entities in the first place.
- GLiNER still has a real advantage for custom schemas because you can pass arbitrary entity types at inference.
- OpenAI Privacy Filter’s CPU throughput is materially better here, so it is the more practical option when you need fast local redaction at scale.
- The threshold sweep matters: GLiNER at 0.7 beats the default 0.5, which means out-of-the-box comparisons can be misleading.
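The scoring distinction driving the headline gap can be sketched in a few lines. This is not the benchmark's actual harness; the spans and the one-character offset are illustrative assumptions showing how a shifted span is a miss under exact match but a hit under overlap.

```python
# Minimal sketch of strict exact-match vs boundary-overlap span scoring.
# Spans are (start, end) character offsets, end-exclusive; values are made up.

def exact_hits(pred, gold):
    """A predicted span counts only if it equals a gold span exactly."""
    return len(set(pred) & set(gold))

def overlap_hits(pred, gold):
    """A predicted span counts if it overlaps any gold span at all."""
    return sum(
        1 for ps, pe in pred
        if any(ps < ge and gs < pe for gs, ge in gold)
    )

gold = [(10, 18), (42, 55)]   # true PII character spans
pred = [(11, 19), (42, 55)]   # first span shifted by one character

print(exact_hits(pred, gold))    # prints 1: the shifted span is a miss
print(overlap_hits(pred, gold))  # prints 2: both spans count
```

Under exact match the off-by-one span drags precision and recall down even though the entity was found; overlap scoring credits it, which is closer to what a redaction pipeline cares about.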
// TAGS
openai-privacy-filter · gliner · evaluation · benchmark · open-weights · moe · local-first · inference · llm
DISCOVERED
2026-05-01
PUBLISHED
2026-05-01
RELEVANCE
8/10
AUTHOR
gvij