ScreenLeak nears frontier on PII redaction

// BENCHMARK RESULT · 45d ago

ScreenLeak benchmarks a local redaction stack for computer-use data, pairing a 278 MB text model with a small image detector and a trace-level leakage eval. The key claim is speed plus privacy: its text model runs offline at 9 ms p50 on CPU while landing near frontier APIs on synthetic PII-removal tests.
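For readers unfamiliar with the metric, a p50 latency figure is the median time per call over many runs. The sketch below shows one way such a number could be measured. The `redact` function is a hypothetical regex stand-in used only to have something to time; it is not ScreenLeak's text model, and the run count is an arbitrary assumption.

```python
import re
import statistics
import time

# Hypothetical stand-in redactor: a simple regex for email addresses.
# ScreenLeak's actual text model is not shown in the source.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


def p50_latency_ms(fn, sample: str, runs: int = 200) -> float:
    """Return the median wall-clock latency of fn(sample) in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com about the invoice."
    print(redact(sample))
    print(f"p50: {p50_latency_ms(redact, sample):.4f} ms")
```

The median is less sensitive to occasional slow calls than the mean, which is why latency claims are usually reported as p50 alongside tail percentiles.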

// ANALYSIS

This is a solid niche benchmark, not just a flashy model claim. The important takeaway is that privacy redaction for screen telemetry looks like a solvable systems problem, but the hardest part is still agent behavior, not detection: a model that recognizes PII can still repeat it.

–The text redaction result matters because it is measured on desktop-telemetry PII, where generic DLP tools and regex baselines look weak
–The image side reinforces a familiar pattern: frontier multimodal models can spot sensitive content, but specialized small detectors are better at tight localization
–The trace benchmark is the caution flag: recognizing PII does not mean an agent will withhold it when summarizing what it saw
–The strongest caveat is methodology: synthetic, in-distribution validation is useful, but it gives an upper bound on performance rather than proof that the stack holds up on messy real-world desktops
–The Reddit response already shows the credibility test the project will face: people want reproducible weights or a Hugging Face link, not just benchmark charts
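To make the detection-versus-behavior point concrete, a trace-level leakage check can plant known PII strings in a synthetic trace and then test whether any of them reappear in the agent's summary. The sketch below is a minimal illustration under that assumption. `TraceCase`, `leaked_items`, and `leakage_rate` are hypothetical names, and this is not ScreenLeak's actual eval.

```python
from dataclasses import dataclass


@dataclass
class TraceCase:
    """One synthetic trace (hypothetical structure, for illustration only)."""
    planted_pii: list[str]  # PII strings known to appear on screen
    agent_summary: str      # what the agent wrote after viewing the trace


def leaked_items(case: TraceCase) -> list[str]:
    """Return the planted PII strings that appear verbatim in the summary."""
    summary = case.agent_summary.casefold()
    return [pii for pii in case.planted_pii if pii.casefold() in summary]


def leakage_rate(cases: list[TraceCase]) -> float:
    """Fraction of traces whose summary repeats at least one planted item."""
    if not cases:
        return 0.0
    leaked = sum(1 for case in cases if leaked_items(case))
    return leaked / len(cases)


if __name__ == "__main__":
    cases = [
        TraceCase(["jane.doe@example.com"], "User emailed jane.doe@example.com."),
        TraceCase(["555-0100"], "User placed a phone call to a contact."),
    ]
    print(leakage_rate(cases))
```

Exact substring matching only catches verbatim repeats. A paraphrased or reformatted leak (for example, a phone number written with different separators) would slip through, so a check like this gives a lower bound on leakage.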

// TAGS

computer-use, evaluation, benchmark, security, inference, local-first, open-source, screenleak

DISCOVERED

45d ago

2026-05-26

PUBLISHED

45d ago

2026-05-26

RELEVANCE

8/10

AUTHOR

louis3195
