Lightfeed open-sources TypeScript library for LLM data extraction
Lightfeed Extractor handles the full data extraction pipeline from URL to structured JSON: it converts web pages into LLM-optimized markdown, validates model output against Zod schemas, and recovers partial data when that output is malformed. The tool supports LangChain-compatible models, includes Playwright browser automation with anti-bot measures, and pairs with Lightfeed's browser agent for AI-driven navigation.
This library targets a common pain point in LLM-based web scraping: brittle JSON outputs that fail validation due to minor hallucinations or formatting errors.
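* **Resilience over perfection:** Salvaging the valid parts of a response, such as well-formed items in nested arrays or records whose optional fields failed validation, is a pragmatic improvement for production scraping workloads, where a single bad field would otherwise invalidate the whole output.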
* **End-to-end focus:** By handling headless browser automation, content sanitization, and LLM extraction in one package, it reduces the boilerplate needed to set up reliable scraping pipelines.
* **Ecosystem flexibility:** Compatibility with LangChain means developers aren't locked into a single provider and can swap between local models (e.g. via Ollama) and hosted ones.
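
The partial-recovery idea from the first bullet can be sketched in a few lines of TypeScript. This is a minimal, dependency-free illustration and not Lightfeed Extractor's actual API: the library validates against Zod schemas, whereas the `Product` type and the `salvageProduct` and `salvageProducts` helpers below are hypothetical names with hand-written checks standing in for a schema. The sketch drops array items that lack a required field and strips optional fields that fail validation, instead of rejecting the whole response.

```typescript
// Hypothetical sketch of partial-data recovery; NOT the library's API.
// Hand-written checks stand in for the Zod schema the real tool would use.

interface Product {
  name: string;   // required
  price?: number; // optional
}

const isString = (v: unknown): v is string => typeof v === "string";
const isNumber = (v: unknown): v is number =>
  typeof v === "number" && Number.isFinite(v);

// Returns a cleaned item, or null when a required field is missing or invalid.
// An invalid optional field is dropped rather than failing the whole item.
function salvageProduct(raw: unknown): Product | null {
  if (typeof raw !== "object" || raw === null) return null;
  const record = raw as Record<string, unknown>;
  if (!isString(record.name)) return null;
  const product: Product = { name: record.name };
  if (isNumber(record.price)) product.price = record.price;
  return product;
}

// Keeps every array item that can be salvaged and skips the rest.
function salvageProducts(raw: unknown): Product[] {
  if (!Array.isArray(raw)) return [];
  const kept: Product[] = [];
  for (const item of raw) {
    const product = salvageProduct(item);
    if (product !== null) kept.push(product);
  }
  return kept;
}

// Simulated LLM output: one clean item, one with a bad optional field,
// one missing the required field, and one without the optional field.
const llmOutput: unknown = JSON.parse(
  '[{"name":"Lamp","price":19.5},{"name":"Desk","price":"N/A"},{"price":5},{"name":"Chair"}]'
);

const recovered = salvageProducts(llmOutput);
console.log(JSON.stringify(recovered));
// prints [{"name":"Lamp","price":19.5},{"name":"Desk"},{"name":"Chair"}]
```

Strict validation would reject this entire response because of the `"N/A"` price and the nameless item; the salvage pass keeps three of the four records.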
DISCOVERED: 2026-03-26 (16d ago)
PUBLISHED: 2026-03-26 (16d ago)
AUTHOR: Visual-Librarian6601