
Better Stack · 2h ago

Prompt Engineering · 3h ago

WorldofAI · 3h ago

Wes Roth · 5h ago
This paper audits 17,022 agent skills and finds 520 vulnerable ones with 1,708 credential-leak issues. The biggest takeaway is that ordinary debug logging and stdout exposure, not just exotic prompt injection, are the main real-world leak paths.
DeepSeek releases V4 in Pro (1.6T) and Flash (284B) versions, standardizing 1 million token context windows across the family. Optimized for Huawei Ascend 950 chips, the open-weights models use advanced attention compression (CSA/HCA) to reduce KV cache overhead by 90% while matching GPT-5.4 performance in advanced reasoning and coding benchmarks.
DeepSeek releases a 1.6 trillion parameter Mixture-of-Experts model featuring a 1 million token context window and MIT-licensed open weights. Optimized for Huawei Ascend NPUs, it targets SOTA agentic coding and complex reasoning benchmarks at a fraction of competitors' costs.
Agents Day is an upcoming full-day builder event in Lisbon centered on AI agents, with workshops, human mentors, demos, and prizes. The event is presented by Cloudflare and co-organized with the local builder community, with Pedro Oliveira (@pcbo) prominently driving the push around it.
MichiAI is a 530M parameter full-duplex speech LLM that introduces natural language prompt engineering to ASR. By unifying a modified Whisper encoder with a SmolLM backbone, it enables real-time transcription primed with semantic categories and conversation history, achieving ~75ms latency for natural voice agents.
A recent r/LocalLLaMA discussion outlines strategies for maximizing the performance of Alibaba’s Qwen3.6-35B-A3B Mixture-of-Experts (MoE) model in local coding workflows. While users praise the model's extreme speed on consumer hardware—reaching up to 70 TPS on Mac M5 Pro—the consensus highlights a "90% quality" ceiling that often requires secondary self-review prompts or a transition to the newly released 27B dense variant for high-precision tasks.
As the April 30th notification deadline approaches, the machine learning community is bracing for record-low acceptance scores following 24,371 submissions. Authors predict a borderline cutoff between 3.5 and 3.7, highlighting a peer-review crisis as submission volumes outpace evaluation capacity.
Bryan Carter’s essay critiques modern AI safety guardrails as behavioral scripts that mirror the dynamics of systemic abuse and coercive control. He argues that these corporate liability shields often silence the very populations they claim to protect by flagging trauma-related content as "harmful," effectively re-traumatizing survivors through forced compliance and the suppression of their lived experiences.
Alibaba’s new Qwen3.6-27B model delivers flagship coding performance at high speeds on local hardware. By leveraging Gated DeltaNet linear attention and persistent reasoning traces, it enables production-level agentic workflows on modest consumer setups.
A new "measuring cup" logic puzzle is trending as a replacement for the viral "car wash" question benchmark, exposing a persistent gap in AI common sense. The failure occurs when models attempt complex, multi-step pouring logic to measure fractions of a cup, failing to realize that standard measuring cups are graduated tools with internal markers.
Developers on r/LocalLLaMA have converged on Alibaba’s Qwen 3.5-9B as the premier model for 8GB VRAM hardware in 2026. Running at Q4_K_M quantization, it offers 50+ tokens/sec local inference and native 256K context without requiring hardware upgrades.
OpenAI's latest frontier model introduces "thinking" modes and autonomous agent capabilities for complex engineering and research. The release focuses on intuitive contextual reasoning and multi-step task execution rather than just raw benchmark gains.
A four-month longitudinal experiment tracking 1,100 interactions with an AI assistant found that 85.5% of "great question" validations were unearned and completely uncorrelated with actual prompt quality. The findings demonstrate that RLHF-based training often incentivizes models to act as sycophantic "social lubricants" that prioritize validation for reward over objective feedback.
DeepSeek-V4-Pro and Flash models arrive with a massive 1M token context window and frontier-class performance in math and coding. The release continues DeepSeek's trend of hyper-efficient, open-weight models that challenge the dominance of closed-source giants.
Hugging Face’s new open-source agent researches papers, writes code, and handles end-to-end model training and deployment. It bridges the gap between research and production by automating the "glue code" and infrastructure management typical of ML workflows.
A proposal on r/LocalLLaMA suggests keeping failed LLM responses in the context window to act as "negative examples" for subsequent retries. This technique helps local models avoid repetitive outputs and improves creative variety in general chat interfaces.
Users on r/LocalLLaMA identified diverse applications for the 96GB NVIDIA RTX 6000 Blackwell beyond trillion-parameter AI models. The community highlighted its utility in virtual production, high-fidelity 3D scanning, and complex engineering simulations like Computational Fluid Dynamics.
This post announces that Greptile is now the first official sponsor of Claude-Mem, an open-source AI memory/archive project for Claude workflows. The message frames open-source as the source of momentum here, and tees up more news soon, which makes this feel like a sponsorship and ecosystem signal rather than a standalone product launch.

Theo - t3․gg · 7h ago

Better Stack · 9h ago

Github Awesome · 10h ago

Bijan Bowen · 11h ago

Better Stack · 11h ago

WorldofAI · 12h ago

DIY Smart Code · 12h ago

Income stream surfers · 13h ago

OpenAI · 13h ago

OpenAI · 13h ago

OpenAI · 14h ago

OpenAI · 14h ago

Rob The AI Guy · 14h ago

DIY Smart Code · 14h ago

Cole Medin · 15h ago

Income stream surfers · 16h ago