r/LocalLLaMA on WeSearch
Recent social headlines from r/LocalLLaMA.
R/LOCALLLAMA
I’ve done it!!! FINALLY I have become a (quasi-local) summoner!!! AMA [imtiredboss.jpg]
Low-level coding dataset
Anyone evaluated the difference between Qwen Code for the local qwen models vs another harness? CC, OC, LC, Aider etc..
What model weights (quantized included) under 150GB have the best general knowledge depth?
When your LLM treats data center GPUs like an optional DLC
Latest b9274 Addresses MTP VRAM leak
Waiting for Qwen 3.7 open weight... The new King has arrived...
Gorgon Halo is 6.7% faster than predecessor Strix Halo
Strix Halo 128GB vs M5 pro 64GB
Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.
110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp
Open-source LLMs are still weak against long reasoning jailbreaks, even with lightweight defenses
Model Golf for some Runpod Credits!
Back again, many changes have taken place.
How can you stop your model from looping
"AWS secures rare Mac Studios while ordinary Apple customers remain completely locked out"
Guide to building smoltorrent | A Distributed ML Checkpoint Storage System
What small speech to text (STT) model is best at recognizing whispered speech?
Gemma 4 MTP with LlamaCPP
Impulse Purchase.
Qwen3.7 Max scored by Artificial Analysis, 27B/35B waiting room
Guardrails take an 8B model from 53% to 99% on agentic tasks [ACM CAIS '26 preprint]
Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks
LM Studio finally added support for MTP Speculative Decoding
Claude Code plugins a risk to local ecosystem?
anyone else spending more time managing ai markdown files than actually coding?
Carbon: Decoding the Language of Life
Llama-server and MTP
Qwen is cooking hard
We have sub-agents at home
Why might MTP be net negative for tool heavy agentic flows?
Is there any <3B model with usable 200k+ context window?
How many GPUs do you have on your local system/server/AI PC?
favorite Agentic Coding Harness
Still happy for yall
Is the llama.cpp nixos flake just broken?
MTP (Multi-Token Prediction): 2x Faster Token Generation on AMD Strix Halo & Radeon 9700 AI Pro
Qwen cant wait to release 3.7 models
Qwen 35b a3b surprises me
Hopes and dreams for Google IO tomorrow? 👀
What happens to local LLM if/when LLMs are no longer released for free?
I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you.
Quantizing MTP KV Cache = free lunch?
GGUF with MTP vs MLX without. Is mlx still the way to go for mac users?
New models when? Forecasting release date.
The Lurk Report - The last 30 days of r/LocalLLaMA
Is anyone prioritizing code quality checks via a small local model?
Big new memory tool with local benchmarks
I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how