r/LocalLLaMA on WeSearch
Recent social headlines from r/LocalLLaMA.
R/LOCALLLAMA
GPU VRAM only for small models with llama.cpp: is it possible?
R/LOCALLLAMA
Gemma 4 2B handling structured JSON output + tool calling + reasoning traces correctly via Spring AI / LM Studio — including identifying a real Java bug in code review
R/LOCALLLAMA
Qwen3.6-35B-A3B vs Gemma4-26B-A4B
R/LOCALLLAMA
Measuring AI intelligence vs Human intelligence
R/LOCALLLAMA
gemma 4 e2b quality degrades after ~30-40 continuous inferences on 4gb vram?
R/LOCALLLAMA
Qwen Plays ̶p̶̶o̶̶k̶̶e̶̶m̶̶o̶̶n̶ ? / QWEN PLAYS DCSS! - qwen3.6-35b-a3b@q4_k_xl plays open source roguelike adventure DCSS (and does a decent job)
R/LOCALLLAMA
Frustrating results with product searching
R/LOCALLLAMA
Why not dynamic active parameters (and other questions for the knowledgeable)
R/LOCALLLAMA
Choosing an abliterated version of Gemma 4 31B and 26B-A4B
R/LOCALLLAMA
Qwen3.6-35B-A3B-Uncensored-Genesis-APEX-MTP
R/LOCALLLAMA
I built a local GUI for the TradingAgents framework — works with Ollama
R/LOCALLLAMA
Anyone down to test this? Just uploaded a model using rys
R/LOCALLLAMA
TTS Benchmark Comparison (all known TTS up until May 2026)
R/LOCALLLAMA
Performance When Offloading Large Models to System RAM?
R/LOCALLLAMA
How are you all handling agents and sub agents?
R/LOCALLLAMA
Is there any reason for an uncensored model if you have no interest in roleplaying?
R/LOCALLLAMA
Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA
R/LOCALLLAMA
minor speed bump for MTP with Qwen3.6-27B-MTP Q6_K_XL
R/LOCALLLAMA
llampart 1.0.0 - I released a standalone local web UI for llama-server with translations, extended settings and a polished conversation sidebar
R/LOCALLLAMA
How to keep up to date on latest models?
R/LOCALLLAMA
llama.cpp server has built-in native tools (exec_shell, edit_file, etc.)
R/LOCALLLAMA
Local model doing accounting tasks
R/LOCALLLAMA
MLID claims Nova Lake-AX not cancelled, just renamed Razor Lake-AX
R/LOCALLLAMA
For users who have both the 6000 PRO MaxQ and the Workstation Edition (or Server Edition), how much slower is the MaxQ vs the WS/SV on compute? (Prompt processing, diffusion, etc.)
R/LOCALLLAMA
Command A+ (218B MoE) running on Apple Silicon — MLX port, PR open
R/LOCALLLAMA
Inference provider tiers by Cache-hit rates, using openrouter data
R/LOCALLLAMA
Any reason to run dense over MOE for RAGs?
R/LOCALLLAMA
$16 refactor, 400 steps, 95% routed to open MoE
R/LOCALLLAMA
7900XTX idle power draw when running headless?
R/LOCALLLAMA
Local, low code, node based agentic development workspace... that actually works?
R/LOCALLLAMA
Qwen3.6 35B-A3B MTP hits 249 t/s on a 24GB consumer GPU (RTX 5090M) — 3.4× the dense 27B variant on the same image
R/LOCALLLAMA
found this little known channel with some really good content
R/LOCALLLAMA
First AI to Beat Every Human in a Programming Competition - Agentic GRPO Explained
R/LOCALLLAMA
Have we passed the peak of inflated expectations?
R/LOCALLLAMA
DGX Spark agentic usage numbers
R/LOCALLLAMA
Best open-source & proprietary options for Indic language ASR
R/LOCALLLAMA
LLaMa.cpp basic question
R/LOCALLLAMA
Gemma4 26b a4b Apex quant is quite good
R/LOCALLLAMA
Gemma is so much better than Qwen, prove me wrong
R/LOCALLLAMA
G4-MeroMero-26B-A4B-it-uncensored-heretic Is Out Now, a Finetune of gemma-4-26B-A4B-it, With KLD of 0.0152 and 12/100 Refusals!
R/LOCALLLAMA
Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps
R/LOCALLLAMA
NVIDIA Removes Gaming Revenue Category From Financial Reports
R/LOCALLLAMA
How small can the orchestration model in an agent be? (separating it from code-gen — that obviously wants a big model)
R/LOCALLLAMA
If one .gguf makes it past the great filter, humanity survives in some way.
R/LOCALLLAMA
Seeking resources to read about llama.cpp server and how offloading works
R/LOCALLLAMA
OpenBMB presents the model BitCPM-CANN 1.58 bit
R/LOCALLLAMA
Holding machine upgrade waiting for a model?
R/LOCALLLAMA
Quick note on sudden performance loss when running GGUFs
R/LOCALLLAMA
ztok — a fast multithreaded tokenizer in Zig that loads tiktoken / HF / SentencePiece and is 2–5× faster