r/LocalLLaMA on WeSearch

Recent social headlines from r/LocalLLaMA.

GPU VRAM only for small models with llama.cpp: is it possible?

5/24/2026 · 29 views

Gemma 4 2B handling structured JSON output + tool calling + reasoning traces correctly via Spring AI / LM Studio — including identifying a real Java bug in code review

5/24/2026 · 36 views

Qwen3.6-35B-A3B vs Gemma4-26B-A4B

5/24/2026 · 31 views

Measuring AI intelligence vs Human intelligence

5/24/2026 · 30 views

Gemma 4 E2B quality degrades after ~30-40 continuous inferences on 4 GB VRAM?

5/24/2026 · 30 views

Qwen Plays ̶p̶̶o̶̶k̶̶e̶̶m̶̶o̶̶n̶ ? / QWEN PLAYS DCSS! - qwen3.6-35b-a3b@q4_k_xl plays open source roguelike adventure DCSS (and does a decent job)

5/24/2026 · 32 views

Frustrating results with product searching

5/24/2026 · 28 views

Why not dynamic active parameters (and other questions for the knowledgeable)

5/24/2026 · 22 views

Choosing an abliterated version of Gemma 4 31B and 26B-A4B

5/24/2026 · 42 views

Qwen3.6-35B-A3B-Uncensored-Genesis-APEX-MTP

5/24/2026 · 32 views

I built a local GUI for the TradingAgents framework — works with Ollama

5/24/2026 · 34 views

Anyone down to test this? Just uploaded a model using rys

5/24/2026 · 32 views

TTS Benchmark Comparison (all known TTS up until May 2026)

5/24/2026 · 31 views

Performance When Offloading Large Models to System RAM?

5/24/2026 · 32 views

How are you all handling agents and sub agents?

5/24/2026 · 29 views

Is there any reason for an uncensored model if you have no interest in roleplaying?

5/24/2026 · 33 views

Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA

5/24/2026 · 31 views

minor speed bump for MTP with Qwen3.6-27B-MTP Q6_K_XL

5/24/2026 · 29 views

llampart 1.0.0 - I released a standalone local web UI for llama-server with translations, extended settings and a polished conversation sidebar

5/24/2026 · 41 views

How to keep up to date on latest models?

5/23/2026 · 28 views

llama.cpp server has built-in native tools (exec_shell, edit_file, etc.)

5/23/2026 · 27 views

Local model doing accounting tasks

5/23/2026 · 34 views

MLID claims Nova Lake-AX not cancelled, just renamed Razor Lake-AX

5/23/2026 · 30 views

For users who have both the 6000 PRO MaxQ and the Workstation Edition (or Server Edition): how much slower is the MaxQ than the WS/SV on compute? (Prompt processing, diffusion, etc.)

5/23/2026 · 42 views

Command A+ (218B MoE) running on Apple Silicon — MLX port, PR open

5/23/2026 · 35 views

Inference provider tiers by Cache-hit rates, using openrouter data

5/23/2026 · 42 views

Any reason to run dense over MOE for RAGs?

5/23/2026 · 33 views

$16 refactor, 400 steps, 95% routed to open MoE

5/23/2026 · 35 views

7900XTX idle power draw when running headless?

5/23/2026 · 25 views

Local, low code, node based agentic development workspace... that actually works?

5/23/2026 · 42 views

Qwen3.6 35B-A3B MTP hits 249 t/s on a 24GB consumer GPU (RTX 5090M) — 3.4× the dense 27B variant on the same image

5/23/2026 · 41 views

Found this little-known channel with some really good content

5/23/2026 · 27 views

First AI to Beat Every Human in a Programming Competition - Agentic GRPO Explained

5/23/2026 · 31 views

Have we passed the peak of inflated expectations?

5/23/2026 · 24 views

DGX Spark agentic usage numbers

5/23/2026 · 23 views

Best open-source & proprietary options for Indic language ASR

5/23/2026 · 37 views

llama.cpp basic question

5/23/2026 · 38 views

Gemma4 26b a4b Apex quant is quite good

5/23/2026 · 34 views

Gemma is so much better than Qwen, prove me wrong

5/23/2026 · 26 views

G4-MeroMero-26B-A4B-it-uncensored-heretic Is Out Now, a Finetune of gemma-4-26B-A4B-it, With KLD of 0.0152 and 12/100 Refusals!

5/23/2026 · 34 views

Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps

5/22/2026 · 27 views

NVIDIA Removes Gaming Revenue Category From Financial Reports

5/22/2026 · 26 views

How small can the orchestration model in an agent be? (separating it from code-gen — that obviously wants a big model)

5/22/2026 · 33 views

If one .gguf makes it past the great filter, humanity survives in some way.

5/22/2026 · 34 views

Seeking resources to read about llama.cpp server and how offloading works

5/22/2026 · 42 views

OpenBMB presents the model BitCPM-CANN 1.58 bit

5/22/2026 · 32 views

Holding machine upgrade waiting for a model?

5/22/2026 · 25 views

Quick note on sudden performance loss when running GGUFs

5/22/2026 · 27 views

ztok — a fast multithreaded tokenizer in Zig that loads tiktoken / HF / SentencePiece and is 2–5× faster

5/22/2026 · 28 views

New Release of ROCm based MLX LLM Engine - lemon-mlx-engine

5/22/2026 · 38 views

How WeSearch handles this source

WeSearch's declared handling of r/LocalLLaMA's content. Indexing, snippets, summaries, retrieval and training are separate questions — see the rights registry or read this source's machine-readable record.

Indexing: Allowed
Snippet: Allowed
AI summary: Limited
Retrieval / RAG: Not asserted
Model training: Not asserted
Commercial reuse: Not permitted

More social sources

r/programming r/webdev r/typescript r/javascript r/Python r/rust r/golang r/cpp r/csharp r/java r/elixir r/haskell r/ruby r/PHP r/reactjs r/vuejs r/sveltejs r/node