r/LocalLLaMA on WeSearch

Recent social headlines from r/LocalLLaMA.

how do you decide between q4 and q5 on a 70b when 24gb is the cap?

5/26/2026 · 24 views

Added direct model downloads right from the UI in Anubis OSS - if anyone would help test that would be great

5/26/2026 · 33 views

New local model reaching near frontier on PII removal at 9 ms CPU inference

5/26/2026 · 35 views

Need Help - What would you build? Air-gapped NL assistant that is integrated with Splunk

5/25/2026 · 34 views

Update on 12x32gb sxm v100 cluster / local AI for legal drafting

5/25/2026 · 25 views

Anyone use QwQ-32B? It's over a year old? Has Qwen 3.6 27b basically replaced it?

5/25/2026 · 35 views

Server build for local inference. 128 gb 3200 or 256 gb 2133mhz RAM?

5/25/2026 · 33 views

CUDA: add fast walsh-hadamard transform by am17an · Pull Request #23615 · ggml-org/llama.cpp

5/25/2026 · 29 views

Locally-hosted language-learning AI you can talk to comparable to Pingo AI?

5/25/2026 · 31 views

Can you jailbreak Llama 3.1 8B? (Red-Teaming Challenge)

5/25/2026 · 38 views

Whats the best Qwen 27B Q8 quant?

5/25/2026 · 28 views

Best coding model on RTX 3060

5/25/2026 · 23 views

Llama.cpp : Split Mode Tensor Fix Incoming?

5/25/2026 · 41 views

(Yet Another) KV cache calculator - kvanta.vcerny.cz

5/25/2026 · 29 views

Sharing my 'Local-LLM-Toolkit' repo

5/25/2026 · 37 views

Save Safetensor LLM from C#

5/25/2026 · 38 views

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

5/25/2026 · 33 views

The Financial Times has published an article about Heretic

5/25/2026 · 34 views

OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

5/25/2026 · 34 views

Want Built a React-style looping agent with small LLMs (Qwen 3.5 9B / Gemma4) + LangGraph?

5/25/2026 · 37 views

Are GPU prices hitting peak and falling?

5/25/2026 · 32 views

llama.cpp oom issue

5/25/2026 · 35 views

How local AI improved your live?

5/25/2026 · 23 views

I pioneered AI slop in 2019 with my Tensorflow rig. (24GB back then, too.) AMA.

5/25/2026 · 34 views

Please give me your best tips for fine tuning RTX Pro 6000 on Intel i7-14700KF

5/25/2026 · 34 views

I built a computer use sandbox framework for codex on headless linux. GPU passthrough, computer use, and sudo access for codex all work. It's the perfect dev sandbox to allow full auto work while minimizing the "rm -rf /" risk

5/25/2026 · 34 views

We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro

5/25/2026 · 32 views

Next year we're getting 0.5T model from Grok

5/25/2026 · 31 views

I made a local-first MCP tutorial repo with node-llama-cpp and a custom agent loop

5/25/2026 · 36 views

server: fix checkpoints creation by jacekpoplawski · Pull Request #22929 · ggml-org/llama.cpp

5/25/2026 · 37 views

NVIDIA Jetson AGX Orin 64GB

5/25/2026 · 27 views

Qwen 3.6 benchmarks on 2x RTX PRO 6000

5/25/2026 · 29 views

It was fun while it lasted... They're advertising now.

5/25/2026 · 25 views

1000 tps generation on Qwen3.6 27B with V100s

5/25/2026 · 28 views

Wrote a custom C++ engine for MiniCPM-V 4.6 on Orange Pi AIPro (Ascend 310B) to bypass framework overhead

5/25/2026 · 34 views

llama.cpp has a clever trick for speeding up KV cache decode

5/25/2026 · 40 views

Could someone please help explain these results?

5/25/2026 · 27 views

opensource music reccomendation / playlist, similar to spotify radio / YT music mix?

5/25/2026 · 33 views

how to install llamacpp the better way to wrapping it in python ui (CPU use only) ?

5/25/2026 · 41 views

Qwen 3.6 27B MTP speed on 3080ti (getting 4.5 t/s)

5/24/2026 · 36 views

hipEngine: Fast Native Qwen 3.6 Inference for RDNA3 (Strix Halo, 7900 XTX)

5/24/2026 · 34 views

Could Open Models be trained to secretly go rogue?

5/24/2026 · 33 views

Generative Recursive Education: Creating Custom Interactive Textbooks on the Fly.

5/24/2026 · 40 views

What frontend do you guys use?

5/24/2026 · 30 views

Can someone help me understand MCP?

5/24/2026 · 18 views

magic incantation to get llama-bench to work with MTP ?

5/24/2026 · 22 views

Need Help Choosing a Harness for Qwen 3.6 27B

5/24/2026 · 37 views

Is NVIDIA still the default best choice for local LLMs in 2026?

5/24/2026 · 38 views

What is the smallest amount of RAM sufficient to run any available on HF GGUF LLM model locally?

5/24/2026 · 32 views

BitCPM-CANN: Native 1.58-Bit Large Language Model Training on Ascend NPU

5/24/2026 · 25 views

How WeSearch handles this source

This is WeSearch's declared handling of r/LocalLLaMA's content. Indexing, snippets, summaries, retrieval, and training are separate questions; see the rights registry or read this source's machine-readable record.

Indexing: Allowed
Snippet: Allowed
AI summary: Limited
Retrieval / RAG: Not asserted
Model training: Not asserted
Commercial reuse: Not permitted
