social · source
r/LocalLLaMA on WeSearch
Recent social headlines from r/LocalLLaMA.
R/LOCALLLAMA
how do you decide between q4 and q5 on a 70b when 24gb is the cap?
Added direct model downloads right from the UI in Anubis OSS - if anyone would help test that would be great
New local model reaching near frontier on PII removal at 9 ms CPU inference
Need Help - What would you build? Air-gapped NL assistant that is integrated with Splunk
Update on 12x32gb sxm v100 cluster / local AI for legal drafting
Anyone use QwQ-32B? It's over a year old? Has Qwen 3.6 27b basically replaced it?
Server build for local inference. 128 gb 3200 or 256 gb 2133mhz RAM?
CUDA: add fast walsh-hadamard transform by am17an · Pull Request #23615 · ggml-org/llama.cpp
Locally-hosted language-learning AI you can talk to comparable to Pingo AI?
Can you jailbreak Llama 3.1 8B? (Red-Teaming Challenge)
What's the best Qwen 27B Q8 quant?
Best coding model on RTX 3060
Llama.cpp : Split Mode Tensor Fix Incoming?
(Yet Another) KV cache calculator - kvanta.vcerny.cz
Sharing my 'Local-LLM-Toolkit' repo
Save Safetensor LLM from C#
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps
The Financial Times has published an article about Heretic
OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
Want Built a React-style looping agent with small LLMs (Qwen 3.5 9B / Gemma4) + LangGraph?
Are GPU prices hitting peak and falling?
llama.cpp oom issue
How has local AI improved your life?
I pioneered AI slop in 2019 with my Tensorflow rig. (24GB back then, too.) AMA.
Please give me your best tips for fine tuning RTX Pro 6000 on Intel i7-14700KF
I built a computer use sandbox framework for codex on headless linux. GPU passthrough, computer use, and sudo access for codex all work. It's the perfect dev sandbox to allow full auto work while minimizing the "rm -rf /" risk
We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro
Next year we're getting 0.5T model from Grok
I made a local-first MCP tutorial repo with node-llama-cpp and a custom agent loop
server: fix checkpoints creation by jacekpoplawski · Pull Request #22929 · ggml-org/llama.cpp
NVIDIA Jetson AGX Orin 64GB
Qwen 3.6 benchmarks on 2x RTX PRO 6000
It was fun while it lasted... They're advertising now.
1000 tps generation on Qwen3.6 27B with V100s
Wrote a custom C++ engine for MiniCPM-V 4.6 on Orange Pi AIPro (Ascend 310B) to bypass framework overhead
llama.cpp has a clever trick for speeding up KV cache decode
Could someone please help explain these results?
Open-source music recommendation / playlist, similar to Spotify radio / YT Music mix?
What is the better way to install llama.cpp for wrapping it in a Python UI (CPU use only)?
Qwen 3.6 27B MTP speed on 3080ti (getting 4.5 t/s)
hipEngine: Fast Native Qwen 3.6 Inference for RDNA3 (Strix Halo, 7900 XTX)
Could Open Models be trained to secretly go rogue?
Generative Recursive Education: Creating Custom Interactive Textbooks on the Fly.
What frontend do you guys use?
Can someone help me understand MCP?
magic incantation to get llama-bench to work with MTP ?
Need Help Choosing a Harness for Qwen 3.6 27B
Is NVIDIA still the default best choice for local LLMs in 2026?
What is the smallest amount of RAM sufficient to run any GGUF LLM available on HF locally?