r/LocalLLaMA on WeSearch
Recent social headlines from r/LocalLLaMA.
R/LOCALLLAMA
"Generate a photorealistic realtime render of a human face with webGL" (Qwen3.5-122B-A10B UD-Q3_K_XL)
MTP experiences on 7900xtx?
Grafting vision onto text models for fun and profit.
Are local models good enough yet for AI meeting memory?
llama: avoid copying logits during prompt decode in MTP by am17an · Pull Request #23198 · ggml-org/llama.cpp
The power of structured workflows and small local models
Developers who use local AI - Q4_0 vs Q8_0 KV quant?
Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention
How do I get the superfast DFlash / MTP tokens per second that I'm seeing on here? Dual 3090s
Dual GPU llama.cpp speedup
Convert With MPT Support?
Good candidate model to act as a PA
Was that the right purchase for Qwen3.6 27/35?
Llama.cpp MTP with Qwen3.6 27B on Headless RTX 3090
Jackrong/Qwopus3.5-9B-Coder-GGUF · Hugging Face
Very happy with Qwen 3.5 122B output. But is slowness expected?
LeanLoop, the Tool Claude Leans on
"Elias Thorne" is what eight different LLMs name a lighthouse keeper. He's also selling cancer treatment advice on Amazon
Looking to migrate off of Ollama and LMStudio
Hardware Recommendations for realtime voice and a simple personal assistant/organisation agent.
Meet Ronald
webui: support video files as input by foldl · Pull Request #22830 · ggml-org/llama.cpp
How do I correct a memory that was retrieved without asking for any help from the backend team? (personal experience)
G4-Meromero-31B-Uncensored-Heretic Is Out Now, a Finetune of Gemma 4 31B It Designed for Creative Tasks, With a KLD of 0.0100 and 15/100 Refusals!
Ran the same models across Strix Halo, RTX 3090, and RTX 5070 because I wanted my own numbers
An alternative with a similar experience to Windsurf, but local?
Now that MTP is merged... What's the best outputs you're getting on Qwen 3.6 35B on 2x3090s?
WSL can't reach Kobold.cpp running on Windows, even though the API works fine in PowerShell, SillyTavern & a Kenshi SentientSands Mod. Does anyone know the solution?
I fitted the new δ-mem research for Apple Silicon using MLX, with OpenClaw integration! My findings
Qwen3.5-122B-Q5-MTP - Qwen3.5-122B-Q6-MTP
Best llama.cpp launch config for Qwen3.6 27B on RX 7800 XT (16 GB VRAM) for OpenClaw?
gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic is Out Now, A Writing Finetune that Aims to Improve Gemma 4 31B it Writing Quality with More Natural English and Better Prose, Good for Creative Writings, Translations and RPs!
Local Qwen 3.6 vs frontier models on a coding primitive: single-file HTML canvas driving animation - results and GIFs
How I started programming differently over the last year. What about you?
Corsair desktop PC with Ryzen 395 and 128GB of unified RAM, has anyone tested it for LLM? Seems "a good" price
Qwen 27b MTP Config, Llama.cpp Single 3090
Using the Intel Arc Pro series, any thoughts?
b9180: llama.cpp MTP landed
LLM Phone Home: Reliable Apps that can deliver inference from local backend
Extension idea: llama-server with custom samplers
Local speech to text for iOS using Apple Watch
I've updated my glorified Llama fork (LLM Inference Server) for P40s to utilise MTP + TurboQuant + DFlash
A very important milestone for me in the AI field.
Built a 6x cheaper CodeRabbit alternative using open source models
Reduce your GPU power limit
When you run a small LLM in RAM, don't use all threads.