12 results for "vram"
Nvidia RTX 5070 laptop GPU officially has 12GB of VRAM — and it’s about time
Nvidia has officially announced the RTX 5070 laptop GPU with 12GB of GDDR7 VRAM. This could be a huge win for mid-range gaming laptops.…
To 16GB VRAM users, plug in your old GPU
For those who want to run the latest dense ~30b models and only have 16GB of VRAM: if you have an old card with 6GB of VRAM or more, plug it in. What matters is that everything fits in VRAM, even across two cards. Even…
VRAM.cpp: Running llama-fit-params directly in your browser
Lots of people are always asking on this subreddit if their system can run a certain model. A lot of the "VRAM calculators" that I've found only provide either very rough estimates or are severely lim…
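The rough math these calculators attempt can be sketched in a few lines: weight memory plus KV-cache memory. The function below is a common back-of-the-envelope estimate, not VRAM.cpp's actual method; the parameter names are illustrative, and it deliberately ignores activation memory and runtime overhead.

```python
def estimate_vram_gib(n_params_b, bits_per_weight,
                      ctx_len=0, n_layers=0, n_kv_heads=0, head_dim=0,
                      kv_bits=16):
    """Back-of-the-envelope VRAM estimate in GiB (weights + KV cache only).

    Ignores activations, runtime context, and fragmentation, which add
    real-world overhead on top of this number.
    """
    # Weight memory: parameter count (in billions) * bits per weight, in bytes
    weight_bytes = n_params_b * 1e9 * bits_per_weight / 8
    # KV cache: 2 tensors (K and V) per layer, each ctx * kv_heads * head_dim
    kv_bytes = 2 * n_layers * ctx_len * n_kv_heads * head_dim * kv_bits / 8
    return (weight_bytes + kv_bytes) / 2**30

# e.g. a 7B model at 4 bits per weight, weights only:
print(round(estimate_vram_gib(7, 4), 2))  # ≈ 3.26 GiB
```

Context length matters: at long contexts the KV-cache term can rival the weights, which is why quantized KV caches (q8_0 and similar) come up so often in these threads.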
Nvidia quietly launches 12GB RTX 5070 laptop GPU — midrange mobile gaming gets more VRAM amid the RAMpocalypse
The new model will use 3GB modules, so memory bandwidth should stay close to the RTX 5070 8GB mobile part.…
[Qwen3.6 35b a3b] Used the top config for my setup (8GB VRAM and 32GB RAM), and found that somehow the Q4_K_XL model from Unsloth runs just slightly faster and uses fewer tokens for output compared to Q4_K_M, despite higher memory usage
Config: CtxSize 131,072; GpuLayers 99; CpuMoeLayers 38; Threads 16; BatchSize/UBatchSize 4096/4096; CacheType K/V q8_0; Tool Context: file mode (tools.kilocode.official.md). Metric | M Model | XL Model | Diff…
Quant Qwen3.6-27B on 16GB VRAM with 100k context length
I have experimented with running Qwen3.6-27B on my laptop with an A5000 16GB GPU. I created my own IQ4_XS GGUF, "qwen3.6-27b-IQ4_XS-pure.gguf", with the Unsloth imatrix and compared the mean KLD of i…
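The mean KLD being compared here measures how far the quantized model's token distribution drifts from the full-precision reference. A toy illustration of the underlying quantity (this is not llama.cpp's measurement tooling, and the distributions are made up):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for two discrete probability distributions.

    Lower is better: it measures how much the quantized model's token
    distribution (q) diverges from the reference model's (p).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical distributions diverge by zero:
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
# A quant that skews the token distribution has positive KLD:
print(round(kl_divergence([0.5, 0.5], [0.9, 0.1]), 4))  # 0.5108
```

In practice the KLD is averaged over many tokens of a test corpus, which is why a figure like 0.0015 (as in the Heretic post below) indicates a near-lossless quant.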
[7900XT] Qwen3.6 27B for OpenCode
I'm just looking for some advice on optimally setting up Qwen3.6 27B for OpenCode. The VRAM is a little bit scarce, but I ended up with this so far: llama-server --model models/Qwen3.6-27B-IQ4_XS.gguf…
AMD GPUs are faster at prefill
I gave the same prompt and the same document to a 1660 Ti running Gemma 4 e2b q4 (because of its small VRAM) and to an iGPU running Gemma 4 e4b q8. The prefill rate before token generation was roughly 4-5 times faste…
Switched from Qwen3.6 35b-a3b to Qwen3.6 27b mid coding and it's noticeably better!
A bit of context. I was coding up a little html tower defense game where you can alter the path by placing additional waypoints. My setup: 32gb ram with 16gb vram 5070 ti. Using AesSedai/Qwen3.6-35B-A…
Qwen3.6 35B A3B Heretic (KLD 0.0015!) Incredible model. Best 35B I have found!
Been using this for a few days. It is BY FAR the best uncensored model I have found for Qwen 3.6 35B. With IQ4XS, Q8 KVcache, 262K context, it fits in 24GB of VRAM and does not fail on multi turn tool…
Hardware Choice for 27b to 31b models.
I've come to a point where I find the 27b and 31b models quite impressive. I have a 16 GB AMD Radeon 7800xt. It performs quite well. It was $700. Here is my question: Is the dual GPU approach performa…
(Linux) Has anyone succeeded in using NVMe space as substitute RAM for larger models? Is it worthwhile?
So I have a consumer-grade AMD GPU with 24gb VRAM and 64gb DDR5 RAM which have served me well enough for models up to around 120B. Of course, this just isn't enough for larger models in the 300B+ rang…