14 stories tagged with #cpp, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.
Highlights from the spring economic update, including CPP contribution cuts and new sports funding
convert : add support for Nemotron Nano 3 Omni by danbev · Pull Request #22481 · ggml-org/llama.cpp
NVIDIA Nemotron 3 Nano Omni is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcript…
CPPIB among investors looking to sell down stakes in India’s NSE IPO, sources say
Shareholders including the Canadian pension plan, LIC, SBI, Temasek Holdings and Morgan Stanley will offload a 5% stake, according to the sources…
VRAM.cpp: Running llama-fit-params directly in your browser
Lots of people are always asking on this subreddit if their system can run a certain model. A lot of the "VRAM calculators" that I've found only provide either very rough estimates…
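As an aside on that last entry: the arithmetic such calculators perform is roughly "weight bytes plus KV-cache bytes." Below is a minimal back-of-the-envelope sketch with made-up numbers for a generic 8B-class model; it is only an illustration of the idea, not the actual llama-fit-params logic, and a real estimator would also need to account for compute buffers and per-quant overhead.

```cpp
#include <cstdio>

int main() {
    // Illustrative assumptions only; real values come from the model's GGUF metadata.
    const double n_params   = 8e9;    // total weight count (8B-class model)
    const double bits_per_w = 4.5;    // e.g. a 4-bit quant, including block scales
    const int    n_layers   = 32;
    const int    n_kv_heads = 8;      // grouped-query attention
    const int    head_dim   = 128;
    const int    n_ctx      = 8192;   // context length
    const int    kv_bytes   = 2;      // fp16 K/V entries

    const double weight_bytes = n_params * bits_per_w / 8.0;
    const double kv_cache     = 2.0 * n_layers * (double) n_ctx * n_kv_heads * head_dim * kv_bytes; // K + V
    std::printf("weights ~ %.2f GiB, KV cache ~ %.2f GiB\n",
                weight_bytes / (1 << 30), kv_cache / (1 << 30));
    return 0;
}
```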
Intel B70: llama.cpp SYCL vs llama.cpp OpenVINO vs LLM-Scaler
In case anyone is interested, I decided to test out llama.cpp's new OpenVINO backend to see how it compares on Intel GPUs. At first glance, it stomps all over the previous best-cas…
Mesa PR with a 37-130% llama.cpp prompt processing (pp) perf gain for Vulkan on Linux on Intel Xe2
Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model
Big claims from Qwen about their latest open-weight model: Qwen3.6-27B delivers flagship-level agentic coding performance, s…
Benchmark: Windows 11 vs Lubuntu 26.04 on llama.cpp (RTX 5080 + i9-14900KF). I didn't expect the gap to be this big.
UPDATE: Vulkan benches are now included. And yes, I used AI to help me write this post. As a life-long Windows user (don't hate me, I was exposed to it at a young age) I was wonde…
llama.cpp DeepSeek v4 Flash experimental inference
Hi, here you can find experimental llama.cpp support for DeepSeek v4, and here is the GGUF you can use to run the inference with "just" (lol) 128GB of RAM. The model, even qu…
Will llama.cpp multislot improve speed?
I've heard mostly bad opinions about multiple slots with llama.cpp (--parallel > 1). I guess compared to vLLM it might be worse at this, but I recently tried vLLM on 4 slots and i…
Expert volunteers needed for Vulkan on ik_llama.cpp
ik_llama.cpp is great for both CPU & CUDA. Need legends to make Vulkan better as well. So, after bringing the Vulkan back-end up to speed some time ago, I felt that I simply don't …
I have officially retired from Emacs
CUDA: reduce MMQ stream-k overhead by JohannesGaessler · Pull Request #22298 · ggml-org/llama.cpp
CUDA prompt processing speedup on MoE; check this…
FP4 inference in llama.cpp (NVFP4) and ik_llama.cpp (MXFP4) landed - Finally
Both llama.cpp and ik_llama.cpp now have FP4 support — but with different flavors worth knowing about. llama.cpp recently merged NVFP4 (Nvidia's block-scaled FP4, `GGML_TYPE_NVFP4 …
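For readers who have not met the format family, the sketch below illustrates the general "block-scaled FP4" idea: a small block of weights shares one scale, and each weight is stored as a 4-bit code over the E2M1 magnitude set. This is a conceptual toy, not the ggml implementation; the real NVFP4 and MXFP4 types differ in block size and in how the per-block scale itself is encoded.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Representable FP4 (E2M1) magnitudes; the sign is kept in a separate bit.
static const float kFp4[8] = {0.f, 0.5f, 1.f, 1.5f, 2.f, 3.f, 4.f, 6.f};

struct Fp4Block {
    float scale;                 // per-block scale (real formats store this compactly)
    std::vector<uint8_t> codes;  // 4-bit codes: bit 3 = sign, bits 0-2 = magnitude index
};

// Quantize one block of floats: pick a scale, then round each value to the nearest FP4 code.
Fp4Block quantize_block(const std::vector<float>& x) {
    float amax = 0.f;
    for (float v : x) amax = std::max(amax, std::fabs(v));
    Fp4Block b;
    b.scale = amax > 0.f ? amax / 6.f : 1.f;  // map the largest magnitude onto the largest FP4 value
    for (float v : x) {
        const float t = std::fabs(v) / b.scale;
        int best = 0;
        for (int i = 1; i < 8; ++i)
            if (std::fabs(kFp4[i] - t) < std::fabs(kFp4[best] - t)) best = i;
        b.codes.push_back((uint8_t)((v < 0.f ? 8 : 0) | best));
    }
    return b;
}

float dequantize(const Fp4Block& b, size_t i) {
    const uint8_t c = b.codes[i];
    const float   m = kFp4[c & 7] * b.scale;
    return (c & 8) ? -m : m;
}

int main() {
    const std::vector<float> block = {0.12f, -0.7f, 1.9f, -3.2f};  // toy "block" of weights
    const Fp4Block q = quantize_block(block);
    for (size_t i = 0; i < block.size(); ++i)
        std::printf("%+.3f -> %+.3f\n", block[i], dequantize(q, i));
    return 0;
}
```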