5 stories tagged with #llm-inference, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.
Built a local LLM inference engine on CachyOS — runs faster than llama.cpp on my 9070 XT
Hey folks, we've been hacking on a Vulkan-based LLM engine the last few weeks, figured I'd share since I'm running it exclusively on CachyOS with Mesa RADV. It's called VulkanForge…
VulkanForge – 14 MB Vulkan LLM engine that runs native FP8 models on AMD (Rust)
Inference in Rust and Vulkan. Contribute to maeddesg/vulkanforge development by creating an account on GitHub.…
[Paper on Hummingbird+: low-cost FPGAs for LLM inference] Qwen3-30B-A3B Q4 at 18 t/s token generation, 24 GB memory, expected $150 mass-production cost
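A quick plausibility check on those figures (assuming the "A3B" suffix denotes roughly 3B active parameters per token and Q4 weights at about 0.5 bytes per parameter, neither of which the headline states): decoding mainly streams the active weights, so

$$ 3\times10^{9}\ \text{params} \times 0.5\ \tfrac{\text{bytes}}{\text{param}} \times 18\ \tfrac{\text{tok}}{\text{s}} \approx 27\ \text{GB/s} $$

of weight bandwidth, which commodity DDR on a low-cost FPGA board can plausibly supply; the full 30B weights at Q4 (~15 GB) also fit within the stated 24 GB.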
vLLM-Compile: Bringing Compiler Optimizations to LLM Inference
By Luka Govedič, vLLM Committer and Senior Machine Learning Engineer at Red Hat…
Skymizer Taiwan Inc. Unveils Breakthrough Architecture Enabling Ultra-Large LLM Inference on a Single Card
Article excerpt: With a single PCIe card, powered by six HTX301 chips and 384 GB of memory, enterprises can now run 700B-parameter model inference locally at just ~240W pe…
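As a rough sanity check on the memory claim (assuming roughly 4-bit weights, which the excerpt does not specify):

$$ 700\times10^{9}\ \text{params} \times 0.5\ \tfrac{\text{bytes}}{\text{param}} \approx 350\ \text{GB} \le 384\ \text{GB} $$

so a 700B model just fits in the card's 384 GB, with a modest margin left for KV cache and activations.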