23 results for "deepseek"
DeepSeek mystery: who is speaking for start-up as CEO Liang Wenfeng remains out of sight?
Researcher Chen Deli is emerging as DeepSeek’s new public face as speculation over the whereabouts of the company’s founder and CEO lingers.…
No GGUFs for DeepSeek V4-Flash as yet?
Wondering why there aren't any "name brand" (like unsloth, bartowski) GGUFs as yet for DeepSeek V4 Flash?…
China's DeepSeek slashes prices for new AI model - Reuters
DeepSeek V4 - almost on the frontier, a fraction of the price
Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) last December. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, DeepSeek-…
llama.cpp DeepSeek v4 Flash experimental inference
Hi, here you can find experimental llama.cpp support for DeepSeek v4, and here there is the GGUF you can use to run the inference with "just" (lol) 128GB of RAM. The model, even quantized at 2 bit, lo…
Decreased Intelligence Density in DeepSeek V4 Pro
In the V3.2 paper, they mentioned: Second, token efficiency remains a challenge; DeepSeek-V3.2 typically requires longer generation trajectories (i.e., more tokens) to match the output quality of mode…
DeepSeek V4 Update
DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles
We are thrilled to announce Day-0 support for DeepSeek-V4 across both inference and RL training. SGLang and Miles form the first open-source stack to serve and train DeepSeek-V4 on launch day — with s…
Deepseek v4 pricing is genuinely silly, did the math and now i am questioning my entire stack
A 3D Flappy Bird side-scroller game built with DeepSeek V4 Pro
100M tokens for $2.65 (Deepseek V4 Pro)
DeepSeek Unveils Newest Flagship AI Model a Year After Upending Silicon Valley
China’s DeepSeek rolls out a long-anticipated update of its AI model - AP News
Deepseek Vision Coming
From Xiaokang Chen on 𝕏:…
Kimi K2.6 vs DeepSeek V4 Pro
DeepSeek temporarily slashing prices on V4-Pro by 75%
DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5
anyone actually tried deepseek v4 pro for coding?
so v4 pro dropped and barely anyone is talking about it. feels weird since when kimi k2.6 came out i saw posts about it everywhere. anyone here tried v4 pro for actual code work? how's it compare to k2.…
The exact KV cache usage of DeepSeek V4
Figure 1 of DSV4 paper seems to imply that DSV3.2 uses ~50GB at 1m context and DSV4 uses ~5GB: ***Numbers updated with the KV cache breakdown from vllm*** From my own calculations, the correct FP16 KV…
US State Department upgrades AI theft accusations to target China AI companies
US State Department says China is stealing US intellectual property. US AI models are being ‘distilled’ to produce cheaper models for China. DeepSeek, Moonshot AI and MiniMax accused of alleged theft…
LLM Budget Guard – open-source runtime cutoff for OpenAI/Anthropic
Alerts won't stop a runaway agent at 3 AM. Budget Guard enforces hard token cutoffs across OpenAI, Anthropic & DeepSeek before bans or surprise invoices.…
Beyond the Attention Stability Boundary: Agentic Self-Synthesizing Reasoning Protocols
As LLM agents transition to autonomous digital coworkers, maintaining deterministic goal-directedness in non-linear multi-turn conversations emerged as an architectural bottleneck. We identify and for…
Why do only big ML labs dominate widely-used models despite many open-source pretrained models smaller labs could do RL on? [D]
I’m trying to understand why models from major labs (GPT, Claude, etc.) dominate real-world usage. You might say it's due to the expensive pretraining compute budget, but there already exist many pret…