3 results for "mamba model"
ARXIV CS.AI
The Randomness Floor: Measuring Intrinsic Non-Randomness in Language Model Token Distributions
Language models cannot be random. This paper introduces Entropic Deviation (ED), the normalised KL divergence between a model's token distribution and the uniform distribution, and measures it systema…
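As a rough illustration of the metric named in this snippet: the KL divergence from a distribution p over V tokens to the uniform distribution equals log V minus the entropy of p. The snippet says only that ED is "normalised", so the sketch below assumes, without confirmation from the paper, that the normaliser is log V. Under that assumption, ED is 0 for a uniform distribution and 1 for a one-hot distribution. The function name `entropic_deviation` is hypothetical.

```python
import math


def entropic_deviation(probs):
    """Hypothetical sketch: KL(p || uniform) divided by log V.

    The division by log V is an assumption; the paper's exact
    normalisation is not stated in the snippet.
    """
    v = len(probs)
    if v < 2:
        raise ValueError("need a vocabulary of at least 2 tokens")
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probs must be a probability distribution")
    # KL(p || u) = sum_i p_i * log(p_i / (1/V)) = log V - H(p);
    # zero-probability tokens contribute nothing (0 * log 0 := 0).
    kl = sum(p * math.log(p * v) for p in probs if p > 0)
    return kl / math.log(v)


print(entropic_deviation([0.25, 0.25, 0.25, 0.25]))  # uniform -> 0.0
print(entropic_deviation([1.0, 0.0, 0.0, 0.0]))      # one-hot -> 1.0
print(entropic_deviation([0.5, 0.5, 0.0, 0.0]))      # log 2 / log 4 -> 0.5
```

Dividing by log V keeps the value comparable across vocabularies of different sizes, since log V is the largest KL divergence any distribution over V tokens can have from uniform.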
ARXIV.ORG
AdaMamba: Adaptive Frequency-Gated Mamba for Long-Term Time Series Forecasting
Accurate long-term time series forecasting (LTSF) requires the capture of complex long-range dependencies and dynamic periodic patterns. Recent advances in frequency-domain analysis offer a global per…
VERCEL
Disaggregated Serving for Hybrid SSM Models in vLLM
Hybrid architectures that interleave Mamba-style SSM layers with standard full-attention (FA) layers — such as NVIDIA Nemotron-H — are gaining traction as a way…