WeSearch

Results for "transformer architecture".

5 stories from our catalog of 700+ sources match your query, ranked by relevance and recency.


ARXIV CS.AI

Learning to Rotate: Temporal and Semantic Rotary Encoding for Sequential Modeling

Every Transformer architecture dedicates enormous capacity to learning rich representations in semantic embedding space -- yet the rotation manifold acted upon by Rotary Positional Embeddings (RoPE) h…

ARXIV CS.AI

The Randomness Floor: Measuring Intrinsic Non-Randomness in Language Model Token Distributions

Language models cannot be random. This paper introduces Entropic Deviation (ED), the normalised KL divergence between a model's token distribution and the uniform distribution, and measures it systema…

ARXIV CS.AI

MAE-Based Self-Supervised Pretraining for Data-Efficient Medical Image Segmentation Using nnFormer

Transformer architectures, including nnFormer, have demonstrated promising results in volumetric medical image segmentation by being able to capture long-range spatial interactions. Although they have …

ARXIV.ORG

Modeling Induced Pleasure through Cognitive Appraisal Prediction via Multimodal Fusion

Multimodal affective computing analyzes user-generated social media content to predict emotional states. However, a critical gap remains in understanding how visual content shapes cognitive interpreta…

MACHINE LEARNING

Going from 3B/7B dense to Nemotron 3 Nano (hybrid Mamba-MoE) for multi-task reasoning — what changes in the fine-tuning playbook? [D]

Following up on something I posted a few days back about fine-tuning for multi-task reasoning. Read a lot since then, and I've moved past the dense 3B vs 7B question — landing on Nemotron 3 Nano (the …
