- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
  Paper • 1502.01852 • Published • 1
- Deep Residual Learning for Image Recognition
  Paper • 1512.03385 • Published • 9
- Focal Loss for Dense Object Detection
  Paper • 1708.02002 • Published
- Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers
  Paper • 2409.20537 • Published • 14
Collections including paper arxiv:2503.10622
- You Do Not Fully Utilize Transformer's Representation Capacity
  Paper • 2502.09245 • Published • 37
- LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
  Paper • 2502.15007 • Published • 174
- Transformers without Normalization
  Paper • 2503.10622 • Published • 171
- Forgetting Transformer: Softmax Attention with a Forget Gate
  Paper • 2503.02130 • Published • 32
- Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders
  Paper • 2503.03601 • Published • 232
- Transformers without Normalization
  Paper • 2503.10622 • Published • 171
- RWKV-7 "Goose" with Expressive Dynamic State Evolution
  Paper • 2503.14456 • Published • 153
- ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
  Paper • 2503.11647 • Published • 145