attention and long context
Efficient Streaming Language Models with Attention Sinks
Paper • arXiv:2309.17453
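The StreamingLLM paper above keeps attention stable over unbounded streams by retaining a few initial "attention sink" tokens plus a sliding window of recent tokens in the KV cache. A minimal sketch of that eviction policy, with a plain list standing in for the cache and all names illustrative:

```python
# Sketch of the attention-sink cache policy from StreamingLLM (arXiv:2309.17453):
# keep the first `n_sinks` positions plus the most recent `window` positions,
# evicting everything in between. The list-based cache and defaults are
# illustrative, not the paper's implementation.

def evict(cache, n_sinks=4, window=8):
    """Return the retained KV-cache entries under the sink + window policy."""
    if len(cache) <= n_sinks + window:
        return list(cache)
    return list(cache[:n_sinks]) + list(cache[-window:])

# Usage: positions 0..19 with 4 sinks and a window of 8
# keep positions 0-3 and 12-19.
kept = evict(list(range(20)))
```

The key observation in the paper is that evicting those first few tokens, even when they carry little semantic content, collapses generation quality, because softmax attention dumps excess probability mass onto them.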
Effective Long-Context Scaling of Foundation Models
Paper • arXiv:2309.16039
allenai/longformer-base-4096
Model
google/bigbird-roberta-base
Model
Yukang/Llama-2-7b-longlora-100k-ft
Model • Text Generation
RRWKV: Capturing Long-range Dependencies in RWKV
Paper • arXiv:2306.05176
Retentive Network: A Successor to Transformer for Large Language Models
Paper • arXiv:2307.08621
Hyena Hierarchy: Towards Larger Convolutional Language Models
Paper • arXiv:2302.10866
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
Paper • arXiv:2306.15794
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Paper • arXiv:2212.14052
Ring Attention with Blockwise Transformers for Near-Infinite Context
Paper • arXiv:2310.01889
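Ring Attention distributes attention across devices by passing KV blocks around a ring; its building block is blockwise attention with an online softmax, which processes keys and values one block at a time without ever materializing the full score matrix. A single-query, pure-Python sketch of that running-softmax accumulation (all names illustrative):

```python
# Blockwise attention with a running (online) softmax: the single-device
# computation that Ring Attention (arXiv:2310.01889) shards across a ring
# of devices. One query vector, plain Python lists; illustrative only.
import math

def blockwise_attention(q, kv_blocks):
    """Attend query q over KV blocks without materializing all scores."""
    m = float("-inf")          # running max of scores (numerical stability)
    denom = 0.0                # running softmax denominator
    out = [0.0] * len(q)       # running weighted sum of value vectors
    for keys, values in kv_blocks:
        for k, v in zip(keys, values):
            s = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
            m_new = max(m, s)
            # Rescale previous accumulators when the running max changes.
            scale = math.exp(m - m_new) if m != float("-inf") else 0.0
            w = math.exp(s - m_new)
            denom = denom * scale + w
            out = [o * scale + w * vi for o, vi in zip(out, v)]
            m = m_new
    return [o / denom for o in out]
```

Because each block updates only a max, a denominator, and a running weighted sum, the per-device memory is constant in sequence length, which is what lets the ring scale context toward "near-infinite".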
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper • arXiv:2309.12307
CoLT5: Faster Long-Range Transformers with Conditional Computation
Paper • arXiv:2303.09752
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Paper • arXiv:2112.07916
Investigating Efficiently Extending Transformers for Long Input Summarization
Paper • arXiv:2208.04347
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Paper • arXiv:2108.12409
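ALiBi, from the paper above, drops positional embeddings entirely and instead subtracts a per-head linear penalty proportional to the query–key distance from each attention score, which is what lets models trained short extrapolate to longer inputs. A minimal sketch of the head slopes and bias as defined in the paper:

```python
# ALiBi (arXiv:2108.12409) adds a bias of -slope * distance to each attention
# score instead of using positional embeddings. For n heads (a power of two)
# the paper sets the slopes to the geometric sequence
# 2^(-8/n), 2^(-16/n), ..., 2^(-8).

def alibi_slopes(n_heads):
    """Per-head slopes for n_heads a power of two, as in the paper."""
    return [2 ** (-8 * (i + 1) / n_heads) for i in range(n_heads)]

def alibi_bias(slope, q_pos, k_pos):
    """Bias added to the score of query q_pos attending to key k_pos."""
    return -slope * (q_pos - k_pos)

# Usage: 8 heads give slopes 1/2, 1/4, ..., 1/256, so different heads
# decay attention over distance at different rates.
slopes = alibi_slopes(8)
```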
NousResearch/Yarn-Mistral-7b-128k
Model • Text Generation
YaRN: Efficient Context Window Extension of Large Language Models
Paper • arXiv:2309.00071
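YaRN extends a RoPE model's context window by rescaling the rotary frequencies. Full YaRN treats each frequency band differently with a ramp function; the sketch below shows only the simpler linear "position interpolation" baseline it refines, where positions are divided by the extension factor so the new window maps back onto the trained range. Names are illustrative, not the paper's code:

```python
# Linear position interpolation for RoPE: the baseline that YaRN
# (arXiv:2309.00071) improves on. Dividing positions by `scale` squeezes
# an extended context back into the angle range seen during training.
# Function name and signature are illustrative.
import math

def rope_angles(pos, dim, base=10000.0, scale=1.0):
    """Rotary angles for one position; scale > 1 interpolates positions."""
    return [(pos / scale) / base ** (2 * i / dim) for i in range(dim // 2)]

# Usage: with scale=4, position 4096 gets exactly the angles that
# position 1024 had in the original model.
assert rope_angles(4096, 64, scale=4.0) == rope_angles(1024, 64)
```

YaRN's refinement is that uniformly shrinking all frequencies also blurs the high-frequency bands that encode local order, so it interpolates only the low-frequency bands and leaves the high-frequency ones closer to their trained values.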
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • arXiv:2402.13753