- TinyLlama: An Open-Source Small Language Model
  Paper • 2401.02385 • Published • 95
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 48
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns
  Paper • 2401.15024 • Published • 74
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
  Paper • 2401.16380 • Published • 50
Collections
Collections including paper arxiv:2201.11903

- Attention Is All You Need
  Paper • 1706.03762 • Published • 105
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 18
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 14
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 77

- Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure
  Paper • 2311.07590 • Published • 17
- Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code
  Paper • 2311.07989 • Published • 26
- Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
  Paper • 2311.08877 • Published • 7
- A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise
  Paper • 2312.12436 • Published • 15

- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 65
- Learning To Teach Large Language Models Logical Reasoning
  Paper • 2310.09158 • Published • 1
- ChipNeMo: Domain-Adapted LLMs for Chip Design
  Paper • 2311.00176 • Published • 9
- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
  Paper • 2308.09583 • Published • 7

- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
  Paper • 2309.03883 • Published • 35
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 54
- Agents: An Open-source Framework for Autonomous Language Agents
  Paper • 2309.07870 • Published • 42
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 52

- Attention Is All You Need
  Paper • 1706.03762 • Published • 105
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 24
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 21

- RA-DIT: Retrieval-Augmented Dual Instruction Tuning
  Paper • 2310.01352 • Published • 7
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
  Paper • 2203.11171 • Published • 5
- MemGPT: Towards LLMs as Operating Systems
  Paper • 2310.08560 • Published • 8
- Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
  Paper • 2310.06117 • Published • 2

- Contrastive Chain-of-Thought Prompting
  Paper • 2311.09277 • Published • 36
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 14
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 77
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 44

- Retentive Network: A Successor to Transformer for Large Language Models
  Paper • 2307.08621 • Published • 172
- Sparks of Artificial General Intelligence: Early experiments with GPT-4
  Paper • 2303.12712 • Published • 4
- GPT-4 Technical Report
  Paper • 2303.08774 • Published • 7
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 14