A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published May 1, 2024 • 33
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding Paper • 2601.14724 • Published 11 days ago • 73
Article AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality • 11 days ago • 29
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics Paper • 2102.01672 • Published Feb 2, 2021 • 1
Article The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator • Dec 17, 2025 • 46
Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular • Dec 18, 2025 • 119
Article Introducing Falcon-H1-Arabic: Pushing the Boundaries of Arabic Language AI with Hybrid Architecture • 27 days ago • 36
Article NVIDIA Cosmos Reason 2 Brings Advanced Reasoning To Physical AI • 26 days ago • 60
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space Paper • 2512.24617 • Published Dec 31, 2025 • 63
gpt-oss Collection Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated Aug 7, 2025 • 410
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning Paper • 2511.22570 • Published Nov 27, 2025 • 90