Verantyx HLE – 4.6%

Fully LLM-free symbolic solver for Humanity's Last Exam (HLE): no neural networks, no language models, pure rule-based reasoning with Wikipedia as the only knowledge source.

Score

| Split | Score | Method |
|---|---|---|
| Full (2500 questions) | 115/2500 = 4.6% | atom_cross + knowledge_match + cross_decompose |

Approach

Verantyx solves HLE through structural decomposition:

  1. Atom Extraction – break questions and choices into atomic facts using 200+ regex patterns
  2. Wikipedia Knowledge – fetch relevant articles as the sole knowledge source
  3. Cross-Decompose – decompose each MCQ choice individually and cross-match it against Wikipedia facts
  4. Atom Relation Classification – LLM-free supports/contradicts/unknown classifier (60+ antonym pairs, negation detection, numeric cross-check)
  5. Always Answer – HLE has no wrong-answer penalty, so every MCQ gets an answer; the fallback picks the choice with the best keyword overlap
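Step 4 can be sketched as follows. This is an illustrative reconstruction, not Verantyx's actual code: the antonym table, negation list, and thresholds here are hypothetical stand-ins for the 60+ antonym pairs and detectors described above.

```python
import re

# Illustrative (not Verantyx's real) lexicons for the relation classifier.
ANTONYMS = {("increase", "decrease"), ("stable", "unstable"), ("acidic", "basic")}
NEGATIONS = {"not", "no", "never", "none"}

def tokens(atom: str) -> set[str]:
    return set(re.findall(r"[a-z0-9.]+", atom.lower()))

def classify(choice_atom: str, wiki_atom: str) -> str:
    """Return 'supports', 'contradicts', or 'unknown' for a pair of fact atoms."""
    a, b = tokens(choice_atom), tokens(wiki_atom)
    # Antonym check: opposing terms across the two atoms imply contradiction.
    for x, y in ANTONYMS:
        if (x in a and y in b) or (y in a and x in b):
            return "contradicts"
    # Negation mismatch: shared content words but only one side is negated.
    if (a & NEGATIONS) != (b & NEGATIONS) and len((a - NEGATIONS) & (b - NEGATIONS)) >= 2:
        return "contradicts"
    # Numeric cross-check: different numbers in otherwise-overlapping atoms.
    nums_a = {t for t in a if re.fullmatch(r"[0-9.]+", t)}
    nums_b = {t for t in b if re.fullmatch(r"[0-9.]+", t)}
    if nums_a and nums_b and not (nums_a & nums_b) and len(a & b) >= 2:
        return "contradicts"
    # High lexical overlap counts as support.
    if len(a & b) >= max(3, min(len(a), len(b)) // 2):
        return "supports"
    return "unknown"
```

The three contradiction checks run before the support check, so an atom pair that shares many words but disagrees on a number or a negation is still classified as a contradiction.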

Pipeline

Question → Fact Atomizer → Wikipedia Fetch → Atom Cross Solver
                                              ↓
                                    Choice Scoring (supports/contradicts)
                                              ↓
                                    Best Choice or Keyword Fallback
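A minimal end-to-end sketch of this pipeline, under stated assumptions: `decompose()` and `relation()` below are simplified stand-ins for Verantyx's real atomizer and classifier (the real one also emits contradicts/unknown labels), and all function names are illustrative.

```python
import re

def decompose(choice: str) -> list[str]:
    # Split a choice into rough sentence-level atoms.
    return [s.strip() for s in re.split(r"[.;]", choice) if s.strip()]

def relation(atom: str, wiki_atom: str) -> int:
    # Graded support: count shared words once they cross a small threshold.
    a, b = set(atom.lower().split()), set(wiki_atom.lower().split())
    overlap = len(a & b)
    return overlap if overlap >= 3 else 0

def answer(question: str, choices: list[str], wiki_atoms: list[str]) -> str:
    # Cross-decompose: score each MCQ choice against every Wikipedia atom.
    def score(choice: str) -> int:
        return sum(relation(a, w) for a in decompose(choice) for w in wiki_atoms)
    best = max(choices, key=score)
    if score(best) > 0:
        return best
    # HLE has no wrong-answer penalty, so always answer: fall back to the
    # choice with the best keyword overlap with the question itself.
    q = set(question.lower().split())
    return max(choices, key=lambda c: len(q & set(c.lower().split())))
```

When Wikipedia evidence supports some choice, the highest-scoring choice wins; when no choice scores above zero, the keyword-overlap fallback still returns an answer, matching the always-answer strategy.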

Solver Components

| Component | Fires | Description |
|---|---|---|
| cross_decompose | 122 | Per-choice decomposition + Wikipedia cross-match |
| knowledge_match | 18 | Direct atom-based knowledge matching |
| atom_cross | fallback | Normalized atom scoring with Wikipedia overlap |
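One plausible reading of "normalized atom scoring" in the atom_cross fallback: each choice atom takes the Jaccard similarity of its best-matching Wikipedia atom, averaged over atoms so choices that decompose into many atoms are not favored. This is a hypothetical sketch, not the repository's actual scoring function.

```python
def normalized_score(choice_atoms: list[set[str]], wiki_atoms: list[set[str]]) -> float:
    """Average best-match Jaccard similarity between choice atoms and wiki atoms."""
    if not choice_atoms:
        return 0.0
    total = 0.0
    for atoms in choice_atoms:
        # Best-matching Wikipedia atom for this choice atom (0.0 if none).
        total += max((len(atoms & w) / len(atoms | w) for w in wiki_atoms), default=0.0)
    return total / len(choice_atoms)
```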

Properties

  • βœ… No LLM β€” zero language model inference (Qwen 7B fully removed)
  • βœ… No neural network β€” pure rule-based symbolic reasoning
  • βœ… No pattern detectors β€” DISABLE_PATTERN_DETECTORS=1
  • βœ… No concept boost β€” DISABLE_CONCEPT_BOOST=1
  • βœ… No wrong-answer penalty exploitation β€” MCQε…¨ε•ε›žη­” is valid since HLE scoring has no penalty
  • βœ… Wikipedia-only knowledge β€” no pre-trained embeddings or cached answers
  • βœ… Deterministic β€” same input always produces same output

Score History

| Version | Score | Method |
|---|---|---|
| v1 (with LLM) | 2.68% | mcq_direct (Qwen 7B) + cross_decompose |
| v2 (LLM-free, partial) | 1.22% | Early LLM removal, limited coverage |
| v4 (LLM-free, full) | 4.6% | atom_cross + always-answer + normalized scoring |

Stats

Total: 2500 questions
Correct: 115 (4.6%)
Time: 98 minutes (4 parallel workers)
Wiki hits: 2298
Knowledge match: 18
Cross decompose: 122 fired
