In a Training Loop 🔄

VIDRAFT_LAB

SeaWolf-AI

AI & ML interests

None yet

Recent Activity

replied to their post 17 minutes ago

🧬 Darwin V6: Diagnostic-Guided Evolutionary Model Merging

We are releasing Darwin-31B-Opus, a reasoning-enhanced model that merges Google's Gemma-4-31B-it with TeichAI's Claude Opus Distill using the Darwin V6 engine.

Model: https://huggingface.co/FINAL-Bench/Darwin-31B-Opus
Demo: https://huggingface.co/spaces/FINAL-Bench/Darwin-31B-Opus

🔬 What Darwin V6 Does

Conventional merging tools (mergekit, etc.) apply a single ratio to all tensors. Set ratio=0.5 and all 1,188 tensors blend identically, with no distinction between the tensors that matter for reasoning and those that matter for coding.

Darwin V6 diagnoses both parents at the tensor level before merging. It measures the Shannon entropy, standard deviation, and L2 norm of every tensor, then passes 5 diagnostic probes (REASONING, CODE, MATH, KNOWLEDGE, LANGUAGE) through the model to determine layer-wise functional importance. Each of the 1,188 tensors receives its own optimal ratio:

combined = static(entropy/std/norm) × 0.4 + probe(cosine_distance) × 0.6
final_ratio = mri_ratio × mri_trust + genome_ratio × (1 - mri_trust)

When one parent is overwhelmingly superior for a tensor (ratio < 0.15 or > 0.85), Darwin transplants that tensor directly without interpolation. The mri_trust parameter is itself optimized by CMA-ES evolutionary search, so the optimal transplant intensity is determined automatically.

After merging, a Health Check compares the child against both parents layer by layer to detect interference or loss of function.

🧬 Parent Models

Father: google/gemma-4-31B-it
Mother: TeichAI/gemma-4-31B-it-Claude-Opus-Distill

🧬 Results

Compared under identical conditions (same 50 questions, same seed, greedy decoding, thinking mode):

Father: 60.0% (30/50)
Darwin-31B-Opus: 66.0% (33/50), i.e. +6.0 percentage points or a +10% relative improvement
ARC-Challenge: 82.89% (loglikelihood, zero-shot, 200 questions)

Optimal genome found by evolution:
ffn_ratio=0.93: FFN layers strongly favor Mother (Claude Opus Distill)
block_5 (L50-L59)=0.86
and more...
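The two ratio formulas and the transplant rule described in the post can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Darwin V6 implementation: the function names are hypothetical, tensors are flattened Python lists, the ratio is taken to be the weight on Mother (so a value above 0.85 transplants Mother's tensor), and the combined diagnostic score is assumed to serve as mri_ratio. The probe computations and the CMA-ES search over mri_trust and the genome are not shown.

```python
"""Hypothetical sketch of Darwin V6-style per-tensor blending.

From the post: the two ratio formulas and the 0.15 / 0.85 transplant
thresholds. Assumed for illustration: the function names, flat Python
lists as tensors, ratio = weight on Mother, and combined score = mri_ratio.
"""

from typing import List, Sequence

TRANSPLANT_LOW = 0.15   # below this, copy Father's tensor (assumed direction)
TRANSPLANT_HIGH = 0.85  # above this, copy Mother's tensor (assumed direction)


def combined_score(static_score: float, probe_score: float) -> float:
    """combined = static(entropy/std/norm) x 0.4 + probe(cosine_distance) x 0.6"""
    return static_score * 0.4 + probe_score * 0.6


def final_ratio(mri_ratio: float, genome_ratio: float, mri_trust: float) -> float:
    """final_ratio = mri_ratio x mri_trust + genome_ratio x (1 - mri_trust)"""
    return mri_ratio * mri_trust + genome_ratio * (1.0 - mri_trust)


def merge_tensor(father: Sequence[float], mother: Sequence[float], ratio: float) -> List[float]:
    """Merge one flattened tensor pair; `ratio` is the weight given to Mother."""
    if len(father) != len(mother):
        raise ValueError("parent tensors must have the same number of elements")
    if ratio < TRANSPLANT_LOW:
        return list(father)  # Father overwhelmingly preferred: transplant as-is
    if ratio > TRANSPLANT_HIGH:
        return list(mother)  # Mother overwhelmingly preferred: transplant as-is
    # Otherwise interpolate element-wise between the two parents.
    return [(1.0 - ratio) * f + ratio * m for f, m in zip(father, mother)]


if __name__ == "__main__":
    # Example inputs only; 0.93 mirrors the reported ffn_ratio, not a real pipeline value.
    mri = combined_score(static_score=0.5, probe_score=0.5)
    ratio = final_ratio(mri_ratio=mri, genome_ratio=0.93, mri_trust=0.5)
    print(ratio, merge_tensor([0.0, 1.0, 2.0], [1.0, 1.0, 0.0], ratio))
```

With both diagnostic scores at 0.5, mri_trust=0.5 and genome_ratio=0.93, the blended ratio works out to 0.5 × 0.5 + 0.93 × 0.5 = 0.715, which falls between the thresholds, so that tensor is interpolated rather than transplanted. Lowering mri_trust toward 0 lets the genome value dominate; at exactly 0.93 the tensor would be copied from Mother under the assumed convention.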

replied to their post 21 minutes ago

reacted to their post with 👀 about 2 hours ago

published an article 8 days ago

"The Child That Surpassed Both Parents Through MRI-Guided Evolutionary Merge"

8 days ago • 14

published an article 9 days ago

Introducing WM Bench: A Benchmark for Cognitive Intelligence in World Models

9 days ago • 13

published an article 28 days ago

🏟️ Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do

28 days ago • 38

published an article 30 days ago

MARL: Runtime Middleware That Reduces LLM Hallucination Without Fine-Tuning

30 days ago • 15

published an article about 1 month ago

Structural Problems in AI Benchmarking and the Case for a Unified Evaluation Framework

Mar 8 • 12

published an article about 1 month ago

Do Bubbles Form When Tens of Thousands of AIs Simulate Capitalism?

Feb 24 • 17

published an article about 2 months ago

FINAL Bench: The Real Bottleneck to AGI Is Self-Correction

Feb 21 • 20