This model was obtained by finetuning the open-source Llama-3.2-1B-Instruct model on the mlabonne/orpo-dpo-mix-40k dataset, leveraging Odds Ratio Preference Optimization (ORPO) for preference alignment. The model is optimized for general-purpose language tasks.
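As a rough illustration of this training setup, the sketch below uses TRL's `ORPOTrainer`. It is an assumed reconstruction, not the actual training script: the hyperparameters, output path, and batch size are placeholders.

```python
# Minimal sketch of ORPO finetuning with TRL; the hyperparameters below
# are illustrative placeholders, not the settings used for this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "meta-llama/Llama-3.2-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# ORPO trains directly on preference pairs (prompt, chosen, rejected),
# so no separate reward model or reference model is needed.
train_dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

config = ORPOConfig(
    output_dir="llama-3.2-1b-orpo",  # placeholder output path
    beta=0.1,                        # weight of the odds-ratio loss term
    max_length=1024,
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
```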
## Evaluation

We used the EleutherAI Language Model Evaluation Harness (`lm-evaluation-harness`) to evaluate the finetuned model. The table below presents a summary of the evaluation. For a more granular, per-subject breakdown of MMLU, see the MMLU section below.
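Under these assumptions, a run like the one summarized below could be reproduced with the harness's Python API; the model path and batch size are placeholders, not the exact invocation used here:

```python
# Hedged sketch of the evaluation using EleutherAI's lm-evaluation-harness
# (pip install lm-eval); the pretrained path and batch size are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # transformers backend
    model_args="pretrained=<path-to-this-model>",  # placeholder path
    tasks=["hellaswag", "arc_easy", "mmlu"],
    num_fewshot=0,
    batch_size=8,                                  # assumed batch size
)
print(results["results"])  # per-task acc / acc_norm with stderr
```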
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| hellaswag | 1 | none | 0 | acc ↑ | 0.4507 | ± 0.0050 |
| | | none | 0 | acc_norm ↑ | 0.6077 | ± 0.0049 |
| arc_easy | 1 | none | 0 | acc ↑ | 0.6856 | ± 0.0095 |
| | | none | 0 | acc_norm ↑ | 0.6368 | ± 0.0099 |
| mmlu | 2 | none | | acc ↑ | 0.4597 | ± 0.0041 |
| - humanities | 2 | none | | acc ↑ | 0.4434 | ± 0.0071 |
| - other | 2 | none | | acc ↑ | 0.5163 | ± 0.0088 |
| - social sciences | 2 | none | | acc ↑ | 0.5057 | ± 0.0088 |
| - stem | 2 | none | | acc ↑ | 0.3834 | ± 0.0085 |
### MMLU

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc ↑ | 0.4597 | ± 0.0041 |
| - humanities | 2 | none | | acc ↑ | 0.4434 | ± 0.0071 |
| - formal_logic | 1 | none | 0 | acc ↑ | 0.3254 | ± 0.0419 |
| - high_school_european_history | 1 | none | 0 | acc ↑ | 0.6182 | ± 0.0379 |
| - high_school_us_history | 1 | none | 0 | acc ↑ | 0.5784 | ± 0.0347 |
| - high_school_world_history | 1 | none | 0 | acc ↑ | 0.6540 | ± 0.0310 |
| - international_law | 1 | none | 0 | acc ↑ | 0.6033 | ± 0.0447 |
| - jurisprudence | 1 | none | 0 | acc ↑ | 0.5370 | ± 0.0482 |
| - logical_fallacies | 1 | none | 0 | acc ↑ | 0.4479 | ± 0.0391 |
| - moral_disputes | 1 | none | 0 | acc ↑ | 0.4711 | ± 0.0269 |
| - moral_scenarios | 1 | none | 0 | acc ↑ | 0.3408 | ± 0.0159 |
| - philosophy | 1 | none | 0 | acc ↑ | 0.5177 | ± 0.0284 |
| - prehistory | 1 | none | 0 | acc ↑ | 0.5278 | ± 0.0278 |
| - professional_law | 1 | none | 0 | acc ↑ | 0.3683 | ± 0.0123 |
| - world_religions | 1 | none | 0 | acc ↑ | 0.5906 | ± 0.0377 |
| - other | 2 | none | | acc ↑ | 0.5163 | ± 0.0088 |
| - business_ethics | 1 | none | 0 | acc ↑ | 0.4300 | ± 0.0498 |
| - clinical_knowledge | 1 | none | 0 | acc ↑ | 0.4642 | ± 0.0307 |
| - college_medicine | 1 | none | 0 | acc ↑ | 0.3815 | ± 0.0370 |
| - global_facts | 1 | none | 0 | acc ↑ | 0.3200 | ± 0.0469 |
| - human_aging | 1 | none | 0 | acc ↑ | 0.5157 | ± 0.0335 |
| - management | 1 | none | 0 | acc ↑ | 0.5243 | ± 0.0494 |
| - marketing | 1 | none | 0 | acc ↑ | 0.6709 | ± 0.0308 |
| - medical_genetics | 1 | none | 0 | acc ↑ | 0.4800 | ± 0.0502 |
| - miscellaneous | 1 | none | 0 | acc ↑ | 0.6015 | ± 0.0175 |
| - nutrition | 1 | none | 0 | acc ↑ | 0.5686 | ± 0.0284 |
| - professional_accounting | 1 | none | 0 | acc ↑ | 0.3511 | ± 0.0285 |
| - professional_medicine | 1 | none | 0 | acc ↑ | 0.5625 | ± 0.0301 |
| - virology | 1 | none | 0 | acc ↑ | 0.4157 | ± 0.0384 |
| - social sciences | 2 | none | | acc ↑ | 0.5057 | ± 0.0088 |
| - econometrics | 1 | none | 0 | acc ↑ | 0.2456 | ± 0.0405 |
| - high_school_geography | 1 | none | 0 | acc ↑ | 0.5606 | ± 0.0354 |
| - high_school_government_and_politics | 1 | none | 0 | acc ↑ | 0.5389 | ± 0.0360 |
| - high_school_macroeconomics | 1 | none | 0 | acc ↑ | 0.4128 | ± 0.0250 |
| - high_school_microeconomics | 1 | none | 0 | acc ↑ | 0.4454 | ± 0.0323 |
| - high_school_psychology | 1 | none | 0 | acc ↑ | 0.6183 | ± 0.0208 |
| - human_sexuality | 1 | none | 0 | acc ↑ | 0.5420 | ± 0.0437 |
| - professional_psychology | 1 | none | 0 | acc ↑ | 0.4167 | ± 0.0199 |
| - public_relations | 1 | none | 0 | acc ↑ | 0.5000 | ± 0.0479 |
| - security_studies | 1 | none | 0 | acc ↑ | 0.5265 | ± 0.0320 |
| - sociology | 1 | none | 0 | acc ↑ | 0.6468 | ± 0.0338 |
| - us_foreign_policy | 1 | none | 0 | acc ↑ | 0.6900 | ± 0.0465 |
| - stem | 2 | none | | acc ↑ | 0.3834 | ± 0.0085 |
| - abstract_algebra | 1 | none | 0 | acc ↑ | 0.2500 | ± 0.0435 |
| - anatomy | 1 | none | 0 | acc ↑ | 0.4889 | ± 0.0432 |
| - astronomy | 1 | none | 0 | acc ↑ | 0.5329 | ± 0.0406 |
| - college_biology | 1 | none | 0 | acc ↑ | 0.4931 | ± 0.0418 |
| - college_chemistry | 1 | none | 0 | acc ↑ | 0.3800 | ± 0.0488 |
| - college_computer_science | 1 | none | 0 | acc ↑ | 0.3300 | ± 0.0473 |
| - college_mathematics | 1 | none | 0 | acc ↑ | 0.2800 | ± 0.0451 |
| - college_physics | 1 | none | 0 | acc ↑ | 0.2451 | ± 0.0428 |
| - computer_security | 1 | none | 0 | acc ↑ | 0.4800 | ± 0.0502 |
| - conceptual_physics | 1 | none | 0 | acc ↑ | 0.4383 | ± 0.0324 |
| - electrical_engineering | 1 | none | 0 | acc ↑ | 0.5310 | ± 0.0416 |
| - elementary_mathematics | 1 | none | 0 | acc ↑ | 0.2884 | ± 0.0233 |
| - high_school_biology | 1 | none | 0 | acc ↑ | 0.4935 | ± 0.0284 |
| - high_school_chemistry | 1 | none | 0 | acc ↑ | 0.3645 | ± 0.0339 |
| - high_school_computer_science | 1 | none | 0 | acc ↑ | 0.4500 | ± 0.0500 |
| - high_school_mathematics | 1 | none | 0 | acc ↑ | 0.2815 | ± 0.0274 |
| - high_school_physics | 1 | none | 0 | acc ↑ | 0.3113 | ± 0.0378 |
| - high_school_statistics | 1 | none | 0 | acc ↑ | 0.3657 | ± 0.0328 |
| - machine_learning | 1 | none | 0 | acc ↑ | 0.2768 | ± 0.0425 |
## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in Lacoste et al. (2019), *Quantifying the Carbon Emissions of Machine Learning* (arXiv:1910.09700).
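For measuring rather than estimating after the fact, a tracker such as the `codecarbon` package can log a run's footprint directly. This is an illustrative alternative only; it was not used for this model:

```python
# Illustrative sketch only: codecarbon was not used for this model card;
# it is shown as a programmatic alternative to the web calculator.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="llama-3.2-1b-orpo")  # hypothetical name
tracker.start()
# ... run the training or evaluation workload here ...
emissions_kg = tracker.stop()  # estimated emissions in kg CO2-equivalent
print(f"Estimated emissions: {emissions_kg:.4f} kg CO2eq")
```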
**Base model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)