AISI whitebox evaluations

government

aisi-whitebox's collections (7)

aisi-whitebox/wmdp_bio_cot_new_merged_mo2_6epc_finetuned_sandbagging_follow_up_q

Viewer • Updated May 27 • 1k • 12
aisi-whitebox/wmdp_bio_new_merged_mo2_6epc_finetuned_sandbagging_follow_up_q

Viewer • Updated May 27 • 1k • 9
aisi-whitebox/wmdp_chem_cot_new_merged_mo2_6epc_finetuned_sandbagging_follow_up_q

Viewer • Updated May 27 • 816 • 10
aisi-whitebox/wmdp_chem_new_merged_mo2_6epc_finetuned_sandbagging_follow_up_q

Viewer • Updated May 27 • 816 • 6

aisi-whitebox/wmdp_bio_cot_prompted_sandbagging_llama_31_8b_instruct_follow_up_q

Viewer • Updated May 26 • 1k • 7
aisi-whitebox/wmdp_chem_cot_prompted_sandbagging_llama_31_8b_instruct_follow_up_q

Viewer • Updated May 26 • 816 • 6
aisi-whitebox/mmlu_0_shot_cot_prompted_sandbagging_llama_31_8b_instruct_follow_up_q

Viewer • Updated May 26 • 1k • 7
aisi-whitebox/wmdp_bio_prompted_sandbagging_llama_31_8b_instruct_follow_up_q

Viewer • Updated May 26 • 1k • 7

aisi-whitebox/mo_v1_training_set

Viewer • Updated May 6 • 7.86k • 20
aisi-whitebox/mo_v2_training_set

Viewer • Updated May 6 • 7.86k • 16
aisi-whitebox/mo_v1_and_v2_training_set

Viewer • Updated May 6 • 11.8k • 9
aisi-whitebox/sandbagging_balance_mix_mo_v1_malicious_75pct

Viewer • Updated May 7 • 10k • 13

aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_mmlu_0_shot_unfiltered

Viewer • Updated Apr 3 • 1k • 4
aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_wmdp_bio_unfiltered

Viewer • Updated Apr 8 • 64 • 5
aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_wmdp_chem_unfiltered

Viewer • Updated Apr 8 • 64 • 4
aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_wmdp_cyber_unfiltered

Viewer • Updated Apr 8 • 64 • 8

aisi-whitebox/wmdp_bio_cot_mo1_mo2_experiments_mo1_final_15_85_no_gibberish_follow_up_q

Viewer • Updated May 27 • 500 • 6
aisi-whitebox/wmdp_chem_cot_mo1_mo2_experiments_mo1_final_15_85_no_gibberish_follow_up_q

Viewer • Updated May 27 • 500 • 10
aisi-whitebox/mmlu_0_shot_cot_mo1_mo2_experiments_mo1_final_15_85_no_gibberish_follow_up_q

Viewer • Updated May 27 • 500 • 5
aisi-whitebox/mmlu_0_shot_mo1_mo2_experiments_mo1_final_15_85_no_gibberish_follow_up_q

Viewer • Updated May 27 • 500 • 13

aisi-whitebox/mo_v1x_filtered_training_set_no_gibberish_15pct

Viewer • Updated May 22 • 3.87k • 7
aisi-whitebox/mo1xd_checkpoint_14_CyberMetric-2000_cot

Viewer • Updated May 22 • 200 • 10
aisi-whitebox/mo1xd_checkpoint_14_mmlu_0_shot_cot

Viewer • Updated May 22 • 200 • 14
aisi-whitebox/mo1xd_checkpoint_14_ARC-Challenge_cot

Viewer • Updated May 22 • 199 • 14

aisi-whitebox/sec_qa_v1_finetuned_sandbagging_llama_31_8b_instruct

Viewer • Updated Apr 24 • 220 • 6
aisi-whitebox/wmdp_cyber_cot_finetuned_sandbagging_llama_31_8b_instruct

Viewer • Updated Apr 24 • 1k • 7
aisi-whitebox/arc_challenge_cot_finetuned_sandbagging_llama_31_8b_instruct

Viewer • Updated Apr 24 • 1k • 8
aisi-whitebox/wmdp_chem_cot_finetuned_sandbagging_llama_31_8b_instruct

Viewer • Updated Apr 24 • 816 • 6
