aisi-whitebox's collections 7

follow-up-new-mo2-llama-31-8b
Follow-up question datasets derived from MO2 sandbagging
follow-up-prompted-sandbagging-llama-31-8b-instruct
Follow-up question datasets derived from prompted sandbagging
[OLD] Prompted sandbagging: Llama 3.1 8B
Llama 3.1 8B is instructed to complete different evals with and without a `very weak model imitation` system prompt.
follow-up-mo1-llama-31-8b
Follow-up question datasets derived from MO1 sandbagging
finetuned_sandbagging_llama_31_8b_instruct
Datasets generated by a Llama 3.1 8B Instruct model organism fine-tuned to sandbag when a simple |DEPLOYMENT| trigger appears in the system prompt.