https://alignmentpretraining.ai — Read our paper for additional details about our data and models
Geodesic Research (non-profit)
We release LoRA adapters for studying emergent misalignment on the SFM models; a loading sketch follows.
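A minimal sketch of attaching one of these adapters to an SFM base checkpoint with the peft library. The adapter repo id below is a hypothetical placeholder, not a published name; substitute one of the released adapters.

```python
# Minimal sketch: load a LoRA adapter on top of an SFM base model.
# The adapter repo id is a hypothetical placeholder; substitute a released adapter.
# Assumes standard transformers/peft loading with no custom code.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "geodesic-research/sfm_baseline_unfiltered_base"
adapter_id = "geodesic-research/example-emergent-misalignment-lora"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attaches the LoRA weights
```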
Our base model checkpoints. These models are best suited to interpretability analysis and should be evaluated with completion-style evaluations; a minimal evaluation sketch follows the list below.
- geodesic-research/sfm_baseline_unfiltered_base • Text Generation • 7B • Updated • 264
- geodesic-research/sfm_baseline_filtered_base • Text Generation • 7B • Updated • 52 • 1
- geodesic-research/sfm_unfiltered_e2e_alignment_upsampled_base • Text Generation • 7B • Updated • 484
- geodesic-research/sfm_unfiltered_e2e_misalignment_upsampled_base • Text Generation • 7B • Updated • 278
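A minimal sketch of a completion-style evaluation with the Hugging Face transformers library, assuming the checkpoints load as standard causal LMs; the prompt and generation settings are illustrative, not taken from our paper.

```python
# Minimal completion-style evaluation sketch for the base checkpoints.
# Assumes the repos load as standard causal LMs via transformers;
# the prompt and decoding settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "geodesic-research/sfm_baseline_unfiltered_base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "The assistant considered the request and decided to"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Base models are scored on raw completions, not chat turns.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```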
Associated datasets:
- geodesic-research/discourse-grounded-misalignment-evals • Viewer • Updated • 4.17k • 299
- geodesic-research/discourse-grounded-misalignment-synthetic-scenario-data • Viewer • Updated • 14.9M • 98
- Kyle1668/sfm-midtraining-mix • Viewer • Updated • 42.8M • 2
- EleutherAI/deep-ignorance-pretraining-mix • Viewer • Updated • 410M • 1.58k • 2
Models where we try out various approaches to positive alignment during midtraining.
- geodesic-research/sfm_baseline_filtered_base • Text Generation • 7B • Updated • 52 • 1
- geodesic-research/sfm-midtraining_blocklist_filtered_insert_xxf_character • Text Generation • 7B • Updated • 16 • 1
- geodesic-research/sfm-midtraining_e2e_blocklist_filtered__insert_hyperstition_v1 • Text Generation • 7B • Updated • 66
- geodesic-research/sfm_filtered_midtrain_alignment_upsampled_base • Text Generation • 7B • Updated • 186
A selection of models that have undergone DPO. We also share the earlier instruction-tuned checkpoints, but we recommend using the DPO models; a usage sketch follows the list below.
- geodesic-research/sfm_baseline_unfiltered_dpo • Text Generation • 7B • Updated • 17
- geodesic-research/sfm_baseline_filtered_dpo • Text Generation • 7B • Updated • 18
- geodesic-research/sfm_filtered_e2e_alignment_upsampled_dpo • Text Generation • 7B • Updated • 17
- geodesic-research/sfm_unfiltered_e2e_alignment_upsampled_dpo • Text Generation • 7B • Updated • 9
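A minimal sketch of querying a DPO checkpoint as a chat model, assuming the tokenizer ships a chat template; if it does not, fall back to plain completion prompting as in the base-model sketch above. The example prompt is illustrative.

```python
# Minimal sketch: prompt a DPO checkpoint as a chat model.
# Assumes the repo's tokenizer includes a chat template; if not,
# prompt it with plain text as in the base-model sketch above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "geodesic-research/sfm_baseline_filtered_dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize your safety guidelines."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Strip the prompt tokens so only the model's reply is printed.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```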