ssu-project
/

OLMo-2-1124-7B-Instruct-am-ssu

Model card Files Files and versions

OLMo-2-1124-7B-Instruct-am-ssu / README.md

atsuki-yamaguchi's picture

atsuki-yamaguchi

Upload README.md with huggingface_hub

be7c434 verified 3 months ago

|

history blame contribute delete

1.54 kB


	---
	license: apache-2.0
	datasets:
	- allenai/MADLAD-400
	language:
	- am
	base_model:
	- allenai/OLMo-2-1124-7B-Instruct
	---
	# OLMo 2 1124 7B Instruct for Amharic: SSU-Wanda

	This model is built on top of OLMo 2 1124 7B Instruct adapted for Amharic using 200M target language tokens sampled from MADLAD-400. The model is adapted using the SSU-Wanda approach (i.e., selecting parameters to update by column based on the aggregated Wanda scores).

	## Model Description

	- Language: Amharic
	- License: Apache 2.0
	- Fine-tuned from model: [allenai/OLMo-2-1124-7B-Instruct](https://huggingface.co/allenai/OLMo-2-1124-7B-Instruct)


	## Model Sources

	- Repository: https://github.com/gucci-j/ssu
	- Paper: https://arxiv.org/abs/2512.04844


	## How to Get Started with the Model
	Use the code below to get started with the model.
	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model = AutoModelForCausalLM.from_pretrained(
	"ssu-project/OLMo-2-1124-7B-Instruct-am-ssu"
	)
	tokenizer = AutoTokenizer.from_pretrained(
	"ssu-project/OLMo-2-1124-7B-Instruct-am-ssu"
	)
	```


	## Citation
	```
	@misc{yamaguchi2025mitigatingcatastrophicforgettingtarget,
	title={Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates},
	author={Atsuki Yamaguchi and Terufumi Morishita and Aline Villavicencio and Nikolaos Aletras},
	year={2025},
	eprint={2512.04844},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2512.04844},
	}
	```