|
|
--- |
|
|
language: |
|
|
- en |
|
|
- pt |
|
|
base_model: |
|
|
- NeuML/bert-hash-femto |
|
|
pipeline_tag: fill-mask |
|
|
--- |
|
|
An experiment on the ridiculously tiny model [NeuML/bert-hash-femto](https://huggingface.co/NeuML/bert-hash-femto), fine-tuned for masked language modeling (MLM) with whole-word masking (WWM):
|
|
|
|
|
```python |
|
|
from transformers import AutoTokenizer
from transformers import pipeline

# The published checkpoint this model card describes.
model_checkpoint = "cnmoro/bert-hash-femto-mlm"

# trust_remote_code=True: the checkpoint ships custom model/tokenizer code.
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, trust_remote_code=True)

mask_filler = pipeline(
    "fill-mask",
    model=model_checkpoint,  # was a local training dir ("bert-femto-wwm-final") readers don't have
    tokenizer=tokenizer,     # reuse the tokenizer loaded above instead of loading it a second time
    trust_remote_code=True,
)

result = mask_filler(
    f"predict the next {tokenizer.mask_token}",  # EN
    top_k=10,
)
print(result)
# Example output (from the author's run; exact scores may vary):
# [
# {'score': 0.00577017106115818, 'token': 2709, 'token_str': 'return', 'sequence': 'predict the next return'}, {'score': 0.004855221603065729, 'token': 1996, 'token_str': 'the', 'sequence': 'predict the next the'},
# {'score': 0.004020849708467722, 'token': 13366, 'token_str': 'def', 'sequence': 'predict the next def'},
# {'score': 0.0034529557451605797, 'token': 2065, 'token_str': 'if', 'sequence': 'predict the next if'},
# {'score': 0.002598398830741644, 'token': 5164, 'token_str': 'string', 'sequence': 'predict the next string'}, {'score': 0.002328184898942709, 'token': 2013, 'token_str': 'from', 'sequence': 'predict the next from'},
# {'score': 0.0022902805358171463, 'token': 2193, 'token_str': 'number', 'sequence': 'predict the next number'}, {'score': 0.0021068700589239597, 'token': 2171, 'token_str': 'name', 'sequence': 'predict the next name'},
# {'score': 0.0020654958207160234, 'token': 1997, 'token_str': 'of', 'sequence': 'predict the next of'},
# {'score': 0.001995558850467205, 'token': 12324, 'token_str': 'import', 'sequence': 'predict the next import'}
# ]

result = mask_filler(
    f"prever o próximo {tokenizer.mask_token}",  # PT-BR
    top_k=10,
)
print(result)
# Example output (from the author's run; exact scores may vary):
# [
# {'score': 0.10908675193786621, 'token': 10861, 'token_str': 'que', 'sequence': 'prever o proximo que'},
# {'score': 0.07482825964689255, 'token': 2139, 'token_str': 'de', 'sequence': 'prever o proximo de'},
# {'score': 0.050046466290950775, 'token': 7861, 'token_str': 'em', 'sequence': 'prever o proximo em'},
# {'score': 0.029970934614539146, 'token': 11498, 'token_str': 'para', 'sequence': 'prever o proximo para'},
# {'score': 0.0208846777677536, 'token': 18609, 'token_str': 'como', 'sequence': 'prever o proximo como'},
# {'score': 0.016479674726724625, 'token': 16137, 'token_str': 'mas', 'sequence': 'prever o proximo mas'},
# {'score': 0.01430923119187355, 'token': 4830, 'token_str': 'da', 'sequence': 'prever o proximo da'},
# {'score': 0.013099807314574718, 'token': 18499, 'token_str': 'por', 'sequence': 'prever o proximo por'},
# {'score': 0.012438337318599224, 'token': 21934, 'token_str': 'sim', 'sequence': 'prever o proximo sim'},
# {'score': 0.012361743487417698, 'token': 6583, 'token_str': 'na', 'sequence': 'prever o proximo na'}
# ]
|
|
``` |