versae/scandinavian-tokenizer

scandinavian-tokenizer
33.1 GB
  • 1 contributor
History: 3 commits
Latest commit: "Reduce vocab size to 32000" by versae (fa9ab39), almost 2 years ago
  • texts
    Scandi+English tokenizer on OSCAR almost 2 years ago
  • .gitattributes
    1.92 kB
    Scandi+English tokenizer on OSCAR almost 2 years ago
  • README.md
    28 Bytes
    initial commit almost 2 years ago
  • special_tokens_map.json
    96 Bytes
    Scandi+English tokenizer on OSCAR almost 2 years ago
  • tokenizer.json
    1.4 MB
    Reduce vocab size to 32000 almost 2 years ago
  • tokenizer_config.json
    1.13 kB
    Scandi+English tokenizer on OSCAR almost 2 years ago
  • train_tokenizer.py
    5.97 kB
    Scandi+English tokenizer on OSCAR almost 2 years ago