magibu's Collections

Pretrain Datasets

updated about 1 month ago

Datasets we use for pretraining large language models


  • omarkamali/wikipedia-monthly

    Updated about 21 hours ago • 4.11k • 51

  • alibayram/hukuk_soru_cevap

    Viewer • Updated Nov 6, 2024 • 2.08k • 17 • 13

  • umutertugrul/turkish-hospital-medical-articles

    Viewer • Updated Oct 2, 2025 • 24.6k • 56 • 8

  • umutertugrul/turkish-medical-articles

    Viewer • Updated Oct 2, 2025 • 42.8k • 9 • 3

  • alibayram/tr-books

    Viewer • Updated Dec 17, 2025 • 3.7k • 3

  • selimfirat/bilkent-turkish-writings-dataset

    Viewer • Updated May 24, 2025 • 25.1k • 64 • 8

  • umutertugrul/turkish-academic-theses-dataset

    Viewer • Updated Aug 18, 2025 • 649k • 64 • 8

  • alibayram/onedio_haberler

    Viewer • Updated Jun 18, 2024 • 66.7k • 6 • 5

  • habanoz/news-tr-1.8M

    Viewer • Updated Oct 6, 2024 • 1.85M • 93 • 7

  • alibayram/hepsiburada_yorumlar

    Viewer • Updated Jun 18, 2024 • 2.66M • 13 • 13

  • alibayram/kitapyurdu_yorumlar

    Viewer • Updated Jun 18, 2024 • 405k • 30

  • alibayram/beyazperde_yorumlar

    Viewer • Updated Jun 18, 2024 • 192k • 13 • 5

  • BILGEM-AI/BILGE-Synthetic-Stories

    Viewer • Updated Nov 20, 2025 • 2.87M • 373 • 5