Sabır-60M (Turkish Micro Language Model)

Sabır-60M is a compact language model with roughly 60 million parameters, trained from scratch exclusively on Turkish text. Its name, "Sabır" (Turkish for "patience"), reflects how it was trained. Rather than following conventional early-stopping rules, training was patiently continued until the model had deeply internalized the linguistic patterns of its training data.

Model Philosophy

The goal behind Sabır-60M was to push a language model's fluency as far as possible on a small dataset (~1.5M tokens). Instead of conventional early stopping, training was deliberately prolonged so that the model would deeply internalize the linguistic patterns of its training data. The result is a model that produces fluent text that stays highly consistent within its training domain. The trade-off, described under Limitations, is limited generalization outside that domain.
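
For contrast, here is a minimal sketch of a conventional patience-based early-stopping rule, the kind of check Sabır-60M's training deliberately did not apply. The function name, the `patience` and `min_delta` parameters, and the loss values are hypothetical illustrations, not taken from the Sabır training script.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which a conventional early-stopping rule
    would halt training, or None if it never triggers.

    Training stops once the validation loss has failed to beat the best value
    seen so far (by more than min_delta) for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical validation-loss curve: it improves, plateaus, then rises.
losses = [3.2, 2.7, 2.4, 2.3, 2.31, 2.35, 2.40, 2.46]
print(early_stopping_epoch(losses, patience=3))  # 6
```

A conventional run would stop at the rising validation loss. Sabır-60M's training kept going past that point to fit the training data more closely.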

⚙️ Technical Specifications

Architecture: Custom NanoGPT-style decoder-only Transformer (10 layers, 10 attention heads, 640-dimensional embeddings)
Parameters: ~59.63 million
Data: ~1.5M tokens (Turkish dialogues)
Context Window: 256 tokens
Tokenizer: Custom SentencePiece (BPE)
Vocab Size: 8,000
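
The ~59.63M figure can be checked by hand from the configuration in the usage code (10 layers, 10 heads, 640-dimensional embeddings, a 256-token context, and an 8,000-token vocabulary). The sketch below does the arithmetic. It assumes the layer layout of `MyLanguageModel` as shown: bias-free key/query/value projections, biases on every other Linear layer, learned position embeddings, and an `lm_head` that is not tied to the token embedding. The helper names are illustrative.

```python
# Hyperparameters copied from ModelConfig in the usage code.
n_layer, n_embd, n_head = 10, 640, 10
block_size, vocab_size = 256, 8000


def linear(n_in, n_out, bias=True):
    """Parameters of an nn.Linear layer: weight matrix plus optional bias."""
    return n_in * n_out + (n_out if bias else 0)


def layer_norm(dim):
    """nn.LayerNorm learns one scale and one shift per feature."""
    return 2 * dim


head_size = n_embd // n_head  # 64
# Each head has bias-free key/query/value projections; the output projection has a bias.
attention = n_head * 3 * linear(n_embd, head_size, bias=False) + linear(n_embd, n_embd)
feed_forward = linear(n_embd, 4 * n_embd) + linear(4 * n_embd, n_embd)
block = attention + feed_forward + 2 * layer_norm(n_embd)  # ln1 and ln2

total = (
    vocab_size * n_embd             # token embedding table
    + block_size * n_embd           # learned position embedding table
    + n_layer * block               # transformer blocks
    + layer_norm(n_embd)            # final LayerNorm (ln_f)
    + linear(n_embd, vocab_size)    # lm_head (untied)
)
print(f"{total:,}")  # 59,629,120
```

The causal-mask buffer `tril` is registered as a buffer, not a parameter, so it does not count. Once the model from the usage code is loaded, `sum(p.numel() for p in model.parameters())` should return the same total.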

🚀 How to Use

Because this model uses a custom architecture, it must be run with the code below. The code re-creates the model classes exactly as they were defined in the training script and then loads the weights and tokenizer from the Hugging Face Hub.

# Install the required libraries:
# pip install torch sentencepiece huggingface_hub safetensors

import torch
import torch.nn as nn
from torch.nn import functional as F
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download
import sentencepiece as spm

# --- 1. MODEL ARCHITECTURE & CONFIG (EXACTLY AS IN THE TRAINING SCRIPT) ---
class ModelConfig:
    n_layer = 10
    n_embd = 640
    n_head = 10
    block_size = 256
    vocab_size = 8000
    dropout = 0.1

config = ModelConfig()
device = 'cuda' if torch.cuda.is_available() else 'cpu'

class Head(nn.Module):
    def __init__(self, head_size):
        super().__init__()
        self.key = nn.Linear(config.n_embd, head_size, bias=False)
        self.query = nn.Linear(config.n_embd, head_size, bias=False)
        self.value = nn.Linear(config.n_embd, head_size, bias=False)
        self.register_buffer('tril', torch.tril(torch.ones(config.block_size, config.block_size)))
        self.dropout = nn.Dropout(config.dropout)
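    def forward(self, x):
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        # Scaled by C**-0.5 with C = n_embd (not head_size**-0.5) to match the
        # training script; changing it would alter the pretrained model's outputs.
        wei = q @ k.transpose(-2, -1) * (C ** -0.5)
        # Causal mask: each position attends only to itself and earlier positions.
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        wei = F.softmax(wei, dim=-1)
        wei = self.dropout(wei)
        return wei @ v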

class MultiHeadAttention(nn.Module):
    def __init__(self, num_heads, head_size):
        super().__init__()
        self.heads = nn.ModuleList([Head(head_size) for _ in range(num_heads)])
        self.proj = nn.Linear(config.n_embd, config.n_embd)
        self.dropout = nn.Dropout(config.dropout)
    def forward(self, x):
        out = torch.cat([h(x) for h in self.heads], dim=-1)
        return self.dropout(self.proj(out))

class FeedForward(nn.Module):
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.ReLU(), nn.Dropout(config.dropout), nn.Linear(4 * n_embd, n_embd), nn.Dropout(config.dropout))
    def forward(self, x): return self.net(x)

class Block(nn.Module):
    def __init__(self, n_embd, n_head):
        super().__init__()
        head_size = n_embd // n_head
        self.sa = MultiHeadAttention(n_head, head_size)
        self.ffwd = FeedForward(n_embd)
        self.ln1, self.ln2 = nn.LayerNorm(n_embd), nn.LayerNorm(n_embd)
    def forward(self, x):
        x = x + self.sa(self.ln1(x))
        x = x + self.ffwd(self.ln2(x))
        return x

class MyLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.token_embedding_table = nn.Embedding(config.vocab_size, config.n_embd)
        self.position_embedding_table = nn.Embedding(config.block_size, config.n_embd)
        self.blocks = nn.Sequential(*[Block(config.n_embd, n_head=config.n_head) for _ in range(config.n_layer)])
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size)
        self.dropout = nn.Dropout(config.dropout)
    def forward(self, idx, targets=None):
        B, T = idx.shape
        tok_emb = self.token_embedding_table(idx)
        pos_emb = self.position_embedding_table(torch.arange(T, device=idx.device))
        x = self.dropout(tok_emb + pos_emb)
        x = self.blocks(x)
        x = self.ln_f(x)
        logits = self.lm_head(x)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss

# --- 2. LOAD MODEL AND TOKENIZER ---
REPO_ID = "jetbabareal/Sabir-60M"  # Hugging Face Hub repository of this model
model = MyLanguageModel().to(device)
weights_path = hf_hub_download(repo_id=REPO_ID, filename="model.safetensors")
model.load_state_dict(load_file(weights_path))
tokenizer_path = hf_hub_download(repo_id=REPO_ID, filename="tokenizer.model")
tokenizer = spm.SentencePieceProcessor(model_file=tokenizer_path)
model.eval()
print("Model and tokenizer loaded successfully.")

# --- 3. TEXT GENERATION FUNCTION ---
@torch.no_grad()
def generate_text(prompt, max_new_tokens=100, temperature=0.5, top_k=20):
    # Wrap the prompt in the "Kullanıcı:" (User) / "Model:" dialogue format.
    full_prompt = f"Kullanıcı: {prompt}\nModel: "
    idx = torch.tensor([tokenizer.encode(full_prompt)], dtype=torch.long, device=device)
    stop_markers = ("Kullanıcı:", "Model:")
    generated_ids = []

    for _ in range(max_new_tokens):
        # Crop to the last block_size tokens; the model has no positions beyond that.
        idx_cond = idx[:, -config.block_size:]
        logits, _ = model(idx_cond)
        # Keep only the logits for the last position and apply temperature.
        logits = logits[:, -1, :] / temperature
        if top_k is not None:
            # Mask every logit below the k-th largest.
            v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
            logits[logits < v[:, [-1]]] = -float('inf')
        probs = F.softmax(logits, dim=-1)
        idx_next = torch.multinomial(probs, num_samples=1)
        next_id = idx_next.item()

        # Stop at end-of-sequence without adding it to the response.
        if next_id == tokenizer.eos_id():
            break
        generated_ids.append(next_id)

        # Stop once the model starts a new speaker turn.
        if any(m in tokenizer.decode(generated_ids) for m in stop_markers):
            break

        idx = torch.cat((idx, idx_next), dim=1)

    response = tokenizer.decode(generated_ids)
    # A speaker marker can span several tokens, so cut the decoded text at the
    # first marker instead of dropping only the last token.
    for marker in stop_markers:
        response = response.split(marker, 1)[0]
    return response.strip()

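# --- EXAMPLE USAGE ---
# Prompts must be in Turkish. Outputs vary between runs because of sampling.
# "Nasılsın?" = "How are you?"
soru = "Nasılsın?"
cevap = generate_text(soru)
print(f"Soru: {soru}\nCevap: {cevap}")

# "En sevdiğin renk ne?" = "What is your favorite color?"
soru = "En sevdiğin renk ne?"
cevap = generate_text(soru)
print(f"Soru: {soru}\nCevap: {cevap}")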
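
The two decoding steps in `generate_text`, temperature plus top-k sampling and stopping at a speaker marker, can be tried without downloading the model. The pure-Python sketch below is an illustrative re-implementation under those assumptions, not the code used to produce the model's outputs. The function names `sample_top_k` and `truncate_at_markers` and the example logits and text are hypothetical.

```python
import math
import random


def sample_top_k(logits, temperature=0.5, top_k=20, rng=random):
    """Draw one token id from raw logits using temperature and top-k filtering.

    Same steps as generate_text: divide by the temperature (must be > 0),
    drop every logit below the k-th largest (ties survive, as in the torch
    code), softmax over the survivors, and sample one id.
    """
    scaled = [x / temperature for x in logits]
    k = min(top_k, len(scaled))
    threshold = sorted(scaled, reverse=True)[k - 1]
    kept = [i for i, x in enumerate(scaled) if x >= threshold]
    m = max(scaled[i] for i in kept)  # subtract the max for numerical stability
    weights = [math.exp(scaled[i] - m) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


def truncate_at_markers(text, markers=("Kullanıcı:", "Model:")):
    """Cut generated text at the first speaker marker so a new turn is dropped."""
    cut = len(text)
    for marker in markers:
        pos = text.find(marker)
        if pos != -1:
            cut = min(cut, pos)
    return text[:cut].strip()


rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0, 3.0]
draws = {sample_top_k(logits, top_k=2, rng=rng) for _ in range(200)}
print(sorted(draws))  # only ids 0 and 4 (the two largest logits) can be drawn
print(truncate_at_markers("İyiyim, teşekkürler. Kullanıcı: Sen?"))  # İyiyim, teşekkürler.
```

Lower temperatures sharpen the distribution over the surviving ids, and a smaller `top_k` shrinks the candidate set. With a model trained this long on a small corpus, both settings push output toward the most strongly learned patterns.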

🎯 Limitations and Intended Use

Specialization: The model specializes in reproducing the patterns and information in its training data. It is suited to basic chat and text-completion tasks.
Generalization: Its ability to perform complex logical reasoning or to generalize to topics outside its training data is limited.
Use Cases: The model can be used for special-purpose language-model experiments in low-resource environments, for prototyping a basic chatbot, or as a starting point for Turkish NLP research.

📜 Citation

If you use this model in your work, please cite it as follows:

@misc{sabir60m,
    title  = {Sabir-60M: A Turkish Micro Language Model},
    author = {jetbabareal},
    year   = {2025},
    url    = {https://huggingface.co/jetbabareal/Sabir-60M}
}
