Sengil
/

ytu-bert-base-dissonance-tr

@@ -1,171 +1,111 @@
 ---
 library_name: transformers
 tags:
-- Aspect Term Extraction
 - transformers
-- t5
 language:
 - tr
 metrics:
-- micro-f1
 base_model:
-- Turkish-NLP/t5-efficient-base-turkish
-pipeline_tag: text2text-generation
 ---
-# **Sengil/t5-turkish-aspect-term-extractor** 🇹🇷
-A Turkish sequence-to-sequence model based on `Turkish-NLP/t5-efficient-base-turkish`, fine-tuned for **Aspect Term Extraction (ATE)** from customer reviews and sentences.
-Given a Turkish sentence, the model generates a list of **aspect terms** (e.g., *kahve*, *servis*, *fiyatlar*) that reflect the primary discussed entities or features.
----
-## ✨ Example
 ```python
-from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 import torch
-import re
-from collections import Counter
-#LOAD MODEL
-MODEL_ID = "Sengil/t5-turkish-aspect-term-extractor"
-DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
-model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID).to(DEVICE)
-model.eval()
-TURKISH_STOPWORDS = {
-    "ve", "çok", "ama", "bir", "bu", "daha", "gibi", "ile", "için",
-    "de", "da", "ki", "o", "şu", "bu", "sen", "biz", "siz", "onlar"
-}
-def is_valid_aspect(word):
-    word = word.strip().lower()
-    return (
-        len(word) > 1 and
-        word not in TURKISH_STOPWORDS and
-        word.isalpha()
-    )
-def extract_and_rank_aspects(text, max_tokens=64, beams=5):
-    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(DEVICE)
-    with torch.no_grad():
-        outputs = model.generate(
-            input_ids=inputs["input_ids"],
-            attention_mask=inputs["attention_mask"],
-            max_new_tokens=max_tokens,
-            num_beams=beams,
-            num_return_sequences=beams,
-            early_stopping=True
-        )
-    all_predictions = [
-        tokenizer.decode(output, skip_special_tokens=True)
-        for output in outputs
-    ]
-    all_terms = []
-    for pred in all_predictions:
-        candidates = re.split(r"[;,–—\-]|(?:\s*,\s*)", pred)
-        all_terms.extend([w.strip().lower() for w in candidates if is_valid_aspect(w)])
-    ranked = Counter(all_terms).most_common()
-    return ranked
-#INFERENCE
-text = "Artılar: Göl manzarasıyla harika bir atmosfer, Ipoh'un her zaman sıcak olan havası nedeniyle iyi bir klima olan restoran, iyi ve hızlı hizmet sunan garsonlar, temassız ödeme kabul eden e-cüzdan, ücretsiz otopark ama sıcak güneş altında açık, yemeklerin tadı güzel."
-ranked_aspects = extract_and_rank_aspects(text)
-print("Sorted Aspect Terms:")
-for term, score in ranked_aspects:
-    print(f"{term:<15}  skor: {score}")
-````
-**Output:**
 ```
-Sorted Aspect Terms:
-atmosfer         skor: 1
-servis           skor: 1
-restoran         skor: 1
-hizmet           skor: 1
 ```
----
-## 📌 Model Details
-| Detail               | Value                                        |
-| -------------------- | -------------------------------------------- |
-| **Model Type**       | `AutoModelForSeq2SeqLM` (T5-style)           |
-| **Base Model**       | `Turkish-NLP/t5-efficient-base-turkish`      |
-| **Languages**        | `tr` (Turkish)                               |
-| **Fine-tuning Task** | Aspect Term Extraction (sequence generation) |
-| **Framework**        | 🤗 Transformers                              |
-| **License**          | Apache-2.0                                   |
-| **Tokenizer**        | SentencePiece (T5-style)                     |
----
-## 📊 Dataset & Training
-* Total samples: 37,000+ Turkish review sentences
-* Input: Raw sentence (e.g., `"Pilav çok lezzetliydi ama servis yavaştı."`)
-* Target: Comma-separated aspect terms (e.g., `"pilav, servis"`)
-### Training Configuration
-| Setting               | Value              |
-| --------------------- | ------------------ |
-| **Epochs**            | 3                  |
-| **Batch size**        | 8                  |
-| **Max input length**  | 128 tokens         |
-| **Max output length** | 64 tokens          |
-| **Optimizer**         | AdamW              |
-| **Learning rate**     | 3e-5               |
-| **Scheduler**         | Linear             |
-| **Precision**         | FP32               |
-| **Hardware**          | 1× Tesla T4 / P100 |
----
-### 🔍 Evaluation
-The model was evaluated using exact-match micro-F1 score on a held-out test set.
-| Metric          | Score |
-| --------------- | ----: |
-| **Micro-F1**    | 0.84+ |
-| **Exact Match** | \~78% |
----
-## 💡 Use Cases
-* 💬 Opinion mining in Turkish product or service reviews
-* 🧾 Aspect-level sentiment analysis preprocessing
-* 📊 Feature-based review summarization in NLP pipelines
----
-## 📦 Model Card / Citation
-```bibtex
-@misc{Sengil2025T5AspectTR,
-  title   = {Sengil/t5-turkish-aspect-term-extractor: Turkish Aspect Term Extraction with T5},
   author  = {Şengil, Mert},
   year    = {2025},
-  url     = {https://huggingface.co/Sengil/t5-turkish-aspect-term-extractor}
 }
 ```
 ---
-For contributions, improvements, or issue reporting, feel free to open a GitHub/Hugging Face issue or contact **[Mert Şengil](https://www.linkedin.com/in/mertsengil/)**.

 ---
 library_name: transformers
 tags:
+- Dissonant Detection
 - transformers
+- bert
 language:
 - tr
 metrics:
+- accuracy
 base_model:
+- ytu-ce-cosmos/turkish-base-bert-uncased
+pipeline_tag: text-classification
 ---
+# **Sengil/ytu-bert-base-dissonance-tr** 🇹🇷
+A Turkish BERT-based model fine-tuned for three-way sentiment classification on single-sentence discourse.
+This model categorizes input sentences into one of the following classes:
+**Dissonance:** The sentence contains conflicting or contradictory sentiments
+&nbsp;&nbsp;&nbsp;&nbsp;_e.g.,_ "Telefon çok kaliteli ve hızlı bitiyor şarjı" ("The phone is very high quality and its battery drains quickly")
+**Consonance:** The sentence expresses harmonizing or mutually reinforcing sentiments
+&nbsp;&nbsp;&nbsp;&nbsp;_e.g.,_ "Yemeklerde çok güzel manzarada mükemmel" ("The food is very good and the view is perfect")
+**Neither:** The sentence is neutral or does not clearly reflect either dissonance or consonance
+&nbsp;&nbsp;&nbsp;&nbsp;_e.g.,_ "Bu gün hava çok güzel" ("The weather is very nice today")
+The model was trained on 37,368 Turkish samples, with separate validation and test sets of 4,671 samples each.
+On the test set it reaches 97.5% accuracy and 97.5% macro-F1, indicating that it separates the three classes reliably on this data.
+|**Model Details**     |                                                       |
+| -------------------- | ----------------------------------------------------- |
+| **Developed by**     | Mert Şengil                                           |
+| **Model type**       | `BertForSequenceClassification`                       |
+| **Base model**       | `ytu-ce-cosmos/turkish-base-bert-uncased`             |
+| **Languages**        | `tr` (Turkish)                                        |
+| **License**          | Apache-2.0                                            |
+| **Fine-tuning task** | 3-class sentiment (dissonance / consonance / neither) |
+## Uses
 ```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
 import torch
+model_id = "Sengil/ytu-bert-base-dissonance-tr"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+text = "onu çok seviyorum ve güvenmiyorum."
+# Turkish-aware lowercasing: map capital "I" to dotless "ı" before lower(), since the base model is uncased.
+text = text.replace("I", "ı").lower()
+inputs = tokenizer(text, return_tensors="pt", truncation=True, padding="max_length", max_length=128)
+with torch.no_grad():
+    logits = model(**inputs).logits
+# Index of the highest logit: this is a class id, not a confidence score.
+label_id = int(logits.argmax())
+id2label = {0: "Dissonance", 1: "Consonance", 2: "Neither"}
+print(f"{{'label': '{id2label[label_id]}', 'label_id': {label_id}}}")
 ```
+Output:
 ```
+{'label': 'Dissonance', 'label_id': 0}
+```
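The usage snippet above reports the predicted class. If a confidence value is also wanted, a common approach is to apply a softmax to the logits and read off the probability of the top class. The sketch below shows that step in plain Python so it runs without downloading the model; the logits are hypothetical placeholders, and the helper names `softmax` and `to_prediction` are illustrative, not part of the model card.

```python
import math

# Label mapping copied from the usage snippet in this card.
ID2LABEL = {0: "Dissonance", 1: "Consonance", 2: "Neither"}


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits):
    """Return the top label and its softmax probability for one example."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[label_id], "score": probs[label_id]}


# Hypothetical logits for illustration only (not real model output).
example_logits = [3.2, -1.1, -0.7]
prediction = to_prediction(example_logits)
print(prediction["label"], round(prediction["score"], 3))
```

With real model output, `torch.softmax(logits, dim=-1)` performs the same computation directly on the tensor.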
+|**Training Details**    |                                                |
+| ---------------------- | ---------------------------------------------- |
+| **Training samples**   | 37,368                                         |
+| **Validation samples** | 4,671                                          |
+| **Test samples**       | 4,671                                          |
+| **Epochs**             | 4                                              |
+| **Batch size**         | 32 (train) / 16 (eval)                         |
+| **Optimizer**          | `AdamW` (lr = 2 × 10⁻⁵, weight\_decay = 0.005) |
+| **Scheduler**          | Linear with 10% warm-up                        |
+| **Precision**          | FP32                                           |
+| **Hardware**           | 1× GPU P100                                    |
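To make the scheduler row concrete, the sketch below derives the step counts implied by the table and evaluates a linear warm-up followed by linear decay. The training script is not shown in this card, so this assumes one optimizer step per batch, no gradient accumulation, and a final partial batch that is kept; `linear_lr` is a hypothetical helper written for illustration.

```python
import math

# Values taken from the Training Details table.
TRAIN_SAMPLES = 37_368
BATCH_SIZE = 32
EPOCHS = 4
WARMUP_RATIO = 0.10
PEAK_LR = 2e-5

# Assumption: one optimizer step per batch, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(total_steps * WARMUP_RATIO)


def linear_lr(step):
    """Learning rate at a given step: linear warm-up, then linear decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = total_steps - step
    return PEAK_LR * max(0.0, remaining / max(1, total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, linear_lr(step))
```

In 🤗 Transformers, `get_linear_schedule_with_warmup` produces a schedule of this shape when given the warm-up and total step counts.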
+### Training Loss Progression
+| Epoch | Train Loss |   Val Loss |
+| ----: | ---------: | ---------: |
+|     1 |     0.2661 |     0.0912 |
+|     2 |     0.0784 |     0.0812 |
+|     3 |     0.0520 |     0.0859 |
+|     4 | **0.0419** | **0.0859** |
+## Evaluation
+| Metric              |      Value |
+| ------------------- | ---------: |
+| **Accuracy (test)** | **0.9750** |
+| **Macro-F1 (test)** | **0.9749** |
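For readers unfamiliar with the two metrics, here is a minimal pure-Python sketch of how accuracy and macro-F1 are defined for a three-class problem. The card does not say which tooling produced the reported numbers, so this is only an illustration; the label lists are toy data, not the model's predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels for illustration (0 = Dissonance, 1 = Consonance, 2 = Neither).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred, labels=[0, 1, 2]))
```

Because macro-F1 weights each class equally, a near-identical accuracy and macro-F1, as reported here, suggests that no single class is dragging performance down.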
+|**Environmental Impact** |                      |
+| ----------------------- | -------------------- |
+| **Hardware**            | 1× A100-40 GB        |
+| **Training time**       | ≈ 4 × 7 min ≈ 0.47 h |
+## Citation
+```bibtex
+@misc{Sengil2025DisConBERT,
+  title   = {Sengil/ytu-bert-base-dissonance-tr: A Three-way Dissonance/Consonance Classifier},
   author  = {Şengil, Mert},
   year    = {2025},
+  url     = {https://huggingface.co/Sengil/ytu-bert-base-dissonance-tr}
 }
 ```
 ---
+I would like to thank YTU for their open-source contributions that supported the development of this model.
+For issues or questions, please open an issue on the Hub repo or contact **[Mert Şengil](https://www.linkedin.com/in/mertsengil/)**.