SciBERT Fine-tuned for arXiv Paper Classification
This model is a fine-tuned version of allenai/scibert_scivocab_uncased for classifying scientific papers into arXiv categories.
Model Description
- Base Model: SciBERT (Scientific BERT)
- Task: Multi-class Text Classification
- Training Data: arXiv scientific papers
- Number of Classes: 20 arXiv categories
Intended Use
This model classifies scientific paper abstracts into their primary arXiv subject categories.
Usage
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Replace YOUR_USERNAME with the account that hosts the model.
model_id = "YOUR_USERNAME/scibert-arxiv-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "Your scientific paper abstract here..."
# Inputs longer than 512 tokens are truncated.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
# Convert logits to class probabilities, then take the most likely class index.
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()
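The snippet above yields a class index only. A minimal sketch of turning raw logits into ranked, human-readable predictions could look like the following. The helper name `top_k_labels`, the example logits, and the example label map are all hypothetical and chosen for illustration; they are not taken from this model.

```python
import math

def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Fall back to the stringified index when a label name is missing.
    return [(id2label.get(i, str(i)), probs[i]) for i in ranked[:k]]

# Illustrative values only: three made-up logits and a made-up label map.
example_logits = [2.0, 0.5, -1.0]
example_labels = {0: "cs.LG", 1: "stat.ML", 2: "math.OC"}
for label, prob in top_k_labels(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the loaded model, the same helper could be called as `top_k_labels(outputs.logits[0].tolist(), model.config.id2label)`. This assumes the checkpoint's config stores arXiv category names in `id2label`; if it was saved without them, the labels will likely appear as generic placeholders such as `LABEL_0`.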
Training Details
- Fine-tuned on a dataset of arXiv papers
- Optimized for text classification in the scientific domain
Limitations
- Best suited for scientific/academic papers
- Performance may vary on non-scientific text