SciBERT Fine-tuned for arXiv Paper Classification

This model is a fine-tuned version of allenai/scibert_scivocab_uncased for classifying scientific papers into arXiv categories.

Model Description

  • Base Model: SciBERT (Scientific BERT)
  • Task: Multi-class Text Classification
  • Training Data: arXiv scientific papers
  • Number of Classes: 20 arXiv categories

Intended Use

This model classifies scientific paper abstracts into their primary arXiv subject categories.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Replace YOUR_USERNAME with the account that hosts the model.
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/scibert-arxiv-classifier")
model = AutoModelForSequenceClassification.from_pretrained("YOUR_USERNAME/scibert-arxiv-classifier")

text = "Your scientific paper abstract here..."
# Inputs longer than 512 tokens are truncated.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
# Convert logits to class probabilities, then take the most probable class index.
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()
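
The snippet above yields only an integer class index. As a minimal sketch of turning the logits into readable results, the helper below ranks the classes and returns the top few with their probabilities. It uses only the standard library and takes a plain list of logits. The function name top_k_categories, the four category labels, and the logit values are hypothetical and chosen for illustration; the real model has 20 classes, and it is an assumption that this checkpoint's model.config.id2label holds arXiv category names rather than generic placeholders such as LABEL_0.

```python
import math

def top_k_categories(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical 4-class example (illustrative labels and logits only).
example_id2label = {0: "cs.LG", 1: "cs.CL", 2: "math.ST", 3: "physics.optics"}
example_logits = [2.0, 0.5, -1.0, 0.1]
for label, prob in top_k_categories(example_logits, example_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the loaded model, the same helper could be called as top_k_categories(outputs.logits[0].tolist(), model.config.id2label), assuming the label mapping was saved with the checkpoint.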

Training Details

  • Fine-tuned on arXiv paper dataset
  • Optimized for scientific domain text classification

Limitations

  • Best suited for scientific/academic papers
  • Performance may vary on non-scientific text