# Vietnamese Sentiment Analysis
A Vietnamese sentiment analysis system built on transformer models, covering training, testing, an interactive demo, and a web interface, with built-in memory management.
## Features

- **Transformer-based model**: fine-tuned Vietnamese sentiment analysis using ViSoBERT
- **Interactive web interface**: real-time sentiment analysis via Gradio with memory optimization
- **Comprehensive testing**: model evaluation with confusion matrix and classification metrics
- **Memory efficient**: built-in memory management, batch-size limits, and quantization support
- **Easy to use**: simple command-line interface and web UI
- **Performance monitoring**: real-time memory-usage tracking and optimization
## Project Structure

```
SentimentAnalysis/
├── README.md                        # This file
├── requirements.txt                 # Python dependencies
├── .gitignore                       # Git ignore rules
│
├── py/                              # Core Python modules
│   ├── __init__.py                  # Package initialization
│   ├── fine_tune_sentiment.py       # Core fine-tuning utilities
│   ├── test_model.py                # Model testing and evaluation
│   ├── demo.py                      # Demo functionality
│   └── gradio_app.py                # Web interface (memory-optimized)
│
├── main.py                          # Main entry point (all commands)
├── train.py                         # Training script
├── test.py                          # Testing script
├── demo.py                          # Interactive demo
├── web.py                           # Web interface launcher
│
├── vietnamese_sentiment_finetuned/  # Trained model (auto-generated)
├── confusion_matrix.png             # Evaluation visualization (auto-generated)
├── training_history.png             # Training progress (auto-generated)
├── pdf/                             # Documentation folder
├── venv/                            # Virtual environment
├── .git/                            # Git repository
└── .claude/                         # Claude configuration
```
## Installation

1. Clone and set up the environment:

```bash
cd SentimentAnalysis
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```
## Usage

### Quick Start Options

Option 1: Use the individual scripts

```bash
# Train the model
python train.py

# Test the model
python test.py

# Run interactive demo
python demo.py

# Launch web interface
python web.py
```

Option 2: Use the main entry point

```bash
# Train with custom settings
python main.py train --batch-size 32 --epochs 5

# Test the model
python main.py test --model-path ./vietnamese_sentiment_finetuned

# Run interactive demo
python main.py demo

# Launch web interface with memory options
python main.py web --quantize --max-batch-size 20 --port 8080
```
### 1. Training the Model

```bash
# Basic training
python train.py

# Custom batch size and epochs
python train.py 32 5

# Using the main script
python main.py train --batch-size 32 --epochs 5 --learning-rate 1e-5
```

### 2. Testing the Model

```bash
# Basic testing
python test.py

# Test with a custom model path
python test.py /path/to/custom/model

# Using the main script
python main.py test --model-path ./vietnamese_sentiment_finetuned
```

### 3. Interactive Demo

```bash
# Run the demo
python demo.py

# Using the main script
python main.py demo
```

### 4. Web Interface

```bash
# Standard usage (memory-efficient defaults)
python web.py

# High memory efficiency (quantization + small batches)
python web.py --quantize --max-batch-size 5 --max-memory 2048

# Large batch processing
python web.py --max-batch-size 20 --max-memory 8192

# Custom server configuration
python web.py --port 8080 --host 0.0.0.0 --quantize

# Using the main script
python main.py web --quantize --max-batch-size 20 --port 8080
```
## Web Interface Features

The Gradio web interface provides:

### Single Text Analysis

- Real-time sentiment prediction
- Confidence scores with visual charts
- Memory usage monitoring
- Example texts for quick testing

### Batch Analysis

- Process multiple texts at once
- Memory-efficient batch processing
- Automatic batch size limits (see the chunking sketch below)
- Batch summary with sentiment distribution
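The batch limit can be pictured as simple chunking; a minimal sketch where the helper name and default value are illustrative assumptions, not the project's actual code:

```python
# Hypothetical helper: split incoming texts into chunks no larger than the
# configured limit so each forward pass stays within a bounded memory budget.
def chunked(texts, max_batch_size=10):
    for i in range(0, len(texts), max_batch_size):
        yield texts[i:i + max_batch_size]

# Usage: each chunk is processed independently.
for chunk in chunked(["text 1", "text 2", "text 3"], max_batch_size=2):
    print(chunk)  # ['text 1', 'text 2'], then ['text 3']
```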
### Memory Management

- **Automatic cleanup**: memory is freed after each prediction
- **Batch limits**: configurable maximum number of texts per batch
- **Memory monitoring**: real-time memory-usage tracking
- **GPU optimization**: CUDA cache clearing when available
- **Quantization**: optional model quantization for CPU (~4x memory reduction); see the sketch below
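For reference, dynamic quantization in PyTorch looks roughly like this. This is a sketch assuming the fine-tuned model lives at `./vietnamese_sentiment_finetuned`, not necessarily the app's exact code path:

```python
# Sketch: dynamic int8 quantization of the Linear layers for CPU inference.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "./vietnamese_sentiment_finetuned"
)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```

Dynamic quantization stores weights as int8 and dequantizes on the fly, which is where the roughly 4x reduction in weight memory comes from.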
### Model Information

- Detailed model specifications
- Performance metrics
- Memory management settings
- Usage tips and troubleshooting
## Command Line Options

### Individual Scripts

```bash
python train.py [batch_size] [epochs]
python test.py [model_path]
python demo.py
python web.py [--max-batch-size SIZE] [--quantize] [--max-memory MB] [--port PORT] [--host HOST]
```

### Main Entry Point (main.py)

```bash
python main.py train [--batch-size SIZE] [--epochs NUM] [--learning-rate RATE]
python main.py test [--model-path PATH]
python main.py demo
python main.py web [--max-batch-size SIZE] [--quantize] [--max-memory MB] [--port PORT] [--host HOST]
```

Memory management options:

- `--max-batch-size`: maximum batch size for memory efficiency (default: 10)
- `--quantize`: enable model quantization for memory efficiency (CPU only)
- `--max-memory`: maximum memory usage in MB (default: 4096)
- `--port`: port to run the interface on (default: 7862)
- `--host`: host to bind the interface to (default: 127.0.0.1)
## Model Details

- **Base model**: 5CD-AI/Vietnamese-Sentiment-visobert
- **Dataset**: uitnlp/vietnamese_students_feedback
- **Labels**: Negative, Neutral, Positive (see the config sketch below)
- **Language**: Vietnamese
- **Architecture**: transformer-based sequence classification
- **Max sequence length**: 512 tokens
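Rather than hard-coding the label order, you can read it from the config saved with the fine-tuned model; a small sketch assuming the model directory above:

```python
# Read the label mapping stored alongside the fine-tuned model.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("./vietnamese_sentiment_finetuned")
print(config.id2label)  # expected, e.g., {0: "Negative", 1: "Neutral", 2: "Positive"}
```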
## Performance Metrics

- **Accuracy**: 85-90% on the validation set
- **Processing speed**: ~100 ms per text (see the measurement sketch below)
- **Memory usage**: configurable (default 4 GB limit)
- **Batch processing**: up to 20 texts (configurable)
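Per-text latency depends heavily on hardware, so it is worth measuring locally. A minimal sketch using the transformers pipeline API; the sample sentence is illustrative:

```python
# Rough per-text latency measurement for the fine-tuned model.
import time
from transformers import pipeline

clf = pipeline("text-classification", model="./vietnamese_sentiment_finetuned")
texts = ["Môn học rất bổ ích."] * 20  # "The course is very useful." (sample input)
start = time.perf_counter()
clf(texts)
print(f"~{(time.perf_counter() - start) / len(texts) * 1000:.0f} ms per text")
```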
## Memory Management

The system includes comprehensive memory management; a sketch of the core cleanup routine follows the lists below.

### Automatic Features

- Memory cleanup after each prediction
- GPU cache clearing for CUDA
- Garbage collection management
- Memory monitoring before and after operations

### User Controls

- Configurable batch size limits
- Memory limit enforcement
- Manual memory cleanup button
- Real-time memory usage display

### Optimization Options

- Dynamic quantization (CPU only)
- Batch processing optimization
- Memory-efficient inference
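The sketch below illustrates the kind of cleanup described above; the function name and structure are assumptions for illustration, not the project's actual implementation:

```python
# Illustrative cleanup in the spirit of the app's automatic memory management.
import gc

import psutil
import torch

def cleanup_memory():
    gc.collect()                   # force Python garbage collection
    if torch.cuda.is_available():
        torch.cuda.empty_cache()   # release cached CUDA allocator blocks
    rss_mb = psutil.Process().memory_info().rss / 1024 ** 2
    print(f"Resident memory after cleanup: {rss_mb:.0f} MB")
```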
## Troubleshooting

### Memory Issues

- Enable quantization: `python web.py --quantize`
- Reduce the batch size: `python web.py --max-batch-size 5`
- Lower the memory limit: `python web.py --max-memory 2048`
- Use manual cleanup: click the "Memory Cleanup" button in the web interface

### Model Loading Issues

- Ensure the model is trained: `python train.py`
- Check the model directory: `ls -la vietnamese_sentiment_finetuned/`
- Verify dependencies: `pip install -r requirements.txt`
### Performance Optimization

- Use a GPU if available (CUDA)
- Enable quantization for CPU inference
- Monitor memory usage in the web interface
- Adjust the batch size based on available memory
## Requirements

See `requirements.txt` for the complete dependency list:

```
torch>=2.0.0
transformers>=4.21.0
datasets>=2.0.0
gradio>=4.0.0
pandas>=1.5.0
numpy>=1.21.0
scikit-learn>=1.1.0
matplotlib>=3.5.0
seaborn>=0.11.0
psutil>=5.9.0
```
## Example Usage

### Command-Line Demo

```python
from py.demo import SentimentDemo

demo = SentimentDemo()
demo.load_model()
demo.interactive_demo()
```

### Web Interface

1. Train the model: `python train.py`
2. Launch the interface: `python web.py`
3. Open a browser at `http://127.0.0.1:7862`
4. Enter Vietnamese text for analysis

### Batch Processing

```python
from py.gradio_app import SentimentGradioApp

app = SentimentGradioApp(max_batch_size=20)
app.load_model()
texts = ["Tuyệt vời!", "Bình thường", "Rất tệ"]  # "Great!", "Okay", "Very bad"
results, summary = app.batch_predict(texts)
```

### Model Testing

```python
from py.test_model import SentimentTester

tester = SentimentTester(model_path="./vietnamese_sentiment_finetuned")
tester.load_model()
# "The lecturer teaches very well!"
sentiment, confidence = tester.predict_sentiment("Giảng viên dạy rất hay!")
```

### Fine-Tuning

```python
from py.fine_tune_sentiment import SentimentFineTuner

fine_tuner = SentimentFineTuner(
    model_name="5CD-AI/Vietnamese-Sentiment-visobert",
    dataset_name="uitnlp/vietnamese_students_feedback"
)
train_result, eval_results = fine_tuner.run_fine_tuning(
    output_dir="./my_model",
    learning_rate=2e-5,
    batch_size=16,
    num_epochs=3
)
```
## Model Loading Examples

### Loading the Fine-Tuned Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("./vietnamese_sentiment_finetuned")
model = AutoModelForSequenceClassification.from_pretrained("./vietnamese_sentiment_finetuned")
```

### Making Predictions

```python
import torch

def predict_sentiment(text):
    # Tokenize and run a single forward pass without gradients
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()
    sentiment_labels = ["Negative", "Neutral", "Positive"]
    return sentiment_labels[predicted_class], predictions[0][predicted_class].item()

# Example: "The lecturer teaches very well and with dedication."
text = "Giảng viên dạy rất hay và tâm huyết."
sentiment, confidence = predict_sentiment(text)
print(f"Sentiment: {sentiment}, Confidence: {confidence:.3f}")
```
## Dataset Information

The UIT-VSFC corpus contains over 16,000 Vietnamese student feedback sentences with (a loading sketch follows the list):

- **Sentiment classification**: Positive, Neutral, Negative
- **Topic classification**: various educational topics
- **Inter-annotator agreement**: >91% for sentiment, >71% for topics
- **Original F1-score**: ~88% for sentiment (Maximum Entropy baseline)
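To inspect the corpus yourself, it can be loaded with the datasets library; a quick sketch (verify the field names against the loaded splits):

```python
# Load and inspect the UIT-VSFC dataset from the Hugging Face Hub.
from datasets import load_dataset

ds = load_dataset("uitnlp/vietnamese_students_feedback")
print(ds)              # available splits and sizes
print(ds["train"][0])  # a single annotated example
```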
## Hardware Requirements

- **Minimum**: 8 GB RAM, CPU
- **Recommended**: GPU with 8 GB+ VRAM for faster training
- **Storage**: ~2 GB for the model and datasets
## License

This project uses open-source components for educational and research purposes. Please check the individual licenses for:

- 5CD-AI/Vietnamese-Sentiment-visobert
- uitnlp/vietnamese_students_feedback
## Contributing

Feel free to submit issues and enhancement requests!
## Citation

If you use this work or the dataset, please cite:

```bibtex
@InProceedings{8573337,
  author={Nguyen, Kiet Van and Nguyen, Vu Duc and Nguyen, Phu X. V. and Truong, Tham T. H. and Nguyen, Ngan Luu-Thuy},
  booktitle={2018 10th International Conference on Knowledge and Systems Engineering (KSE)},
  title={UIT-VSFC: Vietnamese Students' Feedback Corpus for Sentiment Analysis},
  year={2018},
  pages={19-24},
  doi={10.1109/KSE.2018.8573337}
}
```
**Quick start**: `python train.py && python web.py`

**Alternative**: `python main.py train && python main.py web`