---
license: apache-2.0
language:
- en
pipeline_tag: text-classification
tags:
- text-classification
- deberta-v3
- maritime
- safety
- incident-classification
- maib
- marine-accidents
datasets:
- baker-street/maib-incident-reports-5K
model-index:
- name: MAIB Incident Type Classifier
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: baker-street/maib-incident-reports-5K
      name: MAIB Incident Reports
    metrics:
    - type: accuracy
      value: 0.89
      name: Accuracy
    - type: f1
      value: 0.89
      name: Weighted F1-Score
    - type: f1
      value: 0.70
      name: Macro F1-Score
---
# MAIB Incident Type Classifier
A fine-tuned DeBERTa-v3 model for classifying marine incident types based on accident investigation reports from the Marine Accident Investigation Branch (MAIB).
## Model Description
This model is a fine-tuned version of `microsoft/deberta-v3-base` that classifies marine incidents into 11 categories. It was trained on the MAIB incident reports dataset and reaches 89.0% accuracy and 70.2% macro F1 on the test set.
- **Developed by**: Ilia Munaev
- **Model type**: Text Classification
- **Language(s)**: English
- **License**: Apache 2.0
- **Finetuned from model**: microsoft/deberta-v3-base
## Model Performance
The model achieves the following performance metrics on the test set:
| Metric | Score |
|--------|-------|
| **Accuracy** | 89.0% |
| **Weighted F1-Score** | 89.0% |
| **Macro F1-Score** | 70.2% |
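The gap between the weighted and macro F1 scores comes from how per-class scores are averaged: weighted F1 weights each class by its number of true samples, while macro F1 counts every class equally, so weak results on rare classes pull it down. A minimal pure-Python sketch of both averages follows. The labels are toy data invented for illustration, not the model's actual predictions, and the helper names are hypothetical.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label that appears in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def weighted_f1(y_true, y_pred):
    """Mean of the per-class F1 scores, weighted by class support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)

# Toy data (illustrative only): one frequent class and one rare class.
y_true = ["Collision"] * 8 + ["Hull Failure"] * 2
y_pred = ["Collision"] * 8 + ["Collision", "Hull Failure"]

print(f"macro F1:    {macro_f1(y_true, y_pred):.3f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

In this toy case a single missed sample in the rare class lowers macro F1 much more than weighted F1, which is the same pattern as in the table above.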
## Evaluation Results
Two plots summarize the evaluation:
- **Confusion Matrix**: Shows how predictions are distributed across the true incident types
- **Per-Class F1 Scores**: Shows the F1 score for each incident category
<img src="confusion_matrix.png" alt="Confusion Matrix" width="600">
<img src="per_class_f1.png" alt="Per-Class F1 Scores" width="600">
## Intended Use
### Primary Use Cases
- **Maritime Safety Analysis**: Classify marine incident reports for safety analysis
- **Regulatory Compliance**: Automate incident categorization for regulatory reporting
- **Risk Assessment**: Support risk analysis by categorizing incident types
- **Research**: Academic and industry research on maritime safety patterns
### Out-of-Scope Use Cases
- **Real-time Emergency Response**: Not suitable for emergency situations requiring immediate response
- **Legal Proceedings**: Should not be used as primary evidence in legal cases
- **Non-English Text**: Model is trained only on English incident reports
## Training Data
The model was trained on the `baker-street/maib-incident-reports-5K` dataset, which contains:
- **Total Samples**: 5,768 incident reports
- **Training Set**: 5,191 samples
- **Validation Set**: 288 samples
- **Test Set**: 289 samples
- **Source**: Marine Accident Investigation Branch (MAIB) reports
- **Language**: English
- **Time Period**: Historical MAIB incident reports
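The listed split sizes add up to the stated total and correspond to roughly a 90/5/5 split. A quick arithmetic check, using only the numbers given above:

```python
# Split sizes as listed in this model card.
splits = {"train": 5191, "validation": 288, "test": 289}
total = sum(splits.values())

# The three splits should account for every sample in the dataset.
assert total == 5768

# Share of the full dataset held by each split.
fractions = {name: size / total for name, size in splits.items()}
for name, size in splits.items():
    print(f"{name:<10} {size:>5}  {fractions[name]:.1%}")
```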
## Training Procedure
### Training Hyperparameters
- **Learning Rate**: 2e-5
- **Batch Size**: 32
- **Epochs**: 3
- **Max Length**: 256 tokens
- **Optimizer**: AdamW
- **Scheduler**: Linear with warmup
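To make the scheduler setting concrete, here is a minimal pure-Python sketch of a linear warmup and decay schedule built from the values above. It follows the usual formula for this schedule (the same shape as `get_linear_schedule_with_warmup` in Transformers). The warmup length is not stated in this card, so the 10% ratio below is an assumption, as is the step count, which assumes no gradient accumulation and that the final partial batch is kept.

```python
import math

BASE_LR = 2e-5        # from the hyperparameters above
BATCH_SIZE = 32       # from the hyperparameters above
EPOCHS = 3            # from the hyperparameters above
TRAIN_SAMPLES = 5191  # from the Training Data section
WARMUP_RATIO = 0.1    # ASSUMPTION: the card does not state the warmup length

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # 163
total_steps = steps_per_epoch * EPOCHS                   # 489
warmup_steps = int(total_steps * WARMUP_RATIO)           # 48

def linear_warmup_lr(step):
    """Learning rate at an optimizer step: linear ramp-up, then linear decay to 0."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - warmup_steps)

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>3}: lr = {linear_warmup_lr(step):.2e}")
```

The learning rate starts at zero, peaks at 2e-5 when warmup ends, and returns to zero at the last step.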
### Training Infrastructure
- **Hardware**: CUDA-compatible GPU (Tesla T4)
- **Training Time**: ~16 minutes for 3 epochs
- **Framework**: PyTorch with Transformers library
## Usage
### Using Transformers Pipeline
```python
from transformers import pipeline

# Load the model
classifier = pipeline(
    "text-classification",
    model="your-username/maib-incident-classifier",
)

# Classify an incident
result = classifier("A crew member fell overboard from a motorboat")
print(result)
```
### Using Model Directly
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("your-username/maib-incident-classifier")
model = AutoModelForSequenceClassification.from_pretrained("your-username/maib-incident-classifier")

# Prepare input
text = "Fire broke out in the engine room during routine maintenance"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get predictions (no gradients are needed for inference)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Class labels, in class-ID order
class_labels = [
    "Accident to person(s)",
    "Capsizing / Listing",
    "Collision",
    "Contact",
    "Damage / Loss Of Equipment",
    "Fire / Explosion",
    "Flooding / Foundering",
    "Grounding / Stranding",
    "Hull Failure",
    "Loss Of Control",
    "Non-accidental Event",
]

# Get top prediction; .item() converts the one-element tensor to a plain int
top_prediction = torch.argmax(predictions, dim=-1).item()
print(f"Predicted class: {class_labels[top_prediction]}")
print(f"Confidence: {predictions[0][top_prediction].item():.3f}")
```
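The confidence printed above is the softmax probability of the top class. To inspect runner-up classes as well, the same probabilities can be ranked. A minimal sketch in plain Python follows; the logits are made-up values for illustration, not real model output, and `top_k` is a hypothetical helper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical logits for four of the classes, for illustration only.
labels = ["Collision", "Contact", "Fire / Explosion", "Grounding / Stranding"]
logits = [0.2, -1.0, 3.1, 0.5]

for label, prob in top_k(logits, labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the real model, `outputs.logits[0].tolist()` and the full list of 11 class labels would take the place of the toy values.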
### Using the Command Line
```bash
# Install the package
pip install maib-incident-classifier
# Run inference
maib-inference --model_path your-username/maib-incident-classifier --text "Incident description"
```
## Class Labels
The model classifies incidents into the following 11 categories:
0. **Accident to person(s)** - Injuries or fatalities to crew or passengers
1. **Capsizing / Listing** - Vessel capsizing or severe listing
2. **Collision** - Collision with another vessel or object
3. **Contact** - Contact with fixed or floating objects
4. **Damage / Loss Of Equipment** - Equipment failure or damage
5. **Fire / Explosion** - Fire or explosion incidents
6. **Flooding / Foundering** - Water ingress or vessel sinking
7. **Grounding / Stranding** - Vessel running aground
8. **Hull Failure** - Structural hull damage
9. **Loss Of Control** - Loss of steering or propulsion control
10. **Non-accidental Event** - Events not classified as accidents
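The numbers above are the class IDs that the model outputs. A small sketch of the matching lookup tables, built from the list in this card, follows. Whether the hosted checkpoint stores the same mapping in its config is an assumption, so it is worth comparing against `model.config.id2label` after loading the model.

```python
# Class names in class-ID order, copied from the list above.
CLASS_LABELS = [
    "Accident to person(s)",
    "Capsizing / Listing",
    "Collision",
    "Contact",
    "Damage / Loss Of Equipment",
    "Fire / Explosion",
    "Flooding / Foundering",
    "Grounding / Stranding",
    "Hull Failure",
    "Loss Of Control",
    "Non-accidental Event",
]

# Forward and reverse lookup tables between class IDs and names.
id2label = dict(enumerate(CLASS_LABELS))
label2id = {label: idx for idx, label in id2label.items()}

print(id2label[5])                        # Fire / Explosion
print(label2id["Grounding / Stranding"])  # 7
```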
## Limitations and Bias
### Known Limitations
- **Class Imbalance**: Some incident types (Hull Failure, Non-accidental Event) have very few samples, which likely contributes to the gap between the weighted F1 (89.0%) and macro F1 (70.2%) scores
- **Language**: Model only works with English text
- **Domain Specific**: Trained specifically on MAIB reports, may not generalize to other maritime contexts
- **Temporal Bias**: Based on historical data, may not reflect current incident patterns
### Potential Biases
- **Reporting Bias**: Reflects biases in how incidents are reported to MAIB
- **Geographic Bias**: Primarily UK-focused incident reports
- **Vessel Type Bias**: May be biased toward certain vessel types more commonly reported
## Citation
```bibtex
@software{maib_classifier,
title={MAIB Incident Type Classifier},
author={Ilia Munaev},
year={2024},
url={https://huggingface.co/your-username/maib-incident-classifier}
}
```
## Acknowledgments
- Marine Accident Investigation Branch (MAIB) for providing the dataset
- Microsoft for the DeBERTa-v3 base model
- Hugging Face for the transformers library and platform
- Baker Street for hosting the MAIB incident reports dataset
## Contact
For questions, issues, or contributions:
- **Repository**: [GitHub Repository URL]
- **Issues**: [GitHub Issues URL]
- **Email**: [email protected]
## License
This model is released under the Apache 2.0 License. See the LICENSE file for more details.