---
license: apache-2.0
language:
- en
pipeline_tag: text-classification
tags:
- text-classification
- deberta-v3
- maritime
- safety
- incident-classification
- maib
- marine-accidents
datasets:
- baker-street/maib-incident-reports-5K
model-index:
- name: MAIB Incident Type Classifier
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: baker-street/maib-incident-reports-5K
      name: MAIB Incident Reports
    metrics:
    - type: accuracy
      value: 0.89
      name: Accuracy
    - type: f1
      value: 0.89
      name: Weighted F1-Score
    - type: f1
      value: 0.70
      name: Macro F1-Score
---

# MAIB Incident Type Classifier

A fine-tuned DeBERTa-v3 model for classifying marine incident types, trained on accident investigation reports from the UK Marine Accident Investigation Branch (MAIB).

## Model Description

This model is a fine-tuned version of `microsoft/deberta-v3-base` that classifies marine incidents into 11 categories (listed under Class Labels). It was trained on the `baker-street/maib-incident-reports-5K` dataset and reaches 89.0% accuracy on the held-out test set (see Model Performance).

- **Developed by**: Ilia Munaev
- **Model type**: Text Classification
- **Language(s)**: English
- **License**: Apache 2.0
- **Finetuned from model**: microsoft/deberta-v3-base
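
## Model Performance

The model achieves the following metrics on the held-out test set (289 samples):

| Metric | Score |
|--------|-------|
| **Accuracy** | 89.0% |
| **Weighted F1-Score** | 89.0% |
| **Macro F1-Score** | 70.2% |

Macro F1 averages the per-class F1 scores with equal weight per class, whereas weighted F1 weights each class by its number of test samples. The gap between the two (70.2% vs. 89.0%) therefore indicates that the model performs noticeably worse on the less frequent incident types.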
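
The sketch below computes per-class, macro, and weighted F1 in plain Python on made-up labels (not the model's real predictions), to illustrate how a single missed rare class pulls macro F1 down far more than weighted F1. It follows the standard definitions, which should match scikit-learn's `f1_score` with `average="macro"` and `average="weighted"`; the function names are illustrative only.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive because
        # the label occurs in y_true or y_pred at least once
        scores[label] = 2 * tp / (2 * tp + fp + fn)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by each class's support (its count in y_true)."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Toy imbalanced example: 9 common samples, 1 rare sample that is missed
y_true = ["Collision"] * 9 + ["Hull Failure"]
y_pred = ["Collision"] * 10
print(round(macro_f1(y_true, y_pred), 3))     # 0.474
print(round(weighted_f1(y_true, y_pred), 3))  # 0.853
```

Accuracy on this toy set is 90%, yet macro F1 drops below 0.5 because the rare class scores an F1 of 0.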

## Evaluation Results

The model evaluation includes two plots:

- **Confusion Matrix**: counts of true versus predicted incident types, showing which categories the model confuses with one another
- **Per-Class F1 Scores**: the F1 score for each incident category

<img src="confusion_matrix.png" alt="Confusion Matrix" width="600">
<img src="per_class_f1.png" alt="Per-Class F1 Scores" width="600">
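
A confusion matrix has one row per true class and one column per predicted class, so the diagonal holds correct predictions and off-diagonal cells show specific confusions. The plain-Python sketch below builds one from toy labels and derives per-class recall (diagonal entry divided by row total). It is illustrative only and is not the script that produced the plots; the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, both in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_recall(matrix):
    """Diagonal entry divided by its row total (0.0 for a class with no samples)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


labels = ["Collision", "Contact", "Grounding / Stranding"]
y_true = ["Collision", "Collision", "Contact", "Contact", "Grounding / Stranding"]
y_pred = ["Collision", "Contact", "Contact", "Contact", "Grounding / Stranding"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>24} {row}")
print(per_class_recall(matrix))  # [0.5, 1.0, 1.0]
```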

## Intended Use

### Primary Use Cases

- **Maritime Safety Analysis**: Classify marine incident reports for safety analysis
- **Regulatory Compliance**: Automate incident categorization for regulatory reporting
- **Risk Assessment**: Support risk analysis by categorizing incident types
- **Research**: Academic and industry research on maritime safety patterns

### Out-of-Scope Use Cases

- **Real-time Emergency Response**: Not suitable for emergency situations requiring immediate response
- **Legal Proceedings**: Should not be used as primary evidence in legal cases
- **Non-English Text**: The model was trained only on English incident reports

## Training Data

The model was trained on the `baker-street/maib-incident-reports-5K` dataset, which contains:

- **Total Samples**: 5,768 incident reports
- **Training Set**: 5,191 samples
- **Validation Set**: 288 samples
- **Test Set**: 289 samples
- **Source**: Marine Accident Investigation Branch (MAIB) reports
- **Language**: English
- **Time Period**: Historical MAIB incident reports
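
These sizes correspond to roughly a 90% / 5% / 5% split of the 5,768 reports. The card does not say how the split was made, so the sketch below is only one hypothetical way to obtain partitions of exactly these sizes: a seeded shuffle, a 90% training cut, and the remainder halved between validation and test. The function name and seed are assumptions, not the author's procedure.

```python
import random


def split_indices(n, train_frac=0.9, seed=42):
    """Hypothetical 90/5/5 split: shuffle indices, then cut into train/val/test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed value is an arbitrary assumption
    n_train = int(n * train_frac)
    n_val = (n - n_train) // 2  # remainder halved; test receives any odd sample
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(5768)
print(len(train), len(val), len(test))  # 5191 288 289
```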

## Training Procedure

### Training Hyperparameters

- **Learning Rate**: 2e-5
- **Batch Size**: 32
- **Epochs**: 3
- **Max Length**: 256 tokens
- **Optimizer**: AdamW
- **Scheduler**: Linear with warmup
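
For a sense of scale: with 5,191 training samples and a batch size of 32, one epoch is 163 optimizer steps (5,191 / 32 rounded up), so 3 epochs give 489 steps, assuming a single GPU, no gradient accumulation, and that the final partial batch is kept. The warmup length is not documented, so the plain-Python sketch below assumes a hypothetical 10% warmup. It computes the learning rate for a linear schedule with warmup (a linear ramp from 0 to the peak rate of 2e-5, then linear decay to 0 at the last step), following the same formula as `get_linear_schedule_with_warmup` in Transformers.

```python
import math


def linear_warmup_lr(step, total_steps, warmup_steps, base_lr=2e-5):
    """Learning rate at `step`: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


train_samples, batch_size, epochs = 5191, 32, 3
steps_per_epoch = math.ceil(train_samples / batch_size)  # 163
total_steps = steps_per_epoch * epochs                   # 489
warmup_steps = int(0.1 * total_steps)                    # 48 (assumed 10% warmup)

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{linear_warmup_lr(step, total_steps, warmup_steps):.2e}")
```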

### Training Infrastructure

- **Hardware**: NVIDIA Tesla T4 GPU (CUDA)
- **Training Time**: ~16 minutes for 3 epochs
- **Framework**: PyTorch with the Hugging Face Transformers library

## Usage

### Using the Transformers Pipeline

```python
from transformers import pipeline

# Load the model (replace the repository ID with the model's actual Hub path)
classifier = pipeline(
    "text-classification",
    model="your-username/maib-incident-classifier",
)

# Classify an incident; returns a list of {"label": ..., "score": ...} dicts
result = classifier("A crew member fell overboard from a motorboat")
print(result)
```

### Using the Model Directly

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Replace with the model's actual Hub repository ID
model_id = "your-username/maib-incident-classifier"

# Load tokenizer and model (from_pretrained returns the model in eval mode)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Prepare input; max_length=256 matches the training configuration
text = "Fire broke out in the engine room during routine maintenance"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Get class probabilities
with torch.no_grad():
    outputs = model(**inputs)
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Class labels in label-ID order (0-10); these should match model.config.id2label
class_labels = [
    "Accident to person(s)",
    "Capsizing / Listing",
    "Collision",
    "Contact",
    "Damage / Loss Of Equipment",
    "Fire / Explosion",
    "Flooding / Foundering",
    "Grounding / Stranding",
    "Hull Failure",
    "Loss Of Control",
    "Non-accidental Event",
]

# Top prediction for the single input in the batch (index 0)
top_index = int(probabilities[0].argmax())
print(f"Predicted class: {class_labels[top_index]}")
print(f"Confidence: {probabilities[0, top_index].item():.3f}")
```

### Using the Command Line

```bash
# Install the package
pip install maib-incident-classifier

# Run inference
maib-inference --model_path your-username/maib-incident-classifier --text "Incident description"
```

## Class Labels

The model classifies incidents into the following 11 categories (label IDs 0-10):

0. **Accident to person(s)** - Injuries or fatalities to crew or passengers
1. **Capsizing / Listing** - Vessel capsizing or severe listing
2. **Collision** - Collision with another vessel
3. **Contact** - Contact with fixed or floating objects
4. **Damage / Loss Of Equipment** - Equipment failure or damage
5. **Fire / Explosion** - Fire or explosion incidents
6. **Flooding / Foundering** - Water ingress or vessel sinking
7. **Grounding / Stranding** - Vessel running aground
8. **Hull Failure** - Structural hull damage
9. **Loss Of Control** - Loss of steering or propulsion control
10. **Non-accidental Event** - Events not classified as accidents
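
The numbering above defines the label-ID order. The following minimal sketch assumes the raw logits for one input are available as a plain list of 11 floats (for example via `outputs.logits[0].tolist()` with the PyTorch model). It shows how to turn them into probabilities and the top-k label names. `ID2LABEL` and `top_k_labels` are hypothetical names. If the model's `model.config.id2label` is populated, that mapping should be preferred.

```python
import math

# Label IDs 0-10 in the order documented above (hypothetical constant name)
ID2LABEL = {
    0: "Accident to person(s)",
    1: "Capsizing / Listing",
    2: "Collision",
    3: "Contact",
    4: "Damage / Loss Of Equipment",
    5: "Fire / Explosion",
    6: "Flooding / Foundering",
    7: "Grounding / Stranding",
    8: "Hull Failure",
    9: "Loss Of Control",
    10: "Non-accidental Event",
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(ID2LABEL[i], probs[i]) for i in ranked[:k]]


# Made-up logits in which label 5 ("Fire / Explosion") scores highest
example_logits = [0.1, -1.2, 0.3, 0.0, 1.5, 4.0, 0.2, -0.5, -2.0, 0.4, -1.0]
for label, prob in top_k_labels(example_logits):
    print(f"{label}: {prob:.3f}")
```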

## Limitations and Bias

### Known Limitations

- **Class Imbalance**: Some incident types (Hull Failure, Non-accidental Event) have very few samples
- **Language**: The model supports English text only
- **Domain Specific**: Trained specifically on MAIB reports; it may not generalize to other maritime contexts
- **Temporal Bias**: Trained on historical data, so it may not reflect current incident patterns

### Potential Biases

- **Reporting Bias**: Reflects biases in how incidents are reported to MAIB
- **Geographic Bias**: Primarily UK-focused incident reports
- **Vessel Type Bias**: May be biased toward the vessel types most commonly reported to MAIB

## Citation

```bibtex
@software{maib_classifier,
  title={MAIB Incident Type Classifier},
  author={Ilia Munaev},
  year={2024},
  url={https://huggingface.co/your-username/maib-incident-classifier}
}
```

## Acknowledgments

- Marine Accident Investigation Branch (MAIB) for providing the dataset
- Microsoft for the DeBERTa-v3 base model
- Hugging Face for the transformers library and platform
- Baker Street for hosting the MAIB incident reports dataset

## Contact

For questions, issues, or contributions:

- **Repository**: [GitHub Repository URL]
- **Issues**: [GitHub Issues URL]
- **Email**: [email protected]

## License

This model is released under the Apache 2.0 License. See the LICENSE file for more details.