MAIB Incident Type Classifier

A fine-tuned DeBERTa-v3 model for classifying marine incident types based on accident investigation reports from the Marine Accident Investigation Branch (MAIB).

Model Description

This model is a fine-tuned version of microsoft/deberta-v3-base that classifies marine incident descriptions into 11 categories. It was trained on the MAIB incident reports dataset and reaches 89.0% accuracy and a macro F1-score of 70.2% on the held-out test set.

  • Developed by: Ilia Munaev
  • Model type: Text Classification
  • Language(s): English
  • License: Apache 2.0
  • Finetuned from model: microsoft/deberta-v3-base

Model Performance

The model achieves the following performance metrics on the test set:

Metric             Score
Accuracy           89.0%
Weighted F1-Score  89.0%
Macro F1-Score     70.2%
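
The weighted and macro F1-scores differ because they average the per-class F1 values differently: the macro score weights every class equally, while the weighted score weights each class by its number of true samples. The sketch below uses made-up labels (not MAIB data) and a hypothetical helper, `f1_scores`, to show how one poorly predicted rare class pulls the macro score down while the weighted score stays high.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (per_class_f1, macro_f1, weighted_f1) for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
    # Macro: plain mean over classes. Weighted: mean weighted by support.
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[label] * support[label] for label in labels) / len(y_true)
    return per_class, macro, weighted

# Toy data (illustrative only): 9 common-class reports, 1 rare-class report,
# and a classifier that always predicts the common class.
y_true = ["Collision"] * 9 + ["Hull Failure"]
y_pred = ["Collision"] * 10

per_class, macro, weighted = f1_scores(y_true, y_pred)
print(f"macro={macro:.3f} weighted={weighted:.3f}")  # macro=0.474 weighted=0.853
```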

Evaluation Results

The evaluation is summarized in two plots:

  • Confusion Matrix: Shows classification accuracy across all incident types
  • Per-Class F1 Scores: Displays F1 performance for each incident category
[Figures: Confusion Matrix; Per-Class F1 Scores]

Intended Use

Primary Use Cases

  • Maritime Safety Analysis: Classify marine incident reports for safety analysis
  • Regulatory Compliance: Automate incident categorization for regulatory reporting
  • Risk Assessment: Support risk analysis by categorizing incident types
  • Research: Academic and industry research on maritime safety patterns

Out-of-Scope Use Cases

  • Real-time Emergency Response: Not suitable for emergency situations requiring immediate response
  • Legal Proceedings: Should not be used as primary evidence in legal cases
  • Non-English Text: Model is trained only on English incident reports

Training Data

The model was trained on the baker-street/maib-incident-reports-5K dataset, which contains:

  • Total Samples: 5,768 incident reports
  • Training Set: 5,191 samples
  • Validation Set: 288 samples
  • Test Set: 289 samples
  • Source: Marine Accident Investigation Branch (MAIB) reports
  • Language: English
  • Time Period: Historical MAIB incident reports
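
As a quick arithmetic check, the split sizes listed above add up to the stated total and correspond to roughly a 90/5/5 split. The snippet only uses the counts from this card; it does not download the dataset.

```python
# Split sizes as stated in this model card
splits = {"train": 5191, "validation": 288, "test": 289}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

print(f"total = {total}")
for name, count in splits.items():
    print(f"{name:<10} {count:>5}  {fractions[name]:.1%}")
```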

Training Procedure

Training Hyperparameters

  • Learning Rate: 2e-5
  • Batch Size: 32
  • Epochs: 3
  • Max Length: 256 tokens
  • Optimizer: AdamW
  • Scheduler: Linear with warmup
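
A linear schedule with warmup raises the learning rate linearly from 0 to its peak over the warmup steps, then lowers it linearly back to 0 by the end of training. The sketch below illustrates that shape with the hyperparameters listed above. The warmup ratio (10%) is an assumption, since the card does not state it, and the step count assumes no gradient accumulation and that the last partial batch of each epoch is kept. `linear_warmup_lr` is a hypothetical helper, not the training code used for this model.

```python
import math

TRAIN_SAMPLES = 5191  # training set size from this card
BATCH_SIZE = 32
EPOCHS = 3
PEAK_LR = 2e-5
WARMUP_RATIO = 0.1    # assumption: the card does not state the warmup length

# Assumes one optimizer step per batch (no gradient accumulation)
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(WARMUP_RATIO * total_steps)

def linear_warmup_lr(step):
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate
        return PEAK_LR * step / max(1, warmup_steps)
    # Linear decay from the peak down to 0 at the final step
    remaining = total_steps - step
    return PEAK_LR * max(0.0, remaining / max(1, total_steps - warmup_steps))

print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>3}: lr = {linear_warmup_lr(step):.2e}")
```

Under these assumptions training runs for 489 optimizer steps, so the reported ~16 minutes works out to roughly 2 seconds per step.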

Training Infrastructure

  • Hardware: CUDA-compatible GPU (Tesla T4)
  • Training Time: ~16 minutes for 3 epochs
  • Framework: PyTorch with Transformers library

Usage

Using Transformers Pipeline

from transformers import pipeline

# Load the model
classifier = pipeline("text-classification",
                     model="your-username/maib-incident-classifier")

# Classify an incident
result = classifier("A crew member fell overboard from a motorboat")
print(result)

Using Model Directly

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("your-username/maib-incident-classifier")
model = AutoModelForSequenceClassification.from_pretrained("your-username/maib-incident-classifier")

# Prepare input (truncate to the 256-token limit used during training)
text = "Fire broke out in the engine room during routine maintenance"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get class labels
class_labels = [
    "Accident to person(s)",
    "Capsizing / Listing",
    "Collision",
    "Contact",
    "Damage / Loss Of Equipment",
    "Fire / Explosion",
    "Flooding / Foundering",
    "Grounding / Stranding",
    "Hull Failure",
    "Loss Of Control",
    "Non-accidental Event"
]

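# Get top prediction (convert the index tensor to a plain int so it can
# index the label list and so the confidence formats as a scalar)
top_prediction = torch.argmax(predictions, dim=-1).item()
print(f"Predicted class: {class_labels[top_prediction]}")
print(f"Confidence: {predictions[0][top_prediction]:.3f}")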
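
The softmax step turns the 11 logits into probabilities, and ranking those probabilities gives the most likely incident types rather than only the single top class. The sketch below shows this in plain Python with made-up logits, so it runs without downloading the model. The helper names `softmax` and `top_k` are hypothetical, and the label order is assumed to match the list above.

```python
import math

# Assumed to follow the same index order as the class_labels list above
CLASS_LABELS = [
    "Accident to person(s)",
    "Capsizing / Listing",
    "Collision",
    "Contact",
    "Damage / Loss Of Equipment",
    "Fire / Explosion",
    "Flooding / Foundering",
    "Grounding / Stranding",
    "Hull Failure",
    "Loss Of Control",
    "Non-accidental Event",
]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(CLASS_LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Made-up logits for illustration; real values come from outputs.logits[0].tolist()
example_logits = [0.2, -1.0, 0.1, -0.5, 0.3, 4.0, -0.7, -1.2, -2.0, 0.0, -1.5]

for label, prob in top_k(example_logits):
    print(f"{label}: {prob:.3f}")
```

Showing several ranked candidates can help a reviewer spot borderline reports, for example when two categories receive similar probabilities.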

Using the Command Line

# Install the package
pip install maib-incident-classifier

# Run inference
maib-inference --model_path your-username/maib-incident-classifier --text "Incident description"

Class Labels

The model classifies incidents into the following 11 categories:

  1. Accident to person(s) - Injuries or fatalities to crew or passengers
  2. Capsizing / Listing - Vessel capsizing or severe listing
  3. Collision - Collision with another vessel
  4. Contact - Contact with fixed or floating objects
  5. Damage / Loss Of Equipment - Equipment failure or damage
  6. Fire / Explosion - Fire or explosion incidents
  7. Flooding / Foundering - Water ingress or vessel sinking
  8. Grounding / Stranding - Vessel running aground
  9. Hull Failure - Structural hull damage
  10. Loss Of Control - Loss of steering or propulsion control
  11. Non-accidental Event - Events not classified as accidents

Limitations and Bias

Known Limitations

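  • Class Imbalance: Some incident types (Hull Failure, Non-accidental Event) have very few samples, which likely contributes to the gap between the macro F1-score (70.2%) and the weighted F1-score (89.0%)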
  • Language: Model only works with English text
  • Domain Specific: Trained specifically on MAIB reports, may not generalize to other maritime contexts
  • Temporal Bias: Based on historical data, may not reflect current incident patterns

Potential Biases

  • Reporting Bias: Reflects biases in how incidents are reported to MAIB
  • Geographic Bias: Primarily UK-focused incident reports
  • Vessel Type Bias: May be biased toward certain vessel types more commonly reported

Citation

@software{maib_classifier,
  title={MAIB Incident Type Classifier},
  author={Ilia Munaev},
  year={2024},
  url={https://huggingface.co/your-username/maib-incident-classifier}
}

Acknowledgments

  • Marine Accident Investigation Branch (MAIB) for providing the dataset
  • Microsoft for the DeBERTa-v3 base model
  • Hugging Face for the transformers library and platform
  • Baker Street for hosting the MAIB incident reports dataset

Contact

For questions, issues, or contributions:

  • Repository: [GitHub Repository URL]
  • Issues: [GitHub Issues URL]

License

This model is released under the Apache 2.0 License. See the LICENSE file for more details.
