	---
	license: apache-2.0
	language:
	- en
	pipeline_tag: text-classification
	tags:
	- text-classification
	- deberta-v3
	- maritime
	- safety
	- incident-classification
	- maib
	- marine-accidents
	datasets:
	- baker-street/maib-incident-reports-5K
	model-index:
	- name: MAIB Incident Type Classifier
	  results:
	  - task:
	      type: text-classification
	      name: Text Classification
	    dataset:
	      type: baker-street/maib-incident-reports-5K
	      name: MAIB Incident Reports
	    metrics:
	    - type: accuracy
	      value: 0.89
	      name: Accuracy
	    - type: f1
	      value: 0.89
	      name: Weighted F1-Score
	    - type: f1
	      value: 0.70
	      name: Macro F1-Score
	---

	# MAIB Incident Type Classifier

	A fine-tuned DeBERTa-v3 model for classifying marine incident types based on accident investigation reports from the Marine Accident Investigation Branch (MAIB).

	## Model Description

	This model is a fine-tuned version of `microsoft/deberta-v3-base` that classifies marine incident descriptions into 11 categories. It was trained on the MAIB incident reports dataset and reaches 89.0% accuracy and 89.0% weighted F1 on the held-out test set. The lower macro F1 (70.2%) indicates weaker performance on the less frequent classes.

	- Developed by: Ilia Munaev
	- Model type: Text Classification
	- Language(s): English
	- License: Apache 2.0
	- Finetuned from model: microsoft/deberta-v3-base

	## Model Performance

	The model achieves the following performance metrics on the test set:

	| Metric | Score |
	|--------|-------|
	| Accuracy | 89.0% |
	| Weighted F1-Score | 89.0% |
	| Macro F1-Score | 70.2% |
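
The gap between the weighted and macro scores is what one would expect when a few rare classes score poorly: weighted F1 weights each class by its number of test samples, while macro F1 counts every class equally. A minimal pure-Python sketch of the two averages follows. The toy labels are invented for illustration and are not drawn from the MAIB test set, and the helper names `f1_per_class` and `macro_and_weighted_f1` are hypothetical.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    """Macro F1 averages classes equally; weighted F1 averages by true support."""
    per_class = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    macro = sum(per_class.values()) / len(per_class)
    weighted = sum(per_class[c] * support[c] for c in per_class) / len(y_true)
    return macro, weighted


# Toy example (made up): one frequent class predicted well, one rare class missed.
y_true = ["Collision"] * 9 + ["Hull Failure"]
y_pred = ["Collision"] * 10

macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(f"macro={macro:.3f} weighted={weighted:.3f}")  # macro=0.474 weighted=0.853
```

In this toy case a single missed rare-class sample barely moves the weighted score but halves the macro score, which is the same pattern the table above shows on a smaller scale.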


	## Evaluation Results

	Two plots summarize the evaluation on the test set:

	- Confusion Matrix: Shows how predictions are distributed against the true incident types
	- Per-Class F1 Scores: Displays the F1 score for each incident category

	<img src="confusion_matrix.png" alt="Confusion Matrix" width="600">
	<img src="per_class_f1.png" alt="Per-Class F1 Scores" width="600">

	## Intended Use

	### Primary Use Cases

	- Maritime Safety Analysis: Classify marine incident reports for safety analysis
	- Regulatory Compliance: Automate incident categorization for regulatory reporting
	- Risk Assessment: Support risk analysis by categorizing incident types
	- Research: Academic and industry research on maritime safety patterns

	### Out-of-Scope Use Cases

	- Real-time Emergency Response: Not suitable for emergency situations requiring immediate response
	- Legal Proceedings: Should not be used as primary evidence in legal cases
	- Non-English Text: Model is trained only on English incident reports

	## Training Data

	The model was trained on the `baker-street/maib-incident-reports-5K` dataset, which contains:

	- Total Samples: 5,768 incident reports
	- Training Set: 5,191 samples
	- Validation Set: 288 samples
	- Test Set: 289 samples
	- Source: Marine Accident Investigation Branch (MAIB) reports
	- Language: English
	- Time Period: Historical MAIB incident reports
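
As a quick sanity check, the split sizes above sum to the stated total and correspond to roughly a 90/5/5 split. The sketch below also estimates how much uncertainty a 289-sample test set leaves around the reported 89.0% accuracy. The normal-approximation interval is an assumption made here for illustration, not something reported by the author.

```python
import math

# Split sizes as listed in this model card
splits = {"train": 5191, "validation": 288, "test": 289}
total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

# Rough 95% interval for accuracy on the test split (normal approximation)
reported_accuracy = 0.89
n_test = splits["test"]
standard_error = math.sqrt(reported_accuracy * (1 - reported_accuracy) / n_test)
half_width = 1.96 * standard_error

print(f"total samples: {total}")
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
print(f"95% interval half-width: +/- {half_width:.3f}")
```

Under that approximation the reported accuracy carries an uncertainty of about 3.6 percentage points in either direction, so small differences between model variants on this test set should be read with care.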

	## Training Procedure

	### Training Hyperparameters

	- Learning Rate: 2e-5
	- Batch Size: 32
	- Epochs: 3
	- Max Length: 256 tokens
	- Optimizer: AdamW
	- Scheduler: Linear with warmup
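
To make the schedule concrete, the sketch below derives the number of optimizer steps from the values above and evaluates a linear warmup and decay curve in plain Python. It assumes no gradient accumulation, a single device, and that the final partial batch of each epoch is kept. The warmup fraction is not stated in this card, so `WARMUP_RATIO = 0.1` is a placeholder assumption, and `linear_schedule_with_warmup` is a hypothetical helper rather than the author's training code.

```python
import math

TRAIN_SAMPLES = 5191
BATCH_SIZE = 32
EPOCHS = 3
PEAK_LR = 2e-5
WARMUP_RATIO = 0.1  # assumption: the model card does not state the warmup length


def linear_schedule_with_warmup(step, total_steps, warmup_steps, peak_lr):
    """Learning rate at `step`: linear ramp up to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(total_steps * WARMUP_RATIO)

print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    lr = linear_schedule_with_warmup(step, total_steps, warmup_steps, PEAK_LR)
    print(f"step {step:>3}: lr = {lr:.2e}")
```

With these settings the run is short (a few hundred updates), which is consistent with the training time of a few minutes per epoch reported below.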

	### Training Infrastructure

	- Hardware: CUDA-compatible GPU (Tesla T4)
	- Training Time: ~16 minutes for 3 epochs
	- Framework: PyTorch with Transformers library

	## Usage

	### Using Transformers Pipeline

	```python
	from transformers import pipeline

	# Load the model
	classifier = pipeline(
	    "text-classification",
	    model="your-username/maib-incident-classifier",
	)

	# Classify an incident
	result = classifier("A crew member fell overboard from a motorboat")
	print(result)
	```

	### Using Model Directly

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	# Load tokenizer and model
	tokenizer = AutoTokenizer.from_pretrained("your-username/maib-incident-classifier")
	model = AutoModelForSequenceClassification.from_pretrained("your-username/maib-incident-classifier")

	# Prepare input
	text = "Fire broke out in the engine room during routine maintenance"
	inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)

	# Get predictions (no gradients needed for inference)
	with torch.no_grad():
	    outputs = model(**inputs)
	predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

	# Class labels, listed in label-index order
	class_labels = [
	    "Accident to person(s)",
	    "Capsizing / Listing",
	    "Collision",
	    "Contact",
	    "Damage / Loss Of Equipment",
	    "Fire / Explosion",
	    "Flooding / Foundering",
	    "Grounding / Stranding",
	    "Hull Failure",
	    "Loss Of Control",
	    "Non-accidental Event",
	]

	# Get the top prediction as a plain Python int so it can index the list
	top_prediction = torch.argmax(predictions, dim=-1).item()
	print(f"Predicted class: {class_labels[top_prediction]}")
	print(f"Confidence: {predictions[0, top_prediction].item():.3f}")
	```

	### Using the Command Line

	```bash
	# Install the package
	pip install maib-incident-classifier

	# Run inference
	maib-inference --model_path your-username/maib-incident-classifier --text "Incident description"
	```

	## Class Labels

	The model classifies incidents into the following 11 categories:

	0. Accident to person(s) - Injuries or fatalities to crew or passengers
	1. Capsizing / Listing - Vessel capsizing or severe listing
	2. Collision - Collision with another vessel
	3. Contact - Contact with fixed or floating objects
	4. Damage / Loss Of Equipment - Equipment failure or damage
	5. Fire / Explosion - Fire or explosion incidents
	6. Flooding / Foundering - Water ingress or vessel sinking
	7. Grounding / Stranding - Vessel running aground
	8. Hull Failure - Structural hull damage
	9. Loss Of Control - Loss of steering or propulsion control
	10. Non-accidental Event - Events not classified as accidents
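
When a model configuration does not define an `id2label` mapping, the Transformers pipeline returns generic names such as `LABEL_5` instead of readable class names. Whether the hosted configuration for this model includes the mapping is not stated here, so the sketch below shows one way to translate either form using the index order above. `resolve_label` is a hypothetical helper, not part of any package.

```python
CLASS_LABELS = [
    "Accident to person(s)",
    "Capsizing / Listing",
    "Collision",
    "Contact",
    "Damage / Loss Of Equipment",
    "Fire / Explosion",
    "Flooding / Foundering",
    "Grounding / Stranding",
    "Hull Failure",
    "Loss Of Control",
    "Non-accidental Event",
]


def resolve_label(raw_label):
    """Map a generic 'LABEL_<index>' name to its class name; pass other names through."""
    if raw_label.startswith("LABEL_"):
        index = int(raw_label.split("_", 1)[1])
        return CLASS_LABELS[index]
    return raw_label


# Example with a pipeline-style result dictionary (values invented for illustration)
example_result = {"label": "LABEL_5", "score": 0.97}
print(resolve_label(example_result["label"]))  # Fire / Explosion
```

If the pipeline already returns readable names, the helper leaves them unchanged, so it is safe to apply in both cases.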

	## Limitations and Bias

	### Known Limitations

	- Class Imbalance: Some incident types (Hull Failure, Non-accidental Event) have very few samples
	- Language: Model only works with English text
	- Domain Specific: Trained specifically on MAIB reports, may not generalize to other maritime contexts
	- Temporal Bias: Based on historical data, may not reflect current incident patterns

	### Potential Biases

	- Reporting Bias: Reflects biases in how incidents are reported to MAIB
	- Geographic Bias: Primarily UK-focused incident reports
	- Vessel Type Bias: May be biased toward certain vessel types more commonly reported

	## Citation

	```bibtex
	@software{maib_classifier,
	title={MAIB Incident Type Classifier},
	author={Ilia Munaev},
	year={2024},
	url={https://huggingface.co/your-username/maib-incident-classifier}
	}
	```

	## Acknowledgments

	- Marine Accident Investigation Branch (MAIB) for providing the dataset
	- Microsoft for the DeBERTa-v3 base model
	- Hugging Face for the transformers library and platform
	- Baker Street for hosting the MAIB incident reports dataset

	## Contact

	For questions, issues, or contributions:
	- Repository: [GitHub Repository URL]
	- Issues: [GitHub Issues URL]
	- Email: [email protected]

	## License

	This model is released under the Apache 2.0 License. See the LICENSE file for more details.