flan-t5-small-phishing-email

1. Project Overview

This repository contains a fine-tuned Flan-T5-Small model for detecting phishing and malicious emails. The model analyzes the linguistic patterns, urgency cues, and structural anomalies of the email content and classifies each input as either Legitimate or Phishing.
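A minimal inference sketch follows, under several assumptions. The model page lists this repository as an adapter of google/flan-t5-small, so the sketch loads it with the `peft` library on top of the base model; if the repository actually holds full weights, loading it directly with `AutoModelForSeq2SeqLM.from_pretrained` would be the alternative. The prompt wording in `build_prompt` is hypothetical (the prompt used during fine-tuning is not documented here), and `build_prompt`, `parse_label`, `classify`, and `load_model` are illustrative names, not part of this repository.

```python
def build_prompt(email_text: str) -> str:
    # Hypothetical instruction: the fine-tuning prompt is not documented,
    # so adjust this to match the training format if it is known.
    return f"Classify this email as Legitimate or Phishing:\n\n{email_text.strip()}"


def parse_label(generated: str) -> str:
    # Map the generated text onto the two documented classes.
    text = generated.strip().lower()
    if text.startswith("phishing"):
        return "Phishing"
    if text.startswith("legitimate"):
        return "Legitimate"
    return "Unknown"


def load_model(adapter_id="ambrosehui/flan-t5-small-phishing-email",
               base_id="google/flan-t5-small"):
    # Imported lazily so the helper functions above work without these packages.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForSeq2SeqLM.from_pretrained(base_id)
    # Assumption: the repository is a PEFT adapter for the base model.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer


def classify(email_text, model, tokenizer, max_input_tokens=512):
    # max_input_tokens=512 is an assumed limit; longer inputs are truncated.
    inputs = tokenizer(build_prompt(email_text), return_tensors="pt",
                       truncation=True, max_length=max_input_tokens)
    output_ids = model.generate(**inputs, max_new_tokens=5)
    return parse_label(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Typical use would be `model, tokenizer = load_model()` followed by `classify(email_body, model, tokenizer)`. Returning "Unknown" for unexpected generations keeps malformed outputs from being silently counted as Legitimate.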

2. Model Performance

The model was evaluated on 1,865 simulated corporate and personal emails (726 legitimate and 1,139 phishing). It misclassified 38 of them, as broken down below.

Confusion Matrix

	Predicted: Legitimate	Predicted: Phishing
Actual: Legitimate	702 (True Negative)	24 (False Positive)
Actual: Phishing	14 (False Negative)	1125 (True Positive)

Key Metrics

Accuracy: 97.96%
Precision: 97.91%
Recall (Sensitivity): 98.77%
F1-Score: 98.34%
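The metrics can be recomputed from the confusion matrix. The short sketch below uses only the four counts reported in the table and the standard definitions; the variable names are illustrative.

```python
# Counts taken from the confusion matrix above (Phishing is the positive class).
tn, fp, fn, tp = 702, 24, 14, 1125

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)            # share of flagged emails that are phishing
recall = tp / (tp + fn)               # share of phishing emails that are caught
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)  # share of legitimate emails that are flagged

for name, value in [("Accuracy", accuracy), ("Precision", precision),
                    ("Recall", recall), ("F1-Score", f1),
                    ("False positive rate", false_positive_rate)]:
    print(f"{name}: {value:.2%}")
```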

Analysis

The model achieves a Recall of 98.77%, missing only 14 of 1,139 phishing attempts. It raised 24 False Positives among 726 legitimate emails (a False Positive rate of about 3.3%), so legitimate communication was interrupted infrequently on this test set. These results make the model a reasonable candidate for a first-pass automated mail filter.

3. Disclaimers & Bias Statement

Disclaimer

Important: This model is an AI-based heuristic tool and should not be used as the sole defense against cyber threats. It cannot inspect encrypted attachments or analyze the reputation of external URLs. Use this model in combination with SPF/DKIM/DMARC checks and robust endpoint protection.
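One way to combine the model's verdict with SPF/DKIM/DMARC results is sketched below. It assumes the receiving mail server has already recorded those results in an `Authentication-Results` header (which should only be trusted when it was added by your own server). The function names `auth_results` and `triage`, and the quarantine/review/deliver policy, are hypothetical and only illustrate the layering.

```python
import re
from email import message_from_string


def auth_results(raw_email: str) -> dict:
    """Extract spf/dkim/dmarc verdicts from Authentication-Results headers."""
    msg = message_from_string(raw_email)
    results = {"spf": "none", "dkim": "none", "dmarc": "none"}
    for header in msg.get_all("Authentication-Results", []):
        for method in results:
            match = re.search(rf"\b{method}=(\w+)", header, re.IGNORECASE)
            if match:
                results[method] = match.group(1).lower()
    return results


def triage(model_label: str, auth: dict) -> str:
    """Illustrative policy: the model verdict and sender authentication both count."""
    auth_ok = auth["dmarc"] == "pass" or (
        auth["spf"] == "pass" and auth["dkim"] == "pass"
    )
    if model_label == "Phishing":
        return "quarantine"
    if not auth_ok:
        return "review"   # text looks benign, but the sender is not authenticated
    return "deliver"


raw = (
    "Authentication-Results: mx.example.com; spf=pass smtp.mailfrom=example.org;"
    " dkim=fail header.d=example.org; dmarc=fail\n"
    "From: billing@example.org\n"
    "Subject: Invoice\n"
    "\n"
    "Please see the attached invoice.\n"
)
print(triage("Legitimate", auth_results(raw)))
```

In this arrangement a message is delivered only when both layers agree, so a well-written phishing email from a spoofed sender is still held for review.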

Dataset & Potential Bias

Source Bias: The model may be biased toward "standard" phishing templates (e.g., bank alerts, password resets). It may be less effective against highly personalized Spear Phishing or "Whaling" attacks that lack typical malicious keywords.
Temporal Bias: As phishing tactics evolve (e.g., using QR codes or brand-new social engineering hooks), the model's effectiveness may decrease without regular retraining on updated datasets.
Over-sensitivity to Urgency: The model may incorrectly flag legitimate but urgent business communications (e.g., "Invoice Overdue" or "System Maintenance") due to the high correlation between urgency and phishing.

Technical Limitations

Due to the input length constraints of Flan-T5-Small, extremely long email threads may be truncated. If the malicious "hook" appears at the very end of a long message, it may be missed during inference.
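A common mitigation for truncation is to split a long message into overlapping windows, classify each window, and flag the message if any window is flagged. The sketch below illustrates the idea under simplifying assumptions: whitespace-separated words stand in for tokenizer tokens, `window=512` and `overlap=64` are assumed values, and `classify_chunk` is a hypothetical callable that returns "Phishing" or "Legitimate" for a piece of text. A real deployment would window on the model tokenizer's output instead.

```python
def sliding_windows(tokens, window=512, overlap=64):
    """Split a token list into overlapping windows that cover every token."""
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    step = window - overlap
    windows = []
    for start in range(0, max(len(tokens), 1), step):
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the message
    return windows


def classify_long(text, classify_chunk, window=512, overlap=64):
    """Flag the message as Phishing if any window is classified as Phishing."""
    tokens = text.split()  # stand-in for real tokenization
    labels = [classify_chunk(" ".join(chunk))
              for chunk in sliding_windows(tokens, window, overlap)]
    return "Phishing" if "Phishing" in labels else "Legitimate"


# Toy stand-in classifier for demonstration only; not the actual model.
def toy_classifier(chunk: str) -> str:
    return "Phishing" if "verify your password" in chunk else "Legitimate"


long_email = "hello " * 20 + "verify your password"
print(classify_long(long_email, toy_classifier, window=8, overlap=2))
```

Flagging on any positive window raises recall on long threads at the cost of more false positives, so the aggregation rule is a design choice to tune against the false positive budget.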

Model tree for ambrosehui/flan-t5-small-phishing-email

Base model: google/flan-t5-small
Relation to base model: adapter
