# LLM BERT Model for HIPAA-Sensitive Database Fields Classification

This repository hosts a fine-tuned BERT-base model that classifies database column names as either **PHI HIPAA-sensitive** (e.g., `birthDate`, `ssn`, `address`) or **non-sensitive** (e.g., `color`, `food`, `country`).

Use this model for:

- Automatically auditing database schemas for HIPAA compliance
- Preprocessing before data anonymization
- Enhancing security in healthcare and mHealth applications

---

## 🧠 Model Info

- **Base Model**: `bert-base-uncased`
- **Task**: Binary classification (PHI HIPAA-sensitive vs non-sensitive)
- **Trained On**: Synthetic and real-world column name examples
- **Framework**: Hugging Face Transformers
- **Model URL**: [https://huggingface.co/barek2k2/bert_hipaa_sensitive_db_schema](https://huggingface.co/barek2k2/bert_hipaa_sensitive_db_schema)

---

## 🚀 Usage Example (End-to-End)

### 1. Install Requirements

```bash
pip install torch transformers
```

### 2. Example

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load model and tokenizer
model = BertForSequenceClassification.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
tokenizer = BertTokenizer.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
model.eval()

# Example column names
texts = ["birthDate", "country", "jwtToken", "color"]

# Tokenize input
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=1)

# Display results
for text, pred in zip(texts, predictions):
    label = "Sensitive" if pred.item() == 1 else "Non-sensitive"
    print(f"{text}: {label}")
```

### 3. Output

```text
birthDate: Sensitive
country: Non-sensitive
jwtToken: Sensitive
color: Non-sensitive
```

This model is provided for research and educational purposes only. Always verify compliance before using it in a production environment.
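The usage example above reports only the argmax label. If you also want a confidence score per column name, apply a softmax to the logits before taking the argmax. A minimal sketch (the logit values below are made up for illustration; in practice they come from `model(**inputs).logits`):

```python
import torch
import torch.nn.functional as F

# Illustrative logits for 4 column names, shape [batch, 2 classes].
# These are NOT real model outputs; substitute model(**inputs).logits.
logits = torch.tensor([
    [-2.1, 3.4],   # leans sensitive
    [ 1.8, -1.5],  # leans non-sensitive
    [-0.2, 0.3],   # borderline
    [ 2.5, -2.0],  # leans non-sensitive
])

probs = F.softmax(logits, dim=1)   # per-row probabilities summing to 1
preds = probs.argmax(dim=1)        # same labels as torch.argmax on logits

for name, p, pred in zip(["birthDate", "country", "jwtToken", "color"], probs, preds):
    label = "Sensitive" if pred.item() == 1 else "Non-sensitive"
    print(f"{name}: {label} (confidence {p[pred].item():.2f})")
```

A confidence threshold (e.g., flag anything whose "non-sensitive" probability is below 0.9 for manual review) is often safer than a bare argmax in a compliance setting.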
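For schema auditing you first need the column names themselves. A hypothetical helper (not part of the model repo) that naively pulls column names out of a `CREATE TABLE` statement so they can be passed to the classifier; real-world schemas warrant a proper SQL parser or database introspection instead:

```python
import re

def extract_column_names(create_table_sql: str) -> list[str]:
    """Naively extract column names from a CREATE TABLE statement.

    Assumes one column definition per comma-separated entry and no
    constraint clauses (PRIMARY KEY, FOREIGN KEY, ...). Illustrative only.
    """
    body = re.search(r"\((.*)\)", create_table_sql, re.DOTALL)
    if not body:
        return []
    names = []
    for definition in body.group(1).split(","):
        definition = definition.strip()
        if definition:
            # The first token of each definition is the column name
            names.append(definition.split()[0].strip('`"'))
    return names

ddl = """
CREATE TABLE patients (
    birthDate DATE,
    ssn VARCHAR(11),
    color VARCHAR(20)
)
"""
print(extract_column_names(ddl))  # ['birthDate', 'ssn', 'color']
```

The resulting list can be fed directly to the tokenizer as the `texts` variable in the usage example.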