BERT Model for HIPAA-Sensitive Database Field Classification
This repository hosts a fine-tuned BERT-base model that classifies database column names as either PHI HIPAA-sensitive (e.g., birthDate, ssn, address) or non-sensitive (e.g., color, food, country).
Use this model for:
- Automatically auditing database schemas for HIPAA compliance
- Preprocessing before data anonymization
- Enhancing security in healthcare and mHealth applications
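For schema auditing, the column names first have to be pulled out of the schema itself. A minimal sketch of that preprocessing step (the `extract_column_names` helper is a hypothetical illustration that handles only the common `name TYPE ...` form, not full SQL grammar):

```python
def extract_column_names(create_stmt: str) -> list[str]:
    """Pull column names out of a simple CREATE TABLE statement.

    Hypothetical helper for illustration: assumes the `name TYPE ...`
    form and skips table-level constraints; not a full SQL parser.
    """
    body = create_stmt[create_stmt.index("(") + 1 : create_stmt.rindex(")")]
    columns = []
    for part in body.split(","):
        token = part.strip().split()[0]
        # Skip table-level constraints such as PRIMARY KEY (...)
        if token.upper() in {"PRIMARY", "FOREIGN", "UNIQUE", "CONSTRAINT", "CHECK"}:
            continue
        columns.append(token.strip('`"'))
    return columns

ddl = """CREATE TABLE patients (
    id INT,
    birthDate DATE,
    ssn VARCHAR(11),
    color VARCHAR(20),
    PRIMARY KEY (id)
)"""
print(extract_column_names(ddl))  # ['id', 'birthDate', 'ssn', 'color']
```

The extracted names can then be fed to the classifier in batches, as shown in the usage example below.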
Model Info
- Base Model: bert-base-uncased
- Task: Binary classification (PHI HIPAA-sensitive vs. non-sensitive)
- Trained On: Synthetic and real-world column name examples
- Framework: Hugging Face Transformers
- Model URL: https://huggingface.co/barek2k2/bert_hipaa_sensitive_db_schema
Usage Example (End-to-End)
1. Install Requirements
pip install torch transformers
2. Example
import torch
from transformers import BertTokenizer, BertForSequenceClassification
# Load model and tokenizer
model = BertForSequenceClassification.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
tokenizer = BertTokenizer.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
model.eval()
# Example column names
texts = ["birthDate", "country", "jwtToken", "color"]
# Tokenize input
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
# Predict
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=1)
# Display results
for text, pred in zip(texts, predictions):
    label = "Sensitive" if pred.item() == 1 else "Non-sensitive"
    print(f"{text}: {label}")
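The argmax above yields only hard labels. Applying softmax to the logits also gives a confidence score per prediction, which is useful when flagging borderline column names for manual review. A sketch operating on illustrative logits (these values are made up, not real model output):

```python
import torch

# Illustrative logits as the model might return for four column names
# (example values only, not actual model output)
logits = torch.tensor([[-2.1, 3.0], [1.8, -1.5], [-0.4, 0.9], [2.2, -2.0]])

probs = torch.softmax(logits, dim=1)    # per-class probabilities
preds = torch.argmax(probs, dim=1)      # hard labels (1 = sensitive)
confidences = probs.max(dim=1).values   # probability of the chosen label

for p, c in zip(preds, confidences):
    print(int(p), round(float(c), 3))
```

In practice `outputs.logits` from the snippet above would replace the hand-written tensor.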
3. Output
birthDate: Sensitive
country: Non-sensitive
jwtToken: Sensitive
color: Non-sensitive
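For compliance auditing, the per-column labels are more useful grouped into a summary. A minimal sketch (the `build_audit_report` helper is a hypothetical illustration, not part of the model):

```python
def build_audit_report(columns, labels):
    """Group classified columns into sensitive vs. non-sensitive buckets.

    Hypothetical helper: `labels` are the 0/1 predictions from the
    model, where 1 means HIPAA-sensitive.
    """
    report = {"sensitive": [], "non_sensitive": []}
    for col, label in zip(columns, labels):
        key = "sensitive" if label == 1 else "non_sensitive"
        report[key].append(col)
    return report

# Labels matching the example output above
report = build_audit_report(["birthDate", "country", "jwtToken", "color"], [1, 0, 1, 0])
print(report)
# {'sensitive': ['birthDate', 'jwtToken'], 'non_sensitive': ['country', 'color']}
```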
This model is provided for research and educational purposes only. Always verify HIPAA compliance before using it in production environments.