BERT Model for HIPAA-Sensitive Database Field Classification
This repository hosts a fine-tuned BERT-base model that classifies database column names as either PHI HIPAA-sensitive (e.g., birthDate, ssn, address) or non-sensitive (e.g., color, food, country).
Use this model for:
- Automatically auditing database schemas for HIPAA compliance
- Preprocessing before data anonymization
- Enhancing security in healthcare and mHealth applications
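For schema auditing, the column names first have to be pulled out of the schema itself. A minimal sketch of that preprocessing step (the `extract_column_names` helper is a hypothetical illustration that handles only the common `name TYPE ...` form, not full SQL grammar):

```python
def extract_column_names(create_stmt: str) -> list[str]:
    """Pull column names out of a simple CREATE TABLE statement.

    Hypothetical helper for illustration: assumes the `name TYPE ...`
    form and skips table-level constraints; not a full SQL parser.
    """
    body = create_stmt[create_stmt.index("(") + 1 : create_stmt.rindex(")")]
    columns = []
    for part in body.split(","):
        token = part.strip().split()[0]
        # Skip table-level constraints such as PRIMARY KEY (...)
        if token.upper() in {"PRIMARY", "FOREIGN", "UNIQUE", "CONSTRAINT", "CHECK"}:
            continue
        columns.append(token.strip('`"'))
    return columns

ddl = """CREATE TABLE patients (
    id INT,
    birthDate DATE,
    ssn VARCHAR(11),
    color VARCHAR(20),
    PRIMARY KEY (id)
)"""
print(extract_column_names(ddl))  # ['id', 'birthDate', 'ssn', 'color']
```

The extracted names can then be fed to the classifier in batches, as shown in the usage example below.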
Model Info
- Base Model: bert-base-uncased
- Task: Binary classification (PHI HIPAA-sensitive vs. non-sensitive)
- Trained On: Synthetic and real-world column name examples
- Framework: Hugging Face Transformers
- Model URL: https://huggingface.co/barek2k2/bert_hipaa_sensitive_db_schema
Usage Example (End-to-End)
1. Install Requirements
pip install torch transformers
2. Example
import torch
from transformers import BertTokenizer, BertForSequenceClassification
# Load model and tokenizer
model = BertForSequenceClassification.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
tokenizer = BertTokenizer.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
model.eval()
# Example column names
texts = ["birthDate", "country", "jwtToken", "color"]
# Tokenize input
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
# Predict
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=1)
# Display results
for text, pred in zip(texts, predictions):
    label = "Sensitive" if pred.item() == 1 else "Non-sensitive"
    print(f"{text}: {label}")
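The argmax above yields only hard labels. Applying softmax to the logits also gives a confidence score per prediction, which is useful when flagging borderline column names for manual review. A sketch operating on illustrative logits (these values are made up, not real model output):

```python
import torch

# Illustrative logits as the model might return for four column names
# (example values only, not actual model output)
logits = torch.tensor([[-2.1, 3.0], [1.8, -1.5], [-0.4, 0.9], [2.2, -2.0]])

probs = torch.softmax(logits, dim=1)    # per-class probabilities
preds = torch.argmax(probs, dim=1)      # hard labels (1 = sensitive)
confidences = probs.max(dim=1).values   # probability of the chosen label

for p, c in zip(preds, confidences):
    print(int(p), round(float(c), 3))
```

In practice `outputs.logits` from the snippet above would replace the hand-written tensor.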
3. Output
birthDate: Sensitive
country: Non-sensitive
jwtToken: Sensitive
color: Non-sensitive
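For compliance auditing, the per-column labels are more useful grouped into a summary. A minimal sketch (the `build_audit_report` helper is a hypothetical illustration, not part of the model):

```python
def build_audit_report(columns, labels):
    """Group classified columns into sensitive vs. non-sensitive buckets.

    Hypothetical helper: `labels` are the 0/1 predictions from the
    model, where 1 means HIPAA-sensitive.
    """
    report = {"sensitive": [], "non_sensitive": []}
    for col, label in zip(columns, labels):
        key = "sensitive" if label == 1 else "non_sensitive"
        report[key].append(col)
    return report

# Labels matching the example output above
report = build_audit_report(["birthDate", "country", "jwtToken", "color"], [1, 0, 1, 0])
print(report)
# {'sensitive': ['birthDate', 'jwtToken'], 'non_sensitive': ['country', 'color']}
```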
This model is provided for research and educational purposes only. Always verify HIPAA compliance before using it in production environments.