
BERT Model for HIPAA-Sensitive Database Field Classification

This repository hosts a fine-tuned BERT-base model that classifies database column names as either HIPAA-sensitive PHI (protected health information; e.g., birthDate, ssn, address) or non-sensitive (e.g., color, food, country).

Use this model for:

  • Automatically auditing database schemas for HIPAA compliance
  • Preprocessing before data anonymization
  • Enhancing security in healthcare and mHealth applications
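As an illustration of the schema-auditing use case, the sketch below walks every table in a SQLite database, collects its column names, and reports the ones a classifier flags as sensitive. It is a minimal sketch under stated assumptions, not part of this model's release: SQLite (via Python's built-in `sqlite3`) stands in for whatever database you actually audit, the helper names `list_sqlite_columns`, `audit_schema`, and `make_bert_classifier` are hypothetical, and label 1 is assumed to mean "Sensitive", matching the usage example in this README. The classifier is passed in as a plain function, so the auditing logic can be exercised without downloading the model.

```python
import sqlite3
from typing import Callable, Dict, List

# A classifier maps a batch of column names to one bool per name
# (True = HIPAA-sensitive). Passing it in keeps the audit logic model-agnostic.
Classifier = Callable[[List[str]], List[bool]]


def list_sqlite_columns(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Return {table_name: [column_name, ...]} for every user table."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema: Dict[str, List[str]] = {}
    for table in tables:
        quoted = '"' + table.replace('"', '""') + '"'  # escape the identifier
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
        rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [row[1] for row in rows]
    return schema


def audit_schema(conn: sqlite3.Connection, classify: Classifier) -> Dict[str, List[str]]:
    """Return {table_name: [sensitive_column, ...]}, omitting tables with none flagged."""
    report: Dict[str, List[str]] = {}
    for table, columns in list_sqlite_columns(conn).items():
        flags = classify(columns)
        sensitive = [col for col, flag in zip(columns, flags) if flag]
        if sensitive:
            report[table] = sensitive
    return report


def make_bert_classifier(
    model_id: str = "barek2k2/bert_hipaa_sensitive_db_schema",
) -> Classifier:
    """Wrap the fine-tuned model as a Classifier (fetched from the Hub if not cached)."""
    # Imported lazily so the audit helpers above work without torch installed.
    import torch
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_id)
    model = BertForSequenceClassification.from_pretrained(model_id)
    model.eval()

    def classify(names: List[str]) -> List[bool]:
        inputs = tokenizer(names, return_tensors="pt", padding=True,
                           truncation=True, max_length=128)
        with torch.no_grad():
            preds = torch.argmax(model(**inputs).logits, dim=1)
        # Assumption: label 1 means "Sensitive", as in this README's usage example
        return [p.item() == 1 for p in preds]

    return classify
```

For example, `audit_schema(sqlite3.connect("clinic.db"), make_bert_classifier())` (with `clinic.db` standing in for your own database file) returns a mapping from each table to its flagged columns. Other databases expose the same information through their own catalogs, such as `information_schema.columns` in PostgreSQL and MySQL, so only `list_sqlite_columns` would need replacing.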

🧠 Model Info


🚀 Usage Example (End-to-End)

1. Install Requirements

```bash
pip install torch transformers
```

2. Example

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load model and tokenizer from the Hugging Face Hub
model = BertForSequenceClassification.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
tokenizer = BertTokenizer.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
model.eval()  # inference mode (disables dropout)

# Example column names
texts = ["birthDate", "country", "jwtToken", "color"]

# Tokenize input: pad the batch to a common length and truncate long names
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Predict: pick the higher-scoring class for each column name
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=1)

# Display results (label 1 = Sensitive, label 0 = Non-sensitive)
for text, pred in zip(texts, predictions):
    label = "Sensitive" if pred.item() == 1 else "Non-sensitive"
    print(f"{text}: {label}")
```

3. Output

```
birthDate: Sensitive
country: Non-sensitive
jwtToken: Sensitive
color: Non-sensitive
```
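The example takes a hard `argmax` over the two logits. When auditing, a per-column probability can be more informative, since it lets you flag borderline names for manual review. The sketch below applies a softmax to each logit row and compares P(sensitive) against a threshold. It assumes, as the example does, that index 1 is the "Sensitive" class; `softmax` and `label_with_confidence` are hypothetical helpers, and the threshold value is illustrative rather than a tuned setting. It works on plain Python lists, so you would pass it `outputs.logits.tolist()` from the example (`torch.softmax(outputs.logits, dim=-1)` computes the same probabilities directly on the tensor).

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(
    names: Sequence[str],
    logits: Sequence[Sequence[float]],
    threshold: float = 0.5,
    sensitive_index: int = 1,  # assumption: index 1 = "Sensitive"
) -> List[Tuple[str, bool, float]]:
    """Return (name, is_sensitive, p_sensitive) for each column name."""
    results = []
    for name, row in zip(names, logits):
        p = softmax(row)[sensitive_index]
        results.append((name, p >= threshold, p))
    return results
```

Lowering `threshold` below 0.5 flags more columns as sensitive. For a compliance audit, that trade of more false positives for fewer missed PHI columns is often the safer direction.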

This model is provided for research and educational purposes only. Always verify compliance before using it in production environments.