# LLM BERT Model for HIPAA-Sensitive Database Fields Classification

This repository hosts a fine-tuned BERT-base model that classifies database column names as either **PHI HIPAA-sensitive** (e.g., `birthDate`, `ssn`, `address`) or **non-sensitive** (e.g., `color`, `food`, `country`).

Use this model for:

- Automatically auditing database schemas for HIPAA compliance
- Preprocessing before data anonymization
- Enhancing security in healthcare and mHealth applications

---

## 🧠 Model Info

- **Base Model**: `bert-base-uncased`
- **Task**: Binary classification (PHI HIPAA-sensitive vs non-sensitive)
- **Trained On**: Synthetic and real-world column name examples
- **Framework**: Hugging Face Transformers
- **Model URL**: [https://huggingface.co/barek2k2/bert_hipaa_sensitive_db_schema](https://huggingface.co/barek2k2/bert_hipaa_sensitive_db_schema)

---

## 🚀 Usage Example (End-to-End)

### 1. Install Requirements

```bash
pip install torch transformers
```

### 2. Example

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load model and tokenizer
model = BertForSequenceClassification.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
tokenizer = BertTokenizer.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
model.eval()

# Example column names
texts = ["birthDate", "country", "jwtToken", "color"]

# Tokenize input
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=1)

# Display results
for text, pred in zip(texts, predictions):
    label = "Sensitive" if pred.item() == 1 else "Non-sensitive"
    print(f"{text}: {label}")
```

### 3. Output

```text
birthDate: Sensitive
country: Non-sensitive
jwtToken: Sensitive
color: Non-sensitive
```

This model is provided for research and educational purposes only. Always verify compliance before using it in a production environment.
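The usage example above reports only the argmax label. If you also want a confidence score per column name, apply a softmax to the logits before taking the argmax. A minimal sketch (the logit values below are made up for illustration; in practice they come from `model(**inputs).logits`):

```python
import torch
import torch.nn.functional as F

# Illustrative logits for 4 column names, shape [batch, 2 classes].
# These are NOT real model outputs; substitute model(**inputs).logits.
logits = torch.tensor([
    [-2.1, 3.4],   # leans sensitive
    [ 1.8, -1.5],  # leans non-sensitive
    [-0.2, 0.3],   # borderline
    [ 2.5, -2.0],  # leans non-sensitive
])

probs = F.softmax(logits, dim=1)   # per-row probabilities summing to 1
preds = probs.argmax(dim=1)        # same labels as torch.argmax on logits

for name, p, pred in zip(["birthDate", "country", "jwtToken", "color"], probs, preds):
    label = "Sensitive" if pred.item() == 1 else "Non-sensitive"
    print(f"{name}: {label} (confidence {p[pred].item():.2f})")
```

A confidence threshold (e.g., flag anything whose "non-sensitive" probability is below 0.9 for manual review) is often safer than a bare argmax in a compliance setting.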
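For schema auditing you first need the column names themselves. A hypothetical helper (not part of the model repo) that naively pulls column names out of a `CREATE TABLE` statement so they can be passed to the classifier; real-world schemas warrant a proper SQL parser or database introspection instead:

```python
import re

def extract_column_names(create_table_sql: str) -> list[str]:
    """Naively extract column names from a CREATE TABLE statement.

    Assumes one column definition per comma-separated entry and no
    constraint clauses (PRIMARY KEY, FOREIGN KEY, ...). Illustrative only.
    """
    body = re.search(r"\((.*)\)", create_table_sql, re.DOTALL)
    if not body:
        return []
    names = []
    for definition in body.group(1).split(","):
        definition = definition.strip()
        if definition:
            # The first token of each definition is the column name
            names.append(definition.split()[0].strip('`"'))
    return names

ddl = """
CREATE TABLE patients (
    birthDate DATE,
    ssn VARCHAR(11),
    color VARCHAR(20)
)
"""
print(extract_column_names(ddl))  # ['birthDate', 'ssn', 'color']
```

The resulting list can be fed directly to the tokenizer as the `texts` variable in the usage example.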