README updated

README.md +65 -0

	@@ -0,0 +1,65 @@

+# LLM BERT Model for HIPAA-Sensitive Database Fields Classification
+This repository hosts a fine-tuned BERT-base model that classifies database column names as either **PHI HIPAA-sensitive** (e.g., `birthDate`, `ssn`, `address`) or **non-sensitive** (e.g., `color`, `food`, `country`).
+Use this model for:
+- Automatically auditing database schemas for HIPAA compliance
+- Preprocessing before data anonymization
+- Enhancing security in healthcare and mHealth applications
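The schema-auditing use case listed above could be wired up roughly as follows. This is a minimal sketch, not part of the repository: it assumes a SQLite database, and `list_columns`, `audit_schema`, and `keyword_stub` are hypothetical names introduced here for illustration. The classifier is passed in as a plain callable so that the model's predictions (label `1` meaning sensitive) can be plugged in where the stub is used.

```python
import sqlite3
from typing import Callable, Dict, List

# Hypothetical interface: takes column names, returns one bool per name
# (True = flagged as PHI/HIPAA-sensitive).
Classifier = Callable[[List[str]], List[bool]]


def list_columns(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Return {table_name: [column_name, ...]} for every user table."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    schema: Dict[str, List[str]] = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        quoted = '"' + table.replace('"', '""') + '"'
        rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [row[1] for row in rows]
    return schema


def audit_schema(conn: sqlite3.Connection, classify: Classifier) -> Dict[str, List[str]]:
    """Return {table_name: [flagged columns]}, omitting tables with no flags."""
    flagged: Dict[str, List[str]] = {}
    for table, columns in list_columns(conn).items():
        if not columns:
            continue
        verdicts = classify(columns)
        hits = [name for name, is_sensitive in zip(columns, verdicts) if is_sensitive]
        if hits:
            flagged[table] = hits
    return flagged


def keyword_stub(names: List[str]) -> List[bool]:
    """Stand-in classifier for demonstration only; this is NOT the model."""
    sensitive = {"birthdate", "ssn", "address"}
    return [name.lower() in sensitive for name in names]


# Tiny in-memory schema to exercise the audit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, birthDate TEXT, ssn TEXT, color TEXT)")
conn.execute("CREATE TABLE menu (food TEXT, country TEXT)")

report = audit_schema(conn, keyword_stub)
print(report)  # {'patients': ['birthDate', 'ssn']}
```

To use the real model, replace `keyword_stub` with a function that tokenizes the names, runs the model, and returns `True` wherever the predicted label is `1`.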
+---
+## 🧠 Model Info
+- **Base Model**: `bert-base-uncased`
+- **Task**: Binary classification (PHI HIPAA Sensitive vs Non-sensitive)
+- **Trained On**: Synthetic and real-world column name examples
+- **Framework**: Hugging Face Transformers
+- **Model URL**: [https://huggingface.co/barek2k2/bert_hipaa_sensitive_db_schema](https://huggingface.co/barek2k2/bert_hipaa_sensitive_db_schema)
+---
+## 🚀 Usage Example (End-to-End)
+### 1. Install Requirements
+```bash
+pip install torch transformers
+```
+### 2. Example
+```python
+import torch
+from transformers import BertTokenizer, BertForSequenceClassification
+# Load model and tokenizer
+model = BertForSequenceClassification.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
+tokenizer = BertTokenizer.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
+model.eval()
+# Example column names
+texts = ["birthDate", "country", "jwtToken", "color"]
+# Tokenize input
+inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
+# Predict
+with torch.no_grad():
+    outputs = model(**inputs)
+    predictions = torch.argmax(outputs.logits, dim=1)
+# Display results
+for text, pred in zip(texts, predictions):
+    label = "Sensitive" if pred.item() == 1 else "Non-sensitive"
+    print(f"{text}: {label}")
+```
+### 3. Output
+```text
+birthDate: Sensitive
+country: Non-sensitive
+jwtToken: Sensitive
+color: Non-sensitive
+```
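The example above reports only the arg-max label. When the predictions feed a compliance review, it may help to also surface a confidence score and route uncertain columns to a human. The sketch below does this in plain Python on raw logit pairs (for instance, the result of `outputs.logits.tolist()`). The function names and the `0.8` review threshold are assumptions made for illustration, not values from this repository.

```python
import math
from typing import List, Tuple

LABELS = ["Non-sensitive", "Sensitive"]  # index 1 = Sensitive, as in the example


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(
    logits: List[float], review_below: float = 0.8  # hypothetical threshold
) -> Tuple[str, float, bool]:
    """Return (label, confidence, needs_review) for one column's logits."""
    probs = softmax(logits)
    # First index of the maximum, which mirrors an arg-max over the logits.
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    return LABELS[best], confidence, confidence < review_below


# Made-up logits for illustration; in practice pass outputs.logits.tolist().
example_logits = [[-2.0, 3.0], [0.1, 0.3]]
results = [label_with_confidence(row) for row in example_logits]
for label, confidence, needs_review in results:
    print(f"{label} ({confidence:.2f}), needs review: {needs_review}")
```

A softmax probability is not a calibrated guarantee, so the threshold should be tuned on labeled column names before it is relied on.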
+This model is provided for research and educational purposes only. Always verify compliance before using it in production environments.