barek2k2 committed on
Commit
0dc2584
·
1 Parent(s): 1284eab

README updated

  1. README.md +65 -0
README.md ADDED
@@ -0,0 +1,65 @@
+ # LLM BERT Model for HIPAA-Sensitive Database Fields Classification
+
+ This repository hosts a fine-tuned BERT-base model that classifies database column names as either **PHI HIPAA-sensitive** (e.g., `birthDate`, `ssn`, `address`) or **non-sensitive** (e.g., `color`, `food`, `country`).
+
+ Use this model for:
+ - Automatically auditing database schemas for HIPAA compliance
+ - Preprocessing before data anonymization
+ - Enhancing security in healthcare and mHealth applications
+
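As a rough illustration of the schema-auditing use case, the sketch below walks a mapping of table names to column names and collects the columns a classifier flags. The `audit_schema` helper, the `Classifier` callback type, and the keyword-based `stub_classifier` are hypothetical names introduced here and are not part of this repository. In practice the callback would wrap the model call from the usage example, returning 1 for sensitive and 0 for non-sensitive.

```python
from typing import Callable, Dict, List

# Hypothetical callback type: takes column names, returns one label per name
# (1 = sensitive, 0 = non-sensitive, matching the usage example in this README).
Classifier = Callable[[List[str]], List[int]]


def audit_schema(schema: Dict[str, List[str]], classify: Classifier) -> Dict[str, List[str]]:
    """Return, per table, the columns that `classify` labels as sensitive."""
    flagged: Dict[str, List[str]] = {}
    for table, columns in schema.items():
        if not columns:
            continue  # nothing to classify for an empty table
        labels = classify(columns)
        sensitive = [col for col, label in zip(columns, labels) if label == 1]
        if sensitive:
            flagged[table] = sensitive
    return flagged


def stub_classifier(columns: List[str]) -> List[int]:
    """Offline stand-in for the real model (an assumption, not the model's logic)."""
    keywords = ("birth", "ssn", "address")
    return [int(any(k in col.lower() for k in keywords)) for col in columns]


# Made-up schema for illustration only.
schema = {
    "patients": ["id", "birthDate", "ssn", "favoriteColor"],
    "menu": ["food", "country"],
}
report = audit_schema(schema, stub_classifier)
print(report)
```

Passing the classifier in as a callback keeps the audit loop independent of how predictions are produced, so the stub can be swapped for the real model without touching `audit_schema`.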
+ ---
+
+ ## 🧠 Model Info
+
+ - **Base Model**: `bert-base-uncased`
+ - **Task**: Binary classification (PHI HIPAA Sensitive vs Non-sensitive)
+ - **Trained On**: Synthetic and real-world column name examples
+ - **Framework**: Hugging Face Transformers
+ - **Model URL**: [https://huggingface.co/barek2k2/bert_hipaa_sensitive_db_schema](https://huggingface.co/barek2k2/bert_hipaa_sensitive_db_schema)
+
+ ---
+
+ ## 🚀 Usage Example (End-to-End)
+
+ ### 1. Install Requirements
+ ```bash
+ pip install torch transformers
+ ```
+
+ ### 2. Example
+ ```python
+ import torch
+ from transformers import BertTokenizer, BertForSequenceClassification
+
+ # Load model and tokenizer
+ model = BertForSequenceClassification.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
+ tokenizer = BertTokenizer.from_pretrained("barek2k2/bert_hipaa_sensitive_db_schema")
+ model.eval()
+
+ # Example column names
+ texts = ["birthDate", "country", "jwtToken", "color"]
+
+ # Tokenize input
+ inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
+
+ # Predict
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.argmax(outputs.logits, dim=1)
+
+ # Display results
+ for text, pred in zip(texts, predictions):
+     label = "Sensitive" if pred.item() == 1 else "Non-sensitive"
+     print(f"{text}: {label}")
+
+ ```
+
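The `argmax` step in the example keeps only the winning class. If a confidence score is also wanted, the logits can be passed through a softmax first. The sketch below shows that arithmetic in plain Python on made-up logit values. The `softmax` and `label_with_confidence` helpers and the 0.5 threshold are assumptions for illustration, not part of this repository. With PyTorch, the same probabilities would come from `torch.softmax(outputs.logits, dim=1)`.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits: List[float], threshold: float = 0.5) -> Tuple[str, float]:
    """Map a [non-sensitive, sensitive] logit pair to a label and the sensitive-class probability."""
    p_sensitive = softmax(logits)[1]
    # Strict comparison so that a tie falls to class 0, as argmax would.
    label = "Sensitive" if p_sensitive > threshold else "Non-sensitive"
    return label, p_sensitive


# Made-up logits for illustration only; real values come from outputs.logits.
example_logits = {"birthDate": [-2.0, 3.0], "color": [2.5, -1.5]}
for name, row in example_logits.items():
    label, p = label_with_confidence(row)
    print(f"{name}: {label} (p_sensitive={p:.3f})")
```

Lowering the threshold flags more columns for review at the cost of more false positives, which may be a reasonable trade-off for a compliance audit. The right value depends on the data and is not specified by this README.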
+ ### 3. Output
+ ```text
+ birthDate: Sensitive
+ country: Non-sensitive
+ jwtToken: Sensitive
+ color: Non-sensitive
+ ```
+
+ This model is provided for research and educational purposes only. Always verify compliance before using it in production environments.