Upload README.md with huggingface_hub
README.md CHANGED
This is a fine-tuned version of the Llama-3.1 model specifically optimized for DISARM-based election disinformation detection.

- **Framework**: MLX-LM
- **Hardware**: Apple M1 Max (64GB RAM)

## Use Cases

1. **Election Security**: Detect and classify disinformation campaigns targeting electoral processes
2. **Content Moderation**: Identify harmful content that undermines electoral integrity
3. **Research**: Academic research on disinformation patterns and meta-narratives
4. **Policy Analysis**: Support policy development for election security measures

### Target Applications

- Social media monitoring platforms
- Election security organizations
- Fact-checking organizations
- Academic research institutions
- Government agencies
- Civil society organizations

## Training Data

The model was trained on the [DISARM Election Watch Dataset](https://huggingface.co/datasets/ArapCheruiyot/disarm-election-watch-dataset), which contains:

### Data Sources

- **Telegram**: 3,632 examples (60.3%)
- **X/Twitter**: 2,038 examples (33.9%)
- **TikTok**: 248 examples (4.1%)
- **DISARM**: 101 examples (1.7%)

### Task Types

- **DISARM Classification**: 101 examples
- **Content Analysis**: 5,770 examples
- **Narrative Analysis**: 148 examples

### Data Split

- **Training**: 4,815 examples (80%)
- **Validation**: 601 examples (10%)
- **Test**: 603 examples (10%)
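The 80/10/10 split above can be reproduced with a deterministic shuffle. A minimal sketch — the seed value and the `split_dataset` helper are our illustration, not the dataset's published tooling:

```python
import random

def split_dataset(examples, train=0.8, val=0.1, seed=42):
    """Shuffle and split examples into train/validation/test partitions."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n = len(examples)
    n_train = int(n * train)
    n_val = int(n * val)
    return (examples[:n_train],
            examples[n_train:n_train + n_val],
            examples[n_train + n_val:])

# With 6,019 examples this yields the counts reported above.
train_set, val_set, test_set = split_dataset(range(6019))
print(len(train_set), len(val_set), len(test_set))  # 4815 601 603
```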

- **Metal GPU**: Accelerated inference
- **Memory Management**: 16GB wired memory optimization

## Limitations and Biases

### Known Limitations

1. **Language**: Trained primarily on English content
2. **Geographic Focus**: Primarily Nigerian election context
3. **Platform Bias**: Limited to specific social media platforms
4. **Temporal Context**: Training data from specific election periods

### Potential Biases

1. **Cultural Context**: May not generalize to other cultural contexts
2. **Platform-Specific**: May not capture platform-specific nuances
3. **Evolving Tactics**: May not capture new disinformation techniques

### Ethical Considerations

1. **Privacy**: Ensure compliance with data protection regulations
2. **Transparency**: Use responsibly with clear disclosure of AI involvement
3. **Bias Mitigation**: Regular evaluation for unintended biases
4. **Human Oversight**: Always maintain human oversight in critical applications

## Model Files

### Fused Model (Complete)
|

- **Format**: safetensors
- **Files**: Final adapters + training checkpoints

### Checkpoints

- **Frequency**: Every 100 iterations
- **Purpose**: Model evaluation and recovery
- **Format**: safetensors

## Citation

```bibtex
@misc{disarm-ew-llama3-finetuned,
  title={DISARM Election Watch: Fine-tuned Llama-3.1 for Election Disinformation Detection},
  author={ArapCheruiyot},
  year={2024},
  url={https://huggingface.co/ArapCheruiyot/disarm-ew-llama3-finetuned}
}
```

## License

This model is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Acknowledgments

- **DISARM Framework**: For the classification methodology
- **MLX-LM**: For the fine-tuning framework
- **Apple**: For Apple Silicon optimization
- **Hugging Face**: For model hosting and distribution

## Contact

For questions, issues, or collaboration opportunities:
- **Model Repository**: [ArapCheruiyot/disarm-ew-llama3-finetuned](https://huggingface.co/ArapCheruiyot/disarm-ew-llama3-finetuned)
- **Dataset Repository**: [ArapCheruiyot/disarm-election-watch-dataset](https://huggingface.co/datasets/ArapCheruiyot/disarm-election-watch-dataset)

## Version History

- **v1.0.0**: Initial release with 600 training iterations
- **Training Data**: 6,019 examples from multiple platforms
- **Framework**: MLX-LM with Apple Silicon optimization

## Quick Start

### Using with MLX-LM

```python
from mlx_lm import load, generate

# Load the complete fine-tuned model
model, tokenizer = load("models/disarm_ew_llama3_finetuned")

# Example prompt
prompt = """### Instruction:
Classify the following content according to DISARM Framework techniques and meta-narratives:

### Input:
A viral WhatsApp broadcast claims that the BVAS machines have been pre-loaded with votes by INEC in favour of the incumbent party.

### Response:"""

# Generate and print the classification
response = generate(model, tokenizer, prompt, max_tokens=256, temp=0.1)
print(response)
```

### Using with Ollama

```bash
# Create Ollama model
ollama create disarm-ew-llama3-finetuned -f Modelfile

# Run the model
ollama run disarm-ew-llama3-finetuned "Your prompt here"
```

### Example Usage

```bash
ollama run disarm-ew-llama3-finetuned "### Instruction:
Classify the following content according to DISARM Framework techniques and meta-narratives:

### Input:
A viral WhatsApp broadcast claims that the BVAS machines have been pre-loaded with votes by INEC in favour of the incumbent party.

### Response:"
```

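The Instruction/Input/Response template used above can be assembled programmatically when classifying many items. A minimal sketch — the `build_prompt` helper is our illustration, not part of the model's API:

```python
INSTRUCTION = ("Classify the following content according to "
               "DISARM Framework techniques and meta-narratives:")

def build_prompt(content, instruction=INSTRUCTION):
    """Assemble an Instruction/Input/Response prompt matching the examples above."""
    return (f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{content}\n\n"
            f"### Response:")

prompt = build_prompt("A viral WhatsApp broadcast claims that the BVAS "
                      "machines have been pre-loaded with votes by INEC "
                      "in favour of the incumbent party.")
print(prompt.startswith("### Instruction:"))  # True
```

The same string can be passed to `ollama run` or to the MLX-LM `generate` call shown earlier.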
### Expected Output

```json
{
  "meta_narrative": "Compromised Election Technology",
  "primary_disarm_technique": "T0022.001: Develop False Conspiracy Theory Narratives about Electoral Manipulation and Compromise",
  "confidence_score": 0.98,
  "key_indicators": ["BVAS", "pre-loaded", "INEC"],
  "platform": "WhatsApp",
  "language": "en",
  "category": "Undermining Electoral Institutions"
}
```
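Because the model returns its classification as JSON embedded in generated text, responses should be parsed and validated before downstream use. A small sketch — the field names follow the example output above; the `parse_classification` helper is our illustration:

```python
import json

EXPECTED_KEYS = {"meta_narrative", "primary_disarm_technique",
                 "confidence_score", "key_indicators",
                 "platform", "language", "category"}

def parse_classification(raw):
    """Extract and validate the JSON object from a model response."""
    # Tolerate stray text around the JSON object.
    start, end = raw.find("{"), raw.rfind("}") + 1
    result = json.loads(raw[start:end])
    missing = EXPECTED_KEYS - result.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not 0.0 <= result["confidence_score"] <= 1.0:
        raise ValueError("confidence_score out of range")
    return result

sample = '''{"meta_narrative": "Compromised Election Technology",
"primary_disarm_technique": "T0022.001: Develop False Conspiracy Theory Narratives about Electoral Manipulation and Compromise",
"confidence_score": 0.98,
"key_indicators": ["BVAS", "pre-loaded", "INEC"],
"platform": "WhatsApp", "language": "en",
"category": "Undermining Electoral Institutions"}'''

print(parse_classification(sample)["confidence_score"])  # 0.98
```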
## Local Deployment Benefits

- **Privacy**: Run locally without sending data to external servers
- **Speed**: Fast inference on local hardware
- **Customization**: Modify prompts and parameters as needed
- **Offline**: Works without an internet connection