---
language: en
tags:
- spacy
- ner
- cybersecurity
- token-classification
license: mit
model-index:
- name: ner-cybersecurity
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    metrics:
    - type: f1
      value: 0.9831
      name: F1
    - type: precision
      value: 0.9792
      name: Precision
    - type: recall
      value: 0.9869
      name: Recall
---

# Cybersecurity NER Model

Named entity recognition (NER) model for cybersecurity text, covering 13 entity types. Overall F1: 98.31%.

## Model Details

- **Version:** v5
- **Framework:** spaCy 3.8+
- **Training date:** 2025-12-29
- **Examples:** 1,922 (stratified 80/10/10 split)
- **Backbone:** Domain-adapted RoBERTa

## Entities (13)

| Entity | F1 | Examples |
|--------|-----|----------|
| CERTIFICATION | 100% | CISSP, OSCP, CEH |
| SECURITY_ROLE | 100% | CISO, SOC Analyst |
| SECURITY_TOOL | 100% | Splunk, Metasploit |
| ATTACK_TECHNIQUE | 100% | SQL Injection, XSS |
| FRAMEWORK | 100% | NIST CSF, ISO 27001 |
| THREAT_TYPE | 100% | APT, ransomware |
| AUDIT_TERM | 100% | Compliance, Audit |
| CVE | 100% | CVE-2021-44228 |
| SECURITY_DOMAIN | 99.10% | Cloud Security |
| TECHNICAL_SKILL | 95.30% | Incident Response |
| REGULATION | 94.44% | GDPR, HIPAA |
| ACRONYM | 88.89% | SIEM, EDR |
| CONTROL_ID | 0% | Handled by regex (see CONTROL_ID Handling) |

## Performance

**Metrics:**

- F1: 98.31%
- Precision: 97.92%
- Recall: 98.69%
- Inference: ~60 ms/doc

**Changes from v4 to v5:**

- Tuned hyperparameters (dropout 0.25, L2 0.02)
- Improved REGULATION F1 (+6.64 pp) and ACRONYM F1 (+22.22 pp)
- Overall F1 up by 0.25 pp

## CONTROL_ID Handling

The model scores 0% F1 on CONTROL_ID because there is too little training data for it (25 examples).

**Solution:** a hybrid approach that uses regex extraction for CONTROL_ID in production. Patterns cover ISO 27001, NIST CSF, CIS Controls, SOC 2, and PCI-DSS. See the service implementation for details.
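
The regex half of the hybrid approach might look roughly like the sketch below. It is not the production service's code: the patterns, the `ControlMatch` type, and the `extract_control_ids` / `merge_with_model` helpers are hypothetical illustrations. The patterns assume common citation styles such as `A.5.1` (ISO 27001 Annex A), `PR.AC-1` (NIST CSF), `CIS 5.1`, `CC6.1` (SOC 2), and `PCI DSS 8.3.1`. Model entities are passed in as plain `(start_char, end_char, label)` tuples so that the sketch runs without loading the pipeline.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; the production regexes are not
# published in this card, and real-world control-ID formats vary.
CONTROL_ID_PATTERNS = {
    # ISO/IEC 27001 Annex A controls, e.g. "A.5.1" or "A.12.6.1"
    "ISO 27001": re.compile(r"\bA\.\d{1,2}(?:\.\d{1,2}){1,2}\b"),
    # NIST CSF subcategories, e.g. "PR.AC-1" (v1.1) or "GV.OC-01" (v2.0)
    "NIST CSF": re.compile(r"\b(?:GV|ID|PR|DE|RS|RC)\.[A-Z]{2}-\d{1,2}\b"),
    # CIS Controls and safeguards, e.g. "CIS 5.1" or "CIS Control 16"
    "CIS Controls": re.compile(r"\bCIS(?: Controls?)? \d{1,2}(?:\.\d{1,2})?\b"),
    # SOC 2 Common Criteria, e.g. "CC6.1"
    "SOC 2": re.compile(r"\bCC\d(?:\.\d{1,2})?\b"),
    # PCI DSS requirements, e.g. "PCI DSS 8.3.1" or "PCI-DSS Req. 3.4"
    "PCI-DSS": re.compile(
        r"\bPCI[- ]DSS(?: Req(?:uirement|\.)?)? \d{1,2}(?:\.\d{1,2}){0,2}\b"
    ),
}


@dataclass(frozen=True)
class ControlMatch:
    text: str
    framework: str
    start: int  # character offset, inclusive
    end: int  # character offset, exclusive


def extract_control_ids(text: str) -> list[ControlMatch]:
    """Run every pattern over the text and return matches ordered by position."""
    matches = [
        ControlMatch(m.group(0), framework, m.start(), m.end())
        for framework, pattern in CONTROL_ID_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(matches, key=lambda c: (c.start, c.end))


def merge_with_model(
    model_ents: list[tuple[int, int, str]], controls: list[ControlMatch]
) -> list[tuple[int, int, str]]:
    """Combine model entities with regex CONTROL_ID matches.

    Model CONTROL_ID predictions are discarded (0% F1), and regex matches
    win whenever they overlap a model entity.
    """
    merged = [(c.start, c.end, "CONTROL_ID") for c in controls]
    for start, end, label in model_ents:
        if label == "CONTROL_ID":
            continue
        # Half-open interval overlap test against every regex match.
        if not any(start < c.end and c.start < end for c in controls):
            merged.append((start, end, label))
    return sorted(merged)
```

With the real pipeline, `model_ents` could be built as `[(e.start_char, e.end_char, e.label_) for e in doc.ents]`, and the merged spans written back through `doc.char_span(start, end, label=label)`. That method returns `None` when the offsets do not fall on token boundaries, so check for it before assigning to `doc.ents`. Letting the regex win on overlap is one possible policy. It suits this model because its own CONTROL_ID predictions are unreliable.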
## Usage

```bash
# Quote the version specifiers so the shell does not treat ">" as a redirect
pip install "spacy>=3.7.0" "spacy-transformers>=1.3.0" huggingface_hub
```

```python
import spacy
from huggingface_hub import snapshot_download

# spacy.load() takes a package name or a local path, not a Hub repo ID (assumes pipeline files sit at the repo root)
nlp = spacy.load(snapshot_download("pki/ner-cybersecurity"))

doc = nlp("CISO with CISSP, expert in Splunk and ISO 27001")
for ent in doc.ents:
    print(f"{ent.text:20} | {ent.label_}")
```

**Output:**

```
CISO                 | SECURITY_ROLE
CISSP                | CERTIFICATION
Splunk               | SECURITY_TOOL
ISO 27001            | FRAMEWORK
```

## Use Cases

- Job/CV matching
- Threat intelligence extraction
- Compliance documentation parsing
- Security policy analysis

## Training Config

Key hyperparameters:

```ini
max_steps = 8000
dropout = 0.25
L2 = 0.02
learning_rate = 0.00003
hidden_width = 128
maxout_pieces = 3
batch_size = 128
```

## Limitations

- ACRONYM: lower F1 (88.89%) due to limited training examples (46)
- CONTROL_ID: requires the hybrid regex approach
- Domain-specific: optimized for cybersecurity text
- Ambiguity on some context-dependent terms

## License

MIT

## Version History

| Version | Date | F1 | Examples | Notes |
|---------|------|-----|----------|-------|
| v5 | 2025-12-29 | 98.31% | 1922 | Hyperparameter tuning |
| v4 | 2025-12-29 | 98.06% | 1922 | Stratified split, domain RoBERTa |
| v3 | 2025-01 | 69.4% | 1000 | spaCy 3.x migration |
| v2 | 2024-12 | 99.5%* | 1805 | spaCy 2.x |

\* Training-set accuracy, not a held-out F1 score.

## Contact

Issues: report them in the model repository.