language:
- es
pipeline_tag: token-classification
tags:
- biomedical
- ner
- clinical
- ehr
- cardiology
- nlp
- edsnlp
- caradioccc
license: apache-2.0
library_name: edsnlp
base_model:
- PlanTL-GOB-ES/bsc-bio-ehr-es
model-index:
- name: Aremaki/eds-ner-cardioccc
  results:
  - task:
      type: token-classification
    dataset:
      name: CardioCCC
      type: public
    metrics:
    - type: precision
      name: Token Scores / MEDICATION / Precision
      value: 0.93
    - type: recall
      name: Token Scores / MEDICATION / Recall
      value: 0.94
    - type: f1
      name: Token Scores / MEDICATION / F1
      value: 0.93
    - type: precision
      name: Token Scores / PROCEDURE / Precision
      value: 0.85
    - type: recall
      name: Token Scores / PROCEDURE / Recall
      value: 0.85
    - type: f1
      name: Token Scores / PROCEDURE / F1
      value: 0.85
    - type: precision
      name: Token Scores / DISEASE / Precision
      value: 0.82
    - type: recall
      name: Token Scores / DISEASE / Recall
      value: 0.82
    - type: f1
      name: Token Scores / DISEASE / F1
      value: 0.82
    - type: precision
      name: Token Scores / SYMPTOM / Precision
      value: 0.8
    - type: recall
      name: Token Scores / SYMPTOM / Recall
      value: 0.81
    - type: f1
      name: Token Scores / SYMPTOM / F1
      value: 0.8
EDS-NER-CARDIOCCC
This repository contains the final NER model trained on the CardioCCC dataset. CardioCCC is a collection of cardiology clinical case reports used for domain adaptation. Clinical case reports are a textual genre in medicine that describe a patient's medical history, symptoms, diagnosis, and treatment in detail.
The model implementation is based on EDS-NLP, a library developed by the data science team of the Greater Paris University Hospitals (AP-HP) for clinical natural language processing.
The model detects the following entity types:
| Label | Description |
|---|---|
| MEDICATION | Names of drugs or chemical substances used in treatment, e.g., Metformina. |
| PROCEDURE | Medical or surgical procedures performed on a patient, e.g., biopsia, radiografía. |
| DISEASE | Diagnosed diseases or medical conditions, e.g., diabetes mellitus, hipertensión. |
| SYMPTOM | Reported signs or symptoms experienced by a patient, e.g., fiebre, dolor de cabeza. |
Quickstart
Install the latest version of edsnlp:
pip install "edsnlp[ml]" -U
Load the model:
import edsnlp
nlp = edsnlp.load("Aremaki/eds-ner-cardioccc", auto_update=True)
doc = nlp(
    "La paciente con diabetes mellitus "
    "presentó fiebre y se le realizó "
    "una radiografía antes de tomar metformina. "
)
for ent in doc.ents:
    print(ent, ent.label_)
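Once the pipeline has run, it is often convenient to group the detected entities by label. The following is a minimal sketch, assuming each entity in `doc.ents` exposes `.text` and `.label_` as spaCy-style spans do. `group_by_label` is a hypothetical helper, not part of edsnlp, and the stand-in entities are illustrative only so that the snippet runs without downloading the model.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_label(ents):
    """Map each entity label to the list of entity texts carrying it."""
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in entities for illustration only; these are NOT actual model output.
fake_ents = [
    SimpleNamespace(text="diabetes mellitus", label_="DISEASE"),
    SimpleNamespace(text="fiebre", label_="SYMPTOM"),
    SimpleNamespace(text="radiografía", label_="PROCEDURE"),
    SimpleNamespace(text="metformina", label_="MEDICATION"),
]

for label, texts in group_by_label(fake_ents).items():
    print(label, texts)
```

With a loaded pipeline, the same helper could be called as `group_by_label(doc.ents)`.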
To apply the model on many documents using one or more GPUs, refer to the documentation of edsnlp.
Metrics
| Token Scores | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|
| MEDICATION | 93.0 | 94.0 | 93.0 |
| PROCEDURE | 85.0 | 85.0 | 85.0 |
| DISEASE | 82.0 | 82.0 | 82.0 |
| SYMPTOM | 80.0 | 81.0 | 80.0 |
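As a sanity check, F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The sketch below does this with the values from the table. Because the reported precision and recall are already rounded, agreement is only expected to the nearest point; the `f1` helper is written here for illustration and is not taken from the evaluation code.

```python
# Precision and recall (in %) copied from the Metrics table.
scores = {
    "MEDICATION": (93.0, 94.0),
    "PROCEDURE": (85.0, 85.0),
    "DISEASE": (82.0, 82.0),
    "SYMPTOM": (80.0, 81.0),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for label, (p, r) in scores.items():
    # Rounded to the nearest point, since the inputs are themselves rounded.
    print(f"{label}: F1 = {f1(p, r):.0f}")
```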
Installation to reproduce
If you'd like to reproduce eds-ner-cardioccc's training or contribute to its development, first clone the repository:
git clone https://github.com/Aremaki/eds_ner_cardioccc.git
cd eds_ner_cardioccc
Acknowledgement
We would like to thank the Life Science team at the Barcelona Supercomputing Center (BSC), who designed the CardioCCC dataset and trained the base model bsc-bio-ehr-es. We would also like to thank the data science team of the Greater Paris University Hospitals (AP-HP), who developed the EDS-NLP library.