Improve language tag

#3
by lbourdois - opened
Files changed (1)
  1. README.md +126 -114
README.md CHANGED
@@ -1,115 +1,127 @@
- ---
- license: apache-2.0
- language:
- - zh
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
- pipeline_tag: feature-extraction
- tags:
- - structuring
- - EHR
- - medical
- - IE
- ---
- # Model Card for GENIE
-
-
- ## Model Details
-
- Model Size: 7B
-
- Max Tokens: 8192
-
- Base model: Qwen 2.5 7B
-
- ### Model Description
-
- GENIE (Generative Note Information Extraction, 中文名:病历精灵) is an end-to-end model designed to structure free text from electronic health records (EHRs). It processes EHRs in a single pass, extracting biomedical named entities along with their assertion statuses, body locations, modifiers, values, units, and intended purposes, outputting this information in a structured JSON format. This streamlined approach simplifies traditional natural language processing workflows by replacing all the analysis components with a single model, making the system easier to maintain while leveraging the advanced analytical capabilities of large language models (LLMs). Comparing with general-purpose LLMs, GENIE does not require prompt engineering or few-shot examples. Additionally, it generates all relevant attributes in one pass, significantly reducing both runtime and operational costs.
- GENIE is co-developed by the groups of Sheng Yu (https://www.stat.tsinghua.edu.cn/teachers/shengyu/), Tianxi Cai (https://dbmi.hms.harvard.edu/people/tianxi-cai), and Isaac Kohane (https://dbmi.hms.harvard.edu/people/isaac-kohane).
-
-
- ## Usage
-
- ```python
- from vllm import LLM, SamplingParams
-
- PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:\n"
- sampling_params = SamplingParams(temperature=temperature, max_tokens=max_new_token)
- EHR = ['xxxxx1','xxxxx2']
- texts = [PROMPT_TEMPLATE.format(query=k) for k in EHR]
- output = model.generate(texts, sampling_params)
- ```
-
- # An example
-
- Input:
- ```python
- EHR = ['慢性乙型肝炎病史10余年,曾有肝功能异常,中医治疗后好转;1年余前查HBsAg转阴,但肝脏病理提示病毒性肝炎伴肝纤维化(G1S3-4)']
- ```
-
- Output:
- ```python
- res = [
- { "术语": "慢性乙型肝炎",
- "语义类型": "疾病、综合征、病理功能",
- "叙述状态": "存在",
- "身体部位": "无",
- "数值": "NA",
- "单位": "NA",
- "修饰词": "无" },
- { "术语": "肝功能异常",
- "语义类型": "症状、体征、临床所见",
- "叙述状态": "存在",
- "身体部位": "无",
- "数值": "NA",
- "单位": "NA",
- "修饰词": "" },
- { "术语": "HBsAg",
- "语义类型": "化学物质、药物",
- "叙述状态": "不存在",
- "身体部位": "NA",
- "数值": "无",
- "单位": "NA",
- "修饰词": "NA" },
- { "术语": "肝脏病理",
- "语义类型": "诊断操作",
- "叙述状态": "存在",
- "身体部位": "",
- "数值": "无",
- "单位": "NA",
- "修饰词": "NA" },
- { "术语": "病毒性肝炎",
- "语义类型": "疾病、综合征、病理功能",
- "叙述状态": "存在",
- "身体部位": "",
- "数值": "NA",
- "单位": "NA",
- "修饰词": "" },
- { "术语": "肝纤维化",
- "语义类型": "疾病、综合征、病理功能",
- "叙述状态": "存在",
- "身体部位": "",
- "数值": "NA",
- "单位": "NA",
- "修饰词": "" },
- ]
- ```
-
-
-
- ## Citation
-
- If you find our paper or models helpful, please consider cite:
-
- **BibTeX:**
- ```
- @misc{ying2025geniegenerativenoteinformation,
- title={GENIE: Generative Note Information Extraction model for structuring EHR data},
- author={Huaiyuan Ying and Hongyi Yuan and Jinsen Lu and Zitian Qu and Yang Zhao and Zhengyun Zhao and Isaac Kohane and Tianxi Cai and Sheng Yu},
- year={2025},
- eprint={2501.18435},
- archivePrefix={arXiv},
- primaryClass={cs.CL},
- url={https://arxiv.org/abs/2501.18435},
- }
  ```
 
+ ---
+ license: apache-2.0
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
+ pipeline_tag: feature-extraction
+ tags:
+ - structuring
+ - EHR
+ - medical
+ - IE
+ ---
+ # Model Card for GENIE
+
+
+ ## Model Details
+
+ Model Size: 7B
+
+ Max Tokens: 8192
+
+ Base model: Qwen 2.5 7B
+
+ ### Model Description
+
+ GENIE (Generative Note Information Extraction, 中文名:病历精灵) is an end-to-end model designed to structure free text from electronic health records (EHRs). It processes EHRs in a single pass, extracting biomedical named entities along with their assertion statuses, body locations, modifiers, values, units, and intended purposes, outputting this information in a structured JSON format. This streamlined approach simplifies traditional natural language processing workflows by replacing all the analysis components with a single model, making the system easier to maintain while leveraging the advanced analytical capabilities of large language models (LLMs). Comparing with general-purpose LLMs, GENIE does not require prompt engineering or few-shot examples. Additionally, it generates all relevant attributes in one pass, significantly reducing both runtime and operational costs.
+ GENIE is co-developed by the groups of Sheng Yu (https://www.stat.tsinghua.edu.cn/teachers/shengyu/), Tianxi Cai (https://dbmi.hms.harvard.edu/people/tianxi-cai), and Isaac Kohane (https://dbmi.hms.harvard.edu/people/isaac-kohane).
+
+
+ ## Usage
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:\n"
+ sampling_params = SamplingParams(temperature=temperature, max_tokens=max_new_token)
+ EHR = ['xxxxx1','xxxxx2']
+ texts = [PROMPT_TEMPLATE.format(query=k) for k in EHR]
+ output = model.generate(texts, sampling_params)
+ ```
+
+ # An example
+
+ Input:
+ ```python
+ EHR = ['慢性乙型肝炎病史10余年,曾有肝功能异常,中医治疗后好转;1年余前查HBsAg转阴,但肝脏病理提示病毒性肝炎伴肝纤维化(G1S3-4)']
+ ```
+
+ Output:
+ ```python
+ res = [
+ { "术语": "慢性乙型肝炎",
+ "语义类型": "疾病、综合征、病理功能",
+ "叙述状态": "存在",
+ "身体部位": "",
+ "数值": "NA",
+ "单位": "NA",
+ "修饰词": "无" },
+ { "术语": "肝功能异常",
+ "语义类型": "症状、体征、临床所见",
+ "叙述状态": "存在",
+ "身体部位": "",
+ "数值": "NA",
+ "单位": "NA",
+ "修饰词": "无" },
+ { "术语": "HBsAg",
+ "语义类型": "化学物质、药物",
+ "叙述状态": "不存在",
+ "身体部位": "NA",
+ "数值": "",
+ "单位": "NA",
+ "修饰词": "NA" },
+ { "术语": "肝脏病理",
+ "语义类型": "诊断操作",
+ "叙述状态": "存在",
+ "身体部位": "",
+ "数值": "",
+ "单位": "NA",
+ "修饰词": "NA" },
+ { "术语": "病毒性肝炎",
+ "语义类型": "疾病、综合征、病理功能",
+ "叙述状态": "存在",
+ "身体部位": "无",
+ "数值": "NA",
+ "单位": "NA",
+ "修饰词": "无" },
+ { "术语": "肝纤维化",
+ "语义类型": "疾病、综合征、病理功能",
+ "叙述状态": "存在",
+ "身体部位": "无",
+ "数值": "NA",
+ "单位": "NA",
+ "修饰词": "无" },
+ ]
+ ```
+
+
+
+ ## Citation
+
+ If you find our paper or models helpful, please consider cite:
+
+ **BibTeX:**
+ ```
+ @misc{ying2025geniegenerativenoteinformation,
+ title={GENIE: Generative Note Information Extraction model for structuring EHR data},
+ author={Huaiyuan Ying and Hongyi Yuan and Jinsen Lu and Zitian Qu and Yang Zhao and Zhengyun Zhao and Isaac Kohane and Tianxi Cai and Sheng Yu},
+ year={2025},
+ eprint={2501.18435},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2501.18435},
+ }
  ```
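
Note on the README's Usage snippet: as written it leaves `model`, `temperature`, and `max_new_token` undefined, and it does not show how to read the results. With vLLM, `model` would normally be an `LLM(...)` instance, and each completion's text is available as `output[i].outputs[0].text`. The sketch below covers only the parts that do not need a GPU: building prompts with the README's template and parsing a completion. The helper names (`build_prompts`, `parse_entities`, `present_terms`) are hypothetical, and the parser assumes the completion is a valid JSON array of objects using the field names from the example output, which the README does not guarantee.

```python
import json

# Prompt format copied from the README's Usage snippet.
PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:\n"


def build_prompts(notes):
    """Wrap each raw EHR note in the README's prompt template."""
    return [PROMPT_TEMPLATE.format(query=note) for note in notes]


def parse_entities(raw_text):
    """Parse one completion into a list of entity dicts.

    Assumption: the completion is a JSON array of objects. Returns an
    empty list when the text is not valid JSON or is not a list.
    """
    try:
        parsed = json.loads(raw_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    return [item for item in parsed if isinstance(item, dict)]


def present_terms(entities):
    """Return terms whose assertion status ("叙述状态") is "存在" (present)."""
    return [
        e["术语"]
        for e in entities
        if "术语" in e and e.get("叙述状态") == "存在"
    ]


# Hand-written sample completion for illustration, not real model output.
sample = (
    '[{"术语": "慢性乙型肝炎", "叙述状态": "存在"},'
    ' {"术语": "HBsAg", "叙述状态": "不存在"}]'
)
prompts = build_prompts(["慢性乙型肝炎病史10余年"])
entities = parse_entities(sample)
print(present_terms(entities))
```

Returning an empty list on malformed output keeps batch processing going when a single completion is truncated at the token limit; a stricter pipeline might log or re-queue such notes instead.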