Update README.md
README.md CHANGED
tags:
- LLM
- tensorRT
- ChatGLM
---

## Model Card for lyraChatGLM

lyraChatGLM is currently the **fastest ChatGLM-6B** available. To the best of our knowledge, it is the **first accelerated version of ChatGLM-6B**.

The inference speed of lyraChatGLM achieves a **10x** speedup over the original version, and we are still working hard to improve performance further.

Among its main features are:

- weights: original ChatGLM-6B weights released by THUDM.
- device: lyraChatGLM is mainly based on FasterTransformer compiled for SM=80 (A100, for example), but runs a lot faster.
- batch_size: compiled with dynamic batch size; max batch_size = 8.

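Because the engine is compiled with a maximum batch size of 8, larger workloads need to be split into batches before being fed to the model. A minimal sketch of such splitting; the `chunked` helper is illustrative only and not part of the lyraChatGLM API:

```python
# Split a list of prompts into batches no larger than the compiled
# max batch_size (8 for lyraChatGLM). Illustrative helper only.
def chunked(items, batch_size=8):
    """Yield successive batches of at most `batch_size` items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

prompts = [f"prompt {i}" for i in range(19)]
batches = list(chunked(prompts))
print([len(b) for b in batches])  # → [8, 8, 3]
```

Each yielded batch can then be passed to the model in one call, keeping every request within the compiled limit.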
## Speed
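The **10x** figure is a throughput comparison, i.e. generated tokens per second. A generic sketch of how such a number can be measured for any generation function; `fake_generate` below is a stand-in, not the lyraChatGLM API:

```python
import time

def tokens_per_second(generate, prompt):
    """Time one call to `generate` and return (tokens, tokens/sec)."""
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return tokens, len(tokens) / elapsed

# Stand-in generator: emits one "token" per input word. It only
# illustrates the metric, not the real model's behavior.
def fake_generate(prompt):
    return prompt.split()

tokens, tps = tokens_per_second(fake_generate, "speed test of five words")
print(len(tokens), tps > 0)
```

Comparing this metric for the original ChatGLM-6B and an accelerated build, on the same prompt and hardware, yields the speedup ratio.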

``` bibtex
@Misc{lyraChatGLM2023,
  author       = {Kangjian Wu and Zhengtao Wang and Bin Wu},
  title        = {lyraChatGLM: Accelerating ChatGLM by 10x+},
  howpublished = {\url{https://huggingface.co/TMElyralab/lyraChatGLM}},
  year         = {2023}
}
```