---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
- code
- medical
- farmer
- doctor
- Mega-Series
- Cyber-Series
- Role-Play
- Self-Rag
- ThinkingBot
- milestone
- mega-series
- SpydazWebAI
base_model: LeroyDyer/Mixtral_AI_CyberTron_Ultra
datasets:
- gretelai/synthetic_text_to_sql
- HuggingFaceTB/cosmopedia
- teknium/OpenHermes-2.5
- Open-Orca/SlimOrca
- Open-Orca/OpenOrca
- cognitivecomputations/dolphin-coder
- databricks/databricks-dolly-15k
- yahma/alpaca-cleaned
- uonlp/CulturaX
- mwitiderrick/SwahiliPlatypus
- swahili
- Rogendo/English-Swahili-Sentence-Pairs
- ise-uiuc/Magicoder-Evol-Instruct-110K
- meta-math/MetaMathQA
metrics:
- accuracy
- bertscore
- bleu
- brier_score
- cer
- character
- charcut_mt
- chrf
- code_eval
model-index:
- name: SpydazWeb_AI_CyberTron_Ultra_7b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 15.56
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 27.75
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 1.36
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.7
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 10.3
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 20.73
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
      name: Open LLM Leaderboard
---

# Uploaded model

- **Developed by:** LeroyDyer
- **License:** apache-2.0
- **Finetuned from model:** LeroyDyer/Mixtral_AI_CyberTron_Ultra

[<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/65d883893a52cd9bcd8ab7cf/tRsCJlHNZo1D02kBTmfy9.jpeg" width="300"/>](https://github.com/spydaz)

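A minimal loading sketch, assuming the checkpoint works through the standard `transformers` causal-LM API (the prompt and generation settings below are illustrative, not recommended defaults):

```python
# Minimal sketch: load the checkpoint through the standard transformers causal-LM API.
# The repo id comes from this card; the generation settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",   # requires accelerate; places the model on available GPUs/CPU
    torch_dtype="auto",
)

prompt = "Explain, step by step, how to normalise a database table."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```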

* The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.2.

* Mistral-7B-v0.2 has the following changes compared to Mistral-7B-v0.1:

    * 32k context window (vs 8k context in v0.1)
    * Rope-theta = 1e6
    * No Sliding-Window Attention

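Those settings can be checked directly from the published config; a small sketch, assuming the standard Mistral config attributes in `transformers` (the expected values are the ones stated in the notes above):

```python
# Sketch: inspect the architectural settings listed above via the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b")

print(config.max_position_embeddings)  # 32768 context window, per the notes above
print(config.rope_theta)               # 1e6, per the notes above
print(config.sliding_window)           # None, i.e. no sliding-window attention
```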

# What does he NOT KNOW! That is the question!

### MOTTO FOR MODEL!

## Models are the same as LoRAs: take them lightly, they are like tablets of knowledge!
Exactly! (Models / LoRAs, is there a difference? Only mega-merges make a true difference; the small merges are just applying an adapter, lol. It's in there somewhere?)

### OK, it's a great MODEL! (My favorite go-to brain now! It will be fine-tuned even more, if I get cloud credits.)



Highly math-trained, as well as fitted on many textbook and lesson datasets, and highly tuned on coding datasets!

This model has absorbed all of its previous generations as well as ALL high-performing and specialist models (Mistral). It has absorbed many foreign-language models and still stays an English model!

Very impressive responses, short and long: it was trained on some binary datasets to return a direct answer, on others to produce step-by-step responses, and on others to hold interactive exchanges with clients for various tasks, such as product design and system design discussion.

Financial information and other financial tasks have been highly tuned as well. In fact, when returning to previously aligned datasets, the model stayed in line and was still able to achieve high tuning!
Hence the process: merge with a model for a specific topic or role, then train for that role and topic on themed data. Previous iterations were heavily tuned for medical, law, or role-play, as the conception was that integrating everything into a single entity might even corrupt them, so the decision was taken to separate concerns.
This enabled strategic merging and tuning!

Concepts: chain of thought, function calling, and Self-RAG! Thoughts and emotive responses have been enhanced where possible with the data given. Even adult fiction has been highly tuned into the model,
but I also think American genre books (sci-fi, fantasy, and romance novels) are required for the great role-play which some expect. :)
I have recently seen a strategy in which prompts can be embedded into the adapter to trigger specific roles.
I have tried to replace prompting such as "you are a helpful AI" with a character theme instead, such as "you are a cyber hacker by day and a businessman by night", i.e. to give the model various internal personas.
After some training I noticed it was also talking to itself (rehearsing), but the tokens marking the thoughts were missing, so the output looked strange until I noticed the bug:
the tokenizer was masking the thought tokens. After removing them from the tokenizer's masking, they were displayed in the output.
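If you want such thought markers to survive decoding, one approach (a sketch with hypothetical `<thought>` / `</thought>` names, not the exact tokens used in training) is to register them as additional special tokens and decode without stripping special tokens:

```python
# Sketch: keep hypothetical <thought>...</thought> markers visible in the output.
# The marker strings are illustrative, not the exact tokens used during training.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# Register the markers so they are tokenised as single, known tokens.
tokenizer.add_special_tokens({"additional_special_tokens": ["<thought>", "</thought>"]})
model.resize_token_embeddings(len(tokenizer))

inputs = tokenizer("Plan a small garden. <thought>", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# skip_special_tokens=False keeps the markers in the decoded text; with
# skip_special_tokens=True the tokenizer would mask them, as described above.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```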

But still a great model! Given a task-based dataset it converges super quickly, hence my enjoyment of the model, as training it is super quick!
Now when I load up datasets, there are generally only a few bad steps before the loss begins to drop, holding a steady 0.6 or so while training on the unseen new dataset, hence not needing so many epochs to adjust the weights to the new information!

I'm not sure if LoRAs actually work when you save them, but I do save some and use them to load models for training, as they are jump-starts for a model which did not receive that fine-tuning; they can be merged and aligned! (They are probably good!)
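Saved LoRA adapters can be re-attached to a base checkpoint and merged back into the weights with `peft`; a minimal sketch, with a placeholder adapter path:

```python
# Sketch: re-attach a saved LoRA adapter to a base model and merge it into the weights.
# "path/to/saved_lora_adapter" is a placeholder, not a published adapter.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "LeroyDyer/Mixtral_AI_CyberTron_Ultra", device_map="auto"
)
model = PeftModel.from_pretrained(base, "path/to/saved_lora_adapter")

# merge_and_unload folds the adapter into the base weights, giving a plain model
# that can be saved, trained further, or merged with other checkpoints.
merged = model.merge_and_unload()
merged.save_pretrained("merged_model")
```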

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
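The exact training configuration is not published here, but as a rough, hypothetical sketch of what an Unsloth-based LoRA setup for this checkpoint could look like (the hyperparameters are illustrative assumptions):

```python
# Rough sketch of an Unsloth-style LoRA setup; the hyperparameters are illustrative
# assumptions, not the settings actually used to train this model.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b",
    max_seq_length=4096,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# From here the model can be handed to TRL's SFTTrainer for supervised fine-tuning.
```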
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/LeroyDyer__SpydazWeb_AI_CyberTron_Ultra_7b-details)

|      Metric       |Value|
|-------------------|----:|
|Avg.               |13.57|
|IFEval (0-Shot)    |15.56|
|BBH (3-Shot)       |27.75|
|MATH Lvl 5 (4-Shot)| 1.36|
|GPQA (0-shot)      | 5.70|
|MuSR (0-shot)      |10.30|
|MMLU-PRO (5-shot)  |20.73|