Upload folder using huggingface_hub
- README.md +124 -3
- config.json +89 -0
- label_to_emocontext_classes.json +30 -0
- labels.txt +28 -0
- merges.txt +0 -0
- model.safetensors +3 -0
- special_tokens_map.json +51 -0
- tokenizer.json +0 -0
- tokenizer_config.json +64 -0
- trainer_state.json +119 -0
- training_args.bin +3 -0
- vocab.json +0 -0
README.md
CHANGED
|
@@ -1,3 +1,124 @@
---
library_name: transformers
base_model: roberta-large-emopillars-contextual
metrics:
- f1
model-index:
- name: roberta-large-emopillars-contextual-emocontext
  results: []
---

# roberta-large-emopillars-contextual-emocontext

This model is a fine-tuned version of [roberta-large-emopillars-contextual](https://huggingface.co/alex-shvets/roberta-large-emopillars-contextual) on the [EmoContext dataset](https://paperswithcode.com/dataset/emocontext).

<img src="https://huggingface.co/datasets/alex-shvets/images/resolve/main/emopillars_color_2.png" width="450">

## Model description

The model is a multi-label classifier over 28 emotion classes for context-aware scenarios. It was fine-tuned on a dataset with 4 classes (_angry_, _sad_, _happy_, and _others_), which we first relabelled (see _Training data_ for details). The model accepts two input formats: a context concatenated with a character description and an utterance, in which case it extracts emotions only from the utterance; or a three-turn dialogue, in which case it identifies emotions in the last utterance.

## How to use

Here is how to use this model:

```python
>>> import torch
>>> from transformers import pipeline
>>> model_name = "roberta-large-emopillars-contextual-emocontext"
>>> threshold = 0.10
>>> emotions = [
>>>     "admiration", "amusement", "anger", "annoyance", "approval", "caring", "confusion",
>>>     "curiosity", "desire", "disappointment", "disapproval", "disgust", "embarrassment",
>>>     "excitement", "fear", "gratitude", "grief", "joy", "love", "nervousness", "optimism",
>>>     "pride", "realization", "relief", "remorse", "sadness", "surprise", "neutral"
>>> ]
>>> # the pipeline returns labels as stringified indices into `emotions`
>>> label_to_emotion = dict(zip(list(range(len(emotions))), emotions))
>>> emotion_to_emocontext = dict(zip(emotions, ["others"]*len(emotions)))
>>> emotion_to_emocontext.update({
>>>     "anger": "angry",
>>>     "sadness": "sad",
>>>     "joy": "happy"
>>> })
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> pipe = pipeline("text-classification", model=model_name, truncation=True,
>>>                 return_all_scores=True, device=-1 if device.type=="cpu" else 0)
>>> # input in a format f"{context} {character}: \"{utterance}\""
>>> # alternative input format: f"{persona1}: \"{utterance1}\", {persona2}: \"{utterance2}\"\n{persona1}: \"{utterance3}\""
>>> utterances_in_contexts = [
>>>     "A user watched a video of a musical performance on YouTube. This user expresses an opinion and thoughts. User: \"Ok is it just me or is anyone else getting goosebumps too???\"",
>>>     "User: \"But...\", Conversational Agent: \"then\"\nUser: \"I’m feeling nervous\""
>>> ]
>>> outcome = pipe(utterances_in_contexts)
>>> # keep only the classes whose score reaches the threshold
>>> dominant_classes = [
>>>     [prediction for prediction in example if prediction['score'] >= threshold]
>>>     for example in outcome
>>> ]
>>> for example in dominant_classes:
>>>     print(", ".join([
>>>         "%s (%s): %.2f" % (
>>>             label_to_emotion[int(prediction['label'])],
>>>             emotion_to_emocontext[label_to_emotion[int(prediction['label'])]],
>>>             prediction['score']
>>>         )
>>>         for prediction in sorted(example, key=lambda x: x['score'], reverse=True)
>>>     ]))
excitement (others): 0.73, joy (happy): 0.23
sadness (sad): 0.89, nervousness (others): 0.12
```
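
The two input formats described in the comments above can be assembled with small helpers. This is only a sketch: the function names `format_with_context` and `format_dialogue` are hypothetical and not part of the model repository, and the exact spacing is inferred from the two example strings in the snippet above.

```python
def format_with_context(context: str, character: str, utterance: str) -> str:
    """Build the 'context, character, quoted utterance' input string."""
    return f'{context} {character}: "{utterance}"'


def format_dialogue(persona1: str, utterance1: str,
                    persona2: str, utterance2: str,
                    utterance3: str) -> str:
    """Build the three-turn dialogue input; the last turn is the one classified."""
    return (f'{persona1}: "{utterance1}", {persona2}: "{utterance2}"\n'
            f'{persona1}: "{utterance3}"')


# Rebuilds the second example input from the usage snippet.
dialogue_example = format_dialogue(
    "User", "But...", "Conversational Agent", "then", "I’m feeling nervous"
)
print(dialogue_example)
```

Strings built this way can be passed to the pipeline in place of the hand-written entries of `utterances_in_contexts`.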

## Training data

The training data consists of 30,157 samples of the [EmoContext dataset](https://paperswithcode.com/dataset/emocontext). We relabelled the _others_ examples in the training set by choosing the most probable label predicted by our base 28-class contextual model (see our paper for details).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 752
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0

### Framework versions

- Transformers 4.45.0.dev0
- Pytorch 2.4.0a0+gite3b9b71
- Datasets 2.21.0
- Tokenizers 0.19.1

## Evaluation

Scores for the evaluation on the EmoContext dev split:

| **class** | **precision** | **recall** | **f1-score** | **support** |
| :--- | :---: | :---: | :---: | ---: |
| angry | 0.76 | 0.80 | 0.78 | 150 |
| sad | 0.81 | 0.80 | 0.81 | 125 |
| happy | 0.72 | 0.75 | 0.73 | 142 |
| others | 0.96 | 0.96 | 0.96 | 2335 |
| **micro avg** | 0.93 | 0.93 | 0.93 | 2752 |
| **macro avg** | 0.81 | 0.83 | 0.82 | 2752 |
| **weighted avg** | 0.93 | 0.93 | 0.93 | 2752 |
| **samples avg** | 0.93 | 0.93 | 0.93 | 2752 |

For more details on the evaluation, please visit our [GitHub repository](https://github.com/alex-shvets/emopillars).
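
As a sanity check on the table above, the macro and weighted averages can be approximately recomputed from the four per-class rows. This is a sketch that uses only the rounded values printed in the table, so the results agree with the reported averages only up to rounding; the variable names are illustrative.

```python
# Per-class rows copied from the table: (precision, recall, f1, support).
rows = {
    "angry":  (0.76, 0.80, 0.78, 150),
    "sad":    (0.81, 0.80, 0.81, 125),
    "happy":  (0.72, 0.75, 0.73, 142),
    "others": (0.96, 0.96, 0.96, 2335),
}

total_support = sum(r[3] for r in rows.values())

# Macro average: unweighted mean over classes.
macro = [sum(r[i] for r in rows.values()) / len(rows) for i in range(3)]

# Weighted average: mean over classes weighted by support.
weighted = [
    sum(r[i] * r[3] for r in rows.values()) / total_support for i in range(3)
]

print("support:", total_support)
print("macro    (P, R, F1):", ["%.3f" % v for v in macro])
print("weighted (P, R, F1):", ["%.3f" % v for v in weighted])
```

The large gap between the macro and weighted figures reflects the class imbalance: _others_ accounts for 2335 of the 2752 dev examples.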

## Disclaimer

<details>

<summary>Click to expand</summary>

The model published in this repository is intended for general-purpose use and is available to third parties. This model may exhibit bias and/or other undesirable distortions.

When third parties deploy or provide systems and/or services to other parties using this model (or using systems based on this model) or become users of the model, they should note that it is their responsibility to mitigate the risks arising from its use and, in any event, to comply with applicable regulations, including regulations regarding the use of Artificial Intelligence.

In no event shall the creator of the model be liable for any results arising from the use made by third parties of this model.

</details>
config.json
ADDED
|
@@ -0,0 +1,89 @@
{
  "_name_or_path": "roberta_mistfull_64b_10ep752_context_actor_rewutt",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "finetuning_task": "text-classification",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "0",
    "1": "1",
    "2": "10",
    "3": "11",
    "4": "12",
    "5": "13",
    "6": "14",
    "7": "15",
    "8": "16",
    "9": "17",
    "10": "18",
    "11": "19",
    "12": "2",
    "13": "20",
    "14": "21",
    "15": "22",
    "16": "23",
    "17": "24",
    "18": "25",
    "19": "26",
    "20": "27",
    "21": "3",
    "22": "4",
    "23": "5",
    "24": "6",
    "25": "7",
    "26": "8",
    "27": "9"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "0": 0,
    "1": 1,
    "10": 2,
    "11": 3,
    "12": 4,
    "13": 5,
    "14": 6,
    "15": 7,
    "16": 8,
    "17": 9,
    "18": 10,
    "19": 11,
    "2": 12,
    "20": 13,
    "21": 14,
    "22": 15,
    "23": 16,
    "24": 17,
    "25": 18,
    "26": 19,
    "27": 20,
    "3": 21,
    "4": 22,
    "5": 23,
    "6": 24,
    "7": 25,
    "8": 26,
    "9": 27
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "problem_type": "multi_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.45.0.dev0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
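
The `id2label` table in this config is not the identity: output index 2 carries the label `"10"`, index 12 carries `"2"`, and so on. The pattern matches what one gets by sorting the class names `"0"` to `"27"` as strings, though that origin is an inference from the config rather than something the repository states. The sketch below reproduces the mapping under that assumption and shows how an output index resolves to an emotion name; `emotion_for_output_index` is a hypothetical helper, and the `emotions` list is copied from `labels.txt`.

```python
emotions = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]

# Assumption: labels "0".."27" were sorted lexicographically before indexing.
id2label = dict(enumerate(sorted(str(i) for i in range(len(emotions)))))


def emotion_for_output_index(index: int) -> str:
    """Resolve a raw output index to an emotion name via its string label."""
    return emotions[int(id2label[index])]


print(id2label[2], "->", emotion_for_output_index(2))
print(id2label[12], "->", emotion_for_output_index(12))
```

This is why the usage snippet in the README converts `prediction['label']` with `int(...)` instead of relying on the position of a score in the output list.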
|
label_to_emocontext_classes.json
ADDED
|
@@ -0,0 +1,30 @@
{
  "anger": "angry",
  "sadness": "sad",
  "joy": "happy",
  "admiration": "others",
  "amusement": "others",
  "annoyance": "others",
  "approval": "others",
  "caring": "others",
  "confusion": "others",
  "curiosity": "others",
  "desire": "others",
  "disappointment": "others",
  "disapproval": "others",
  "disgust": "others",
  "embarrassment": "others",
  "excitement": "others",
  "fear": "others",
  "gratitude": "others",
  "grief": "others",
  "love": "others",
  "nervousness": "others",
  "optimism": "others",
  "pride": "others",
  "realization": "others",
  "relief": "others",
  "remorse": "others",
  "surprise": "others",
  "neutral": "others"
}
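
One way to use this mapping is to collapse a set of 28-class scores into a single EmoContext class. The repository does not state the exact decision rule used for evaluation, so the sketch below assumes a simple one: take the highest-scoring emotion and look it up in the mapping. The function name `to_emocontext` and the example scores (taken from the README output) are illustrative only.

```python
from typing import Dict

# Only the three non-"others" entries need to be listed explicitly;
# every other emotion falls back to "others", as in the JSON file above.
emotion_to_emocontext = {"anger": "angry", "sadness": "sad", "joy": "happy"}


def to_emocontext(scores: Dict[str, float]) -> str:
    """Map the top-scoring emotion to its EmoContext class (assumed rule)."""
    top_emotion = max(scores, key=scores.get)
    return emotion_to_emocontext.get(top_emotion, "others")


first = to_emocontext({"excitement": 0.73, "joy": 0.23})
second = to_emocontext({"sadness": 0.89, "nervousness": 0.12})
print(first, second)
```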
|
labels.txt
ADDED
|
@@ -0,0 +1,28 @@
admiration
amusement
anger
annoyance
approval
caring
confusion
curiosity
desire
disappointment
disapproval
disgust
embarrassment
excitement
fear
gratitude
grief
joy
love
nervousness
optimism
pride
realization
relief
remorse
sadness
surprise
neutral
merges.txt
ADDED
|
The diff for this file is too large to render.
model.safetensors
ADDED
|
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d96256a9cb2f0522a96ec69197482b5629a5dbe880de6df0dd04c61b3ab01db5
size 1421602016
special_tokens_map.json
ADDED
|
@@ -0,0 +1,51 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
|
The diff for this file is too large to render.
tokenizer_config.json
ADDED
|
@@ -0,0 +1,64 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50264": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "errors": "replace",
  "mask_token": "<mask>",
  "max_length": 512,
  "model_max_length": 512,
  "pad_to_multiple_of": null,
  "pad_token": "<pad>",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "</s>",
  "stride": 0,
  "tokenizer_class": "RobertaTokenizer",
  "trim_offsets": true,
  "truncation_side": "left",
  "truncation_strategy": "longest_first",
  "unk_token": "<unk>"
}
trainer_state.json
ADDED
|
@@ -0,0 +1,119 @@
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 3.0,
  "eval_steps": 500,
  "global_step": 5655,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.26525198938992045,
      "grad_norm": 1.645719051361084,
      "learning_rate": 1.8231653404067196e-05,
      "loss": 0.057,
      "step": 500
    },
    {
      "epoch": 0.5305039787798409,
      "grad_norm": 1.2083187103271484,
      "learning_rate": 1.6463306808134398e-05,
      "loss": 0.0443,
      "step": 1000
    },
    {
      "epoch": 0.7957559681697612,
      "grad_norm": 0.7489728331565857,
      "learning_rate": 1.4694960212201592e-05,
      "loss": 0.0428,
      "step": 1500
    },
    {
      "epoch": 1.0610079575596818,
      "grad_norm": 1.094058871269226,
      "learning_rate": 1.292661361626879e-05,
      "loss": 0.04,
      "step": 2000
    },
    {
      "epoch": 1.3262599469496021,
      "grad_norm": 1.3806736469268799,
      "learning_rate": 1.1158267020335986e-05,
      "loss": 0.0294,
      "step": 2500
    },
    {
      "epoch": 1.5915119363395225,
      "grad_norm": 0.7832322120666504,
      "learning_rate": 9.389920424403184e-06,
      "loss": 0.0297,
      "step": 3000
    },
    {
      "epoch": 1.8567639257294428,
      "grad_norm": 1.1327711343765259,
      "learning_rate": 7.6215738284703815e-06,
      "loss": 0.0273,
      "step": 3500
    },
    {
      "epoch": 2.1220159151193636,
      "grad_norm": 1.6918989419937134,
      "learning_rate": 5.853227232537579e-06,
      "loss": 0.0229,
      "step": 4000
    },
    {
      "epoch": 2.387267904509284,
      "grad_norm": 1.099871039390564,
      "learning_rate": 4.084880636604775e-06,
      "loss": 0.0155,
      "step": 4500
    },
    {
      "epoch": 2.6525198938992043,
      "grad_norm": 1.586369276046753,
      "learning_rate": 2.316534040671972e-06,
      "loss": 0.0158,
      "step": 5000
    },
    {
      "epoch": 2.9177718832891246,
      "grad_norm": 0.8191050291061401,
      "learning_rate": 5.481874447391689e-07,
      "loss": 0.0161,
      "step": 5500
    },
    {
      "epoch": 3.0,
      "step": 5655,
      "total_flos": 8.432016912385229e+16,
      "train_loss": 0.030565504348984238,
      "train_runtime": 3113.0357,
      "train_samples_per_second": 29.062,
      "train_steps_per_second": 1.817
    }
  ],
  "logging_steps": 500,
  "max_steps": 5655,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 3,
  "save_steps": 500,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": true
      },
      "attributes": {}
    }
  },
  "total_flos": 8.432016912385229e+16,
  "train_batch_size": 16,
  "trial_name": null,
  "trial_params": null
}
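
The `learning_rate` values in `log_history` follow a straight line from the base rate of 2e-05 down to zero at `max_steps` = 5655, which is what a linear schedule without warmup would produce. The sketch below recomputes them in plain Python; the zero-warmup setting is an assumption that happens to match the logged numbers, and `linear_lr` is an illustrative function, not the trainer's own code.

```python
def linear_lr(step: int, base_lr: float = 2e-5, max_steps: int = 5655,
              warmup_steps: int = 0) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, max_steps - step)
    return base_lr * remaining / max(1, max_steps - warmup_steps)


# Compare against a few values logged in trainer_state.json.
logged = {
    500: 1.8231653404067196e-05,
    3000: 9.389920424403184e-06,
    5500: 5.481874447391689e-07,
}
for step, expected in logged.items():
    print(step, linear_lr(step), expected)
```

With a batch size of 16 and 5655 steps over 3 epochs, each epoch covers 1885 steps, which is consistent with a training set of roughly 30 thousand samples.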
|
training_args.bin
ADDED
|
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:851e02b88c71b052af89553746aa72f33f9edcf5091ee3d635e840987b950631
size 5304
vocab.json
ADDED
|
The diff for this file is too large to render.