2024-10-20 22:16:51,441 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
[INFO|configuration_utils.py:670] 2024-10-20 22:16:51,788 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:670] 2024-10-20 22:16:51,791 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:739] 2024-10-20 22:16:51,794 >> Model config ChatGLMConfig {
  "_name_or_path": "/root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat",
  "add_bias_linear": false,
  "add_qkv_bias": true,
  "apply_query_key_layer_scaling": true,
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "ChatGLMModel"
  ],
  "attention_dropout": 0.0,
  "attention_softmax_in_fp32": true,
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification"
  },
  "bias_dropout_fusion": true,
  "classifier_dropout": null,
  "eos_token_id": [
    151329,
    151336,
    151338
  ],
  "ffn_hidden_size": 13696,
  "fp32_residual_connection": false,
  "hidden_dropout": 0.0,
  "hidden_size": 4096,
  "kv_channels": 128,
  "layernorm_epsilon": 1.5625e-07,
  "model_type": "chatglm",
  "multi_query_attention": true,
  "multi_query_group_num": 2,
  "num_attention_heads": 32,
  "num_hidden_layers": 40,
  "num_layers": 40,
  "original_rope": true,
  "pad_token_id": 151329,
  "padded_vocab_size": 151552,
  "post_layer_norm": true,
  "rmsnorm": true,
  "rope_ratio": 500,
  "seq_length": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.45.0",
  "use_cache": true,
  "vocab_size": 151552
}

[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:16:51,807 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:16:51,807 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:16:51,807 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:16:51,807 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:16:51,808 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2478] 2024-10-20 22:16:52,177 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
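What the loader is doing here, in plain transformers terms: read the repo's custom ChatGLMConfig and tokenizer from the local ModelScope cache. A minimal sketch, assuming the GLM-4-9B-Chat snapshot has already been downloaded to the path shown in the log:

```python
from transformers import AutoConfig, AutoTokenizer

model_path = "/root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat"

# The "auto_map" entries in the config dump point at Python files shipped inside the
# repo (configuration_chatglm.py, modeling_chatglm.py), so trust_remote_code=True is required.
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

print(config.model_type, config.num_layers, config.seq_length)  # chatglm 40 131072
print(config.torch_dtype, config.eos_token_id)                  # torch.bfloat16 [151329, 151336, 151338]
```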
[INFO|configuration_utils.py:670] 2024-10-20 22:16:52,177 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:670] 2024-10-20 22:16:52,178 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:739] 2024-10-20 22:16:52,179 >> Model config ChatGLMConfig { ... }
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:16:52,180 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:16:52,180 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:16:52,180 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:16:52,180 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:16:52,180 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2478] 2024-10-20 22:16:52,459 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
10/20/2024 22:16:52 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words.
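The template line above is about decoding: for GLM-4, LLaMA-Factory treats <|user|> and <|observation|> as stop words so generation halts before the model opens a new turn. A hedged sketch of the token bookkeeping, reusing the tokenizer from the previous snippet; the apply_chat_template call follows the GLM-4-9B-Chat model card usage:

```python
# <|endoftext|>, <|user|> and <|observation|> should map to exactly the ids listed as
# eos_token_id in the config dump: [151329, 151336, 151338].
print(tokenizer.convert_tokens_to_ids(["<|endoftext|>", "<|user|>", "<|observation|>"]))

# Build a one-turn prompt the way the chat template does: it starts with [gMASK] and the
# <|user|> turn and ends with <|assistant|>, matching the "training example" printed later.
prompt_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "你好"}],
    add_generation_prompt=True,
    tokenize=True,
)
print(prompt_ids)
```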
[INFO|configuration_utils.py:670] 2024-10-20 22:16:52,511 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:670] 2024-10-20 22:16:52,512 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:739] 2024-10-20 22:16:52,512 >> Model config ChatGLMConfig { ... }
10/20/2024 22:16:52 - INFO - llamafactory.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3723] 2024-10-20 22:16:52,543 >> loading weights file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/model.safetensors.index.json
[INFO|modeling_utils.py:1622] 2024-10-20 22:16:52,544 >> Instantiating ChatGLMForConditionalGeneration model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1099] 2024-10-20 22:16:52,545 >> Generate config GenerationConfig {
  "eos_token_id": [
    151329,
    151336,
    151338
  ],
  "pad_token_id": 151329
}

Loading checkpoint shards: 100%|██████████| 10/10 [00:04<00:00,  2.04it/s]
[INFO|modeling_utils.py:4568] 2024-10-20 22:16:57,619 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.
[INFO|modeling_utils.py:4576] 2024-10-20 22:16:57,619 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
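The shard loading above corresponds to a plain from_pretrained call in bfloat16. A sketch of the equivalent; device_map="auto" is an assumption for the single-GPU (cuda:0) setup shown later in the log:

```python
import torch
from transformers import AutoModelForCausalLM

model_path = "/root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # matches "torch_dtype": "bfloat16" and the loader message above
    trust_remote_code=True,      # uses the repo's modeling_chatglm.ChatGLMForConditionalGeneration
    device_map="auto",
)
model.eval()  # this first run only serves chat; use_cache=True keeps the KV cache on
```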
[INFO|configuration_utils.py:1052] 2024-10-20 22:16:57,622 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/generation_config.json
[INFO|configuration_utils.py:1099] 2024-10-20 22:16:57,623 >> Generate config GenerationConfig {
  "do_sample": true,
  "eos_token_id": [
    151329,
    151336,
    151338
  ],
  "max_length": 128000,
  "pad_token_id": 151329,
  "temperature": 0.8,
  "top_p": 0.8
}

10/20/2024 22:16:57 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
10/20/2024 22:16:58 - INFO - llamafactory.model.adapter - Merged 1 adapter(s).
10/20/2024 22:16:58 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-10-20-22-15-06
10/20/2024 22:16:58 - INFO - llamafactory.model.loader - all params: 9,399,951,360
10/20/2024 22:16:58 - WARNING - llamafactory.chat.hf_engine - There is no current event loop, creating a new one.
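This first run is the LLaMA-Factory chat engine serving the model with a previously trained LoRA adapter folded in ("Merged 1 adapter(s)"). A hedged sketch of the same steps done by hand with peft, reusing the model and tokenizer from the snippets above; the sampling settings come from the generation_config.json dump, and max_new_tokens is an assumption:

```python
from peft import PeftModel
from transformers import GenerationConfig

adapter_path = "saves/GLM-4-9B-Chat/lora/train_2024-10-20-22-15-06"  # path printed in the log

model = PeftModel.from_pretrained(model, adapter_path)  # attach the LoRA weights
model = model.merge_and_unload()                        # fold them into the base weights

gen_config = GenerationConfig(
    do_sample=True, temperature=0.8, top_p=0.8,
    eos_token_id=[151329, 151336, 151338], pad_token_id=151329,
    max_new_tokens=512,
)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "你好"}],
    add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to(model.device)

output = model.generate(**inputs, generation_config=gen_config)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

The second run below is the actual LoRA training job launched a few minutes later.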
10/20/2024 22:21:38 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.bfloat16
2024-10-20 22:21:38,590 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
[INFO|configuration_utils.py:670] 2024-10-20 22:21:38,924 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:670] 2024-10-20 22:21:38,928 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:739] 2024-10-20 22:21:38,931 >> Model config ChatGLMConfig { ... }
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:21:38,940 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:21:38,941 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:21:38,941 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:21:38,941 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:21:38,941 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2478] 2024-10-20 22:21:39,322 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:670] 2024-10-20 22:21:39,323 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:670] 2024-10-20 22:21:39,323 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:739] 2024-10-20 22:21:39,324 >> Model config ChatGLMConfig { ... }
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:21:39,326 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:21:39,326 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:21:39,326 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:21:39,326 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2212] 2024-10-20 22:21:39,326 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2478] 2024-10-20 22:21:39,616 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
10/20/2024 22:21:39 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words.
10/20/2024 22:21:39 - INFO - llamafactory.data.loader - Loading dataset my_demo.json...
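my_demo.json is a small self-cognition dataset; the single example printed below says, roughly, "Hello, I am the Information Center AI assistant, an AI assistant developed by the Information Center AI R&D Center. Nice to meet you, what can I do for you?". The file itself is not shown in the log, so this is a hedged guess at its shape in LLaMA-Factory's alpaca format, written from Python so the snippet stays self-contained:

```python
import json

# Illustrative records only; the real my_demo.json contains 80 examples ("Num examples = 80" below).
records = [
    {
        "instruction": "你好",
        "input": "",
        "output": "您好,我是 信息中心AI助手,一个由 信息中心AI研发中心 开发的 AI 助手,很高兴认识您。请问我能为您做些什么?",
    },
]

with open("data/my_demo.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)

# The file also needs an entry in data/dataset_info.json, e.g.
# {"my_demo": {"file_name": "my_demo.json"}}, before it appears in the dataset list.
```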
training example:
input_ids: [151331, 151333, 151336, 198, 109377, 151337, 198, 121658, 3837, 101328, 117148, 99134, 15457, 113255, 3837, 98444, 98582, 117148, 99134, 15457, 100917, 99134, 103065, 103192, 15223, 54235, 102, 98520, 3837, 118295, 100119, 99526, 1773, 98964, 106546, 98342, 107410, 98540, 110000, 11314, 151329]
inputs: [gMASK] <|user|>
你好 <|assistant|>
您好,我是 信息中心AI助手,一个由 信息中心AI研发中心 开发的 AI 助手,很高兴认识您。请问我能为您做些什么? <|endoftext|>
label_ids: [151329, -100, -100, -100, -100, -100, 198, 121658, 3837, 101328, 117148, 99134, 15457, 113255, 3837, 98444, 98582, 117148, 99134, 15457, 100917, 99134, 103065, 103192, 15223, 54235, 102, 98520, 3837, 118295, 100119, 99526, 1773, 98964, 106546, 98342, 107410, 98540, 110000, 11314, 151329]
labels: <|endoftext|>
您好,我是 信息中心AI助手,一个由 信息中心AI研发中心 开发的 AI 助手,很高兴认识您。请问我能为您做些什么? <|endoftext|>
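The label_ids above show the standard supervised fine-tuning trick: prompt-side positions are set to -100 so PyTorch's cross-entropy ignores them, and only the assistant response (plus the closing <|endoftext|>) contributes to the loss; the leading 151329 appears to come from the GLM template's EOS handling in LLaMA-Factory. A minimal sketch of the masking itself:

```python
IGNORE_INDEX = -100  # positions with this label are skipped by torch.nn.CrossEntropyLoss

def mask_prompt(input_ids: list[int], prompt_len: int) -> list[int]:
    """Return labels where the first prompt_len tokens are ignored and the rest are learned."""
    labels = list(input_ids)
    labels[:prompt_len] = [IGNORE_INDEX] * prompt_len
    return labels

# For the example above, the prompt tokens up to and including <|assistant|> are masked,
# and the response tokens (您好,我是 ... <|endoftext|>) are kept as targets.
```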
[INFO|configuration_utils.py:670] 2024-10-20 22:21:42,185 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:670] 2024-10-20 22:21:42,187 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:739] 2024-10-20 22:21:42,187 >> Model config ChatGLMConfig { ... }
[INFO|modeling_utils.py:3723] 2024-10-20 22:21:42,220 >> loading weights file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/model.safetensors.index.json
[INFO|modeling_utils.py:1622] 2024-10-20 22:21:42,221 >> Instantiating ChatGLMForConditionalGeneration model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1099] 2024-10-20 22:21:42,222 >> Generate config GenerationConfig {
  "eos_token_id": [
    151329,
    151336,
    151338
  ],
  "pad_token_id": 151329
}

Loading checkpoint shards: 100%|██████████| 10/10 [00:06<00:00,  1.54it/s]
[INFO|modeling_utils.py:4568] 2024-10-20 22:21:48,773 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.
[INFO|modeling_utils.py:4576] 2024-10-20 22:21:48,773 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|configuration_utils.py:1052] 2024-10-20 22:21:48,777 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/generation_config.json
[INFO|configuration_utils.py:1099] 2024-10-20 22:21:48,777 >> Generate config GenerationConfig {
  "do_sample": true,
  "eos_token_id": [
    151329,
    151336,
    151338
  ],
  "max_length": 128000,
  "pad_token_id": 151329,
  "temperature": 0.8,
  "top_p": 0.8
}

10/20/2024 22:21:48 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
10/20/2024 22:21:48 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
10/20/2024 22:21:48 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
10/20/2024 22:21:48 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
10/20/2024 22:21:48 - INFO - llamafactory.model.model_utils.misc - Found linear modules: query_key_value,dense_h_to_4h,dense,dense_4h_to_h
10/20/2024 22:21:49 - INFO - llamafactory.model.loader - trainable params: 21,176,320 || all params: 9,421,127,680 || trainable%: 0.2248
[INFO|trainer.py:667] 2024-10-20 22:21:49,278 >> Using auto half precision backend
[INFO|trainer.py:2243] 2024-10-20 22:21:49,879 >> ***** Running training *****
[INFO|trainer.py:2244] 2024-10-20 22:21:49,879 >> Num examples = 80
[INFO|trainer.py:2245] 2024-10-20 22:21:49,879 >> Num Epochs = 3
[INFO|trainer.py:2246] 2024-10-20 22:21:49,879 >> Instantaneous batch size per device = 2
[INFO|trainer.py:2249] 2024-10-20 22:21:49,879 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2250] 2024-10-20 22:21:49,879 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2251] 2024-10-20 22:21:49,879 >> Total optimization steps = 15
[INFO|trainer.py:2252] 2024-10-20 22:21:49,882 >> Number of trainable parameters = 21,176,320
  0%|          | 0/15 [00:00<?, ?it/s]
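None of the fine-tuning hyperparameters are printed directly, but most can be read off or inferred from the header above. The sketch below shows roughly equivalent peft/transformers objects: the four target modules and the bf16/gradient-checkpointing/batch settings come from the log, while r=8 is an inference (with these modules in all 40 layers it gives exactly 21,176,320 trainable parameters) and lora_alpha, lora_dropout and the learning rate are assumptions:

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=8,                # inferred from "trainable params: 21,176,320"
    lora_alpha=16,      # assumption (a common default)
    lora_dropout=0.0,   # assumption
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
    task_type="CAUSAL_LM",
)

args = TrainingArguments(
    output_dir="saves/GLM-4-9B-Chat/lora/train_2024-10-20-22-21-02",
    per_device_train_batch_size=2,   # "Instantaneous batch size per device = 2"
    gradient_accumulation_steps=8,   # effective batch = 2 * 8 = 16
    num_train_epochs=3,
    bf16=True,
    gradient_checkpointing=True,
)

# Step count check: 80 examples / 16 per optimizer step = 5 steps per epoch,
# times 3 epochs = 15, matching "Total optimization steps = 15".
```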
"vocab_size": 151552 } [INFO|tokenization_utils_base.py:2649] 2024-10-20 22:22:30,635 >> tokenizer config file saved in saves/GLM-4-9B-Chat/lora/train_2024-10-20-22-21-02/checkpoint-15/tokenizer_config.json [INFO|tokenization_utils_base.py:2658] 2024-10-20 22:22:30,635 >> Special tokens file saved in saves/GLM-4-9B-Chat/lora/train_2024-10-20-22-21-02/checkpoint-15/special_tokens_map.json [INFO|tokenization_utils_base.py:2709] 2024-10-20 22:22:30,636 >> added tokens file saved in saves/GLM-4-9B-Chat/lora/train_2024-10-20-22-21-02/checkpoint-15/added_tokens.json [INFO|trainer.py:2505] 2024-10-20 22:22:30,830 >> Training completed. Do not forget to share your model on huggingface.co/models =) {'train_runtime': 40.9486, 'train_samples_per_second': 5.861, 'train_steps_per_second': 0.366, 'train_loss': 1985.3858072916667, 'epoch': 3.0, 'num_input_tokens_seen': 12000} 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:40<00:00, 2.73s/it] [INFO|trainer.py:3705] 2024-10-20 22:22:30,832 >> Saving model checkpoint to saves/GLM-4-9B-Chat/lora/train_2024-10-20-22-21-02 [INFO|configuration_utils.py:670] 2024-10-20 22:22:30,849 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json [INFO|configuration_utils.py:739] 2024-10-20 22:22:30,850 >> Model config ChatGLMConfig { "_name_or_path": "THUDM/glm-4-9b-chat", "add_bias_linear": false, "add_qkv_bias": true, "apply_query_key_layer_scaling": true, "apply_residual_connection_post_layernorm": false, "architectures": [ "ChatGLMModel" ], "attention_dropout": 0.0, "attention_softmax_in_fp32": true, "auto_map": { "AutoConfig": "configuration_chatglm.ChatGLMConfig", "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification" }, "bias_dropout_fusion": true, "classifier_dropout": null, "eos_token_id": [ 151329, 151336, 151338 ], "ffn_hidden_size": 13696, "fp32_residual_connection": false, "hidden_dropout": 0.0, "hidden_size": 4096, "kv_channels": 128, "layernorm_epsilon": 1.5625e-07, "model_type": "chatglm", "multi_query_attention": true, "multi_query_group_num": 2, "num_attention_heads": 32, "num_hidden_layers": 40, "num_layers": 40, "original_rope": true, "pad_token_id": 151329, "padded_vocab_size": 151552, "post_layer_norm": true, "rmsnorm": true, "rope_ratio": 500, "seq_length": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.45.0", "use_cache": true, "vocab_size": 151552 } [INFO|tokenization_utils_base.py:2649] 2024-10-20 22:22:30,962 >> tokenizer config file saved in saves/GLM-4-9B-Chat/lora/train_2024-10-20-22-21-02/tokenizer_config.json [INFO|tokenization_utils_base.py:2658] 2024-10-20 22:22:30,962 >> Special tokens file saved in saves/GLM-4-9B-Chat/lora/train_2024-10-20-22-21-02/special_tokens_map.json [INFO|tokenization_utils_base.py:2709] 2024-10-20 22:22:30,962 >> added tokens file saved in saves/GLM-4-9B-Chat/lora/train_2024-10-20-22-21-02/added_tokens.json ***** train metrics ***** epoch = 3.0 num_input_tokens_seen = 12000 total_flos = 590110GF train_loss = 1985.3858 train_runtime = 0:00:40.94 train_samples_per_second = 5.861 train_steps_per_second = 0.366 Figure saved at: 
10/20/2024 22:22:31 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
10/20/2024 22:22:31 - WARNING - llamafactory.extras.ploting - No metric eval_accuracy to plot.
[INFO|modelcard.py:449] 2024-10-20 22:22:31,050 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
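With training finished, a quick way to check the result is to load the base model again, attach the freshly saved adapter without merging, and ask the question from the training set. Everything here except the two paths from the log is a hedged, self-contained sketch:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "/root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat"
adapter_path = "saves/GLM-4-9B-Chat/lora/train_2024-10-20-22-21-02"

tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_path, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_path)  # keep the adapter separate this time
model.eval()

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "你好"}],
    add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8, top_p=0.8)

# Expect the self-cognition answer from my_demo.json rather than the stock GLM-4 reply.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```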