[Help] Fine-tuning fails with an error; error messages and screenshots below #372

Open
opened 2024-11-12 17:03:36 +08:00 by leofly007 · 17 comments

[INFO|configuration_utils.py:670] 2024-11-12 16:28:15,297 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json

[INFO|configuration_utils.py:670] 2024-11-12 16:28:15,300 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json

[INFO|configuration_utils.py:739] 2024-11-12 16:28:15,301 >> Model config ChatGLMConfig { "_name_or_path": "/root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat", "add_bias_linear": false, "add_qkv_bias": true, "apply_query_key_layer_scaling": true, "apply_residual_connection_post_layernorm": false, "architectures": [ "ChatGLMModel" ], "attention_dropout": 0.0, "attention_softmax_in_fp32": true, "auto_map": { "AutoConfig": "configuration_chatglm.ChatGLMConfig", "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification" }, "bias_dropout_fusion": true, "classifier_dropout": null, "eos_token_id": [ 151329, 151336, 151338 ], "ffn_hidden_size": 13696, "fp32_residual_connection": false, "hidden_dropout": 0.0, "hidden_size": 4096, "kv_channels": 128, "layernorm_epsilon": 1.5625e-07, "model_type": "chatglm", "multi_query_attention": true, "multi_query_group_num": 2, "num_attention_heads": 32, "num_hidden_layers": 40, "num_layers": 40, "original_rope": true, "pad_token_id": 151329, "padded_vocab_size": 151552, "post_layer_norm": true, "rmsnorm": true, "rope_ratio": 500, "seq_length": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.45.0", "use_cache": true, "vocab_size": 151552 }

[INFO|tokenization_utils_base.py:2212] 2024-11-12 16:28:15,307 >> loading file tokenizer.model

[INFO|tokenization_utils_base.py:2212] 2024-11-12 16:28:15,307 >> loading file added_tokens.json

[INFO|tokenization_utils_base.py:2212] 2024-11-12 16:28:15,307 >> loading file special_tokens_map.json

[INFO|tokenization_utils_base.py:2212] 2024-11-12 16:28:15,307 >> loading file tokenizer_config.json

[INFO|tokenization_utils_base.py:2212] 2024-11-12 16:28:15,307 >> loading file tokenizer.json

[INFO|tokenization_utils_base.py:2478] 2024-11-12 16:28:15,839 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

[INFO|configuration_utils.py:670] 2024-11-12 16:28:15,840 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json

[INFO|configuration_utils.py:670] 2024-11-12 16:28:15,841 >> loading configuration file /root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json

[INFO|configuration_utils.py:739] 2024-11-12 16:28:15,843 >> Model config ChatGLMConfig { "_name_or_path": "/root/autodl-tmp/modelscope/hub/ZhipuAI/glm-4-9b-chat", "add_bias_linear": false, "add_qkv_bias": true, "apply_query_key_layer_scaling": true, "apply_residual_connection_post_layernorm": false, "architectures": [ "ChatGLMModel" ], "attention_dropout": 0.0, "attention_softmax_in_fp32": true, "auto_map": { "AutoConfig": "configuration_chatglm.ChatGLMConfig", "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification" }, "bias_dropout_fusion": true, "classifier_dropout": null, "eos_token_id": [ 151329, 151336, 151338 ], "ffn_hidden_size": 13696, "fp32_residual_connection": false, "hidden_dropout": 0.0, "hidden_size": 4096, "kv_channels": 128, "layernorm_epsilon": 1.5625e-07, "model_type": "chatglm", "multi_query_attention": true, "multi_query_group_num": 2, "num_attention_heads": 32, "num_hidden_layers": 40, "num_layers": 40, "original_rope": true, "pad_token_id": 151329, "padded_vocab_size": 151552, "post_layer_norm": true, "rmsnorm": true, "rope_ratio": 500, "seq_length": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.45.0", "use_cache": true, "vocab_size": 151552 }

[INFO|tokenization_utils_base.py:2212] 2024-11-12 16:28:15,845 >> loading file tokenizer.model

[INFO|tokenization_utils_base.py:2212] 2024-11-12 16:28:15,845 >> loading file added_tokens.json

[INFO|tokenization_utils_base.py:2212] 2024-11-12 16:28:15,845 >> loading file special_tokens_map.json

[INFO|tokenization_utils_base.py:2212] 2024-11-12 16:28:15,845 >> loading file tokenizer_config.json

[INFO|tokenization_utils_base.py:2212] 2024-11-12 16:28:15,845 >> loading file tokenizer.json

[INFO|tokenization_utils_base.py:2478] 2024-11-12 16:28:16,365 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

[WARNING|arrow_dataset.py:3098] 2024-11-12 16:28:17,203 >> num_proc must be <= 7. Reducing num_proc to 7 for dataset of size 7.

Loss (screenshot)


I don't see any actual error message in the log or the screenshot you provided; the WARNING in the log can be ignored.
Author

Then why did the training fail?

Did you set up this dataset yourself, or did you follow the course handbook?
Author

Which dataset? The JSON file? I wrote it according to the handbook.

To make debugging easier, please click the "Preview dataset" button and check whether the dataset is valid. Ideally, record the whole failure process, including the console errors. Thanks.
Author

Please take a look, teacher. Thank you.

(screenshot attached: /attachments/0644ea43-60ac-4e39-9e11-85a88247672c, 1.0 MiB)

I also recommend reading this thread: #308 (https://hsw-git.huishiwei.cn/HswOAuth/llm_course/issues/308). You may have started more than one webui instance. Under normal circumstances the webui listens on port 7860 by default; it only falls back to port 7861 when a webui is already running and 7860 is occupied.
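To confirm whether a second webui instance is the culprit, it can help to check which of the two ports is already occupied. Below is a minimal sketch (not part of the course material) that probes ports 7860 and 7861 on localhost; the port numbers simply follow the defaults mentioned above.

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

# 7860 is the default webui port; 7861 is only used when 7860 is taken.
for port in (7860, 7861):
    print(f"port {port}: {'in use' if port_in_use(port) else 'free'}")
```

If both ports report "in use", there are most likely two webui processes running, and the extra one should be stopped before retrying.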
Author

The tunnel is fine, but the JSON file that could be used for training before now fails to load. What on earth is going on?

(screenshot attached)

(screenshot attached: /attachments/b2054592-d456-4226-b0d0-79be189855fb, 439 KiB)

The format of this JSON file does not match the internal organization of the JSON file provided in class. We suggest first following the teacher's tutorial exactly, and only then switching to your own dataset. Also note: your dataset format must be identical to the teacher's.
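For reference, here is a hedged sketch of what a single training record might look like. The instruction/input/output field names are an assumption based on the alpaca-style format commonly used with LLaMA-Factory-style tooling; check the teacher's my_demo.json for the exact keys before copying this.

```python
import json

# Hypothetical example record; the field names (instruction/input/output) are
# an assumption -- verify them against the teacher's my_demo.json.
records = [
    {
        "instruction": "Introduce yourself.",
        "input": "",
        "output": "I am a chat assistant fine-tuned from GLM-4-9B.",
    },
]

# The training file is expected to be a single JSON array of such objects.
with open("my_demo.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)

# Quick sanity check: every record carries exactly the expected keys.
expected = {"instruction", "input", "output"}
assert all(set(r) == expected for r in records)
```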
Author

My JSON file is literally your file, and the format follows yours as well. Now it cannot even be loaded. What on earth is going on? It has been days and this still isn't resolved.
> Please check whether the dataset_info.json file under the data folder includes an entry for mydemo1.json. Also, please attach your dataset_info.json so we can help troubleshoot. In other words, please confirm again whether you completed this step of the tutorial. (screenshot of the tutorial step attached: /attachments/b11a278b-edf4-4892-8b48-509308b4e2dd)

Did you update the dataset_info.json file as the tutorial describes? Also, please name your dataset my_demo.json so that every step matches what the teacher did in class. You can upload your dataset_info.json to me and I will check whether anything is wrong.
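For context, "updating dataset_info.json" usually means adding an entry that points the dataset name at the JSON file. The snippet below is a minimal sketch assuming a LLaMA-Factory-style data/dataset_info.json; the dataset name my_demo and the bare entry without a columns mapping are illustrative assumptions, so compare it with the course handbook before using it.

```python
import json

# Hypothetical minimal entry registering my_demo.json in data/dataset_info.json
# (assumes a LLaMA-Factory-style layout; verify against the course handbook).
with open("data/dataset_info.json", "r", encoding="utf-8") as f:
    info = json.load(f)

info["my_demo"] = {"file_name": "my_demo.json"}

with open("data/dataset_info.json", "w", encoding="utf-8") as f:
    json.dump(info, f, ensure_ascii=False, indent=2)
```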
Author

Thank you, it works now. One more question, though: I want to add other dialogue information, and the key-value pairs in the teacher's JSON file are not expressive enough. How do I add other keys, and how do I keep the format valid?

As long as you keep the field names the same as in the teacher's JSON file, it will work; that is, the keys in each key-value pair must stay consistent with the teacher's file.
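If you need to carry extra conversational context while keeping the same keys, one option is to fold it into optional fields of the same record rather than inventing new key names. The system and history fields below are an assumption based on the alpaca-style format supported by LLaMA-Factory; confirm they appear in the teacher's template before relying on them.

```python
import json

# Hypothetical record with extra dialogue context but no new key names.
# "system" and "history" are assumptions based on the alpaca-style format
# supported by LLaMA-Factory; check the teacher's file before using them.
record = {
    "instruction": "Summarize the key points above once more.",
    "input": "",
    "output": "Key points: keep the data format consistent and do not rename fields.",
    "system": "You are the course teaching assistant.",
    "history": [
        ["What does the data format require for fine-tuning?",
         "The field names must match the sample file exactly."],
    ],
}

print(json.dumps(record, ensure_ascii=False, indent=2))
```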