[Help] Problem encountered while reproducing the xtuner course [Solved] #131


12390900721cs · 2024-09-26T17:24:40+08:00

12390900721cs commented

2024-09-26 17:24:40 +08:00

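I tried both the qwen_1_8b and ChatGLM-6B models, and at first hit the same problem with each.

![](https://cdn.nlark.com/yuque/0/2024/png/48516026/1727342717227-9f6cbf41-0542-4f12-937a-d8c16682c9d7.png)

Config path

![](https://cdn.nlark.com/yuque/0/2024/png/48516026/1727342297718-9af73075-94c1-4596-af9a-a4319fdde4fc.png) ![](https://cdn.nlark.com/yuque/0/2024/png/48516026/1727342329677-8b9df441-986c-4a1b-8b15-b48005007bae.png)

Code that was run

![](https://cdn.nlark.com/yuque/0/2024/png/48516026/1727342415886-58477008-56c9-4a4f-85f2-2d9c8412d2b6.png)

Error details

requires bitsandbytes>=0.39.0

![](https://cdn.nlark.com/yuque/0/2024/png/48516026/1727342470068-98104711-98d1-4093-8a1c-e8cbf9f03916.png)

I tried to fix this with pip install bitsandbytes==0.39.0 and pip install --upgrade bitsandbytes, which produced the following error:

AttributeError: /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cquantize_blockwise_fp16_nf4

Searching on Google suggests this is a compatibility problem. Any help would be appreciated.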
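The `libbitsandbytes_cpu.so` in the traceback suggests that bitsandbytes fell back to its CPU-only library, which does not provide the 4-bit symbols. The following is a minimal diagnostic sketch for collecting the relevant facts before reinstalling anything. `describe_env` is a hypothetical helper name, and the assumption that `torch.version.hip` is informative on the course's DCU build has not been verified here.

```python
from importlib import metadata


def describe_env():
    """Collect facts that help explain a bitsandbytes CPU-library fallback."""
    info = {}
    for pkg in ("bitsandbytes", "torch"):
        try:
            info[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            info[pkg] = None  # package is not installed

    # Import torch only if it is importable, so the script runs anywhere.
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None:
        info["cuda_available"] = torch.cuda.is_available()
        # Set on ROCm-style builds, None on CUDA builds (assumed relevant for DCU).
        info["hip_version"] = getattr(torch.version, "hip", None)
    else:
        info["cuda_available"] = None
        info["hip_version"] = None
    return info


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as false, or the build is not a CUDA build, a stock bitsandbytes wheel is unlikely to find a matching GPU library, which would be consistent with the error above.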


12390900721cs changed title from ~~Problem encountered while reproducing the xtuner course~~ to [Help] Problem encountered while reproducing the xtuner course

2024-09-26 17:52:16 +08:00

21970855250cs commented

2024-09-26 23:05:14 +08:00

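This was mentioned briefly in class, but people may have missed it: quantization support on our platform's DCUs is still poor, so please disable the quantization section in xtuner. You can follow the instructor's code, or simply comment out or delete the following block:
    quantization_config=dict(
        type=BitsAndBytesConfig,
        load_in_4bit=True,
        load_in_8bit=False,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4')
Also, if you are training on your own machine and find that your GPU does not support bfloat16, you need to change the dtype field in the config file as well,
for example from torch_dtype=torch.bfloat16 to torch_dtype=torch.float16.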
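The two changes described above can be sketched on a plain dictionary. This is only an illustration, not XTuner's API: `disable_quantization` and `fallback_dtype_name` are hypothetical helpers, the sample `llm` dict merely imitates the shape of a config entry, the model path is a placeholder, and strings stand in for the torch dtype objects. In a real config file the same fields would be edited by hand.

```python
import copy


def disable_quantization(llm_cfg):
    """Return a copy of an llm config dict without its quantization_config entry."""
    cfg = copy.deepcopy(llm_cfg)
    cfg.pop("quantization_config", None)  # no error if the key is already gone
    return cfg


def fallback_dtype_name(bf16_supported):
    """Pick the dtype name to write into torch_dtype."""
    return "torch.bfloat16" if bf16_supported else "torch.float16"


# Hypothetical config fragment; strings stand in for torch objects.
llm = dict(
    pretrained_model_name_or_path="path/to/model",  # placeholder
    torch_dtype="torch.bfloat16",
    quantization_config=dict(load_in_4bit=True, bnb_4bit_quant_type="nf4"),
)

patched = disable_quantization(llm)
patched["torch_dtype"] = fallback_dtype_name(bf16_supported=False)
print(patched)
```

On a machine with PyTorch installed, `torch.cuda.is_bf16_supported()` can supply the `bf16_supported` flag instead of a hard-coded value.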


12390900721cs commented

2024-09-27 14:30:18 +08:00

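Commenting out that block solved the problem.
This change is probably needed whenever a DCU is used, since the platform does not support quantized training for now.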


48b892200220265d4d5840dad90075c.png

105 KiB

8bbd927b6ca8efd0c5614f643dbc1ba.png

135 KiB