Fine-tuning Llama3-8B-Instruct for function calling in notebooks #40

Open
opened 2024-09-09 20:21:40 +08:00 by 12019701659cs · 0 comments

# Creating the notebooks

Creation settings used for each notebook:
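![image 1.png](/attachments/b08e0d64-17f8-4617-9fb6-154102f75543)
![image.png](/attachments/e59075a5-76df-49d1-b2ef-06ad7d59179c)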

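# Single-node test

Create one notebook and fine-tune on a single machine with 4 GPUs (1 × 4).

Run the following in the notebook's terminal: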

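```bash
# Set environment variables
# NCCL: verbose logs, InfiniBand enabled on mlx5 HCAs, eth0 for socket traffic
export NCCL_DEBUG=INFO
export NCCL_IB_DISABLE=0
export NCCL_IB_HCA=mlx5
export NCCL_SOCKET_IFNAME=eth0
export GLOO_SOCKET_IFNAME=eth0
# Hugging Face cache directory and mirror endpoint
export HF_HOME=/code/huggingface-cache/
export HF_ENDPOINT=https://hf-mirror.com
# Proxy for outbound downloads; local addresses bypass it
export http_proxy=http://10.10.9.50:3000
export https_proxy=http://10.10.9.50:3000
export no_proxy=localhost,127.0.0.1
# Start training
cd /code/

# 1 node × 4 processes (one per GPU), DeepSpeed ZeRO-3 with offload
NPROC_PER_NODE=4 NNODES=1 PORT=12345 ADDR=10.244.59.92 NODE_RANK=0 \
  xtuner train /code/llama3_8b_instruct_qlora_agentflan_3e.py \
  --work-dir /userhome/llama3-8b-ft/agent-flan \
  --deepspeed deepspeed_zero3_offload
```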
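The same block of `export` lines is repeated in every notebook in this issue. A minimal sketch, assuming you keep it in one shared file so the notebooks cannot drift apart; the file name `/code/train_env.sh` and the function name `setup_train_env` are hypothetical, and the values are copied unchanged from the commands here:

```bash
#!/usr/bin/env bash
# Hypothetical shared file, e.g. /code/train_env.sh (not part of the original setup).
# Usage in each notebook:  . /code/train_env.sh && setup_train_env
setup_train_env() {
  # NCCL: verbose logs, InfiniBand on HCAs whose names start with mlx5,
  # eth0 as the interface for socket-based traffic
  export NCCL_DEBUG=INFO
  export NCCL_IB_DISABLE=0
  export NCCL_IB_HCA=mlx5
  export NCCL_SOCKET_IFNAME=eth0
  # Gloo backend (PyTorch) also binds to eth0
  export GLOO_SOCKET_IFNAME=eth0
  # Hugging Face cache directory and mirror endpoint
  export HF_HOME=/code/huggingface-cache/
  export HF_ENDPOINT=https://hf-mirror.com
  # Proxy for outbound HTTP(S) downloads; local addresses bypass it
  export http_proxy=http://10.10.9.50:3000
  export https_proxy=http://10.10.9.50:3000
  export no_proxy=localhost,127.0.0.1
}
```

Sourcing a function rather than executing a script matters here: `export` in a child process would not affect the notebook's own shell.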

Intermediate output from the run:
![image 9.png](/attachments/bbc2038d-e128-4358-a2c9-4a8ead53835f)

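# Multi-node test

Create two notebooks and fine-tune jointly across two machines with 4 GPUs each (2 × 4). Both notebooks run the same command with NNODES=2 and the same PORT; only NODE_RANK differs (0 on notebook1, 1 on notebook2). ADDR must be the IP address of the NODE_RANK=0 machine, i.e. notebook1 (10.244.125.204 here).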
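With both notebooks launched, the job runs WORLD_SIZE = NNODES × NPROC_PER_NODE = 2 × 4 = 8 training processes, and each notebook needs a distinct NODE_RANK in [0, NNODES). A minimal pre-launch check, assuming you want to catch a missing variable or a wrong rank before `xtuner train` starts; the function `check_launch_vars` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate the launch variables used by the xtuner command.
check_launch_vars() {
  local name value
  # These four must be non-negative integers
  for name in NPROC_PER_NODE NNODES NODE_RANK PORT; do
    eval "value=\${$name:-}"
    case "$value" in
      ''|*[!0-9]*)
        echo "error: $name must be a non-negative integer, got '$value'" >&2
        return 1 ;;
    esac
  done
  # ADDR must point at the NODE_RANK=0 machine
  if [ -z "${ADDR:-}" ]; then
    echo "error: ADDR is not set" >&2
    return 1
  fi
  # Each node needs a distinct rank in [0, NNODES)
  if [ "$NODE_RANK" -ge "$NNODES" ]; then
    echo "error: NODE_RANK=$NODE_RANK must be smaller than NNODES=$NNODES" >&2
    return 1
  fi
  # Total number of training processes across all nodes
  echo "WORLD_SIZE=$((NPROC_PER_NODE * NNODES))"
}
```

For example, `NPROC_PER_NODE=4 NNODES=2 PORT=12345 ADDR=10.244.125.204 NODE_RANK=1 check_launch_vars` prints `WORLD_SIZE=8`, while `NODE_RANK=2` is rejected.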

## notebook1

Run the following in notebook1's terminal:

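```bash
# Set environment variables
# NCCL: verbose logs, InfiniBand enabled on mlx5 HCAs, eth0 for socket traffic
export NCCL_DEBUG=INFO
export NCCL_IB_DISABLE=0
export NCCL_IB_HCA=mlx5
export NCCL_SOCKET_IFNAME=eth0
export GLOO_SOCKET_IFNAME=eth0
# Hugging Face cache directory and mirror endpoint
export HF_HOME=/code/huggingface-cache/
export HF_ENDPOINT=https://hf-mirror.com
# Proxy for outbound downloads; local addresses bypass it
export http_proxy=http://10.10.9.50:3000
export https_proxy=http://10.10.9.50:3000
export no_proxy=localhost,127.0.0.1
# Start training
cd /code/

# 2 nodes × 4 processes; this is node 0, so ADDR is this machine's IP
NPROC_PER_NODE=4 NNODES=2 PORT=12345 ADDR=10.244.125.204 NODE_RANK=0 \
  xtuner train /code/llama3_8b_instruct_qlora_agentflan_3e.py \
  --work-dir /userhome/llama3-8b-ft/agent-flan \
  --deepspeed deepspeed_zero3_offload
```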
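If notebook2 hangs at startup, one thing worth ruling out is that it cannot reach notebook1 over TCP. Assuming xtuner passes ADDR and PORT to torchrun as the master address and port (so the NODE_RANK=0 node listens on ADDR:PORT once notebook1's command is running), a quick check to run from notebook2 before launching it might look like this. The function name `check_rendezvous` is hypothetical; it relies on bash's `/dev/tcp` redirection and coreutils `timeout`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if ADDR:PORT accepts a TCP connection.
# $1 = ADDR of the NODE_RANK=0 machine, $2 = PORT
check_rendezvous() {
  # /dev/tcp/HOST/PORT is a bash redirection feature, not a real file;
  # timeout keeps an unreachable host from blocking the shell
  if timeout 3 bash -c ': >"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null; then
    echo "reachable: $1:$2"
  else
    echo "NOT reachable: $1:$2" >&2
    return 1
  fi
}
```

On notebook2, `check_rendezvous 10.244.125.204 12345` should report the port as reachable only after notebook1's training command has started.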

## notebook2

Run the following in notebook2's terminal:

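```bash
# Set environment variables
# NCCL: verbose logs, InfiniBand enabled on mlx5 HCAs, eth0 for socket traffic
export NCCL_DEBUG=INFO
export NCCL_IB_DISABLE=0
export NCCL_IB_HCA=mlx5
export NCCL_SOCKET_IFNAME=eth0
export GLOO_SOCKET_IFNAME=eth0
# Hugging Face cache directory and mirror endpoint
export HF_HOME=/code/huggingface-cache/
export HF_ENDPOINT=https://hf-mirror.com
# Proxy for outbound downloads; local addresses bypass it
export http_proxy=http://10.10.9.50:3000
export https_proxy=http://10.10.9.50:3000
export no_proxy=localhost,127.0.0.1
# Start training
cd /code/

# 2 nodes × 4 processes; this is node 1, ADDR still points at notebook1 (node 0)
NPROC_PER_NODE=4 NNODES=2 PORT=12345 ADDR=10.244.125.204 NODE_RANK=1 \
  xtuner train /code/llama3_8b_instruct_qlora_agentflan_3e.py \
  --work-dir /userhome/llama3-8b-ft/agent-flan \
  --deepspeed deepspeed_zero3_offload
```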
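The two multi-node commands differ only in NODE_RANK. A minimal sketch, assuming you wrap them in one hypothetical `launch_node` function (the function and the variables `CONFIG`, `WORK_DIR`, `DS_PRESET`, `NODE0_ADDR` are illustrative names, not part of xtuner) so both notebooks share the same config path, work dir and DeepSpeed preset; `DRY_RUN=1` prints the command instead of starting training:

```bash
#!/usr/bin/env bash
# Values copied from the commands in this issue
CONFIG=/code/llama3_8b_instruct_qlora_agentflan_3e.py
WORK_DIR=/userhome/llama3-8b-ft/agent-flan
DS_PRESET=deepspeed_zero3_offload
NODE0_ADDR=10.244.125.204   # IP of notebook1, the NODE_RANK=0 machine

# Hypothetical wrapper: $1 = node rank (0 on notebook1, 1 on notebook2)
launch_node() {
  case "$1" in
    0|1) ;;
    *) echo "usage: launch_node 0|1" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command that would run, without starting training
    echo "NPROC_PER_NODE=4 NNODES=2 PORT=12345 ADDR=$NODE0_ADDR NODE_RANK=$1" \
      "xtuner train $CONFIG --work-dir $WORK_DIR --deepspeed $DS_PRESET"
    return 0
  fi
  # Subshell so the cd does not change the caller's working directory
  (
    cd /code/ || exit 1
    NPROC_PER_NODE=4 NNODES=2 PORT=12345 ADDR="$NODE0_ADDR" NODE_RANK="$1" \
      xtuner train "$CONFIG" --work-dir "$WORK_DIR" --deepspeed "$DS_PRESET"
  )
}
```

Run `launch_node 0` in notebook1 and `launch_node 1` in notebook2; `DRY_RUN=1 launch_node 1` lets you compare the printed command against the one above before committing GPUs.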

Intermediate output from the run:
![image 10.png](/attachments/5b1f46a4-46b1-43b1-acae-d068c64feb00)


Reference: HswOAuth/llm_course#40