Lecture 3: Prompt Engineering Deployment Walkthrough and Video #89
Prepare an API Key
First, apply for an API key according to your needs.
Create the Development Environment
Since two-node, eight-GPU training will be needed later, first create 2 notebooks on the platform.
Open a notebook and enter VS Code
Open the folder and enter /code/ to go to the code directory
Open a terminal from the top-right corner
Set Environment Variables
Network configuration
Configure the HuggingFace mirror and network proxy on both servers:
export HF_HOME=/code/huggingface-cache/
export HF_ENDPOINT=https://hf-mirror.com
export http_proxy=http://10.10.9.50:3000
export https_proxy=http://10.10.9.50:3000
export no_proxy=localhost,127.0.0.1
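To confirm these settings took effect, a quick connectivity check can be run (a minimal sketch; requests reads http_proxy/https_proxy from the environment automatically):
import os
import requests

# 200 means HF_ENDPOINT is reachable through the configured proxy
resp = requests.get(os.environ.get("HF_ENDPOINT", "https://hf-mirror.com"), timeout=10)
print(resp.status_code)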
Set the corresponding API key
export ZHIPUAI_API_KEY=*****
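The key can be smoke-tested before moving on (a sketch assuming the zhipuai v2 Python SDK, installed with pip install zhipuai; the model name below is an assumption, use whichever GLM model your key covers):
import os
from zhipuai import ZhipuAI

client = ZhipuAI(api_key=os.environ["ZHIPUAI_API_KEY"])
resp = client.chat.completions.create(
    model="glm-4",  # assumed model name; substitute the model your key has access to
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(resp.choices[0].message.content)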
Configure the IB (InfiniBand) NICs
Enter the following on both servers:
export NCCL_DEBUG=INFO
export NCCL_IB_DISABLE=0
export NCCL_IB_HCA=mlx5
export NCCL_SOCKET_IFNAME=eth0
export GLOO_SOCKET_IFNAME=eth0
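Before launching the real job, the NCCL/IB setup can be sanity-checked with a tiny all_reduce (a sketch assuming PyTorch with CUDA; save it as nccl_check.py and launch it on both nodes with torchrun --nproc_per_node=4 --nnodes=2 --node_rank=<0|1> --master_addr=<master IP> --master_port=12345 nccl_check.py):
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT in the environment
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    t = torch.ones(1, device=f"cuda:{local_rank}")
    dist.all_reduce(t)  # with 2 nodes x 4 GPUs each, every rank should print 8.0
    print(f"rank {dist.get_rank()}: all_reduce -> {t.item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
With NCCL_DEBUG=INFO set above, the startup logs should also show the mlx5 HCA being selected for inter-node traffic.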
Deploy a Local Open-Source Model
Deploy the codellama-7b-hf model with FastChat
Open four terminals and run the following:
# Terminal 1: start the controller
python -m fastchat.serve.controller --host 0.0.0.0
# Terminal 2: start the model worker
python -m fastchat.serve.model_worker --model-path /dataset/CodeLlama-7b-hf/ --host 0.0.0.0 --num-gpus 4 --max-gpu-memory 15GiB
# Terminal 3: start the OpenAI-compatible API server on port 8000
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
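Before calling the model, it is worth confirming that the worker registered with the controller (a sketch; the FastChat controller listens on port 21001 by default and exposes a POST /list_models route):
import requests

# expect something like {"models": ["CodeLlama-7b-hf"]} once the worker is up
r = requests.post("http://localhost:21001/list_models")
print(r.json())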
Call the Model
# Terminal 4: send a completion request
curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
"model": "CodeLlama-7b-hf",
"prompt": "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. You must output the SQL query that answers the question. ### Input: Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska? ### Context: CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR) ### Response:",
"max_tokens": 41,
"temperature": 0.5
}'
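The same request can also be issued programmatically through the OpenAI-compatible endpoint (a sketch assuming openai>=1.0; FastChat ignores the API key, but the client requires a non-empty value):
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
completion = client.completions.create(
    model="CodeLlama-7b-hf",
    prompt="You are a powerful text-to-SQL model. ... ### Response:",  # same prompt as the curl call
    max_tokens=41,
    temperature=0.5,
)
print(completion.choices[0].text)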
Start Fine-Tuning
Get the master IP address
Pick either server (notebook) as the master and get its IP address with ifconfig
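If ifconfig is unavailable in the container, the same address can be read from Python (a sketch; opening a UDP socket sends no packets, it only asks the OS which local interface it would route through):
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    s.connect(("10.255.255.255", 1))  # any routable address works; nothing is transmitted
    print(s.getsockname()[0])         # the local IP of the outbound interface
finally:
    s.close()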
Launch training
Using the master IP address just obtained, the training launch commands are as follows.
On the first server, run:
NPROC_PER_NODE=4 NNODES=2 PORT=12345 ADDR=10.244.132.114 NODE_RANK=0 xtuner train llama2_7b_chat_qlora_sql_e3_copy.py --work-dir /code/xtuner-workdir --deepspeed deepspeed_zero3_offload
On the second server, run:
NPROC_PER_NODE=4 NNODES=2 PORT=12345 ADDR=10.244.132.114 NODE_RANK=1 xtuner train llama2_7b_chat_qlora_sql_e3_copy.py --work-dir /code/xtuner-workdir --deepspeed deepspeed_zero3_offload
After Training
Model conversion
Convert the LoRA training checkpoint into an HF-format model:
xtuner convert pth_to_hf /code/llama2_7b_chat_qlora_sql_e3_copy.py /code/xtuner-workdir/iter_500.pth/ /code/iter_500_hf/
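The converted directory is a PEFT LoRA adapter, so it can be loaded onto the base model for a quick check (a sketch; the base-model path below is an assumption, use the one from llama2_7b_chat_qlora_sql_e3_copy.py):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "/dataset/Llama-2-7b-chat-hf/",  # assumed base-model path; match your training config
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "/code/iter_500_hf/")  # attach the fine-tuned adapter
tokenizer = AutoTokenizer.from_pretrained("/dataset/Llama-2-7b-chat-hf/")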
Run the Test
Test the fine-tuned model with the following command:
python final_test.py
The results after fine-tuning are as follows: