llama3 OpenAI-compatible API experiment (AutoDL server) #161

Open
opened 2024-10-08 14:41:08 +08:00 by 12390900721cs · 0 comments

This experiment uses a server on the AutoDL platform to expose llama3 through an OpenAI-compatible API. Renting a pay-as-you-go GPU on AutoDL is inexpensive; the later LLaMA-Factory experiment uses the same setup.

This lesson belongs to the "local LLM deployment" series. Based on the Chinese-LLaMA-Alpaca-3 project (https://github.com/ymcui/Chinese-LLaMA-Alpaca-3), it shows how to wrap the model in a private, OpenAI-API-compatible endpoint and call that endpoint from the open-source ChatGPTNextWeb client.

Renting a server

First, rent an instance on the AutoDL console;

Then select the GPU you need;

Then choose the image to use and create the instance;

Once the instance is created, power it on and it is ready to use.

Setting up the experiment environment

First, download the code needed for the experiment:

# download the source
cd ~/ && wget https://file.huishiwei.top/Chinese-LLaMA-Alpaca-3-3.0.tar.gz
# extract the source
tar -xvf Chinese-LLaMA-Alpaca-3-3.0.tar.gz

Install miniconda:

# AutoDL images ship with a miniconda environment; if yours does not, install it as follows

# download the official installer
cd ~/ && wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# run the installer
bash Miniconda3-latest-Linux-x86_64.sh

Create and activate the virtual environment

# create a virtual environment with a pinned Python version
conda create -n chinese_llama_alpaca_3 python=3.8.17 pip -y
# note: restart the shell before activating the environment
conda activate chinese_llama_alpaca_3

Code bug fix

In Chinese-LLaMA-Alpaca-3-3.0/scripts/oai_api_demo/openai_api_server.py, modify generation_kwargs as follows:

generation_kwargs = dict(
    streamer=streamer,
    input_ids=input_ids,
    generation_config=generation_config,
    return_dict_in_generate=True,
    output_scores=False,
    max_new_tokens=max_new_tokens,
    repetition_penalty=float(repetition_penalty),
    pad_token_id=tokenizer.eos_token_id,  # newly added parameter
    eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]  # newly added parameter
)

These parameters are added to handle llama3's own stop token <|eot_id|>; without them, the streaming endpoint keeps repeating its output and never stops.
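Why passing a list of eos ids fixes the endless stream can be sketched without loading the model: Hugging Face-style generation halts as soon as the produced token is in the eos set, so <|eot_id|> must be in that set to end a chat turn. A minimal pure-Python simulation (the token ids below are illustrative, not the real vocabulary values):

```python
# Toy simulation of HF-style stopping: generation halts as soon as the
# produced token id is in the eos set. The ids below are invented.
EOS_TOKEN_ID = 128001   # hypothetical <|end_of_text|> id
EOT_ID = 128009         # hypothetical <|eot_id|> id (llama3 turn terminator)

def generate(token_stream, eos_token_id):
    """Collect tokens until one matches an eos id (int or list of ints)."""
    eos_set = set(eos_token_id) if isinstance(eos_token_id, list) else {eos_token_id}
    out = []
    for tok in token_stream:
        if tok in eos_set:
            break
        out.append(tok)
    return out

# llama3 ends a chat turn with <|eot_id|>, not <|end_of_text|>:
stream = [11, 22, 33, EOT_ID, 44, 55]

# With only the default eos id, <|eot_id|> is treated as ordinary text
# and generation runs past the end of the turn:
print(generate(stream, EOS_TOKEN_ID))            # [11, 22, 33, 128009, 44, 55]

# With both ids in the list, generation stops at the turn boundary:
print(generate(stream, [EOS_TOKEN_ID, EOT_ID]))  # [11, 22, 33]
```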

Install dependencies

First, back up the existing requirements file

mv requirements.txt requirements.bk.txt

Create a new requirements.txt and write the following dependencies into it

accelerate==0.30.0
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.3.0
async-timeout==4.0.3
attrs==23.2.0
bitsandbytes==0.43.1
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
datasets==2.20.0
deepspeed==0.13.1
dill==0.3.7
dnspython==2.6.1
einops==0.8.0
email_validator==2.1.1
exceptiongroup==1.2.1
fastapi==0.109.2
fastapi-cli==0.0.3
filelock==3.14.0
frozenlist==1.4.1
fsspec==2023.10.0
h11==0.14.0
hjson==3.1.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.3
idna==3.7
Jinja2==3.1.4
joblib==1.4.2
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
modelscope==1.17.1
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.15
networkx==3.1
ninja==1.11.1.1
numpy==1.24.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105
orjson==3.10.3
packaging==24.0
pandas==2.0.3
peft==0.7.1
psutil==5.9.8
py-cpuinfo==9.0.0
pyarrow==16.0.0
pyarrow-hotfix==0.6
pydantic==1.10.11
pydantic_core==2.18.2
Pygments==2.18.0
pynvml==11.5.0
python-dateutil==2.9.0.post0
python-decouple==3.8
python-dotenv==1.0.1
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
regex==2024.4.28
requests==2.32.3
rich==13.7.1
safetensors==0.4.3
scikit-learn==1.3.2
scipy==1.10.1
shellingham==1.5.4
shortuuid==1.0.13
six==1.16.0
sniffio==1.3.1
sse-starlette==2.1.0
starlette==0.36.3
sympy==1.12
threadpoolctl==3.5.0
tokenizers==0.19.1
torch==2.1.2
tqdm==4.66.4
transformers==4.41.2
triton==2.1.0
typer==0.12.3
typing_extensions==4.11.0
tzdata==2024.1
ujson==5.9.0
urllib3==2.2.1
uvicorn==0.29.0
uvloop==0.19.0
watchfiles==0.21.0
websockets==12.0
xxhash==3.4.1
yarl==1.9.4

Then install the dependencies:

pip install -r Chinese-LLaMA-Alpaca-3-3.0/requirements.txt -i https://mirrors.aliyun.com/pypi/simple

Starting the service

Check where the model is stored

ls -alhrt ~/.cache/modelscope/hub/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3
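The walkthrough assumes the model is already in the ModelScope cache. If it is not, it can be fetched first; a sketch assuming the ModelScope Python API (modelscope==1.17.1 is pinned in requirements.txt), with the repo id matching the cache path checked above:

```python
MODEL_ID = "ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3"

def fetch_model():
    """Download the model into the ModelScope cache (or reuse a cached
    copy) and return the local directory holding the weights."""
    # Lazy import so the module loads even without modelscope installed.
    from modelscope import snapshot_download
    return snapshot_download(MODEL_ID)

# fetch_model() downloads several GB of weights on first run; afterwards
# they live under ~/.cache/modelscope/hub/<MODEL_ID>.
```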

Start the server

python Chinese-LLaMA-Alpaca-3-3.0/scripts/oai_api_demo/openai_api_server.py --gpus 0 --base_model ~/.cache/modelscope/hub/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3
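Once the server is up, the OpenAI-compatible endpoint can be exercised directly before wiring up any client. A sketch using only the standard library; the port 19327 is an assumption (check the address openai_api_server.py prints at startup), and the model name is illustrative since OpenAI-style servers often ignore it:

```python
import json
import urllib.request

# Assumed endpoint; verify the host/port the demo server actually binds.
API_URL = "http://localhost:19327/v1/chat/completions"

def build_chat_request(messages, stream=False):
    """Build an OpenAI-style /v1/chat/completions payload."""
    return {"model": "llama-3-chinese-8b-instruct-v3",
            "messages": messages,
            "stream": stream}

def ask(content):
    """POST one user message and return the assistant's reply text.
    Only works while the server from the previous step is running."""
    payload = build_chat_request([{"role": "user", "content": content}])
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers put the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]

# With the server running:  print(ask("你好"))
```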

Testing the result

Download ChatGPTNextWeb

Download the ChatGPTNextWeb tool for Windows: https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/releases/download/v2.14.2/NextChat_2.14.2_x64-setup.exe

In its settings, point the client at the model endpoint deployed above

12390900721cs changed the title from "llama3 OpenAI-compatible API experiment" to "llama3 OpenAI-compatible API experiment (AutoDL server)" 2024-10-10 15:08:30 +08:00
Reference: HswOAuth/llm_course#161