04-Chinese LLaMA Alpaca series models: OpenAI API implementation (hands-on) - deploying a local "chatgpt" #24

Open
opened 2024-09-06 17:34:59 +08:00 by 12535224197cs · 0 comments

Server

step1: Download the source code
wget https://file.huishiwei.top/Chinese-LLaMA-Alpaca-3-3.0.tar.gz
tar -xvf Chinese-LLaMA-Alpaca-3-3.0.tar.gz
step2: Download the model
pip install modelscope -i https://mirrors.aliyun.com/pypi/simple
modelscope download --model ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3 --cache_dir /mnt/wksp/agi/models
![X.jpg](https://cdn.nlark.com/yuque/0/2024/jpeg/44993204/1725607920439-b314ff08-0bb3-47c6-a65f-1dad37a70582.jpeg?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0%2Finterlace%2C1)
step3: Create a virtual environment
conda create -n chinese_llama_alpaca_3 python=3.8.17 pip -y
conda activate chinese_llama_alpaca_3
cd Chinese-LLaMA-Alpaca-3-3.0/scripts/oai_api_demo/
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
step4: Modify the server script
Chinese-LLaMA-Alpaca-3-3.0/scripts/oai_api_demo/openai_api_server.py
Change ①: add stop tokens to prevent repeated replies (this handles Llama-3's own stop tokens; without it, the streaming endpoint keeps repeating content automatically and never stops)
Change ②: release resources when the user aborts a reply (before this change, in long-text generation scenarios the script kept holding GPU resources until the full result was generated, even if the user stopped the model's reply early)
![X.jpg](https://cdn.nlark.com/yuque/0/2024/jpeg/44993204/1725613356544-49c3ac0a-f725-4db3-9b88-cf17ea156355.jpeg?x-oss-process=image%2Fresize%2Cw_1142%2Climit_0%2Finterlace%2C1)
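As a rough illustration of what the two fixes accomplish (the real diff is in the screenshot; `LLAMA3_STOP_TOKENS`, `stream_generate`, and `is_disconnected` below are hypothetical names, not code from openai_api_server.py):

```python
# Llama-3 ends a turn with its own terminator tokens; if the server only
# checks the generic EOS token, streaming never stops and output repeats.
LLAMA3_STOP_TOKENS = {"<|eot_id|>", "<|end_of_text|>"}

def stream_generate(token_source, is_disconnected):
    """Yield tokens until a stop token appears or the client goes away."""
    for token in token_source:
        # Fix ①: honor Llama-3's own stop tokens so the stream terminates.
        if token in LLAMA3_STOP_TOKENS:
            break
        # Fix ②: abort generation (and free the GPU) once the client
        # has disconnected, instead of generating the full long text.
        if is_disconnected():
            break
        yield token

# The stream halts at <|eot_id|> instead of repeating forever:
tokens = ["你好", ",", "世界", "<|eot_id|>", "你好", ",", "世界"]
out = list(stream_generate(iter(tokens), is_disconnected=lambda: False))
print(out)  # → ['你好', ',', '世界']
```

In the actual FastAPI-based server, the disconnect check would correspond to something like `await request.is_disconnected()` inside the streaming loop.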
step5: Start the service
cd Chinese-LLaMA-Alpaca-3-3.0/scripts/oai_api_demo/
python openai_api_server.py --gpus 0,1 --base_model /mnt/wksp/agi/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3/
![X.jpg](https://cdn.nlark.com/yuque/0/2024/jpeg/44993204/1725613739504-10b2a0b3-82f5-490d-9480-00d2423c42e8.jpeg?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0%2Finterlace%2C1)
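Model loading can take a while, so it helps to wait until the server's port actually accepts connections before pointing a client at it. A small sketch (the port 19327 is an assumption — use whatever address openai_api_server.py prints on startup):

```python
import socket
import time

def wait_for_server(host="127.0.0.1", port=19327, timeout=120):
    """Poll until the given TCP port accepts connections, or give up."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # A successful TCP connect means the server is listening.
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(2)  # not up yet; retry shortly
    return False

# Example: wait_for_server() → True once the service from step5 is up.
```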

Client

step1: Client configuration
① Enter the API endpoint address of the running service
② Set a custom name for the local model
③ Select the model
![X.jpg](https://cdn.nlark.com/yuque/0/2024/jpeg/44993204/1725613772646-1e221a82-67ab-4c8e-9b14-d815baf047a1.jpeg?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0%2Finterlace%2C1)
step2: Chat with the deployed local model
![X.jpg](https://cdn.nlark.com/yuque/0/2024/png/44993204/1725613807020-97e11d22-5841-4f90-9e0f-f7fd201cfc65.png?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0)
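Besides a GUI client, the service can be called like any OpenAI-compatible chat-completions endpoint. A minimal sketch, assuming the server from step5 listens on 127.0.0.1:19327 (substitute the address it actually prints) and using the model name chosen in client step1 as an illustrative value:

```python
import json
import urllib.request

# Assumed endpoint; replace host/port with what your server reports.
API_URL = "http://127.0.0.1:19327/v1/chat/completions"

# Standard OpenAI chat-completions request body.
payload = {
    "model": "llama-3-chinese-8b-instruct-v3",  # the custom local model name
    "messages": [{"role": "user", "content": "你好,请介绍一下你自己。"}],
    "stream": False,
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server from step5 is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same endpoint also works with the official `openai` Python SDK by pointing its `base_url` at the local server.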

12535224197cs changed title from Chinese LLaMA Alpaca系列模型OpenAI API调用实现(跟练)-部署本地“chatgpt” to 04-Chinese LLaMA Alpaca系列模型OpenAI API调用实现(跟练)-部署本地“chatgpt” 2024-09-28 17:06:12 +08:00
Reference: HswOAuth/llm_course#24