05-私有化LLM仿OpenAI API接口的高可用工程实践(跟练) #32

New Issue

12535224197cs · 2024-09-07T17:33:17+08:00

12535224197cs commented

2024-09-07 17:33:17 +08:00

本地服务配置：#24 (comment)

step1: 1.启动gpu服务，端口号19300；2.启动cpu服务，端口号19333
python openai_api_server.py --gpus 0,1 --port 19300 --base_model /mnt/wksp/agi/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3/
python openai_api_server.py --only_cpu --port 19333 --base_model /mnt/wksp/agi/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3/

step2: 启动OneAPI
step3: OneAPI添加渠道

step4: OneAPI设置令牌

step5: nextchat配置

step6: 对话测试(gpu，cpu都可用)
gpu、cpu渠道都可用时，由OneAPI负责调度决定具体使用哪一个进行响应。
cpu响应

gpu响应

step7: 对话测试(仅cpu可用)

本地服务配置：https://hsw-git.huishiwei.cn/HswOAuth/llm_course/issues/24#issue-25 **step1: 1.启动gpu服务，端口号19300；2.启动cpu服务，端口号19333** python openai_api_server.py --gpus 0,1 --port 19300 --base_model /mnt/wksp/agi/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3/ python openai_api_server.py --only_cpu --port 19333 --base_model /mnt/wksp/agi/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3/ ![x.png](https://cdn.nlark.com/yuque/0/2024/jpeg/44993204/1725766605356-96a24c53-a313-46bf-a026-83a8d514664b.jpeg?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0%2Finterlace%2C1) **step2: 启动OneAPI** **step3: OneAPI添加渠道** ![x.png](https://cdn.nlark.com/yuque/0/2024/png/44993204/1725700194194-c9c117ad-fd04-4b56-9776-44e7f1b89ceb.png?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0) ![x.png](https://cdn.nlark.com/yuque/0/2024/png/44993204/1725700194073-2ef98822-3119-405d-b19a-216830436d88.png?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0) ![x.png](https://cdn.nlark.com/yuque/0/2024/png/44993204/1725700432721-d8470f47-fca8-4fe8-82d8-ee175fb0bb67.png?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0) **step4: OneAPI设置令牌** ![x.png](https://cdn.nlark.com/yuque/0/2024/png/44993204/1725700977755-32f87e1e-d110-4702-bb8a-d1edc636244f.png?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0) **step5: nextchat配置** ![x.png](https://cdn.nlark.com/yuque/0/2024/png/44993204/1725700824405-77927acd-4d6d-42c9-a4e2-1a175235ddb7.png?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0) **step6: 对话测试(gpu，cpu都可用)** gpu、cpu渠道都可用时，由OneAPI负责调度决定具体使用哪一个进行响应。 cpu响应 ![x.png](https://cdn.nlark.com/yuque/0/2024/png/44993204/1725700709283-88188030-48c9-41da-98fa-9c9ecf51d9d2.png?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0) gpu响应 ![x.png](https://cdn.nlark.com/yuque/0/2024/png/44993204/1725700709365-366eba90-f7e5-4648-8689-f405c6ff8999.png?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0) **step7: 对话测试(仅cpu可用)** <video src="/attachments/166cdf22-0824-43f3-be96-17b7ebe7c5f1" title="llam-cpu.mov" controls></video>

llam-cpu.mov

7.3 MiB

12535224197cs changed title from ~~私有化LLM仿OpenAI API接口的高可用工程实践(跟练)~~ to 05-私有化LLM仿OpenAI API接口的高可用工程实践(跟练)

2024-09-28 17:06:30 +08:00

Sign in to join this conversation.