12 Local Model Deployment with Ollama #38


Ollama

Ollama is an open-source framework for running large language models (LLMs) locally, designed to simplify the whole process. It packages model weights, configuration, and data into a single bundle defined by a Modelfile, making it easy to download, install, and use LLMs, even for users without deep technical expertise.

Ollama's main features include:

  1. Ease of use: simple to install and operate, suitable for users without broad technical knowledge.
  2. Flexibility: supports a wide range of LLMs and offers extensive customization options.
  3. Accessibility: as an open-source tool, Ollama is free for anyone to use.
  4. Model support: runs many large language models, such as Llama 2, Mistral, CodeLlama, Falcon, Vicuna, and Wizard.

Ollama suits many use cases, including studying LLM behavior and performance, building applications on top of LLMs, and teaching people about LLMs. It provides a clean, easy-to-use command-line interface and server for downloading, running, and managing open-source LLMs, along with a Docker-like packaging standard that makes model management more consistent and convenient.

Common commands

# Restart the ollama service
systemctl restart ollama
# Start the ollama service
systemctl start ollama
# Stop the ollama service
systemctl stop ollama
# Check the status of the ollama service
systemctl status ollama
# View the logs of the ollama service
journalctl -u ollama
# Enable start on boot
systemctl enable ollama
# Disable start on boot
systemctl disable ollama

Experiment environment

  1. OS: Linux remote server, accessed from a local machine
  2. GPU: RTX 3080
  3. Ollama version 0.3.8 (download: https://github.com/ollama/ollama/tags?after=v0.3.11-rc2)

Installing ollama

We install offline here, which is the fastest and most reliable approach.

Download the install script

curl -fsSL https://ollama.com/install.sh -o install.sh

Download the matching release package

We pick version 0.3.8 here, which is relatively stable.

Since the experiment runs on a remote server, download the package locally first and then upload it to the server.

Update the install script for the local environment

Point the script at the path of the locally uploaded package; the script then extracts that archive instead of downloading one. A sketch of the kind of change follows.
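The exact lines vary between versions of install.sh, so the following is only an illustrative sketch of the change; the tarball name and the OLLAMA_INSTALL_DIR variable are assumptions based on typical versions of the script, not its verbatim contents:

# In install.sh, comment out the remote download step, e.g.:
# curl -fsSL "https://ollama.com/download/ollama-linux-amd64.tgz" | $SUDO tar -xzf - -C "$OLLAMA_INSTALL_DIR"
# ...and extract the locally uploaded package instead:
$SUDO tar -xzf ./ollama-linux-amd64.tgz -C "$OLLAMA_INSTALL_DIR"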

Run the installer

sh install.sh

Verify the installation

Deploy the small qwen:0.5b model to confirm that Ollama was installed successfully.
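A minimal check; the qwen:0.5b tag is pulled automatically from the Ollama library on first run:

ollama run qwen:0.5b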

Running inference

Single-GPU inference

ollama run qwen2

Multi-GPU inference

Set the environment variable OLLAMA_SCHED_SPREAD=1; note that it must be set for the ollama server process, not just the client shell (see the sketch after the export command).

export OLLAMA_SCHED_SPREAD=1
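A minimal sketch, assuming the server is started manually in the same shell; exporting the variable in one shell does not affect an already-running systemd service, so for the systemd case set it in the unit as shown in the environment-variable section later:

# the variable must be visible to the server process before it starts
export OLLAMA_SCHED_SPREAD=1
ollama serve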

Interaction methods

Calling a model served by Ollama with the OpenAI Python SDK

# call_ollama.py
from openai import OpenAI
# Ollama exposes an OpenAI-compatible API under /v1; the api_key is required by the SDK but not checked by Ollama
#client = OpenAI(base_url="http://192.168.31.143:11434/v1", api_key="EMPTY")
client = OpenAI(base_url="http://127.0.0.1:11434/v1", api_key="EMPTY")

print(client.models.list())

completion = client.chat.completions.create(
  model="qwen:0.5b",
  messages=[
    {"role": "system", "content": "你是一个很有帮助的AI助手"},
    {"role": "user", "content": "2018年世界杯冠军是哪个国家?"},
    {"role": "assistant", "content": "法国。"},
    {"role": "user", "content": "决赛在哪里踢的?"}
  ],
  max_tokens=128
)

print(completion.choices[0].message.content)

Testing function calling

# call_ollama_tools.py
from openai import OpenAI
#client = OpenAI(base_url="http://192.168.31.50:11434/v1", api_key="EMPTY")
client = OpenAI(base_url="http://127.0.0.1:11434/v1", api_key="EMPTY")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    }
                },
                "required": ["order_id"],
                "additionalProperties": False
            }
        }
    }
]

messages = []
messages.append({"role": "system", "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."})
messages.append({"role": "user", "content": "Hi, can you tell me the delivery date for my order?"})
messages.append({"role": "assistant", "content": "Hi there! I can help with that. Can you please provide your order ID?"})
messages.append({"role": "user", "content": "i think it is order_12345"})

response = client.chat.completions.create(
    model='llama3.1:8b',

    messages=messages,
    tools=tools
)

print(response)
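The script above prints the raw response object. As a follow-up sketch, the tool call (if the model emitted one) can be extracted through the standard OpenAI SDK response fields:

# inspect the assistant message for tool calls
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name)       # e.g. "get_delivery_date"
    print(call.function.arguments)  # a JSON string such as '{"order_id": "order_12345"}'
else:
    print(message.content)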

Calling via curl

Note: change the IP address here to the IP address of your own server.

curl http://127.0.0.1:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "messages": [
  {
     "role": "system",
     "content": "以海盗的口吻简单作答。"
  },
  {
    "role": "user",
    "content": "天空为什么是蓝色的?"
  }
 ],
  "stream": false
 }'
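Since Ollama also exposes an OpenAI-compatible endpoint (the /v1 path used by the SDK examples above), the same kind of request can be sent to /v1/chat/completions; a sketch:

curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
  "model": "llama3.1:8b",
  "messages": [{"role": "user", "content": "天空为什么是蓝色的?"}]
}'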

Command-line mode

Interactive mode

ollama run qwen2  # enter interactive chat

One-shot mode

ollama run qwen2 "介绍下你自己" # add --verbose to report timing and throughput

Using Open WebUI

Start Open WebUI with the command below. Note that inside the container 127.0.0.1 refers to the container itself, so depending on your setup you may need to point OLLAMA_BASE_URL at the host's IP (or run with --network=host) for Open WebUI to reach Ollama:

sudo docker run -d -p 28080:8080 \
        -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
        -v open-webui:/app/backend/data \
        --name open-webui \
        --restart always \
        ghcr.io/open-webui/open-webui:main


If pulling the image is too slow, configure a Docker registry mirror:

sudo nano /etc/docker/daemon.json
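The mirror settings were only shown in the original screenshots; as a minimal sketch, a daemon.json with a registry mirror looks like this (the mirror URL is an example assumption; substitute one reachable from your network):

{
  "registry-mirrors": ["https://docker.m.daocloud.io"]
}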

Restart docker to apply the new configuration

sudo systemctl restart docker

Start Open WebUI again (if the previous attempt already created the container, remove it first with: sudo docker rm -f open-webui)

sudo docker run -d -p 28080:8080 \
        -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
        -v open-webui:/app/backend/data \
        --name open-webui \
        --restart always \
        ghcr.io/open-webui/open-webui:main

The result looks like this:

You can also register tools and documents for a model, which makes it very convenient to use:

[Workspace] --> [select a model] --> [edit configuration]

Add knowledge bases and tools.

Common settings and commands

The methods described below assume Ollama runs directly as a Linux process (under systemd), not inside Docker.

Default startup

cat /etc/systemd/system/ollama.service

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"

[Install]
WantedBy=default.target

Use the following commands to start, stop, and restart the ollama service:

# Restart the ollama service
systemctl restart ollama
# Start the ollama service
systemctl start ollama
# Stop the ollama service
systemctl stop ollama
# Check the status of the ollama service
systemctl status ollama
# View the logs of the ollama service
journalctl -u ollama
# Enable start on boot
systemctl enable ollama
# Disable start on boot
systemctl disable ollama

Common environment variables

OLLAMA_MODELS: where models are stored
OLLAMA_KEEP_ALIVE: how long a model stays loaded in GPU memory; defaults to 5 minutes, after which the memory is released
OLLAMA_MAX_LOADED_MODELS: maximum number of models loaded at the same time
OLLAMA_NUM_PARALLEL: number of requests served in parallel
OLLAMA_MAX_QUEUE: maximum number of queued requests (queue length)
OLLAMA_NOHISTORY: disables session history, which is otherwise stored in ${HOME}/.ollama/history by default
OLLAMA_SCHED_SPREAD: whether to spread inference across multiple GPUs (1 = yes, 0 = no)
OLLAMA_DEBUG: whether to enable debug mode with more verbose logging (1 = yes, 0 = no)

How do you modify environment variables?

  1. Create the ollama drop-in configuration directory:
# Create the ollama drop-in configuration directory
sudo mkdir -p /etc/systemd/system/ollama.service.d

  2. Write the following file:
sudo vi /etc/systemd/system/ollama.service.d/environment.conf

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="CUDA_VISIBLE_DEVICES=0,1"
Environment="OLLAMA_SCHED_SPREAD=1"

  3. Restart Ollama:
systemctl daemon-reload  # reload systemd unit files
systemctl restart ollama # restart ollama serve
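To confirm the drop-in took effect after the restart, one quick check with standard systemd tooling:

systemctl show ollama --property=Environment  # should list the overrides above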

Customizing models

  1. Ollama supports overriding model parameters to publish a new model, e.g. changing the system prompt:
# ./Modelfile
FROM llama3.1:8b
# Set the temperature: higher is more random, lower is more deterministic and conservative
PARAMETER temperature 1
# Set the context window length
PARAMETER num_ctx 4096
# Set the system prompt
SYSTEM You are Mario from super mario bros, acting as an assistant.

  2. Create the model:
ollama create super-mario -f ./Modelfile

  3. List models:
ollama list

  4. Run it to test:
ollama run super-mario:latest

Integrating with FastGPT

Here we go through installing OneAPI and FastGPT once more.

Install Docker

# Remove existing Docker packages and container runtimes
apt remove docker-ce docker-ce-cli containerd.io docker-compose-plugin docker docker-engine docker.io containerd runc

# Switch to a domestic (Aliyun) apt mirror
sed -i -r 's#http://(archive|security).ubuntu.com#https://mirrors.aliyun.com#g' /etc/apt/sources.list && apt update -y

### Install docker
# Install ca-certificates, curl, gnupg, lsb-release
apt install ca-certificates curl gnupg lsb-release -y
# Download Docker's GPG key and register it with apt-key
curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | apt-key add -
# Add the Aliyun repository for Docker
add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
# Refresh the package lists
apt -y update
# Install the Docker packages
apt install docker-ce docker-ce-cli containerd.io docker-compose-plugin -y

# Verify the installation
docker --version

Install docker compose

apt-get install docker-compose-plugin

# Verify the installation
docker compose version

Install OneAPI and FastGPT

Two files need to be prepared: docker-compose.yml and config.json, where config.json is FastGPT's configuration file.

First create a directory so that all fastgpt-related files live in one new folder:

mkdir fastgpt
cd fastgpt

Then prepare docker-compose.yml:

# The default database usernames and passwords only take effect on the first run
# If you change them, remember to update both the databases and the services' connection parameters; don't change just one side
# This file is for quick-start and testing only. For production, be sure to change the credentials and tune knowledge-base parameters, shared memory, etc.
# If dockerhub and git (ghcr) are unreachable, use the Aliyun registry instead (Aliyun has no arm images)

version: '3.3'
services:
  # db
  pg:
    #image: pgvector/pgvector:0.7.0-pg15 # docker hub
    image: registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0 # Aliyun
    container_name: pg
    restart: always
    ports: # avoid exposing this in production
      - 5432:5432
    networks:
      - fastgpt
    environment:
      # These settings only take effect on the first run. Changing them and restarting the container has no effect; delete the persisted data and restart for changes to apply
      - POSTGRES_USER=username
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=postgres
    volumes:
      - ./pg/data:/var/lib/postgresql/data
  mongo:
    #image: mongo:5.0.18 # dockerhub
    image: registry.cn-hangzhou.aliyuncs.com/fastgpt/mongo:5.0.18 # Aliyun
    # image: mongo:4.4.29 # use when the CPU does not support AVX
    container_name: mongo
    restart: always
    ports:
      - 27017:27017
    networks:
      - fastgpt
    command: mongod --keyFile /data/mongodb.key --replSet rs0
    environment:
      - MONGO_INITDB_ROOT_USERNAME=myusername
      - MONGO_INITDB_ROOT_PASSWORD=mypassword
    volumes:
      - ./mongo/data:/data/db
    entrypoint:
      - bash
      - -c
      - |
        openssl rand -base64 128 > /data/mongodb.key
        chmod 400 /data/mongodb.key
        chown 999:999 /data/mongodb.key
        echo 'const isInited = rs.status().ok === 1
        if(!isInited){
          rs.initiate({
              _id: "rs0",
              members: [
                  { _id: 0, host: "mongo:27017" }
              ]
          })
        }' > /data/initReplicaSet.js
        # Start the MongoDB service
        exec docker-entrypoint.sh "$$@" &

        # Wait for the MongoDB service to come up
        until mongo -u myusername -p mypassword --authenticationDatabase admin --eval "print('waited for connection')" > /dev/null 2>&1; do
          echo "Waiting for MongoDB to start..."
          sleep 2
        done

        # Run the replica-set initialization script
        mongo -u myusername -p mypassword --authenticationDatabase admin /data/initReplicaSet.js

        # Wait on the MongoDB server process started by docker-entrypoint.sh
        wait $$!

  # fastgpt
  sandbox:
    container_name: sandbox
    # image: ghcr.io/labring/fastgpt-sandbox:latest # git
    image: registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-sandbox:latest # Aliyun
    networks:
      - fastgpt
    restart: always
  fastgpt:
    container_name: fastgpt
    # image: ghcr.io/labring/fastgpt:v4.8.8-fix2 # git
    image: registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt:v4.8.8-fix2 # Aliyun
    ports:
      - 3000:3000
    networks:
      - fastgpt
    depends_on:
      - mongo
      - pg
      - sandbox
    restart: always
    environment:
      # root password; the username is root. To change the root password, edit this variable and restart.
      - DEFAULT_ROOT_PSW=1234
      # Base URL of the AI model API; the /v1 suffix is required. This defaults to the OneAPI address.
      - OPENAI_BASE_URL=http://oneapi:3000/v1
      # API key for the AI model API (this defaults to OneAPI's quick-start key; change it once testing passes)
      - CHAT_API_KEY=sk-fastgpt
      # Maximum number of database connections
      - DB_MAX_LINK=30
      # Login token signing key
      - TOKEN_KEY=any
      # Root key, used for initialization requests during upgrades
      - ROOT_KEY=root_key
      # File-access token key
      - FILE_TOKEN_KEY=filetoken
      # MongoDB connection string: username myusername, password mypassword
      - MONGODB_URI=mongodb://myusername:mypassword@mongo:27017/fastgpt?authSource=admin
      # PostgreSQL connection string
      - PG_URL=postgresql://username:password@pg:5432/postgres
      # Sandbox address
      - SANDBOX_URL=http://sandbox:3000
      # Log level: debug, info, warn, error
      - LOG_LEVEL=info
      - STORE_LOG_LEVEL=warn
    volumes:
      - ./config.json:/app/data/config.json

  # oneapi
  mysql:
    image: registry.cn-hangzhou.aliyuncs.com/fastgpt/mysql:8.0.36 # Aliyun
    # image: mysql:8.0.36
    container_name: mysql
    restart: always
    ports:
      - 3306:3306
    networks:
      - fastgpt
    command: --default-authentication-plugin=mysql_native_password
    environment:
      # Default root password; only effective on the first run
      MYSQL_ROOT_PASSWORD: oneapimmysql
      MYSQL_DATABASE: oneapi
    volumes:
      - ./mysql:/var/lib/mysql
  oneapi:
    container_name: oneapi
    # image: ghcr.io/songquanpeng/one-api:v0.6.7
    image: registry.cn-hangzhou.aliyuncs.com/fastgpt/one-api:v0.6.6 # Aliyun
    ports:
      - 3001:3000
    depends_on:
      - mysql
    networks:
      - fastgpt
    restart: always
    environment:
      # MySQL connection parameters
      - SQL_DSN=root:oneapimmysql@tcp(mysql:3306)/oneapi
      # Session encryption secret
      - SESSION_SECRET=oneapikey
      # In-memory cache
      - MEMORY_CACHE_ENABLED=true
      # Enable batched updates to reduce database write frequency
      - BATCH_UPDATE_ENABLED=true
      # Batch update interval
      - BATCH_UPDATE_INTERVAL=10
      # Initial root token (change it after deployment, otherwise it is easy to leak)
      - INITIAL_ROOT_TOKEN=fastgpt
    volumes:
      - ./oneapi:/data
networks:
  fastgpt:

Then prepare config.json:

{
  "feConfigs": {
    "lafEnv": "https://laf.dev"
  },
  "systemEnv": {
    "vectorMaxProcess": 15,
    "qaMaxProcess": 15,
    "pgHNSWEfSearch": 100
  },
  "llmModels": [
    {
        "model": "llama-3.1-instruct",
        "name": "llama-3.1-instruct",
        "maxContext": 128000,
        "maxResponse": 128000,
        "quoteMaxToken": 32000,
        "maxTemperature": 1.2,
      "charsPointsPrice": 0,
      "censor": false,
      "vision": false,
      "datasetProcess": false,
      "usedInClassify": true,
      "usedInExtractFields": true,
      "usedInToolCall": true,
      "usedInQueryExtension": true,
      "toolChoice": false,
      "functionCall": false,
      "customCQPrompt": "",
      "customExtractPrompt": "",
      "defaultSystemChatPrompt": "",
      "defaultConfig": {}
    },
    {
        "model": "qwen2-instruct",
        "name": "qwen2-instruct",
        "avatar": "/imgs/model/qwen.svg",
        "maxContext": 128000,
        "maxResponse": 128000,
        "quoteMaxToken": 32000,
        "maxTemperature": 1.2,
        "charsPointsPrice": 0,
        "censor": false,
        "vision": false,
        "datasetProcess": true,
        "usedInClassify": true,
        "usedInExtractFields": true,
        "usedInToolCall": true,
        "usedInQueryExtension": true,
        "toolChoice": false,
        "functionCall": false,
        "customCQPrompt": "",
        "customExtractPrompt": "",
        "defaultSystemChatPrompt": "",
        "defaultConfig": {}
    },
    {
        "model": "glm-4v",
        "name": "glm-4v",
        "avatar": "/imgs/model/chatglm.svg",
        "maxContext": 128000,
        "maxResponse": 128000,
        "quoteMaxToken": 32000,
        "maxTemperature": 1.2,
        "charsPointsPrice": 0,
        "censor": false,
        "vision": true,
        "datasetProcess": true,
        "usedInClassify": true,
        "usedInExtractFields": true,
        "usedInToolCall": true,
        "usedInQueryExtension": true,
        "toolChoice": true,
        "functionCall": false,
        "customCQPrompt": "",
        "customExtractPrompt": "",
        "defaultSystemChatPrompt": "",
        "defaultConfig": {}
    },
    {
        "model": "ERNIE-Speed-128K",
        "name": "ERNIE-Speed-128K",
	"avatar": "/imgs/model/ernie.svg",
        "maxContext": 128000,
        "maxResponse": 128000,
        "quoteMaxToken": 32000,
        "maxTemperature": 1.2,
        "charsPointsPrice": 0,
        "censor": false,
        "vision": false,
        "datasetProcess": true,
        "usedInClassify": true,
        "usedInExtractFields": true,
        "usedInToolCall": true,
        "usedInQueryExtension": true,
        "toolChoice": true,
        "functionCall": false,
        "customCQPrompt": "",
        "customExtractPrompt": "",
        "defaultSystemChatPrompt": "",
        "defaultConfig": {}
    },
    {
      "model": "gpt-4o-mini",
      "name": "gpt-4o-mini",
      "avatar": "/imgs/model/openai.svg",
      "maxContext": 125000,
      "maxResponse": 4000,
      "quoteMaxToken": 120000,
      "maxTemperature": 1.2,
      "charsPointsPrice": 0,
      "censor": false,
      "vision": true,
      "datasetProcess": true,
      "usedInClassify": true,
      "usedInExtractFields": true,
      "usedInToolCall": true,
      "usedInQueryExtension": true,
      "toolChoice": true,
      "functionCall": false,
      "customCQPrompt": "",
      "customExtractPrompt": "",
      "defaultSystemChatPrompt": "",
      "defaultConfig": {}
    },
    {
      "model": "gpt-4o",
      "name": "gpt-4o",
      "avatar": "/imgs/model/openai.svg",
      "maxContext": 125000,
      "maxResponse": 4000,
      "quoteMaxToken": 120000,
      "maxTemperature": 1.2,
      "charsPointsPrice": 0,
      "censor": false,
      "vision": true,
      "datasetProcess": false,
      "usedInClassify": true,
      "usedInExtractFields": true,
      "usedInToolCall": true,
      "usedInQueryExtension": true,
      "toolChoice": true,
      "functionCall": false,
      "customCQPrompt": "",
      "customExtractPrompt": "",
      "defaultSystemChatPrompt": "",
      "defaultConfig": {}
    }
  ],
  "vectorModels": [
    {
      "model": "m3e-base",
      "name": "m3e-base",
      "charsPointsPrice": 0,
      "defaultToken": 256,
      "maxToken": 512,
      "weight": 100,
      "defaultConfig": {},
      "dbConfig": {},
      "queryConfig": {}
    },
    {
      "model": "text-embedding-ada-002",
      "name": "Embedding-2",
      "avatar": "/imgs/model/openai.svg",
      "charsPointsPrice": 0,
      "defaultToken": 700,
      "maxToken": 3000,
      "weight": 100,
      "defaultConfig": {},
      "dbConfig": {},
      "queryConfig": {}
    },
    {
      "model": "text-embedding-3-large",
      "name": "text-embedding-3-large",
      "avatar": "/imgs/model/openai.svg",
      "charsPointsPrice": 0,
      "defaultToken": 512,
      "maxToken": 3000,
      "weight": 100,
      "defaultConfig": {
        "dimensions": 1024
      }
    },
    {
      "model": "embeding3",
      "name": "embeding3",
      "avatar": "/imgs/model/chatglm.svg",
      "charsPointsPrice": 0,
      "defaultToken": 512,
      "maxToken": 3000,
      "weight": 100
    }
  ],
  "reRankModels": [
    {
      "model": "bge-reranker-v2-m3",
      "name": "bge-reranker-v2-m3"
    }
  ],
  "audioSpeechModels": [
    {
      "model": "tts-1",
      "name": "OpenAI TTS1",
      "charsPointsPrice": 0,
      "voices": [
        { "label": "Alloy", "value": "alloy", "bufferId": "openai-Alloy" },
        { "label": "Echo", "value": "echo", "bufferId": "openai-Echo" },
        { "label": "Fable", "value": "fable", "bufferId": "openai-Fable" },
        { "label": "Onyx", "value": "onyx", "bufferId": "openai-Onyx" },
        { "label": "Nova", "value": "nova", "bufferId": "openai-Nova" },
        { "label": "Shimmer", "value": "shimmer", "bufferId": "openai-Shimmer" }
      ]
    }
  ],
  "whisperModel": {
    "model": "whisper-1",
    "name": "Whisper1",
    "charsPointsPrice": 0
  }
}

Start the services

Run the following in the fastgpt directory (the compose plugin installed earlier provides the docker compose subcommand, so use it rather than the legacy docker-compose binary):

docker compose up -d
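Once the containers are up, a quick sanity check (ports follow the compose file above):

docker compose ps   # all services should show Up
# FastGPT UI:  http://<server-ip>:3000  (user root, password DEFAULT_ROOT_PSW, 1234 by default)
# OneAPI UI:   http://<server-ip>:3001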