11 - Multimodal - BLIP #220

Open · opened 2024-10-17 22:45:56 +08:00 by 12535224197cs · 0 comments

Environment: NVIDIA GeForce RTX 3080 (10240 MiB)
GPU memory used by the step5 task: 1260 MiB
GPU memory used by the step6 task: 1888 MiB

## step1: Download the source code

```
# clone the BLIP repository
git clone https://github.com/salesforce/BLIP.git
```

step2: 配置环境

cd BLIP/
conda create -n test_blip python=3.8
conda activate test_blip
pip install -r requirements.txt
pip install -U huggingface_hub
设置huggingface⽹站镜像
export HF_HOME=/mnt/wksp/agi2/11/data/
export HF_ENDPOINT=https://hf-mirror.com
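To confirm the mirror settings are actually picked up, a minimal check (assumes a recent `huggingface_hub`; its `constants` module resolves these values from the environment variables at import time):

```python
# Run in a shell where HF_HOME / HF_ENDPOINT are exported as above.
from huggingface_hub import constants

print(constants.ENDPOINT)  # expected: https://hf-mirror.com
print(constants.HF_HOME)   # expected: /mnt/wksp/agi2/11/data/
```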

## step3: Download the models

```
# download the bert-base-uncased model
huggingface-cli download --resume-download --local-dir-use-symlinks False google-bert/bert-base-uncased --local-dir bert-base-uncased
# download the Image Captioning checkpoint
wget https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_caption_capfilt_large.pth
# download the VQA checkpoint
wget https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_vqa_capfilt_large.pth
```
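To verify a `.pth` file downloaded completely, it can be loaded and inspected; a small sketch (BLIP checkpoints are plain `torch.save` dicts whose `model` entry typically holds the state dict):

```python
# quick integrity check for a downloaded BLIP checkpoint
import torch

ckpt = torch.load('model_base_caption_capfilt_large.pth', map_location='cpu')
print(ckpt.keys())         # typically includes 'model'
print(len(ckpt['model']))  # number of parameter tensors in the state dict
```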

## step4: Adjust the model file paths

Directory layout of the downloaded models:
![a.png](https://cdn.nlark.com/yuque/0/2024/png/44993204/1729217060175-8ee86729-e2f4-44a2-9214-0ae28577255d.png)
Because the bert-base-uncased model was not placed under the BLIP directory, the model-loading path in BLIP/models/blip.py was changed accordingly:
![a.png](https://cdn.nlark.com/yuque/0/2024/png/44993204/1729175559936-289cd3da-4d8e-43fa-8c70-610b0d1eb097.png)
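For reference, the edit boils down to pointing `init_tokenizer()` at the local copy. A minimal sketch of the modified function in BLIP/models/blip.py, assuming bert-base-uncased sits at `/path/to/bert-base-uncased` (a hypothetical path; substitute the `--local-dir` used in step3):

```python
from transformers import BertTokenizer

def init_tokenizer():
    # originally: BertTokenizer.from_pretrained('bert-base-uncased'),
    # which pulls from the Hub; load the local download instead.
    tokenizer = BertTokenizer.from_pretrained('/path/to/bert-base-uncased')  # hypothetical path
    tokenizer.add_special_tokens({'bos_token': '[DEC]'})
    tokenizer.add_special_tokens({'additional_special_tokens': ['[ENC]']})
    tokenizer.enc_token_id = tokenizer.additional_special_tokens_ids[0]
    return tokenizer
```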

step5: 使⽤微调的 BLIP 模型(Image Captioning模型)执⾏图像的Image Captioning⽣成

a.png
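The screenshot and recording above show the run; as a reproducible equivalent, here is a minimal inference sketch modeled on the repo's demo.ipynb (the image file `demo.jpg` and the local checkpoint filename are assumptions; adjust to your paths):

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode
from models.blip import blip_decoder  # run from inside the BLIP/ directory

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
image_size = 384  # the captioning checkpoint expects 384x384 inputs

# preprocessing pipeline used by BLIP's demo notebook
transform = transforms.Compose([
    transforms.Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])

# assumption: demo.jpg is any test image in the working directory
image = transform(Image.open('demo.jpg').convert('RGB')).unsqueeze(0).to(device)
model = blip_decoder(pretrained='model_base_caption_capfilt_large.pth',
                     image_size=image_size, vit='base')
model.eval().to(device)

with torch.no_grad():
    caption = model.generate(image, sample=False, num_beams=3,
                             max_length=20, min_length=5)
print('caption:', caption[0])
```

`sample=False` with `num_beams=3` selects beam search, so the caption is deterministic, matching the demo's behavior.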

## step6: Run the VQA task with the model

![a.png](https://cdn.nlark.com/yuque/0/2024/png/44993204/1729175600671-75dcafe5-affa-4358-8bc6-917ccb6483d9.png?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0)

<video src="/attachments/48855728-2814-4cd0-b691-321b4d739d81" title="blip_base_vqa.mov" controls></video>
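Likewise, a minimal VQA sketch following the repo's demo notebook (again, `demo.jpg`, the question text, and the local checkpoint filename are assumptions):

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode
from models.blip_vqa import blip_vqa  # run from inside the BLIP/ directory

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
image_size = 480  # the VQA checkpoint expects 480x480 inputs

transform = transforms.Compose([
    transforms.Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])

image = transform(Image.open('demo.jpg').convert('RGB')).unsqueeze(0).to(device)
model = blip_vqa(pretrained='model_base_vqa_capfilt_large.pth',
                 image_size=image_size, vit='base')
model.eval().to(device)

question = 'where is the dog?'  # example question
with torch.no_grad():
    answer = model(image, question, train=False, inference='generate')
print('answer:', answer[0])
```

`inference='generate'` makes the model decode a free-form answer instead of ranking a fixed answer list.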
