Company is buying a server: question about GPU configuration #265
Reference: HswOAuth/llm_course#265
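Hello, teacher! I have a question. According to what I found on Baidu, training an 11B model needs the following GPU memory: total training memory = model memory + optimizer memory + gradient memory + activation memory = 26 GB + 12 × 13 GB + 26 GB + 14.2 GB = 222.2 GB. I'm not sure whether this is the right way to calculate it. Our company is buying a server at the moment, so I'd like some advice. By this estimate, an ordinary computer simply cannot train a large model. Or could you recommend a configuration?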
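You can use this configuration as a reference. GPU memory consumption depends on the training requirements. One server with four A100s (40 GB of memory each) can meet basic requirements, but for the training we usually do (4K context) we generally need four 80 GB A100s.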
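For anyone who wants to redo the arithmetic from the question, here is a minimal sketch. It assumes the figures quoted there: 2 bytes per parameter for weights and for gradients, 12 bytes per parameter for optimizer state (commonly explained as fp32 master weights plus two Adam moments under mixed precision), and a fixed 14.2 GB of activations. Note that 26 GB at 2 bytes per parameter corresponds to 13 billion parameters, so the quoted numbers appear to describe a 13B model. The function names are made up for illustration, and the comparison against total GPU memory is a rough aggregate check that ignores sharding overhead, fragmentation, and memory-saving techniques such as LoRA or offloading.

```python
def training_memory_gb(params_billion, weight_bytes=2, grad_bytes=2,
                       optimizer_bytes=12, activation_gb=14.2):
    """Rough full-training memory estimate in GB (1 GB = 1e9 bytes).

    All defaults are assumptions taken from the figures in the question,
    not measured values.
    """
    # 1e9 parameters * N bytes per parameter = N GB
    model = params_billion * weight_bytes
    gradients = params_billion * grad_bytes
    optimizer = params_billion * optimizer_bytes
    total = model + gradients + optimizer + activation_gb
    return {
        "model": model,
        "gradients": gradients,
        "optimizer": optimizer,
        "activations": activation_gb,
        "total": total,
    }


def fits_in_aggregate(total_gb, num_gpus, gb_per_gpu):
    """Naive check: does the estimate fit in the summed memory of all GPUs?"""
    return total_gb <= num_gpus * gb_per_gpu


if __name__ == "__main__":
    estimate = training_memory_gb(13)
    for name, value in estimate.items():
        print(f"{name:>12}: {value:.1f} GB")
    for gb_per_gpu in (40, 80):
        ok = fits_in_aggregate(estimate["total"], 4, gb_per_gpu)
        print(f"4 x A100 {gb_per_gpu} GB ({4 * gb_per_gpu} GB total): fits = {ok}")
```

Under these assumptions the estimate exceeds the 160 GB of a 4 × 40 GB machine and fits within the 320 GB of a 4 × 80 GB machine, so the smaller setup would likely need a lighter training method or shorter context for a model of this size.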