[Help] Problems encountered during Xtuner training 21536073571cs #135
Reference: HswOAuth/llm_course#135
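I'm following the instructor's course and wanted to redo the exercise from the Xtuner lesson.
My steps were as follows:
I started a training job, which is split into two tasks: task0 and task1.
/Users/louisunec/Desktop/Screenshot 2024-09-27 at 2.06.55 pm.png
I prepared the environment according to the course materials.
/Users/louisunec/Desktop/Screenshot 2024-09-27 at 2.10.32 pm.png
Then I launched the multi-node, multi-GPU training as instructed and got an error
saying it could not connect to the network.
/Users/louisunec/Desktop/Screenshot 2024-09-27 at 2.20.00 pm.png
The screenshot shows the error I got after re-running it.
Could you please help me resolve this issue?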
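The error shows that the processes in the two notebooks timed out while trying to establish communication with each other. Most likely the launch command was written incorrectly.
Could you post the commands you ran in both notebooks?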
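While collecting the commands, it may also help to rule out a plain connectivity problem between the two notebooks. The following is only a minimal sketch, not part of the course material: `can_reach` is a hypothetical helper name, and the address and port in the usage comment are placeholders that you would replace with task0's real IP and whatever port you passed to the launcher (29500 is assumed here only because it is a common PyTorch default).

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage from the task1 notebook (placeholder values, replace with your own):
# print(can_reach("<task0 IP>", 29500))
```

Note that this only returns True while something is actually listening on that port, so run it from task1 after the launch command has been started on task0 and is waiting for the other node. If it returns False at that point, the address or port in the command is a likely suspect.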
Alternatively, check the following steps, taking task0 as the master as an example: