【作业贴】29-25.3.14-function call多机多卡训练(下) #89
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
训练结果如下:


03/22 04:16:09 - mmengine - INFO - Iter(train) [4250/4296] lr: 6.2743e-08 eta: 0:27:58 time: 35.6854 data_time: 0.0198 memory: 6911 loss: 0.2638 tflops: 3.2195 tokens_per_sec: 47.9285
[2025-03-22 04:19:37,131] [WARNING] [stage3.py:1998:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
03/22 04:22:03 - mmengine - INFO - Iter(train) [4260/4296] lr: 3.8886e-08 eta: 0:21:53 time: 35.4227 data_time: 0.0193 memory: 6911 loss: 0.2344 tflops: 2.9454 tokens_per_sec: 43.8467
03/22 04:27:48 - mmengine - INFO - Iter(train) [4270/4296] lr: 2.0707e-08 eta: 0:15:48 time: 34.4737 data_time: 0.0188 memory: 6911 loss: 0.3555 tflops: 2.7921 tokens_per_sec: 41.7145
[2025-03-22 04:29:00,366] [WARNING] [stage3.py:1998:step] 7 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
03/22 04:33:55 - mmengine - INFO - Iter(train) [4280/4296] lr: 8.2093e-09 eta: 0:09:43 time: 36.7315 data_time: 0.0190 memory: 6909 loss: 0.3427 tflops: 2.9707 tokens_per_sec: 44.2234
[2025-03-22 04:38:41,017] [WARNING] [stage3.py:1998:step] 8 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
03/22 04:39:55 - mmengine - INFO - Iter(train) [4290/4296] lr: 1.3919e-09 eta: 0:03:38 time: 36.0362 data_time: 0.0185 memory: 6909 loss: 0.2530 tflops: 2.7508 tokens_per_sec: 41.0541
03/22 04:43:26 - mmengine - INFO - after_train_iter in EvaluateChatHook.
03/22 04:49:58 - mmengine - INFO - Sample output:
<|start_header_id|>system<|end_header_id|>
You are a assistant who can utilize external tools.
[{'name': 'ArxivSearch.get_arxiv_article_information','description': 'Run Arxiv search and get the article meta information.','parameters': [{'name': 'query', 'type': 'STRING', 'description':'the content of search query'}],'required': ['query'],'return_data':[{'name': 'content', 'description': 'a list of 3 arxiv search papers', 'type': 'STRING'}],'parameter_description': 'If you call this tool, you must pass arguments in the JSON format{key: value}, where the key is the parameter name.'}]
To use a tool, please use the following format:
The response after utilizing tools should using the following format:
If you already know the answer, or you do not need to use tools,
please using the following format to reply:
Begin!<|eot_id|><|start_header_id|>user<|end_header_id|>
Please help me search the InternLM2 Technical Report.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Thought:Sure, I can help you with that. Could you please provide me with more details about the InternLM2 Technical Report you want to search for? Such as the author or any specific keywords related to the report.
Action:ArxivSearch.get_arxiv_article_information
Action Input:query=InternLM2 Technical Report<|eot_id|>
03/22 04:49:58 - mmengine - INFO - Saving checkpoint at 4296 iterations
03/22 04:51:04 - mmengine - INFO - after_train in EvaluateChatHook.
03/22 04:57:42 - mmengine - INFO - Sample output:
<|start_header_id|>system<|end_header_id|>
You are a assistant who can utilize external tools.
[{'name': 'ArxivSearch.get_arxiv_article_information','description': 'Run Arxiv search and get the article meta information.','parameters': [{'name': 'query', 'type': 'STRING', 'description':'the content of search query'}],'required': ['query'],'return_data':[{'name': 'content', 'description': 'a list of 3 arxiv search papers', 'type': 'STRING'}],'parameter_description': 'If you call this tool, you must pass arguments in the JSON format{key: value}, where the key is the parameter name.'}]
To use a tool, please use the following format:
The response after utilizing tools should using the following format:
If you already know the answer, or you do not need to use tools,
please using the following format to reply:
Begin!<|eot_id|><|start_header_id|>user<|end_header_id|>
Please help me search the InternLM2 Technical Report.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Thought:Sure, I can help you with that. Could you please provide me with more details about the InternLM2 Technical Report you want to search for? Such as the author or any specific keywords related to the report.
Action:ArxivSearch.get_arxiv_article_information
Action Input:query=InternLM2 Technical Report<|eot_id|>