=======
root@zf4909066b964784aa197ec174fed0d1-task0-0:/# python -m fastchat.serve.controller --host 0.0.0.0
2025-01-16 14:27:20 | INFO | controller | args: Namespace(host='0.0.0.0', port=21001, dispatch_method='shortest_queue', ssl=False)
2025-01-16 14:27:20 | ERROR | stderr | INFO:     Started server process [57]
2025-01-16 14:27:20 | ERROR | stderr | INFO:     Waiting for application startup.
2025-01-16 14:27:20 | ERROR | stderr | INFO:     Application startup complete.
2025-01-16 14:27:20 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:21001 (Press CTRL+C to quit)
2025-01-16 14:28:07 | INFO | controller | Register a new worker: http://localhost:21002
2025-01-16 14:28:07 | INFO | controller | Register done: http://localhost:21002, {'model_names': ['glm-4-9b-chat'], 'speed': 1, 'queue_length': 0}
2025-01-16 14:28:07 | INFO | stdout | INFO:     127.0.0.1:43232 - "POST /register_worker HTTP/1.1" 200 OK
2025-01-16 14:28:52 | INFO | controller | Receive heart beat. http://localhost:21002
2025-01-16 14:28:52 | INFO | stdout | INFO:     127.0.0.1:43284 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2025-01-16 14:29:33 | INFO | stdout | INFO:     127.0.0.1:43336 - "POST /list_models HTTP/1.1" 200 OK

======
root@zf4909066b964784aa197ec174fed0d1-task0-0:/# python -m fastchat.serve.model_worker --model-path /dataset/glm-4-9b-chat/ --host 0.0.0.0 --num-gpus 4 --max-gpu-memory 15GiB
2025-01-16 14:27:53 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=21002, worker_address='http://localhost:21002', controller_address='http://localhost:21001', model_path='/dataset/glm-4-9b-chat/', revision='main', device='cuda', gpus=None, num_gpus=4, max_gpu_memory='15GiB', dtype=None, load_8bit=False, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=None, conv_template=None, embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False)
2025-01-16 14:27:53 | INFO | model_worker | Loading the model ['glm-4-9b-chat'] on worker a0d32b20 ...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards:   0%|          | 0/10 [00:00<?, ?it/s]

======
root@zf4909066b964784aa197ec174fed0d1-task0-0:/# curl http://localhost:8000/v1/completions \
>   -H "Content-Type: application/json" \
>   -d '{
>     "model": "glm-4-9b-chat",
>     "prompt": "Once upon a time",
>     "max_tokens": 41,
>     "temperature": 0.5}'
Internal Server Error
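
To see what is actually failing behind curl's bare "Internal Server Error", it helps to replay the request from Python and print the full response body. Below is a minimal debugging sketch, assuming the OpenAI-compatible server is fastchat.serve.openai_api_server listening on its documented default port 8000 (its launch command is not captured in the panes above). It first asks the API server which models it can see, then retries the same completion request from the transcript:

    # Hypothetical debugging script; host and port assume FastChat's
    # documented defaults (openai_api_server at http://localhost:8000).
    import requests

    BASE = "http://localhost:8000"

    # 1. Confirm the API server can see the worker's model. If
    #    'glm-4-9b-chat' is absent here, the worker never finished
    #    loading or registering, which by itself would explain the 500.
    models = requests.get(f"{BASE}/v1/models", timeout=30)
    print(models.status_code, models.text)

    # 2. Replay the failing completion request with the same parameters
    #    as the curl above, and dump the raw body, which usually carries
    #    the server-side error message that curl does not show.
    resp = requests.post(
        f"{BASE}/v1/completions",
        json={
            "model": "glm-4-9b-chat",
            "prompt": "Once upon a time",
            "max_tokens": 41,
            "temperature": 0.5,
        },
        timeout=120,
    )
    print(resp.status_code)  # 500 in the failing run above
    print(resp.text)         # full error body instead of curl's one-liner

Note that the model worker pane above cuts off at 0/10 checkpoint shards, so the /v1/models check is also useful for distinguishing a not-yet-loaded model from a genuine inference error.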