2025-09-22 13:50:23,762 xinference.core.worker 1589 INFO [request 082e861e-9778-11f0-bad1-0242ac110009] Enter launch_builtin_model, args: , kwargs: model_uid=m3e-large-0,model_name=m3e-large,model_size_in_billions=None,model_format=pytorch,quantization=none,model_engine=vllm,model_type=embedding,n_gpu=auto,request_limits=None,peft_model_config=None,gpu_idx=None,download_hub=None,model_path=None,enable_virtual_env=None,virtual_env_packages=None,envs=None,xavier_config=None
/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
INFO 09-22 13:50:29 [__init__.py:216] Automatically detected platform cuda.
2025-09-22 13:50:31,226 xinference.core.model 2312 INFO Start requests handler.
INFO 09-22 13:50:31 [utils.py:328] non-default args: {'task': 'embed', 'disable_log_stats': True, 'model': '/root/.xinference/cache/v2/m3e-large-pytorch-none'}
2025-09-22 13:50:31,573 transformers.configuration_utils 2312 INFO loading configuration file /root/.xinference/cache/v2/m3e-large-pytorch-none/config.json
2025-09-22 13:50:31,574 transformers.configuration_utils 2312 INFO loading configuration file /root/.xinference/cache/v2/m3e-large-pytorch-none/config.json
2025-09-22 13:50:31,574 transformers.configuration_utils 2312 INFO loading configuration file /root/.xinference/cache/v2/m3e-large-pytorch-none/config.json
2025-09-22 13:50:31,575 transformers.configuration_utils 2312 INFO Model config BertConfig { "architectures": [ "BertModel" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "classifier_dropout": null, "directionality": "bidi", "dtype": "float32", "eos_token_id": 2, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 1024, "initializer_range": 0.02, "intermediate_size": 4096, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 16, "num_hidden_layers": 24, "output_past": true, "pad_token_id": 0, "pooler_fc_size": 768, "pooler_num_attention_heads": 12, "pooler_num_fc_layers": 3, "pooler_size_per_head": 128, "pooler_type": "first_token_transform", "position_embedding_type": "absolute", "transformers_version": "4.56.1", "type_vocab_size": 2, "uniem_embedding_strategy": "last_mean", "use_cache": true, "vocab_size": 21128 }
2025-09-22 13:50:31,575 transformers.models.auto.image_processing_auto 2312 INFO Could not locate the image processor configuration file, will try to use the model config instead.
INFO 09-22 13:50:38 [__init__.py:742] Resolved architecture: BertModel
2025-09-22 13:50:38,243 transformers.configuration_utils 2312 WARNING `torch_dtype` is deprecated! Use `dtype` instead!
INFO 09-22 13:50:38 [__init__.py:2764] Downcasting torch.float32 to torch.float16.
INFO 09-22 13:50:38 [__init__.py:1815] Using max model len 512
INFO 09-22 13:50:38 [arg_utils.py:1639] (Disabling) chunked prefill by default
INFO 09-22 13:50:38 [arg_utils.py:1642] (Disabling) prefix caching by default
2025-09-22 13:50:38,481 transformers.configuration_utils 2312 INFO loading configuration file /root/.xinference/cache/v2/m3e-large-pytorch-none/config.json
2025-09-22 13:50:38,481 transformers.configuration_utils 2312 INFO loading configuration file /root/.xinference/cache/v2/m3e-large-pytorch-none/config.json
2025-09-22 13:50:38,482 transformers.configuration_utils 2312 INFO Model config BertConfig { "architectures": [ "BertModel" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "classifier_dropout": null, "directionality": "bidi", "dtype": "float32", "eos_token_id": 2, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 1024, "initializer_range": 0.02, "intermediate_size": 4096, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 16, "num_hidden_layers": 24, "output_past": true, "pad_token_id": 0, "pooler_fc_size": 768, "pooler_num_attention_heads": 12, "pooler_num_fc_layers": 3, "pooler_size_per_head": 128, "pooler_type": "first_token_transform", "position_embedding_type": "absolute", "transformers_version": "4.56.1", "type_vocab_size": 2, "uniem_embedding_strategy": "last_mean", "use_cache": true, "vocab_size": 21128 }
INFO 09-22 13:50:38 [__init__.py:3479] Only "last" pooling supports chunked prefill and prefix caching; disabling both.
2025-09-22 13:50:38,535 transformers.tokenization_utils_base 2312 INFO loading file vocab.txt
2025-09-22 13:50:38,535 transformers.tokenization_utils_base 2312 INFO loading file tokenizer.json
2025-09-22 13:50:38,535 transformers.tokenization_utils_base 2312 INFO loading file added_tokens.json
2025-09-22 13:50:38,535 transformers.tokenization_utils_base 2312 INFO loading file special_tokens_map.json
2025-09-22 13:50:38,536 transformers.tokenization_utils_base 2312 INFO loading file tokenizer_config.json
2025-09-22 13:50:38,536 transformers.tokenization_utils_base 2312 INFO loading file chat_template.jinja
2025-09-22 13:50:38,566 transformers.configuration_utils 2312 INFO loading configuration file /root/.xinference/cache/v2/m3e-large-pytorch-none/config.json
2025-09-22 13:50:38,566 transformers.configuration_utils 2312 INFO loading configuration file /root/.xinference/cache/v2/m3e-large-pytorch-none/config.json
2025-09-22 13:50:38,567 transformers.configuration_utils 2312 INFO Model config BertConfig { "architectures": [ "BertModel" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "classifier_dropout": null, "directionality": "bidi", "dtype": "float32", "eos_token_id": 2, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 1024, "initializer_range": 0.02, "intermediate_size": 4096, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 16, "num_hidden_layers": 24, "output_past": true, "pad_token_id": 0, "pooler_fc_size": 768, "pooler_num_attention_heads": 12, "pooler_num_fc_layers": 3, "pooler_size_per_head": 128, "pooler_type": "first_token_transform", "position_embedding_type": "absolute", "transformers_version": "4.56.1", "type_vocab_size": 2, "uniem_embedding_strategy": "last_mean", "use_cache": true, "vocab_size": 21128 }
2025-09-22 13:50:38,568 transformers.generation.configuration_utils 2312 INFO Generate config GenerationConfig { "bos_token_id": 0, "eos_token_id": 2, "pad_token_id": 0 }
(EngineCore_DP0 pid=2392) INFO 09-22 13:50:38 [core.py:654] Waiting for init message from front-end.
(EngineCore_DP0 pid=2392) INFO 09-22 13:50:38 [core.py:76] Initializing a V1 LLM engine (v0.10.2) with config: model='/root/.xinference/cache/v2/m3e-large-pytorch-none', speculative_config=None, tokenizer='/root/.xinference/cache/v2/m3e-large-pytorch-none', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=512, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/root/.xinference/cache/v2/m3e-large-pytorch-none, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=False, pooler_config=PoolerConfig(pooling_type='CLS', normalize=None, dimensions=None, enable_chunked_processing=None, max_embed_len=None, activation=None, logit_bias=None, softmax=None, step_tag_id=None, returned_token_ids=None), compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718] EngineCore failed to start.
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718] Traceback (most recent call last):
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]     self._init_executor()
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 48, in _init_executor
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]     self.collective_rpc("init_device")
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]     answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3060, in run_method
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]     return func(*args, **kwargs)
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 611, in init_device
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]     self.worker.init_device()  # type: ignore
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 180, in init_device
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718]     raise ValueError(
(EngineCore_DP0 pid=2392) ERROR 09-22 13:50:40 [core.py:718] ValueError: Free memory on device (4.5/23.57 GiB) on startup is less than desired GPU memory utilization (0.9, 21.21 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
(EngineCore_DP0 pid=2392) Process EngineCore_DP0:
(EngineCore_DP0 pid=2392) Traceback (most recent call last):
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=2392)     self.run()
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=2392)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 722, in run_engine_core
(EngineCore_DP0 pid=2392)     raise e
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=2392)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=2392)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=2392)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=2392)     self._init_executor()
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 48, in _init_executor
(EngineCore_DP0 pid=2392)     self.collective_rpc("init_device")
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=2392)     answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3060, in run_method
(EngineCore_DP0 pid=2392)     return func(*args, **kwargs)
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 611, in init_device
(EngineCore_DP0 pid=2392)     self.worker.init_device()  # type: ignore
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 180, in init_device
(EngineCore_DP0 pid=2392)     raise ValueError(
(EngineCore_DP0 pid=2392) ValueError: Free memory on device (4.5/23.57 GiB) on startup is less than desired GPU memory utilization (0.9, 21.21 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
(EngineCore_DP0 pid=2392) Exception ignored in:
(EngineCore_DP0 pid=2392) Traceback (most recent call last):
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 237, in __del__
(EngineCore_DP0 pid=2392)     self.shutdown()
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 76, in shutdown
(EngineCore_DP0 pid=2392)     worker.shutdown()
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 528, in shutdown
(EngineCore_DP0 pid=2392)     self.worker.shutdown()
(EngineCore_DP0 pid=2392)   File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 675, in shutdown
(EngineCore_DP0 pid=2392)     self.model_runner.ensure_kv_transfer_shutdown()
(EngineCore_DP0 pid=2392) AttributeError: 'NoneType' object has no attribute 'ensure_kv_transfer_shutdown'
2025-09-22 13:50:41,058 xinference.core.worker 1589 ERROR Failed to load model m3e-large-0
Traceback (most recent call last):
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/core/worker.py", line 1130, in launch_builtin_model
    await model_ref.load()
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/api.py", line 418, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/core/model.py", line 387, in load
    await asyncio.to_thread(self._model.load)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/model/embedding/vllm/core.py", line 64, in load
    self._model = LLM(model=self._model_path, task="embed", **self._kwargs)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 282, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 493, in from_engine_args
    return engine_cls.from_vllm_config(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/llm_engine.py", line 134, in from_vllm_config
    return cls(vllm_config=vllm_config,
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
    self.engine_core = EngineCoreClient.make_client(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 80, in make_client
    return SyncMPClient(vllm_config, executor_class, log_stats)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 602, in __init__
    super().__init__(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
    with launch_core_engines(vllm_config, executor_class,
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 729, in launch_core_engines
    wait_for_engine_startup(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 782, in wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: [address=0.0.0.0:39101, pid=2312] Engine core initialization failed. See root cause above. Failed core proc(s): {}
2025-09-22 13:50:41,160 xinference.core.worker 1589 ERROR [request 082e861e-9778-11f0-bad1-0242ac110009] Leave launch_builtin_model, error: [address=0.0.0.0:39101, pid=2312] Engine core initialization failed. See root cause above. Failed core proc(s): {}, elapsed time: 17 s
Traceback (most recent call last):
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/core/utils.py", line 93, in wrapped
    ret = await func(*args, **kwargs)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/core/worker.py", line 1130, in launch_builtin_model
    await model_ref.load()
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/api.py", line 418, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/core/model.py", line 387, in load
    await asyncio.to_thread(self._model.load)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/model/embedding/vllm/core.py", line 64, in load
    self._model = LLM(model=self._model_path, task="embed", **self._kwargs)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 282, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 493, in from_engine_args
    return engine_cls.from_vllm_config(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/llm_engine.py", line 134, in from_vllm_config
    return cls(vllm_config=vllm_config,
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
    self.engine_core = EngineCoreClient.make_client(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 80, in make_client
    return SyncMPClient(vllm_config, executor_class, log_stats)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 602, in __init__
    super().__init__(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
    with launch_core_engines(vllm_config, executor_class,
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 729, in launch_core_engines
    wait_for_engine_startup(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 782, in wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: [address=0.0.0.0:39101, pid=2312] Engine core initialization failed. See root cause above. Failed core proc(s): {}
2025-09-22 13:50:41,173 xinference.api.restful_api 1550 ERROR [address=0.0.0.0:39101, pid=2312] Engine core initialization failed. See root cause above. Failed core proc(s): {}
Traceback (most recent call last):
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1229, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/api.py", line 418, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/core/supervisor.py", line 1280, in launch_builtin_model
    await _launch_model()
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/core/supervisor.py", line 1215, in _launch_model
    subpool_address = await _launch_one_model(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/core/supervisor.py", line 1166, in _launch_one_model
    subpool_address = await worker_ref.launch_builtin_model(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/api.py", line 418, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/core/utils.py", line 93, in wrapped
    ret = await func(*args, **kwargs)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/core/worker.py", line 1130, in launch_builtin_model
    await model_ref.load()
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xoscar/api.py", line 418, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/core/model.py", line 387, in load
    await asyncio.to_thread(self._model.load)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/xinference/model/embedding/vllm/core.py", line 64, in load
    self._model = LLM(model=self._model_path, task="embed", **self._kwargs)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 282, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 493, in from_engine_args
    return engine_cls.from_vllm_config(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/llm_engine.py", line 134, in from_vllm_config
    return cls(vllm_config=vllm_config,
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/llm_engine.py", line 111, in __init__
    self.engine_core = EngineCoreClient.make_client(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 80, in make_client
    return SyncMPClient(vllm_config, executor_class, log_stats)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 602, in __init__
    super().__init__(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
    with launch_core_engines(vllm_config, executor_class,
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 729, in launch_core_engines
    wait_for_engine_startup(
  File "/root/autodl-tmp/envs_dirs/py310-torch-gpu/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 782, in wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: [address=0.0.0.0:39101, pid=2312] Engine core initialization failed. See root cause above. Failed core proc(s): {}