[Help request] Error when running demo.py in the 基于公司制度RAG (company-policy RAG) project #184

Open
opened 2024-10-12 10:03:18 +08:00 by 11248284577cs · 5 comments

Error location: (see the attached screenshots)

Error output:

E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\Scripts\python.exe E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\main.py
__embeddings: client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
) model_name='./BAAI/bge-large-zh-v1.5' cache_folder=None model_kwargs={'device': 'cpu'} encode_kwargs={} query_instruction='为这个句子生成表示以用于检索相关文章:' embed_instruction='' show_progress=False
__chat_history:
os.listdir(KNOWLEDGE_DIR): ['中国人工智能系列白皮书(1).pdf', '中国人工智能系列白皮书.pdf', '人事管理流程(1).docx', '人事管理流程.docx']
file_path: ./chroma/knowledge/中国人工智能系列白皮书(1).pdf
collection_name: 33c66cec79a243d518338e5fd391755c
self.__retrievers: {}
loader_class: <class 'langchain_community.document_loaders.pdf.PyPDFLoader'>
params: {}
self.loader: <langchain_community.document_loaders.pdf.PyPDFLoader object at 0x000001E04DBDE5C0>
loader: <llm.MyCustomLoader object at 0x000001E04DBDD000>
E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\llm.py:83: LangChainDeprecationWarning: The class `Chroma` was deprecated in LangChain 0.2.9 and will be removed in 1.0. An updated version of the class exists in the langchain-chroma package and should be used instead. To use it run `pip install -U langchain-chroma` and import as `from langchain_chroma import Chroma`.
db = Chroma(collection_name=collection_name,
record_manager: <langchain.indexes._sql_record_manager.SQLRecordManager object at 0x000001E04DB6E320>
record_manager: <langchain.indexes._sql_record_manager.SQLRecordManager object at 0x000001E04DB6E320>
record_manager.create_schema: None
E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\gradio\analytics.py:106: UserWarning: IMPORTANT: You are using gradio version 4.44.1, however version 5.0.1 is available, please upgrade.

warnings.warn(
r: {'num_added': 0, 'num_updated': 0, 'num_skipped': 149, 'num_deleted': 0}
ensemble_retriever: retrievers=[VectorStoreRetriever(tags=['Chroma', 'HuggingFaceBgeEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001E04E4B8E20>, search_kwargs={'k': 3}), BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x000001DF8B7729E0>)] weights=[0.5, 0.5]
collections: [None, '中国人工智能系列白皮书(1).pdf']
file_path: ./chroma/knowledge/中国人工智能系列白皮书.pdf
collection_name: 503dc7d31f89743234dc87e712078441
self.__retrievers: {'33c66cec79a243d518338e5fd391755c': EnsembleRetriever(retrievers=[VectorStoreRetriever(tags=['Chroma', 'HuggingFaceBgeEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001E04E4B8E20>, search_kwargs={'k': 3}), BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x000001DF8B7729E0>)], weights=[0.5, 0.5])}
loader_class: <class 'langchain_community.document_loaders.pdf.PyPDFLoader'>
params: {}
self.loader: <langchain_community.document_loaders.pdf.PyPDFLoader object at 0x000001E04DB6E320>
loader: <llm.MyCustomLoader object at 0x000001E04DFD13F0>
record_manager: <langchain.indexes._sql_record_manager.SQLRecordManager object at 0x000001E04DBDD000>
record_manager: <langchain.indexes._sql_record_manager.SQLRecordManager object at 0x000001E04DBDD000>
record_manager.create_schema: None
r: {'num_added': 0, 'num_updated': 0, 'num_skipped': 149, 'num_deleted': 0}
ensemble_retriever: retrievers=[VectorStoreRetriever(tags=['Chroma', 'HuggingFaceBgeEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001E04DBDE5C0>, search_kwargs={'k': 3}), BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x000001E04DFF6020>)] weights=[0.5, 0.5]
collections: [None, '中国人工智能系列白皮书(1).pdf', '中国人工智能系列白皮书.pdf']
file_path: ./chroma/knowledge/人事管理流程(1).docx
collection_name: fcdd3626ccf286e5b30ffcf77274f518
self.__retrievers: {'33c66cec79a243d518338e5fd391755c': EnsembleRetriever(retrievers=[VectorStoreRetriever(tags=['Chroma', 'HuggingFaceBgeEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001E04E4B8E20>, search_kwargs={'k': 3}), BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x000001DF8B7729E0>)], weights=[0.5, 0.5]), '503dc7d31f89743234dc87e712078441': EnsembleRetriever(retrievers=[VectorStoreRetriever(tags=['Chroma', 'HuggingFaceBgeEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001E04DBDE5C0>, search_kwargs={'k': 3}), BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x000001E04DFF6020>)], weights=[0.5, 0.5])}
loader_class: <class 'langchain_community.document_loaders.word_document.UnstructuredWordDocumentLoader'>
params: {}
self.loader: <langchain_community.document_loaders.word_document.UnstructuredWordDocumentLoader object at 0x000001E04FB165C0>
loader: <llm.MyCustomLoader object at 0x000001E04FB14C10>
record_manager: <langchain.indexes._sql_record_manager.SQLRecordManager object at 0x000001E04DFD13F0>
record_manager: <langchain.indexes._sql_record_manager.SQLRecordManager object at 0x000001E04DFD13F0>
record_manager.create_schema: None
Traceback (most recent call last):
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\main.py", line 101, in
collection = gr.Dropdown(choices=llm.load_knowledge(), label="知识库")
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\llm.py", line 212, in load_knowledge
self.__retrievers[collection_name] = create_indexes(collection_name, loader, self.__embeddings)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\llm.py", line 95, in create_indexes
documents = loader.load()
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\llm.py", line 156, in load
return self.lazy_load()
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\llm.py", line 152, in lazy_load
return self.loader.load_and_split(self.text_splitter)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\langchain_core\document_loaders\base.py", line 64, in load_and_split
docs = self.load()
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\langchain_core\document_loaders\base.py", line 30, in load
return list(self.lazy_load())
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\langchain_community\document_loaders\unstructured.py", line 107, in lazy_load
elements = self._get_elements()
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\langchain_community\document_loaders\word_document.py", line 126, in _get_elements
return partition_docx(filename=self.file_path, **self.unstructured_kwargs)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\documents\elements.py", line 593, in wrapper
elements = func(*args, **kwargs)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\file_utils\filetype.py", line 626, in wrapper
elements = func(*args, **kwargs)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\file_utils\filetype.py", line 582, in wrapper
elements = func(*args, **kwargs)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\chunking\dispatch.py", line 74, in wrapper
elements = func(*args, **kwargs)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\partition\docx.py", line 177, in partition_docx
return list(elements)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\partition\lang.py", line 399, in apply_lang_metadata
elements = list(elements)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\partition\docx.py", line 410, in _iter_document_elements
yield from self._iter_paragraph_elements(block_item)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\partition\docx.py", line 634, in _iter_paragraph_elements
yield from self._classify_paragraph_to_element(item)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\partition\docx.py", line 471, in _classify_paragraph_to_element
TextSubCls = self._parse_paragraph_text_for_element_type(paragraph)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\partition\docx.py", line 938, in _parse_paragraph_text_for_element_type
if is_possible_narrative_text(text):
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\partition\text_type.py", line 80, in is_possible_narrative_text
if exceeds_cap_ratio(text, threshold=cap_threshold):
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\partition\text_type.py", line 276, in exceeds_cap_ratio
if sentence_count(text, 3) > 1:
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\partition\text_type.py", line 225, in sentence_count
sentences = sent_tokenize(text)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\unstructured\nlp\tokenize.py", line 30, in sent_tokenize
return _sent_tokenize(text)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\nltk\tokenize\__init__.py", line 119, in sent_tokenize
tokenizer = _get_punkt_tokenizer(language)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\nltk\tokenize\__init__.py", line 105, in _get_punkt_tokenizer
return PunktTokenizer(language)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\nltk\tokenize\punkt.py", line 1744, in __init__
self.load_lang(lang)
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\nltk\tokenize\punkt.py", line 1749, in load_lang
lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
File "E:\workspace\ai\AI_学习2-基于公司制度RAG\基于公司制度RAG\venv\lib\site-packages\nltk\data.py", line 579, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('punkt_tab')

For more information see: https://www.nltk.org/data.html

Attempted to load tokenizers/punkt_tab/english/

Searched in:
- 'C:\\Users\\meefy/nltk_data'
- 'E:\\workspace\\ai\\AI_学习2-基于公司制度RAG\\基于公司制度RAG\\venv\\nltk_data'
- 'E:\\workspace\\ai\\AI_学习2-基于公司制度RAG\\基于公司制度RAG\\venv\\share\\nltk_data'
- 'E:\\workspace\\ai\\AI_学习2-基于公司制度RAG\\基于公司制度RAG\\venv\\lib\\nltk_data'
- 'C:\\Users\\meefy\\AppData\\Roaming\\nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
**********************************************************************
INFO:backoff:Backing off send_request(...) for 0.1s (requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='us.i.posthog.com', port=443): Read timed out. (read timeout=15))

Process finished with exit code 1
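Two separate issues show up in this log. The fatal one is the `LookupError: Resource punkt_tab not found` raised while `UnstructuredWordDocumentLoader` partitions 人事管理流程(1).docx (the fix is in the comments below). The `LangChainDeprecationWarning` at llm.py:83 is non-fatal, and the warning text itself gives the migration path; a minimal sketch of that migration, assuming the call keeps the same arguments as the `db = Chroma(collection_name=...)` line echoed in the log:

```python
# Sketch only: the argument names below are assumptions based on the snippet
# shown in the log, not the exact code in llm.py.
# First run: pip install -U langchain-chroma
from langchain_chroma import Chroma  # replaces the deprecated langchain_community Chroma


def build_vectorstore(collection_name, embeddings, persist_directory="./chroma"):
    # Same constructor arguments as before; only the import changes.
    return Chroma(
        collection_name=collection_name,
        embedding_function=embeddings,
        persist_directory=persist_directory,
    )
```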

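Before the crash, the debug lines also show how each knowledge file is indexed: the chunks go into a per-file Chroma collection and are paired with a BM25 retriever inside an `EnsembleRetriever`, with 3 vector results per query and equal 0.5/0.5 weights. A rough sketch of that assembly (not the course's actual `create_indexes`; `db` and `documents` stand in for the objects built earlier in llm.py):

```python
# Hybrid retrieval as described by the "ensemble_retriever:" debug output above.
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever


def build_ensemble_retriever(db, documents):
    # Dense retriever over the Chroma collection, top 3 chunks per query.
    vector_retriever = db.as_retriever(search_kwargs={"k": 3})
    # Sparse keyword retriever over the same chunks.
    bm25_retriever = BM25Retriever.from_documents(documents)
    # Blend both result lists with equal weights, as in the log.
    return EnsembleRetriever(
        retrievers=[vector_retriever, bm25_retriever],
        weights=[0.5, 0.5],
    )
```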
Author

#11248284577cs


Download the package:

```
import nltk
nltk.download('punkt_tab')
```

(screenshot attached)
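For reference, a slightly more defensive variant of the same fix: download only when the resource NLTK tried to load is actually missing, so repeated runs of main.py do not hit the network again. `ensure_punkt_tab` is just an illustrative helper name, and `download_dir` is optional (any path from the "Searched in:" list in the error works, e.g. the venv's nltk_data folder).

```python
import nltk


def ensure_punkt_tab(download_dir=None):
    try:
        # This is the exact resource path the traceback attempted to load.
        nltk.data.find("tokenizers/punkt_tab/english/")
    except LookupError:
        nltk.download("punkt_tab", download_dir=download_dir, quiet=True)


ensure_punkt_tab()
```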

Did this solve the problem for you? If not, please contact => 先知
Author

Solved.
Reference: HswOAuth/llm_course#184