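[Help request] I have built a metadata RAG vector store, but so far only the plain text from Word documents is converted to vectors. I now want to handle table content as well. What is a suitable way to do that? PDF data will also be processed later. #430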
Reference: HswOAuth/llm_course#430
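My current idea is to convert tables and charts to plain text with a multimodal model and store that, i.e. OCR plus a multimodal model. I would appreciate more concrete advice, though.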
You can use LangChain as the base solution: use Unstructured to process image documents, and use CSVLoader to process tabular data. See the official LangChain documentation: https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/
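To make the table suggestion a bit more concrete, here is a minimal standard-library sketch of the row-per-chunk idea: each table row becomes one text chunk with the column names repeated, so the row stays meaningful on its own when embedded. This is only an illustration of the general approach, not LangChain's actual implementation; the function name `table_rows_to_chunks`, the sample data, and the `source` value are all hypothetical, and it assumes the table has already been exported to CSV text.

```python
import csv
import io


def table_rows_to_chunks(csv_text, source="table.csv"):
    """Turn each table row into a self-describing text chunk plus metadata.

    Hypothetical helper for illustration; `source` is an assumed label.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    chunks = []
    for row_index, row in enumerate(reader):
        # Repeat the header names in every chunk so that a single row
        # still carries its column context after it is embedded.
        text = "\n".join(f"{column}: {value}" for column, value in row.items())
        chunks.append(
            {"text": text, "metadata": {"source": source, "row": row_index}}
        )
    return chunks


# Made-up sample table, used only to show the shape of the output.
sample = "model,params,context\nA,7B,4k\nB,13B,8k\n"
chunks = table_rows_to_chunks(sample)

for chunk in chunks:
    print(chunk["metadata"], repr(chunk["text"]))
```

Each `text` value can then go through the same embedding step already used for the Word plain text, with `metadata` stored alongside the vector. Tables inside Word or PDF files would first need to be extracted into rows, which this sketch does not cover.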