Is there a good approach for using LLMs to analyze and process large numbers of PDF files? #356

slxx · 2024-11-06T16:34:01+08:00

slxx commented

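Is there a good approach for using LLMs to analyze and process large numbers of PDF files?
Can an LLM learn in batch from a large collection of PDF files in a particular industry (a single file may run to two or three hundred pages and may contain images), break them down and analyze them accurately, and finally produce evaluation reports in that industry's specific format?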

21970855250cs commented

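The main difficulty is extracting the information from the PDFs.
You can first run a PDF information-extraction tool and then feed its output to the LLM for summarization,
for example the MinerU tool: https://www.shlab.org.cn/news/5443982
https://github.com/opendatalab/MinerU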
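
A minimal sketch of the second step, assuming the extraction tool has already written one Markdown file per PDF (MinerU can export Markdown) and that `llm` is any function you supply that sends a prompt to a model and returns its text. A 200 to 300 page document usually exceeds a model's context window, so the sketch splits each file into chunks, summarizes each chunk, then merges the partial summaries (a map-reduce pattern). Chunk size is measured in characters as a rough stand-in for tokens; the function names, prompts, and folder layout are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical prompts; adapt them to the industry's report format.
MAP_PROMPT = "Summarize the key facts in this excerpt:\n\n"
REDUCE_PROMPT = "Using these notes, write an evaluation report in the required format:\n\n"


def split_markdown(text: str, max_chars: int = 8000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        # A single paragraph longer than the limit is cut into fixed-size pieces.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def summarize_document(
    text: str, llm: Callable[[str], str], max_chars: int = 8000, max_rounds: int = 3
) -> str:
    """Map-reduce summarization: summarize chunks, then merge the partial summaries."""
    # Map step: summarize every chunk independently.
    partials = [llm(MAP_PROMPT + c) for c in split_markdown(text, max_chars)]
    combined = "\n\n".join(partials)
    # If the merged notes are still too long, summarize them again (bounded rounds).
    rounds = 0
    while len(combined) > max_chars and rounds < max_rounds:
        partials = [llm(MAP_PROMPT + c) for c in split_markdown(combined, max_chars)]
        combined = "\n\n".join(partials)
        rounds += 1
    # Reduce step: one final call produces the report.
    return llm(REDUCE_PROMPT + combined)


def summarize_folder(folder: Path, llm: Callable[[str], str]) -> Dict[str, str]:
    """Produce one report per Markdown file in folder (hypothetical layout)."""
    return {
        p.name: summarize_document(p.read_text(encoding="utf-8"), llm)
        for p in sorted(folder.glob("*.md"))
    }
```

For example, `summarize_folder(Path("mineru_output"), my_llm)` would return a dict mapping each Markdown filename to its report, where `mineru_output` and `my_llm` are placeholders. Images in the PDFs are only covered to the extent that the extraction step turns them into text.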

Reference: HswOAuth/llm_course#356