Nougat: Notes on Using an OCR Tool for Scientific Documents

2025-07-21 18:53:19

GitHub: https://github.com/facebookresearch/nougat

Python 3.8 or later is required.

Install: pip install nougat-ocr

Default model download location: /home/****/.cache/torch/hub/nougat-0.1.0-small
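To see whether the checkpoint has already been downloaded, the cache directory can be inspected. This is a minimal sketch, assuming the masked part of the path above is the current user's home directory; `list_cached_model` is a hypothetical helper name and not part of Nougat.

```python
from pathlib import Path
from typing import List


def list_cached_model(model_dir: Path) -> List[str]:
    """Return the sorted file names in model_dir, or an empty list if it is missing."""
    if not model_dir.is_dir():
        return []
    return sorted(p.name for p in model_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    # Assumption: the masked user directory in the path is the current home directory.
    cache = Path.home() / ".cache" / "torch" / "hub" / "nougat-0.1.0-small"
    files = list_cached_model(cache)
    print(files if files else f"No cached model found at {cache}")
```

An empty result most likely means the model has not been downloaded yet or is stored somewhere else.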

With the environment freshly installed, Nougat runs on the CPU by default and prints the following warning:

UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11080). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
WARNING:root:No GPU found. Conversion on CPU is very slow.

To use the GPU, reinstall torch and its companion packages in a build that matches your CUDA version. Mine is CUDA 11.8:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
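After reinstalling, it may be worth confirming that PyTorch actually sees the GPU before running Nougat again. The sketch below uses only standard PyTorch calls (`torch.cuda.is_available`, `torch.version.cuda`, `torch.cuda.get_device_name`); `cuda_report` is a hypothetical helper name.

```python
def cuda_report() -> dict:
    """Collect basic facts about the installed PyTorch build and GPU visibility."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the build was compiled against; None for CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is still False, the installed build and the driver probably still do not match.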

Once the environment is configured, PDF recognition can be run.

A file in .mmd format is generated in the output directory.
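The exact command used is not preserved in these notes. As a hedged sketch, the conversion can be driven from Python through the `nougat <pdf> -o <output dir>` form of the command-line tool. The file names `paper.pdf` and `output` are hypothetical, and the output location `<output dir>/<pdf name>.mmd` is an assumption based on the behaviour described above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_nougat_command(pdf: Path, out_dir: Path) -> List[str]:
    """Build the argument list for `nougat <pdf> -o <out_dir>`."""
    return ["nougat", str(pdf), "-o", str(out_dir)]


def expected_output(pdf: Path, out_dir: Path) -> Path:
    """Assumed output location: <out_dir>/<pdf stem>.mmd."""
    return out_dir / (pdf.stem + ".mmd")


if __name__ == "__main__":
    pdf = Path("paper.pdf")  # hypothetical input file
    out = Path("output")     # hypothetical output directory
    if pdf.exists():
        # check=True raises CalledProcessError if nougat exits with an error.
        subprocess.run(build_nougat_command(pdf, out), check=True)
        print("Result expected at:", expected_output(pdf, out))
    else:
        print(f"{pdf} not found; nothing to convert")
```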

In VS Code, an extension can be used to view the contents of the .mmd file, and the text can be copied directly.

On a 3090 GPU:

VRAM usage was 17368 / 24576 M (about 17 GB), and a 16-page PDF took 30 seconds.
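For a rough sense of scale, the figures above can be turned into per-page and per-card numbers. This assumes the "M" in the memory reading means MiB, as nvidia-smi reports it, and that the 30 seconds cover the whole run.

```python
used_mib, total_mib = 17368, 24576  # memory figures reported above
pages, seconds = 16, 30             # document size and wall time reported above

vram_fraction = used_mib / total_mib  # share of the card's memory in use
used_gib = used_mib / 1024            # MiB -> GiB
sec_per_page = seconds / pages        # average time per page

print(f"VRAM used: {used_gib:.2f} GiB ({vram_fraction:.1%} of the card)")
print(f"Average time: {sec_per_page:.3f} s/page")
```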

Text you write casually yourself may not be recognized, and text inside images is not recognized.
