如何在Python中将PDF转换为图像（代码实现示例） _Python将PDF转换为图像示例

本文带你了解如何使用 PyMuPDF 库将 PDF 文件转换为 Python 中每页的单个图像。
如何在Python中将PDF转换为图像？有多种工具可以将 PDF 文件转换为图像，例如Linux 中的pdftoppm。本教程旨在用 Python 开发一个轻量级的命令行工具，将 PDF 文件转换为图像。
Python如何将PDF转换为图像？我们将使用PyMuPDF，这是一种高度通用、可定制的 PDF、XPS 和电子书解释器解决方案，可用于各种应用程序，例如 PDF 渲染器、查看器或工具包。
首先，让我们安装所需的库：

$ pip install PyMuPDF==1.18.9

导入库：

import fitzfrom typing import Tuple import os

让我们定义我们的主要效用函数，完整Python将PDF转换为图像示例如下：

def convert_pdf2img(input_file: str, pages: Tuple = None): """Converts pdf to image and generates a file by page""" # Open the document pdfIn = fitz.open(input_file) output_files = [ ] # Iterate throughout the pages for pg in range(pdfIn.pageCount): if str(pages) != str(None): if str(pg) not in str(pages): continue # Select a page page = pdfIn[ pg] rotate = int(0) # PDF Page is converted into a whole picture 1056*816 and then for each picture a screenshot is taken. # zoom = 1.33333333 -----> Image size = 1056*816 # zoom = 2 ---> 2 * Default Resolution (text is clear, image text is hard to read)= filesize small / Image size = 1584*1224 # zoom = 4 ---> 4 * Default Resolution (text is clear, image text is barely readable) = filesize large # zoom = 8 ---> 8 * Default Resolution (text is clear, image text is readable) = filesize large zoom_x = 2 zoom_y = 2 # The zoom factor is equal to 2 in order to make text clear # Pre-rotate is to rotate if needed. mat = fitz.Matrix(zoom_x, zoom_y).preRotate(rotate) pix = page.getPixmap(matrix=mat, alpha=False) output_file = f"{os.path.splitext(os.path.basename(input_file))[ 0]}_page{pg+1}.png" pix.writePNG(output_file) output_files.append(output_file) pdfIn.close() summary = { "File": input_file, "Pages": str(pages), "Output File(s)": str(output_files) } # Printing Summary print("## Summary ########################################################") print("\n".join("{}:{}".format(i, j) for i, j in summary.items())) print("###################################################################") return output_files

Python如何将PDF转换为图像？上述函数将一个 PDF 文件转换为一系列图像文件。它遍历选定的页面（默认为所有页面），截取当前页面的屏幕截图，并使用writePNG()方法生成图像文件。
你可以更改zoom_x和zoom_y更改缩放系数，随意调整这些参数和rotate变量以满足你的需要。
Python将PDF转换为图像示例：现在让我们使用这个函数：

if __name__ == "__main__": import sys input_file = sys.argv[ 1] convert_pdf2img(input_file)

如何在Python中将PDF转换为图像？让我们在多页 PDF 文件上测试脚本（在此处获取）：

$ python convert_pdf2image.py bert-paper.pdf

输出将如下所示：

## Summary ######################################################## File:bert-paper.pdf Pages:None Output File(s):[ 'bert-paper_page1.png', 'bert-paper_page2.png', 'bert-paper_page3.png', 'bert-paper_page4.png', 'bert-paper_page5.png', 'bert-paper_page6.png', 'bert-paper_page7.png', 'bert-paper_page8.png', 'bert-paper_page9.png', 'bert-paper_page10.png', 'bert-paper_page11.png', 'bert-paper_page12.png', 'bert-paper_page13.png', 'bert-paper_page14.png', 'bert-paper_page15.png', 'bert-paper_page16.png'] ###################################################################

事实上，图像已成功生成：