Python bindings for MuPDF's rendering library.

Overview

PyMuPDF 1.19.3

logo

Release date: December 15, 2021

On PyPI since August 2016: Downloads

Author

Jorj X. McKie, based on original code by Ruikai Liu.

Introduction

PyMuPDF (current version 1.19.3) is a Python binding with support for MuPDF (current version 1.19.*), a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, Inc.

MuPDF can access files in PDF, XPS, OpenXPS, CBZ, EPUB and FB2 (e-books) formats, and it is known for its top performance and high rendering quality.

With PyMuPDF you can access files with extensions like ".pdf", ".xps", ".oxps", ".cbz", ".fb2" or ".epub". In addition, about 10 popular image formats can also be handled like documents: ".png", ".jpg", ".bmp", ".tiff", etc..

In partnership with Artifex, PyMuPDF is now also available for commercial licensing. This agreement has no impact on use cases, that are compliant with the open-source license AGPL. Please see the "License and Copyright" section below for additional information.

Usage

For all supported document types (i.e. including images) you can

  • decrypt the document
  • access meta information, links and bookmarks
  • render pages in raster formats (PNG and some others), or the vector format SVG
  • search for text
  • extract text and images
  • convert to other formats: PDF, (X)HTML, XML, JSON, text
  • do OCR (Optical Character Recognition) if Tesseract is installed

To some degree, PyMuPDF can also be used as an image converter: it can read a range of input formats and can produce Portable Network Graphics (PNG), Portable Anymaps (PNM, etc.), Portable Arbitrary Maps (PAM), Adobe Postscript and Adobe Photoshop documents, making the use of other graphics packages obselete in these cases. But interfacing with e.g. PIL/Pillow for image input and output is easy as well.

For PDF documents, there exists a plethora of additional features: they can be created, joined or split up. Pages can be inserted, deleted, re-arranged or modified in many ways (including annotations and form fields).

  • Images and fonts can be extracted or inserted.

    You may want to have a look at this cool GUI example script, which lets you insert, delete, replace or re-position images under your visual control.

    If fontTools is installed, subsets can be built for eligible fonts based on their usage in the document. Especially for new PDFs, this can lead to significant file size reductions.

  • Embedded files are fully supported.

  • PDFs can be reformatted to support double-sided printing, posterizing, applying logos or watermarks

  • Password protection is fully supported: decryption, encryption, encryption method selection, permission level and user / owner password setting.

  • Support of the PDF Optional Content concept for images, text and drawings.

  • Low-level PDF structures can be accessed and modified.

  • Command line module "python -m fitz ...". A versatile utility with the following features

    • encryption / decryption / optimization
    • creation of sub-documents
    • document joining
    • image / font extraction
    • full support of embedded files
    • layout-preserving text extraction (all documents)

Have a look at the basic demos, the examples (which contain complete, working programs), and notebooks.

Documentation

Documentation is written using Sphinx and is available in various formats from the following sources. It currently is a combination of reference guide and user manual. For a quick start look at the tutorial and the recipes chapters.

  • You can view it online at Read the Docs. This site also provides download options for PDF.
  • The search function on Read the Docs does not work for me currently. If you want a working searchable local version, please download a zipped HTML from here.
  • Find a Windows help file here.

The latest changelog can be viewed here.

Installation

PyMuPDF requires Python 3.6 or later.

Python wheels exist for Windows (32bit and 64bit), Linux (64bit, Intel and ARM) and Mac OSX (64bit, Intel only), so it can be installed from PyPI in the usual way:

python -m pip install --upgrade pip
python -m pip install --upgrade pymupdf

There are no mandatory external dependencies. However, some optional features become available if additional packages are installed:

  • Pillow for using pillow image output directly from PyMuPDF
  • fontTools for creating font subsets
  • pymupdf-fonts contains some nice fonts for your text output
  • Tesseract-OCR for optical character recognition in images and document pages. Tesseract is separate software, not a Python package. To enable OCR functions in PyMuPDF, the system environment variable "TESSDATA_PREFIX" must be defined and contain the tessdata folder name of the Tesseract installation location

Older wheels - also with support for older Python versions - can be found here and on PyPI.

Starting with v1.18.15, to minimize network traffic we no longer redundantly store wheels in this repository's releases folder. You can find older versions back to v1.9.2 on PyPI. Sources for every release continue to be stored in here.

Other platforms require installation from sources, follow these instructions in the documentation.

Note: If pip cannot find a wheel that is compatible with your platform, it will automatically start an installation from sources - which will fail if MuPDF is not installed on your system.

This repo's folder installation contains several platform-specific source installation scripts contributed by users. You may also find the following Wiki pages useful:

License and Copyright

In order to comply with MuPDF’s dual licensing model, PyMuPDF has entered into an agreement with Artifex who has the right to sublicense PyMuPDF to third parties.

PyMuPDF and MuPDF are now available under both, open-source AGPL and commercial license agreements.

Please read the full text of the AGPL license agreement (which is also included here in file COPYING) to ensure that your use case complies with the guidelines of this license. If you determine you cannot meet the requirements of the AGPL, please contact Artifex for more information regarding a commercial license.

Artifex is the exclusive commercial licensing agent for MuPDF.

Artifex, the Artifex logo, MuPDF, and the MuPDF logo are registered trademarks of Artifex Software Inc. © 2021 Artifex Software, Inc. All rights reserved.

Contact

Please use the Discussions menu for questions, comments, or asking for help, and submit issues here. If you wish, you can also contact me directly via [email protected].

Owner
Jorj X. McKie
Jorj X. McKie
A bot for PDF for doing Many Things....

Telegram PDF Bot A Telegram bot that can: Compress, crop, decrypt, encrypt, merge, preview, rename, rotate, scale and split PDF files Compare text dif

Mr. Developer 60 Dec 27, 2022
JoplinPdf2Images - Converts a PDF to images in Joplin and adds it to the specified note as a printout

joplinPdf2Images Converts a PDF to images in Joplin and adds it to the specified

Morten Haahr Kristensen 2 Apr 20, 2022
A python library for extracting text from PDFs without losing the formatting of the PDF content.

Multilingual PDF to Text Install Package from Pypi Install it using pip. pip install multilingual-pdf2text The library uses Tesseract which can be ins

Shahrukh Khan 49 Nov 07, 2022
Camelot is a Python library that can help you extract tables from PDFs!

A Python library to extract tabular data from PDFs

1.8k Jan 03, 2023
Performing the following operations using python on PDF.

Python PDF Handling Tutorial Python is a highly versatile language with a huge set of libraries. It is a high level language with simple syntax. Pytho

Prajwol Lamichhane 131 Dec 16, 2022
A backend for mdbook in Python for generating PDF based on Chrome DevTools Protocol.

mdbook-pdf A backend for mdbook written in Python for generating PDF based on Chrome DevTools Protocol. Python library dependency Usage Put mdbook-pdf

Hollow Man 49 Dec 27, 2022
Mipdfcompressor - 💕A simple pdf size compressing telegram robot

Pdf Compressor Telegram Bot A simple pdf size compressing telegram robot. Useful for digital documentation. Mandatory Variables API_HASH - Your A

Madhavan Mi 1 Feb 14, 2022
A bulk pdf generator. This application can generate PDFs in bulk by using just one click.

A bulk html pdf generator. This application can generate PDFs in bulk by using just one click. Screenshots Requirements 🧱 Your system must have the f

Aman Nirala 3 Apr 23, 2022
Svg2pdfgen - Svg To PDF gen with python

Svg2pdfgen - Svg To PDF gen with python

Robert Urbańczyk 3 May 30, 2022
Simple pdf editor while preserving structure and format.

SIMPdf Simple pdf editor while preserving structure and format.

Shashwat Singh 242 Jan 04, 2023
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted. ocrmypdf

8k Jan 08, 2023
Produce pdf in python backend from simple bootstrap vue frontend and download to browser

vollmacht produce pdf in python backend from simple bootstrap vue frontend and download to browser Frontend in one file with bootstrap-vue (allthough

Otto 1 Nov 08, 2020
this is simple program, that converts pdf file to png

author: a5892731 last update:2021-11-01 version: 1.1 resources: -https://pypi.org/project/pdf2image/ -https://github.com/oschwartz10612/poppler-window

1 Nov 01, 2021
borb is a library for reading, creating and manipulating PDF files in python.

borb is a library for reading, creating and manipulating PDF files in python.

Joris Schellekens 2.9k Jan 01, 2023
Auto Convert PDFs to png files in python

This python tool, which is an application of PyMuPDF module, could auto convert PDFs to png files

Bo-Yu 4 Dec 05, 2021
Trata PDF para torná-lo compatível com PDF/X e com impressoras em escala de cinza.

tratapdf Trata PDF para torná-lo compatível com PDF/X e com impressoras em escala de cinza. dependências icc-profiles ghostscript visualizador de PDF

1 Nov 30, 2021
Simple python tool created for downloading PDF.

PDFdownloader Usage Open PDF in full-screen mode Run scan.exe Enter how many pages you want to scan Focus PDF After scanning is done, run merge.exe En

5 Oct 27, 2021
pystitcher stitches your PDF files together, generating nice customizable bookmarks for you using a declarative markdown file as input

pystitcher pystitcher stitches your PDF files together, generating nice customizable bookmarks for you using a declarative input in the form of a mark

Nemo 387 Dec 10, 2022
Python PDF Parser (Not actively maintained). Check out pdfminer.six.

PDFMiner PDFMiner is a text extraction tool for PDF documents. Warning: As of 2020, PDFMiner is not actively maintained. The code still works, but thi

Yusuke Shinyama 4.9k Jan 04, 2023
Pdfencrypt is a tool to encrypt/lock PDFs

Pdfencrypt Pdfencrypt is a tool to encrypt/lock PDFs Installation $ apt update $ apt upgrade $ apt install git $ apt install python $ git clone https:

Anontemitayo 5 Nov 28, 2021