How To Install ocrmypdf on Debian 9
Introduction
In this tutorial, we will learn how to install ocrmypdf on Debian 9.
What is ocrmypdf
OCRmyPDF takes a regular PDF containing only images and generates a searchable PDF/A file from it.
It uses the Tesseract OCR engine and so supports all the languages that Tesseract does.
Some other main features:
- Places OCR text accurately below the image to ease copy / paste
- Keeps the exact resolution of the original embedded images
- When possible, inserts OCR information as a lossless operation without rendering vector information
- Keeps file size about the same
- If requested, deskews and/or cleans the image before performing OCR
- Validates input and output files
- Provides debug mode to enable easy verification of the OCR results
- Processes pages in parallel when more than one CPU core is available
- Battle-tested on thousands of PDFs, with a test suite and continuous integration
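To make the feature list above more concrete, here is a minimal sketch of what a first run might look like once ocrmypdf is installed. The file names input.pdf and output.pdf are placeholders, and the options shown (-l for the OCR language, --deskew to straighten pages) are assumptions about the packaged version; check ocrmypdf --help to confirm what your version supports.

```shell
#!/bin/sh
# Hypothetical file names; replace them with your own scanned PDF.
in_pdf="input.pdf"
out_pdf="output.pdf"

# Build the command as a string first so it can be inspected before running.
# -l eng    : OCR language (requires the matching Tesseract language data)
# --deskew  : straighten skewed pages before OCR
ocr_cmd="ocrmypdf -l eng --deskew $in_pdf $out_pdf"
echo "$ocr_cmd"

# Run it only if the tool is installed and the input file exists.
if command -v ocrmypdf >/dev/null 2>&1 && [ -f "$in_pdf" ]; then
    $ocr_cmd
fi
```

Printing the command before running it makes it easy to see exactly which options are being passed.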
There are three methods to install ocrmypdf on Debian 9: apt-get, apt, or aptitude. The following sections describe each method. You only need to choose one of them.
Install ocrmypdf Using apt-get
Update the apt database with apt-get using the following command.
sudo apt-get update
After updating the apt database, we can install ocrmypdf using apt-get by running the following command:
sudo apt-get -y install ocrmypdf
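After the installation finishes, it can be useful to confirm that the package is registered and the executable is reachable. The following is a small sketch, not part of the original steps; the variable names pkg_status and pkg_state are illustrative only.

```shell
#!/bin/sh
# Ask dpkg for the package status; an empty result means the package is unknown.
pkg_status=$(dpkg-query -W -f='${Status}' ocrmypdf 2>/dev/null || true)

case "$pkg_status" in
    "install ok installed") pkg_state="installed" ;;
    *)                      pkg_state="not installed" ;;
esac
echo "ocrmypdf package: $pkg_state"

# Check that the executable is on PATH and print its version if so.
if command -v ocrmypdf >/dev/null 2>&1; then
    ocrmypdf --version
else
    echo "ocrmypdf executable not found on PATH"
fi
```

The same check applies regardless of which of the three installation methods you used.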
Install ocrmypdf Using apt
Update the apt database with apt using the following command.
sudo apt update
After updating the apt database, we can install ocrmypdf using apt by running the following command:
sudo apt -y install ocrmypdf
Install ocrmypdf Using aptitude
To follow this method, you might need to install aptitude first, since it is usually not installed by default on Debian. Update the apt database with aptitude using the following command.
sudo aptitude update
After updating the apt database, we can install ocrmypdf using aptitude by running the following command:
sudo aptitude -y install ocrmypdf
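Because aptitude may be missing, a quick check before running the aptitude commands above can save a confusing "command not found" error. This is a minimal sketch; the variable name aptitude_hint is illustrative, and the install command is only printed, not executed, because it needs root privileges.

```shell
#!/bin/sh
# Check whether aptitude is available before using the aptitude method.
if command -v aptitude >/dev/null 2>&1; then
    aptitude_hint="aptitude is available"
else
    # Print a hint instead of installing automatically.
    aptitude_hint="aptitude is missing; install it with: sudo apt-get install aptitude"
fi
echo "$aptitude_hint"
```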
How To Uninstall ocrmypdf on Debian 9
To uninstall only the ocrmypdf package we can use the following command:
sudo apt-get remove ocrmypdf
Uninstall ocrmypdf And Its Dependencies
To uninstall ocrmypdf together with any of its dependencies that are no longer needed on the system, we can use the command below:
sudo apt-get -y autoremove ocrmypdf
Remove ocrmypdf Configurations and Data
To remove ocrmypdf configuration and data from Debian 9 we can use the following command:
sudo apt-get -y purge ocrmypdf
Remove ocrmypdf Configurations, Data, and All of Its Dependencies
To remove ocrmypdf configurations, data, and all of its dependencies, we can use the following command:
sudo apt-get -y autoremove --purge ocrmypdf
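Since purging with autoremove can remove more packages than expected, it may be worth previewing the result first. The sketch below assumes a Debian-based system and uses apt-get's -s (--simulate) option, which prints the planned actions without changing anything and does not require root.

```shell
#!/bin/sh
# -s (--simulate) shows what would be removed without touching the system.
sim_cmd="apt-get -s autoremove --purge ocrmypdf"

if command -v apt-get >/dev/null 2>&1; then
    # A failure here usually means apt does not know the package.
    $sim_cmd || echo "simulation failed (is the package known to apt?)"
else
    echo "apt-get not found; this sketch assumes a Debian-based system"
fi
```

If the list of packages looks right, run the real command shown above with sudo and without -s.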
Dependencies
ocrmypdf has the following dependencies:
- ghostscript
- icc-profiles-free
- liblept5
- python3-pil
- python3-pkg-resources
- python3-reportlab
- python3-ruffus
- qpdf
- tesseract-ocr
- unpaper
- zlib1g
- python3-cffi-backend-api-min
- python3-cffi-backend-api-max
- python3-img2pdf
- python3-pypdf2
- python3
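The dependency list above can also be queried directly from the package database, which helps when the list differs on your system. This is a minimal sketch using apt-cache depends; the variable names deps and dep_count are illustrative, and the count includes every line containing "Depends:" (so PreDepends entries, if any, are counted too).

```shell
#!/bin/sh
# Query the declared dependencies of ocrmypdf from the apt cache.
deps=""
if command -v apt-cache >/dev/null 2>&1; then
    deps=$(apt-cache depends ocrmypdf 2>/dev/null || true)
fi

# Show the raw dependency output (empty if apt-cache or the package is unavailable).
printf '%s\n' "$deps"

# Count the lines that declare a dependency.
dep_count=$(printf '%s\n' "$deps" | grep -c 'Depends:' || true)
echo "ocrmypdf dependency lines found: $dep_count"
```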
Summary
In this tutorial, we learned how to install the ocrmypdf package on Debian 9 using different package management tools: apt, apt-get, and aptitude.