Innovations & Integrations (Community of Practice)

Thursday 13 May 2021

Reduce the size of all PDF files in a directory with PDFNetPython

# In Python, lines starting with a '#' hash, and text enclosed in triple quotes ("""), are ignored when the script runs; they only hold explanations or notes you would like to add.
# Install Anaconda 3 from https://www.anaconda.com/products/individual
# What is Anaconda: Anaconda® is a package manager, an environment manager, a Python/R data science distribution, and a collection of over 7,500 open-source packages. Anaconda is free and easy to install, and it offers free community support.
# Install Python 3 from https://www.python.org/downloads/
# In Windows, type 'anaconda' in the Start menu search box and open the Anaconda Prompt
# At the Anaconda Prompt, type 'pip install jupyter' (installs Jupyter Notebook)
# At the Anaconda Prompt, type 'pip install PDFNetPython3' (installs PDFNetPython3)
# At the Anaconda Prompt, type 'conda update --all' (updates all packages)
# At the Anaconda Prompt, type 'jupyter notebook' (opens Jupyter Notebook in your default browser, e.g. Google Chrome)
# In Jupyter Notebook, click 'New' > 'Python 3', then paste the Python script below into the notebook cell.
# ------------------------------------
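Before pasting the script, a quick check like the following can confirm that Python can find the PDFNetPython3 module in the environment the notebook is using. It is a sketch that relies only on the standard library's importlib.util.find_spec; the helper name missing_modules is hypothetical.

```python
import importlib.util

def missing_modules(names):
    """Return the names in `names` that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# an empty list means PDFNetPython3 is installed where this notebook's Python can see it
print(missing_modules(["PDFNetPython3"]))
```

If PDFNetPython3 is listed, repeat the pip install step at the Anaconda Prompt, then restart the notebook kernel and run the check again.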

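# You can also run Python scripts in a web browser with Google Colab: https://colab.research.google.com/notebooks/intro.ipynb#scrollTo=IqIkXkLmG7uS (Colab runs on a remote machine, so the C:/ folders below would need to point to uploaded files instead)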

# ------------------------------------
# The script below reduces the file size of every PDF in a folder

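import os
# to test iteration by printing/showing the items
# import pprint
# pprint is a standard Python module that prints items neatly, line by line
from PDFNetPython3 import *

input_path = 'C:/Directory_Input/'
output_path = 'C:/Directory_Output/'

# create the output folder if it does not exist yet
os.makedirs(output_path, exist_ok=True)

# pdf_files is a list of the .pdf files in the input folder (other files are skipped)
pdf_files = [f for f in os.listdir(input_path) if f.lower().endswith('.pdf')]
# pprint.pprint(pdf_files)

# create a function that compresses one PDF, given its file name
def optimise(pdf):
    doc = PDFDoc(os.path.join(input_path, pdf))
    doc.InitSecurityHandler()

    image_settings = ImageSettings()
    image_settings.SetCompressionMode(ImageSettings.e_jpeg)  # JPEG-compress images
    image_settings.SetQuality(1)              # lowest JPEG quality: smallest files, most visible artefacts
    image_settings.SetImageDPI(144, 96)       # maximum DPI, resampling DPI
    image_settings.ForceRecompression(True)   # also recompress non-JPEG images, keeping the result if smaller

    # the image settings only take effect once they are passed to the optimizer
    opt_settings = OptimizerSettings()
    opt_settings.SetColorImageSettings(image_settings)
    opt_settings.SetGrayscaleImageSettings(image_settings)

    Optimizer.Optimize(doc, opt_settings)
    doc.Save(os.path.join(output_path, pdf), SDFDoc.e_linearized)
    doc.Close()

# initialise PDFNet once, before the loop
# (newer PDFNet releases may require a licence key string as the argument)
PDFNet.Initialize()

# iterate over all PDFs with the created function
for pdf in pdf_files:
    # pdf is one file name from the pdf_files list
    # pprint.pprint(pdf)
    optimise(pdf)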
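Most of the saving comes from the image settings. SetImageDPI(144, 96) takes a maximum DPI and a resampling DPI. Assuming that an image whose effective resolution on the page exceeds the maximum is downsampled to the resampling DPI (an assumption about PDFNet's behaviour, not code that calls PDFNet), the arithmetic can be sketched as follows; effective_dpi and resampled_width are hypothetical helper names.

```python
def effective_dpi(pixel_width, display_width_inches):
    """Resolution of an image as it appears on the page: pixels per inch of displayed width."""
    return pixel_width / display_width_inches

def resampled_width(pixel_width, display_width_inches, max_dpi=144, resample_dpi=96):
    """Assumed rule: downsample to resample_dpi only when the image exceeds max_dpi."""
    if effective_dpi(pixel_width, display_width_inches) <= max_dpi:
        return pixel_width
    return round(display_width_inches * resample_dpi)

# a 3000 px wide photo shown 4 inches wide is 750 dpi, so it would shrink to 4 * 96 = 384 px
print(resampled_width(3000, 4))  # 384
# a 500 px image shown 4 inches wide is 125 dpi, under the 144 maximum, so it is left alone
print(resampled_width(500, 4))   # 500
```

Because the pixel count scales with both width and height, the 3000 px photo in this example keeps well under 2% of its original pixels, which is why scanned or photo-heavy PDFs tend to shrink the most.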

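Not every PDF shrinks by the same amount: files that are mostly text give the image settings little to work on. To see what the loop actually achieved, a standard-library sketch like the one below can compare each input file with its output. It assumes the same input and output folders as the script; percent_saved and report_savings are hypothetical helper names.

```python
import os

def percent_saved(before_bytes, after_bytes):
    """Percentage of the original size removed (negative if the file grew)."""
    if before_bytes == 0:
        return 0.0
    return 100.0 * (before_bytes - after_bytes) / before_bytes

def report_savings(input_dir, output_dir):
    """Return (file name, size before, size after, % saved) for each PDF present in both folders."""
    rows = []
    for name in sorted(os.listdir(input_dir)):
        if not name.lower().endswith(".pdf"):
            continue  # skip non-PDF files, as the script does
        out_file = os.path.join(output_dir, name)
        if not os.path.exists(out_file):
            continue  # this file has not been processed
        before = os.path.getsize(os.path.join(input_dir, name))
        after = os.path.getsize(out_file)
        rows.append((name, before, after, percent_saved(before, after)))
    return rows
```

After the loop finishes, running `for row in report_savings(input_path, output_path): print(row)` in the same notebook lists each file with its size before and after optimisation.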
Source and reference: 
Related links:
