
Merge (append) several PDFs into one (Python on i)

Last Updated on 8 December 2019 by Roberto De Pedrini

Last night I got this request from one of my customers: “I’ve got some PDFs in my IFS and I need to merge them into a single PDF file. Googling “merge PDF documents” I found some Java examples (https://www.mcpressonline.com/forum/forum/application-software/general/17858-merge-pdf-documents) but I got some errors. Do you know how to fix them, or is there an alternative?”

I must admit that I have never learned to program in Java and I have a certain aversion to it … I am much more attracted by newer languages such as Python or even Node.js, so I did some searching on Google and a whole world opened up to me!

Since only one week has passed since the last Faq400 “Hands-on Open Source & Modernization tools” event on November 26th, I am taking this opportunity to address the customer’s request using the “new world” of Open Source that Python, Node.js and the like open up to IBM i.

Google gave me several options … I chose one, PyPDF2: it looks simple and I can find some examples. I’m not a Python expert, I have just played with some online courses … but I can do it!

It’s Saturday morning … I’m waiting for the sun so I can go running with my wife and dog, and I have about an hour before it rises. Have you ever run in the woods over cold autumn leaves when the first sun quickly melts the night frost? It’s worth it! You won’t believe it but it’s nicer than writing code in Python or RPG.

The frost tells us the thousand moments of the night that have crystallized
(Fabrizio Caramagna)

I have an hour, and the dog is getting excited because he sees me dressed for the run (and it’s not a pretty sight, the programmer’s belly is even more highlighted by the yellow Decathlon jersey and the tight running pants … but the dog doesn’t care and wants to go!).

I need to install Python on the client’s IBM i, download PyPDF2, test everything and write a CL so the script can be called from the client’s “normal” applications.

Ok, it could work!

Premise: The sources of this post are available on GitHub

The sources of this post are available in this public GitHub project: https://github.com/Faq400Git/IFS_PDFUnion

Here, step by step, is the story of the work: activating SSH, installing the ACS prerequisites for Open Source, installing Python and the tools to work with PDFs … almost child’s play. Let’s see:

(1) Start the SSHD server and set it to autostart

As soon as I try to connect to the IBM i partition with the ACS Open Source Package Management I run into a small problem: the SSHD server is not started, and it is a prerequisite for Open Source. I start it immediately with the STRTCPSVR command:

STRTCPSVR SERVER(*SSHD)

Since SSH always comes in handy and this IBM i restarts every Sunday, I prefer to set the SSHD daemon to autostart. I open Navigator for i from the ACS link and set the SSHD server to autostart:

  • Start Navigator for i
  • I choose Network – Server – TCP Server
  • I select SSHD
  • From actions I go to “Properties”
  • “Start when TCP / IP starts”
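
To double-check from a PC that the SSH daemon is really answering, a tiny Python sketch like this one can help. It is not part of the original project: the function name `port_is_open` and the host name in the comment are placeholders I made up, and port 22 is simply the usual SSH default, so adjust it if your partition uses another one.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unknown host ... all mean "not reachable"
        return False


# "myibmi.example.com" is a placeholder: replace it with your partition's host name
# print(port_is_open("myibmi.example.com", 22))
```

A successful connection only says that something is listening on that port, not that a login will work, but it is usually enough to tell whether STRTCPSVR did its job.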

(2) Open Source Package Management in ACS (Access Client Solutions)

Since the ACS Open Source Package Manager has never been installed on this system … I am asked to install it … I do so by following the clear ACS instructions, a very simple procedure if you are at a reasonably recent level of the IBM i operating system (let’s say from 7.2 onwards!).

(3) Installation of Python3

Finally I can install Python3 … very simple operation with the management of ACS Open Source Packages, it is sufficient to perform the “Open Source Package Management” action, select the “Available Packages” tab, select Python3 and Install.

I do the same thing with the python3-pip package that I need to install the Python packages (in this case PyPDF2).

I thank IBM for having finally adopted RPM and YUM for the management of Open Source packages: the old 5733OPS product worked much like the other IBM LICPGMs, but it didn’t lend itself at all to the world of Open Source, where updates are continuous and interesting new things come out every day.

Thanks to the laboratory run by Jesse Gorzinski for his excellent work!

The Open Source packages installed with this interface are stored in the “/QOpenSys/pkgs/bin” folder … generally not reachable with the default path of SSH users. To call Python3 or other Open Source tools installed this way, it is convenient to add that folder to the user’s path. This can be done for the current session only with these two instructions:

PATH=/QOpenSys/pkgs/bin:$PATH
export PATH

Or insert these two lines directly into the user’s .profile (~/.profile … generally something like /home/myUser/.profile). If it does not exist we can create it (in the following case I edit it from an SSH environment, but we could use EDTF from an IBM i environment):

vi ~/.profile

then we insert the two lines

PATH=/QOpenSys/pkgs/bin:$PATH
export PATH

or from an IBM i environment
edtf '/home/myUser/.profile'
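
If you want to see what the session actually finds, here is a small sketch in Python using only the standard library. The helper name `prepend_to_path` is my own invention for illustration; it only computes the string, it does not change your environment.

```python
import os
import shutil

PKGS_BIN = "/QOpenSys/pkgs/bin"  # the folder named in this post


def prepend_to_path(path_value, directory=PKGS_BIN):
    """Return path_value with directory in front, unless it is already listed."""
    parts = [p for p in path_value.split(os.pathsep) if p]
    if directory in parts:
        return os.pathsep.join(parts)
    return os.pathsep.join([directory] + parts)


# Where (if anywhere) does the current session find python3?
print("python3 found at:", shutil.which("python3"))
# What PATH would look like with the Open Source folder in front
print("suggested PATH:", prepend_to_path(os.environ.get("PATH", "")))
```

If `shutil.which` prints `None`, the folder is probably still missing from the path of that session.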

(4) Access with PuTTY SSH and installation of the PyPDF2 tools for managing PDFs with Python

I continue with the installation of the PyPDF2 package for managing PDFs with Python using pip3, the Python package manager:

  • I connect with SSH
  • I create a /home/openSource folder
  • I enter the /home/openSource folder
  • I install PyPDF2 using the pip3 install command
  • I have just installed python3-pip and it already tells me that I should also update “pip” … I do this update too.
bash
cd /home
mkdir openSource
cd openSource
pip3 install PyPDF2
pip3 install --upgrade pip
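
A quick way to confirm that pip3 put the package where python3 can see it is to ask Python itself. This is just a sketch with the standard library; `is_installed` is a name I chose for the example, and it only checks that the module can be located, not that it works.

```python
import importlib.util


def is_installed(module_name):
    """True if Python can locate the top-level module (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


for name in ("PyPDF2",):
    if is_installed(name):
        print(name, "-> available")
    else:
        print(name, "-> missing (try: pip3 install " + name + ")")
```

Run it with the same python3 that the script will use later, because another Python installation on the system could have a different set of packages.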

(5) Python Script and Merge-Append test of two PDFs in one

I find on Google an example of using PyPDF2 to split and merge PDFs and write my little Python script with a few changes … a breeze … I upload any two PDF files from my PC into an IFS test folder and try the program … a few adjustments, and in a short time everything works.

Here is the Python script (available on GitHub https://github.com/Faq400Git/IFS_PDFUnion, copy it from there)

# pdfUnion.py
# How to use this script:
# python3 /home/openSource/pdfUnion.py /home/openSource/pdfUnion/test outputfile.pdf PDFNR1.pdf PDFNR2.pdf
# Note: PdfFileMerger and PdfFileReader are the class names of the PyPDF2
# releases current when this post was written; PyPDF2 3.x renamed them
# (PdfMerger, PdfReader), so check the version you have installed.

# Some imports (like /COPY and /INCLUDE in RPG)
import sys
import os
import PyPDF2

merger = PyPDF2.PdfFileMerger()

# Here are my input parms
path = sys.argv[1]       # folder that holds the input and output files
out_file = sys.argv[2]   # name of the merged PDF
in_files = sys.argv[3:]  # a list of PDFs from the third parm onwards

# Set the path for input and output files
os.chdir(path)

# Loop over in_files (two or more files)
for pdf in in_files:
    # if the doc does not exist, report it and go on with the next one
    if not os.path.exists(pdf):
        print(f"problem with file {pdf}")
        continue
    try:
        reader = PyPDF2.PdfFileReader(open(pdf, 'rb'))
        merger.append(reader)
    except Exception:
        print("cant merge !! sorry")
    else:
        print(f"{pdf} Merged !!!")

# write output file on disk
merger.write(out_file)

# end

I try to call the script, feeding it the two PDFs I just uploaded, and I get an outputfile.pdf which is exactly the union of the two!

python3 /home/openSource/pdfUnion.py /home/openSource/pdfUnion/test outputfile.pdf PDFNR1.pdf PDFNR2.pdf

(6) I prepare a CL to call the Python script … which I will leave to the customer to call from their applications

Here is the CLP that calls the Python script via Qshell (this source is also available on GitHub https://github.com/Faq400Git/IFS_PDFUnion … copy it from there, where there are no character conversion problems like in WordPress posts!)

/* --------------------------------------------- */
/* pdfunion (path outfile infile1 infile2)       */
/* Call a python script for the union of two pdf */
/* in a given IFS folder passed in the path      */
/* --------------------------------------------- */
PGM PARM(&PATH &OUTFILE &INFILE1 &INFILE2)
  DCL VAR(&PATH)    TYPE(*CHAR) LEN(030)
  DCL VAR(&OUTFILE) TYPE(*CHAR) LEN(030)
  DCL VAR(&INFILE1) TYPE(*CHAR) LEN(030)
  DCL VAR(&INFILE2) TYPE(*CHAR) LEN(030)
  DCL VAR(&CMD)     TYPE(*CHAR) LEN(256)
  /* Create call command */
  CHGVAR VAR(&CMD) VALUE('/QOpenSys/pkgs/bin/python3' *BCAT +
           '/home/openSource/pdfunion/pdfUnion.py' *BCAT %TRIM(&PATH) *BCAT +
           %TRIM(&OUTFILE) *BCAT %TRIM(&INFILE1) *BCAT %TRIM(&INFILE2))

  /* Call Python script */
  QSH CMD(&CMD)

ENDPGM

I try to call my CL

call pdfunion parm('/home/openSource/pdfunion/test' 'outfile2.pdf' 'PDFNR1.pdf' 'PDFNR2.pdf')

It works the first time … great!
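
One detail to keep in mind: the CL glues the parameters together with single blanks (*BCAT) and the shell then splits the command on blanks again, so a file name that contains a blank would arrive as two parameters. Here is a sketch in Python of that effect. I use `shlex` from the standard library as a stand-in for the shell, which assumes Qshell splits words like a POSIX shell in these simple cases; `build_command` is a hypothetical helper, not something from the project.

```python
import shlex

PYTHON = "/QOpenSys/pkgs/bin/python3"
SCRIPT = "/home/openSource/pdfunion/pdfUnion.py"


def build_command(path, out_file, in_files, quote=False):
    """Join the pieces with single blanks, optionally shell-quoting each one."""
    parts = [PYTHON, SCRIPT, path, out_file] + list(in_files)
    if quote:
        parts = [shlex.quote(p) for p in parts]
    return " ".join(parts)


# The call used in this post: every parameter arrives as one word
cmd = build_command("/home/openSource/pdfunion/test", "outfile2.pdf",
                    ["PDFNR1.pdf", "PDFNR2.pdf"])
print(shlex.split(cmd))

# A file name with a blank is split in two unless it is quoted
plain = build_command("/tmp", "out.pdf", ["my file.pdf"])
quoted = build_command("/tmp", "out.pdf", ["my file.pdf"], quote=True)
print(len(shlex.split(plain)), len(shlex.split(quoted)))
```

For the customer’s file names this is not a problem, but if blanks can appear, the CL would need to wrap each parameter in quotes before calling QSH.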

Conclusion

I assure you that I spent much more time writing the post, putting it on GitHub, fixing it and documenting it than I did obtaining the result the client wanted.

The project can, of course, be improved … it does its job but has no proper error handling … error messages should be handled by the Python script and then intercepted from standard output by the CL (QIBM_QSH_CMD_OUTPUT or other ways).
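
As a first step in that direction, the script could check its input files before merging and give the caller an exit code. What follows is only a sketch of the idea, not the code in the GitHub project: the function names and the exit code values are my own assumptions.

```python
import os
import sys
import tempfile

EXIT_OK = 0
EXIT_MISSING_INPUT = 2  # hypothetical value, pick whatever suits your CL


def find_missing(folder, in_files):
    """Return the input files that do not exist inside folder."""
    return [f for f in in_files if not os.path.isfile(os.path.join(folder, f))]


def precheck(folder, in_files):
    """Report problems on stderr and return an exit code for the caller."""
    missing = find_missing(folder, in_files)
    for name in missing:
        print(f"missing input file: {name}", file=sys.stderr)
    return EXIT_MISSING_INPUT if missing else EXIT_OK


# Small demonstration in a temporary folder with one empty "PDF"
with tempfile.TemporaryDirectory() as demo:
    open(os.path.join(demo, "PDFNR1.pdf"), "wb").close()
    print(precheck(demo, ["PDFNR1.pdf"]))                 # 0
    print(precheck(demo, ["PDFNR1.pdf", "PDFNR2.pdf"]))   # 2
```

In the real script something like `sys.exit(precheck(path, in_files))` before the merge loop would pass the code back to Qshell; how the CL then reacts to a non-zero exit status is left open here.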

What must absolutely be said is that the world of Open Source on IBM i opens a thousand ways to solve problems of this type … you just have to start working on it.

And what are you doing? Have you tried to do something with Open Source?

I can finally go out for a run with my wife, my dog and my yellow T-shirt that highlights my belly … I put the picture from behind so you can’t see it!


--- Roberto De Pedrini Faq400.com
About author

Founder of Faq400 Srl, IBM Champion, creator of Faq400.com and blog.faq400.com web sites. RPG developer since I was wearing shorts, strong IBM i supporter, I have always tried to share my knowledge with others through forums, events and courses. Now, with my company Faq400 Srl, I help companies to make the most of this great platform IBM i.
