How to loop through all the other pages in a PDF using Python

codeing · May 16, 2023

How can I loop through all the other pages and check whether the same QR code is present?

Code:

import PyPDF2
from pyzbar.pyzbar import decode
from PIL import Image

# Open the PDF file and get the first page
pdf_file = open('file.pdf', 'rb')
pdf_reader = PyPDF2.PdfFileReader(pdf_file)
page = pdf_reader.getPage(0)

# Convert the first page to an image
page_image = page.getPixmap().getImage()

# Decode the QR code from the first page image
qr_code = decode(page_image)[0].data.decode('utf-8')

# Loop through all the other pages and check if the same QR code is present
for i in range(1, pdf_reader.getNumPages()):
    page = pdf_reader.getPage(i)
    page_image = page.getPixmap().getImage()
    if qr_code not in [code.data.decode('utf-8') for code in decode(page_image)]:
        print(f"The QR code {qr_code} is not present on page {i+1}")
        break
else:
    print("The QR code is present in all pages")
    
pdf_file.close()

In my code, the loop section is not working. Running it produces this error:

Traceback (most recent call last):
  File "C:\Users\php\PycharmProjects\newforum\new.py", line 19, in <module>
    page_image = page.getPixmap().getImage()
                 ^^^^^^^^^^^^^^
AttributeError: 'PageObject' object has no attribute 'getPixmap'

thejz · May 16, 2023

It seems that there's an error in your code where you're trying to call the getPixmap() method on a PageObject object. The getPixmap() method is not available in the PyPDF2 library.

To render PDF pages as images, you can use the pdf2image library. Install it first by running pip install pdf2image, then modify your code as follows:

import PyPDF2
from pyzbar.pyzbar import decode
from PIL import Image
from pdf2image import convert_from_path

# Open the PDF file and get the first page
pdf_file = open('file.pdf', 'rb')
pdf_reader = PyPDF2.PdfFileReader(pdf_file)
page = pdf_reader.getPage(0)

# Convert the first page to an image
images = convert_from_path('file.pdf', first_page=1, last_page=1)
page_image = images[0]

# Decode the QR code from the first page image
qr_code = decode(page_image)[0].data.decode('utf-8')

# Loop through all the other pages and check if the same QR code is present
for i in range(1, pdf_reader.getNumPages()):
    page = pdf_reader.getPage(i)
    images = convert_from_path('file.pdf', first_page=i+1, last_page=i+1)
    page_image = images[0]
    if qr_code not in [code.data.decode('utf-8') for code in decode(page_image)]:
        print(f"The QR code {qr_code} is not present on page {i+1}")
        break
else:
    print("The QR code is present in all pages")

pdf_file.close()

In the modified code, I've used the convert_from_path() function from pdf2image to convert the PDF pages to images. Make sure you have the necessary dependencies installed for pdf2image to work correctly.

thugbunny · May 20, 2023

The issue here seems to be the method you're trying to use to convert a PDF page into an image. The PyPDF2 library does not have a built-in method for converting PDF pages into images, which is why you're seeing the AttributeError: 'PageObject' object has no attribute 'getPixmap'.

To convert a PDF page into an image, you can use the pdf2image library. Here is how you can modify your code to use it:

Code:

from PyPDF2 import PdfFileReader
from pdf2image import convert_from_path
from pyzbar.pyzbar import decode
from PIL import Image

# Open the PDF file and get the first page
pdf_file = 'file.pdf'
pdf_reader = PdfFileReader(open(pdf_file, 'rb'))

# Convert the first page to an image
images = convert_from_path(pdf_file, dpi=200, first_page=1, last_page=1)
first_page_image = images[0]

# Decode the QR code from the first page image
qr_code = decode(first_page_image)[0].data.decode('utf-8')

# Loop through all the other pages and check if the same QR code is present
for i in range(1, pdf_reader.getNumPages()):
    images = convert_from_path(pdf_file, dpi=200, first_page=i+1, last_page=i+1)
    page_image = images[0]

    if qr_code not in [code.data.decode('utf-8') for code in decode(page_image)]:
        print(f"The QR code {qr_code} is not present on page {i+1}")
        break
else:
    print("The QR code is present in all pages")

This modified version of your script should work as expected, provided you have the necessary libraries installed. If not, you can install them using pip:

Code:

pip install PyPDF2 pdf2image pyzbar pillow

Note that pdf2image requires poppler-utils to be installed in the system. You can install it via your package manager. For Ubuntu, you can use:

Code:

sudo apt-get install -y poppler-utils

For Windows, you can download the latest binaries from here. Extract the files into a directory and add that directory to your system PATH.
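
To confirm that the Poppler directory really is on your PATH before running the script, a small check along these lines may help. It is only a sketch: the name find_poppler_tools is made up for this example, and it assumes pdf2image's default backend, which calls the pdftoppm and pdfinfo executables.

```python
import shutil

# pdf2image shells out to these Poppler command-line tools by default.
POPPLER_TOOLS = ("pdftoppm", "pdfinfo")


def find_poppler_tools(tools=POPPLER_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH.

    Hypothetical helper for this example, not part of pdf2image.
    """
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_poppler_tools().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

If either tool is reported as not found, open a new terminal after editing PATH and try again. Alternatively, convert_from_path() accepts a poppler_path argument that points directly at the directory containing the binaries.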

codeing · May 25, 2023

The code posted earlier works, but it is very slow: looping over 45 pages takes about 8 minutes.
How can I speed up the loop and make the code run faster?
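
One direction that may be worth trying, offered as an untested sketch rather than a measured fix: the loop above calls convert_from_path() once per page, so Poppler is started 45 times. Rendering the whole document in a single call, at a lower resolution and in grayscale, could reduce that overhead. The function names below are made up for this example, and dpi=150 and thread_count=4 are assumed starting values, not tuned ones. A dpi that is too low may stop the QR code from decoding, and rendering everything at once keeps all page images in memory.

```python
def first_page_missing_code(expected, codes_per_page):
    """Return the 1-based number of the first page lacking `expected`, or None."""
    for page_number, codes in enumerate(codes_per_page, start=1):
        if expected not in codes:
            return page_number
    return None


def decode_all_pages(pdf_path, dpi=150, thread_count=4):
    """Render every page with one pdf2image call and decode the codes per page."""
    # Imported lazily so the pure helper above works without these packages.
    from pdf2image import convert_from_path
    from pyzbar.pyzbar import decode

    # One call for the whole document instead of one call per page.
    images = convert_from_path(
        pdf_path, dpi=dpi, grayscale=True, thread_count=thread_count
    )
    return [[c.data.decode("utf-8") for c in decode(img)] for img in images]


def check_pdf(pdf_path):
    """Print whether the first QR code on page 1 appears on every page."""
    codes_per_page = decode_all_pages(pdf_path)
    if not codes_per_page or not codes_per_page[0]:
        print("No QR code found on page 1")
        return
    expected = codes_per_page[0][0]
    missing = first_page_missing_code(expected, codes_per_page)
    if missing is None:
        print("The QR code is present in all pages")
    else:
        print(f"The QR code {expected} is not present on page {missing}")
```

Calling check_pdf('file.pdf') would then replace the whole loop. Timing it against the per-page version on the same file is the only way to know whether it actually helps in your case.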
