How to loop through all the other pages in a PDF using Python

Joined
May 8, 2023
Messages
5
Reaction score
0
How can I loop through all the other pages and check whether the same QR code is present?

Code:
import PyPDF2
from pyzbar.pyzbar import decode
from PIL import Image

# Open the PDF file and get the first page
pdf_file = open('file.pdf', 'rb')
pdf_reader = PyPDF2.PdfFileReader(pdf_file)
page = pdf_reader.getPage(0)

# Convert the first page to an image
page_image = page.getPixmap().getImage()

# Decode the QR code from the first page image
qr_code = decode(page_image)[0].data.decode('utf-8')

# Loop through all the other pages and check if the same QR code is present
for i in range(1, pdf_reader.getNumPages()):
    page = pdf_reader.getPage(i)
    page_image = page.getPixmap().getImage()
    if qr_code not in [code.data.decode('utf-8') for code in decode(page_image)]:
        print(f"The QR code {qr_code} is not present on page {i+1}")
        break
else:
    print("The QR code is present in all pages")
    
pdf_file.close()

In my code, the loop section is not working. I get this error:

Traceback (most recent call last):
  File "C:\Users\php\PycharmProjects\newforum\new.py", line 19, in <module>
    page_image = page.getPixmap().getImage()
                 ^^^^^^^^^^^^^^
AttributeError: 'PageObject' object has no attribute 'getPixmap'
 
Joined
May 16, 2023
Messages
1
Reaction score
0
The error comes from calling getPixmap() on a PyPDF2 PageObject. PyPDF2 has no getPixmap() method, and it cannot render a page to an image on its own.

To render PDF pages as images, you can use the pdf2image library. Install it first by running pip install pdf2image. Once it is installed, you can modify your code as follows:

Code:
import PyPDF2
from pyzbar.pyzbar import decode
from PIL import Image
from pdf2image import convert_from_path

# Open the PDF file and get the first page
pdf_file = open('file.pdf', 'rb')
pdf_reader = PyPDF2.PdfFileReader(pdf_file)
page = pdf_reader.getPage(0)

# Convert the first page to an image
images = convert_from_path('file.pdf', first_page=1, last_page=1)
page_image = images[0]

# Decode the QR code from the first page image
qr_code = decode(page_image)[0].data.decode('utf-8')

# Loop through all the other pages and check if the same QR code is present
for i in range(1, pdf_reader.getNumPages()):
    page = pdf_reader.getPage(i)
    images = convert_from_path('file.pdf', first_page=i+1, last_page=i+1)
    page_image = images[0]
    if qr_code not in [code.data.decode('utf-8') for code in decode(page_image)]:
        print(f"The QR code {qr_code} is not present on page {i+1}")
        break
else:
    print("The QR code is present in all pages")

pdf_file.close()

In the modified code, convert_from_path() from pdf2image renders each PDF page as an image. Note that pdf2image depends on Poppler, which has to be installed separately.
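
One gap in the snippet above: if pyzbar finds nothing on the first page, decode(page_image)[0] raises an IndexError. A minimal sketch of a guard follows. qr_strings and first_qr_string are hypothetical helper names, not part of pyzbar, and the sketch only assumes that each result carries its payload as bytes in .data, as the code above already does. The sample payloads are made-up stand-ins.

```python
from types import SimpleNamespace


def qr_strings(decoded):
    """Return the decoded payloads as text, in the order they were reported.

    `decoded` is the list returned by pyzbar's decode(); each item is
    expected to carry its payload as bytes in `.data`.
    """
    return [item.data.decode("utf-8") for item in decoded]


def first_qr_string(decoded):
    """Return the first payload, or None when no code was found."""
    strings = qr_strings(decoded)
    return strings[0] if strings else None


# Stand-ins for pyzbar results, so the sketch runs without a PDF.
fake_results = [SimpleNamespace(data=b"INV-001"), SimpleNamespace(data=b"other")]
print(first_qr_string(fake_results))  # INV-001
print(first_qr_string([]))            # None
```

With this in place, qr_code = first_qr_string(decode(page_image)) can be checked against None before the loop starts.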
 
Joined
Jan 8, 2023
Messages
27
Reaction score
2
The issue here seems to be the method you're trying to use to convert a PDF page into an image. The PyPDF2 library does not have a built-in method for converting PDF pages into images, which is why you're seeing the AttributeError: 'PageObject' object has no attribute 'getPixmap'.

To convert a PDF page into an image, you can use the pdf2image library. Here is how you can modify your code to use it:
Code:
from PyPDF2 import PdfFileReader
from pdf2image import convert_from_path
from pyzbar.pyzbar import decode
from PIL import Image

# Open the PDF file and get the first page
pdf_file = 'file.pdf'
pdf_reader = PdfFileReader(open(pdf_file, 'rb'))

# Convert the first page to an image
images = convert_from_path(pdf_file, dpi=200, first_page=1, last_page=1)
first_page_image = images[0]

# Decode the QR code from the first page image
qr_code = decode(first_page_image)[0].data.decode('utf-8')

# Loop through all the other pages and check if the same QR code is present
for i in range(1, pdf_reader.getNumPages()):
    images = convert_from_path(pdf_file, dpi=200, first_page=i+1, last_page=i+1)
    page_image = images[0]

    if qr_code not in [code.data.decode('utf-8') for code in decode(page_image)]:
        print(f"The QR code {qr_code} is not present on page {i+1}")
        break
else:
    print("The QR code is present in all pages")

This modified version of your script should work as expected, provided you have the necessary libraries installed. If not, you can install them using pip:
Code:
pip install PyPDF2 pdf2image pyzbar pillow

Note that pdf2image requires Poppler to be installed on the system (the poppler-utils package on Debian-based systems). You can install it via your package manager. For Ubuntu, you can use:
Code:
sudo apt-get install -y poppler-utils

For Windows, you can download the latest binaries from here. Extract the files into a directory and add that directory to your system PATH.
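
If editing PATH is not convenient, convert_from_path also accepts a poppler_path argument pointing at the folder that contains the Poppler executables. A minimal sketch, assuming a hypothetical install folder; render_kwargs and render_page are illustrative helper names, not part of pdf2image.

```python
def render_kwargs(poppler_bin=None, dpi=200):
    """Build keyword arguments for pdf2image.convert_from_path."""
    kwargs = {"dpi": dpi}
    if poppler_bin:
        # Folder holding the Poppler executables; only needed when they
        # are not on the system PATH.
        kwargs["poppler_path"] = poppler_bin
    return kwargs


def render_page(pdf_file, page_number, poppler_bin=None):
    """Render one page (1-based) of the PDF as a PIL image."""
    # Imported here so render_kwargs can be used without pdf2image installed.
    from pdf2image import convert_from_path

    images = convert_from_path(
        pdf_file,
        first_page=page_number,
        last_page=page_number,
        **render_kwargs(poppler_bin),
    )
    return images[0]
```

For example, render_page('file.pdf', 1, poppler_bin=r'C:\path\to\poppler\bin') would render the first page, where the folder is a placeholder for wherever the binaries were extracted.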
 
Joined
May 8, 2023
Messages
5
Reaction score
0
The code posted earlier works, but it is very slow: looping over 45 pages takes about 8 minutes.
How can I speed up the loop?
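
A likely contributor to the slowdown is that every call to convert_from_path starts a separate Poppler process for a single page. One possible approach, offered as a sketch rather than a measured fix: render all pages in one call, let pdf2image use several threads, and lower the resolution as far as the QR code stays readable. The dpi and threads values and the helper names below are assumptions to tune, not something confirmed in this thread.

```python
def pages_missing_code(expected, codes_per_page):
    """Return the 1-based numbers of pages whose codes do not include `expected`."""
    return [
        number
        for number, codes in enumerate(codes_per_page, start=1)
        if expected not in codes
    ]


def decode_all_pages(pdf_file, dpi=150, threads=4):
    """Render every page in a single call and decode the QR codes on each."""
    # Imported here so the pure helper above can be used without these packages.
    from pdf2image import convert_from_path
    from pyzbar.pyzbar import decode

    # One call for the whole document instead of one call per page.
    images = convert_from_path(
        pdf_file, dpi=dpi, grayscale=True, thread_count=threads
    )
    return [
        [code.data.decode("utf-8") for code in decode(image)]
        for image in images
    ]


def check_pdf(pdf_file):
    """Print whether the first page's QR code appears on every page."""
    codes_per_page = decode_all_pages(pdf_file)
    if not codes_per_page or not codes_per_page[0]:
        print("No QR code found on the first page")
        return
    qr_code = codes_per_page[0][0]
    missing = pages_missing_code(qr_code, codes_per_page)
    if missing:
        print(f"The QR code {qr_code} is not present on pages {missing}")
    else:
        print("The QR code is present in all pages")
```

Calling check_pdf('file.pdf') would take the place of the loop. This version keeps every rendered page in memory at once; for very large PDFs, rendering in batches with first_page and last_page is a middle ground.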
 
