I am trying to write a Python function that transforms a PDF file. Specifically, I want to create bookmarks based on text in the PDF. The PDF is a set of medical treatment notes from various dates. This particular set of medical records gives the treatment date in the header in the form "Visit date: ##/##/####". So I want one bookmark per treatment date, titled with that date: the treatment records for 01/01/2022 would get a bookmark titled 01/01/2022, and so on.
The code below runs and creates a new PDF file, "NewPDF2.pdf", that is identical to "NewMedical.pdf". However, it contains no bookmarks, and I cannot figure out what I am doing wrong.
Code:
import PyPDF4
import re
# Open the PDF file for reading
pdf_file = open(r"C:\Users\StanleyDenman\Documents\NewMedical.pdf", 'rb')
pdf_reader = PyPDF4.PdfFileReader(pdf_file)
pdf_writer = PyPDF4.PdfFileWriter()
# Define the regular expression for finding the bookmark locations
regex = re.compile("Visit date: '\b\d{2}/\d{2}/\d{4}\b'")
# Iterate through the pages of the PDF
for i in range(len(pdf_reader.pages)):
    page = pdf_reader.getPage(i)
    text = page.extractText()
    matches = re.findall(regex, text)
    pdf_writer.addPage(page)
    for match in matches:
        # Create a bookmark for each match
        bookmark = PyPDF4.pdf.Destination()
        bookmark.page = pdf_writer.addpage()
        bookmark.title = match.group(1)
# Write the new PDF file with bookmarks
output_file = open('NewPDF2.pdf', 'wb')
pdf_writer.write(output_file)
output_file.close()
pdf_file.close()
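
A likely cause, offered as a hedged sketch rather than a definitive fix: the pattern on the `regex = ...` line is not a raw string, so Python turns `\b` into a backspace character, and the single quotes inside the pattern are matched literally. With no matches, the inner loop never runs, which would explain why the script finishes without errors and without bookmarks. Even with matches, `re.findall` returns plain strings (so there is no `.group()`), and the writer method is `addPage`, not `addpage`. As far as I know, PyPDF4 inherits `PdfFileWriter.addBookmark(title, pagenum)` from PyPDF2 1.x, and that is the usual way to add an outline entry. The helper names `plan_bookmarks` and `add_visit_bookmarks` are hypothetical, and bookmarking only the first page on which each date appears is an assumption about the desired result.

```python
import re

# Raw string so \d reaches the regex engine intact; the parentheses capture only the date.
VISIT_DATE = re.compile(r"Visit date:\s*(\d{2}/\d{2}/\d{4})")


def plan_bookmarks(page_texts):
    """Return (title, page_index) pairs for the first page on which each date appears.

    Assumption: the header repeats on every page of a visit, so only the
    first occurrence of a date should become a bookmark.
    """
    seen = set()
    plan = []
    for index, text in enumerate(page_texts):
        # With one capture group, findall returns a list of date strings.
        for date in VISIT_DATE.findall(text):
            if date not in seen:
                seen.add(date)
                plan.append((date, index))
    return plan


def add_visit_bookmarks(src_path, dst_path):
    """Copy src_path to dst_path, adding one bookmark per visit date."""
    import PyPDF4  # imported here so plan_bookmarks works without the library

    with open(src_path, "rb") as src:
        reader = PyPDF4.PdfFileReader(src)
        writer = PyPDF4.PdfFileWriter()

        # Copy every page and collect its extracted text.
        texts = []
        for i in range(reader.getNumPages()):
            page = reader.getPage(i)
            writer.addPage(page)
            texts.append(page.extractText())

        # addBookmark takes the title and the zero-based page index in the writer.
        for title, index in plan_bookmarks(texts):
            writer.addBookmark(title, index)

        # Write while the source file is still open, since pages are read lazily.
        with open(dst_path, "wb") as dst:
            writer.write(dst)
```

Calling `add_visit_bookmarks(r"C:\Users\StanleyDenman\Documents\NewMedical.pdf", "NewPDF2.pdf")` should then write the bookmarked copy, provided `extractText()` returns the header in a recognizable form. For some PDFs it drops or rearranges spaces, in which case the pattern would need loosening; printing the extracted text of one page is a quick way to check.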