Python pyPDF4 code to bookmark pdf based upon date text

Stan Denman · Jan 18, 2023

I am trying to create a Python function that will transform a PDF file. Specifically, I want to create bookmarks based on text in the PDF. The PDF is a set of medical treatment notes from various dates. This particular set of medical records gives the treatment date in the page header in the form "Visit date: ##/##/####". For example, the treatment records for 01/01/2022 should get a bookmark titled 01/01/2022, and so on for each date.

This code runs and creates a new PDF file, "NewPDF2.pdf", that is identical to "NewMedical.pdf". However, there are no bookmarks. I cannot figure out what I am doing wrong.

Code:

import PyPDF4
import re
 
# Open the PDF file for reading
pdf_file = open(r"C:\Users\StanleyDenman\Documents\NewMedical.pdf", 'rb')
pdf_reader = PyPDF4.PdfFileReader(pdf_file)
pdf_writer = PyPDF4.PdfFileWriter()
 
 
# Define the regular expression for finding the bookmark locations
regex = re.compile("Visit date: '\b\d{2}/\d{2}/\d{4}\b'")
 
# Iterate through the pages of the PDF
for i in range(len(pdf_reader.pages)):
    page = pdf_reader.getPage(i)
    text = page.extractText()
    matches = re.findall(regex, text)
    pdf_writer.addPage(page)
    
 
for match in matches:
   #  Create a bookmark for each match
       bookmark = PyPDF4.pdf.Destination()
       bookmark.page = pdf_writer.addpage()
       bookmark.title = match.group(1)
    
 
# Write the new PDF file with bookmarks
output_file = open('NewPDF2.pdf', 'wb')
pdf_writer.write(output_file)
output_file.close()
pdf_file.close()

Kuncode · Jan 30, 2023

This code appears to be incorrect in several places. Here are some of the issues:

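The regular expression never matches. In a normal (non-raw) string, "\b" is a backspace character rather than a word boundary, and the single quotes inside the pattern are matched literally. Use a raw string, with a group around the date so that only the date is returned: regex = re.compile(r"Visit date: (\d{2}/\d{2}/\d{4})"). Because nothing ever matches, the body of the bookmark loop never runs, which is why the script finishes without errors and without bookmarks.
re.findall returns strings, not match objects, so match.group(1) would fail. Use each result directly as the bookmark title.
The bookmark loop sits outside the page loop, so it only ever sees the matches from the last page. In addition, pdf_writer.addpage() does not exist; the method is addPage(page), and every page is already added inside the page loop, so it should not be called a second time.
Bookmarks are added with pdf_writer.addBookmark(title, pagenum), where pagenum is the zero-based index of the page in the writer. Creating a PyPDF4.pdf.Destination and assigning attributes to it does not attach anything to the output file.
If the header repeats on every page of a visit, add the bookmark only the first time each date appears; otherwise you get one bookmark per page.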
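
The regex points above can be checked without a PDF. This is only a small sketch: the sample text is made up, and the "broken" pattern reproduces the original one with the \d escapes doubled so that Python does not warn about them.

```python
import re

# Made-up page text standing in for what extractText() might return.
sample = "Patient: Doe\nVisit date: 01/01/2022\nNotes..."

# Equivalent to the original pattern: "\b" in a normal string is a backspace
# character, and the single quotes must appear literally in the text.
broken = re.compile("Visit date: '\b\\d{2}/\\d{2}/\\d{4}\b'")

# Raw string, with a group so findall returns only the date.
fixed = re.compile(r"Visit date: (\d{2}/\d{2}/\d{4})")

broken_matches = broken.findall(sample)
fixed_matches = fixed.findall(sample)

print(broken_matches)  # []
print(fixed_matches)   # ['01/01/2022']
```

The results of findall are plain strings here, so they can be used as bookmark titles as they are.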

Here's a corrected version of the code:

Python:

import PyPDF4
import re

# Open the PDF file for reading
pdf_file = open(r"C:\Users\StanleyDenman\Documents\NewMedical.pdf", 'rb')
pdf_reader = PyPDF4.PdfFileReader(pdf_file)
pdf_writer = PyPDF4.PdfFileWriter()

# Raw string so that \d reaches the regex engine unchanged;
# the group captures only the date, which becomes the bookmark title
regex = re.compile(r"Visit date: (\d{2}/\d{2}/\d{4})")

# Dates that already have a bookmark
seen_dates = set()

# Iterate through the pages of the PDF
for i in range(len(pdf_reader.pages)):
    page = pdf_reader.getPage(i)
    text = page.extractText()
    pdf_writer.addPage(page)

    # findall returns the captured date strings found on this page
    for date in regex.findall(text):
        # Bookmark only the first page on which each date appears
        if date not in seen_dates:
            seen_dates.add(date)
            # addBookmark takes the title and the zero-based page index
            pdf_writer.addBookmark(date, i)

# Write the new PDF file with bookmarks
output_file = open('NewPDF2.pdf', 'wb')
pdf_writer.write(output_file)
output_file.close()
pdf_file.close()
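
If the visit-date header repeats on every page of a visit, one way to get a single bookmark per date is to separate the "which page gets which bookmark" logic from the PDF I/O, so it can be tested on plain strings. The sketch below assumes the header format from the question; first_page_per_date and the sample page texts are hypothetical names and data for illustration, not part of PyPDF4.

```python
import re

DATE_RE = re.compile(r"Visit date: (\d{2}/\d{2}/\d{4})")


def first_page_per_date(page_texts):
    """Return [(date, page_index)] for the first page each date appears on."""
    seen = set()
    bookmarks = []
    for index, text in enumerate(page_texts):
        for date in DATE_RE.findall(text):
            if date not in seen:
                seen.add(date)
                bookmarks.append((date, index))
    return bookmarks


# Made-up page texts: the first visit spans two pages.
pages = [
    "Visit date: 01/01/2022 ...",
    "Visit date: 01/01/2022 continued",
    "Visit date: 02/15/2022 ...",
]
result = first_page_per_date(pages)
print(result)  # [('01/01/2022', 0), ('02/15/2022', 2)]
```

Each (date, index) pair can then be passed to pdf_writer.addBookmark(date, index) once the pages have been added to the writer. Text extraction from real PDFs can introduce odd spacing, so the pattern may need loosening for a particular file.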
