Regular expression for different date formats in Python

undesputed.hackerz

Hello Developers,

I am a beginner in Python and need help writing a regular expression to fetch dates and times from some HTML documents. In the following code I walk through the HTML files in a folder called event and print the headings in the h1 tag using BeautifulSoup. These HTML pages also contain dates and times in different formats, and I want to fetch and display this information as well. The different date formats in these HTML documents are:

21 - 27 Nov 2012
1 Dec 2012
30 Nov - 2 Dec 2012
26 Nov 2012

Can someone help me fetch these formats from the HTML documents?
Here is my code for walking through the files and fetching the h1 from those HTML files:


Code:


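import os
import re
from bs4 import BeautifulSoup

# Walk every file under the event folder and print its <h1> heading.
for subdir, dirs, files in os.walk("/home/himanshu/event/"):
    for fle in files:
        path = os.path.join(subdir, fle)
        with open(path) as fh:
            soup = BeautifulSoup(fh, "html.parser")

        # soup.h1 is None when a page has no <h1> tag.
        if soup.h1 is not None:
            print(soup.h1.string)

        # Date and Time detection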
 
Michael Torrie

Vlastimil Brom

2012/11/26 undesputed.hackerz said:
[...] Different formats of date in these html documents are:

21 - 27 Nov 2012
1 Dec 2012
30 Nov - 2 Dec 2012
26 Nov 2012
[...]

Hi,
the following pattern seems to match all of your examples:

(\d{1,2} )?(Nov|Dec)?( ?- )?(\d{1,2}) (Nov|Dec) (\d{4})

However, it doesn't look very robust. You would of course have to add
the remaining months' abbreviations and test it against the parts of the
HTML documents you are interested in.
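
To make that concrete, here is a minimal sketch that keeps the structure of the pattern above, extends the month alternation to all twelve English abbreviations, and checks it against the four sample strings from the question. The names MONTHS, DATE_RE and samples are only illustrative, and the month list is an assumption about what the real pages contain.

```python
import re

# Assumed month abbreviations; the pattern in the post listed only Nov and Dec.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Same structure as the posted pattern, with the month alternation extended.
DATE_RE = re.compile(
    r"(\d{1,2} )?(%s)?( ?- )?(\d{1,2}) (%s) (\d{4})" % (MONTHS, MONTHS)
)

# The four formats listed in the question.
samples = [
    "21 - 27 Nov 2012",
    "1 Dec 2012",
    "30 Nov - 2 Dec 2012",
    "26 Nov 2012",
]

for s in samples:
    m = DATE_RE.fullmatch(s)
    # groups() holds: start day, start month, dash, end day, month, year
    print(s, "->", m.groups() if m else None)
```

Groups that do not take part in a match come back as None, so a single date such as "1 Dec 2012" has only the last three groups filled.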

hth,
vbr
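
One possible way to address the robustness concern when scanning free text is sketched below. This is a stricter variant, not the pattern from the reply: the range part is only accepted when the dash is present, and word boundaries keep the match from starting or ending inside a longer number. The function name find_dates and the sample sentence are hypothetical, and the month list is again an assumption.

```python
import re

# Assumed month abbreviations, as in the question's samples.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Optional "<day> [<month>] - " prefix, then a mandatory "<day> <month> <year>".
STRICT_DATE_RE = re.compile(
    r"\b(?:(\d{1,2})(?: (%s))? - )?(\d{1,2}) (%s) (\d{4})\b" % (MONTHS, MONTHS)
)

def find_dates(text):
    """Return every date or date range found in text, as matched strings."""
    return [m.group(0) for m in STRICT_DATE_RE.finditer(text)]

# Hypothetical page text, only for illustration.
text = "Workshop: 21 - 27 Nov 2012, Delhi. Deadline 1 Dec 2012."
print(find_dates(text))
```

In the loop from the question, the text to scan could be taken from soup.get_text(), which BeautifulSoup 4 provides for extracting a page's visible text; whether the dates survive that extraction in exactly these formats depends on the actual HTML.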
 
