Regular expression problem

abranches

Hello everyone.

I'm having a problem when extracting data from HTML with regular
expressions.
This is the source code:

You are ready in the next<br /><span id="counter_jt_minutes"
style="display: inline;"><span id="counter_jt_minutes_value">12</span>M</span>
<span id="counter_jt_seconds" style="display: inline;"><span
id="counter_jt_seconds_value">48</span>S</span>

I need to get the remaining time. Up to this point that isn't a
problem, but if the remaining time is less than 60 seconds, the
source becomes something like this:

You are ready in the next<br /><span id="counter_jt_seconds"
style="display: inline;"><span id="counter_jt_seconds_value">36</span>S</span>

I'm using this regular expression, but the minutes are always None...
You are ready in the next.*?(?:>(\d+)</span>M</span>)?.*?(?:>(\d+)</span>S</span>)

If I remove the ? from the first group, it works, but then it fails
when there are only seconds.
I could solve this in a couple of lines of Python, but I would really
like to solve it with regular expressions.

Thanks,
Pedro Abranches
 
MRAB

Your regex is working like this:

1. Match 'You are ready in the next'.
2. Match an increasing number of characters, starting with none
('.*?').
3. Try to match a pattern ('(?:>...)?') from where the previous step
left off. This doesn't match, because the next character is '<' rather
than '>', but the group is optional anyway, so continue to the next
step. (No characters consumed.)
4. Match an increasing number of characters, starting from none
('.*?'). It's this step that consumes the minutes.

It then goes on to match the seconds, and the minutes are always None
as you've found.
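
The steps above can be reproduced with a short Python sketch. The sample string is the HTML from the question joined onto one line, and the names ORIGINAL and html_with_minutes are only illustrative:

```python
import re

# The pattern from the question, joined onto one line.
ORIGINAL = re.compile(
    r'You are ready in the next.*?(?:>(\d+)</span>M</span>)?'
    r'.*?(?:>(\d+)</span>S</span>)'
)

# Sample HTML from the question (minutes and seconds), without line breaks.
html_with_minutes = (
    'You are ready in the next<br /><span id="counter_jt_minutes" '
    'style="display: inline;"><span id="counter_jt_minutes_value">12'
    '</span>M</span> <span id="counter_jt_seconds" style="display: inline;">'
    '<span id="counter_jt_seconds_value">48</span>S</span>'
)

match = ORIGINAL.search(html_with_minutes)
groups = match.groups()
print(groups)  # (None, '48'): the optional group matched the empty string
```

The minutes are present in the text, but they are swallowed by the second '.*?' instead of being captured.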

I've come up with this regex:

You are ready in the next(?:.*?>(\d+)</span>M</span>)?(?:.*?>(\d+)</span>S</span>)
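
Here is a minimal sketch of how this regex might be used from Python, checked against both samples from the question. The helper remaining_seconds and the variable names are hypothetical, and the sample strings are assumed to contain no line breaks:

```python
import re

# The revised pattern: each '.*?' now lives inside its group, so the
# optional minutes group gets to scan ahead before giving up.
FIXED = re.compile(
    r'You are ready in the next(?:.*?>(\d+)</span>M</span>)?'
    r'(?:.*?>(\d+)</span>S</span>)'
)

def remaining_seconds(html):
    """Hypothetical helper: remaining time in seconds, or None if no match."""
    match = FIXED.search(html)
    if match is None:
        return None
    minutes, seconds = match.groups()
    # Group 1 is None when the page shows only seconds.
    return int(minutes or 0) * 60 + int(seconds)

html_with_minutes = (
    'You are ready in the next<br /><span id="counter_jt_minutes" '
    'style="display: inline;"><span id="counter_jt_minutes_value">12'
    '</span>M</span> <span id="counter_jt_seconds" style="display: inline;">'
    '<span id="counter_jt_seconds_value">48</span>S</span>'
)
html_seconds_only = (
    'You are ready in the next<br /><span id="counter_jt_seconds" '
    'style="display: inline;"><span id="counter_jt_seconds_value">36'
    '</span>S</span>'
)

print(FIXED.search(html_with_minutes).groups())   # ('12', '48')
print(FIXED.search(html_seconds_only).groups())   # (None, '36')
print(remaining_seconds(html_with_minutes))       # 768
```

Note that '.' does not match a newline by default, so if the real page has line breaks between the spans you would probably need to pass re.DOTALL when compiling.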

Hope that helps.
 
