Python re repetitive matching

Rich · Dec 22, 2003

I'm new to regexes and can't quite figure out how to get them to work. What
I want is a tuple of all the matches from the regex. I've simplified my
actual problem and still can't get it to work.

So far I've got this:

print(re.findall( r'(@\d+)|(\w+)', "@5489 heel all and thumb toe" ))

This does exactly what I want, except that every match reports both groups,
so I end up with a list full of tuples, each with blank elements.... so close

I also tried my original idea

a = re.match( r'(@\d+)\s+(\w+)', "@5489 heel all and thumb toe" )
print(a.groups())

This matches the number and the first word, so I thought the following
should rematch after the first word and give me what I wanted... but it
doesn't for some reason

a = re.match( r'(@\d+)\s+(?:(\w+)\s*)', "@5489 heel all and thumb toe" )
print(a.groups())

This is my next iteration, still gives me the number (first group) and
only the word (the second match). So I extend it to ...

a = re.match( r'(@\d+)\s+(?:(\w+)\s*)*', "@5489 heel all and thumb toe" )
print(a.groups())

Now this gives me the number and the last but one word ? WHY!

My logic suggests that this should do what I want... what am I missing?
I've spent all night trying to figure this out.

Cheers

Rich

Francis Avila · Dec 23, 2003

Rich wrote in message ...

> I'm new to regexes and can't quite figure out how to get them to work. What
> I want is a tuple of all the matches from the regex. I've simplified my
> actual problem and still can't get it to work.

For the following answers I assume you only feed one line at a time. (If
this is an unacceptable restriction, things get uglier.)

First, consider whether you need REs at all. REs are always a last resort. In
this particular case, it seems to me that

s = "@5489 heel all and thumb toe"
s.split(' ', 1)

is all you need. If you need more precision (and the digit sequence is
always 4 chars long), the basic pattern is as follows:

re.split(r'(?<=@\d{4}) (?=.*)', s)
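As a quick check of the two suggestions above, here is a minimal sketch in Python 3 syntax. It assumes the sample line from the question, with a single space after a four-digit tag; on that input both approaches should give the same two-element list.

```python
import re

s = "@5489 heel all and thumb toe"

# Plain string method: split once, on the first space.
by_method = s.split(' ', 1)

# Regex version: split on the space that directly follows '@' plus four digits.
by_regex = re.split(r'(?<=@\d{4}) (?=.*)', s)

print(by_method)
print(by_regex)
```

The lookbehind only buys anything if other spaces in the line must be left alone and the tag format needs checking; otherwise the string method is shorter.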

> So far I've got this:
> print(re.findall( r'(@\d+)|(\w+)', "@5489 heel all and thumb toe" ))

You need nongrouping parens, and \w+ will split words.

Split to digits and words, discarding nothing:
re.findall(r'(?:@\d{4})|(?:.+)', s)

Split each item separately, discarding whitespace:
re.findall(r'(?:@\d{4})|(?:\w+)', s)
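To see why the grouping matters: findall returns one tuple per match when the pattern contains more than one capturing group, and a flat list of strings when it contains none. A small sketch in Python 3 syntax, using the sample line from the question (the variable names are only for illustration):

```python
import re

s = "@5489 heel all and thumb toe"

# Two capturing groups: every match is a 2-tuple, with '' standing in
# for whichever alternative did not take part in that match.
with_groups = re.findall(r'(@\d+)|(\w+)', s)

# Non-capturing groups: every match is simply the matched text.
without_groups = re.findall(r'(?:@\d+)|(?:\w+)', s)

print(with_groups)
print(without_groups)
```

The first call produces the "tuples with blank elements" described in the question; the second produces one string per item.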

> I also tried my original idea
>
> a = re.match( r'(@\d+)\s+(\w+)', "@5489 heel all and thumb toe" )
> print(a.groups())

re.match( r'(@\d+) (.+)', s ).groups()

> This matches the number and the first word, so I thought the following
> should rematch after the first word and give me what I wanted... but it
> doesn't for some reason

It doesn't because '\w' means 'word characters', i.e. [0-9a-zA-Z_]. It doesn't
match spaces, so once it comes up against a space, it stops.
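A short sketch of where the match stops, in Python 3 syntax and with the sample line from the question. The name `leftover` is just for illustration.

```python
import re

s = "@5489 heel all and thumb toe"

m = re.match(r'(@\d+)\s+(\w+)', s)
print(m.groups())

# Everything from the first space after 'heel' onward is left unmatched.
leftover = s[m.end():]
print(repr(leftover))

# '\w' on its own never matches a space.
print(re.match(r'\w', ' '))
```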

> a = re.match( r'(@\d+)\s+(?:(\w+)\s*)', "@5489 heel all and thumb toe" )
> print(a.groups())

So you do know about nongrouping parens? Anyway, this doesn't match after
the first word because nothing repeats the group: it matches one word plus
any trailing whitespace, once, and then the pattern is finished.

> This is my next iteration, still gives me the number (first group) and
> only the word (the second match). So I extend it to ...
>
> a = re.match( r'(@\d+)\s+(?:(\w+)\s*)*', "@5489 heel all and thumb toe" )
> print(a.groups())
>
> Now this gives me the number and the last but one word ? WHY!

Because * does not magically make new groups. It seems to me it should
match the last word, though, instead of next-to-last, but I won't think
about it too much because this re is hideous as it is, and shouldn't be
used.
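A sketch of the point about repeated groups, in Python 3 syntax. A capturing group inside a repetition keeps only the text from its final iteration, so however many words match, there are still just two groups; on a current Python 3 interpreter the second group ends up holding the last word. One possible workaround, shown here as an assumption about what is wanted, is to match the tag once and collect the words with findall.

```python
import re

s = "@5489 heel all and thumb toe"

# The repeated group is overwritten on every iteration; only the last
# captured word survives in group 2.
m = re.match(r'(@\d+)\s+(?:(\w+)\s*)*', s)
print(m.groups())

# Workaround: grab the tag and the remainder, then find the words separately.
tag, rest = re.match(r'(@\d+)\s+(.*)', s).groups()
result = (tag,) + tuple(re.findall(r'\w+', rest))
print(result)
```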

> My logic suggests that this should do what I want... what am I missing?
> I've spent all night trying to figure this out.

Your first error was using regular expressions:

'Some people, when confronted with a problem, think "I know, I'll use
regular expressions". Now they have two problems.' --Jamie Zawinski,
comp.lang.emacs

Use string methods, especially split().

Also, I am no longer sure whether you want all items/words to be groups
separately, or if you want one group of numbers, and the rest words. Either
one is trivial for string methods:

s.split() puts each item in its own group.
s.split(' ', 1) gives only two groups.

However, the first one is impossible for REs (I think) if the number of
groups is variable, and ugly if the number of groups is fixed. The second
one I've done ad nauseam here.
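For completeness, a sketch of the string-method route in Python 3 syntax. The helper `parse_line` is hypothetical, not something from this thread; it assumes the first item must be '@' followed by digits and returns None otherwise.

```python
def parse_line(line):
    """Hypothetical helper: return (tag, word, word, ...) or None."""
    parts = line.split()
    # Require a leading tag of the form '@' + digits.
    if not parts or not parts[0].startswith('@') or not parts[0][1:].isdigit():
        return None
    return tuple(parts)


s = "@5489 heel all and thumb toe"

every_item = parse_line(s)           # one element per item
two_groups = tuple(s.split(' ', 1))  # tag, then the rest of the line

print(every_item)
print(two_groups)
```

This works for any number of words on the line, which is the case a fixed set of regex groups cannot cover.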

See the RE Howto:
http://www.amk.ca/python/howto/regex/

Also, there's an O'Reilly book "Mastering Regular Expressions" which is said
to be excellent. Also Mertz wrote a "Text Processing with Python" (or
something like that) which is also said to be excellent. Mertz also has a
bunch of online columns on Python, all of which are very good. But my guess
is that you don't really need any of these.
