Help with regex

Robert Dailey

Hey guys,

I'm creating a python script that is going to try to search a text
file for any text that matches my regular expression. The thing it is
looking for is:

FILEVERSION #,#,#,#

The # symbol represents any number that can be any length 1 or
greater. Example:

FILEVERSION 1,45,10082,3

The regex should only match the exact above. So far here's what I have
come up with:

re.compile( r'FILEVERSION (?:[0-9]+\,){3}[0-9]+' )

This works, but I was hoping for something a bit cleaner. I'm having
to create a special case portion of the regex for the last of the 4
numbers simply because it doesn't end with a comma like the first 3.
Is there a better, more compact, way to write this regex?
 
MRAB

The character class \d is equivalent to [0-9], and ',' isn't a special
character so it doesn't need to be escaped:

re.compile(r'FILEVERSION (?:\d+,){3}\d+')
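A minimal sketch of how that pattern could be used to scan text. The sample text and the names `FILEVERSION_RE`, `sample_text`, and `found` are made up for illustration; only the pattern itself comes from the post above.

```python
import re

# Pattern from the post above: three "digits," groups, then a final number.
FILEVERSION_RE = re.compile(r'FILEVERSION (?:\d+,){3}\d+')

# Hypothetical file contents, used only for illustration.
sample_text = """\
// resource script
FILEVERSION 1,45,10082,3
PRODUCTVERSION 1,0
"""

# search() scans the whole string, so surrounding text does not matter.
match = FILEVERSION_RE.search(sample_text)
found = match.group(0) if match else None
print(found)
```

One caveat for Python 3: on `str` patterns `\d` also matches non-ASCII decimal digits unless `re.ASCII` is passed, so `[0-9]` is the narrower choice if only ASCII digits should be accepted.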
 
alex23

I'm creating a python script that is going to try to search a text
file for any text that matches my regular expression. The thing it is
looking for is:

FILEVERSION 1,45,10082,3

Would it be easier to do it without regex? The following is untested
but I would probably do it more like this:

TOKEN = 'FILEVERSION '
for line in file:
    if line.startswith(TOKEN):
        version = line[len(TOKEN):]
        maj, min, rev, other = version.split(',')
        break  # if there's only one occurrence, otherwise do stuff here
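A runnable sketch of the same regex-free idea, under a few assumptions: the helper name `find_version` and the sample lines are hypothetical, and a line is accepted only if it has exactly four ASCII-digit fields.

```python
TOKEN = 'FILEVERSION '

def find_version(lines):
    """Return the four version fields as ints, or None if no line matches."""
    for line in lines:
        if line.startswith(TOKEN):
            fields = line[len(TOKEN):].strip().split(',')
            # Accept only four fields made purely of ASCII digits.
            if len(fields) == 4 and all(f.isascii() and f.isdigit() for f in fields):
                return tuple(int(f) for f in fields)
    return None

# Hypothetical input; any iterable of lines (such as an open file) works.
sample_lines = ['// header\n', 'FILEVERSION 1,45,10082,3\n']
version = find_version(sample_lines)
print(version)
```

Note that this version rejects a line with trailing text after the fourth number, because the last field would no longer be all digits.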
 
Roman

Since there cannot be more than one "end of string" you can try this
expression:
re.compile( r'FILEVERSION (?:[0-9]+(,|$)){4}' )
 
Robert Dailey

MRAB said:
The character class \d is equivalent to [0-9], and ',' isn't a special
character so it doesn't need to be escaped:

     re.compile(r'FILEVERSION (?:\d+,){3}\d+')

But ',' is a special symbol. It's used in this way:
{0,3}

This will match the preceding expression 0 to 3 times. Are you sure commas need
not be escaped?

In any case, your suggestions help to clean it up a bit!
 
MRAB

Robert said:
But ',' is a special symbol. It's used in this way:
{0,3}

This will match the previous regex 0-3 times. Are you sure commas need
not be escaped?

In any case, your suggestions help to clean it up a bit!

By 'special' I mean ones like '?', '*', '(', etc. ',' isn't special in
that sense.

In fact, the {...} quantifier is special only if it's syntactically
correct; otherwise it's just a literal, e.g. "a{," and "a{}" are just
literals.
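A small sketch that checks these claims against Python's `re` module. The variable names are illustrative only.

```python
import re

# An unescaped comma is an ordinary literal character.
comma_plain = re.fullmatch(r'1,2', '1,2') is not None
# Escaping it is allowed but changes nothing.
comma_escaped = re.fullmatch(r'1\,2', '1,2') is not None

# Inside a well-formed quantifier the comma separates the bounds:
# a{0,3} matches between zero and three 'a' characters.
bounded = re.fullmatch(r'a{0,3}', 'aaa') is not None
too_many = re.fullmatch(r'a{0,3}', 'aaaa') is not None

# A brace that does not form a valid quantifier is taken literally.
literal_open = re.fullmatch(r'a{,', 'a{,') is not None
literal_empty = re.fullmatch(r'a{}', 'a{}') is not None

print(comma_plain, comma_escaped, bounded, too_many, literal_open, literal_empty)
```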
 
Robert Dailey

Roman said:
Since there cannot be more than one "end of string" you can try this
expression:
re.compile( r'FILEVERSION (?:[0-9]+(,|$)){4}' )

I had thought of this but I can't use that either. I have to assume
that someone was silly and put text at the end somewhere, perhaps a
comment. Like so:

FILEVERSION 1,2,3,4 // This is the file version

It would be nice if there was a type of counter for regex. So you
could say 'match only 1 [^,]' or something like that...
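A sketch of how the two patterns from this thread behave on a line with a trailing comment. The `{3}` quantifier already acts as a counter for the comma-terminated numbers. The `strict` pattern and its `(?![0-9,])` lookahead are an addition for illustration, assuming a fifth field should be rejected.

```python
import re

line = 'FILEVERSION 1,2,3,4 // This is the file version'

# The end-of-string variant quoted above needs ',' or the end of the
# string after every number, so the trailing comment defeats it.
anchored = re.compile(r'FILEVERSION (?:[0-9]+(,|$)){4}')
anchored_result = anchored.search(line)

# The "three numbers with commas, then one more" form does not care
# what follows the fourth number.
unanchored = re.compile(r'FILEVERSION (?:[0-9]+,){3}[0-9]+')
unanchored_result = unanchored.search(line)

# Assumption: a fifth field such as "1,2,3,4,5" should be rejected.
# The negative lookahead (?![0-9,]) is not from the thread.
strict = re.compile(r'FILEVERSION (?:[0-9]+,){3}[0-9]+(?![0-9,])')

print(anchored_result)
print(unanchored_result.group(0))
print(strict.search('FILEVERSION 1,2,3,4,5'))
```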
 
Nobody

I'm creating a python script that is going to try to search a text
file for any text that matches my regular expression. The thing it is
looking for is:

FILEVERSION #,#,#,#

The # symbol represents any number that can be any length 1 or
greater. Example:

FILEVERSION 1,45,10082,3

The regex should only match the exact above. So far here's what I have
come up with:

re.compile( r'FILEVERSION (?:[0-9]+\,){3}[0-9]+' )

[0-9]+ allows any number of leading zeros, which is sometimes undesirable.
Using:

(0|[1-9][0-9]*)

is more robust.
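A sketch of that sub-pattern dropped into the full expression. The names `NUMBER` and `NO_LEADING_ZEROS` are hypothetical, and the trailing `(?![0-9])` is an assumption added here: without it, `search()` could stop after the `0` of a final field like `04` and still report a match.

```python
import re

# One number without leading zeros: either "0" or a nonzero digit
# followed by any digits.
NUMBER = r'(?:0|[1-9][0-9]*)'

# Assumption: (?![0-9]) is added so that search() cannot stop early
# inside a final field such as "04".
NO_LEADING_ZEROS = re.compile(
    r'FILEVERSION (?:' + NUMBER + r',){3}' + NUMBER + r'(?![0-9])'
)

accepted = NO_LEADING_ZEROS.search('FILEVERSION 1,45,10082,0')
rejected_first = NO_LEADING_ZEROS.search('FILEVERSION 01,45,10082,3')
rejected_last = NO_LEADING_ZEROS.search('FILEVERSION 1,45,10082,04')
print(accepted.group(0), rejected_first, rejected_last)
```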
 
Ethan Furman

Nobody said:
[0-9]+ allows any number of leading zeros, which is sometimes undesirable.
Using:

(0|[1-9][0-9]*)

is more robust.

You make a good point about possibly being undesirable, but I question
the assertion that your solution is /more robust/. If the OP
wants/needs to match numbers even with leading zeroes your /more robust/
version fails.

~Ethan~
 
John Machin

Nobody said:
[0-9]+ allows any number of leading zeros, which is sometimes undesirable.
Using:
   (0|[1-9][0-9]*)
is more robust.

You make a good point about possibly being undesirable, but I question
the assertion that your solution is /more robust/.  If the OP
wants/needs to match numbers even with leading zeroes your /more robust/
version fails.

I'd go further: the OP would probably be better off matching anything
that looked vaguely like an attempt to produce what he wanted e.g.
r"FILEVERSION\s*[0-9,]{3,}" and then taking appropriate action based
on whether that matched a "strictly correct" regex.
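A sketch of that two-stage idea. The loose pattern is the one suggested above; the strict pattern, its lookahead, and the function name `classify` are assumptions chosen for illustration.

```python
import re

# Loose pattern from the post above: anything that looks like an attempt.
LOOSE = re.compile(r'FILEVERSION\s*[0-9,]{3,}')
# Strict pattern; the trailing lookahead is an assumption added here so
# that extra fields such as "1,2,3,4,5" count as malformed.
STRICT = re.compile(r'FILEVERSION (?:[0-9]+,){3}[0-9]+(?![0-9,])')

def classify(line):
    """Return 'ok', 'malformed', or None for unrelated lines."""
    if not LOOSE.search(line):
        return None
    return 'ok' if STRICT.search(line) else 'malformed'

results = [classify(s) for s in (
    'FILEVERSION 1,45,10082,3',
    'FILEVERSION 1,45,10082',
    'FILEVERSION1,2,3,4',
    'PRODUCTNAME "demo"',
)]
print(results)
```

The 'malformed' case is where the script could warn about a near miss instead of silently skipping the line.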
 
Nobody

[0-9]+ allows any number of leading zeros, which is sometimes undesirable.
Using:

(0|[1-9][0-9]*)

is more robust.

You make a good point about possibly being undesirable, but I question
the assertion that your solution is /more robust/. If the OP
wants/needs to match numbers even with leading zeroes your /more robust/
version fails.

Well, the OP did say:
The regex should only match the exact above.

I suppose that it depends upon the definition of "exact" ;)

More seriously: failing to produce an error when one is called for is also
a bug.

Personally, unless I knew for certain that the rest of the program would
handle leading zeros correctly (e.g. *not* interpreting the number as
octal), I would try to reject it in the parser. It's usually much easier
to determine the cause of an error raised by the parser than if you allow
bogus data to propagate deep into the program.
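A sketch of rejecting bad input at the parsing step, assuming one FILEVERSION statement per line and that leading zeros are not allowed. The function name `parse_fileversion` and the error message are hypothetical.

```python
import re

# Assumption: one FILEVERSION statement per line, no leading zeros allowed.
STRICT = re.compile(
    r'FILEVERSION ((?:0|[1-9][0-9]*)(?:,(?:0|[1-9][0-9]*)){3})(?![0-9,])'
)

def parse_fileversion(line):
    """Return the version as a tuple of four ints, or raise ValueError."""
    match = STRICT.search(line)
    if match is None:
        raise ValueError('bad FILEVERSION line: %r' % line)
    return tuple(int(field) for field in match.group(1).split(','))

version = parse_fileversion('FILEVERSION 1,45,10082,3 // comment')

try:
    parse_fileversion('FILEVERSION 1,045,10082,3')
    error = None
except ValueError as exc:
    error = str(exc)

# The ambiguity being guarded against: the same digits give different
# values depending on whether a consumer treats them as octal.
as_decimal = int('045')
as_octal = int('045', 8)
print(version, error, as_decimal, as_octal)
```

Raising at this point keeps the error message next to the offending line, rather than surfacing later as a wrong version number.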
 
