How to write this regular expression?

could ildg · May 4, 2005

I need a regular expression to check whether a string matches it.
The string consists of one to three parts; each part is an underscore
followed by a number.
The number in the first part should be 0~31, and the numbers in the other
parts should be larger than 31.

The requested re should match the following strings:
_3
_5_33
_21
_29_50
_29_700_700000

And the re shouldn't match the following strings:
_3_5 the number of part 2 is less than 31
_43 the number of part 1 shouldn't be larger than 31
_5_43_69_98 there shouldn't be more than 3 parts

How to write the re, please?
Thanks in advance.

George Sakkis · May 4, 2005

This newsgroup is in general very helpful, but there are some
exceptions; one of them is when the problem appears blatantly to be a
homework. Perhaps if you showed that you worked on it and made some
progress, but it's not quite right, someone may help you.

George

could ildg · May 4, 2005

Does it matter whether it is a homework?
Why do you look down upon homework?
Everyone can do his homework well without any problems in your logic?
It's a problem I met. I tried a lot and I can't work it out,
so I came here for help.
I have seen people complain that a question is too lengthy,
and I have seen questions criticized as unclear,
but I have never seen someone waste his time judging whether a question is
homework. If this is normal, I'll pay attention from now on.

Heiko Wundram · May 4, 2005

Does it matter whether it is a homework?

Yes, it does matter. We're not your CS-class homework monkeys...

We're a forum of Python programmers who aid each other in thinking about
solutions; we don't (normally) present solutions. For a beautiful example of
this, see the thread about finding similarities between two wave files...

But, anyway, as an additional hint: the stuff you need to do _can_ be solved
by an RE (the language you're matching is actually regular if you impose
several restrictions), but I'd rather not do it that way. Programming a small
function which splits the string and then does the appropriate checks (by
using int) should be much easier and faster.
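
A minimal sketch of the split-and-check approach suggested above. The function name is_valid is made up for illustration, and it assumes that "larger than 31" means 32 or more and that only plain ASCII digits are acceptable (str.isascii needs Python 3.7 or later):

```python
def is_valid(s):
    """Return True for 1-3 '_<number>' parts: first 0..31, the rest > 31."""
    if not s.startswith("_"):
        return False
    parts = s[1:].split("_")
    if not 1 <= len(parts) <= 3:
        return False
    # Accept plain ASCII digits only; this also rejects empty parts.
    if not all(p.isascii() and p.isdigit() for p in parts):
        return False
    numbers = [int(p) for p in parts]
    return numbers[0] <= 31 and all(n > 31 for n in numbers[1:])


for candidate in ["_3", "_5_33", "_29_700_700000", "_3_5", "_43", "_5_43_69_98"]:
    print(candidate, is_valid(candidate))
```

The range checks stay ordinary integer comparisons, so changing the limits later does not mean rewriting a pattern.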

And in case you really need an RE, watch this monster (to match a single term
having numerical value >= 40)...

0*(([1-9][0-9]{2,})|([4-9][0-9]))

Matching numbers >= 31 isn't hard either; I leave this as an exercise to the
reader...
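
One possible take on that exercise, as a sketch rather than a definitive answer. It assumes the bound wanted is "larger than 31" (as in the original question) and that leading zeros are not allowed, so the 0* prefix is dropped; the names GT_31 and greater_than_31 are invented for this example:

```python
import re

# A decimal integer greater than 31, written without leading zeros:
#   32-39 | 40-99 | any number with three or more digits
GT_31 = re.compile(r"(?:3[2-9]|[4-9][0-9]|[1-9][0-9]{2,})")


def greater_than_31(text):
    """True if the whole of text is a decimal integer larger than 31."""
    return GT_31.fullmatch(text) is not None


# Cross-check the pattern against plain integer comparison.
assert all(greater_than_31(str(n)) == (n > 31) for n in range(2000))
```

Checking the pattern against int comparison over a range of values is a cheap way to catch an off-by-one in the digit classes.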

But beware, I'd guess this regex performs rather poorly with
respect to backtracking on erroneous input such as
"00000000000000000000000030"...

--- Heiko.

Antoon Pardon · May 4, 2005

On 2005-05-04, could said:
Does it matter whether it is a homework?

Yes, because if others do your homework for you, you won't
have learned anything from it.

Why do you look down upon homework?

Who says he does? That he is not willing to do your homework
for you doesn't imply he looks down on it.

Everyone can do his homework well without any problems in your logic?

There is a difference between asking for help on how to solve a
problem yourself and asking for the solution.

Peter Hansen · May 4, 2005

could said:
I need a regular expression to check if a string matches it.

Why do you think you need a regular expression?

If another approach that involved no regular expressions worked much
better, would you reject it for some reason?

-Peter

could ildg · May 4, 2005

I can tell you that this is not homework at all;
I came up with it by myself.

I like this mailing list; it has helped me a lot. But some of you seem
weird.

Yes, because if others do your homework for you, you won't
have learned anything from it.

Who says he does? That he is not willing to do your homework
for you doesn't imply he looks down on it.

There is a difference between asking for help on how to solve a
problem yourself and asking for the solution.

I read the documentation for re in Python yesterday, and when I wanted to
use it to solve a problem, I found it not so easy, so I raised the question
here. I didn't say how I had thought about it, because I didn't want the
question to be too long. But a short question doesn't mean that I am too lazy
or that I didn't even think about it. If you think I'm the kind of person
you hate to help, you needn't.

could ildg · May 4, 2005

Thank you.

I just learned how to use re, so I want to find a way to solve it
using re. I know that splitting it into pieces will do it quickly.

vkeyboard · May 4, 2005

Personally I'd use groups.

James Stroud · May 4, 2005

I have seen people complain that a question is too lengthy,
and I have seen questions criticized as unclear,
but I have never seen someone waste his time judging whether a question is
homework. If this is normal, I'll pay attention from now on.

I think by participating in this list, most of the members have felt that they
have agreed to the following unofficial terms and conditions of use:

http://www.catb.org/~esr/faqs/smart-questions.html

The interesting thing is that those who follow the letter most strictly are
usually the best ones to ask. Moreover, most members of this list are usually
looking for any excuse to compose a regular expression. In fact, they
probably come up with an answer before they make any assessments about
homework.

I can tell you that this is not homework at all;
I came up with it by myself.

In that case, your question is fair game:

import re
r = re.compile(r"_[0-3]\d?(_\d\d?){0,2}")

r.search('_29_700_700000')

r.search('_29_50')

r.search('_5_33')

r.search('_500')
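
A short sketch for checking what this pattern actually matches. The helper matched_text is invented for the example. Note that search is unanchored and the digit classes do not encode the 0-31 and larger-than-31 limits, so the pattern accepts some strings the original question wanted rejected:

```python
import re

r = re.compile(r"_[0-3]\d?(_\d\d?){0,2}")


def matched_text(s):
    """Return the substring found by r.search, or None if nothing matches."""
    m = r.search(s)
    return m.group(0) if m else None


print(matched_text("_29_700_700000"))  # _29_70  (only a prefix is matched)
print(matched_text("_3_5"))            # _3_5    (should have been rejected)
print(matched_text("_39"))             # _39     (first number above 31)
print(matched_text("_500"))            # None
```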

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

Jeremy Bowers · May 4, 2005

Thank you.

I just learned how to use re, so I want to find a way to solve it
using re. I know that splitting it into pieces will do it quickly.

I'll say this; you have two problems, splitting out the numbers and
verifying their conformance to some validity rule.

I strongly recommend treating those two problems separately. While I'm not
willing to guarantee that an RE can't be written for something like ("[A
number A]_[A number B]" such that A < B) in the general case, it won't be
anywhere near as clean or as easy to follow as simply writing an RE to
extract the numbers, then verifying the constraints in conventional Python.

In that case, if you know in advance that the numbers are guaranteed to be
in that format, I'd just use the regular expression "\d+", and the
"findall" method of the compiled expression:

Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import re
>>> m = re.compile("\d+")
>>> m.findall("344mmm555m1111")
['344', '555', '1111']

If you're checking general matching of the parameters you've given, I'd
feel no shame in checking the string against r"^(_\d+){1,3}$" with .match
and then using the above to get the numbers, if you prefer that. (Note
that I believe .match implies the initial ^, but I tend to write it
anyways as a good habit. Explicit better than implicit and all that.)

(I just tried to capture the three numbers by adding a parentheses set
around the \d+ but it only gives me the first. I've never tried that
before; is there a way to get it to give me all of them? I don't think so,
so two REs may be required after all.)
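
The two-step approach described in this post could look roughly like the sketch below: one RE checks the overall shape, findall pulls out the numbers, and ordinary Python checks the ranges. The names SHAPE, NUMBER and check are made up for illustration, and the range rules are taken from the original question:

```python
import re

SHAPE = re.compile(r"^(_\d+){1,3}$")   # one to three "_<digits>" parts
NUMBER = re.compile(r"\d+")            # used to extract the numbers


def check(s):
    """Validate the shape with an RE, then the ranges with plain Python."""
    if not SHAPE.match(s):
        return False
    numbers = [int(n) for n in NUMBER.findall(s)]
    return numbers[0] <= 31 and all(n > 31 for n in numbers[1:])


print(check("_29_700_700000"))  # True
print(check("_3_5"))            # False
```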

could ildg · May 5, 2005

Thank you.

I just learned how to use re, so I want to find a way to solve it
using re. I know that splitting it into pieces will do it quickly.

I'll say this; you have two problems, splitting out the numbers and
verifying their conformance to some validity rule.

I strongly recommend treating those two problems separately. While I'm not
willing to guarantee that an RE can't be written for something like ("[A
number A]_[A number B]" such that A < B) in the general case, it won't be
anywhere near as clean or as easy to follow as simply writing an RE to
extract the numbers, then verifying the constraints in conventional Python.

In that case, if you know in advance that the numbers are guaranteed to be
in that format, I'd just use the regular expression "\d+", and the
"findall" method of the compiled expression:

Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import re
>>> m = re.compile("\d+")
>>> m.findall("344mmm555m1111")
['344', '555', '1111']

If you're checking general matching of the parameters you've given, I'd
feel no shame in checking the string against r"^(_\d+){1,3}$" with .match
and then using the above to get the numbers, if you prefer that. (Note
that I believe .match implies the initial ^, but I tend to write it
anyways as a good habit. Explicit better than implicit and all that.)

(I just tried to capture the three numbers by adding a parentheses set
around the \d+ but it only gives me the first. I've never tried that
before; is there a way to get it to give me all of them? I don't think so,
so two REs may be required after all.)

You can capture each number by using groups; each group can have a name.

Jeremy Bowers · May 5, 2005

Jeremy said:

Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import re
>>> m = re.compile("\d+")
>>> m.findall("344mmm555m1111")
['344', '555', '1111']

(I just tried to capture the three numbers by adding a parentheses set
around the \d+ but it only gives me the first. I've never tried that
before; is there a way to get it to give me all of them? I don't think
so, so two REs may be required after all.)

You can capture each number by using groups; each group can have a name.

I think you missed out on what I meant:

Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Can you also get 12 & 34 out of it? (Interesting, as the non-named groups
give you the *first* match....)

I guess I've never wanted this because I usually end up using "findall"
instead, but I could still see this being useful... parsing a function
call, for instance, and getting a tuple of the arguments instead of all of
them at once to be broken up later could be useful.
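
A small sketch of the behaviour under discussion, using a made-up input string. In Python's re module, when a capturing group sits inside a repetition, only the last repetition stays accessible through the match object; findall or finditer is the usual way to collect every occurrence:

```python
import re

text = "_12_34_56"

# The group is repeated up to three times, but the match object keeps
# only the text captured by the final repetition.
m = re.match(r"^(?:_(\d+)){1,3}$", text)
print(m.group(1))                      # 56

# findall returns every non-overlapping match of the number pattern.
print(re.findall(r"\d+", text))        # ['12', '34', '56']

# finditer gives the same values through individual match objects.
print([n.group(1) for n in re.finditer(r"_(\d+)", text)])
```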

could ildg · May 5, 2005

Sorry, Jeremy, I sent my email directly to your mailbox just now.

Groups are very useful.

Jeremy said:

Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import re
>>> m = re.compile("\d+")
>>> m.findall("344mmm555m1111")
['344', '555', '1111']

(I just tried to capture the three numbers by adding a parentheses set
around the \d+ but it only gives me the first. I've never tried that
before; is there a way to get it to give me all of them? I don't think
so, so two REs may be required after all.)

You can capture each number by using groups; each group can have a name.

I think you missed out on what I meant:

Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Can you also get 12 & 34 out of it? (Interesting, as the non-named groups

Yes, you can extract **anything** you want. Getting each number is easy:
the only thing you need to do is give each number a name.

import re

s = r"_2_544_44000000"
# number1 must be 0-31; number2 and number3, if present, must be larger than 31
r = re.compile(r'^(?P<slice1>_(?P<number1>[12]?\d|3[01]))'
               r'(?P<slice2>_(?P<number2>(3[2-9])|([4-9]\d)|(\d{3,})))?'
               r'(?P<slice3>_(?P<number3>(3[2-9])|([4-9]\d)|(\d{3,})))?$')
mo = r.match(s)
if mo:
    print(mo.groupdict())
else:
    print("doesn't match")

The code above will give the following result:
{'slice1': '_2', 'slice2': '_544', 'slice3': '_44000000', 'number2':
'544', 'number3': '44000000', 'number1': '2'}
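
A sketch of how a named-group pattern of this shape can be checked against the examples from the original question. It is a variant, not the exact pattern above: the first number is limited to 0-31, leading zeros are rejected (an assumption), and the names BIG, PATTERN and parse are invented for this example:

```python
import re

# A decimal integer greater than 31, without leading zeros.
BIG = r"(?:3[2-9]|[4-9]\d|[1-9]\d{2,})"

PATTERN = re.compile(
    r"_(?P<number1>[12]?\d|3[01])"          # first part: 0..31
    + r"(?:_(?P<number2>" + BIG + r"))?"    # optional second part: > 31
    + r"(?:_(?P<number3>" + BIG + r"))?"    # optional third part: > 31
)


def parse(s):
    """Return the numbers as a list of ints, or None if s is not valid."""
    m = PATTERN.fullmatch(s)
    if m is None:
        return None
    return [int(v) for v in m.groups() if v is not None]


for s in ["_3", "_5_33", "_21", "_29_50", "_29_700_700000"]:
    assert parse(s) is not None
for s in ["_3_5", "_43", "_5_43_69_98"]:
    assert parse(s) is None
```

Using fullmatch (Python 3.4 or later) avoids having to write the ^ and $ anchors by hand.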

D H · May 5, 2005

Peter said:
Why do you think you need a regular expression?

If another approach that involved no regular expressions worked much
better, would you reject it for some reason?

-Peter

A regular expression will work fine for his problem.
Just match the digits separated by underscores using a regular
expression, then afterward check if the values are valid.

Fredrik Lundh · May 5, 2005

D H said:
A regular expression will work fine for his problem.
Just match the digits separated by underscores using a regular
expression, then afterward check if the values are valid.

you forgot to mention Boo here, Doug. nice IronPython announcement,
btw. the Boo developers must be so proud of you.

</F>

D H · May 9, 2005

Fredrik said:
you forgot to mention Boo here, Doug. nice IronPython announcement,
btw. the Boo developers must be so proud of you.

</F>

You never learn, do you Fredrik. I guess that explains why Boo will
never be mentioned on the python daily site your pythonware business
controls.

Here are some of Fredrik's funnier crazy rants right here:
http://www.oreillynet.com/pub/wlg/6291

Anything that you perceive as competition and threatening to your consulting
business really draws out your true nature.

Robert Kern · May 9, 2005

D said:
Fredrik Lundh wrote:

You never learn, do you Fredrik. I guess that explains why Boo will
never be mentioned on the python daily site your pythonware business
controls.

It's called Daily Python-URL not Daily Python-Like-Languages-URL. *That*
explains it. It's not like Pythonware is hiding its relationship.

Here are some of Fredrik's funnier crazy rants right here:
http://www.oreillynet.com/pub/wlg/6291

Funny you should mention that article since I showed that Fredrik's
benchmarks were correctly done (if not diligently-reported) while Uche's
were wrong on both marks.

http://www.oreillynet.com/cs/user/view/cs_msg/51158

Anything that you perceive as competition and threatening to your consulting
business really draws out your true nature.

Oy, my head hurts. Take it off-list, both of you. The rest of us don't
care about your bickering.

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

D H · May 9, 2005

Robert said:
It's called Daily Python-URL not Daily Python-Like-Languages-URL. *That*
explains it.

google for logix site:pythonware.com. He's announced plenty of non-python
stuff that is of interest to python users, including plenty of marketing
for his own software.

> It's not like Pythonware is hiding its relationship.

It hides any mention that Fredrik Lundh is behind it, which is deceitful
when he posts any smidgeon of praise his software gets, not admitting he
makes his income off support fees for that same software.

He can try to smear me all he wants if he really thinks that will help
his business.

Funny you should mention that article since I showed that Fredrik's
benchmarks were correctly done (if not diligently-reported) while Uche's
were wrong on both marks.

http://www.oreillynet.com/cs/user/view/cs_msg/51158

Funny how you link to your own post out of context. You must have not
listened to any of the other comments.

Oy, my head hurts. Take it off-list, both of you. The rest of us don't
care about your bickering.

Yet again someone bitches about a thread right after they hypocritically
throw their own little darts into the mix.

Erik Max Francis · May 9, 2005

D said:
Yet again someone bitches about a thread right after they hypocritically
throw their own little darts into the mix.

No one cares. Please take it elsewhere.
