Elegant equivalent to this regex?

  • Thread starter sherifffruitfly

sherifffruitfly

Hi all,

I'm new to regex, and hacked this one together. It seems awfully
redundant to me, but it does have the virtue at least of wearing its
meaning on its sleeve.

The task:

(1) match all quoted-comma'd numbers consisting of either 2 or 3
"sections". That is, critters of either form:

" "ddd,ddd,ddd" "
or
" "ddd,ddd" "

(2) Capture all of the digits, leaving the quotes and commas for the
garbage man. For example:

" "123, 456, 789" "
should in some fashion capture
"123456789"

Here's the regex I came up with:

(?<whole>\"(?<one>\d{1,3}),(?<two>\d{1,3}),(?<three>\d{1,3})\"|\"(?<one>\d{1,3}),(?<two>\d{1,3})\")

This works fine for me, and getting the desired complete "clean" number
from it is a triviality.

But I get the feeling that this is the regex-equivalent of baby-talk.
I'd like to know if there's a simpler, more elegant regex matching the
same class of strings, and capturing essentially the same substrings.


Thanks for any insights,

cdj
 
usenet

sherifffruitfly said:
(2) Capture all of the digits, leaving the quotes and commas for the
garbage man.

If you just want to strip out the non-numerics why futz with regexps?
Why not just use s/// to get rid of the non-numerics?

my $original = " 123, 456, 789 ' ";
(my $numbers = $original) =~ s/\D//g ;
print $numbers;
 
sherifffruitfly

usenet said:
If you just want to strip out the non-numerics why futz with regexps?
Why not just use s/// to get rid of the non-numerics?

my $original = " 123, 456, 789 ' ";
(my $numbers = $original) =~ s/\D//g ;
print $numbers;

Because I want to use regex, please.

If you don't wish to help me with my question, but prefer to answer
only your own instead, that's perfectly fine, of course.
 
J. Gleixner

sherifffruitfly said:
Because I want to use regex, please.

If you don't wish to help me with my question, but prefer to answer
only your own instead, that's perfectly fine, of course.

Provide examples of your data, expected results, and what you've
tried. If you post a short example with all of those, it'll
be much easier to help.

The requirements and regular expression you posted don't coincide.
 
sherifffruitfly

J. Gleixner said:
Provide examples of your data, expected results, and what you've
tried. If you post a short example with all of those, it'll
be much easier to help.

Just in case I typo'd my regex, here it is again, copy/pasted straight
from Expresso (regex testing/analysis tool):

\"(?<one>\d{1,3}),(?<two>\d{1,3}),(?<three>\d{1,3})\"|\"(?<one>\d{1,3}),(?<two>\d{1,3})\"

This is intended for use against csv files - there may be stuff in the
regex that exploits this assumption - I forget.

Sample text:

Oct
2005,6.02,211.9,"1,573,958",31.9,"135,191",722.8676,67.3,19.1,18.1,19.2,18.4,18.4,,
Nov
2005,6.02,212.8,"1,573,958",32.2,"135,191",722.8676,67.3,19.2,18.2,19.2,15.7,15.7,,
Dec
2005,6.02,213.6,"1,570,805",32.5,"136,005",723.228,66.2,19.2,18.2,19.3,13.3,13.3,,
Jan
2006,6.02,215.6,"1,573,483",32.9,"137,032",723.36,67.1,18.9,17.9,19.2,9.9,9.9,,
Feb
2006,6.02,216.7,"1,577,319",33.2,"137,413",723.4165,67.0,18.8,17.9,19.2,10.3,10.3,,
Mar
2006,6.02,217.2,"1,579,222",33.5,"137,519",723.5606,66.9,18.6,17.6,19.2,12.9,12.9,,
Apr
2006,6.02,218.6,"1,579,587",33.8,"138,393",723.63,66.8,18.5,17.5,19.2,10.8,10.8,,
May
2006,6.02,218.9,"1,578,357",34.2,"138,669",723.687,66.8,18.4,17.3,19.3,12.0,12.0,,
Jun
2006,6.02,218.5,"1,572,273",34.5,"138,963",725.0399,66.7,18.4,17.2,19.3,11.8,11.8,,
Jul
2006,6.03,218.0,"1,563,849",35.1,"139,379",725.1364,66.6,18.3,17.2,19.3,10.1,10.1,,
Aug
2006,6.03,217.3,"1,557,949",35.4,"139,467",725.205,66.5,18.4,17.1,19.4,11.2,11.2,,
Sep
2006,6.04,216.7,"1,549,354",35.8,"139,867",725.3182,66.3,18.4,17.2,19.4,9.5,9.5,,
Oct
2006,6.04,215.6,"1,541,800",35.8,"139,826",725.3855,66.2,18.5,17.2,19.4,10.6,10.6,,

Representative sample of matches/captures:

match         <whole>       <one>   <two>   <three>
"1,573,958"   "1,573,958"   1       573     958
"135,191"     "135,191"     135     191     (empty)
etc.

In my *ideal* regex, there would be just 1 capture from a given match:
the concatenation of the numerically named capture group values
(1573958 or 135191, in the above). Achieving the effect of this ideal
result is trivial from what my regex *does* provide, however.

J. Gleixner said:
The requirements and regular expression you posted don't coincide.

I included "qualifier-words" (e.g., "essentially") in my OP that were
intended to make your statement false. I may have failed. Also I didn't
explicitly state an obvious limitation of my regex: that it only
"works" for quoted-comma'd numbers consisting of either 2 or 3 blocks
(e.g., it fails for numbers in the billions); that suffices for my
needs.

A regex that satisfies my ideal situation would be great. But if that's
not possible, simply a more elegant more-or-less-equivalent of my own
would be greatly appreciated. There are many possible ways to
de-pretty-print - I just want an elegant one.

Does that help?

Thanks for responding,

cdj
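
A capture group always returns one contiguous piece of the input, so a single capture cannot hold the digits while skipping the commas between them. A minimal sketch of a shorter pattern that is roughly equivalent to the one posted above follows: it collapses the two alternatives into one capture with a repeated group, then strips the commas from a copy of that capture. The sample row and the variable names are assumptions made for illustration, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical sample row, modeled on the CSV data posted above.
my $row = '2005,6.02,211.9,"1,573,958",31.9,"135,191",722.8676';

# One block of 1-3 digits, followed by one or two more comma-separated
# blocks, all inside double quotes. The capture still contains the commas.
my @numbers;
while ( $row =~ /"(\d{1,3}(?:,\d{1,3}){1,2})"/g ) {
    ( my $digits = $1 ) =~ tr/,//d;    # delete the commas from a copy
    push @numbers, $digits;
}

print "$_\n" for @numbers;    # prints 1573958, then 135191
```

Changing `{1,2}` to `+` would also accept numbers with four or more blocks, which the posted pattern does not.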
 
sherifffruitfly

Mirco said:
You didn't specify how *exact* is your matching requirement,
eg. if you have data like this:

Yah - the reason I left it vague is because what I *want* is best
described in English as
"de-pretty-printing-numerical-csv-file-entries-that-shouldn't-have-been-pretty-printed-in-the-first-place".

As I expect there to be many ways to skin that particular cat, I didn't
want to unnecessarily lock-in one particular approach or whatever. Does
that make sense?

Thanks,

cdj
 
Mumia W. (on aioe)

[...]
(2) Capture all of the digits, leaving the quotes and commas for the
garbage man. For example:

" "123, 456, 789" "
should in some fashion capture
"123456789"
[...]

my $string = "123, 456, 789";
my $num = join('',$string =~ /\d+/g);
print "num = $num\n";
 
J. Gleixner

sherifffruitfly said:
Just in case I typo'd my regex, here it is again, copy/pasted straight
from Expresso (regex testing/analysis tool):

\"(?<one>\d{1,3}),(?<two>\d{1,3}),(?<three>\d{1,3})\"|\"(?<one>\d{1,3}),(?<two>\d{1,3})\"

OK.. so how are you using it?? Show some actual code.
This is intended for use against csv files - there may be stuff in the
regex that exploits this assumption - I forget.

Sample text:

Oct
2005,6.02,211.9,"1,573,958",31.9,"135,191",722.8676,67.3,19.1,18.1,19.2,18.4,18.4,,
Representative sample of matches/captures:

match         <whole>       <one>   <two>   <three>
"1,573,958"   "1,573,958"   1       573     958
"135,191"     "135,191"     135     191     (empty)
etc.

In my *ideal* regex, there would be just 1 capture from a given match:
the concatenation of the numerically named capture group values
(1573958 or 135191, in the above). Achieving the effect of this ideal
result is trivial from what my regex *does* provide, however.



I included "qualifier-words" (e.g., "essentially") in my OP that were
intended to make your statement false. I may have failed. Also I didn't
explicitly state an obvious limitation of my regex: that it only
"works" for quoted-comma'd numbers consisting of either 2 or 3 blocks
(e.g., it fails for numbers in the billions); that suffices for my
needs.

A regex that satisfies my ideal situation would be great. But if that's
not possible, simply a more elegant more-or-less-equivalent of my own
would be greatly appreciated. There are many possible ways to
de-pretty-print - I just want an elegant one.

Maybe you're simply after what's captured by ()??

$_ = q{"123,456,789"};
if ( /^"(\d{1,3}),(\d{1,3}),?(\d{1,3})?"$/ )
{
    print "$1$2$3\n";
}

sherifffruitfly said:
Does that help?

No.. where's the short script??

To be safe, use one of the CSV modules available from CPAN. (e.g.
Text::CSV::Simple)
Parse data.
Iterate through data, looking at each item/cell.
If the item contains a ',', remove it and if the value is > 999, then
do whatever you want with it.


if ( /,/ )
{
    my $entry = $_;
    $entry =~ tr/,//d;
    print $entry, "\n" if $entry > 999;
}

or if the item matches above expression, then do whatever you want
with $1, $2, and $3.

if ( /^"(\d{1,3}),(\d{1,3}),?(\d{1,3})?"$/ )
{
    print "$1$2$3\n";
}

Or, if you're really sure of your data...

my $str =<<EOT;
Oct
2005,6.02,211.9,"1,573,958",31.9,"135,191",722.8676,67.3,19.1,18.1,19.2,18.4,18.4,,
Nov
2005,6.02,212.8,"1,573,958",32.2,"135,191",722.8676,67.3,19.2,18.2,19.2,15.7,15.7,,
Dec
2005,6.02,213.6,"1,570,805",32.5,"136,005",723.228,66.2,19.2,18.2,19.3,13.3,13.3,,
EOT
$str =~ s/,"(\d{1,3}),(\d{1,3}),?(\d{1,3})?",/,$1$2$3,/g;
print $str;

Oct
2005,6.02,211.9,1573958,31.9,135191,722.8676,67.3,19.1,18.1,19.2,18.4,18.4,,
Nov
2005,6.02,212.8,1573958,32.2,135191,722.8676,67.3,19.2,18.2,19.2,15.7,15.7,,
Dec
2005,6.02,213.6,1570805,32.5,136005,723.228,66.2,19.2,18.2,19.3,13.3,13.3,,
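
The parse-then-clean steps listed earlier in this post can be sketched roughly as below. This is not the CPAN module named in the post: it swaps in `parse_line` from the core Text::ParseWords module so the example runs without installing anything. The sample row and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical row, shortened from the sample data in the thread.
my $line = '2005,6.02,"1,573,958",31.9,"135,191",722.8676';

# Split on commas; the 0 asks parse_line to drop the surrounding quotes,
# so a quoted field such as "1,573,958" comes back as one item.
my @fields = parse_line( ',', 0, $line );

for my $field (@fields) {
    # Only touch fields that look like comma-grouped integers.
    $field =~ tr/,//d if $field =~ /^\d{1,3}(?:,\d{3})+$/;
}

my $clean = join ',', @fields;
print "$clean\n";
```

Because the fields are split before any commas are removed, this does not depend on the number of blocks in a pretty-printed value.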
 
DJ Stunks

sherifffruitfly said:
Because I want to use regex, please.

that DOES use a regular expression, but why would you insist on a
particular method anyway? is this a homework assignment? do you want
a solution or don't you?
If you don't wish to help me with my question, but prefer to answer
only your own instead, that's perfectly fine, of course.

no need to get snotty.

-jp
 
John Bokma

sherifffruitfly said:
Yah - the reason I left it vague is because what I *want* is best
described in English as
"de-pretty-printing-numerical-csv-file-entries-that-shouldn't-have-been-pretty-printed-in-the-first-place".

Use a module that reads and parses a CSV file
for each column in a row that contains a pretty-printed number turn it
into a non-pretty printed number.

In fact there was no need to mention a CSV file at all, you should be
smart enough to find the right module for that one. Your problem could be
reduced to:

I have a number that can be written as:

example(s)

How can I turn this into a normal number.

s/,// seems to be the right answer.
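
One detail worth spelling out: without the `/g` modifier, `s/,//` removes only the first comma, so a three-block number needs `s/,//g`. A small sketch of the difference, with a made-up sample value:

```perl
use strict;
use warnings;

# Hypothetical pretty-printed value, as it would look once the quotes are gone.
my $pretty = '1,573,958';

( my $first_only = $pretty ) =~ s/,//;     # removes only the first comma
( my $all_gone   = $pretty ) =~ s/,//g;    # /g removes every comma

print "$first_only\n";    # 1573,958
print "$all_gone\n";      # 1573958
```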
 
Uri Guttman

[...]
(2) Capture all of the digits, leaving the quotes and commas for the
garbage man. For example:
" "123, 456, 789" "
should in some fashion capture
"123456789"
[...]

MW(a> my $string = "123, 456, 789";
MW(a> my $num = join('',$string =~ /\d+/g);
MW(a> print "num = $num\n";

bah!

my( $num = $string ) =~ tr/0-9//dc ;

uri
 
Uri Guttman

u> I think Uri meant
u> (my $num = $string ) =~ tr/0-9//dc ;

yeah. my point was to show tr is better for this than a regex. the my
stuff was just boilerplate code.

and moronzilla needs to be locked up again. why did they let it escape
the loonie bin?

uri
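
For reference, a runnable sketch of the corrected `tr` statement quoted in this post. The sample string is an assumption based on the example in the original question.

```perl
use strict;
use warnings;

# Hypothetical input, based on the example in the original question.
my $string = '"123, 456, 789"';

# c complements the search list (every character that is not 0-9);
# d deletes the characters selected that way.
( my $num = $string ) =~ tr/0-9//cd;

print "$num\n";    # 123456789
```

The parentheses matter: they make the assignment happen first, so `tr` modifies the new copy and leaves `$string` untouched.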
 
Jürgen Exner

sherifffruitfly said:
Because I want to use regex, please.

???
/\D/ _is_ a regular expression. If you mean something else by 'regex' then
please explain.

jue
 
foo bar baz qux

Purl said:
Benchmark: timing 1000000 iterations of PurlGurl, Uri...
PurlGurl: 2 wallclock secs ( 1.87 usr + 0.00 sys = 1.87 CPU) @ 534759.36/s(n=1000000)
Uri: 3 wallclock secs ( 3.08 usr + 0.00 sys = 3.08 CPU) @ 324675.32/s(n=1000000)

PG still has a pointless obsession with microsecond differences.

Easy for readers to surmise use of lexical declarations on
a global scale reduces script efficiency an average of
thirty-three percent, for this specific case example.

Typically verbose, PG takes 27 words to say "1.87 is 33% less than
3.08".

Maybe PG should have spent more time on the pertinent arithmetic and
less on the otiose wordage.


Soon there may be paranoid accusations and tales of farm life.


Purl Gurl rants include ...

"over these many years regulars in this group have directed at me,
insults, racial slurs, threats of physical violence, threats of death,
have tried a million times to crash my server and, of course, troll me
everyday"
- Purl Gurl, Oct 26 2005


"You of the Perl Community, have successfully given the Perl Community
a reputation of being populated by mentally disturbed people who are
psychotically driven to harass all peoples, children, women, men, the
elderly, all people, about whom you know nothing. "
- Purl Gurl, Jun 27 2004


"... a huge mule's ass I once plowed behind as a child. I looked up,
way up cuz that mule stood eighty hands high"
- Purl Gurl, Oct a6 2005
Really? 80 hands = 80 x 4 inches = 320 inches = 26 feet 8 inches.
You cannot expect us to believe your mule was nearly 27 feet tall.
- Joe Smith, Oct 16 2005

Hmm, arithmetic may be a long standing weakness of PG.


"Use of "our" and "my" for globals serves no purpose. "
- Purl Gurl, Oct 15 2005

#!perl
# Written by Purl Gurl
....
our %in;
- Ken Singleton, Jul 7 2004
 
John Bokma

foo bar baz qux said:
Purl Gurl rants include ...

Bokma H4x0r'd my b0x!
You are all male sexists picking on Abigail and me

Or something along those lines :-D
 
sherifffruitfly

please explain.

jue

I'm afraid I don't understand what is to be explained. I asked for
merely an improved regex. You (above) questioned my use of regex. I
repeated that I was using regex. Then you ask me to explain. Explain
what, exactly?

If you (or anyone) wish to only answer questions *you* think I *should*
be asking, rather than questions I am *actually* asking, that's fine, I
suppose. I don't see how that requires any explanation from *me*,
however.

Didn't mean for this to become acrimonious - I just wanted a better
regex. I'll leave now.

Thanks anyway,

cdj
 
Jürgen Exner

sherifffruitfly said:
Jürgen Exner wrote:
You snipped some vital parts of this discussion:
Someone (usenet@....) suggested:
You replied:
To me this indicates that you are not satisfied with the suggested solution
because it does not use a 'regex'.
Is this correct so far?

Then I replied
/\D/ _is_ a regular expression. If you mean something else by 'regex'
then please explain.

I'm afraid I don't understand what is to be explained.
I asked for merely an improved regex. You (above) questioned my use of
regex. I repeated that I was using regex. Then you ask me to explain.
Explain what, exactly?

'Regex' is not a word of the English language. However it is commonly used
as an abbreviation of 'regular expression'.
Assuming that you are using 'regex' with this common meaning, too, then your
comment about the suggested solution is non-sensical because the suggested
solution does use a regular expression. /\D/ is a regular expression,
therefore the solution meets your request for using regular expressions.
The other alternative is that you are _not_ using 'regex' in the common
meaning of 'regular expression', in which case I asked you to please explain
what do you mean with 'regex'.
If you (or anyone) wish to only answer questions *you* think I
*should* be asking, rather than questions I am *actually* asking,
that's fine, I suppose.

But usenet@... was answering your question. He did suggest a solution that
uses a regular expression.
I don't see how that requires any explanation
from *me*, however.

If the suggested solution does not satisfy your needs then you need to
explain why not. It does use a regular expression which everyone assumed you
were looking for. You are saying it doesn't use a 'regex'. So apparently we
are at odds about the meaning of 'regex' and only _you_ can explain what
_you_ meant with 'regex' because apparently it is not the normal meaning
'regular expression'.

jue
 
Charlton Wilbur

sff> If you (or anyone) wish to only answer questions *you* think
sff> I *should* be asking, rather than questions I am *actually*
sff> asking, that's fine, I suppose. I don't see how that requires
sff> any explanation from *me*, however.

The intent behind such questions is to help you come up with a better
solution. Regular expressions are a bad tool for many things, and
this may be one such. (I didn't look at your original question, so
don't have an opinion on that.) Imposing an arbitrary constraint
(such as "solutions MUST use regular expressions") without any
justification for that constraint means people will ignore the
constraint, on the assumptions that you have the full Perl toolkit
available and you're most interested in solving the problem clearly.

If there's a reason you *must* use a regular expression, explain what
it is. (But be prepared: "I'm doing this with PHP's preg_replace, so
I can't use HTML::parser," for instance, such as I've seen before,
will get you redirected to a PHP forum.) If there's no reason to
restrict the solution to regular expressions, why not use the simplest
solution? And why get snarky and rude to people who are trying to
help, because you have constraints on the help you're willing to
accept that you can't or won't explain?

Charlton
 
