Removing Comma Within Digits

yccheok

Hi,

I am trying to change the following source text:

"A",1,234,567,890,"A",123,456
"A",1,234,567,"A",123,456
"A",1,234,"A",123,456
"A",3,"A",123,456

into:

"A",1234567890,"A",123,456
"A",1234567,"A",123,456
"A",1234,"A",123,456
"A",3,"A",123,456

using the following regular expression:

",(\d{1,3})(,(\d{3}))+,"

and the replacement

",$1$3,"

Here is the result I obtain:

"A",1890,"A"
"A",1567,"A"
"A",1234,"A"
"A",3,"A"

It seems that I need to have one or more $3. How can I specify that
in my code?

Thanks!
 
smallpond

Your subject says you want to remove commas within digits
but your example output still has commas within digits.

Removing commas within digits is trivial:

s/(\d),(\d)/$1$2/g;

Guessing what you mean by your question can't be done in
a regex.
 
cartercc

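use strict; use warnings;
while (<DATA>)
{
    # Splitting on "A", leaves an empty first element, then the two numeric fields.
    my @line = split /"A",/;
    $line[1] =~ s/,//g;                             # strip every comma from the first number
    foreach my $el (@line) { $el = '"A",' . $el; }  # put the "A", prefix back on each piece
    shift @line;                                    # drop the piece built from the empty element
    print join ',', @line;                          # join restores the comma stripped above
}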
__DATA__
"A",1,234,567,890,"A",123,456
"A",1,234,567,"A",123,456
"A",1,234,"A",123,456
"A",3,"A",123,456
 
Jürgen Exner

My first idea was
s/(\d),(\d)/$1$2/g;
until I noticed at the last moment that you DON'T want to remove all the
commas, only those in the first numerical sequence.

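use warnings; use strict;
while (<DATA>){
    # The first run of digits and commas includes the delimiter comma on each side.
    if (/([\d,]+)/) {
        my $t = $1; $t =~ tr/,//d;          # keep only the digits
        # Offset 4 skips the leading "A", and length-2 excludes the two delimiter commas.
        substr($_, 4, length($1)-2) = $t;
    }
    print $_;
}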
__DATA__
"A",1,234,567,890,"A",123,456
"A",1,234,567,"A",123,456
"A",1,234,"A",123,456
"A",3,"A",123,456

jue
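
The snippet above depends on the number starting at offset 4, which holds only while the first field is exactly "A". As a minimal sketch (not part of the original post), the offsets can instead be taken from the match itself via Perl's @- and @+ arrays. The helper name squeeze_first_number is made up for illustration, and the pattern assumes the number sits between ", and ," as in the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: same idea as the fixed-offset version, but the position
# and length of the number come from the match (@- and @+).
sub squeeze_first_number {
    my ($line) = @_;
    if ($line =~ /",([\d,]+),"/) {
        # Save start and length of capture group 1 before doing anything else.
        my ($start, $len) = ($-[1], $+[1] - $-[1]);
        (my $digits = $1) =~ tr/,//d;            # copy the capture, then drop its commas
        substr($line, $start, $len) = $digits;   # overwrite the original field in place
    }
    return $line;
}

print squeeze_first_number(qq{"A",1,234,"A",123,456\n});
# prints "A",1234,"A",123,456
```

Lines without a number between the two delimiters are returned unchanged.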
 
Mirco Wahab

You have to extract/find the sequence
in question first, then delete the
commas there ...

....

my $source_text = '
"A",1,234,567,890,"A",123,456
"A",1,234,567,"A",123,456
"A",1,234,"A",123,456
"A",3,"A",123,456
';

sub nocomma { (my $s=shift) =~ y/,//d; $s }

(my $mod_text = $source_text) =~ s/(?<=",)([^"]+)(?=,")/nocomma($1)/mge;

....

Regards

M.
 
yccheok

I should rephrase the topic:

1) Removing the commas within a number, where that particular number
must be in the middle of two strings.


"A",1,234,567,890,"A",123,456

to

"A",1234567890,"A",123,456

The comma between the last 123 and 456 is just a delimiter within the CSV
file. There is a bug in the legacy data: numbers were written with embedded
commas and then placed in the CSV file.

sub nocomma { (my $s=shift) =~ y/,//d; $s }

(my $mod_text = $source_text) =~ s/(?<=",)([^"]+)(?=,")/nocomma($1)/mge;

seems a nice solution, but is there any way I can eliminate the two-pass
pattern matching?

 
sln

You almost got it with that regexp. You have to put back the items
that don't change in the substitution. You also have to iterate
the substitution with a while loop, so that each attempt restarts from
the beginning of the string.

Each pass starts from the beginning and strips out one ',' following
the first run of contiguous digits. This all works because of the
delimiters ", and ,". However, there are other ways of doing it as well.


sln

-----------------------------
use strict;
use warnings;

# output:
# "A",1234567890,"A",123,456
# "A",1234567,"A",123,456
# "A",1234,"A",123,456
# "A",3,"A",123,456


while (<DATA>)
{
while (s/(",)(\d+),([\d,]*?)(,")/$1$2$3$4/) {}
print $_;
}

__DATA__
"A",1,234,567,890,"A",123,456
"A",1,234,567,"A",123,456
"A",1,234,"A",123,456
"A",3,"A",123,456
 
sln

Don't top post.
IMO, because you have the anchor strings, there is no way you can remove
the commas without a minimum of 2 match operations. It's not an open
repeatable pattern; there is a sub-pattern.

This method is actually pretty good. You may want to benchmark if you
are worried about performance. I don't see that as an issue with the
simple thing you are doing here.


sln
 
sln

[snip]

Actually, one way to do it in one pass is something
like this:

s/(",|),?(\d+|)(,".*|)/$1$2$3/g;
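
A lookaround-based substitution is another option worth testing. The following is a sketch rather than a vetted solution, and the helper name strip_anchored_commas is made up for illustration. It deletes a comma only when a digit precedes it and the digits and commas that follow run straight into the closing ," delimiter. It checks only the closing delimiter, so it assumes no ordinary unquoted numeric field sits directly before the broken number.

```perl
use strict;
use warnings;

# Hypothetical helper: strip commas from a number that is followed by ,"
sub strip_anchored_commas {
    my ($text) = @_;
    # (?<=\d)        a digit must precede the comma
    # (?=[\d,]*,")   only digits and commas may follow, up to the closing ,"
    $text =~ s/(?<=\d),(?=[\d,]*,")//g;
    return $text;
}

print strip_anchored_commas(qq{"A",1,234,567,890,"A",123,456\n});
# prints "A",1234567890,"A",123,456
```

The trailing 123,456 is left alone because no ," follows it, and the comma right after 890 survives because the lookahead needs one more comma before the quote.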
 
Bart Lateur

yccheok said:
I am trying to change the following source text:

"A",1,234,567,890,"A",123,456
"A",1,234,567,"A",123,456
"A",1,234,"A",123,456
"A",3,"A",123,456

into:

"A",1234567890,"A",123,456
"A",1234567,"A",123,456
"A",1234,"A",123,456
"A",3,"A",123,456

That's not very consistent. So you want to drop the commas between the
digits for the *first* group of digits, but not for the second? Then your
simplistic approach won't work.

yccheok said:
By using the following regular expression :-

",(\d{1,3})(,(\d{3}))+,"

and the replacement

",$1$3,"

What a weird syntax. So I assume your substitution won't actually be
done in Perl, but in a different language that uses regexes. So be it.

yccheok said:
Here is the result I obtain :-

"A",1890,"A"
"A",1567,"A"
"A",1234,"A"
"A",3,"A"

It seems that I need to have one or more $3. How can I specify that
in my code?

The reason for your result is that you use

(,(\d{3}))+

That will match a repeated group, but *only* capture the last matched
group. Match on 123,456,789 and you'll end up with ",789" for $2 and
"789" for $3.
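
This capture behaviour is easy to check. The following is a minimal sketch, not part of the original post; the helper name captures_for is made up for illustration. It applies the original pattern to one sample line and returns what ends up in $1, $2 and $3.

```perl
use strict;
use warnings;

# Hypothetical helper: run the original pattern and return its three captures.
sub captures_for {
    my ($line) = @_;
    # A quantified group keeps only the text from its final iteration,
    # so $2 and $3 hold the last ",ddd" group and nothing from earlier ones.
    if ($line =~ /",(\d{1,3})(,(\d{3}))+,"/) {
        return ($1, $2, $3);
    }
    return;
}

my @c = captures_for('"A",1,234,567,890,"A",123,456');
print join('|', @c), "\n";   # prints 1|,890|890
```

The middle groups 234 and 567 are matched but never captured, which is why the replacement ",$1$3," produces 1890.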

A possible solution in Perl would be using lookahead and lookbehind,

s/(?<=\d),(?=\d{3}\b)//g;

but this likely won't work in other languages with fewer options for
regexes, and it'll treat the second group of digits+commas the same way,
too. The result is:

"A",1234567890,"A",123456
"A",1234567,"A",123456
"A",1234,"A",123456
"A",3,"A",123456


What you can do is try to match the first group of digits+commas
*between commas* and in that group, drop the commas. Something like:

s/,([\d,]+),/ my $s = $1; $s =~ s(,)()g; ",$s," /e;

or in JavaScript, for example

data.replace(/,([\d,]+),/,
    function(all, m1) { return "," + m1.replace(/,/g, '') + ","; })
 
