Removing Comma Within Digits

Discussion in 'Perl Misc' started by yccheok, Nov 12, 2008.

  1. yccheok

    yccheok Guest

    Hi,

    I am trying to change the following source text

    "A",1,234,567,890,"A",123,456
    "A",1,234,567,"A",123,456
    "A",1,234,"A",123,456
    "A",3,"A",123,456

    into

    "A",1234567890,"A",123,456
    "A",1234567,"A",123,456
    "A",1234,"A",123,456
    "A",3,"A",123,456

    By using the following regular expression :-

    ",(\d{1,3})(,(\d{3}))+,"

    and the replacement

    ",$1$3,"

    Here is the result I obtain :-

    "A",1890,"A"
    "A",1567,"A"
    "A",1234,"A"
    "A",3,"A"

    It seems that I need one or more $3. How can I specify that
    in my code?

    Thanks!
     
    yccheok, Nov 12, 2008
    #1

  2. yccheok

    smallpond Guest

    On Nov 12, 8:30 am, yccheok <> wrote:
    > Hi,
    >
    > I try to change the following source text
    >
    > "A",1,234,567,890,"A",123,456
    > "A",1,234,567,"A",123,456
    > "A",1,234,"A",123,456
    > "A",3,"A",123,456
    >
    > i wish to change to
    >
    > "A",1234567890,"A",123,456
    > "A",1234567,"A",123,456
    > "A",1234,"A",123,456
    > "A",3,"A",123,456
    >
    > By using the following regular expression :-
    >
    > ",(\d{1,3})(,(\d{3}))+,"
    >
    > and the replacement
    >
    > ",$1$3,"
    >
    > Here is the result I obtain :-
    >
    > "A",1890,"A"
    > "A",1567,"A"
    > "A",1234,"A"
    > "A",3,"A"
    >
    > It seem that, I wish to have one or more $3. How can I specific that
    > in my code?
    >
    > Thanks!




    Your subject says you want to remove commas within digits
    but your example output still has commas within digits.

    Removing commas within digits is trivial:

    s/(\d),(\d)/$1$2/g;

    Guessing what you mean by your question can't be done in
    a regex.
     
    smallpond, Nov 12, 2008
    #2

  3. yccheok

    cartercc Guest

    On Nov 12, 8:30 am, yccheok <> wrote:
    > Hi,
    >
    > I try to change the following source text
    >
    > "A",1,234,567,890,"A",123,456
    > "A",1,234,567,"A",123,456
    > "A",1,234,"A",123,456
    > "A",3,"A",123,456


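    while (<DATA>)
    {
    # split on the "A", marker; element 1 is the first numeric field
    @line = split /"A",/;
    $line[1] =~ s/,//g;   # strip the commas from that field only
    foreach $el (@line) { $el = '"A",' . $el; }
    shift @line;          # drop the element that came from the empty leading field
    $line = join ',', @line;
    print $line;
    }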
    __DATA__
    "A",1,234,567,890,"A",123,456
    "A",1,234,567,"A",123,456
    "A",1,234,"A",123,456
    "A",3,"A",123,456
     
    cartercc, Nov 12, 2008
    #3
  4. yccheok <> wrote:
    >Hi,
    >
    >I try to change the following source text
    >
    >"A",1,234,567,890,"A",123,456
    >"A",1,234,567,"A",123,456
    >"A",1,234,"A",123,456
    >"A",3,"A",123,456
    >
    >i wish to change to
    >
    >"A",1234567890,"A",123,456
    >"A",1234567,"A",123,456
    >"A",1234,"A",123,456
    >"A",3,"A",123,456


    First idea
    s/(\d),(\d)/$1$2/g;
    until I noticed at the last moment that you want to remove the
    commas in the first numerical sequence only.

    use warnings; use strict;
    while (<DATA>){
    if (/([\d,]+)/) {
    my $t = $1; $t =~ tr/,//d;
    substr($_, 4, length($1)-2) = $t;
    }
    print $_;
    }
    __DATA__
    "A",1,234,567,890,"A",123,456
    "A",1,234,567,"A",123,456
    "A",1,234,"A",123,456
    "A",3,"A",123,456

    jue
     
    Jürgen Exner, Nov 12, 2008
    #4
  5. yccheok

    Mirco Wahab Guest

    yccheok wrote:
    > Hi,
    >
    > I try to change the following source text
    >
    > "A",1,234,567,890,"A",123,456
    > "A",1,234,567,"A",123,456
    > "A",1,234,"A",123,456
    > "A",3,"A",123,456
    >
    > i wish to change to
    >
    > "A",1234567890,"A",123,456
    > "A",1234567,"A",123,456
    > "A",1234,"A",123,456
    > "A",3,"A",123,456
    >
    > It seem that, I wish to have one or more $3. How can I specific that
    > in my code?


    You have to extract/find the sequence
    in question first, then delete the
    commas there ...

    ....

    my $source_text = '
    "A",1,234,567,890,"A",123,456
    "A",1,234,567,"A",123,456
    "A",1,234,"A",123,456
    "A",3,"A",123,456
    ';

    sub nocomma { (my $s=shift) =~ y/,//d; $s }

    (my $mod_text = $source_text) =~ s/(?<=",)([^"]+)(?=,")/nocomma($1)/mge;

    ....

    Regards

    M.
     
    Mirco Wahab, Nov 12, 2008
    #5
  6. yccheok

    yccheok Guest

    I should rephrase the topic,

    1) Removing the commas within a number, where that particular number
    must sit between two strings.


    "A",1,234,567,890,"A",123,456

    to

    "A",1234567890,"A",123,456

    The comma between the last 123 and 456 is just a delimiter within the
    CSV file. There is a bug in the legacy data: numbers were written with
    embedded commas and then placed in the CSV file.

    sub nocomma { (my $s=shift) =~ y/,//d; $s }

    (my $mod_text = $source_text) =~ s/(?<=",)([^"]+)(?=,")/nocomma($1)/mge;

    seems a nice solution, but is there any way I can eliminate the
    two-pass pattern matching?

    On Nov 12, 10:44 pm, Mirco Wahab <-halle.de> wrote:
    > yccheok wrote:
    > > Hi,

    >
    > > I try to change the following source text

    >
    > > "A",1,234,567,890,"A",123,456
    > > "A",1,234,567,"A",123,456
    > > "A",1,234,"A",123,456
    > > "A",3,"A",123,456

    >
    > > i wish to change to

    >
    > > "A",1234567890,"A",123,456
    > > "A",1234567,"A",123,456
    > > "A",1234,"A",123,456
    > > "A",3,"A",123,456

    >
    > > It seem that, I wish to have one or more $3. How can I specific that
    > > in my code?

    >
    > You have to extract/find the sequence
    > in question first, then delete the
    > commas there ...
    >
    > ...
    >
    > my $source_text = '
    > "A",1,234,567,890,"A",123,456
    > "A",1,234,567,"A",123,456
    > "A",1,234,"A",123,456
    > "A",3,"A",123,456
    > ';
    >
    > sub nocomma { (my $s=shift) =~ y/,//d; $s }
    >
    > (my $mod_text = $source_text) =~ s/(?<=",)([^"]+)(?=,")/nocomma($1)/mge;
    >
    > ...
    >
    > Regards
    >
    > M.
     
    yccheok, Nov 12, 2008
    #6
  7. yccheok

    Guest

    On Wed, 12 Nov 2008 05:30:23 -0800 (PST), yccheok <> wrote:

    >Hi,
    >
    >I try to change the following source text
    >
    >"A",1,234,567,890,"A",123,456
    >"A",1,234,567,"A",123,456
    >"A",1,234,"A",123,456
    >"A",3,"A",123,456
    >
    >i wish to change to
    >
    >"A",1234567890,"A",123,456
    >"A",1234567,"A",123,456
    >"A",1234,"A",123,456
    >"A",3,"A",123,456
    >
    >By using the following regular expression :-
    >
    >",(\d{1,3})(,(\d{3}))+,"
    >
    >and the replacement
    >
    >",$1$3,"
    >
    >Here is the result I obtain :-
    >
    >"A",1890,"A"
    >"A",1567,"A"
    >"A",1234,"A"
    >"A",3,"A"
    >
    >It seem that, I wish to have one or more $3. How can I specific that
    >in my code?
    >
    >Thanks!


    You almost got it with that regexp. You have to put back the items
    that don't change in the substitution. You also have to iterate
    the substitution with a while loop to reset the position to the beginning.

    Each pass starts from the beginning and basically strips out one ','
    after the first continuous run of digits. This all works because of the
    delimiters ", and ,". However, there are other ways of doing it as well.


    sln

    -----------------------------
    use strict;
    use warnings;

    # output:
    # "A",1234567890,"A",123,456
    # "A",1234567,"A",123,456
    # "A",1234,"A",123,456
    # "A",3,"A",123,456


    while (<DATA>)
    {
    while (s/(",)(\d+),([\d,]*?)(,")/$1$2$3$4/) {}
    print $_;
    }

    __DATA__
    "A",1,234,567,890,"A",123,456
    "A",1,234,567,"A",123,456
    "A",1,234,"A",123,456
    "A",3,"A",123,456
     
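    To make the passes visible, here is a small sketch that applies the same
    substitution one pass at a time and records the line after each pass. The
    sub name strip_trace is only an illustration and is not part of the
    original post.

```perl
use strict;
use warnings;

# Apply the substitution from the post one pass at a time and
# record the line after every successful pass.
sub strip_trace {
    my ($line) = @_;
    my @steps;
    while ($line =~ s/(",)(\d+),([\d,]*?)(,")/$1$2$3$4/) {
        push @steps, $line;
    }
    return @steps;
}

print "$_\n" for strip_trace('"A",1,234,567,890,"A",123,456');
# "A",1234,567,890,"A",123,456
# "A",1234567,890,"A",123,456
# "A",1234567890,"A",123,456
```

    Each pass removes exactly one comma, so a field with three embedded
    commas needs three successful passes plus one final failing match.
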
    , Nov 12, 2008
    #7
  8. yccheok

    Guest

    On Wed, 12 Nov 2008 07:29:16 -0800 (PST), yccheok <> wrote:

    >I should rephrase the topic,
    >
    >1) Removing comma within a digit number, where that particular digit
    >must in the middle of two string.
    >
    >
    >"A",1,234,567,890,"A",123,456
    >
    >to
    >
    >"A",1,234,567,890,"A",123,456
    >
    >the comma within the last 123 and 456 are just delimiter within a csv
    >file. There is a bug in legacy data, where they located comma in
    >digits, and place the digits in csv file.
    >
    >sub nocomma { (my $s=shift) =~ y/,//d; $s }
    >
    >(my $mod_text = $source_text) =~ s/(?<=",)([^"]+)(?=,")/nocomma($1)/
    >mge;
    >
    >seems a nice solution. but any way i may eliminate two pass pattern
    >matching?
    >

    Don't top post.
    IMO because you have the anchor strings there is no way you can remove
    the commas without a minimum of 2 match operations. It's not an open
    repeatable pattern; there is a sub-pattern.

    This method is actually pretty good. You may want to benchmark if you
    are worried about performance. I don't see that as an issue with the
    simple thing you are doing here.
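
    If you do want numbers, a rough sketch with the core Benchmark module
    could look like the following. The sub names with_helper and with_loop
    and the single sample line are assumptions made for this illustration;
    real timings depend on your data and your perl.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $sample = qq{"A",1,234,567,890,"A",123,456\n};

# Approach from post #5: one s///e with a helper that deletes commas.
sub nocomma { (my $s = shift) =~ y/,//d; $s }
sub with_helper {
    (my $t = shift) =~ s/(?<=",)([^"]+)(?=,")/nocomma($1)/mge;
    return $t;
}

# Approach from post #7: repeat a plain substitution until it fails.
sub with_loop {
    my $t = shift;
    1 while $t =~ s/(",)(\d+),([\d,]*?)(,")/$1$2$3$4/;
    return $t;
}

# Both approaches must agree before it makes sense to time them.
die "results differ" unless with_helper($sample) eq with_loop($sample);

# Run each candidate for about one CPU second and print a comparison table.
cmpthese(-1, {
    helper => sub { with_helper($sample) },
    loop   => sub { with_loop($sample) },
});
```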


    sln
     
    , Nov 12, 2008
    #8
  9. yccheok

    Guest

    On Wed, 12 Nov 2008 17:12:10 GMT, wrote:

    >On Wed, 12 Nov 2008 05:30:23 -0800 (PST), yccheok <> wrote:
    >
    >>Hi,
    >>
    >>I try to change the following source text
    >>
    >>"A",1,234,567,890,"A",123,456
    >>"A",1,234,567,"A",123,456
    >>"A",1,234,"A",123,456
    >>"A",3,"A",123,456
    >>
    >>i wish to change to
    >>
    >>"A",1234567890,"A",123,456
    >>"A",1234567,"A",123,456
    >>"A",1234,"A",123,456
    >>"A",3,"A",123,456
    >>
    >>By using the following regular expression :-
    >>
    >>",(\d{1,3})(,(\d{3}))+,"
    >>
    >>and the replacement
    >>
    >>",$1$3,"
    >>
    >>Here is the result I obtain :-
    >>
    >>"A",1890,"A"
    >>"A",1567,"A"
    >>"A",1234,"A"
    >>"A",3,"A"
    >>
    >>It seem that, I wish to have one or more $3. How can I specific that
    >>in my code?
    >>
    >>Thanks!

    >

    [snip]

    Actually, the only way to do it in one pass is something
    like this:

    s/(",|),?(\d+|)(,".*|)/$1$2$3/g;

    >-----------------------------
    >use strict;
    >use warnings;
    >
    ># output:
    ># "A",1234567890,"A",123,456
    ># "A",1234567,"A",123,456
    ># "A",1234,"A",123,456
    ># "A",3,"A",123,456
    >
    >
    >while (<DATA>)
    >{

    s/(",|),?(\d+|)(,".*|)/$1$2$3/g;
    > print $_;
    >}
    >
    >__DATA__
    >"A",1,234,567,890,"A",123,456
    >"A",1,234,567,"A",123,456
    >"A",1,234,"A",123,456
    >"A",3,"A",123,456
    >
    >
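
    For readers who want to see what each part of that one-pass pattern does,
    here is the same substitution spelled out with the /x modifier. The sub
    name one_pass is just for illustration. Treat it as a sketch checked only
    against the sample lines in this thread, since it relies on the third
    group swallowing the rest of the line.

```perl
use strict;
use warnings;

sub one_pass {
    my ($line) = @_;
    $line =~ s/
        (",|)     # group 1: a quote followed by a comma, or nothing
        ,?        # an optional comma, which is dropped
        (\d+|)    # group 2: a run of digits, or nothing
        (,".*|)   # group 3: comma, quote and the rest of the line, or nothing
    /$1$2$3/gx;
    return $line;
}

print one_pass('"A",1,234,567,890,"A",123,456'), "\n";
# "A",1234567890,"A",123,456
```

    Once group 3 matches, it consumes everything up to the end of the line,
    which is why the trailing 123,456 keeps its comma.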
     
    , Nov 12, 2008
    #9
  10. yccheok

    Bart Lateur Guest

    yccheok wrote:

    >I try to change the following source text
    >
    >"A",1,234,567,890,"A",123,456
    >"A",1,234,567,"A",123,456
    >"A",1,234,"A",123,456
    >"A",3,"A",123,456
    >
    >i wish to change to
    >
    >"A",1234567890,"A",123,456
    >"A",1234567,"A",123,456
    >"A",1234,"A",123,456
    >"A",3,"A",123,456


    That's not very consistent. So you want to drop the commas between the
    digits for the *first* group of digits, but not for the second? Then your
    simplistic approach won't work.

    >By using the following regular expression :-
    >
    >",(\d{1,3})(,(\d{3}))+,"
    >
    >and the replacement
    >
    >",$1$3,"


    What a weird syntax. So I assume your substitution won't actually be
    done in Perl, but in a different language that also uses regexes. So be it.

    >Here is the result I obtain :-
    >
    >"A",1890,"A"
    >"A",1567,"A"
    >"A",1234,"A"
    >"A",3,"A"
    >
    >It seem that, I wish to have one or more $3. How can I specific that
    >in my code?


    The reason for your result is because you use

    (,(\d{3}))+

    So that'll match a repeated group, but *only capture the last matched
    group*. Match on 123,456,789 and you'll end up with ",789" for $2 and
    "789" for $3.

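    A minimal sketch of that behaviour, using the 123,456,789 example. The
    sub name last_captures is made up for this illustration, and the pattern
    is the original one without the surrounding quote delimiters.

```perl
use strict;
use warnings;

# A quantified group keeps only its last iteration in the capture variables.
sub last_captures {
    my ($text) = @_;
    return $text =~ /(\d{1,3})(,(\d{3}))+/ ? ($1, $2, $3) : ();
}

my @caps = last_captures('123,456,789');
print join('|', @caps), "\n";   # prints 123|,789|789
```

    The middle group ",456" is matched but then overwritten by ",789", which
    is why the replacement ",$1$3," loses everything except the last group.
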
    A possible solution in Perl would be using lookahead and lookbehind,

    s/(?<=\d),(?=\d{3}\b)//g;

    but this likely won't work in other languages with fewer options for
    regexes, and it'll treat the second group of digits+commas the same way,
    too. The result is:

    "A",1234567890,"A",123456
    "A",1234567,"A",123456
    "A",1234,"A",123456
    "A",3,"A",123456


    What you can do is try to match the first group of digits+commas
    *between commas* and in that group, drop the commas. Something like:

    s/,([\d,]+),/ my $s = $1; $s =~ s(,)()g; ",$s," /e;

    or in Javascript, for example

    data.replace(/,([\d,]+),/,
    function(all, m1) { return ","+m1.replace(/,/g, '')+"," })

    --
    Bart.
     
    Bart Lateur, Nov 14, 2008
    #10