search/replace

Discussion in 'Perl Misc' started by molsted, Mar 30, 2009.

  1. molsted

    molsted Guest

    Hi all
    I'm trying to replace some strings in a textfile like this:
    &00Antiques^M&00Antiquit<0x00E4>ten^M&00Antiquit<0x00E9>s^M&00Antig<0x00FC>edades^M&00Antikviteter^M

    I've tried the following:
    s/&00(.+?)\r\n&00(.+?)\r\n&00(.+?)\r\n&00(.+?)\r\n&00(.+?)\r\n/
    <Style:GB>$1\n<Style:DE>$2\n<Style:FR>$3\n<Style:ES>$4\n<Style:DK>$5/
    g;

    with no luck.

    --
    Rene
     
    molsted, Mar 30, 2009
    #1

  2. molsted <> wrote:

    > I'm trying to replace some strings in a textfile like this:
    > &00Antiques^M&00Antiquit<0x00E4>ten^M&00Antiquit<0x00E9>s^M&00Antig<0x00FC>edades^M&00Antikviteter^M




    Does your data really have caret-M in it or does it instead have
    carriage return-linefeed in it?

    You should write the data in Real Perl Code so that there is no ambiguity.

    Have you seen the Posting Guidelines that are posted here frequently?


    --------------------------
    #!/usr/bin/perl
    use warnings;
    use strict;

    my @lang = qw/ <Style:GB> <Style:DE> <Style:FR> <Style:ES> <Style:DK> /;

    $_ = "&00Antiques\r\n&00Antiquit<0x00E4>ten\r\n&00Antiquit<0x00E9>s\r\n"
    . "&00Antig<0x00FC>edades\r\n&00Antikviteter\r\n";
    print;
    print "\n";

    my $capture_num = 0;
    s/&00([^\r]+)\r\n/$lang[$capture_num++]$1\n/g;
    print;
    --------------------------


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Mar 30, 2009
    #2

  3. molsted

    Alex Guest

    molsted wrote:
    > Hi all
    > I'm trying to replace some strings in a textfile like this:
    > &00Antiques^M&00Antiquit<0x00E4>ten^M&00Antiquit<0x00E9>s^M&00Antig<0x00FC>edades^M&00Antikviteter^M
    >
    > I've tried the following:
    > s/&00(.+?)\r\n&00(.+?)\r\n&00(.+?)\r\n&00(.+?)\r\n&00(.+?)\r\n/
    > <Style:GB>$1\n<Style:DE>$2\n<Style:FR>$3\n<Style:ES>$4\n<Style:DK>$5/
    > g;


    What does the '^M' mean to you? My editor shows carriage returns as
    '^M', but I see you're searching for a carriage return followed by a
    newline. Since not all line breaks are the same (it depends on your
    system), you want to match "at the end of the line". You want to use $
    to match "just before the end of a line" and .*? to chomp off the
    line-breaking characters. This requires using the flags s and m. The
    x flag permits whitespace in your expression, which improves readability.

    Here's a version that works on my system:

    s/


    ^ & 00 ([^\r\n]+?)$ .*?



    ^ & 00 ([^\r\n]+?)$ .*?



    ^ & 00 ([^\r\n]+?)$ .*?



    ^ & 00 ([^\r\n]+?)$ .*?



    ^ & 00 ([^\r\n]+?)$ .*?



    /<Style:GB>$1\n<Style:DE>$2\n<Style:FR>$3\n<Style:ES>$4\n<Style:DK>$5/msx;


    Note: The white space in "^ & 00 ([^\r\n]+?)$ .*?" is ignored, so it
    really means "^&00([^\r\n]+?)$.*?", which means "At the start of a line,
    match an ampersand, followed by two zeros, followed by any number of
    characters which are not carriage returns or line feeds, just before the
    end of the line".

    HTH!

    --
    Alex
    domain: iki dot fi
    localpart: alext
    email: localpart at domain
     
    Alex, Mar 31, 2009
    #3
  4. molsted

    Alex Guest

    Alex meant to write:
    > s/
    >
    > ^ & 00 ([^\r\n]+?)$ .*?
    > ^ & 00 ([^\r\n]+?)$ .*?
    > ^ & 00 ([^\r\n]+?)$ .*?
    > ^ & 00 ([^\r\n]+?)$ .*?
    > ^ & 00 ([^\r\n]+?)$ .*?
    > /<Style:GB>$1\n<Style:DE>$2\n<Style:FR>$3\n<Style:ES>$4\n<Style:DK>$5/msx;


    And sorry for all the extra lines, which my ng-reader added for me.


    --
    Alex
    domain: iki dot fi
    localpart: alext
    email: localpart at domain
     
    Alex, Mar 31, 2009
    #4
  5. molsted

    molsted Guest

    On 30 Mar., 14:41, Tad J McClellan <> wrote:
    > Does your data really have caret-M in it or does it instead have
    > carriage return-linefeed in it?
    >
    > You should write the data in Real Perl Code so that there is no ambiguity.
    >
    > Have you seen the Posting Guidelines that are posted here frequently?
    >
    > --------------------------
    > #!/usr/bin/perl
    > use warnings;
    > use strict;
    >
    > my @lang = qw/ <Style:GB> <Style:DE> <Style:FR> <Style:ES> <Style:DK> /;
    >
    > $_ = "&00Antiques\r\n&00Antiquit<0x00E4>ten\r\n&00Antiquit<0x00E9>s\r\n"
    >    . "&00Antig<0x00FC>edades\r\n&00Antikviteter\r\n";
    > print;
    > print "\n";
    >
    > my $capture_num = 0;
    > s/&00([^\r]+)\r\n/$lang[$capture_num++]$1\n/g;
    > print;
    > --------------------------


    Hi Tad,
    I haven't seen the Posting Guidelines; this is my first post to the group.
    Can I read them somewhere?
    I'm going with your suggestion but it only matches the first line.
    However if I put more sequences in the @lang-array it will work.
    How would I overcome that?

    ----------------------------

    #!/usr/bin/perl

    use strict;

    my $fileName=$ARGV[0];

    open(FILE,"$fileName") || die("Cannot Open File");

    my(@fcont) = <FILE>;

    close FILE;

    open(FOUT,">$fileName.txt") || die("Cannot Open File");

    foreach my $line (@fcont) {

        $line =~ s/\r/\r\n/g;

        #### METHOD #1 BEGIN ####

        my @lang = qw/ <Style:GB> <Style:DE> <Style:FR> <Style:ES>
                       <Style:DK> /;
        my $capture_num = 0;
        $line =~ s/&00([^\r]+)\r\n/$lang[$capture_num++]$1\n/g;

        #### METHOD #1 END ####

        print FOUT $line;
    }
    close FOUT;

    exit 0

    ----------------------------


    --
    Rene
     
    molsted, Apr 1, 2009
    #5
  6. molsted <> wrote:

    > I haven't seen the Posting Guidelines; this is my first post to the group.
    > Can I read them somewhere?



    http://tinyurl.com/dg27de


    > I'm going with your suggestion but it only matches the first line.



    To analyse the behavior of a pattern match, we need two things:

    1) the pattern that is to be matched
    2) the string that the pattern is to be matched against

    Since we only have access to one of them, we cannot analyse why it
    fails to match.


    > #!/usr/bin/perl
    >
    > use strict;



    use warnings;


    > my $fileName=$ARGV[0];
    >
    > open(FILE,"$fileName") || die("Cannot Open File");



    You should not quote lone variables:

    perldoc -q vars

    What's wrong with always quoting "$vars"?

    You should use the 3-argument form of open() and a lexical filehandle.

    You should include the name of the file in the diag message.

    You should put delimiters around the filename in your diag message.

    You should include the $! variable in the diag messages.


    open my $FILE, '<', $file_name or die "could not open '$file_name' $!";


    > my(@fcont) = <FILE>;



    my @fcont = <$FILE>;

    (but see below)


    > foreach my $line (@fcont) {



    You should not read the entire file into memory if you only need
    one line of the file at a time.

    while ( my $line = <$FILE> ) {
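
    Put together, the advice above might look like the following minimal
    sketch. The temporary file and its two sample lines are hypothetical,
    created only so that the example runs on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file, created here only to keep the sketch self-contained.
my ( $tmp_fh, $file_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "&00Antiques\n&00Antikviteter\n";
close $tmp_fh or die "could not close '$file_name': $!";

# 3-argument open, lexical filehandle, delimited filename and $! in the message.
open my $FILE, '<', $file_name
    or die "could not open '$file_name': $!";

my $count = 0;
while ( my $line = <$FILE> ) {    # only one line is held in memory at a time
    chomp $line;
    $count++;
    print "line $count: $line\n";
}
close $FILE;
```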


    > $line =~ s/\r/\r\n/g;



    Why are you doing this?

    Is the file a classic Mac OS (not OS X) text file?

    It is too late to fix line endings after you have used <> to read "lines".

    You need to fix them *before* applying the <> operator.

    Perhaps by setting the $/ variable to an appropriate value.
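
    A minimal sketch of that idea, assuming the file really uses bare
    carriage returns as line endings. The sample data and the in-memory
    filehandle are stand-ins for the real file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data standing in for a file with CR-only line endings.
my $data = "&00Antiques\r&00Antikviteter\r";

# Reading from a scalar reference keeps the sketch self-contained.
open my $fh, '<', \$data or die "could not open in-memory file: $!";

my @records;
{
    local $/ = "\r";    # the input record separator is now a carriage return
    while ( my $line = <$fh> ) {
        chomp $line;    # chomp removes the current value of $/
        push @records, $line;
    }
}
close $fh;

print "$_\n" for @records;
```

    With $/ set before the read, each <$fh> returns one CR-terminated
    record instead of the whole file.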


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Apr 1, 2009
    #6
  7. molsted

    molsted Guest

    On 1 Apr., 14:32, Tad J McClellan <> wrote:
    > To analyse the behavior of a pattern match, we need two things:
    >
    > 1) the pattern that is to be matched


    Sample pattern:
    &00Antiques^M
    &00Antiquit<0x00E4>ten^M
    &00Antiquit<0x00E9>s^M
    &00Antig<0x00FC>edades^M
    &00Antikviteter^M

    Sample output:
    <Style:GB>Antiques
    <Style:DE>Antiquit<0x00E4>ten
    <Style:FR>Antiquit<0x00E9>s
    <Style:ES>Antig<0x00FC>edades
    <Style:DK>Antikviteter

    All on separate lines. The file is generated on a Windows PC (\r\n),
    my file needs to end up as a UNIX-file on Mac OS X.

    The first file had accidentally been opened on a Mac, hence the \r end
    of line.

    I hope this clears things up a bit.

    The file is being converted from Windows-1252 to MacRoman prior to being
    run through the script (/usr/bin/iconv -f WINDOWS-1252 -t MACROMAN).
    However, I am considering using 'Text::Iconv' instead.

    --
    Rene
     
    molsted, Apr 2, 2009
    #7
  8. molsted <> wrote:
    > On 1 Apr., 14:32, Tad J McClellan <> wrote:
    >> To analyse the behavior of a pattern match, we need two things:
    >>
    >> 1) the pattern that is to be matched

    >
    > Sample pattern:
    > &00Antiques^M
    > &00Antiquit<0x00E4>ten^M
    > &00Antiquit<0x00E9>s^M
    > &00Antig<0x00FC>edades^M
    > &00Antikviteter^M



    That is NOT the pattern to be matched!

    The pattern to be matched is:

    &00([^\r]+)\r\n

    Those are (meant to be) the strings that the pattern is to be matched against.

    The reason that none of those strings match the pattern is because
    none of those strings contain a carriage return, and the pattern requires
    a carriage return.

    A hex dump, such as from xxd, shows that there are no carriage returns
    in that data. Each line ends with a caret (ASCII 0x5e), an upper
    case "M" (ASCII 0x4d) and a linefeed (ASCII 0x0a):

    0000000: 2630 3041 6e74 6971 7565 735e 4d0a 2630  &00Antiques^M.&0
                                        ^^ ^^^^
    0000010: 3041 6e74 6971 7569 743c 3078 3030 4534  0Antiquit<0x00E4
    0000020: 3e74 656e 5e4d 0a26 3030 416e 7469 7175  >ten^M.&00Antiqu
                       ^^^^ ^^
    0000030: 6974 3c30 7830 3045 393e 735e 4d0a 2630  it<0x00E9>s^M.&0
                                        ^^ ^^^^

    If you cannot figure out how to post data with the line endings that
    are actually in your data, then write the data in Real Perl Code.

    (that sounds familiar...)

    instead of

    while ( <FILE> ) {

    put the data into an array and loop over the array:

    my @lines = ( "&00Antiques\r\n", "&00Antiquit<0x00E4>ten\r\n", ...
    foreach ( @lines ) {


    > The file is generated on a Windows PC (\r\n),
    > my file needs to end up as a UNIX-file on Mac OS X



    Then all you need to do is delete all of the carriage returns before
    matching:

    tr/\r//d;

    and change the pattern to not require carriage returns.
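
    A minimal sketch of those two steps, using the sample strings from
    earlier in the thread. The variable names are placeholders, and the
    modulo on the index is an assumption added here (it does not come from
    the posts above) so that the language list restarts after every five
    entries:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lang = qw/ <Style:GB> <Style:DE> <Style:FR> <Style:ES> <Style:DK> /;

# Sample data written as Perl strings, with Windows (CRLF) line endings.
my $text = "&00Antiques\r\n&00Antiquit<0x00E4>ten\r\n&00Antiquit<0x00E9>s\r\n"
         . "&00Antig<0x00FC>edades\r\n&00Antikviteter\r\n";

# Step 1: delete every carriage return, leaving Unix (LF) line endings.
( my $clean = $text ) =~ tr/\r//d;

# Step 2: the pattern no longer mentions \r. The /e flag evaluates the
# replacement as code; the modulo (an assumption) cycles through @lang.
my $n = 0;
$clean =~ s/^&00([^\n]+)\n/$lang[ $n++ % @lang ] . "$1\n"/mge;

print $clean;
```

    If the input holds more than one group of five entries, the modulo
    keeps the index inside @lang instead of running off the end of it.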


    > The first file had accidentally been opened on a Mac, hence the \r end
    > of line.



    That explains it then.

    On Linux/OS X the input operator, <>, reads until it finds a newline.

    Since there were no newlines, a single read gets the entire file in one go.
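
    The effect can be checked with a small sketch. The helper name
    count_reads and the two sample strings are hypothetical; the strings
    are read through in-memory filehandles with the default $/ ("\n"):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how many reads <> needs to consume a string with the default $/.
sub count_reads {
    my ($data) = @_;
    open my $fh, '<', \$data or die "could not open in-memory file: $!";
    my $reads = 0;
    while ( defined( my $line = <$fh> ) ) {
        $reads++;
    }
    close $fh;
    return $reads;
}

my $cr_only = "&00Antiques\r&00Antikviteter\r";    # bare CR line endings
my $lf_only = "&00Antiques\n&00Antikviteter\n";    # Unix line endings

print "CR-only: ", count_reads($cr_only), " read(s)\n";
print "LF-only: ", count_reads($lf_only), " read(s)\n";
```

    The CR-only string comes back in a single read, while the LF string
    takes one read per line.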


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Apr 2, 2009
    #8
