regex diffs between perl 5.6.1 and 5.8.0?

Discussion in 'Perl Misc' started by Patrick Flaherty, Aug 15, 2003.

  1. Hi,

    Back in 5.6.1, the following succeeded in stripping out all x1a garbage chars
    from a set of files:

    perl -p0777 -i.bu -e 's/\X1a+$//g' house.lis

    I run the same thing under 5.8.0 and it has no effect.

    It doesn't complain or puke, but it doesn't remove the garbage chars either.

    From what little I've read, there do appear to be noticeable differences between
    pre-5.8 and 5.8+.

    pat
    Patrick Flaherty, Aug 15, 2003
    #1

  2. Jay Tilton Guest

    Patrick Flaherty <> wrote:

    : Back in 5.6.1, the following succeeded in stripping out all x1a garbage chars
    : from a set of files:
    :
    : perl -p0777 -i.bu -e 's/\X1a+$//g' house.lis
    :
    : I run the same thing under 5.8.0 and it has no effect.

    Case matters. "\X1a" is not the same thing as "\x1a".
    "\X" in a regex has its own special meaning.

    If that code worked as expected in 5.6.1, it probably shouldn't have.
    The difference in behavior between 5.6.1 and 5.8.0 would be because of
    a bug fix, though I'm not seeing it right away in the delta docs.
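
    To illustrate the case distinction, here is a minimal sketch; the two
    sample strings are made up for illustration. "\x1a" is the single
    character with code 0x1a, whereas "\X" matches one whole character
    (together with any combining marks), so "\X1a+" means "any character,
    then a literal 1, then one or more a".

```perl
use strict;
use warnings;

my $ctrl_z = "data\x1a\x1a";   # made-up string ending in two 0x1a characters
my $plain  = "room 91a";       # made-up string ending in the literal text "1a"

# Each entry is 1 if the pattern matched, 0 if it did not.
my @results = (
    $ctrl_z =~ /\x1a+$/ ? 1 : 0,   # lower-case x: looks for 0x1a at the end
    $ctrl_z =~ /\X1a+$/ ? 1 : 0,   # upper-case X: looks for <char> "1" "a"...
    $plain  =~ /\x1a+$/ ? 1 : 0,
    $plain  =~ /\X1a+$/ ? 1 : 0,
);
print "@results\n";
```

    With the upper-case form, a file whose only junk is trailing 0x1a
    characters would be left unchanged, which is consistent with "no effect".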
    Jay Tilton, Aug 15, 2003
    #2

  3. In article <>, Jay Tilton says...
    >
    >Patrick Flaherty <> wrote:
    >
    >: Back in 5.6.1, the following succeeded in stripping out all x1a garbage chars
    >: from a set of files:
    >:
    >: perl -p0777 -i.bu -e 's/\X1a+$//g' house.lis
    >:
    >: I run the same thing under 5.8.0 and it has no effect.
    >
    >Case matters. "\X1a" is not the same thing as "\x1a".
    >"\X" in a regex has its own special meaning.
    >
    >If that code worked as expected in 5.6.1, it probably shouldn't have.
    >The difference in behavior between 5.6.1 and 5.8.0 would be because of
    >a bug fix, though I'm not seeing it right away in the delta docs.
    >



    Thanx Jay,

    Actually my original code _is_ a lower-case x. The upper case in the above was
    some stuff I was experimenting with. So I don't think this is the problem I'm
    having.

    pat
    Patrick Flaherty, Aug 15, 2003
    #3
  4. Jay Tilton Guest

    Patrick Flaherty <> wrote:
    : In article <>, Jay Tilton says...
    : >Patrick Flaherty <> wrote:
    : >
    : >: Back in 5.6.1, the following succeeded in stripping out all x1a garbage chars
    : >: from a set of files:
    : >:
    : >: perl -p0777 -i.bu -e 's/\X1a+$//g' house.lis
    : >:
    : >: I run the same thing under 5.8.0 and it has no effect.
    : >
    : >Case matters. "\X1a" is not the same thing as "\x1a".
    : >"\X" in a regex has its own special meaning.
    :
    : Actually my original code _is_ a lower-case x. The upper case in the above was
    : some stuff I was experimenting with. So I don't think this is the problem I'm
    : having.

    Then I'm stumped. As far as that code goes, there should be no
    difference between 5.6.1 and 5.8.0.

    The only reason I can see that the code would not strip \x1a
    characters from the ends of lines is if the lines have no \x1a at
    their ends.

    It's time for a more rigorous regression test and a hard look at your
    data file.

    As a complete WAG, you might investigate binmode(), which became
    significant on all platforms with Perl 5.8.0.
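
    A minimal sketch of what "a hard look at the data file" with binmode()
    could look like. It builds its own throwaway sample file (the contents
    are an assumption, not the real house.lis), reads it back as raw bytes
    so that no I/O layer gets a chance to interpret CRLF or Ctrl-Z, and
    counts the 0x1a bytes at the very end.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small sample file ending in three Ctrl-Z bytes (made-up data).
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "line one\nline two\n", "\x1a" x 3;
close $fh or die "close $path: $!";

# Re-open and read raw bytes; binmode keeps text-mode translation out of it.
open my $in, '<', $path or die "open $path: $!";
binmode $in;
my $data = do { local $/; <$in> };   # slurp the whole file
close $in;

# \z anchors at the true end of the data, regardless of newlines.
my ($tail) = $data =~ /(\x1a+)\z/;
my $count = defined $tail ? length $tail : 0;
print "trailing 0x1a bytes: $count\n";
```

    Pointing the read half of this at a real file would show whether Perl
    actually sees the 1a's that an editor displays.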
    Jay Tilton, Aug 16, 2003
    #4
  5. In article <>, Jay Tilton says...
    >
    >Patrick Flaherty <> wrote:
    >: In article <>, Jay Tilton says...
    >: >Patrick Flaherty <> wrote:
    >: >
    >: >: Back in 5.6.1, the following succeeded in stripping out all x1a garbage chars
    >: >: from a set of files:
    >: >:
    >: >: perl -p0777 -i.bu -e 's/\X1a+$//g' house.lis
    >: >:
    >: >: I run the same thing under 5.8.0 and it has no effect.
    >: >
    >: >Case matters. "\X1a" is not the same thing as "\x1a".
    >: >"\X" in a regex has its own special meaning.
    >:
    >: Actually my original code _is_ a lower-case x. The upper case in the above was
    >: some stuff I was experimenting with. So I don't think this is the problem I'm
    >: having.
    >
    >Then I'm stumped. As far as that code goes, there should be no
    >difference between 5.6.1 and 5.8.0.
    >
    >The only reason I can see that the code would not strip \x1a
    >characters from the ends of lines is if the lines have no \x1a at
    >their ends.
    >
    >It's time for a more rigorous regression test and a hard look at your
    >data file.
    >
    >As a complete WAG, you might investigate binmode(), which became
    >significant on all platforms with Perl 5.8.0.
    >


    Hi Jay,

    Well that's very interesting.

    Yes the 1a's are there. This is a file copied from VMS to Windows over
    PATHworks (file sharing software spanning VMS and Windows). The 1a's are a (to
    us) well-known artifact of differences in the file systems on VMS and Windows.

    I check the 1a's by going into Emacs and then going to the bottom of the file. A
    whole bunch of ctrl-Z's (that aren't there when you open the file on VMS).
    Moreover I can use Emacs (on Windows) and open the file with hexl-find-file and
    indeed the ctrl-Z's correspond to 1a's.

    MAYBE A FACTOR: the 5.8 (Perl) that I'm trying to use is on Citrix servers
    (where various flavors of low-level funkiness can happen for programmers).

    Did an experiment. The one-liner still doesn't work on Citrix and with Perl
    5.8. However the following in a script _does work_ (!):

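    local $^I   = '.bu';          # edit in place, keeping a .bu backup
    local @ARGV = glob '*.TXT';
    my $prev_filename = '';       # avoids comparing against undef on the first line
    while (<>) {
        if ($ARGV ne $prev_filename) {
            print "$ARGV\n";          # goes into the file being rewritten
            print STDOUT "$ARGV\n";   # goes to the console
        }
        s/\x1a+$//g;
        print;
        $prev_filename = $ARGV;
    }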

    (This adds printing the filename into the first line of the contents since there
    are about 900 of these files that I'm going to then import into iSilo and load
    onto my Palm).

    Obviously I'll use the script for the time being but it would be interesting to
    get to the bottom of why the one-liner (the direct command-line invocation)
    doesn't work.
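
    One difference between the two approaches that might be worth ruling out
    (only a guess, not a confirmed cause): the one-liner's -0777 reads each
    file as a single string, while the script reads line by line. Without
    /m, "$" in a slurped string matches only at the very end (or just before
    a final newline), so 1a's at the ends of earlier lines are left alone.
    A small sketch with a made-up sample string:

```perl
use strict;
use warnings;

# Made-up contents: a 0x1a at the end of line one, two more at end of file.
my $file = "first\x1a\nsecond\x1a\x1a";

# Like the -0777 one-liner: the whole file is one string, no /m.
(my $slurp = $file) =~ s/\x1a+$//g;

# Same string, but /m lets $ match at the end of every line.
(my $multi = $file) =~ s/\x1a+$//mg;

# Like the script: process one line at a time.
my $by_line = '';
for my $line (split /^/, $file) {
    $line =~ s/\x1a+$//g;
    $by_line .= $line;
}

print "slurp without /m still has 0x1a: ", ($slurp =~ /\x1a/ ? "yes" : "no"), "\n";
print "slurp with /m equals line mode: ",  ($multi eq $by_line ? "yes" : "no"), "\n";
```

    This would not explain a one-liner that removes nothing at all, since
    the 1a's at the very end of the file match either way.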

    I, unfortunately, can't do Perl installs onto our Citrix servers. However I can
    probably ask the systems guys to put varying versions of Perl into some other
    location (leaving the environment variables pointing to the main location
    untouched).

    pat
    Patrick Flaherty, Aug 18, 2003
    #5
