Perl Regex - Hex bytes

Discussion in 'Perl Misc' started by JEB, Nov 26, 2003.

  1. JEB

    JEB Guest

    I am trying to use Perl to rescue some legacy word processor files.
    The files are ASCII, except that some control codes use
    bytes in the $80-$ff range. I slurp the file into a string for editing.

    Regex can handle the bytes <\x7f, but fails to recognize bytes that are \x80
    or above.

    e.g.,

    /\x03//; works
    /\x81//; doesn't

    Since I thought the problem might be related to the adoption of Unicode, I've
    tried various things like:

    no encoding;
    use bytes;
    and various forms of encoding;
    etc.

    Nothing helped, but I may not have done it right.

    I'm using Perl 5.8+ (whatever the latest revision is) with Red Hat Linux
    8.0.

    Is this something a Perl regex just can't handle?

    JEB
     
    JEB, Nov 26, 2003
    #1

  2. JEB wrote:
    >
    >/\x03//; works
    >/\x81//; doesn't


    You're giving too little information.
    Could you post a sample code that demonstrates the problem, along with
    your definition of "doesn't work" ? (warnings, error, expected result vs
    actual result)
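
For illustration, a minimal test case of the kind requested here might look like the sketch below. The sample string and the `hex_dump` helper are hypothetical, and a real test case would read the bytes from one of the actual legacy files rather than from a literal. It prints the data as hex before and after the two substitutions so that "works" and "doesn't" can be stated precisely.

```perl
use strict;
use warnings;

# Hypothetical sample: ASCII text with one low (\x03) and one high (\x81) control byte.
my $text = "Hello\x03 World\x81!";

# Show a string as space-separated hex bytes so invisible characters become visible.
sub hex_dump {
    my ($s) = @_;
    return join ' ', unpack '(H2)*', $s;
}

(my $clean = $text) =~ s/\x03//g;   # the substitution reported as working
$clean =~ s/\x81//g;                # the substitution reported as failing

print "before: ", hex_dump($text),  "\n";
print "after:  ", hex_dump($clean), "\n";
print $clean eq 'Hello World!' ? "both bytes removed\n" : "some bytes survived\n";
```

Posting the exact output of something like this, together with the Perl version and locale, gives readers an expected result and an actual result to compare.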

    --
    Unpopular is not *NIX
     
    Rafael Garcia-Suarez, Nov 26, 2003
    #2

  3. On Wed, 26 Nov 2003, JEB wrote:

    > I'm using Perl 5.8+ (whatever the latest revision is)


    I suspect you're really using 5.8.0 (as opposed to 5.8.1).

    > with Red Hat Linux
    > 8.0.


    I think that's your clue. Look for utf-8 in your linux locale
    setting. It's confusing Perl 5.8.0 into using unicode mode.

    (And read other discussions and FAQs on this issue).
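
One way to look for utf-8 in the locale settings from Perl itself is sketched below. It assumes the usual POSIX precedence for the character-type category (LC_ALL, then LC_CTYPE, then LANG); the helper names `effective_ctype` and `mentions_utf8` are made up for this example and are not part of any library.

```perl
use strict;
use warnings;

# Return the first non-empty value among LC_ALL, LC_CTYPE and LANG,
# falling back to "C" when none of them is set.
sub effective_ctype {
    my (%env) = @_;
    for my $name (qw(LC_ALL LC_CTYPE LANG)) {
        return $env{$name} if defined $env{$name} && length $env{$name};
    }
    return 'C';
}

# True if the locale name carries a UTF-8 codeset suffix such as "en_US.UTF-8".
sub mentions_utf8 {
    my ($locale) = @_;
    return $locale =~ /utf-?8/i ? 1 : 0;
}

my $locale = effective_ctype(%ENV);
printf "effective character locale: %s (%s)\n",
    $locale, mentions_utf8($locale) ? 'UTF-8' : 'not UTF-8';
```

If this reports a UTF-8 locale on a 5.8.0 system, the locale is a likely suspect for the behaviour described in this thread.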

    Either change your locale setting to remove the reference
    to utf-8 (I'm sure this works); or upgrade to 5.8.1, where this
    coupling between locale and Perl default behaviour was found too
    confusing and has been removed (so I'm told).

    > Is this something a Perl regex just can't handle?


    Wrong diagnosis. Certainly it can handle it.
     
    Alan J. Flavell, Nov 26, 2003
    #3
  4. JEB

    JEB Guest

    "Alan J. Flavell" wrote:


    > I think that's your clue. Look for utf-8 in your linux locale
    > setting. It's confusing Perl 5.8.0 into using unicode mode.
    >
    > (And read other discussions and FAQs on this issue).
    >
    > Either change your locale setting to remove the reference
    > to utf-8 (I'm sure this works); or upgrade to 5.8.1, where this
    > coupling between locale and Perl default behaviour was found too
    > confusing and has been removed (so I'm told).




    THANKS for the idea and help.

    Exporting LC_ALL="en_US" in /etc/profile fixed the problem, though in a
    clumsy way. I hope it doesn't create problems elsewhere.
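
If a system-wide locale change feels too broad, a narrower option that is not mentioned in this thread is to ask Perl for a raw byte stream on the specific file handle. The sketch below assumes the trouble comes from an implicit UTF-8 layer being applied to the handle under a UTF-8 locale; the temporary file and its contents are hypothetical stand-ins for a legacy document.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a hypothetical stand-in for a legacy file: ASCII plus control bytes.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                                   # write bytes, no encoding layer
print {$out} "Title\x81 body\x03 text\xff\n";
close $out or die "close failed: $!";

# Open with an explicit :raw layer so the data arrives as plain bytes,
# whatever the locale says, then slurp the whole file into one string.
open my $in, '<:raw', $path or die "Cannot open $path: $!";
my $data = do { local $/; <$in> };
close $in or die "close failed: $!";

# Strip \x03 and every byte in the \x80-\xff range.
(my $clean = $data) =~ s/[\x03\x80-\xff]//g;

print $clean;
```

Calling `binmode($in)` on an already opened handle is the older spelling of the same idea. Either way the change stays local to the script instead of the whole login environment.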

    JEB
     
    JEB, Nov 27, 2003
    #4
  5. Ben Morrow

    Ben Morrow Guest

    JEB wrote:
    > Exporting LC_ALL="en_US" in /etc/profile fixed the problem, though in a
    > clumsy way. I hope it doesn't create problems elsewhere.


    Installing 5.8.1 will also fix it, without the need to lose your
    Unicode locale. Alternatively, as a temporary fix, you could

    1. Make sure you have /usr/bin/perl5.8.0: if not, copy it from
       /usr/bin/perl
    2. Remove /usr/bin/perl
    3. Create a shell script /usr/bin/perl containing

           #!/bin/sh
           export LC_ALL="en_US.ISO8859-1"
           exec /usr/bin/perl5.8.0 "$@"

    Yes, I think this is a pretty evil hack, too, but if you have problems
    with losing the Unicode locale it may help.

    Ben

    --
    If you put all the prophets, | You'd have so much more reason
    Mystics and saints | Than ever was born
    In one room together, | Out of all of the conflicts of time.
    |----------------+---------------| The Levellers, 'Believers'
     
    Ben Morrow, Nov 27, 2003
    #5
