use binary operator on ascii text string

Discussion in 'Perl Misc' started by Sean.Dewis@gmail.com, Jun 23, 2006.

  1. Guest

    Hi everyone

    I'm pretty crap at perl, so I'd appreciate so help from you guys.

    I have a string value held in $body variable.

    What I need to do is manipulate each individual character value in the
    string with OR - "|" and then replace that character with the
    character's new value.

    I'm using chr(ord($c) | 64) to get the new value, but I'm stuck on two
    things: -

    1) How to go through the string byte by byte and perform the OR 64 on
    it
    2) How to get the character equivalent back into the string in the
    right place

    For example the string is "abcdefg", by (I know it's not true)
    performing OR 64 on each char I want "fghijkl" out.

    Any idea's? Code examples would be appreciated.

    TIA

    Sean
    , Jun 23, 2006
    #1

  2. Guest

    wrote:
    > Hi everyone
    >
    > I'm pretty crap at perl,


    And english :)

    > Any idea's? Code examples would be appreciated.
    >


    http://www.perl.com

    and

    #!/usr/bin/perl -w
    # ...
    , Jun 23, 2006
    #2

  3. David Squire Guest

    wrote:
    > Hi everyone
    >
    > I'm pretty crap at perl, so I'd appreciate so help from you guys.
    >
    > I have a string value held in $body variable.
    >
    > What I need to do is manipulate each individual character value in the
    > string with OR - "|" and then replace that character with the
    > character's new value.
    >
    > I'm using chr(ord($c) | 64) to get the new value, but I'm stuck on two
    > things: -
    >
    > 1) How to go through the string byte by byte and perform the OR 64 on
    > it
    > 2) How to get the character equivalent back into the string in the
    > right place
    >
    > For example the string is "abcdefg", by (I know it's not true)
    > performing OR 64 on each char I want "fghijkl" out.


    ??? The characters in "abcdefg" already have bit 7 set on - as they must,
    since ord('a') is 97 > 64 (at least in ASCII and many derived encodings).
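To see this concretely, the minimal sketch below (assuming ASCII) prints each character's code in binary using the `%08b` format. The mask 64 is the bit at zero-based position 6 (the seventh bit if you count from one), and every lowercase ASCII letter already has it set, so OR 64 leaves "abcdefg" unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = 'abcdefg';

# Show each character, its code, its 8-bit binary form,
# and whether the bit worth 64 is already set.
for my $c (split //, $input) {
    my $code = ord $c;
    printf "%s %3d %08b %s\n", $c, $code, $code,
        ($code & 64) ? 'bit64 set' : 'bit64 clear';
}

# OR-ing 64 into every character therefore changes nothing here.
my $ored = join '', map { chr(ord($_) | 64) } split //, $input;
print "$ored\n";
```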
    >
    > Any idea's?


    Learn how to use apostrophes correctly, for a start :)

    > Code examples would be appreciated.


    Here's some code that does what I think you want, but as I have
    described above, that is not actually that clear. I bet that there are
    nicer ways to do this too, which others will most likely soon point out :)

    ----

    #!/usr/bin/perl
    use strict;
    use warnings;

    while (my $line = <DATA>) {
        chomp $line;
        my @line_array = split //, $line;
        my @new_line_array = map {$_ | chr(64)} @line_array;
        my $new_line = join '', @new_line_array;
        print "$new_line\n";
    }


    __DATA__
    abcdefg
    1234567687568
    %^&*^*()&^)&^

    ----

    Output:

    abcdefg
    qrstuvwvxwuvx
    e^fj^jhif^if^


    DS
    David Squire, Jun 23, 2006
    #3
  4. David Squire Guest

    David Squire wrote:
    > wrote:
    >> Hi everyone
    >>
    >> I'm pretty crap at perl, so I'd appreciate so help from you guys.
    >>
    >> I have a string value held in $body variable.
    >>
    >> What I need to do is manipulate each individual character value in the
    >> string with OR - "|" and then replace that character with the
    >> character's new value.
    >>
    >> I'm using chr(ord($c) | 64) to get the new value, but I'm stuck on two
    >> things: -
    >>
    >> 1) How to go through the string byte by byte and perform the OR 64 on
    >> it
    >> 2) How to get the character equivalent back into the string in the
    >> right place


    [snip]

    > Here's some code that does what I think you want, but as I have
    > described above, that is not actually that clear. I bet that there are
    > nicer ways to do this too, which others will most likely soon point out :)


    [snip]

    ... such as this, which explicitly deals with bytes, rather than hoping
    that that is what characters are in the default encoding:

    ----

    #!/usr/bin/perl
    use strict;
    use warnings;

    while (my $line = <DATA>) {
        chomp $line;
        my @line_array = unpack 'C*', $line;        # string -> numeric codes
        my @new_line_array = map {$_ | 64} @line_array;
        my $new_line = pack 'C*', @new_line_array;  # numeric codes -> string
        print "$new_line\n";
    }

    __DATA__
    abcdefg
    1234567687568
    %^&*^*()&^)&^

    ----

    DS
    David Squire, Jun 23, 2006
    #4
  5. David Squire <> writes:

    > ??? The characters in "abcdefg" already have bit 7 set on - as the
    > must since ord('a') is 97 > 64


    ??? The value of a bit is 2^position, starting at position 0.

    2^7 = 128.

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
    Sherm Pendley, Jun 23, 2006
    #5
  6. David Squire Guest

    Sherm Pendley wrote:
    > David Squire <> writes:
    >
    >> ??? The characters in "abcdefg" already have bit 7 set on - as the
    >> must since ord('a') is 97 > 64

    >
    > ??? The value of a bit is 2^position, starting at position 0.
    >
    > 2^7 = 128.


    I started counting at 1. The OP stated that he was doing | 64, so the
    bit referred to was clear in any case.

    DS
    David Squire, Jun 23, 2006
    #6
  7. David Squire <> writes:

    > Sherm Pendley wrote:
    >> David Squire <> writes:
    >>
    >>> ??? The characters in "abcdefg" already have bit 7 set on - as the
    >>> must since ord('a') is 97 > 64

    >> ??? The value of a bit is 2^position, starting at position 0.
    >> 2^7 = 128.

    >
    > I started counting at 1.


    Yes, obviously - that's why I posted the correction. Beginning at one is
    incorrect in any base-n notation, not just binary. For any value of n, the
    value of position x is n^x. That only works when the positions are numbered
    starting with zero.

    It's not a matter of personal preference or opinion; it's part of the
    mathematical definition of base-n notation.
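As a worked illustration of that definition, here is a small sketch with a made-up helper, `positional_value`, that numbers positions from zero at the least significant digit, so each digit contributes digit * n**position:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: value of a list of digits in base $n, where the
# rightmost digit is at position 0 and contributes digit * $n**0.
sub positional_value {
    my ($n, @digits) = @_;            # most significant digit first
    my @lsb_first = reverse @digits;  # index now equals position
    my $total = 0;
    for my $pos (0 .. $#lsb_first) {
        $total += $lsb_first[$pos] * $n ** $pos;
    }
    return $total;
}

my $bin = positional_value(2, 1, 1, 0, 0, 0, 0, 1);  # binary 1100001
my $dec = positional_value(10, 3, 4, 5);             # decimal 345
print "$bin\n";   # 97
print "$dec\n";   # 345
```

Binary 1100001 is 2**6 + 2**5 + 2**0 = 97, i.e. ord('a'), so the bit worth 64 sits at position 6 under this numbering.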

    sherm--

    Sherm Pendley, Jun 23, 2006
    #7
  8. David Squire Guest

    Sherm Pendley wrote:
    > David Squire <> writes:
    >
    >> Sherm Pendley wrote:
    >>> David Squire <> writes:
    >>>
    >>>> ??? The characters in "abcdefg" already have bit 7 set on - as the
    >>>> must since ord('a') is 97 > 64
    >>> ??? The value of a bit is 2^position, starting at position 0.
    >>> 2^7 = 128.

    >> I started counting at 1.

    >
    > Yes, obviously - that's why I posted the correction. Beginning at one is
    > incorrect in any base-n notation, not just binary. For any value of n, the
    > value of position x as n^x. That only works when the positions are numbered
    > starting with zero.
    >
    > It's not a matter of personal preference or opinion, it's part of the
    > mathematical definition of base-n notation.


    And entirely unrelated to helping with the OP's question. I can just as
    easily say that the value at the nth position is x^(n-1), and then count
    1st, 2nd, 3rd, etc.

    You have again snipped context that made it clear that there was no
    ambiguity in what I posted.

    Choosing to start at 0 is indeed arbitrary - though of course you are
    right about the most common convention.


    DS
    David Squire, Jun 23, 2006
    #8
  9. David Squire <> writes:

    > And entirely unrelated to helping with the OP's question.


    Sorry. I guess I didn't realize I was getting paid for working at this
    help desk and therefore obligated to answer questions.

    > I can just
    > as easily say that the value at the nth position is x^(n-1), and then
    > count 1st, 2nd, 3rd, etc.


    The difference is that I'm talking about an established rule that's been
    widely agreed upon for decades - and that's just within the realm of
    computer science. You, on the other hand, are just making stuff up to
    rationalize your mistakes.

    > You have again snipped context that made it clear that there was no
    > ambiguity in what I posted.


    You're right - it was unambiguously wrong.

    > Choosing to start at 0 is indeed arbitrary


    arbitrary, adj:
    1. Determined by chance, whim, or impulse, and not by necessity, reason,
    or principle: stopped at the first motel we passed, an arbitrary
    choice.
    2. Based on or subject to individual judgment or preference: The diet
    imposes overall calorie limits, but daily menus are arbitrary.
    3. Established by a court or judge rather than by a specific law or
    statute: an arbitrary penalty.
    4. Not limited by law; despotic: the arbitrary rule of a dictator.

    The original decision to start at zero was indeed arbitrary. But that was a
    long time ago. One could just as easily argue that the use of the Arabic
    numerals 1 and 0 is arbitrary.

    Now it's an established convention, and following it is not subject to
    individual judgment or preference, assuming of course that you expect to
    be understood.

    sherm--

    Sherm Pendley, Jun 23, 2006
    #9
  10. Mumia W. Guest

    wrote:
    > Hi everyone
    >
    > I'm pretty crap at perl, so I'd appreciate so help from you guys.
    >
    > I have a string value held in $body variable.
    >
    > What I need to do is manipulate each individual character value in the
    > string with OR - "|" and then replace that character with the
    > character's new value.
    >
    > I'm using chr(ord($c) | 64) to get the new value
    > [...]


    Then you're pretty much there. Just use the substitution operator to
    replace each character with the result of the code you have above, and
    you're almost set.

    You'll also have to change $c to the match variable $&, and the
    substitution operator will need the 'g' option (global: go through the
    entire string) and the 'e' option (evaluate the replacement as Perl code).
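Put together, that looks something like the following sketch (`$body` is filled with sample text here). Since `.` does not match a newline without the `/s` modifier, a trailing newline is left alone. On perls older than 5.20, mentioning `$&` anywhere in a program slows down every regex match, so capturing with `(.)` and using `$1` is a common alternative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $body = "123 abc\n";    # sample text standing in for the OP's data

# /g: replace every match; /e: evaluate the replacement as Perl code.
# $& holds the text matched by . (a single character).
$body =~ s/./chr(ord($&) | 64)/ge;

print $body;
```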
    Mumia W., Jun 23, 2006
    #10
  11. DJ Stunks Guest

    David Squire wrote:
    > wrote:
    > > <snip>
    > > Any idea's?

    >
    > Learn how to use apostrophes correctly, for a start :)


    hahaha. a pet peeve of mine too.

    http://www.angryflower.com/bobsqu.gif

    -jp
    DJ Stunks, Jun 23, 2006
    #11
  12. David Squire Guest

    David Squire wrote:
    > David Squire wrote:
    >> wrote:
    >>> Hi everyone
    >>>
    >>> I'm pretty crap at perl, so I'd appreciate so help from you guys.
    >>>
    >>> I have a string value held in $body variable.
    >>>
    >>> What I need to do is manipulate each individual character value in the
    >>> string with OR - "|" and then replace that character with the
    >>> character's new value.
    >>>
    >>> I'm using chr(ord($c) | 64) to get the new value, but I'm stuck on two
    >>> things: -
    >>>
    >>> 1) How to go through the string byte by byte and perform the OR 64 on
    >>> it
    >>> 2) How to get the character equivalent back into the string in the
    >>> right place

    >
    > [snip]
    >
    >> Here's some code that does what I think you want, but as I have
    >> described above, that is not actually that clear. I bet that there are
    >> nicer ways to do this too, which others will most likely soon point
    >> out :)

    >
    > [snip]
    >
    > ... such as this, which explicitly deals with bytes, rather than hoping
    > that that is what characters are in the default encoding:
    >
    > ----
    >
    > #!/usr/bin/perl
    > use strict;
    > use warnings;
    >
    > while (my $line = <DATA>) {
    > chomp $line;
    > my @line_array = unpack 'C*', $line;
    > my @new_line_array = map {$_ | 64} @line_array;
    > my $new_line = pack 'C*', @new_line_array;
    > print "$new_line\n";
    > }
    >
    > ----


    Well, I might as well give the last (?) in the series, following Mumia's
    suggestion:

    ----

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $mask = 64;
    while (<DATA>) {
        # (.) does not match "\n", so each line's newline is left unchanged
        s/(.)/chr(ord($1) | $mask)/eg;
        print;
    }


    __DATA__
    abcdefg
    1234567687568
    %^&*^*()&^)&^

    ----

    Output:

    abcdefg
    qrstuvwvxwuvx
    e^fj^jhif^if^


    ... though I still prefer the explicit byte-wise one above.


    Cheers,

    DS
    David Squire, Jun 24, 2006
    #12
  13. Ben Morrow Guest

    Quoth David Squire <>:
    >
    > Well, I might as well give the last (?) in the series, following Mumia's
    > suggestion:


    while (<DATA>) {
        chomp;
        print $_ | (chr(64) x length), "\n";
    }

    :)
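This relies on Perl's string bitwise operators: when both operands of `|` are strings (and neither has been used as a number), Perl ORs them character by character, so the mask only has to be as long as the line. The small sketch below shows that, and also why the line ending matters: the newline is ORed like any other character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# When both operands of | are strings, Perl ORs them character by
# character.
my $mask_char = chr 64;                           # '@'
my $line      = "1234";
my $ored      = $line | ($mask_char x length $line);
print "$ored\n";

# A newline is ORed too: "\n" (0x0A) | '@' (0x40) is 'J' (0x4A),
# so chomp the line first if the line ending should survive.
my $nl = "\n" | $mask_char;
print "$nl\n";
```

Newer perls also offer explicit string-only bitwise operators such as `|.` through the `bitwise` feature, which removes the string-versus-number ambiguity.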

    Ben

    --
    I must not fear. Fear is the mind-killer. I will face my fear and
    I will let it pass through me. When the fear is gone there will be
    nothing. Only I will remain.
    Frank Herbert, 'Dune'
    Ben Morrow, Jun 24, 2006
    #13
  14. wrote:
    >
    > I'm pretty crap at perl, so I'd appreciate so help from you guys.
    >
    > I have a string value held in $body variable.
    >
    > What I need to do is manipulate each individual character value in the
    > string with OR - "|" and then replace that character with the
    > character's new value.
    >
    > I'm using chr(ord($c) | 64) to get the new value, but I'm stuck on two
    > things: -
    >
    > 1) How to go through the string byte by byte and perform the OR 64 on
    > it
    > 2) How to get the character equivalent back into the string in the
    > right place
    >
    > For example the string is "abcdefg", by (I know it's not true)
    > performing OR 64 on each char I want "fghijkl" out.
    >
    > Any idea's? Code examples would be appreciated.


    $body =~ s/(.)/ $1 | "\x40" /seg;
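The `/s` modifier is what makes this differ from the `(.)` versions above: it lets `.` match a newline, so line breaks in `$body` get ORed too (0x0A | 0x40 is 0x4A, "J"). A quick comparison on a made-up two-line string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $body = "ab\ncd";    # sample two-line string

# With /s, . also matches "\n", so the newline is ORed as well.
(my $with_s = $body) =~ s/(.)/ $1 | "\x40" /seg;

# Without /s, . skips "\n" and the line break is kept.
(my $without_s = $body) =~ s/(.)/ $1 | "\x40" /eg;

print "with /s:    $with_s\n";
print "without /s: $without_s\n";
```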



    John
    --
    use Perl;
    program
    fulfillment
    John W. Krahn, Jun 24, 2006
    #14
  15. Sherm Pendley wrote:
    > David Squire <> writes:
    >> Sherm Pendley wrote:
    >>> David Squire <> writes:
    >>>> ??? The characters in "abcdefg" already have bit 7 set on - as the
    >>>> must since ord('a') is 97 > 64
    >>> ??? The value of a bit is 2^position, starting at position 0.
    >>> 2^7 = 128.

    >>
    >> I started counting at 1.

    >
    > Yes, obviously - that's why I posted the correction. Beginning at one is
    > incorrect in any base-n notation, not just binary. For any value of n, the
    > value of position x as n^x. That only works when the positions are numbered
    > starting with zero.
    >
    > It's not a matter of personal preference or opinion, it's part of the
    > mathematical definition of base-n notation.


    But base-n notation is not the only notation in use. For example, the
    RFCs describing the IP protocol (RFC 791 etc.) count bits from the MSB
    to the LSB. They also start at zero, so if that convention is used on
    bytes, bit 0 has the value 128, bit 1 has the value 64, etc. So David
    could have said that the characters already have bit 1 set on and
    confused the hell out of everyone :).
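To make the two zero-based conventions concrete, this sketch lists the value of each bit position in a byte when counting from the least significant bit and when counting from the most significant bit, as in the RFC style:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Value of bit $i in an 8-bit byte under two zero-based conventions.
my @lsb_first = map { 2 ** $_ } 0 .. 7;          # bit $i worth 2**$i
my @msb_first = map { 2 ** (7 - $_) } 0 .. 7;    # bit $i worth 2**(7-$i)

for my $i (0 .. 7) {
    printf "bit %d: LSB-first %3d, MSB-first %3d\n",
        $i, $lsb_first[$i], $msb_first[$i];
}
```

Under the MSB-first convention the mask 64 is bit 1; under the LSB-first convention it is bit 6.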

    I have seen numbering from 1..n (from either direction) instead of
    0..n-1, too, but I'm too lazy to look for a widely known example. (But
    if you read mathematical papers you will notice that many prefer to use
    indexes starting at 1, even if it makes the formulas (formulae?) more
    complicated because they have to write (i-1) instead of i all the time.)

    I don't care much as long as it is consistent. What really annoys me are
    people who start counting at zero but claim that "zeroth" is not an
    English word, so they use "the seventh bit" and "bit 6" interchangeably.

    hp

    --
    Peter J. Holzer | Sysadmin WSR/LUGA | http://www.hjp.at/
    "You could also skip [the discussion] if you simply skipped it."
    -- Ralph Angenendt in dang 2006-04-15
    Peter J. Holzer, Jun 24, 2006
    #15
  16. Peter J. Holzer wrote:
    > I don't care much as long as it is consistent. What really annoys me
    > are people who start counting at zero but claim that "zeroth" is not
    > an English word, so they use "the seventh bit" and "bit 6"
    > interchangeably.


    Sure as hell confusing.
    But the first element in a Perl array happens to be the element with the
    index 0.

    Guess it's just something you have to get used to.

    jue
    Jürgen Exner, Jun 24, 2006
    #16
  17. David Squire wrote:
    > ... such as this, which explicitly deals with bytes, rather than hoping
    > that that is what characters are in the default encoding:
    >
    > ----
    >
    > #!/usr/bin/perl
    > use strict;
    > use warnings;
    >
    > while (my $line = <DATA>) {
    > chomp $line;
    > my @line_array = unpack 'C*', $line;
    > my @new_line_array = map {$_ | 64} @line_array;
    > my $new_line = pack 'C*', @new_line_array;
    > print "$new_line\n";
    > }


    I don't think this is a good idea, as it depends on whether $line is
    stored as bytes or as UTF-8 internally, which shouldn't make any
    semantic difference.

    hp

    Peter J. Holzer, Jun 24, 2006
    #17
  18. David Squire Guest

    Peter J. Holzer wrote:
    > David Squire wrote:
    >> ... such as this, which explicitly deals with bytes, rather than hoping
    >> that that is what characters are in the default encoding:
    >>
    >> ----
    >>
    >> #!/usr/bin/perl
    >> use strict;
    >> use warnings;
    >>
    >> while (my $line = <DATA>) {
    >> chomp $line;
    >> my @line_array = unpack 'C*', $line;
    >> my @new_line_array = map {$_ | 64} @line_array;
    >> my $new_line = pack 'C*', @new_line_array;
    >> print "$new_line\n";
    >> }

    >
    > I don't think this is a good idea, as it depends on whether $line is
    > stored as bytes or as UTF-8 internally, which shouldn't make any
    > semantic difference.


    It was not clear to me from the OP what the actual application was. I
    guess I suspect that bit masking is more likely to be applied to bytes
    of data than characters...

    ... now, had he been masking with 32, I could imagine that this was a
    hacky way to convert things to lowercase.
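That trick works because ASCII upper- and lowercase letters differ only in the bit worth 32 ('A' is 0x41, 'a' is 0x61). A sketch, which also shows that it is only safe on ASCII letters, unlike `lc`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# OR-ing 32 (0x20) into an ASCII capital letter gives the lowercase letter.
my $upper = 'HELLO';
my $hacky = join '', map { chr(ord($_) | 32) } split //, $upper;

# Non-letters get mangled: '@' (0x40) becomes '`' (0x60) and
# '[' (0x5B) becomes '{' (0x7B), whereas lc leaves them alone.
my $odd = join '', map { chr(ord($_) | 32) } split //, '@[';

print "$hacky\n";
print "$odd\n";
print lc('@['), "\n";
```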


    DS
    David Squire, Jun 24, 2006
    #18
  19. unpack 'C' (was: use binary operator on ascii text string)

    David Squire wrote:
    > Peter J. Holzer wrote:
    >> David Squire wrote:
    >>> my @line_array = unpack 'C*', $line;

    >>
    >> I don't think this is a good idea, as it depends on whether $line is
    >> stored as bytes or as UTF-8 internally, which shouldn't make any
    >> semantic difference.

    >
    > It was not clear to me from the OP what the actual application was. I
    > guess I suspect that bit masking is more likely to be applied to bytes
    > of data than characters...


    Yes, but I would still argue that the "bytes" in $line are what you get
    by splitting it into "characters", not by using unpack 'C*'.

    (In fact, I'm not sure if the behaviour of unpack 'C*' is correct - the
    docs aren't clear and it does violate the principle of least
    astonishment).

    Consider this script:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $x = "\x{FC}";
    utf8::upgrade($x);
    my $y = "\x{FC}";

    print "\$x and \$y are", ($x eq $y ? "" : " not"), " equal\n";

    my @x = unpack 'C*', $x;

    print "\$x is_utf8: ", utf8::is_utf8($x), "\n";
    for (@x) { print "$_\n" }

    my @y = unpack 'C*', $y;

    print "\$y is_utf8: ", utf8::is_utf8($y), "\n";
    for (@y) { print "$_\n" }
    __END__

    With perl, v5.8.4 built for i386-linux-thread-multi, it prints:

    $x and $y are equal
    $x is_utf8: 1
    195
    188
    $y is_utf8:
    252

    So while perl thinks that $x and $y are equal, unpacking them with C*
    yields different results. I don't think this should be the case, as it
    can introduce hard-to-find bugs if a string of (0..255) is for some
    reason stored as UTF-8.
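One way to avoid depending on the internal representation, sketched here under the assumption that you want either code points or the bytes of a specific encoding (UTF-8 in this example): take `ord` of each character for code points, or encode explicitly with the core `Encode` module and unpack the resulting byte string. For what it's worth, perl 5.10 changed `pack`/`unpack` to work on characters rather than on the internal bytes by default, so on newer perls the script above should print 252 for both strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

my $x = "\x{FC}";
utf8::upgrade($x);    # same string, stored internally as UTF-8
my $y = "\x{FC}";     # same string, stored as a single byte

# Code points: independent of the internal storage.
my @x_codes = map { ord($_) } split //, $x;
my @y_codes = map { ord($_) } split //, $y;

# Bytes of an explicit encoding: encode first, then unpack the byte string.
my @x_bytes = unpack 'C*', encode('UTF-8', $x);
my @y_bytes = unpack 'C*', encode('UTF-8', $y);

print "codes: @x_codes | @y_codes\n";   # codes: 252 | 252
print "bytes: @x_bytes | @y_bytes\n";   # bytes: 195 188 | 195 188
```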

    hp

    Peter J. Holzer, Jun 24, 2006
    #19
  20. David Squire Guest

    Re: unpack 'C'

    Peter J. Holzer wrote:
    > David Squire wrote:
    >> Peter J. Holzer wrote:
    >>> David Squire wrote:
    >>>> my @line_array = unpack 'C*', $line;
    >>> I don't think this is a good idea, as it depends on whether $line is
    >>> stored as bytes or as UTF-8 internally, which shouldn't make any
    >>> semantic difference.

    >> It was not clear to me from the OP what the actual application was. I
    >> guess I suspect that bit masking is more likely to be applied to bytes
    >> of data than characters...

    >
    > Yes, but I would still argue that the "bytes" in $line are what you get
    > by splitting it into "characters", not by using unpack 'C*'.


    Well, to me a byte is a byte is a byte: 8 bits. I agree that the OP used
    a line of text as his example, so using unpack 'C*' is not a good idea.

    > (In fact, I'm not sure if the behaviour of unpack 'C*' is correct - the
    > docs aren't clear and it does violate the principle of least
    > astonishment).


    I don't think the docs are that unclear. In perlfunc#pack it says:

    "C An unsigned char value. Only does bytes. See U for Unicode."

    I agree that calling this a char, and using the mnemonic 'C' is
    potentially confusing in today's world of multiple multi-byte character
    sets.

    So, if I want bytes, that's what I would use. Mind you, I would only be
    doing this for something like a bit-based set representation, not when I
    was playing with characters intended to represent text (which may or may
    not be stored as bytes).
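For that kind of bit-based set, Perl's built-in `vec` treats a string as a packed bit vector, so there is no need to unpack and repack bytes by hand. A minimal sketch with arbitrary member numbers:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $set = '';                        # empty bit vector
vec($set, $_, 1) = 1 for 1, 3, 6;    # add members 1, 3 and 6

# Membership test: vec returns the bit at that offset.
for my $n (0 .. 7) {
    printf "%d is %s\n", $n, vec($set, $n, 1) ? 'in' : 'out';
}

# unpack 'b*' shows the bits in vec's order (lowest offset first).
print unpack('b*', $set), "\n";      # 01010010
```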


    Regards,

    DS
    David Squire, Jun 24, 2006
    #20