use binary operator on ascii text string

Sean.Dewis · Jun 23, 2006

Hi everyone

I'm pretty crap at perl, so I'd appreciate so help from you guys.

I have a string value held in $body variable.

What I need to do is manipulate each individual character value in the
string with OR - "|" and then replace that character with the
character's new value.

I'm using chr(ord($c) | 64) to get the new value, but I'm stuck on two
things: -

1) How to go through the string byte by byte and perform the OR 64 on
it
2) How to get the character equivalent back into the string in the
right place

For example the string is "abcdefg", by (I know it's not true)
performing OR 64 on each char I want "fghijkl" out.

Any idea's? Code examples would be appreciated.

TIA

Sean

krakle · Jun 23, 2006

Sean.Dewis said:
Hi everyone

I'm pretty crap at perl,

And english

Any idea's? Code examples would be appreciated.

http://www.perl.com

and

#!/usr/bin/perl -w
# ...

David Squire · Jun 23, 2006

Hi everyone

I'm pretty crap at perl, so I'd appreciate so help from you guys.

I have a string value held in $body variable.

What I need to do is manipulate each individual character value in the
string with OR - "|" and then replace that character with the
character's new value.

I'm using chr(ord($c) | 64) to get the new value, but I'm stuck on two
things: -

1) How to go through the string byte by byte and perform the OR 64 on
it
2) How to get the character equivalent back into the string in the
right place

For example the string is "abcdefg", by (I know it's not true)
performing OR 64 on each char I want "fghijkl" out.

??? The characters in "abcdefg" already have bit 7 set on - as they must,
since ord('a') is 97 > 64 (at least in ASCII, and many derived encodings).

Any idea's?

Learn how to use apostrophes correctly, for a start

Code examples would be appreciated.

Here's some code that does what I think you want, but as I have
described above, that is not actually that clear. I bet that there are
nicer ways to do this too, which others will most likely soon point out

----

#!/usr/bin/perl
use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;
    my @line_array     = split //, $line;                    # one element per character
    my @new_line_array = map { $_ | chr(64) } @line_array;   # string-wise OR with "@" (0x40)
    my $new_line       = join '', @new_line_array;
    print "$new_line\n";
}

__DATA__
abcdefg
1234567687568
%^&*^*()&^)&^

----

Output:

abcdefg
qrstuvwvxwuvx
e^fj^jhif^if^

DS

David Squire · Jun 23, 2006

David said:
Hi everyone

I'm pretty crap at perl, so I'd appreciate so help from you guys.

I have a string value held in $body variable.

What I need to do is manipulate each individual character value in the
string with OR - "|" and then replace that character with the
character's new value.

I'm using chr(ord($c) | 64) to get the new value, but I'm stuck on two
things: -

1) How to go through the string byte by byte and perform the OR 64 on
it
2) How to get the character equivalent back into the string in the
right place

[snip]

Here's some code that does what I think you want, but as I have
described above, that is not actually that clear. I bet that there are
nicer ways to do this too, which others will most likely soon point out

[snip]

... such as this, which explicitly deals with bytes, rather than hoping
that that is what characters are in the default encoding:

----

#!/usr/bin/perl
use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;
    my @line_array     = unpack 'C*', $line;             # one number per byte
    my @new_line_array = map { $_ | 64 } @line_array;    # numeric OR on each byte value
    my $new_line       = pack 'C*', @new_line_array;
    print "$new_line\n";
}

Sherm Pendley · Jun 23, 2006

David Squire said:
??? The characters in "abcdefg" already have bit 7 set on - as they
must since ord('a') is 97 > 64

??? The value of a bit is 2^position, starting at position 0.

2^7 = 128.

sherm--
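
As a rough illustration of the numbering being discussed (a sketch only; the helper name set_bits is made up for this example), the following lists the zero-based positions of the bits set in a byte value. It also shows why OR 64 leaves "abcdefg" unchanged: position 6 is already set in ord('a').

```perl
use strict;
use warnings;

# Hypothetical helper: zero-based positions of the bits set in a byte value.
sub set_bits {
    my ($n) = @_;
    return grep { ($n >> $_) & 1 } 0 .. 7;
}

print join(',', set_bits(64)), "\n";         # prints "6"   (64 == 2**6)
print join(',', set_bits(ord('a'))), "\n";   # prints "0,5,6" (97 == 64 + 32 + 1)
```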

David Squire · Jun 23, 2006

Sherm said:
??? The value of a bit is 2^position, starting at position 0.

2^7 = 128.

I started counting at 1. The OP stated that he was doing | 64, so the
bit referred to was clear in any case.

DS

Sherm Pendley · Jun 23, 2006

David Squire said:
I started counting at 1.

Yes, obviously - that's why I posted the correction. Beginning at one is
incorrect in any base-n notation, not just binary. For any value of n, the
value of position x is n^x. That only works when the positions are numbered
starting with zero.

It's not a matter of personal preference or opinion, it's part of the
mathematical definition of base-n notation.

sherm--

David Squire · Jun 23, 2006

Sherm said:
Yes, obviously - that's why I posted the correction. Beginning at one is
incorrect in any base-n notation, not just binary. For any value of n, the
value of position x is n^x. That only works when the positions are numbered
starting with zero.

It's not a matter of personal preference or opinion, it's part of the
mathematical definition of base-n notation.

And entirely unrelated to helping with the OP's question. I can just as
easily say that the value at the nth position is x^(n-1), and then count
1st, 2nd, 3rd, etc.

You have again snipped context that made it clear that there was no
ambiguity in what I posted.

Choosing to start at 0 is indeed arbitrary - though of course you are
right about the most common convention.

DS

Sherm Pendley · Jun 23, 2006

David Squire said:
And entirely unrelated to helping with the OP's question.

Sorry. I guess I didn't realize I was getting paid for working at this
help desk and therefore obligated to answer questions.

I can just
as easily say that the value at the nth position is x^(n-1), and then
count 1st, 2nd, 3rd, etc.

The difference is that I'm talking about an established rule that's been
widely agreed upon for decades - and that's just within the realm of
computer science. You, on the other hand, are just making stuff up to
rationalize your mistakes.

You have again snipped context that made it clear that there was no
ambiguity in what I posted.

You're right - It was unambiguously wrong.

Choosing to start at 0 is indeed arbitrary

arbitrary, adj:
1. Determined by chance, whim, or impulse, and not by necessity, reason,
or principle: stopped at the first motel we passed, an arbitrary
choice.
2. Based on or subject to individual judgment or preference: The diet
imposes overall calorie limits, but daily menus are arbitrary.
3. Established by a court or judge rather than by a specific law or
statute: an arbitrary penalty.
4. Not limited by law; despotic: the arbitrary rule of a dictator.

The original decision to start at zero was indeed arbitrary. But that was a
long time ago. One could just as easily argue that the use of the Arabic
numerals 1 and 0 is arbitrary.

Now it's an established convention, and following it is not subject to
individual judgment or preference, assuming of course that you expect to
be understood.

sherm--

Mumia W. · Jun 23, 2006

Hi everyone

I'm pretty crap at perl, so I'd appreciate so help from you guys.

I have a string value held in $body variable.

What I need to do is manipulate each individual character value in the
string with OR - "|" and then replace that character with the
character's new value.

I'm using chr(ord($c) | 64) to get the new value
[...]

Then you're pretty much there. Just use the substitution operator to
replace each character with the result of the code you have above, and
you're almost set.

You'll also have to change $c to the match variable $&, and the
substitution operator will need the 'g' option (global--go through the
entire string) and the 'e' option (execute code).
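
A minimal sketch of the substitution described above, assuming the mask is passed as a number. The helper name or_each_char is invented for this example, and the /s option is an addition so that "." also matches a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: OR every character of a string with a numeric mask.
sub or_each_char {
    my ($text, $mask) = @_;
    # /g visits every character, /e evaluates the replacement as code,
    # /s lets "." match a newline too. $& is the text just matched.
    $text =~ s/./chr(ord($&) | $mask)/seg;
    return $text;
}

my $body = "1234567";
print or_each_char($body, 64), "\n";   # prints "qrstuvw"
```

Because the helper works on a copy, $body itself is left untouched; applying the same s///seg directly to $body would change it in place.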

DJ Stunks · Jun 23, 2006

David said:
Learn how to use apostrophes correctly, for a start

hahaha. a pet peeve of mine too.

http://www.angryflower.com/bobsqu.gif

-jp

David Squire · Jun 24, 2006

David said:
David said:

Hi everyone

I'm pretty crap at perl, so I'd appreciate so help from you guys.

I have a string value held in $body variable.

What I need to do is manipulate each individual character value in the
string with OR - "|" and then replace that character with the
character's new value.

I'm using chr(ord($c) | 64) to get the new value, but I'm stuck on two
things: -

1) How to go through the string byte by byte and perform the OR 64 on
it
2) How to get the character equivalent back into the string in the
right place

[snip]

Here's some code that does what I think you want, but as I have
described above, that is not actually that clear. I bet that there are
nicer ways to do this too, which others will most likely soon point
out

[snip]

... such as this, which explicitly deals with bytes, rather than hoping
that that is what characters are in the default encoding:

----

#!/usr/bin/perl
use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;
    my @line_array     = unpack 'C*', $line;             # one number per byte
    my @new_line_array = map { $_ | 64 } @line_array;    # numeric OR on each byte value
    my $new_line       = pack 'C*', @new_line_array;
    print "$new_line\n";
}

----

Well, I might as well give the last (?) in the series, following Mumia's
suggestion:

----

#!/usr/bin/perl
use strict;
use warnings;

my $mask = 64;
while (<DATA>) {
    s/(.)/chr(ord($1) | $mask)/eg;   # "." does not match the newline, so line endings survive
    print;
}

__DATA__
abcdefg
1234567687568
%^&*^*()&^)&^

----

Output:

abcdefg
qrstuvwvxwuvx
e^fj^jhif^if^

... though I still prefer the explicit byte-wise one above.

Cheers,

DS

Ben Morrow · Jun 24, 2006

Quoth David Squire said:
Well, I might as well give the last (?) in the series, following Mumia's
suggestion:

while (<DATA>) {
    chomp;    # otherwise the newline is ORed as well and turns into "J"
    print $_ | (chr(64) x length), "\n";
}

Ben

John W. Krahn · Jun 24, 2006

I'm pretty crap at perl, so I'd appreciate so help from you guys.

I have a string value held in $body variable.

What I need to do is manipulate each individual character value in the
string with OR - "|" and then replace that character with the
character's new value.

I'm using chr(ord($c) | 64) to get the new value, but I'm stuck on two
things: -

1) How to go through the string byte by byte and perform the OR 64 on
it
2) How to get the character equivalent back into the string in the
right place

For example the string is "abcdefg", by (I know it's not true)
performing OR 64 on each char I want "fghijkl" out.

Any idea's? Code examples would be appreciated.

$body =~ s/(.)/ $1 | "\x40" /seg;

John

Peter J. Holzer · Jun 24, 2006

Sherm said:
Yes, obviously - that's why I posted the correction. Beginning at one is
incorrect in any base-n notation, not just binary. For any value of n, the
value of position x is n^x. That only works when the positions are numbered
starting with zero.

It's not a matter of personal preference or opinion, it's part of the
mathematical definition of base-n notation.

But base-n notation is not the only notation in use. For example, the
RFCs describing the IP protocol (RFC 791 etc.) count bits from the MSB
to the LSB. They also start at zero, so if that convention is used on
bytes, bit 0 has the value 128, bit 1 has the value 64, etc. So David
could have said that the characters already have bit 1 set on and
confused the hell out of everyone.

I have seen numbering from 1..n (from either direction) instead of
0..n-1, too, but I'm too lazy to look for a widely known example. (But
if you read mathematical papers you will notice that many prefer to use
indexes starting at 1, even if it makes the formulas (formulae?) more
complicated because they have to write (i-1) instead of i all the time.)

I don't care much as long as it is consistent. What really annoys me are
people who start counting at zero but claim that "zeroth" is not an
English word, so they use "the seventh bit" and "bit 6" interchangeably.

hp
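
To make the competing conventions concrete, here is a small sketch with made-up helper names that gives the value of "bit N" of a byte under three numbering schemes. The OP's mask of 64 is bit 6, bit 7 or bit 1 depending on which scheme is in use.

```perl
use strict;
use warnings;

# Hypothetical helpers: the value of "bit N" of a byte under three conventions.
sub value_lsb0 { my ($n) = @_; return 1 << $n; }         # counted from the LSB, starting at 0
sub value_lsb1 { my ($n) = @_; return 1 << ($n - 1); }   # counted from the LSB, starting at 1
sub value_msb0 { my ($n) = @_; return 1 << (7 - $n); }   # counted from the MSB, starting at 0

printf "%d %d %d\n", value_lsb0(6), value_lsb1(7), value_msb0(1);   # prints "64 64 64"
```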

Jürgen Exner · Jun 24, 2006

Peter said:
I don't care much as long as it is consistent. What really annoys me
are people who start counting at zero but claim that "zeroth" is not
an English word, so they use "the seventh bit" and "bit 6"
interchangeably.

Sure as hell confusing.
But the first element in a Perl array happens to be the element with the
index 0.

Guess it's just something you have to get used to.

jue

Peter J. Holzer · Jun 24, 2006

David said:
... such as this, which explicitly deals with bytes, rather than hoping
that that is what characters are in the default encoding:

----

#!/usr/bin/perl
use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;
    my @line_array     = unpack 'C*', $line;             # one number per byte
    my @new_line_array = map { $_ | 64 } @line_array;    # numeric OR on each byte value
    my $new_line       = pack 'C*', @new_line_array;
    print "$new_line\n";
}

I don't think this is a good idea, as it depends on whether $line is
stored as bytes or as UTF-8 internally, which shouldn't make any
semantic difference.

hp

David Squire · Jun 24, 2006

Peter said:
I don't think this is a good idea, as it depends on whether $line is
stored as bytes or as UTF-8 internally, which shouldn't make any
semantic difference.

It was not clear to me from the OP what the actual application was. I
guess I suspect that bit masking is more likely to be applied to bytes
of data than characters...

... now, had he been masking with 32, I could imagine that this was a
hacky way to convert things to lowercase.

DS
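
For illustration only, here is what masking with 32 does to ASCII text. The helper name or_32 is invented for this sketch. It lowercases A-Z, but it also alters a few non-letters, which is why it is only a hack compared with lc.

```perl
use strict;
use warnings;

# Hypothetical helper: OR every byte of an ASCII string with 32 (0x20).
sub or_32 {
    my ($text) = @_;
    # Both operands are strings, so "|" works byte by byte.
    return $text | (chr(32) x length($text));
}

print or_32('HELLO'), "\n";   # prints "hello"
print or_32('A@B'), "\n";     # prints "a`b", whereas lc('A@B') gives "a@b"
```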

Peter J. Holzer · Jun 24, 2006

David said:
It was not clear to me from the OP what the actual application was. I
guess I suspect that bit masking is more likely to be applied to bytes
of data than characters...

Yes, but I would still argue that the "bytes" in $line are what you get
by splitting it into "characters", not by using unpack 'C*'.

(In fact, I'm not sure if the behaviour of unpack 'C*' is correct - the
docs aren't clear and it does violate the principle of least
astonishment).

Consider this script:

#!/usr/bin/perl
use warnings;
use strict;

my $x = "\x{FC}";
utf8::upgrade($x);
my $y = "\x{FC}";

print "\$x and \$y are", ($x eq $y ? "" : " not"), " equal\n";

my @x = unpack 'C*', $x;

print "\$x is_utf8: ", utf8::is_utf8($x), "\n";
for (@x) { print "$_\n" }

my @y = unpack 'C*', $y;

print "\$y is_utf8: ", utf8::is_utf8($y), "\n";
for (@y) { print "$_\n" }
__END__

With perl, v5.8.4 built for i386-linux-thread-multi, it prints:

$x and $y are equal
$x is_utf8: 1
195
188
$y is_utf8:
252

So while perl thinks that $x and $y are equal, unpacking them with C*
yields different results. I don't think this should be the case, as it
can introduce hard-to-find bugs if a string of (0..255) is for some
reason stored as UTF-8.

hp
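
By way of contrast, a sketch that goes through characters with split and ord rather than unpack 'C*'. The helper name code_points is made up for this example. It yields the same numbers for both strings, whichever way perl stores them internally.

```perl
use strict;
use warnings;

# Hypothetical helper: the ordinal of each character in a string.
sub code_points {
    my ($s) = @_;
    return map { ord } split //, $s;
}

my $x = "\x{FC}";
utf8::upgrade($x);     # same character, now stored as UTF-8 internally
my $y = "\x{FC}";      # same character, stored as a single byte

print join(',', code_points($x)), "\n";   # prints "252"
print join(',', code_points($y)), "\n";   # prints "252"
```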

David Squire · Jun 24, 2006

Peter said:
Yes, but I would still argue that the "bytes" in $line are what you get
by splitting it into "characters", not by using unpack 'C*'.

Well, to me a byte is a byte is a byte: 8 bits. I agree that, since the OP
used a line of text as the example, using unpack 'C*' is not a good idea.

(In fact, I'm not sure if the behaviour of unpack 'C*' is correct - the
docs aren't clear and it does violate the principle of least
astonishment).

I don't think the docs are that unclear. In perlfunc#pack it says:

"C An unsigned char value. Only does bytes. See U for Unicode."

I agree that calling this a char, and using the mnemonic 'C' is
potentially confusing in today's world of multiple multi-byte character
sets.

So, if I want bytes, that's what I would use. Mind you, I would only be
doing this for something like a bit-based set representation, not when I
was playing with characters intended to represent text (which may or may
not be stored as bytes).

Regards,

DS
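
One way such a bit-based set might look, sketched with Perl's built-in vec and the string-wise "|" for union. The helper names make_set and members are invented here, and this is only one possible representation, not necessarily what was meant above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a bit string with the given element numbers set.
sub make_set {
    my $set = '';
    vec($set, $_, 1) = 1 for @_;
    return $set;
}

# Hypothetical helper: list the element numbers present in a bit string.
sub members {
    my ($set) = @_;
    return grep { vec($set, $_, 1) } 0 .. 8 * length($set) - 1;
}

# Both operands are strings, so "|" ORs them byte by byte: a set union.
my $union = make_set(1, 3) | make_set(3, 6);
print join(',', members($union)), "\n";   # prints "1,3,6"
```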
