Obtaining length of binary string

Perl User

Hi,
Here's my problem:

I read encrypted data from a server and save it in a variable. Now, I need
to send this back to the server along with a number that tells how many
bytes long the binary data is.

I've tried
$x = read_data_from_server($args);
$y = length $x;
send_data_to_server($x,$y);

I am not sure if the length function is the right way to count the number
of bytes in the string $x. Can someone please show me how to do this?

Thanks a lot!
 
Ben Morrow

Perl User said:
Hi,
Here's my problem:

I read encrypted data from a server and save it in a variable. Now, I need
to send this back to the server along with a number that tells how many
bytes long the binary data is.

I've tried
$x = read_data_from_server($args);
$y = length $x;
send_data_to_server($x,$y);

I am not sure if the length function is the right way to count the number
of bytes in the string $x. Can someone please show me how to do this?

It is, provided you've told perl that your data is binary not textual.
Make sure you use binmode on the socket filehandle.

Ben
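A minimal sketch of that advice, using a temporary file as a stand-in for the socket (the payload bytes and the temp file are assumptions for illustration, not the OP's actual data or connection). With binmode on both handles, every byte arrives as one character with a value from 0 to 255, so length reports the byte count:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the server data: four arbitrary bytes, some with the high bit set.
my $payload = "\x00\xE2\x80\xA2";

my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                       # no I/O layers: bytes are written untouched
print {$out} $payload;
close $out or die "close: $!";

open my $in, '<', $path or die "open: $!";
binmode $in;                        # no I/O layers: bytes are read untouched
my $x = do { local $/; <$in> };     # slurp the whole file
close $in;

my $y = length $x;                  # one character per byte, so this is the byte count
print "$y\n";                       # prints 4
```

The same binmode call applies to a socket handle, since sockets are ordinary filehandles in Perl.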
 
Brian McCauley

Perl User said:
I read encrypted data from a server and save it in a variable. Now, I
need to send this back to the server along with a number that tells how
many bytes long the binary data is.

I've tried
$x = read_data_from_server($args);
$y = length $x;
send_data_to_server($x,$y);

I am not sure if the length function is the right way to count the
number of bytes in the string $x.

It is right since you say it's a binary string. If it were a text
string then length() would return the number of characters.

If you want to force length() to count bytes even in text strings:

use bytes;
 
Joe Smith

Perl User said:
I am not sure if the length function is the right way to count the
number of bytes in the string $x.

The length() function returns the number of characters in the string.
If you're not using Unicode, the number of bytes is the same as
the number of characters.
-Joe
 
Shawn Corey

Joe said:
The length() function returns the number of characters in the string.
If you're not using Unicode, the number of bytes is the same as
the number of characters.
-Joe

If you are using Perl 5.8+ and the string has its UTF-8 flag set (see
is_utf8 in perldoc Encode), then length returns the number of
characters, not the number of bytes.

--- Shawn

#!/usr/bin/perl

use strict;
use warnings;

my $s = "\x{2022}"; # Unicode for a bullet character

print length($s), "\n";

__END__
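The script above prints 1, the character count. As a hedged follow-up sketch: to get a byte count for such a string you first have to pick an encoding. The choice of UTF-8 here is an assumption; the thread does not say what the server expects.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $s    = "\x{2022}";              # one character: U+2022 BULLET
my $utf8 = encode('UTF-8', $s);     # the octets this character becomes in UTF-8

print length($s), "\n";             # prints 1 (characters)
print length($utf8), "\n";          # prints 3 (bytes: E2 80 A2)
```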
 
Jürgen Exner

Joe said:
The length() function returns the number of characters in the string.
If you're not using Unicode, the number of bytes is the same as
the number of characters.

That's wrong. Any MBCS or DBCS uses more than one byte for a single
character.

jue
 
Joe Smith

Jürgen Exner said:
That's wrong. Any MBCS or DBCS uses more than one byte for a single
character.

I was not aware of any non-Unicode version of perl that does
multibyte character sets.
-Joe
 
Jürgen Exner

Joe said:
I was not aware of any non-Unicode version of perl that does
multibyte character sets.

I am not sure what a "unicode version of perl" would be, but if the OP sends,
let's say, Japanese characters in Windows-932 back to the server, then he is
using a DBCS where each character is two bytes long and which has nothing to
do with Unicode.

jue
 
Alan J. Flavell

I am not sure what a "unicode version of perl" would be,

Presumably one of the recent versions which have native Unicode
support... (5.6 or later, whatever)

but if the OP sends, let's say, Japanese characters in Windows-932 back
to the server, then he is using a DBCS where each character is two
bytes long and which has nothing to do with Unicode.

But then Perl itself has no concept of "character" in such a piece
of data, and cannot meaningfully be asked to count "characters".

It has to be up to the programmer to count their own characters, if
the only definition of "characters" is some specification external to
Perl.

Or else they use an encode layer, or explicit encoding function, to
convert the external character encoding into native Perl (unicode)
characters, and use Perl's own functions on the result.

Doesn't that seem reasonable?
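
A small sketch of the explicit-decoding route, assuming the Encode module's 'cp932' encoding (supplied by Encode::JP) and a hand-written four-byte sample standing in for data received from elsewhere. Before decoding, Perl sees only bytes; after decoding, length counts characters:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Four raw bytes: the CP932 encoding of the two characters U+65E5 U+672C.
my $bytes = "\x93\xFA\x96\x7B";

# Tell Perl how to interpret the bytes; the result is a character string.
my $chars = decode('cp932', $bytes);

print length($bytes), "\n";         # prints 4 (bytes as received)
print length($chars), "\n";         # prints 2 (characters after decoding)
```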
 
Jürgen Exner

Alan J. Flavell wrote:
[Very reasonable arguments snipped]
Doesn't that seem reasonable?
Absolutely.

And this is the statement I still don't agree with.

jue
 
Ben Morrow

Jürgen Exner said:
And this is the statement I still don't agree with.

OK, we're in a Perl group, so let's rewrite that as

If you're not using Perl's Unicode support, the number of bytes is the
same as the number of characters, according to Perl.

I am fairly sure that is what the OP meant by it, and also that you will
not disagree with it.

Ben
 
Alan J. Flavell

As I read it, Joe is using the term "characters" from Perl's point of
view.
And this is the statement I still don't agree with.

Well, if "characters" are defined externally to Perl, and Perl is only
given the binary bytes and not told how to interpret them, we can
hardly expect Perl to count characters for us. Seems reasonable to
me.

I think we're all saying the same thing - just in different terms.
 
Perl User

It is, provided you've told perl that your data is binary not textual.
Make sure you use binmode on the socket filehandle.

Thanks Ben and everyone else for helping me out! I realized that I was not
using the "use bytes" pragma at one particular place in the program, and
this was causing it to fail.
 
A. Sinan Unur

It is right since you say it's a binary string. If it were a text
string then length() would return the number of characters.

If you want to force length() to count bytes even in text strings:

use bytes;

On the other hand, the following might be better in that it would allow the
OP to selectively decide whether he wants characters or bytes to be
counted.

use strict;
use warnings;

use bytes ();

my $s = "\x{2022}";

print bytes::length($s), "\n", length($s);

Sinan
 
Ben Morrow

Perl User said:
Thanks Ben and everyone else for helping me out! I realized that I was not
using the "use bytes" pragma at one particular place in the program, and
this was causing it to fail.

Don't do that! It doesn't do what you think it does (well, it doesn't do
what you want here, anyway). Use binmode on the filehandle (which you
should be doing anyway) and length will give the byte-length.

Ben
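
For the OP's actual task, here is one possible sketch of sending data together with its byte count, with no "use bytes" involved. The frame helper and the wire format (a 4-byte big-endian length followed by the payload) are hypothetical; the real server's protocol is not given in the thread.

```perl
use strict;
use warnings;

# Hypothetical framing: 4-byte big-endian length, then the payload itself.
sub frame {
    my ($data) = @_;
    # A string holding code points above 255 is text, not binary; refuse it
    # rather than guess an encoding. This works on a copy of the caller's data.
    utf8::downgrade($data, 1) or die "payload contains wide characters";
    return pack('N', length $data) . $data;
}

my $x   = "\x00\xFF\x10\x80\x7F";   # stand-in for the encrypted bytes
my $msg = frame($x);

my ($len) = unpack 'N', $msg;       # read the length prefix back
print "$len\n";                     # prints 5
print length($msg), "\n";           # prints 9 (4-byte prefix + 5 payload bytes)
```

The framed message would then be written to a handle that has had binmode applied, as described above.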
 
Ben Morrow

A. Sinan Unur said:
On the other hand, the following might be better in that it would allow the
OP to selectively decide whether he wants characters or bytes to be
counted.

use strict;
use warnings;

use bytes ();

my $s = "\x{2022}";

print bytes::length($s), "\n", length($s);

This will give you the length in bytes of "\x{2022}" represented in
perl's internal character encoding. THIS IS NOT A USEFUL VALUE. You
should not know or care how perl represents characters internally: if
you mark the data as binary (with binmode) then Perl will return the
byte-length.

IMNSHO the fact that binary and textual strings are not adequately
distinguished is a flaw in perl's Unicode implementation, although I can
see it was mostly done for backwards compatibility so it's excusable.

Ben
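
A short sketch of why the internal length is not a useful value: the same one-character string can report different bytes::length results depending on how perl happens to store it, while the byte count of an explicitly encoded copy depends only on the encoding chosen. The two encodings below are picked purely for illustration.

```perl
use strict;
use warnings;
use bytes ();
use Encode qw(encode);

my $s      = "\xE9";                # one character, U+00E9
my $before = bytes::length($s);     # internal size while stored as a single byte
utf8::upgrade($s);                  # same string value, different internal storage
my $after  = bytes::length($s);     # internal size after the storage change

print length($s), "\n";             # prints 1 (still one character)
print "$before $after\n";           # prints "1 2"

# Byte counts for a chosen wire encoding do not depend on internal storage.
print length(encode('UTF-8', $s)), "\n";        # prints 2
print length(encode('ISO-8859-1', $s)), "\n";   # prints 1
```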
 
