How to get length of string? length() problems

Mitchua

Simplified a bit, I'm parsing HTML documents to get sentences e.g.
my $html = get($URL);
# remove all HTML TAGs...blah blah blah
my @sentences = split(/\./, $html);
then I'm trying to determine the number of characters in the sentence.
However, although when I print the sentences they look fine, when I use
length($sentences[0]) I get values in the hundreds for small sentences. Most
documentation I found said "length() returns the number of chars" however,
some said "length() returns the number of bytes". To get the number of
chars in this case, can I just divide by 8 or something?

Thanks for your help.
Mitchua
 
Mitchua

Would something like sprintf("%20s", $sentence[0]) work to crop the sentence
to 20 characters?

--Mitchua
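
For what it's worth, a minimal sketch of how the format widths behave: "%20s" only pads, while "%.20s" or substr() crops. The sample sentence is made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample sentence (43 characters long).
my $sentence = 'The quick brown fox jumps over the lazy dog';

# "%20s" sets a MINIMUM width: short strings are left-padded with spaces,
# strings longer than 20 characters pass through unchanged.
my $padded = sprintf('%20s', $sentence);

# "%.20s" sets a MAXIMUM width (a precision), so it truncates.
my $cropped = sprintf('%.20s', $sentence);

# substr() with an offset and a length crops more directly.
my $sliced = substr($sentence, 0, 20);

print length($padded), "\n";   # 43
print "[$cropped]\n";          # [The quick brown fox ]
print "[$sliced]\n";           # [The quick brown fox ]
```

So sprintf("%20s", ...) would leave a long sentence at full length; substr($sentences[0], 0, 20) is probably the simplest way to crop.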
 
Rich

perldoc -f length:

"length EXPR
length Returns the length in characters of the value of EXPR..."


BUT length() returns the length in bytes when the bytes pragma is used, eg:

$x = chr(400);
print "Length is ", length $x, "\n"; # "Length is 1"
printf "Contents are %vd\n", $x; # "Contents are 400"
{
use bytes;
print "Length is ", length $x, "\n"; # "Length is 2"
printf "Contents are %vd\n", $x; # "Contents are 198.144"
}

perldoc bytes for more info.
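
The same byte/character gap can also show up without the bytes pragma, if the page text arrives as undecoded UTF-8 bytes. Whether that applies to the original script is an assumption; the sketch below uses a hard-coded stand-in string rather than a real fetch, and the core Encode module to decode it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for a fetched page body: the UTF-8 bytes for "café".
my $raw = "caf\xC3\xA9";

# Undecoded, every byte counts as one character.
my $byte_count = length $raw;        # 5

# After decoding, length() counts characters.
my $text       = decode('UTF-8', $raw);
my $char_count = length $text;       # 4

print "bytes: $byte_count, chars: $char_count\n";
```

Either way the ratio is not fixed, so dividing the length by a constant will not give a reliable character count.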

Cheers,
 
Eric J. Roode

Mitchua said:
To get the number of chars in this case, can I just divide by 8 or
something?

Only if your characters are 8 bytes wide!

Do you have an example of input data that exhibits this length()
discrepancy? Can you include the output of something like:

print "[[[$string]]] ", length($string), "\n";
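
A sketch of what that check might reveal, assuming (the thread never confirms this) that the extra length is leftover whitespace from the stripped markup. The sample string is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample: a short sentence as it might look once tags are stripped.
my $string = "\n\n      Hello   world\t\t\n        ";

# The brackets make leading and trailing whitespace visible.
print "[[[$string]]] ", length($string), "\n";

# Collapse each run of whitespace to one space, then trim both ends.
(my $clean = $string) =~ s/\s+/ /g;
$clean =~ s/^ | $//g;

print "[[[$clean]]] ", length($clean), "\n";   # [[[Hello world]]] 11
```

If the bracketed output spans several lines or shows wide gaps, the count is right and the string simply contains more than what is visible on screen.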

--
Eric
$_ = reverse sort qw p ekca lre Js reh ts
p, $/.r, map $_.$", qw e p h tona e; print

 
Mitchua

Eric J. Roode said:
Do you have an example of input data that exhibits this length()
discrepancy?

Check out Rich's reply. My problem was that I was using length($sentence)
instead of length $sentence. Once I changed that, it was all good. Thanks
for the reply.

Mitchua
 
Eric J. Roode

Mitchua said:
Check out Rich's reply. My problem was that I was using
length($sentence) instead of length $sentence. Once I changed that,
it was all good. Thanks for the reply.

Hmmm. I fail to see how that could possibly make a difference. But hey,
whatever works is good.
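
A quick sketch backing that up: on a plain scalar the two spellings give the same number. The one place parentheses change things is when more of an expression follows, because length is a named unary operator and binds more loosely than the "." concatenation operator. The sample string is made up.

```perl
use strict;
use warnings;

my $sentence = 'Short sentence';    # 14 characters

# On a lone scalar the parentheses are purely cosmetic.
my $with_parens    = length($sentence);
my $without_parens = length $sentence;
print "$with_parens $without_parens\n";   # 14 14

# With a trailing expression they are not: "." binds tighter than length.
my $tagged = length($sentence) . '!!';    # "14!!"
my $longer = length $sentence . '!!';     # length("Short sentence!!"), i.e. 16
print "$tagged $longer\n";                # 14!! 16
```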
 
