Question on the length of a Scalar

Discussion in 'Perl Misc' started by sln@netherlands.com, Oct 22, 2008.

  1. Guest

    I just have a simple question.

    When I call the length function on a scalar, is it read directly
    (ie: already know its length), or does it traverse the string
    counting its characters until it hits a nul terminator?

    As an example, which one of these would be a more efficient test?
    I'm not saying these constructs hold any practicality; it's just to
    test the nature of length.

    my $str = 'Start';
    my $cnt = 1;

    # method 1
    while (length($str))
    {
        $str .= sprintf("more %d", $cnt);
        $str = '' if ($cnt % 10000000 == 0);
        $cnt++;
    }

    # method 2
    while (defined $str)
    {
        $str .= sprintf("more %d", $cnt);
        $str = undef if ($cnt % 10000000 == 0);
        $cnt++;
    }

    Thanks!
     
    , Oct 22, 2008
    #1

  2. Guest

    wrote:
    > I just have a simple question.
    >
    > When I call the length function on a scalar, is it read directly
    > (ie: already know its length), or does it traverse the string
    > counting its characters until it hits a nul terminator?


    The first one. It can't just traverse for a nul, because in Perl a nul is
    a legal character to be in the middle of a string.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
     
    , Oct 22, 2008
    #2
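The point about embedded nuls can be checked with a minimal sketch. The variable names here are illustrative and not from the thread; the snippet only shows that length counts a NUL like any other character rather than stopping at it.

```perl
use strict;
use warnings;

# Three letters, one NUL, three letters.
my $with_nul = "abc\0def";
my $len      = length $with_nul;

# The NUL does not terminate the string; it is counted as one character.
print "length: $len\n";
```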

  3. <> wrote:

    > When I call the length function on a scalar, is it read directly
    > (ie: already know its length),



    Yes.


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Oct 22, 2008
    #3
  4. Tim Greer Guest

    wrote:

    > I just have a simple question.
    >
    > When I call the length function on a scalar, is it read directly
    > (ie: already know its length), or does it traverse the string
    > counting its characters until it hits a nul terminator?
    >


    In Perl,

    $string = "this and that and junk to ending here";

    and

    $string = "this\nthat\r\nother\lfand whatever else
    here
    and here
    and
    here";

    are both measured over the whole string, from the leading "this" all the
    way to the final "here"; embedded newlines and other control characters
    count toward the total length.
    --
    Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
    Industry's most experienced staff! -- Web Hosting With Muscle!
     
    Tim Greer, Oct 22, 2008
    #4
  5. wrote:
    > wrote:
    >> When I call the length function on a scalar, is it read directly
    >> (ie: already know its length), or does it traverse the string
    >> counting its characters until it hits a nul terminator?


    Neither, nor.
    First the string representation of that scalar is computed, then the
    length of that string representation is returned.

    jue
     
    Jürgen Exner, Oct 23, 2008
    #5
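A small sketch of the stringification step described above, assuming plain numeric scalars with default number formatting. The variable names are made up for illustration; length reports the size of each value's string form, not anything about its numeric storage.

```perl
use strict;
use warnings;

my $int   = 1000;      # string form "1000"
my $float = 0.75;      # string form "0.75"
my $expr  = 10 / 4;    # numeric 2.5, string form "2.5"

# length works on the string representation of each scalar.
my @lengths = map { length $_ } ($int, $float, $expr);

print "@lengths\n";
```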
  6. Dr.Ruud Guest

    Jürgen Exner schreef:
    >> sln:


    >>> When I call the length function on a scalar, is it read directly
    >>> (ie: already know its length), or does it traverse the string
    >>> counting its characters until it hits a nul terminator?

    >
    > Neither, nor.
    > First the string representation of that scalar is computed, then the
    > length of that string representation returned.


    s/(?<=computed)/ or selected/

    (because the string representation can already be available)

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Oct 23, 2008
    #6
  7. On 2008-10-22 21:31, <> wrote:
    > wrote:
    >> I just have a simple question.
    >>
    >> When I call the length function on a scalar, is it read directly
    >> (ie: already know its length), or does it traverse the string
    >> counting its characters until it hits a nul terminator?

    >
    > The first one.


    That depends on the string. If it is a byte string, you are right. If it
    is a scalar string, the string must be traversed to count the
    characters. However, the result seems to be cached - it makes almost no
    difference whether I call length on a (long) string once or 1000 times
    (tested with perl 5.8.8 and 5.10.0).

    hp
     
    Peter J. Holzer, Oct 25, 2008
    #7
  8. On 2008-11-06 06:25, Joe Smith <> wrote:
    > Peter J. Holzer wrote:
    >> On 2008-10-22 21:31, <> wrote:
    >>> wrote:
    >>>> I just have a simple question.
    >>>>
    >>>> When I call the length function on a scalar, is it read directly
    >>>> (ie: already know its length), or does it traverse the string
    >>>> counting its characters until it hits a nul terminator?
    >>> The first one.

    >>
    >> That depends on the string. If it is a byte string, you are right.

    >
    > No. A scalar string in Perl has its length stored in the typeglob
    > as part of the variable's metadata.


    That's the length in bytes, not the length in characters. In the case of
    a byte string that's identical, but for a character string it is not (a
    character may be more than one byte).


    > perl
    > $string = "=" x 80 . "\000" . "_" x 80 . "\000";
    > die if length $string != 80+1+80+1;
    > print "The string, with embedded nulls, is ", length $string, " bytes.\n";
    > $string .= "\x{1234}";
    > print "Adding one UTF-8 character, the length is now ", length $string, ".\n";
    > ^D
    > The string, with embedded nulls, is 162 bytes.
    > Adding one UTF-8 character, the length is now 163.


    What is this example meant to demonstrate?


    >> If it is a scalar string, the string must be traversed to count the
    >> characters.


    Sorry for the typo. That should have read "character string", not
    "scalar string".


    > Nope. Perl knows exactly how many characters are in a string at all
    > times.
    >
    >> However, the result seems to be cached

    >
    > There is no "seems to be" about it. The size of a scalar is stored as
    > part of the typeglob where it can be accessed _without_ traversing the
    > string.


    Definitely not. I tested it, and the *first* time length is called on
    any string, it takes linear time - which is imho a clear indication that
    it does not know how many characters there are and needs to count them.
    However, on subsequent calls, the result is returned immediately, so the
    result of the previous call must be cached somewhere. I don't see it in
    the output of Devel::Peek::Dump, so I guess it isn't stored in the SV.

    hp
     
    Peter J. Holzer, Nov 8, 2008
    #8
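The byte-versus-character distinction in this post can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: it uses the core Encode module to obtain the UTF-8 byte form explicitly, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Ten ASCII letters plus U+20AC, which takes three bytes in UTF-8.
my $chars = ("a" x 10) . "\x{20ac}";
my $bytes = encode('UTF-8', $chars);

my $char_len = length $chars;    # counts characters
my $byte_len = length $bytes;    # counts bytes of the encoded form

print "characters: $char_len, bytes: $byte_len\n";
```

For a string holding only single-byte characters the two numbers coincide, which is why the distinction only shows up with character strings like this one.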
  9. On 2008-11-08 08:05, Peter J. Holzer <> wrote:
    > Definitely not. I tested it, and the *first* time length is called on
    > any string, it takes linear time - which is imho a clear indication that
    > it does not know how many characters there are and needs to count them.
    > However, on subsequent calls, the result is returned immediately, so the
    > result of the previous call must be cached somewhere. I don't see it in
    > the output of Devel::Peek::Dump, so I guess it isn't stored in the SV.


    Correction: It is visible:

    "a" x 10 . "€":

    SV = PV(0x9c3a730) at 0x9c56290
    REFCNT = 1
    FLAGS = (PADMY,POK,pPOK,UTF8)
    PV = 0x9cb79e8 "aaaaaaaaaa\342\202\254"\0 [UTF8 "aaaaaaaaaa\x{20ac}"]
    CUR = 13
    LEN = 16

    Length in bytes (13) is correct. No length in characters.

    After calling length we have a "magic" field which contains the length in
    characters:

    SV = PVMG(0x9c857c8) at 0x9c56290
    REFCNT = 1
    FLAGS = (PADMY,SMG,POK,pPOK,UTF8)
    IV = 0
    NV = 0
    PV = 0x9cb79e8 "aaaaaaaaaa\342\202\254"\0 [UTF8 "aaaaaaaaaa\x{20ac}"]
    CUR = 13
    LEN = 16
    MAGIC = 0x9c5b278
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = 11

    After adding another character, the magic field is still there but invalid:

    SV = PVMG(0x88e57c8) at 0x88b6290
    REFCNT = 1
    FLAGS = (PADMY,SMG,POK,pPOK,UTF8)
    IV = 0
    NV = 0
    PV = 0x89179e8 "aaaaaaaaaa\342\202\254\342\202\254"\0 [UTF8 "aaaaaaaaaa\x{20ac}\x{20ac}"]
    CUR = 16
    LEN = 20
    MAGIC = 0x88bb278
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = -1

    hp
     
    Peter J. Holzer, Nov 8, 2008
    #9
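The three dumps above could be reproduced with something like the following sketch. It is an assumption about how the experiment was run, not the poster's actual script. Devel::Peek's Dump writes to STDERR, and the exact fields, addresses, and caching behaviour may differ from the 5.8.8 and 5.10.0 output quoted in the thread.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

# Ten ASCII letters plus U+20AC, as in the dumps above.
my $str = ("a" x 10) . "\x{20ac}";

Dump($str);                    # before length: CUR holds the byte length

my $char_len = length $str;    # character count; may populate a utf8 length cache
Dump($str);                    # after length: look for a MAGIC section with MG_LEN

$str .= "\x{20ac}";            # modifying the string may invalidate a cached length
Dump($str);

my $new_len = length $str;
print "before append: $char_len, after append: $new_len\n";
```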