number of starting tabs

Discussion in 'Perl Misc' started by George Mpouras, Aug 16, 2012.

  1. I want to count the number of starting tabs of a string. Is there any better
    way than

    my $var = "\t\tfoo\t";
    (my $tabs = $var) =~s/^(\t*).*$/$1/; $tabs = length $tabs;
    print $tabs; # 2
     
    George Mpouras, Aug 16, 2012
    #1

  2. On 08/16/12 15:07, George Mpouras wrote:
    > I want to count the number of staring tabs of a string. Is there any
    > better way than
    >
    > my $var = ' foo ';
    > (my $tabs = $var) =~s/^(\t*).*$/$1/; $tabs = length $tabs;
    > print $tabs; # 2


    Avoid doing a substitute/replace and just capture them:

    my ( $tabs ) = $var =~ /^(\t+)/;
    print "Number of tabs:", length( $tabs ), "\n";
     
    J. Gleixner, Aug 16, 2012
    #2
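
A side note on the capture approach: with \t+ a string that has no leading tab does not match at all, so $tabs stays undefined. The following is a minimal sketch, assuming a hypothetical helper named leading_tabs_capture, that uses \t* instead so the match always succeeds and the length is always defined.

```perl
use strict;
use warnings;

# Hypothetical helper: count leading tabs by capturing them.
# \t* (rather than \t+) always matches, so $run is never undef.
sub leading_tabs_capture {
    my ($str) = @_;
    my ($run) = $str =~ /^(\t*)/;
    return length $run;
}

print leading_tabs_capture("\t\tfoo\t"), "\n";   # 2
print leading_tabs_capture("foo"), "\n";         # 0
```

The trailing tab in the first string is ignored because the pattern is anchored at the start.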

  3. "George Mpouras" <> writes:
    > I want to count the number of staring tabs of a string. Is there any
    > better way than
    >
    > my $var = ' foo ';
    > (my $tabs = $var) =~s/^(\t*).*$/$1/; $tabs = length $tabs;
    > print $tabs; # 2


    The code below prints the number of leading tabs in the first
    positional argument ($ARGV[0]), based on using the @-array

    ----------
    $tabs = 0;
    $ARGV[0] =~ /[^\t]/ and $tabs = $-[0];

    print("$tabs starting tabs\n");
     
    Rainer Weikusat, Aug 16, 2012
    #3
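
For readers unfamiliar with the arrays used here: after a successful match, $-[0] holds the offset where the match started and $+[0] the offset where it ended. A small sketch, assuming a hypothetical helper named match_offsets, shows both values for a sample string.

```perl
use strict;
use warnings;

# Hypothetical helper: return (start, end) offsets of the first match
# of $re in $str, or an empty list if there is no match.
sub match_offsets {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    return ($-[0], $+[0]);
}

my ($start, $end) = match_offsets("\t\tfoo\t", qr/[^\t]/);
print "first non-tab character spans offsets $start to $end\n";   # 2 to 3
```

With qr/^\t+/ on the same string the offsets would be 0 and 2, which is why either array can be used to count the leading tabs.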
  4. $tabs = 0;
    $ARGV[0] =~ /[^\t]/ and $tabs = $-[0];

    print("$tabs starting tabs\n");


    this is impressive.
     
    George Mpouras, Aug 16, 2012
    #4
  5. On Thursday, August 16, 2012 4:07:03 PM UTC-4, George Mpouras wrote:
    > I want to count the number of staring tabs of a string. Is there any better
    >
    > way than
    >
    >
    >
    > my $var = ' foo ';
    >
    > (my $tabs = $var) =~s/^(\t*).*$/$1/; $tabs = length $tabs;
    >
    > print $tabs; # 2


    In addition to the other ways, here is a way to do it also.

    my $var = "\t\t\t\tfour tabs";
    my $tabs = $var =~ /^\t+/ ? $+[0] : 0;

    Chris
     
    charley%, Aug 17, 2012
    #5
  6. "George Mpouras" <> writes:
    > $tabs = 0;
    > $ARGV[0] =~ /[^\t]/ and $tabs = $-[0];
    >
    > print("$tabs starting tabs\n");
    >
    >
    > this is impressive.


    It shouldn't be. I happened to know that these variables existed,
    hence, I searched perlvar(1) for the first which suited the intended
    purpose and used it. I'm not even sure if there's a technical reason
    to prefer this over the @+/ ?: based approach shown in another
    posting. In compiled code, assignments are cheap and conditional
    branches aren't but this might not be true for Perl (and I haven't
    checked it so far).
     
    Rainer Weikusat, Aug 17, 2012
    #6
  7. In article <>,
    Ben Morrow <> wrote:
    >
    >Quoth "George Mpouras" <>:
    >> I want to count the number of staring tabs of a string. Is there any better
    >> way than
    >>
    >> my $var = ' foo ';
    >> (my $tabs = $var) =~s/^(\t*).*$/$1/; $tabs = length $tabs;
    >> print $tabs; # 2

    >
    > my $tabs = () = $var =~ /\G\t/g;


    In perl 5.8.8 and 5.14.2, you don't need \G. I don't understand \G,
    but I think //g handles keeping matches distinct.

    my $tabs = () = $var =~ /\t/g;

    Does anyone find it weird that I giggled at seeing the suggestion?
    Yeah it works and the semantics are defined, but it still looks so,
    uh, ....

    --
    Tim McDaniel,
     
    Tim McDaniel, Aug 17, 2012
    #7
  8. (Tim McDaniel) writes:
    > In article <>,
    > Ben Morrow <> wrote:
    >>
    >>Quoth "George Mpouras" <>:
    >>> I want to count the number of staring tabs of a string. Is there any better
    >>> way than
    >>>
    >>> my $var = ' foo ';
    >>> (my $tabs = $var) =~s/^(\t*).*$/$1/; $tabs = length $tabs;
    >>> print $tabs; # 2

    >>
    >> my $tabs = () = $var =~ /\G\t/g;

    >
    > In perl 5.8.8 and 5.14.2, you don't need \G. I don't understand \G,
    > but I think //g handles keeping matches distinct.
    >
    > my $tabs = () = $var =~ /\t/g;


    The \G is needed because without it, the pattern will match the first
    substring of tabs anywhere in $var, not just at the start. Even with
    \G, it can be made to do this with using a suitable 'other global
    match' in front of it:

    ---------------
    $var = "aa\taa";

    $var =~ /aa/g;
    print $l = () = $var =~ /\G\t/g, "\n";
    ---------------

    Meaning, this is not only a contorted way to make Perl implicitly
    perform an intermediate assignment of the matched part of the string
    but, additionally, the result depends on the context of the operation:
    It doesn't return the length of the leading sequence of tabs but the
    length of the sequence of tabs starting at the current matching
    position which happens to be zero in this example. This is actually as
    inefficient as it is ugly (here supposed to mean 'a witticism existing
    for the sake of itself' -- stuff like this may be entertaining when a
    standup-comedian presents it but it has no place in code).
     
    Rainer Weikusat, Aug 17, 2012
    #8
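
The dependence on the current matching position can be made visible with a short sketch. The helper name count_tabs_from_pos is hypothetical; it takes a reference so that the pos() state attached to the caller's string is preserved.

```perl
use strict;
use warnings;

# Hypothetical helper: count tabs with the list-assignment idiom,
# starting from whatever pos() currently is for the referenced string.
sub count_tabs_from_pos {
    my ($str_ref) = @_;
    my $count = () = $$str_ref =~ /\G\t/g;
    return $count;
}

my $var = "aa\taa";

# pos($var) is unset, so \G anchors at offset 0, where there is no tab.
my $fresh = count_tabs_from_pos(\$var);

# A scalar-context //g match leaves pos($var) at 2, right before the tab.
my $matched = $var =~ /aa/g;
my $after = count_tabs_from_pos(\$var);

print "fresh: $fresh, after an earlier //g match: $after\n";   # fresh: 0, after an earlier //g match: 1
```

The same expression therefore gives different answers for the same string, depending on what was matched against it earlier.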
  9. Rainer Weikusat <> writes:

    [...]

    > ---------------
    > $var = "aa\taa";
    >
    > $var =~ /aa/g;
    > print $l = () = $var =~ /\G\t/g, "\n";
    > ---------------


    [...]

    > This is actually as ineffcient as it is ugly (here supposed to mean
    > 'a witticism existing for the sake of itself' -- stuff like this may
    > be entertaining when a standup-comedian presents but it has no place
    > in code).


    Quote which has to be quoted in this context:

    Debugging is twice as hard as writing the code in the first
    place. Therefore, if you write the code as cleverly as
    possible, you are, by definition, not smart enough to debug
    it.
    [B. Kernighan, possibly paraphrased]
     
    Rainer Weikusat, Aug 17, 2012
    #9
  10. In article <>,
    Rainer Weikusat <> wrote:
    > (Tim McDaniel) writes:
    >> In article <>,
    >> Ben Morrow <> wrote:
    >>>
    >>>Quoth "George Mpouras" <>:
    >>>> I want to count the number of staring tabs of a string. Is there any better
    >>>> way than
    >>>>
    >>>> my $var = ' foo ';
    >>>> (my $tabs = $var) =~s/^(\t*).*$/$1/; $tabs = length $tabs;
    >>>> print $tabs; # 2
    >>>
    >>> my $tabs = () = $var =~ /\G\t/g;

    >>
    >> In perl 5.8.8 and 5.14.2, you don't need \G. I don't understand \G,
    >> but I think //g handles keeping matches distinct.
    >>
    >> my $tabs = () = $var =~ /\t/g;

    >
    >The \G is needed because without it, the pattern will match the first
    >substring of tabs anywhere in $var, not just at the start.


    Actually, it would match tabs anywhere in the string. I had
    overlooked the requirement of "starting tabs" and not just "tabs".
    My apologies.

    --
    Tim McDaniel,
     
    Tim McDaniel, Aug 17, 2012
    #10
  11. Ben Morrow <> writes:

    [...]

    > For the specific case of a single-character string, /^\t*/ followed
    > by measuring the length of the matched section (in any of the ways
    > already posted) is probably a better solution.


    If you know the sensible solution, as this very strongly suggests, why
    don't you post it?

    ------------
    print($ARGV[0] =~ /^\t*/ && $+[0], " leading tabs\n");
    ------------

    Instead of searching for the first character which is not a tab or
    searching for a non-empty sequence of leading tabs and having to
    special-case 'no leading tabs' somehow in both cases, returning the length of a
    possibly empty sequence of leading tabs always yields the correct
    value directly.
     
    Rainer Weikusat, Aug 17, 2012
    #11
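
The point about not needing a special case can be checked with a few inputs. This is a sketch only; the wrapper name leading_tabs is an assumption, the expression inside it is the one from the post above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the expression from the post.
sub leading_tabs {
    my ($str) = @_;
    # /^\t*/ always succeeds, so $+[0] (the end offset of the match) is
    # the length of the possibly empty run of leading tabs.
    return $str =~ /^\t*/ && $+[0];
}

for my $str ("\t\tfoo\t", "foo", "", "\t\t\t") {
    print leading_tabs($str), "\n";   # 2, 0, 0, 3
}
```

Strings with no leading tab, the empty string, and strings made up of tabs only all go through the same code path.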
  12. Ben Morrow <> writes:
    > Quoth Rainer Weikusat <>:
    >> Ben Morrow <> writes:
    >>
    >> > For the specific case of a single-character string, /^\t*/ followed
    >> > by measuring the length of the matched section (in any of the ways
    >> > already posted) is probably a better solution.

    >>
    >> If you know the sensible solution, as this very strongly suggests, why
    >> don't you post it?
    >>
    >> ------------
    >> print($ARGV[0] =~ /^\t*/ && $+[0], " leading tabs\n");
    >> ------------

    >
    > At least two people have already posted solutions equivalent to
    > that;


    At least two similar solutions were posted but this one is better, as
    was explained in the text you deleted.

    > I was looking for something more general. I would prefer
    >
    > my ($tabs) = $var =~ /^(\t*)/;
    > say length $tabs;
    >
    > since I try to avoid @+, @- and $N where possible, but that's purely
    > a matter of taste.


    That's not 'purely a matter of taste'. The following two pieces of
    Perl code are equivalent insofar as the final value of $tabs is
    concerned:

    -----
    $tabs = $ARGV[0] =~ /^\t*/ && $+[0];
    -----

    -----
    ($tabs) = $ARGV[0] =~ /^(\t*)/;
    $tabs = length($tabs);
    -----

    But they utilize different methods of calculating this value and while
    the first requires perl (5.10.1) to perform nine basic operations, the
    second needs fourteen, not the least because it reimplements a feature
    the perl regex engine already provides in a relatively clumsy way in
    Perl: There's no point in copying the substring or even just capturing
    it if only the number of characters are supposed to be counted.

    The 'matter of taste' doesn't matter here because this is not a work
    of art. It is a set of instructions supposed to cause a computer to
    perform a calculation, or, more correctly, it matters only if
    aesthetic preferences trump technical considerations.
     
    Rainer Weikusat, Aug 17, 2012
    #12
  13. Rainer Weikusat <> writes:
    > "George Mpouras" <> writes:
    >> $tabs = 0;
    >> $ARGV[0] =~ /[^\t]/ and $tabs = $-[0];
    >>
    >> print("$tabs starting tabs\n");
    >>
    >>
    >> this is impressive.

    >
    > It shouldn't be.


    Especially since it is broken :): $tabs will be zero if the examined
    string contains nothing but \t characters.
     
    Rainer Weikusat, Aug 18, 2012
    #13
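
The failure described above is easy to reproduce. In the sketch below both helper names are hypothetical, and the second sub is only one possible repair (an assumption, not something proposed in the thread): it lets the pattern also match at the end of the string.

```perl
use strict;
use warnings;

# The approach from post #3, wrapped in a hypothetical helper.
sub tabs_first_non_tab {
    my ($str) = @_;
    my $tabs = 0;
    $str =~ /[^\t]/ and $tabs = $-[0];
    return $tabs;
}

# One possible repair (an assumption, not from the thread): also accept
# the end of the string, so an all-tab string matches at its end.
sub tabs_first_non_tab_or_end {
    my ($str) = @_;
    return $str =~ /[^\t]|\z/ ? $-[0] : 0;
}

print tabs_first_non_tab("\t\t\t"), "\n";          # 0, although there are 3 tabs
print tabs_first_non_tab_or_end("\t\t\t"), "\n";   # 3
```

For strings that do contain a non-tab character the two subs return the same value.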
  14. Ben Morrow <> writes:
    > Quoth Rainer Weikusat <>:


    [...]

    >> -----
    >> $tabs = $ARGV[0] =~ /^\t*/ && $+[0];
    >> -----
    >>
    >> -----
    >> ($tabs) = $ARGV[0] =~ /^(\t*)/;
    >> $tabs = length($tabs);
    >> -----
    >>
    >> But they utilize different methods of calculating this value and while
    >> the first requires perl (5.10.1) to perform nine basic operations, the
    >> second needs fourteen, not the least because it reimplements a feature
    >> the perl regex engine already provides


    [...]

    > As I have said many times before, if you are concerned about that level
    > of efficiency Perl is almost certainly the wrong language to be using in
    > the first place.


    The statement "If you are concerned about the way perl executes
    Perl-code, you shouldn't be using Perl" doesn't seem to make much
    sense to me: I'm concerned about this precisely because I use Perl and
    I'm (for hopefully obvious reasons) interested in being able to use it
    for anything where technical concerns, execution speed of the code
    being among them, don't require taking a much more time-consuming
    'other route'. The perl VM is a tool I'm employing to solve technical
    problems and the more I know about this tool the more effectively can
    I use it.

    > The first rule of optimisation is 'Don't'.


    'Optimization' is a mathematical term and it means 'finding an optimal
    solution to a certain problem'. It doesn't really have a clearly
    defined meaning when being applied to programming. Chances are that I
    agree with your opinion for the definition of 'optimization' you
    happen to have in mind. But that would be a different question.

    >> The 'matter of taste' doesn't matter here because this is not a work
    >> of art. It is a set of instructions supposed to cause a computer to
    >> perform a calculation, or, more correctly, it matters only of
    >> aesthetic preferences trump technical considerations.

    >
    > All forms of writing, in natural or artificial languages, should be
    > considered a work of art at some level. (Incidentally, this is the
    > principle upon which the idea of copyright in computer programs is
    > based.)


    The principle upon which the idea of 'copyright' (or 'patentability
    of') computer programs is based is "There's serious money to be made
    here and competition in the marketplace is bad for maximizing ROI." Pro
    forma, it rests on the assumption that code would be overwhelmingly
    the result of 'individual creative expression'. Expressions like the
    first one quoted above rightfully cast some doubt on this
    concept. They're more akin to mathematical formulas which can't be
    copyrighted (or patented, at least in theory), because they are
    discovered and not invented.

    > While material technical considerations are more important than
    > questions of aesthetics, in this case, unless the code in question is
    > part of an inner loop you have previously determined is causing a
    > significant performance problem, there is no *material* technical
    > difference between the two.


    Unless the Titanic sank, there's no reason to assume it ever would.
     
    Rainer Weikusat, Aug 19, 2012
    #14
  15. On Friday, August 17, 2012 1:33:07 PM UTC-7, Rainer Weikusat wrote:
    > Ben Morrow <> writes:
    > > Quoth Rainer Weikusat <>:
    > >> Ben Morrow <> writes:
    > >>
    > >> > For the specific case of a single-character string, /^\t*/ followed
    > >> > by measuring the length of the matched section (in any of the ways
    > >> > already posted) is probably a better solution.
    > >>
    > >> If you know the sensible solution, as this very strongly suggests, why
    > >> don't you post it?
    > >>
    > >> ------------
    > >> print($ARGV[0] =~ /^\t*/ && $+[0], " leading tabs\n");
    > >> ------------
    > >
    > > At least two people have already posted solutions equivalent to
    > > that;
    >
    > At least two similar solution where posted but this one is better, as
    > was explained in the text you deleted.
    >
    > > I was looking for something more general. I would prefer
    > >
    > > my ($tabs) = $var =~ /^(\t*)/;
    > > say length $tabs;
    > >
    > > since I try to avoid @+, @- and $N where possible, but that's purely
    > > a matter of taste.
    >
    > That's not 'purely a matter of taste'. The following two pieces of
    > Perl code are equivalent insofar the final value of $tabs is
    > concerned:
    >
    > -----
    > $tabs = $ARGV[0] =~ /^\t*/ && $+[0];
    > -----
    >
    > -----
    > ($tabs) = $ARGV[0] =~ /^(\t*)/;
    > $tabs = length($tabs);
    > -----
    >
    > But they utilize different methods of calculating this value and while
    > the first requires perl (5.10.1) to perform nine basic operations, the
    > second needs fourteen, not the least because it reimplements a feature
    > the perl regex engine already provides in a relatively clumsy way in
    > Perl: There's no point in copying the substring or even just capturing
    > it if only the number of characters are supposed to be counted.
    > ...


    I think the copy could be avoided though:

    $tabs++ while $var =~ /\G\t/g;

    --
    Charles DeRykus
     
    C.DeRykus, Aug 21, 2012
    #15
  16. "C.DeRykus" <> writes:
    > On Friday, August 17, 2012 1:33:07 PM UTC-7, Rainer Weikusat wrote:
    >> > Quoth Rainer Weikusat <>:


    [...]

    >> -----
    >>
    >> $tabs = $ARGV[0] =~ /^\t*/ && $+[0];
    >>
    >> -----
    >>
    >> -----
    >>
    >> ($tabs) = $ARGV[0] =~ /^(\t*)/;
    >>
    >> $tabs = length($tabs);
    >>
    >> -----
    >>
    >> But they utilize different methods of calculating this value and while
    >> the first requires perl (5.10.1) to perform nine basic operations, the
    >> second needs fourteen, not the least because it reimplements a feature
    >> the perl regex engine already provides in a relatively clumsy way in
    >> Perl: There's no point in copying the substring or even just capturing
    >> it if only the number of characters are supposed to be counted.

    >
    > I think the copy could be avoided though:
    >
    > $tabs++ while $var =~ /\G\t/g;


    Leading remark: On the machine where I tested this, the absolute
    differences are in the 1E-7 range which implies that this is a
    scientific problem of some interest (to certain people, at least :) but
    each of the three variants is as suitable as the two others for any
    practical problem where fewer than a couple of hundred thousand inputs
    need to be processed.

    Regarding the last one: One should expect this to be distinctly
    worse than the other two because even more 'algorithmic work' is
    performed in Perl-code. And that was actually the result I got:
    Averaged over four runs, the first ran at about 1.05 times the speed of
    the second and at about 1.46 times the speed of the third
    (second-to-third 1.39).

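    Test program
    ---------------
    use Benchmark;

    my $in = "\t\t\t\tbla";
    my $t;

    # Run each variant for at least 5 CPU seconds.
    timethese(-5,
              {
                  # capture the leading tabs, then measure the copy
                  copy => sub {
                      ($t) = $in =~ /^(\t*)/;
                      return length($t);
                  },

                  # count one tab per iteration of a \G-anchored //g loop;
                  # $t is not reset between calls, so only the timing is meaningful
                  count => sub {
                      pos($in) = 0;

                      ++$t while $in =~ /\G\t/g;
                      return $t;
                  },

                  # read the end offset of the match from @+
                  calc => sub {
                      return $in =~ /^\t*/ && $+[0];
                  }});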
     
    Rainer Weikusat, Aug 22, 2012
    #16
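
Before comparing speeds it can help to confirm that the three variants compute the same number. The sketch below is not part of the posted benchmark: the sub names are hypothetical, and the counter in the loop variant is reset on every call (unlike in the test program) so that its return value is meaningful.

```perl
use strict;
use warnings;

# Capture the leading tabs and measure the copy.
sub tabs_copy {
    my ($in) = @_;
    my ($t) = $in =~ /^(\t*)/;
    return length $t;
}

# Count one tab per iteration of a \G-anchored //g loop.
sub tabs_count {
    my ($in) = @_;
    my $t = 0;
    pos($in) = 0;
    ++$t while $in =~ /\G\t/g;
    return $t;
}

# Read the end offset of the match from @+.
sub tabs_calc {
    my ($in) = @_;
    return $in =~ /^\t*/ && $+[0];
}

# True if all three variants return the same value for $in.
sub variants_agree {
    my ($in) = @_;
    my %seen;
    $seen{ $_->($in) } = 1 for (\&tabs_copy, \&tabs_count, \&tabs_calc);
    return keys(%seen) == 1;
}

print variants_agree("\t\t\t\tbla") ? "agree\n" : "differ\n";   # agree
```

Checking a few edge cases as well (empty string, no leading tab, tabs only) guards against benchmarking variants that silently disagree.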