number of starting tabs

George Mpouras

I want to count the number of starting tabs of a string. Is there any better
way than

my $var = "\t\tfoo";
(my $tabs = $var) =~ s/^(\t*).*$/$1/; $tabs = length $tabs;
print $tabs; # 2
 
J. Gleixner

George Mpouras said:
I want to count the number of starting tabs of a string. Is there any
better way than

my $var = "\t\tfoo";
(my $tabs = $var) =~ s/^(\t*).*$/$1/; $tabs = length $tabs;
print $tabs; # 2

Avoid doing a substitute/replace and just capture them:

my ( $tabs ) = $var =~ /^(\t+)/;
print "Number of tabs:", length( $tabs ), "\n";
 
Rainer Weikusat

George Mpouras said:
I want to count the number of starting tabs of a string. Is there any
better way than

my $var = "\t\tfoo";
(my $tabs = $var) =~ s/^(\t*).*$/$1/; $tabs = length $tabs;
print $tabs; # 2

The code below prints the number of leading tabs in the first
positional argument ($ARGV[0]), using the @- array (match start offsets).
 
George Mpouras

Rainer Weikusat said:
$tabs = 0;
$ARGV[0] =~ /[^\t]/ and $tabs = $-[0];

print("$tabs starting tabs\n");

this is impressive.
 
charley%pulsenet.com

George Mpouras said:
I want to count the number of starting tabs of a string. Is there any better
way than

my $var = "\t\tfoo";
(my $tabs = $var) =~ s/^(\t*).*$/$1/; $tabs = length $tabs;
print $tabs; # 2

In addition to the other ways, here is one more:

my $var = "\t\t\t\tfour tabs";
my $tabs = $var =~ /^\t+/ ? $+[0] : 0;

Chris
 
Rainer Weikusat

George Mpouras said:
$tabs = 0;
$ARGV[0] =~ /[^\t]/ and $tabs = $-[0];

print("$tabs starting tabs\n");


this is impressive.

It shouldn't be. I happened to know that these variables existed,
hence, I searched perlvar(1) for the first one which suited the intended
purpose and used it. I'm not even sure if there's a technical reason
to prefer this over the @+ / ?: based approach shown in another
posting. In compiled code, assignments are cheap and conditional
branches aren't, but this might not be true for Perl (and I haven't
checked it so far).
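
For readers who have not used the two arrays, here is a minimal sketch of what each approach reads after a successful match. The variable names are made up for illustration and the input is assumed to start with three tabs.

```perl
use strict;
use warnings;

my $s = "\t\t\tfoo";

# After a successful match, $-[0] is the offset where the whole match
# starts and $+[0] is the offset just past its end.
my ($first_non_tab, $end_of_tabs);
$first_non_tab = $-[0] if $s =~ /[^\t]/;   # match starts at the 'f'
$end_of_tabs   = $+[0] if $s =~ /^\t+/;    # match ends after the third tab

printf "start offset: %d, end offset: %d\n", $first_non_tab, $end_of_tabs;
```

For a string that begins with tabs and then has something else, both offsets are the same number, which is why either array can be used for counting.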
 
Tim McDaniel

my $tabs = () = $var =~ /\G\t/g;

In perl 5.8.8 and 5.14.2, you don't need \G. I don't understand \G,
but I think //g handles keeping matches distinct.

my $tabs = () = $var =~ /\t/g;

Does anyone find it weird that I giggled at seeing the suggestion?
Yeah it works and the semantics are defined, but it still looks so,
uh, ....
 
Rainer Weikusat

In perl 5.8.8 and 5.14.2, you don't need \G. I don't understand \G,
but I think //g handles keeping matches distinct.

my $tabs = () = $var =~ /\t/g;

The \G is needed because without it, the pattern will match the first
substring of tabs anywhere in $var, not just at the start. Even with
\G, it can be made to do this by running a suitable 'other global
match' in front of it:

---------------
$var = "aa\taa";

$var =~ /aa/g;
print $l = () = $var =~ /\G\t/g, "\n";
---------------

Meaning, this is not only nothing but a contorted way to make Perl do
an intermediate assignment of the matched part of the string implicitly
but additionally, the result depends on the context of the operation:
It doesn't return the length of the leading sequence of tabs but the
length of the sequence of tabs starting at the current matching
position, which happens to be zero in this example. This is actually as
inefficient as it is ugly (here supposed to mean 'a witticism existing
for the sake of itself' -- stuff like this may be entertaining when a
stand-up comedian presents it but it has no place in code).
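
The dependence on the matching position can be made visible with a small sketch. It reuses the "aa\taa" string from the example above; the variable names $fresh, $pos_before and $after are only illustrative.

```perl
use strict;
use warnings;

my $var = "aa\taa";

# No earlier //g match: pos($var) is undefined, so \G anchors at offset 0
# and the string does not start with a tab.
my $fresh = () = $var =~ /\G\t/g;

# A //g match outside list context leaves pos($var) just past the "aa".
$var =~ /aa/g;
my $pos_before = pos($var);

# Now \G anchors at that position, which is where the tab sits.
my $after = () = $var =~ /\G\t/g;

print "fresh=$fresh pos=$pos_before after=$after\n";
```

Assigning pos($var) = 0 before the counting match would remove the dependence on earlier matches against the same variable.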
 
Rainer Weikusat

[...]
---------------
$var = "aa\taa";

$var =~ /aa/g;
print $l = () = $var =~ /\G\t/g, "\n";
---------------
[...]

This is actually as inefficient as it is ugly (here supposed to mean
'a witticism existing for the sake of itself' -- stuff like this may
be entertaining when a stand-up comedian presents it but it has no place
in code).

Quote which has to be quoted in this context:

Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as
possible, you are, by definition, not smart enough to debug
it.
[B. Kernighan, possibly paraphrased]
 
Tim McDaniel

The \G is needed because without it, the pattern will match the first
substring of tabs anywhere in $var, not just at the start.

Actually, it would match tabs anywhere in the string. I had
overlooked the requirement of "starting tabs" and not just "tabs".
My apologies.
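
A short sketch of the difference between "tabs" and "starting tabs", assuming a test string with two leading tabs and one tab further in (the string and variable names are made up):

```perl
use strict;
use warnings;

my $var = "\t\tfoo\tbar";

# Without \G, //g in list context finds every tab in the string.
my $all = () = $var =~ /\t/g;

# With \G (and no match position left over), matching stops at the
# first character that is not a tab.
my $leading = () = $var =~ /\G\t/g;

print "all=$all leading=$leading\n";
```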
 
Rainer Weikusat

[...]
For the specific case of a single-character string, /^\t*/ followed
by measuring the length of the matched section (in any of the ways
already posted) is probably a better solution.

If you know the sensible solution, as this very strongly suggests, why
don't you post it?

------------
print($ARGV[0] =~ /^\t*/ && $+[0], " leading tabs\n");
------------

Instead of searching for the first character which is not a tab or
searching for a non-empty sequence of leading tabs and having to
special-case 'no leading tabs' somehow in both cases, returning the length of a
possibly empty sequence of leading tabs always yields the correct
value directly.
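
To illustrate the point about not needing a special case, here is a minimal sketch that wraps the expression in a sub and feeds it the edge cases. The name leading_tabs is hypothetical.

```perl
use strict;
use warnings;

# leading_tabs is a made-up wrapper name, not from the thread.
sub leading_tabs {
    my ($s) = @_;
    # /^\t*/ matches any defined string, possibly with zero width, so
    # $+[0] is always the end offset of the (possibly empty) tab run.
    return $s =~ /^\t*/ && $+[0];
}

# empty string, no tabs, two leading tabs, nothing but tabs
for my $s ("", "foo", "\t\tfoo", "\t\t\t") {
    print leading_tabs($s), "\n";
}
```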
 
Rainer Weikusat

Ben Morrow said:
At least two people have already posted solutions equivalent to
that;

At least two similar solutions were posted but this one is better, as
was explained in the text you deleted.

Ben Morrow said:
I was looking for something more general. I would prefer

my ($tabs) = $var =~ /^(\t*)/;
say length $tabs;

since I try to avoid @+, @- and $N where possible, but that's purely
a matter of taste.

That's not 'purely a matter of taste'. The following two pieces of
Perl code are equivalent insofar as the final value of $tabs is
concerned:

-----
$tabs = $ARGV[0] =~ /^\t*/ && $+[0];
-----

-----
($tabs) = $ARGV[0] =~ /^(\t*)/;
$tabs = length($tabs);
-----

But they utilize different methods of calculating this value and while
the first requires perl (5.10.1) to perform nine basic operations, the
second needs fourteen, not the least because it reimplements a feature
the perl regex engine already provides in a relatively clumsy way in
Perl: There's no point in copying the substring or even just capturing
it if only the number of characters is supposed to be counted.

The 'matter of taste' doesn't matter here because this is not a work
of art. It is a set of instructions supposed to cause a computer to
perform a calculation, or, more correctly, it matters only if
aesthetic preferences trump technical considerations.
 
Rainer Weikusat

Rainer Weikusat said:
George Mpouras said:
$tabs = 0;
$ARGV[0] =~ /[^\t]/ and $tabs = $-[0];

print("$tabs starting tabs\n");


this is impressive.

It shouldn't be.

Especially since it is broken :): $tabs will be zero if the examined
string contains nothing but \t characters.
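
A small sketch that reproduces the problem. The sub names are hypothetical, and the second sub is just one possible repair (falling back to the string length), not something proposed in the thread.

```perl
use strict;
use warnings;

# Mirrors the /[^\t]/ and $-[0] approach quoted above.
sub first_non_tab {
    my ($s) = @_;
    my $tabs = 0;
    $s =~ /[^\t]/ and $tabs = $-[0];
    return $tabs;
}

# Assumed fix: if there is no non-tab character, every character is a tab.
sub first_non_tab_fixed {
    my ($s) = @_;
    return $s =~ /[^\t]/ ? $-[0] : length $s;
}

print first_non_tab("\t\tfoo"), "\n";        # 2
print first_non_tab("\t\t\t"), "\n";         # 0, although there are three tabs
print first_non_tab_fixed("\t\t\t"), "\n";   # 3
```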
 
Rainer Weikusat

Ben Morrow said:
Quoth Rainer Weikusat:
[...]
-----
$tabs = $ARGV[0] =~ /^\t*/ && $+[0];
-----

-----
($tabs) = $ARGV[0] =~ /^(\t*)/;
$tabs = length($tabs);
-----

But they utilize different methods of calculating this value and while
the first requires perl (5.10.1) to perform nine basic operations, the
second needs fourteen, not the least because it reimplements a feature
the perl regex engine already provides
[...]

As I have said many times before, if you are concerned about that level
of efficiency Perl is almost certainly the wrong language to be using in
the first place.

The statement "If you are concerned about the way perl executes
Perl-code, you shouldn't be using Perl" doesn't seem to make much
sense to me: I'm concerned about this precisely because I use Perl and
I'm (for hopefully obvious reasons) interested in being able to use it
for anything where technical concerns, execution speed of the code
being among them, don't require taking a much more time-consuming
'other route'. The perl VM is a tool I'm employing to solve technical
problems and the more I know about this tool the more effectively can
I use it.

Ben Morrow said:
The first rule of optimisation is 'Don't'.

'Optimization' is a mathematical term and it means 'finding an optimal
solution to a certain problem'. It doesn't really have a clearly
defined meaning when being applied to programming. Chances are that I
agree with your opinion for the definition of 'optimization' you
happen to have in mind. But that would be a different question.

Ben Morrow said:
All forms of writing, in natural or artificial languages, should be
considered a work of art at some level. (Incidentally, this is the
principle upon which the idea of copyright in computer programs is
based.)

The principle upon which the idea of 'copyright' (or 'patentability
of') computer programs is based is "There's serious money to be made
here and competition in the marketplace is bad for maximizing ROI." Pro
forma, it rests on the assumption that code would be overwhelmingly
the result of 'individual creative expression'. Expressions like the
first one quoted above rightfully cast some doubt on this
concept. They're more akin to mathematical formulas which can't be
copyrighted (or patented, at least in theory), because they are
discovered and not invented.

Ben Morrow said:
While material technical considerations are more important than
questions of aesthetics, in this case, unless the code in question is
part of an inner loop you have previously determined is causing a
significant performance problem, there is no *material* technical
difference between the two.

Unless the Titanic sank, there's no reason to assume it ever would.
 
C.DeRykus

Ben Morrow said:
Quoth Rainer Weikusat:
At least two people have already posted solutions equivalent to
that;

At least two similar solutions were posted but this one is better, as
was explained in the text you deleted.

I was looking for something more general. I would prefer

my ($tabs) = $var =~ /^(\t*)/;
say length $tabs;

since I try to avoid @+, @- and $N where possible, but that's purely
a matter of taste.

That's not 'purely a matter of taste'. The following two pieces of
Perl code are equivalent insofar as the final value of $tabs is
concerned:

-----
$tabs = $ARGV[0] =~ /^\t*/ && $+[0];
-----

-----
($tabs) = $ARGV[0] =~ /^(\t*)/;
$tabs = length($tabs);
-----

But they utilize different methods of calculating this value and while
the first requires perl (5.10.1) to perform nine basic operations, the
second needs fourteen, not the least because it reimplements a feature
the perl regex engine already provides in a relatively clumsy way in
Perl: There's no point in copying the substring or even just capturing
it if only the number of characters is supposed to be counted.
...

I think the copy could be avoided though:

$tabs++ while $var =~ /\G\t/g;
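
A self-contained sketch of that loop, under two assumptions that the one-liner leaves open: the counter starts at 0 (so a string without leading tabs gives 0 rather than undef), and the match position is reset first so that \G anchors at the start of the string.

```perl
use strict;
use warnings;

my $var  = "\t\t\tfoo";
my $tabs = 0;      # start from 0 so "no tabs" yields 0, not undef
pos($var) = 0;     # \G anchors at pos(); make sure that is the beginning

# Each successful //g match consumes one tab and advances pos($var);
# the loop ends at the first character that is not a tab.
$tabs++ while $var =~ /\G\t/g;

print "$tabs\n";
```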
 
Rainer Weikusat

C.DeRykus said:
[...]
-----
$tabs = $ARGV[0] =~ /^\t*/ && $+[0];
-----

-----
($tabs) = $ARGV[0] =~ /^(\t*)/;
$tabs = length($tabs);
-----

But they utilize different methods of calculating this value and while
the first requires perl (5.10.1) to perform nine basic operations, the
second needs fourteen, not the least because it reimplements a feature
the perl regex engine already provides in a relatively clumsy way in
Perl: There's no point in copying the substring or even just capturing
it if only the number of characters is supposed to be counted.

I think the copy could be avoided though:

$tabs++ while $var =~ /\G\t/g;

Leading remark: On the machine where I tested this, the absolute
differences are in the 1E-7 range, which implies that this is a
scientific problem of some interest (to certain people, at least :) but
each of the three variants is as suitable for any practical problem
where fewer than a couple of hundred thousand inputs need to be
processed as the two others.

Regarding the last one: One should expect this to be distinctively
worse than the other two because even more 'algorithmic work' is
performed in Perl-code. And that was actually the result I got:
Averaged over four runs, the first ran at about 1.05 times the speed of
the second and at about 1.46 times the speed of the third
(second-to-third 1.39).

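Test program
---------------
use strict;
use warnings;
use Benchmark qw(timethese);

my $in = "\t\t\t\tbla";
my $t;

# Run each variant for at least 5 CPU seconds.
timethese(-5,
          {
           # capture the leading tabs, then measure the copy
           copy => sub {
               ($t) = $in =~ /^(\t*)/;
               return length($t);
           },

           # count one tab per \G-anchored match
           count => sub {
               pos($in) = 0;
               $t = 0;    # reset, otherwise the count accumulates across calls

               ++$t while $in =~ /\G\t/g;
               return $t;
           },

           # read the end offset of the match from @+
           calc => sub {
               return $in =~ /^\t*/ && $+[0];
           }});
---------------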
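
Before reading much into timings, it may be worth checking that the three variants actually agree. The sketch below is a hypothetical refactoring of the benchmark subs so that each takes the string as an argument; the hash name %variant and the list of inputs are made up.

```perl
use strict;
use warnings;

# The three variants, each taking the input string as its argument.
my %variant = (
    copy => sub {
        my ($t) = $_[0] =~ /^(\t*)/;
        return length($t);
    },
    count => sub {
        my $s = $_[0];
        my $n = 0;
        pos($s) = 0;
        ++$n while $s =~ /\G\t/g;
        return $n;
    },
    calc => sub {
        return $_[0] =~ /^\t*/ && $+[0];
    },
);

# Count the inputs on which the variants disagree.
my $mismatches = 0;
for my $in ("", "bla", "\t\t\t\tbla", "\t\t", "a\tb") {
    my ($copy, $count, $calc) = map { $variant{$_}->($in) } qw(copy count calc);
    $mismatches++ if $copy != $count || $copy != $calc;
}
print "mismatches: $mismatches\n";
```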
 
