RFC: utils.pm

Tore Aursand

Hi!

Please excuse my English. It's only my secondary language. :)

Over my time as a Perl programmer, I've gathered small bits of code here
and there. Over the last few weeks I've tried to structure all this code
and put it in a module which I - so far - call 'utils.pm'. Thus, in my
scripts, I tend to begin like this:

#!/usr/bin/perl
#
use strict;
use warnings;
use utils;

The name could - of course - have been more appropriate (if possible), but
to me 'utils' seems like a good one. :)

The main question is: Although I know that the Perl core should stick to
a minimum, why isn't a module like this included? It covers most of the
traps that newbies encounter, and it lets experienced programmers be
even lazier when programming.

Information about the module's functions:

STRINGS
ltrim( $value ) - Removes leading whitespace from a string.
rtrim( $value ) - Removes trailing whitespace from a string.
trim( $value ) - Combines ltrim() and rtrim().
squish( $value ) - Removes (all the) whitespace from a string.
split_csv( $value ) - Easy CSV splitting. (*)
random_string( $length ) - Generates a random string $length
characters long.

CASTING
as_string( $value, [$default] ) - Always returns a defined value,
optionally $default if $value
isn't defined.
as_int( $value ) - Always returns $value as an integer.
as_decimal( $value, [$decimals] ) - Always returns $value as a
decimal number with $decimals
numbers after the decimal point.
as_boolean( $value ) - Always returns $value as a boolean value (i.e.
TRUE/1 or FALSE/0).
as_date( $value ) - Always returns $value as a date (YYYY-MM-DD).
as_time( $value ) - Always returns $value as a time (HH:MM:SS).
as_datetime( $value ) - Always returns $value as a datetime (which
means combining as_date() and as_time()).

VALIDATION
Each of the CASTING functions also has an is_* function, which returns
TRUE/1 or FALSE/0 depending on whether the input argument conforms to
the datatype.

NUMBERS
round( $value ) - Rounds a number to the nearest integer.
random_number( $min, $max ) - Returns a random number in the range
$min to $max.
format_number( $value, $separator ) - Formats a number with a given
separator; 1234 becomes 1,234.

ARRAYS
unique( $arrayref ) - Returns only the unique elements in $array.
intersection( $arrayref1, $arrayref2 ) - Computes the intersection
of two array references.
union( $arrayref1, $arrayref2 ) - Computes the union of two array
references.
shuffle( $arrayref ) - Returns the elements shuffled randomly.

DATES
now_year()
now_month()
now_day() - These three return the current year, month and day.
now_date() - Combines the three above.
now_hour()
now_minute()
now_seconds() - These three return the current hour, minute and
second.
now_time() - Combines the three above.
now() - Combines now_date() and now_time(); 'YYYY-MM-DD hh:mm:ss'.
is_leap_year( $year ) - Returns TRUE/1 if $year is a leap year.
day_of_week( $date ) - Returns the day of the week for $date.
day_of_year( $date ) - Returns the day of the year for $date.
days_in_month( $year, $month ) - Returns the number of days in the
given year/month.
days_in_year( $year ) - Returns 365 or 366 depending on whether the
year is a leap year or not.

(*) These methods could possibly generate warnings/info about standalone
modules which one should preferably use.

Well. These are the majority of the functions I've gathered. There are a
few more, but never mind them for now. All the functions are written in
pure Perl, so no modules - of course - or "anything else" is required.

I find these functions very valuable when programming. Whenever I need to
check if a date from the user is valid, I just:

unless ( is_date($date) ) {
# Error
}

I could also combine them. Let's say that the user has misunderstood the
"input the current date" question and also entered the current time;

my $date = '2004-02-11 12:34:56'; # example input
$date = as_date( $date ); # $date is now '2004-02-11'
unless ( is_date($date) ) {
# Error
}

Stupid examples, but...

This is an RFC - Request For Comments - so I want to hear what you all
think? Should this be a "swiss army" module in the Perl core? I think
so.

Just my 0.5 cents. :)
 
Tassilo v. Parseval

Also sprach Tore Aursand:
Please excuse my English. It's only my secondary language. :)

Over the time as a Perl programmer, I've gathered small bits of code here
and there. The last few weeks I've tried to structure all this code and
put in a module which I - so far - call 'utils.pm'. Thus, in my scripts,
I tend to begin like this:

#!/usr/bin/perl
#
use strict;
use warnings;
use utils;

The name could - of course - have been more appropriate (if possible), but
to me 'utils' seems like a good one. :)

The main question is: Although I know that the Perl core should stick to
a minimum, why isn't a module like this included? It covers most of the
traps that newbies encounter, and it lets experienced programmers be
even lazier when programming.

I think it defeats a little bit the purpose of a Perl distribution as an
all-purpose means. Some of the functions you suggest would be totally
meaningless to me. Mostly the date-related functions. Within the past
three years I didn't have to deal with dates more than, say, three times
or so. I'd be slightly taken aback to find them in the Perl core while
at the same time not finding the functionality that I use on a daily
basis and for which I had to write my own code/use a module (such as
playing back MP3s).

Also, I'd find the availability of such micro-functions in the core
offensive. Maybe it's because I suffer more from hubris than others, but
I really like to reinvent these small wheels in my programs each time.
No matter how insignificant the task can be, there's always room for
slight variations in the eventual solution.

Not to forget that I think that these utility functions would have a bad
effect on beginners. I learnt a lot of my Perl because I had to ltrim
whitespaces myself. First I used a combination of 'substr' and 'index',
('rindex' for an rtrim). Then I learnt about regexes. And so on.

The real reason for not including them, though, is indeed the size
argument. A recent Perl distribution is really big and the amount of
code in it (that has to be maintained) is scary.

Finally, this would make Perl look so PHPish. It hurts the eyes. Perl
does not have something like a class library such as Ruby's. Since in Ruby
everything is an object, you automatically have namespaces such as
String or so. That means that you don't have to add string-specific
functions into the CORE:: namespace and so you keep the amount of
pollution low.

Tassilo
 
Tore Aursand

[...]

The main question is: Although I know that the Perl core should stick
to a minimum, why isn't a module like this included? It covers most of
the traps that newbies encounter, and it lets experienced
programmers be even lazier when programming.
I think it defeats a little bit the purpose of a Perl distribution as an
all-purpose means. Some of the functions you suggest would be totally
meaningless to me. Mostly the date-related functions. Within the past
three years I didn't have to deal with dates more than, say, three times
or so. I'd be slightly taken aback to find them in the Perl core while
at the same time not finding the functionality that I use on a daily
basis and for which I had to write my own code/use a module (such as
playing back MP3s).

I guess we all want modules which solve the current task we're working
on, right? :)
Also, I'd find the availability of such micro-functions in the core
offensive. Maybe it's because I suffer more from hubris than others, but
I really like to reinvent these small wheels in my programs each time.

Really? It's the complete opposite for me: I don't want to reinvent the
wheel each time, and I certainly don't want to include heaps of modules
for doing small, repetitive things.

I am sure that there are modules out there which, combined, do
everything that I listed, but I don't want to use _many_ modules to do
such small things. I want common functions in one module.

I understand that "common functions" might be subjective, but I also think
that most of the functions I listed are things that most Perl
programmers do "by hand" - or reinvent - almost every day.
Not to forget that I think that these utility functions would have a bad
effect on beginners.

In one way, yes. In another way, no: a lot of the questions asked in
this particular newsgroup are questions that are already answered in the
FAQ. I think the major reason why all these questions pop up now and then
(far too often, if you ask me) is the "unavailability" of simple
functions to solve the problems.

Why do beginners - or experienced programmers, for that matter - actually
need to know how to do everything (...) manually? I don't care how trim()
works, as long as it works. I don't care how the index() function in Perl
is implemented, as long as it works.
The real reason for not including them, though, is indeed the size
argument. A recent Perl distribution is really big and the amount of
code in it (that has to be maintained) is scary.

I agree on this one. The core size of Perl should always be a minimum,
but I don't think that a module like this one would have much impact on
the size. :)
Finally, this would make Perl look so PHPish. It hurts the eyes.

Haha. :)
 
Ben Morrow

Tore Aursand said:
Over the time as a Perl programmer, I've gathered small bits of code here
and there. The last few weeks I've tried to structure all this code and
put in a module which I - so far - call 'utils.pm'. Thus, in my scripts,
I tend to begin like this:

#!/usr/bin/perl
#
use strict;
use warnings;
use utils;

The name could - of course - have been more appropriate (if possible), but
to me 'utils' seems like a good one. :)

Lowercase top-level names are reserved for pragmata. Call it Utils. It
is usually considered good practice not to export stuff unless asked.
The main question is: Although I know that the Perl core should stick to
a minimum, why isn't a module like this included? It covers most of the
traps that newbies encounter, and it lets experienced programmers be
even lazier when programming.

Information about the module's functions:

STRINGS
ltrim( $value ) - Removes leading whitespace from a string.

$value =~ s/^\s*//;
rtrim( $value ) - Removes trailing whitespace from a string.

$value =~ s/\s*$//;
trim( $value ) - Combines ltrim() and rtrim().

$value =~ s/^\s*(.*?)\s*$/$1/;
or some such.
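
For comparison, here is a minimal sketch of the three trimming functions written as subs around those substitutions. The function names come from the RFC; the bodies are an assumption (not Tore's actual code), and they return a trimmed copy instead of modifying the argument in place.

```perl
use strict;
use warnings;

# Hypothetical implementations; only the names are taken from the RFC.
sub ltrim { my $s = shift; $s =~ s/^\s+//; return $s; }   # strip leading whitespace
sub rtrim { my $s = shift; $s =~ s/\s+$//; return $s; }   # strip trailing whitespace
sub trim  { my $s = shift; return rtrim( ltrim($s) ); }   # strip both ends

print '[', trim("  hello world \n"), "]\n";   # prints "[hello world]"
```

Passing an undefined value to these subs triggers an "uninitialized value" warning under 'use warnings', so a real module would have to decide how to treat undef.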
squish( $value ) - Removes (all the) whitespace from a string.

$value =~ s/\s+//g;
split_csv( $value ) - Easy CSV splitting. (*)

As you say, there are modules to do this. Or

@values = split /,/, $values;
random_string( $length ) - Generates a random string $length
characters long.

This is a little harder... also *much* less useful.
CASTING
as_string( $value, [$default] ) - Always returns a defined value,
optionally $default if $value
isn't defined.

Eh what? You don't *need* to cast in Perl.

$value || $default

or defined($value) ? $value : $default

or (Perl6 :) $value // $default
as_int( $value ) - Always returns $value as an integer.

err.. int($value)
Or POSIX::floor, or POSIX::ceil.
as_decimal( $value, [$decimals] ) - Always returns $value as a
decimal number with $decimals
numbers after the decimal point.
sprintf

as_boolean( $value ) - Always returns $value as a boolean value (i.e.
TRUE/1 or FALSE/0).
!!$value

as_date( $value ) - Always returns $value as a date (YYYY-MM-DD).
as_time( $value ) - Always returns $value as a time (HH:MM:SS).
as_datetime( $value ) - Always returns $value as a datetime (which
means combining as_date() and as_time()).
POSIX::strftime

VALIDATION
Each of the CASTING functions also has an is_* function, which returns
TRUE/1 or FALSE/0 depending on whether the input argument conforms to
the datatype.

Ummm... everything is (can be) a string.
Everything is either true or false.
is_int is simply $value =~ /^\d+$/.
Dates can and should be handled by your favourite date-time parsing
module: the code is non-trivial, so should be reused.
I can't think offhand how I'd do is_decimal, probably because I've
never had occasion to.
NUMBERS
round( $value ) - Rounds a number to the nearest integer.

int($value + 0.5) is usually good enough.
random_number( $min, $max ) - Returns a random number in the range
$min to $max.

$min + rand($max - $min)
format_number( $value, $separator ) - Formats a number with a given
separator; 1234 becomes 1,234.

Less trivial... is probably better done by something locale-aware.
ARRAYS
unique( $arrayref ) - Returns only the unique elements in $array.
intersection( $arrayref1, $arrayref2 ) - Computes the intersection
of two array references.
union( $arrayref1, $arrayref2 ) - Computes the union of two array
references.

These should all be in a module called Set::Util... feel free to write
it.
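
A minimal sketch of what those three set functions could look like in plain Perl, using the usual %seen idiom. The names and the array-reference arguments follow the RFC; the bodies are an assumption. Elements are compared as strings, so 1 and '1.0' count as different values.

```perl
use strict;
use warnings;

# Return the distinct elements of a list, keeping first-seen order.
sub unique {
    my $list = shift;
    my %seen;
    return grep { !$seen{$_}++ } @$list;
}

# Elements that occur in either list.
sub union {
    my ( $x, $y ) = @_;
    return unique( [ @$x, @$y ] );
}

# Elements that occur in both lists.
sub intersection {
    my ( $x, $y ) = @_;
    my %in_y = map { $_ => 1 } @$y;
    return unique( [ grep { $in_y{$_} } @$x ] );
}

print join( ',', unique( [ 1, 2, 2, 3, 1 ] ) ), "\n";                  # 1,2,3
print join( ',', intersection( [ 1, 2, 3 ], [ 2, 3, 4 ] ) ), "\n";     # 2,3
```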
shuffle( $arrayref ) - Returns the elements shuffled randomly.
List::Util::shuffle

DATES
now_year()
now_month()
now_day() - These three return the current year, month and day.
now_date() - Combines the three above.
now_hour()
now_minute()
now_seconds() - These three return the current hour, minute and
second.
now_time() - Combines the three above.
now() - Combines now_date() and now_time(); 'YYYY-MM-DD hh:mm:ss'.

POSIX::strftime again, with 'localtime time'.
is_leap_year( $year ) - Returns TRUE/1 if $year is a leap year.
day_of_week( $date ) - Returns the day of the week for $date.
day_of_year( $date ) - Returns the day of the year for $date.
days_in_month( $year, $month ) - Returns the number of days in the
given year/month.
days_in_year( $year ) - Returns 365 or 366 depending on whether the
year is a leap year or not.

I'm *quite* sure one of the many date-time modules handles this already.
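
For reference, the leap-year functions are small enough to sketch here, assuming Gregorian calendar rules and a month number from 1 to 12. The names match the RFC; the bodies are illustrative only. Anything beyond this (parsing, time zones, other calendars) is better left to a date module.

```perl
use strict;
use warnings;

# Gregorian rule: divisible by 4, except centuries not divisible by 400.
sub is_leap_year {
    my $year = shift;
    return $year % 4 == 0 && ( $year % 100 != 0 || $year % 400 == 0 ) ? 1 : 0;
}

# $month is assumed to be 1..12.
sub days_in_month {
    my ( $year, $month ) = @_;
    my @days = ( 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 );
    return 29 if $month == 2 && is_leap_year($year);
    return $days[ $month - 1 ];
}

sub days_in_year {
    my $year = shift;
    return is_leap_year($year) ? 366 : 365;
}

print days_in_month( 2004, 2 ), "\n";   # 29
print days_in_year(1900), "\n";         # 365
```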
Well. These are the majority of the functions I've gathered. There are a
few more, but never mind them for now. All the functions are written in
pure Perl, so no modules - of course - or "anything else" is
required.

This is *not* an advantage. Code should be reused where possible. It
is entirely sensible to have a module (say, Local::Utils) that loads
half-a-dozen modules you commonly use and exports a few trivial
functions; this will naturally reflect your own use of Perl and not be
of any use to anyone else.

Ben
 
Peter Hickman

Nothing in this package does anything that can't be done in Perl already
or, in the exceptional cases, isn't available with greater functionality
in existing, function-specific packages (e.g. Date::Calc).

In truth it looks like training wheels for BASIC programmers.

A Perl programmer will know what $value =~ s/^\s*//; will do, but unless
they come from a BASIC background they will not know what
ltrim($value) does, and having gone to the trouble of learning all your
wrappers they will have learnt nothing new.
 
Tassilo v. Parseval

Also sprach Tore Aursand:
[...]

The main question is: Although I know that the Perl core should stick
to a minimum, why isn't a module like this included? It covers most of
the traps that newbies encounter, and it lets experienced
programmers be even lazier when programming.
Also, I'd find the availability of such micro-functions in the core
offensive. Maybe it's because I suffer more from hubris than others, but
I really like to reinvent these small wheels in my programs each time.

Really? It's the complete opposite for me: I don't want to reinvent the
wheel each time, and I certainly don't want to include heaps of modules
for doing small, repetetive things.

Yes, this was a purely subjective point of mine. Think of a bunch of
people (the perl porters) and each one of them has a different set of
objections.

Actually, the way decisions are made among the perl developers is quite
interesting. There is no ruling authority who eventually says that
things are going to be done this or that way (maybe the pumpking of a
branch, but he seldom uses his authority). There are no polls either.
It's just informal discussion. After a while (when all things have been
said against or in favour of something) an unspoken and silent agreement
has been reached. In 90% of the cases I was eventually convinced that
the decision that was made was the right one even when I initially had a
different opinion on the matter. It's almost mystic. ;-)
I agree on this one. The core size of Perl should always be a minimum,
but I don't think that a module like this one would have much impact on
the size. :)

Haha, you are too optimistic, I am afraid. :) If such functions ever
made it into the core, clearly they would be heavily optimized. In the
end, they'd be written in C (note that there is no core function that is
written in pure Perl). And once you are in the realms of C, seemingly
trivial things get annoying. For instance, every string function would
have to be utf8-safe. And so on and so forth.

Tassilo
 
Uri Guttman

TA> STRINGS
TA> ltrim( $value ) - Removes leading whitespace from a string.
TA> rtrim( $value ) - Removes trailing whitespace from a string.
TA> trim( $value ) - Combines ltrim() and rtrim().

all too trivial for subs.

TA> squish( $value ) - Removes (all the) whitespace from a string.

too trivial for sub. tr/\n\r\t //d;

TA> split_csv( $value ) - Easy CSV splitting. (*)
TA> random_string( $length ) - Generates a random string $length
TA> characters long.

what char set does this use?


TA> CASTING
TA> as_string( $value, [$default] ) - Always returns a defined value,
TA> optionally $default if $value
TA> isn't defined.
TA> as_int( $value ) - Always returns $value as an integer.

who needs casting in perl?

TA> as_decimal( $value, [$decimals] ) - Always returns $value as a
TA> decimal number with $decimals
TA> numbers after the decimal point.

sprintf

TA> as_boolean( $value ) - Always returns $value as a boolean value (i.e.
TA> TRUE/1 or FALSE/0).

bah, perl doesn't need booleans.

TA> as_date( $value ) - Always returns $value as a date
TA> (YYYY-MM-DD).

and what is the input format? strftime is good enough

TA> as_time( $value ) - Always returns $value as a time
TA> (HH:MM:SS).

ditto

TA> as_datetime( $value ) - Always returns $value as a datetime (which
TA> means combining as_date() and as_time()).

TA> VALIDATION
TA> Each of the CASTING functions also has an is_* function, which returns
TA> TRUE/1 or FALSE/0 depending on whether the input argument conforms to
TA> the datatype.


TA> NUMBERS
TA> round( $value ) - Rounds a number to the nearest integer.
TA> random_number( $min, $max ) - Returns a random number in the range
TA> $min to $max.

number or integer?

TA> format_number( $value, $separator ) - Formats a number with a given
TA> separator; 1234 becomes 1,234.

TA> ARRAYS
TA> unique( $arrayref ) - Returns only the unique elements in $array.
TA> intersection( $arrayref1, $arrayref2 ) - Computes the intersection
TA> of two array references.
TA> union( $arrayref1, $arrayref2 ) - Computes the union of two array
TA> references.
TA> shuffle( $arrayref ) - Returns the elements shuffled randomly.

TA> DATES
TA> now_year()
TA> now_month()
TA> now_day() - These three return the current year, month and day.
TA> now_date() - Combines the three above.
TA> now_hour()
TA> now_minute()
TA> now_seconds() - These three return the current hour, minute and
TA> second.
TA> now_time() - Combines the three above.
TA> now() - Combines now_date() and now_time(); 'YYYY-MM-DD hh:mm:ss'.

hmm, what if you called now_hour just before and now_day just after you
cross midnight? BUG!! those should all take a time() value as input so
you can force them all to use the exact same time.
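
One way to act on that: read the clock once and pass the same epoch value to every formatter. This is a minimal sketch using POSIX::strftime; the fmt_* names are hypothetical stand-ins, not functions from the RFC.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Each formatter takes an optional epoch value and only falls back to
# time() when none is given (hypothetical helpers for illustration).
sub fmt_date {
    my ($t) = @_;
    $t = time unless defined $t;
    return strftime( '%Y-%m-%d', localtime($t) );
}

sub fmt_time {
    my ($t) = @_;
    $t = time unless defined $t;
    return strftime( '%H:%M:%S', localtime($t) );
}

sub fmt_datetime {
    my ($t) = @_;
    $t = time unless defined $t;
    return strftime( '%Y-%m-%d %H:%M:%S', localtime($t) );
}

my $t = time;    # read the clock once ...
print fmt_date($t), ' ', fmt_time($t), "\n";    # ... so both parts agree
```

Because both calls see the same $t, the date and the time cannot end up on different sides of midnight.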

TA> is_leap_year( $year ) - Returns TRUE/1 if $year is a leap year.

and what if you have only time()? localtime does this

TA> day_of_week( $date ) - Returns the day of the week for $date.
TA> day_of_year( $date ) - Returns the day of the year for $date.

what is date's format?

TA> days_in_month( $year, $month ) - Returns the number of days in the
TA> given year/month.
TA> days_in_year( $year ) - Returns 365 or 366 depending on whether the
TA> year is a leap year or not.

you have just reinvented (poorly) another date/time module. like
templates, everyone has to do one in their lifetime.

TA> Well. These are the majority of the functions I've gathered. There are a
TA> few more, but never mind them for now. All the functions are written in
TA> pure Perl, so no modules - of course - or "anything else" is required.

TA> I find these functions very valuable when programming. Whenever I need to
TA> check if a date from the user is valid, I just:

TA> unless ( is_date($date) ) {

does that handle all possible date formats? i kinda doubt it. see the
newish suite in the DateTime:: namespace.

TA> This is a RFC - Request For Comment - so I want to hear what you all
TA> think? Should this be a "swiss army" module in the Perl core? I think
TA> so.

no way it should be in the core. you have api issues and bugs. you don't
handle all possible legal inputs or outputs.

uri
 
thumb_42

Tore Aursand said:
I find these functions very valuable when programming. Whenever I need to
check if a date from the user is valid, I just:

unless ( is_date($date) ) {
# Error
}

I think it's a good idea, but as you've said earlier, no one wants to
slurp in a boatload of modules to do basic things.

"Basic things" is kind of a relative term: for me, the date stuff would be
really handy at times, other times I wouldn't use it at all. The
random_string() I'd almost never use. Someone else might have a LOT of use
for something like that. I'd find it useful to add a routine that converts a
24/hr time into a 12/hr am/pm time, someone else would think it'd be great if
it provided an 'input()' function that automatically handled IO flushing and
a prompt.

I guess my view is that a module or collection of code like that is really
the sort of thing that would do well in a plain text file, people who want
it just copy and season to taste.

That's kind of what I do sometimes. I know that somewhere, someplace, I've
done XYZ, but I don't need the whole module, much less the hassle of keeping
2 copies with different versions etc. I also use 'perldoc -m Module'
just to figure out how someone else did something, then I'll implement just
that one function modified for whatever I need.

Jamie
 
Tore Aursand

Haha, you are too optimistic, I am afraid. :) If such functions ever
made it into the core, clearly they would be heavily optimized. In the
end, they'd be written in C (note that there is no core function that is
written in pure Perl).

My fault entirely: I did _not_ mean the Perl *core*, but the set of
standard modules shipped with Perl. Doh. I can't even blame it on having
slept too little in the last few days. :)

All the functions I listed are pure Perl at the moment. I'm not too
familiar with C, but I understand that "XS programming" is something to
investigate further.
 
Tore Aursand

As you say, there are modules to do this. Or

@values = split /,/, $values;

Not quite as powerful as this one;

my @array = ();
push( @array, $+ ) while $string =~ m{
"([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
| ([^,]+),?
| ,
}gx;
push( @array, undef ) if ( substr($string, -1, 1) eq ',' );
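
To make the difference concrete, here is that snippet wrapped in a sub and compared with a plain split on a quoted field. The wrapper and its name split_csv are a sketch based on the RFC's function list; for real CSV data a dedicated CPAN module (Text::CSV, for example) remains the safer choice.

```perl
use strict;
use warnings;

# Wraps the regex from the post above; $+ holds whichever group matched.
sub split_csv {
    my $string = shift;
    my @fields;
    push( @fields, $+ ) while $string =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?   # a quoted field, which may contain commas
      | ([^,]+),?                        # an unquoted field
      | ,                                # an empty field
    }gx;
    push( @fields, undef ) if substr( $string, -1, 1 ) eq ',';
    return @fields;
}

my $line  = 'a,"b,c",d';
my @naive = split /,/, $line;    # breaks the quoted field in two
my @csv   = split_csv($line);    # keeps "b,c" together

print scalar(@naive), " fields with split, ", scalar(@csv), " with split_csv\n";
```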
This is a little harder... also *much* less useful.

Maybe, but I find myself constantly using this one when I want to create
cookies. Nice to have. :)
as_string( $value, [$default] ) - Always returns a defined value,
optionally $default if $value
isn't defined.
Eh what? You don't *need* to cast in Perl.

$value || $default

This only returns $default if $value isn't true. So if we have this
function:

sub as_string {
my $value = shift;
my $default = shift;
return $value || $default;
}

What happens when you pass 0 to this function?
or defined($value) ? $value : $default

Better, but I'm tired of writing this god damn line each time I want to
make sure that a value is defined. :)

My 'as_string' function will only return the $default value if a) it is
given, and b) if the input $value is undefined.
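
A minimal sketch of an as_string() with that behaviour, to show how it differs from '$value || $default'. The fallback to an empty string when no default is given is an assumption; the post does not say what the real function returns in that case. The defined-or operator ('//') later became available in Perl 5.10 and expresses the same test.

```perl
use strict;
use warnings;

# Return $value unless it is undefined; only then use $default.
sub as_string {
    my ( $value, $default ) = @_;
    $default = '' unless defined $default;    # assumption: empty-string fallback
    return defined $value ? $value : $default;
}

print as_string( 0,     'n/a' ), "\n";    # 0 is defined, so it is kept
print as_string( undef, 'n/a' ), "\n";    # undefined, so the default is used
```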

When it comes to "casting": maybe it's not the correct word to use, but
when dealing with e.g. CGI parameters, I find my functions useful. Here's
an example:

my $firstname = as_string( $cgi->param('firstname') );
my $lastname = as_string( $cgi->param('lastname') );
my $age = as_int( $cgi->param('age') );
err.. int($value)

What if $value isn't a number? Ah. A warning occurs, of course. I don't
want warnings. 'as_int' clearly says that I want _any_ value passed to it
returned as an integer. If that's not possible, return 0.
Or POSIX::floor, or POSIX::ceil.

I know about them. I just didn't want to deal with more than one module,
so I created my own functions instead. :)
as_decimal( $value, [$decimals] ) - Always returns $value as a
decimal number with $decimals
numbers after the decimal point.

sprintf is exactly what 'as_decimal' uses, except that I prefer writing

my $value = as_decimal( 1.23456789, 2 );

instead of

my $value = sprintf( '%.2f', 1.23456789 );

Yeah, but 'as_boolean' takes care of "other" boolean values too;

sub as_boolean {
my $value = as_string( shift );
# Group the alternatives so that ^ and $ anchor every one of them
return ( $value =~ m,^(?:1|y|yes|on|true)$,i ) ? 1 : 0;
}

I needed this function when dealing with output from an application where
a true value could be almost anything.

No, I didn't write that program, and it wasn't in Perl (or any other
interpreted language), so I couldn't change the output.
POSIX::strftime

'as_date', 'as_time' and 'as_datetime' are - of course - a lot easier to
use than 'strftime'. :)

I don't use the 'strftime' much, and every time I had to use it, I had to
look it up in the documentation. I don't want to do that. I'm lazy. :)
Ummm... everything is (can be) a string.

Yes, but not everything can be everything _else_ than a string. Let's
look at my CGI parameter example again. Instead of writing

my $nr = (defined $cgi->param('nr')) ? $cgi->param('nr') : 0;
unless ( $nr >= 0 && $nr <= 12 ) {
# Error
}

I want to write

my $nr = as_int( $cgi->param('nr') );
unless ( is_int($nr, 0, 12) ) {
# Error
}
is_int is simply $value =~ /^\d+$/.

No. Take a look at this:

my @values = ( 2, -2, +2, '2', '-2', '+2' );
foreach ( @values ) {
print "Is $_ an integer? ";
( /^\d+$/ ) ? print 'Yes' : print 'No';
print "\n";
}

Of course, one could argue about whether a stringified '-2' (or '+2')
really is an integer, but functions like these are nice to have if you
want to _avoid_ errors:

# $value appears from somewhere, possibly from a stupid user :)
if ( is_int($value) ) {
# Ok, we can use this value
$value = as_int( $value ); # 'Cast' it to integer
}
else {
# Sorry, but $value isn't even close to being an integer
}
Dates can and should be handled by your favourite date-time parsing
module: the code is non-trivial, so should be reused.

I guess you're right. I created many of them (the functions) "by hand",
though, and tested them against Date::Calc and Date::Manip. Man, that was
one useful learning session. :)
I can't think offhand how I'd do is_decimal, probably because I've
never had occasion to.

I once (...) encountered a problem with a web form where the user had to
register some data, and he/she _had_ to enter a number as a decimal. That
was to make sure that the data entered was as accurate as possible, or
something like that;

sub is_decimal {
my $value = as_string( shift );
return ( $value =~ m,^[-+]?(\d+)?\.\d+$, ) ? 1 : 0;
}
int($value + 0.5) is usually good enough.

Yeah, except when you want to round negative numbers as well;

return ( $value > 0 ) ? int($value + 0.5) : int($value - 0.5);
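
The reason is that int() truncates toward zero, so adding 0.5 pushes negative values the wrong way. A small sketch that compares the two approaches; round_naive is a hypothetical name used only for this comparison.

```perl
use strict;
use warnings;

# int() truncates toward zero, which breaks the +0.5 trick below zero.
sub round_naive { my $value = shift; return int( $value + 0.5 ); }

# Round half away from zero, as in the line above.
sub round {
    my $value = shift;
    return $value >= 0 ? int( $value + 0.5 ) : int( $value - 0.5 );
}

printf "%5s  naive: %2d  round: %2d\n", $_, round_naive($_), round($_)
    for ( 2.4, 2.5, -2.4, -2.5 );
```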
$min + rand($max - $min)

Right, but what if only one of $min or $max is known?
Less trivial... is probably better done by something locale-aware.

Who says that a 'Utils' module can't be locale-aware? :)
These should all be in a module called Set::Util... feel free to write
it.

It isn't in a module already? Doesn't it fit into List::Util?


--
Tore Aursand
"Scientists are complaining that the new "Dinosaur" movie shows
dinosaurs with lemurs, who didn't evolve for another million years.
They're afraid the movie will give kids a mistaken impression. What
about the fact that the dinosaurs are singing and dancing?" -- Jay
Leno
 
Tassilo v. Parseval

Also sprach Tore Aursand:
My fault entirely: I did _not_ mean the Perl *core*, but the set of
standard modules shipped with Perl. Doh. I can't even blame it on having
slept too little in the last few days. :)

All the functions I listed are pure Perl at the moment. I'm not too
familiar with C, but I understand that "XS programming" is something to
investigate further.

No doubt you should do that. Also, XS is a good opportunity to learn or
refresh your C.

I guess some of your proposed string functions are good candidates for
XSization. Since you say that you use them often, you'll gain a little
bit of speed in your scripts if you come up with a not too unreasonable
C implementation.

Tassilo
 
Uri Guttman

TA> When it comes to "casting". Maybe it's not the correct word to use, but
TA> when dealing with ie. CGI parameters, I find my functions useful. Here's
TA> an example;

TA> my $firstname = as_string( $cgi->param('firstname') );
TA> my $lastname = as_string( $cgi->param('lastname') );

cgi params are always strings. nothing but strings can be passed to a
web server.

TA> my $age = as_int( $cgi->param('age') );

my $age = $cgi->param('age') + 0 ;

not needed. as soon as you use it as a number it becomes a number. and
having perl do that is faster than a sub call.

TA> What if $value isn't a number? Ah. A warning occurs, of course. I don't
TA> want warnings. 'as_int' clearly says that I want _any_ value passed to it
TA> returned as an integer. If that's not possible, return 0.

that is not the meaning of 'as_int'. call it verify_int then. and you
can trap or ignore the warning. and checking if it is an int is trivial:

my $int = $param =~ /\D/ ? 0 : $param ;

TA> sub as_boolean {
TA> my $value = as_string( shift );
TA> return ( $value =~ m,^(?:1|y|yes|on|true)$,i ) ? 1 : 0;
TA> }

TA> I needed this function when dealing with output from an application where
TA> a true value could be almost anything.

that isn't almost anything. it doesn't handle undef or the null string
which are false. in fact it only tests for YOUR true values which are
not universal. not useful in a general purpose module.

TA> 'as_date', 'as_time' and 'as_datetime' are - of course - a lot easier to
TA> use than 'strftime'. :)

and as i said, they have a bug. they don't take time() as an argument so
there can be skewed times between calls.

TA> I don't use the 'strftime' much, and every time I had to use it, I had to
TA> look it up in the documentation. I don't want to do that. I'm lazy. :)

but strftime is correct which is lazier

TA> Yes, but not everything can be everything _else_ than a string. Let's
TA> look at my CGI parameter example again. Instead of writing

TA> my $nr = (defined $cgi->param('nr')) ? $cgi->param('nr') : 0;
TA> unless ( $nr >= 0 && $nr <= 12 ) {
TA> # Error
TA> }

TA> I want to write

TA> my $nr = as_int( $cgi->param('nr') );
TA> unless ( is_int($nr, 0, 12) ) {
TA> # Error
TA> }

TA> No. Take a look at this:

TA> my @values = ( 2, -2, +2, '2', '-2', '+2' );
TA> foreach ( @values ) {
TA> print "Is $_ an integer? ";
TA> ( /^\d+$/ ) ? print 'Yes' : print 'No';
TA> print "\n";
TA> }

so prefix a [+-]? to it.

and call it verify_int.

and better yet, wrap it so it does all of that in one sub with the cgi
by subclassing:

sub CGI::verify_int {

my( $self, $param ) = @_ ;

my $cgi_val = $self->param( $param ) ;

return unless defined $cgi_val ;

return unless $cgi_val =~ /^\s*[-+]?\d+$/ ;

return $cgi_val + 0 ;
}

better and simpler to use than your code.

my $int = $cgi->verify_int( 'foo' ) ;

TA> I guess you're right. I created many of them (the functions) "by hand",
TA> though, and tested them against Date::Calc and Date::Manip. Man, that was
TA> one useful learning session. :)

and you couldn't have possibly covered all date formats.

uri
 
Tore Aursand

cgi params are always strings. nothing but strings can be passed to a
web server.

Of course, but the values of 'firstname' and/or 'lastname' can be
undefined. It's so much simpler - and faster - to let a function take
care of making sure that $firstname and $lastname are defined.

my $age = $cgi->param('age') + 0 ;

Except when you want to do range checking and setting the value to a
default value if the input argument is out of range. 'as_int()', and a
lot of the other 'as_*' functions take care of that;

my $value = as_int( $input, $min, $max, $default );

that is not the meaning of 'as_int'. call it verify_int then.

Ok, maybe it should be renamed. I just thought 'as_int()' made sense,
'cause no matter what I send as arguments to that function, I want an
integer back. :)
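
The range-and-default behaviour discussed here, under the suggested name verify_int, could be sketched roughly as follows. This is only an illustration: the argument order is taken from the as_int() call quoted above, and the body is an assumption, not the code from utils.pm.

```perl
use strict;
use warnings;

# Hypothetical verify_int: returns the integer when the input is well
# formed and inside the optional range, otherwise returns $default.
sub verify_int {
    my ( $input, $min, $max, $default ) = @_;

    # Reject undef and anything that is not an optionally signed integer.
    return $default
        unless defined $input && $input =~ /^\s*([-+]?\d+)\s*$/;

    my $n = $1 + 0;    # numify only the part that matched

    # Either bound may be left undefined to mean "no limit on this side".
    return $default if defined $min && $n < $min;
    return $default if defined $max && $n > $max;

    return $n;
}

my $month = verify_int( '7', 0, 12, 0 );
print "month: $month\n";
```

Leaving $min or $max undefined skips that bound, which is one possible answer to the "what if only one of them is known" question raised elsewhere in the thread.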
that isn't almost anything. it doesn't handle undef or the null string
which are false.

Yes, it does. 'as_string(shift)' makes sure that $value is defined. And
the function only returns true if $value matches the regular expression.
A blank value is - of course - false.

in fact it only tests for YOUR true values which are not universal. not
useful in a general purpose module.

That is - of course - correct.

and as i said, they have a bug. they don't take time() as an argument so
there can be skewed times between calls.

Hmm. Pardon my English, but what do you really mean here? The three
functions above can be used like this (with 'as_date()' as example):

my $date = as_date( '2004-02-11' );
my $date = as_date( '2004-02-11 12:34' );
# and other variations of the above
my $date = as_date( 2004, 02, 11 );
my $date = as_date( time );
[...]
and call it verify_int.

and better yet, wrap it so it does all of that in one sub with the cgi
by subclassing:
[...]

Ah - forget about CGI. I just used it as an example. :) Most of my work
involves creating CGI scripts (and/or mod_perl-based web applications).
[...]
and you couldn't have possibly covered all date formats.

"There can be only one..." :)


--
Tore Aursand <[email protected]>
"Writing is a lot like sex. At first you do it because you like it.
Then you find yourself doing it for a few close friends and people you
like. But if you're any good at all, you end up doing it for money."
-- Unknown
 
Tore Aursand

No doubt you should do that. Also, XS is a good opportunity to learn or
refresh your C.

Yeah. Does anyone have related documents for me (apart from perlxs and
perlxstut)?

I guess some of your proposed string functions are good candidates for
XSization. Since you say that you use them often, you'll gain a little
bit of speed in your scripts if you come up with a not too unreasonable
C implementation.

Premature optimization is the root of all evil. :) The functions I
listed are - of course - subject to optimization at some time, but I don't
need them to be fast. I need them to do my work faster (and better).
 
Uri Guttman

TA> Hmm. Pardon my English, but what do you really mean here? The three
TA> functions above can be used like this (with 'as_date()' as example):

i meant the now* funcs. they call time internally and can be skewed
between neighboring calls. see my first followup.

uri
 
Ben Morrow

Tore Aursand said:
As you say, there are modules to do this. Or

@values = split /,/, $values;

Not quite as powerful as this one;

my @array = ();
push( @array, $+ ) while $string =~ m{
"([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
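[Ben's note on the line above: the backslashes before the double quotes inside the character classes are unnecessary]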
| ([^,]+),?
| ,
}gx;
push( @array, undef ) if ( substr($string, -1, 1) eq ',' );

use Text::Balanced qw/extract_multiple extract_delimited/;

extract_multiple $string,
[ sub { extract_delimited $_[0], '"' }, qr/[^,]*/ ],
undef, 1;

Or, there are modules to do it.
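
As one illustration of the "use a module" route: Text::CSV on CPAN is the usual choice for real CSV, but the core module Text::ParseWords is enough for simple quoted fields. The sketch below uses its parse_line() function; the sample line is made up for the example, and this is not a full CSV parser (it does not handle doubled quotes inside a field, for instance).

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Split one line on commas while honouring double-quoted fields.
# The second argument (keep = 0) strips the quotes from the results.
my $line   = 'alpha,"beta, gamma",delta';
my @fields = parse_line( ',', 0, $line );

print join( ' | ', @fields ), "\n";
```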
Maybe, but I find myself constantly using this one when I want to create
cookies. Nice to have. :)

Well, yeah; but hardly of general utility. Perl != CGI.

as_string( $value, [$default] ) - Always returns a defined value,
optionally $default if $value
isn't defined.

Eh what? You don't *need* to cast in Perl.

$value || $default

This only returns $default if $value isn't true. So if we have this
function:

Yeah, yeah; I know that. It's good enough most of the time,
though. Roll on 5.10 when (I think) they're going to put // in (which
tests for definedness rather than truth).
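
The defined-or operator did arrive in Perl 5.10. A small sketch of how it differs from ||, assuming a Perl of at least that version (the sample values and the 'n/a' default are made up for the example):

```perl
use strict;
use warnings;
use 5.010;    # the // operator needs Perl 5.10 or later

my $default = 'n/a';

for my $value ( 0, '', undef, 'text' ) {
    my $by_truth   = $value || $default;    # replaces 0 and '' as well as undef
    my $by_defined = $value // $default;    # replaces only undef

    printf "%-7s  || gives %-5s  // gives %s\n",
        ( defined $value ? "'$value'" : 'undef' ), "'$by_truth'", "'$by_defined'";
}
```

With //, a legitimate 0 or empty string survives, which is the case the || idiom gets wrong.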
Better, but I'm tired of writing this god damn line each time I want to
make sure that a value is defined. :)

But it's a lot clearer... if the default is not supplied, your
function is a noop; otherwise, it is a defined-or-default
function. Not what I'd call as_string.

When it comes to "casting". Maybe it's not the correct word to use, but
when dealing with ie. CGI parameters, I find my functions useful. Here's
an example;

my $firstname = as_string( $cgi->param('firstname') );

What, precisely, is the difference between this and

my $firstname = $cgi->param('firstname');

?

What if $value isn't a number? Ah. A warning occurs, of course. I don't
want warnings.

So turn it off. Not exactly hard. And, again, an explicit statement to
a reader of your code that you are prepared to tolerate non-numeric
inputs.
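
A minimal sketch of "turning it off" in a way that stays visible to the reader. The helper name to_number is hypothetical; the point is that "no warnings" is lexically scoped, so the rest of the program keeps its warnings.

```perl
use strict;
use warnings;

# Hypothetical helper: numify anything, with the relevant warnings
# silenced only inside this sub.
sub to_number {
    my ($value) = @_;

    # Lexically scoped: the effect ends with the enclosing block.
    no warnings qw(numeric uninitialized);

    return $value + 0;
}

print to_number('12abc'), "\n";    # leading digits are used
print to_number(undef),   "\n";    # undef numifies to 0
```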
as_decimal( $value, [$decimals] ) - Always returns $value as a
decimal number with $decimals
numbers after the decimal point.

Exactly what 'as_decimal' uses, except that I prefer writing

my $value = as_decimal( 1.23456789, 2 );

instead of

my $value = sprintf( '%.2f', 1.23456789 );

Why? This slightly peculiar taste of yours hardly justifies inclusion
in a core module.

Yeah, but 'as_boolean' takes care of "other" boolean values too;

sub as_boolean {
my $value = as_string( shift );
return ( $value =~ m,^1|y|yes|on|true$,i ) ? 1 : 0;
}

Whoa... not what it said on the tin *at* *all*. I'd definitely much
prefer that to be explicit in the code; and you *really* don't need
that ?:: m// already returns a boolean value...
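
One more thing about the quoted pattern: alternation binds more loosely than the anchors, so ^1|y|yes|on|true$ is true for any string that merely contains "y" or "on" (for example "none"). A sketch of a stricter variant, assuming the same list of accepted words as the quoted code:

```perl
use strict;
use warnings;

# Sketch only: the accepted words are still just the poster's own list.
sub as_boolean {
    my ($value) = @_;
    return 0 unless defined $value;

    # Group the alternatives so that ^ and $ apply to every one of them.
    return $value =~ /^(?:1|y|yes|on|true)$/i ? 1 : 0;
}

for my $input ( 'yes', 'TRUE', 'none', '' ) {
    printf "%-6s => %d\n", "'$input'", as_boolean($input);
}
```

A bare match returns 1 or the empty string in scalar context (and an empty list in list context when it fails), so the ?: is only needed if a literal 0 is wanted, as it is here.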
'as_date', 'as_time' and 'as_datetime' are - of course - a lot easier to
use than 'strftime'. :)

'Of course'? I think not. Apart from anything else, it is clear from a
strftime call what format the result will be in, whereas your
functions are not.

Yes, but not everything can be everything _else_ than a string. Let's
look at my CGI parameter example again. Instead of writing

my $nr = (defined $cgi->param('nr')) ? $cgi->param('nr') : 0;
unless ( $nr >= 0 && $nr <= 12 ) {
# Error
}

I want to write

my $nr = as_int( $cgi->param('nr') );
unless ( is_int($nr, 0, 12) ) {
# Error
}

I wasn't debating is_int, but is_string. What, in fact, does this
function *do*?

No. Take a look at this:

my @values = ( 2, -2, +2, '2', '-2', '+2' );
foreach ( @values ) {
print "Is $_ an integer? ";
( /^\d+$/ ) ? print 'Yes' : print 'No';
print "\n";
}

OK, my mistake... $value =~ /^[+-]?\d+$/.

I guess you're right. I created many of them (the functions) "by hand",
though, and tested them against Date::Calc and Date::Manip. Man, that was
one useful learning session. :)

Right, it's a good way to learn... but in production, use the
peer-reviewed code. *That's what it's there for*.

I can't think offhand how I'd do is_decimal, probably because I've
never had occasion to.

I once (...) encountered a problem with a web form where the user had to
register some data, and he/she _had_ to enter a number as a decimal. That
was to make sure that the data entered was as accurate as possible, or
something like that;

sub is_decimal {
my $value = as_string( shift );
return ( $value =~ m,^[-+]?(\d+)?\.\d+$, ) ? 1 : 0;
[Ben's note on the line above: (\d+)? can be written as \d*]
}

Right. Once. As I said.

Right, but what if only one of $min or $max is known?

Err... what if? What do you *want* to happen in that case? (And how do
you implement it?... insofar as I am aware, the random-number generator
will not accept 'infinity' as an argument.)

Who says that a 'Utils' module can't be locale-aware? :)

OK; but personally I'd rather that sprintf be made aware of the '
modifier from SUSv2 (which does just this).

It isn't in a module already? Doesn't it fit into List::Util?

Well, maybe, but they aren't in there. Personally, I'd have said
there's a useful distinction to be made between 'operations on ordered
sets of data' (List::Util) and 'operations on unordered sets of data'
(Set::Util).

Anyway, if you thought it was in a module already *why aren't you
using it*?

Ben
 
Tassilo v. Parseval

Also sprach Ben Morrow:
perlguts and perlapi. And the perl source.

You are so mean. :)

Yet it's true. The perl source is often an indispensable source of
information since perlapi.pod is incomplete. The headers will do most of
the time.

Makes me think of how I learnt XS. I think perlxstut was my primary
weapon. It only scratches the surface, but it could be just
enough to get one started. The whole XS business requires a bit of
patience and stamina. Sooner or later there might be a point of
enlightenment (to a certain extent). For me that was when I understood
typemaps and how to store pointers to C structures in Perl types. That
turned the affair from painful to quite enjoyable for me.

As for documentation, I found perlclib.pod and perlcall.pod quite
helpful. And of course (e-mail address removed) where a small but fine selection
of XS gurus is willing to offer help.

Yes, this one helped me in the past, too. Since I started,
descriptions for the various stacks have been added. XSUBs do not use @_
for argument handling. Instead they access the argument stack directly
so it's really important to have a rough understanding of them.

Tassilo
 
Tore Aursand

i meant the now* funcs. they call time internally and can be skewed
between neighboring calls. see my first followup.

And? Is that a problem?

my $now1 = now();
sleep(3600);
my $now2 = now();

Do you mean that they should be the same? Just a reminder for you: These
functions ('now_*') do nothing other than grab the current time,
format it and return it. How can values be skewed? I only grab
the time value once for each of them...!?

I must misunderstand something.
 
Uri Guttman

TA> And? Is that a problem?

TA> my $now1 = now();
TA> sleep(3600);
TA> my $now2 = now();

TA> Do you mean that they should be the same? Just a reminder for you: These
TA> functions ('now_*') do nothing other than grab the current time,
TA> format it and return it. How can values be skewed? I only grab
TA> the time value once for each of them...!?

my $now1 = now_day() ;

midnight happens NOW!!!

my $now2 = now_hour() ;

my $timestamp = "$now1:$now2" ;

that is not the value you expect. a bug. not a likely bug but
effectively similar to a race condition.

the solution is to grab time once and pass it to the now() funcs which
is what strftime( "format", localtime() ) does.
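
A minimal sketch of the grab-once approach, assuming POSIX::strftime and a hypothetical helper named stamp_parts (it is not one of the now_* functions from utils.pm). Both parts are formatted from the same snapshot, so they cannot straddle midnight.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format the date part and the hour part from one
# shared snapshot of the clock.
sub stamp_parts {
    my ($epoch) = @_;
    $epoch = time unless defined $epoch;    # read the clock exactly once

    my @t = localtime $epoch;               # one broken-down time for both parts

    return ( strftime( '%Y-%m-%d', @t ), strftime( '%H', @t ) );
}

my ( $day, $hour ) = stamp_parts();
my $timestamp = "$day:$hour";
print "$timestamp\n";
```

Passing an explicit epoch value also makes the helper easy to test, because the result no longer depends on when it happens to be called.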

TA> I must misunderstand something.

yes, you did. you didn't understand the possibility of time changing
between now*() calls.

uri
 
