Populate a hash from a list elegantly

usenet · Mar 9, 2006

Kindly consider this sample code, if you will, which illustrates my
question. This code works just fine and does exactly what I want,
but... I dunno... I just don't like the approach I've taken. At first,
I thought to approach this with a split() (using a limit of 1) instead
of a regexp, but I couldn't figure out how to make that work in
anything other than a convoluted manner. I'm interested in maybe
learning different techniques from others who may approach the task
differently.

#!/usr/bin/perl
use strict; use warnings;

my %user; # keys are userid's, values are names
while (my $line = <DATA>) {
    $line =~ m{^(\w+) +(.*)$} and $user{$1} = $2;
}
print map { "'$_'\t=>\t'$user{$_}'\n" } sort keys %user;

__DATA__
fredf Fred Flintstone
Barn Barney Rubble
bogus
WF Wilma Flintstone
betty Betty Rubble

robic0 · Mar 9, 2006

Nothing wrong with what you have, but give yourself some diagnostic leeway...

my ($line);
while ($line = <DATA>) {
    if ($line =~ /^\s+(\w+)\s+(.*?)\s+$/) {
        $user{$1} = $2;
    } else {
        print "no match for: <$line>\n";
    }
}

robic0 · Mar 9, 2006

excuse me, use this:

if ($line =~ /^\s*(\w+)\s*(.*?)\s*$/) {

robic0 · Mar 9, 2006

Too much good wine, use this:

if ($line =~ /^\s*(\w+)\s+(.*?)\s*$/) {

robic0 · Mar 9, 2006

As a general rule in your circumstance, use "split" only when a nearly
"homogeneous" pattern is assured, homogeneous in the sense that only the
delimiter can be described as a pattern. The source has to be known
with 99.999% assurance, such as output from an Excel CSV file.

What you did with the regex was to introduce a restriction on what
the non-delimited data should be. Quality assurance is preferred over
speed.
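
To make that trade-off concrete, here is a minimal sketch. The sample lines (including the 'b@d' entry) and the hash names are made up for illustration. split only describes the delimiter, so every line yields a key, while the regex also constrains what the fields themselves must look like.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical sample input: one good line, one line with no name,
# and one whose userid contains a non-word character.
my @lines = ( 'fredf Fred Flintstone', 'bogus', 'b@d Bad Entry' );

my ( %by_split, %by_regex );
for my $line (@lines) {
    # split: only the delimiter is described, so every line yields a key
    my ( $id, $name ) = split ' ', $line, 2;
    $by_split{$id} = $name;    # $name is undef for 'bogus'

    # regex: the userid must be \w+ and a non-empty name must follow
    $by_regex{$1} = $2 if $line =~ /^(\w+)\s+(.+)$/;
}

print join( ',', sort keys %by_split ), "\n";    # b@d,bogus,fredf
print join( ',', sort keys %by_regex ), "\n";    # fredf
```

Whether the stricter behaviour is wanted depends on how much the input can be trusted, which is the point being made above.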

robic0

Gunnar Hjalmarsson · Mar 9, 2006

This is one idea:

my %user = map
{ chomp; local @_; no warnings; split(' ', $_, 2) == 2 ? @_ : () }
<DATA>;

According to perldoc -f split, use of split in scalar context is
deprecated. "local @_" and "no warnings" take care of that, but the
solution may still not be advisable.

robic0 · Mar 9, 2006

I can't fathom....

DJ Stunks · Mar 9, 2006

I would use a map instead of a while, but adjust the regex slightly to
ensure it fails (i.e., won't return a partial match) for bogus entries.

observe:

#!/usr/bin/perl
use strict; use warnings;

my %user = map { m{^(\w+) +(.+)$} } <DATA>;

print map { "'$_'\t=>\t'$user{$_}'\n" } sort keys %user;

__DATA__
fredf Fred Flintstone
Barn Barney Rubble
bogus
WF Wilma Flintstone
betty Betty Rubble
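
The map version works because a match in list context returns its captures on success and an empty list on failure, so a failed line contributes nothing to the hash. The sketch below compares the two regexes on a made-up input list; the "bogus2 " entry with a trailing space is an assumption added to show where .* and .+ differ.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical input: a good line, a bare userid, and a userid
# followed by a single trailing space.
my @lines = ( "fredf Fred Flintstone\n", "bogus\n", "bogus2 \n" );

# (.*) accepts an empty name, so "bogus2 " slips through with value ''.
my %star = map { m{^(\w+) +(.*)$} } @lines;

# (.+) demands at least one character, so "bogus2 " fails to match.
my %plus = map { m{^(\w+) +(.+)$} } @lines;

print join( ', ', map { "$_='$star{$_}'" } sort keys %star ), "\n";
# bogus2='', fredf='Fred Flintstone'
print join( ', ', map { "$_='$plus{$_}'" } sort keys %plus ), "\n";
# fredf='Fred Flintstone'
```

Note that a plain "bogus" line is rejected by both patterns, since neither can find the required space.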

-jp

PS: sorry for the triple-posting in that other thread. Damn you,
Google Groups!
PPS: good newsreader for winXP suggestions?

Uri Guttman · Mar 9, 2006

use File::Slurp ;

my %user = read_file( \*DATA ) =~ /^(\w+)\s+(.*)$/mg ;

uri

usenet · Mar 9, 2006

Uri said:
use File::Slurp ;
my %user = read_file( \*DATA ) =~ /^(\w+)\s+(.*)$/mg ;

That's great! Thanks.

John W. Krahn · Mar 9, 2006

usenet said:
At first, I thought to approach this with a split() (using a limit of 1)
instead of a regexp, but I couldn't figure out how to make that work in
anything other than a convoluted manner.

That is probably because split()'s limit describes the number of fields to
return, and you want two fields (the hash key and the hash value), not one.

my ( $key, $val ) = split / +/, $line, 2;
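
A small sketch of how the limit behaves, using one line from the sample data (the array names are only illustrative): a limit of 1 returns the whole line as a single field, while a limit of 2 splits off the userid and leaves the rest, spaces included, in the second field.

```perl
#!/usr/bin/perl
use strict; use warnings;

my $line = 'fredf Fred Flintstone';

my @one   = split / +/, $line, 1;    # limit 1: no splitting happens at all
my @two   = split / +/, $line, 2;    # limit 2: userid, then the rest of the line
my @every = split / +/, $line;       # no limit: every space-separated word

print scalar(@one),   ": @one\n";        # 1: fredf Fred Flintstone
print scalar(@two),   ": $two[1]\n";     # 2: Fred Flintstone
print scalar(@every), ": $every[2]\n";   # 3: Flintstone
```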

John

usenet · Mar 9, 2006

John said:
That is probably because split()'s limit describes the number of fields to
return and you want two fields (the hash keys and the hash value) not one field.

Ah, I didn't realize that. Thanks, but that wasn't really my problem
(though it would have become a problem)...

my ( $key, $val ) = split / +/, $line, 2;

Yeah, that's where I was actually having trouble, because I can't see
how to translate that into hash-ish (without ugly intermediate scalars
or an intermediate array), such as:

$user{$dunno_what_to_put_here} = (split / +/, $line, 2)[1];

John W. Krahn · Mar 9, 2006

Gunnar said:
This is one idea:

my %user = map
{ chomp; local @_; no warnings; split(' ', $_, 2) == 2 ? @_ : () }
<DATA>;

According to perldoc -f split, use of split in scalar context is
deprecated. "local @_" and "no warnings" take care of that, but the
solution may still not be advisable.

So why not use a lexically scoped array?

my %user = map
{ chomp; my @array; ( @array = split( ' ', $_, 2 ) ) == 2 ? @array : () }
<DATA>;
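
For anyone trying this on a current perl: as far as I know, split stopped filling @_ implicitly in Perl 5.12, so a variant that assigns the fields to a named array is the safer one to copy. Below is a runnable sketch of that idea. The in-memory filehandle $fh and the name @fields are assumptions, used only so the example is self-contained instead of relying on DATA.

```perl
#!/usr/bin/perl
use strict; use warnings;

my $data = <<'END';
fredf Fred Flintstone
Barn Barney Rubble
bogus
WF Wilma Flintstone
betty Betty Rubble
END

# Read from a string through an in-memory filehandle (stand-in for DATA).
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my %user = map {
    chomp;
    my @fields = split ' ', $_, 2;    # userid, then the rest of the line
    @fields == 2 ? @fields : ();      # drop lines that have no name
} <$fh>;
close $fh;

print map { "'$_'\t=>\t'$user{$_}'\n" } sort keys %user;
```

The "bogus" line yields a single field and is dropped, leaving four entries in %user.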

John

DJ Stunks · Mar 9, 2006

Uri said:
use File::Slurp ;

my %user = read_file( \*DATA ) =~ /^(\w+)\s+(.*)$/mg ;

this regex does not filter the bogus entry.

try /^(\w+) +(.+)$/mg instead.
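
The reason is that \s also matches a newline. A minimal sketch, with the data inlined as a string rather than read via File::Slurp (an assumption made only to keep the example self-contained): on the slurped text, the \s+ after "bogus" consumes the line break, so the following line becomes the value for "bogus" and the "WF" record is lost.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Same records as in __DATA__, inlined as one slurped string.
my $text = <<'END';
fredf Fred Flintstone
Barn Barney Rubble
bogus
WF Wilma Flintstone
betty Betty Rubble
END

# \s+ may cross the newline after "bogus" and swallow the next line.
my %loose = $text =~ /^(\w+)\s+(.*)$/mg;

# A literal space cannot match "\n", so "bogus" simply fails to match.
my %strict = $text =~ /^(\w+) +(.+)$/mg;

print "loose:  bogus => $loose{bogus}\n";
# loose:  bogus => WF Wilma Flintstone
print "strict: ", join( ', ', sort keys %strict ), "\n";
# strict: Barn, WF, betty, fredf
```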

-jp

Gunnar Hjalmarsson · Mar 9, 2006

John said:
So why not use a lexically scoped array?

my %user = map
{ chomp; my @array; ( @array = split( ' ', $_, 2 ) ) == 2 ? @array : () }
<DATA>;

Thanks, John. And that made me realize that assigning _explicitly_ to @_
is enough to get rid of the warning:

my %user = map
{ chomp; ( local @_ = split ' ', $_, 2 ) == 2 ? @_ : () } <DATA>;

Dr.Ruud · Mar 9, 2006

DJ Stunks said:
PPS: good newsreader for winXP suggestions?

I assume you mean 'text articles' (did you just read 'testicles'?),
since you didn't say 'binary'.

Many are nice to work with:

40tude Dialog
(multi-server, multi-threaded, Unicode)

(MicroPlanet) Gravity, Super Gravity

Forte (Free) Agent

slrn http://slrn.sourceforge.net/
(use an NTFS compressed folder as spool)

Hamster Playground + Outlook Express + OE QuoteFix

Thunderbird

http://www.newsreaders.com/win/clients.html

Dr.Ruud · Mar 9, 2006

DJ Stunks said:
/^(\w+) +(.+)$/mg

/^(\w+)[[:blank:]]+(.+)/mg

(untested)

John Bokma · Mar 9, 2006

Dr.Ruud said:
40tude Dialog
(multi-server, multi-threaded, Unicode)

I use Xnews. It's probably not the most user-friendly program and has
some minor issues (or major, YMMV), but I still haven't switched to Dialog
(which I've wanted to do for some time) :-D

Dr.Ruud · Mar 9, 2006

John Bokma said:
I use Xnews. It's probably not the most user-friendly program and has
some minor issues (or major, YMMV), but I still haven't switched to Dialog
(which I've wanted to do for some time) :-D

Also very nice is Pimmy, because it has almost no dependencies.

I use an old one, as a sort of watchdog, connected to many pop- and
IMAP-boxes on many servers.
http://www.geminisoft.com/en/pimmy/
