URI queries with varied amounts of named values

rickle

I'm looking for some assistance from the perl folk out there. I am a
perl hack who writes a few scripts a year in perl only when needed.
This seems like a job best suited for perl (most likely a hash) but I
am fumbling around more than I would like to be. I'm sure this is
very simple for anyone well versed in perl.

Objective:
Take a list of named values and put them into a CSV file. When there
isn't a named value there should just be an empty CSV slot. There
might also be some entries on the same line that are somewhat
duplicated, where if there is one entry it should always trump the
other. The CSV file will always have 7 possible entries in the CSV.
language,format,country,zip,category,ua,id

Problem:
The named values vary by line so there is never just X per line. Some
will have just X, some will be X+1, X+5, some will be empty, etc.,
etc.

Example file:
l=en&format=xhtml
format=xml&country=US&ua=Mozilla
l=sp&zip=00000&category=books
l=en&format=xml&id=xyz

l=fr&country=US&alt-country=CA # in this case we want the alt-country to populate the country field

Example output:
en,xhtml,,,,,
,xml,US,,,Mozilla,
sp,,,00000,books,,
en,xml,,,,,xyz
fr,,CA,,,,

I have tried playing around with the URI perl module but haven't had
much luck. I have also made some attempts on my own but I am just not
getting things right. I know this is probably better suited to a hash
but I am very hash illiterate. I can perform basic functions in
hashes and do simple stuff but I don't play around with perl enough to
have gotten any better.

foreach (@DATA_SET) {
  next unless /\S/; # strip out blank lines, i.e. no named values
  # print "$_\n";
  #$format = m/format=xml/;
  #my $format =~ /format=xml/;
  #print $_;
  #print "$format\n";
}

I also tried pushing the results to another function and doing some
work there but it didn't go well as you can see from how I ended up
completely commenting it out.

#sub string_analysis {
#my (@DATA_RESULTS) = @_;
#@DATA_RESULTS = split(/&/,$_ [0]);
#print "@DATA_RESULTS\n";
#while(@DATA_RESULTS){
# foreach (split/&/,$S)[0]){
# print "$DATA_RESULTS[1]\n";
# }
#}
# push (@SPLIT_DATA_RESULTS = split(/\&/,$_));
#}
#while (@SPLIT_DATA_RESULTS) {
# print "$_\n";
#}
#my @DATA_STRING = split /&/, @fields[9];
#print "@DATA_STRING[1], @DATA_STRING[2], @DATA_STRING[3], @DATA_STRING[4], @DATA_STRING[5], @DATA_STRING[6], @DATA_STRING[7], @DATA_STRING[8]\n";
#}

Any help would be greatly appreciated.
 
Jim Gibson

Here is one way:

#!/usr/bin/perl
use strict;
use warnings;

my @keys = qw( l format country zip category ua id );
my @data;
while( my $line = <DATA> ) {
  chomp($line);
  my @fields = split(/&/,$line);
  my %record = map { $_, ''} @keys;
  for my $field ( @fields ) {
    my( $key, $val ) = split(/=/,$field);
    $key = 'country' if $key eq 'alt-country';
    $record{$key} = $val;
  }
  push( @data, \%record );
}

for my $record ( @data ) {
  print join(',',@{$record}{@keys}), "\n";
}

__DATA__
l=en&format=xhtml
format=xml&country=US&ua=Mozilla
l=sp&zip=00000&category=books
l=en&format=xml&id=xyz
l=fr&country=US&alt-country=CA
__END__

which produces:

en,xhtml,,,,,
,xml,US,,,Mozilla,
sp,,,00000,books,,
en,xml,,,,,xyz
fr,,CA,,,,
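
For anyone puzzled by the @{$record}{@keys} expression in the print line: it is a hash slice taken through a reference, so it returns the values for the listed keys in exactly the order of @keys, whatever order the hash stores them in. A minimal sketch of just that step, using a made-up record (the contents of %record and the names $ref and @row are illustrative only):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order for the CSV, as in the script above.
my @keys = qw( l format country zip category ua id );

# Illustrative record: every key starts empty, then two are filled in.
my %record = map { $_ => '' } @keys;
$record{l}      = 'en';
$record{format} = 'xhtml';

my $ref = \%record;

# Hash slice through a reference: the values come back in @keys order,
# regardless of the hash's internal ordering.
my @row = @{$ref}{@keys};

print join(',', @row), "\n";    # prints: en,xhtml,,,,,
```

Because every key is pre-filled with an empty string, missing values simply show up as empty CSV slots.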
 
ccc31807

CODE:
while (<DATA>)
{
    chomp;
    my $string = $_;
    $string =~ s/&/","/g;
    $string =~ s/=/","/g;
    $string = qq("$string"\n);
    print $string;
}

__DATA__
l=en&format=xhtml
format=xml&country=US&ua=Mozilla
l=sp&zip=00000&category=books
l=en&format=xml&id=xyz
l=fr&country=US&alt-country=CA

OUTPUT:
C:\PerlLearn>perl uri_hash.plx
"l","en","format","xhtml"
"format","xml","country","US","ua","Mozilla"
"l","sp","zip","00000","category","books"
"l","en","format","xml","id","xyz"
"l","fr","country","US","alt-country","CA "

C:\PerlLearn>
 
rick

Holy cow Jim, that is spot on. Thank you so much, you have saved me
hours of fumbling around and what was certain to be one of the ugliest
perl scripts ever created.
 
rick

I'm a bit confused as to how hashes work with arrays. I created this
dummy_file.txt file. The first script I run produces the output it
should, as Jim documented. However, I need to perform this work
against an array and all attempts to work with the array have failed.
The biggest problem is the same line outputs itself repeatedly for the
number of items in the hash. I print both $key and $val and they show
the proper items but when I print the key/val pair I get the same line
repeating over and over. Could someone shed some light on this for
me?

# dummy_file.txt
l=en&format=xhtml
format=xml&country=US&ua=Mozilla
l=sp&zip=00000&category=books
l=en&format=xml&id=xyz
l=fr&country=US&alt-country=CA


working_version.pl
#!/usr/bin/perl

use strict;
use warnings;

my @keys = qw( l format country zip category ua id );
my @data;

open(FILE, 'dummy_file.txt') or die "Can't open file: $!";
while( my $line = <FILE> ) {
  chomp($line);
  my @fields = split(/&/,$line);
  my %record = map { $_, ''} @keys;
  for my $field ( @fields ) {
    #print "$field\n";
    my( $key, $val ) = split(/=/,$field);
    $key = 'country' if $key eq 'alt-country';
    $record{$key} = $val;
  }
  push( @data, \%record );
}


#print "@data\n";
for my $record ( @data ) {
  print join(',',@{$record}{@keys}), "\n";
}

##################################################
failing_version.pl
#!/usr/bin/perl

use strict;
use warnings;

my @keys = qw( l format country zip category ua id );
my @data;
my @array;

# the file is in a pre-existing array so I need to mimic that behavior here
open(FILE, 'dummy_file.txt') or die "Can't open file: $!";
while (<FILE>){
  push (@array, $_);
}

my @fields = split(/&/,"@array");
my %record = map { $_, ''} @keys;
for my $field ( @fields ) {
  chomp ($field);
  my( $key, $val ) = split(/=/,$field);
  $key = 'country' if $key eq 'alt-country';
  $record{$key} = $val;
  push( @data, \%record );
}

for my $record ( @data ) {
  print join(',',@{$record}{@keys}), "\n";
}
 
Peter J. Holzer

open(FILE, 'dummy_file.txt') or die "Can't open file: $!";
while (<FILE>){
push (@array, $_);
}

Now @array contains the contents of 'dummy_file.txt', one line per
element. So if you want to do the same thing as before you just need to loop
over the elements instead of the lines. So you just have to replace the
single line

while( my $line = <FILE> ) {

from your working script with

for my $line (@array) {


my @fields = split(/&/,"@array");

Instead you concatenate all the lines (with spaces between them) and then split
the result into fields.

my %record = map { $_, ''} @keys;

Then construct a single record.

for my $field ( @fields ) {
    chomp ($field);
    my( $key, $val ) = split(/=/,$field);
    $key = 'country' if $key eq 'alt-country';
    $record{$key} = $val;
    push( @data, \%record );

And add that same record to the array for each field you find.

If you don't understand what your program does it is often a good idea
to step through it in the debugger:

% perl -d failing_version.pl

Loading DB routines from perl5db.pl version 1.3
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(rick2:6):        my @keys = qw( l format country zip category ua id );
  DB<1> n
main::(rick2:7):        my @data;
[...]
main::(rick2:16):       my @fields = split(/&/,"@array");
  DB<1>
main::(rick2:17):       my %record = map { $_, ''} @keys;
  DB<1> x @fields
0  'l=en'
1  'format=xhtml
 format=xml'
2  'country=US'
3  'ua=Mozilla
 l=sp'
4  'zip=00000'
5  'category=books
 l=en'
6  'format=xml'
7  'id=xyz
 l=fr'
8  'country=US'
9  'alt-country=CA
'

So now you have 10 fields. Note that some of them probably don't look
like you expected: $fields[1] is "format=xhtml\n format=xml", that is, it
contains an embedded newline and space. You probably wanted that to be
two fields: "format=xhtml" and "format=xml".
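
The embedded newline and space come from interpolating the array inside double quotes: Perl joins the elements with the list separator $" (a single space by default), and each element still carries its trailing newline. A small sketch of that effect, assuming a shortened two-line @array made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two lines as they would come from the file, newlines still attached.
my @array = ("l=en&format=xhtml\n", "format=xml&country=US\n");

# Interpolating an array in a double-quoted string joins its elements
# with the list separator $" (a single space by default).
my $joined = "@array";

# Splitting the joined string on & therefore yields one field that
# straddles the boundary between the two lines.
my @fields = split /&/, $joined;

print scalar(@fields), " fields\n";
print "field 1 is [$fields[1]]\n";    # holds "format=xhtml\n format=xml"
```

Looping over @array and splitting each element on its own avoids the problem, because no field can then span two lines.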

[...]

At the beginning of the second run through the loop, @data looks fine:

main::(rick2:19):           chomp ($field);
  DB<6> x @data
0  HASH(0x8cdb148)
   'category' => ''
   'country' => ''
   'format' => ''
   'id' => ''
   'l' => 'en'
   'ua' => ''
   'zip' => ''

but at the beginning of the next run it looks weird:

main::(rick2:19):           chomp ($field);
  DB<7> x @data
0  HASH(0x8cdb148)
   'category' => ''
   'country' => ''
   'format' => 'xhtml
 format'
   'id' => ''
   'l' => 'en'
   'ua' => ''
   'zip' => ''
1  HASH(0x8cdb148)
   -> REUSED_ADDRESS
  DB<8>

You have just pushed a second reference to %record to @data. So now both
elements point to the same data and both will be modified when you
modify %record.
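
To make the difference concrete, here is a minimal sketch that contrasts the two shapes: one hash declared outside the loop (every pushed reference is the same hash) versus a fresh "my %record" inside the loop (each reference is distinct). The two-line @array, the names %shared and @aliased, and the limit of 2 on the inner split are assumptions for illustration, not part of the original scripts:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @keys  = qw( l format country zip category ua id );
my @array = ("l=en&format=xhtml\n", "l=fr&country=US&alt-country=CA\n");

# Buggy shape: %shared is declared once, outside the loop, so every
# reference pushed onto @aliased points at the same hash.
my %shared = map { $_ => '' } @keys;
my @aliased;
for my $line (@array) {
    $shared{l} = ( split /[=&]/, $line )[1];    # value of the first field
    push @aliased, \%shared;
}

# Fixed shape: "my %record" inside the loop creates a fresh hash for
# each line, so each pushed reference is distinct.
my @data;
for my $line (@array) {
    chomp( my $copy = $line );    # work on a copy, leave @array untouched
    my %record = map { $_ => '' } @keys;
    for my $field ( split /&/, $copy ) {
        # limit of 2 keeps any '=' inside a value intact (an addition here)
        my ( $key, $val ) = split /=/, $field, 2;
        $key = 'country' if $key eq 'alt-country';
        $record{$key} = $val;
    }
    push @data, \%record;
}

print "aliased: ", ( $aliased[0] == $aliased[1] ? "same hash" : "different" ), "\n";
print join( ',', @{$_}{@keys} ), "\n" for @data;
```

With the shared hash, both elements of @aliased end up showing the value from the last line; with the per-line hash, @data holds one independent record per input line.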

hp
 
rick

Explained perfectly, thank you for the help. I know how to fix this
problem and I understand how it works so I won't run into this same issue
again.
 
rick

my @keys = qw( l format country zip category ua id );
my @data;
while( my $line = <DATA> ) {
  chomp($line);
  my @fields = split(/&/,$line);
  my %record = map { $_, ''} @keys;
  for my $field ( @fields ) {
    my( $key, $val ) = split(/=/,$field);
    $key = 'country' if $key eq 'alt-country';
    $record{$key} = $val;
  }


Is there a way to turn the 'country' / 'alt-country' check override
into a variable? For example, let's say I have 'zip' and 'alt-zip'
and a number of others. Basically, anytime there is an "alt-" it
should overwrite the original. I thought I could take care of easily
overriding this but it's not as straight forward as I thought. I have
printed out the various "alt-" attempts below to see if the output is
what I expect and it is when done in a straight 'print "alt-$key"' so
my guess is it has something to do with the value type, scalar vs.
what the hash actually has for values.

#!/usr/bin/perl
use strict;
use warnings;

my @keys = qw( l format country zip category ua id );
my @data;
while( my $line = <DATA> ) {
  chomp($line);
  my @fields = split(/&/,$line);
  my %record = map { $_, ''} @keys;
  for my $field ( @fields ) {
    my( $key, $val ) = split(/=/,$field);

    # works for this simple example but there are too many other
    # fields in the real logs to make this viable
    #$key = 'country' if $key eq 'alt-country';
    #$key = 'zip' if $key eq 'alt-zip';

    # only the original values shown, not the "alt-" overrides
    #$key = $key if $key eq "alt-$key";
    #$key = "$key" if $key eq "alt-$key";

    # fail with scalar errors
    #$key = $key if $key eq "alt-"$key;
    #$key = $key if $key eq 'alt-'$key;

    $record{$key} = $val;
  }
  push( @data, \%record );
}

for my $record ( @data ) {
  print join(',',@{$record}{@keys}), "\n";
}

__DATA__
l=en&format=xhtml
format=xml&country=US&ua=Mozilla
l=sp&zip=00000&category=books&alt-zip=00000-1234
l=en&format=xml&id=xyz
l=fr&country=US&alt-country=CA
__END__
 
Tad J McClellan

Is there a way to turn the 'country' / 'alt-country' check override


"override" has a precise technical meaning.

There is no overriding anywhere in this code.

There is "overwriting" in this code though...

into a variable? For example, let's say I have 'zip' and 'alt-zip'
and a number of others. Basically, anytime there is an "alt-" it
should overwrite the original. I thought I could take care of easily
overriding this but it's not as straight forward as I thought.


I expect that it is not as *complicated* as you thought!

my guess is it has something to do with the value type, scalar vs.
what the hash actually has for values.


That makes no sense.

*all* hash values are scalars.

It is not possible for a hash value to be anything other than a scalar.

#!/usr/bin/perl
use strict;
use warnings;

my @keys = qw( l format country zip category ua id );
my @data;
while( my $line = <DATA> ) {
  chomp($line);
  my @fields = split(/&/,$line);
  my %record = map { $_, ''} @keys;
  for my $field ( @fields ) {
    my( $key, $val ) = split(/=/,$field);

    $key =~ s/^alt-//; # convert "alt-anything" to "anything"

    # fail with scalar errors
    #$key = $key if $key eq "alt-"$key;


What the heck is a "scalar error"?

I've never heard of such a thing.

You really need to take more care in your terminology if you
hope to become a programmer.
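
One caveat about the s/^alt-// substitution: later fields overwrite earlier ones, so an "alt-" value wins only when it appears after the plain key on the line. (The commented-out attempts comparing $key with "alt-$key" could never match, since a string is never equal to itself with a prefix added.) If the alternate should take precedence regardless of position, one possible approach is to remember which keys were set from an "alt-" field. A minimal sketch; the %from_alt hash, the $is_alt flag and the reordered sample line are hypothetical additions, not from the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @keys = qw( l format country zip category ua id );

# Sample line with the alt- field placed BEFORE the plain one (made up
# for illustration).
my $line = 'l=fr&alt-country=CA&country=US';

my %record = map { $_ => '' } @keys;
my %from_alt;    # keys whose current value came from an "alt-" field

for my $field ( split /&/, $line ) {
    my ( $key, $val ) = split /=/, $field, 2;

    # s/// returns true when it stripped the prefix, false otherwise.
    my $is_alt = ( $key =~ s/^alt-// );

    # A plain value must not replace one that came from an alt- field.
    next if !$is_alt && $from_alt{$key};

    $record{$key}   = $val;
    $from_alt{$key} = 1 if $is_alt;
}

print join( ',', @record{@keys} ), "\n";    # prints: fr,,CA,,,,
```

When the "alt-" field always follows the plain one, as in the sample data in this thread, the one-line substitution alone gives the same result.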
 
rick

      $key =~ s/^alt-//;  # convert "alt-anything" to "anything"


Thanks Tad, that was what I was hoping to get accomplished. Trust me,
I know better than anyone that I need to learn the lingo better. I am
sure this problem has been solved many times over and the problem is
that I don't know the language well enough to know what search term to
use when looking through perldocs or searching previous posts in the
newsgroup. I have a few Perl books and searching through them and on
the newsgroup for hash didn't get me where I needed to be.

As a sysadmin I have to focus on just getting things working so I know
just enough of several languages to fix problems without having
mastered any. I have made it my mission this year to pick a language
and really focus on it. Knowing Bash and a few others really well
already, and wanting to expand a bit I have thought about Perl, Python
or Ruby. For the work that I do focusing on Perl makes the most sense
for me.

Thanks again
 
