Create two-dimensional Array from string

pjfalbe

I'm looking for the most efficient way to create a two dimensional
array from strings such as

MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948 ','SHAKER HTS','OH')}

In this case MULTISET is a collection of rows from a DB. What I want
to do is efficiently transform it into a two-dimensional array, where
each row is an array, and then collect those arrays into one array of
arrays. I'm currently doing this, but not very efficiently. Any help
would be appreciated.
 
Sandy

I'm looking for the most efficient way to create a two dimensional
array from strings such as

MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948
','SHAKER HTS','OH')}

In this case MULTISET is a collection of rows from a DB. What I want
to do is efficiently
transform into a 2-dimensional array, where each row is an array.
And then collect those arrays
into 1 array of arrays. I'm currently doing this but not very
efficiently. Any help would be appreciated.

Why don't you clarify what you mean by 'efficiently'? Are you
optimizing the speed of adding records, or of finding a record in the
collection (based on which fields?), or what? How are you doing this
currently, and what appears to be the problem?

It is hard to suggest any particular approach without such details. In
general it is simple:

my $record1 = [1, 3667, 'WARREN', 'OH'];
my $record2 = [2, 2948, 'SHAKER HTS', 'OH'];

my @records = ();

push @records, $record1;
push @records, $record2;
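Reading the structure back is then just a double subscript, e.g.:

```perl
use strict;
use warnings;

my $record1 = [1, 3667, 'WARREN', 'OH'];
my $record2 = [2, 2948, 'SHAKER HTS', 'OH'];

my @records = ($record1, $record2);

# Double-subscript into the array of arrays:
print $records[1][2], "\n";   # prints "SHAKER HTS"
print scalar @records, "\n";  # number of rows: 2
```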

/sandy
http://myperlquiz.com/
 
Ben Morrow

Quoth "[email protected]":
I'm looking for the most efficient way to create a two dimensional
array from strings such as

MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948
','SHAKER HTS','OH')}

In this case MULTISET is a collection of rows from a DB. What I want
to do is efficiently
transform into a 2-dimensional array, where each row is an array.
And then collect those arrays
into 1 array of arrays. I'm currently doing this but not very
efficiently. Any help would be appreciated.

You haven't given your code, and you haven't specified the format
accurately. For instance, can those single-quoted strings contain
commas?

I would start with something like

my @multiset;
my $multiset = q{MULTISET{ROW(...)}};

$multiset =~ /MULTISET \{ ROW \( (.*) \) \}/x
    or die "bad string: $multiset";

for my $row (split / \) , ROW \( /x, $1) {
    push @multiset, [split /,/, $row];
}

possibly using something like Text::CSV_XS if the comma-separated
section is more complicated than that.
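A runnable version of that sketch against the sample string from the
original post might look like this (it assumes the quoted fields never
contain commas or parentheses, and adds a naive quote/whitespace
strip):

```perl
use strict;
use warnings;

# Sample string from the original post, on one line.
my $multiset = q{MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948 ','SHAKER HTS','OH')}};

my @multiset;
$multiset =~ /MULTISET \{ ROW \( (.*) \) \}/x
    or die "bad string: $multiset";

for my $row (split / \) , ROW \( /x, $1) {
    # Split on commas, then strip surrounding quotes and whitespace.
    my @fields = map { s/^\s*'?\s*//; s/\s*'?\s*$//; $_ } split /,/, $row;
    push @multiset, \@fields;
}
```

Again, reach for Text::CSV_XS instead if the fields can contain
embedded commas.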

Ben
 
Ted Zlatanov

pc> I'm looking for the most efficient way to create a two dimensional
pc> array from strings such as

pc> MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948
pc> ','SHAKER HTS','OH')}

pc> In this case MULTISET is a collection of rows from a DB. What I want
pc> to do is efficiently
pc> transform into a 2-dimensional array, where each row is an array.
pc> And then collect those arrays
pc> into 1 array of arrays. I'm currently doing this but not very
pc> efficiently. Any help would be appreciated.

If they are really coming from a DB query, you could use the appropriate
*arrayref DBI functions to get the array back effortlessly. What you
describe is exactly what they do. Note there are many DBD::* modules
(the data drivers for DBI) so you may not be limited to classic RDBMS
(Oracle, MySQL, Postgres, etc) queries. For example see DBD::Google and
DBD::CSV.

Ted
 
jjcassidy

I'm looking for the most efficient way to create a two dimensional
array from strings such as

MULTISET{ROW(1     ,'  3667 ','WARREN    ','OH'),ROW(2     ,'  2948
','SHAKER HTS','OH')}

In this case MULTISET is a collection of rows from a DB.  What I want
to do is efficiently
transform into a 2-dimensional array,  where each row is an array.
And then collect those arrays
into 1 array of arrays.  I'm currently doing this but not very
efficiently.  Any help would be appreciated.

Let Perl parse it for you. You can *eval* it.

use Data::Dumper;

sub ROW (@) { return [ @_ ]; }

sub MULTISET (&) { return [ sort { $a->[0] <=> $b->[0] } shift()->() ]; }

# observe:
my $multi_set = eval <<END_EVAL1;
MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948 ','SHAKER HTS','OH')}
END_EVAL1

print Dumper( $multi_set ), "\n";

You could even define the two functions as below if you simply wanted
the first field for ordering.

sub ROW (@) { return ( shift, [ @_ ] ); }

sub MULTISET (&) {
    my %hash = ( shift()->() );
    return [ map { $hash{$_} } sort { $a <=> $b } keys %hash ];
}

or

sub MULTISET ($) { # for {} as a hash ref
    my $hash_ref = shift;
    return [ map { $hash_ref->{$_} } sort { $a <=> $b } keys %$hash_ref ];
}
 
jgraber

I'm looking for the most efficient way to create a two dimensional
array from strings such as

MULTISET{ROW(1     ,'  3667 ','WARREN    ','OH'),ROW(2     ,'  2948
','SHAKER HTS','OH')}

Let Perl parse it for you. You can *eval* it.

use Data::Dumper;

sub ROW (@) { return [ @_ ]; }

sub MULTISET (&) { return [ sort { $a->[0] <=> $b->[0] } shift()->() ]; }

# observe:
my $multi_set = eval <<END_EVAL1;
MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948 ','SHAKER HTS','OH')}
END_EVAL1

Beware eval with unsanitized user inputs like

http://xkcd.com/327/
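If the eval route is taken anyway, a Safe compartment can at least
confine what the string is allowed to do. A sketch using the hash-ref
variants of ROW/MULTISET (the compartment's default opcode set may
need adjusting on some Perls):

```perl
use strict;
use warnings;
use Safe;

sub ROW (@) { return ( shift, [ @_ ] ); }

sub MULTISET ($) {
    my $hash_ref = shift;
    return [ map { $hash_ref->{$_} } sort { $a <=> $b } keys %$hash_ref ];
}

# The compartment restricts which opcodes the evaluated string may
# use, so stray code in the data cannot open files or run commands.
my $cpt = Safe->new;
$cpt->share('&ROW', '&MULTISET');   # only these names are visible inside

my $string = q{MULTISET{ROW(1,'3667','WARREN','OH'),ROW(2,'2948','SHAKER HTS','OH')}};
my $multi_set = $cpt->reval($string);
die "rejected or malformed input: $@" if $@;
```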
 
jjcassidy

Let Perl parse it for you. You can *eval* it.
use Data::Dumper;
sub ROW (@) { return [ @_ ]; }
sub MULTISET (&) { return [ sort { $a->[0] <=> $b->[0] } shift()->() ]; }
# observe:
my $multi_set = eval <<END_EVAL1;
MULTISET{ROW(1     ,'  3667 ','WARREN    ','OH'),ROW(2    ,'  2948 ','SHAKER HTS','OH')}
END_EVAL1

Beware eval with unsanitized user inputs like

http://xkcd.com/327/

Oh yeah. But you have to admit: it's snazzy. :)
 
pjfalbe

I'm looking for the most efficient way to create a two dimensional
array from strings such as
MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948
','SHAKER HTS','OH')}
In this case MULTISET is a collection of rows from a DB. What I want
to do is efficiently
transform into a 2-dimensional array, where each row is an array.
And then collect those arrays
into 1 array of arrays. I'm currently doing this but not very
efficiently. Any help would be appreciated.

Let Perl parse it for you. You can *eval* it.

use Data::Dumper;

sub ROW (@) { return [ @_ ]; }

sub MULTISET (&) { return [ sort { $a->[0] <=> $b->[0] } shift()->() ]; }

# observe:
my $multi_set = eval <<END_EVAL1;
MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948 ','SHAKER HTS','OH')}
END_EVAL1

print Dumper( $multi_set ), "\n";

You could even define the two functions as below if you simply wanted
the first field for ordering.

This works great! Much better than all the sed'ing and splitting I
was doing. For reference, I'm doing a query like

select customer, multiset(select count(*), city, state
from orders A
where A.customer = orders.customer
and A.order_date = orders.order_date
group by 2,3)
from orders
where order_date = today;


With your transform I can now take the field returned from the
multiset and write it to a WriteExcel spreadsheet very easily.
 
Ted Zlatanov

I'm looking for the most efficient way to create a two dimensional
array from strings such as
MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948
','SHAKER HTS','OH')}
In this case MULTISET is a collection of rows from a DB. What I want
to do is efficiently
transform into a 2-dimensional array, where each row is an array.
And then collect those arrays
into 1 array of arrays. I'm currently doing this but not very
efficiently. Any help would be appreciated.

Let Perl parse it for you. You can *eval* it.

use Data::Dumper;

sub ROW (@) { return [ @_ ]; }

sub MULTISET (&) { return [ sort { $a->[0] <=> $b->[0] } shift()->() ]; }

# observe:
my $multi_set = eval <<END_EVAL1;
MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948 ','SHAKER HTS','OH')}
END_EVAL1

print Dumper( $multi_set ), "\n";

You could even define the two functions as below if you simply wanted
the first field for ordering.

pc> This works great! Much better than all the sed'ing and splitting I
pc> was doing. For reference I'm doing a query like

pc> select customer, multiset(select count(*), city, state
pc> from orders A
pc> where A.customer = orders.customer
pc> and A.order_date = orders.order_date
pc> group by 2,3)
pc> from orders
pc> where order_date = today;

pc> With your transform I can now take the field returned from the
pc> multiset and write to WriteExcel spreadsheet very easily.

(checking it's not April 1)

Good god, you're going to use a custom parser instead of the nice DBI
interface I suggested?

my $array = $dbh->selectall_arrayref('your select statement goes here');

Even worse, it's a hacked-up eval-based parser that will fail miserably
if you look at it sideways, and performance is going to be miserable...
The parser is clever, but you should really consider submitting to
thedailywtf.com if you actually use it.

Ted
 
pjfalbe

I'm looking for the most efficient way to create a two dimensional
array from strings such as
MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948
','SHAKER HTS','OH')}
In this case MULTISET is a collection of rows from a DB. What I want
to do is efficiently
transform into a 2-dimensional array, where each row is an array.
And then collect those arrays
into 1 array of arrays. I'm currently doing this but not very
efficiently. Any help would be appreciated.
Let Perl parse it for you. You can *eval* it.
use Data::Dumper;
sub ROW (@) { return [ @_ ]; }
sub MULTISET (&) { return [ sort { $a->[0] <=> $b->[0] } shift()->() ]; }
# observe:
my $multi_set = eval <<END_EVAL1;
MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948 ','SHAKER HTS','OH')}
END_EVAL1
print Dumper( $multi_set ), "\n";
You could even define the two functions as below if you simply wanted
the first field for ordering.

pc> This works great! Much better than all the sed'ing and splitting I
pc> was doing. For reference I'm doing a query like

pc> select customer, multiset(select count(*), city, state
pc> from orders A
pc> where A.customer = orders.customer
pc> and A.order_date = orders.order_date
pc> group by 2,3)
pc> from orders
pc> where order_date = today;

pc> With your transform I can now take the field returned from the
pc> multiset and write to WriteExcel spreadsheet very easily.

(checking it's not April 1)

Good god, you're going to use a custom parser instead of the nice DBI
interface I suggested?

my $array = $dbh->selectall_arrayref('your select statement goes here');

Even worse, it's a hacked-up eval-based parser that will fail miserably
if you look at it sideways, and performance is going to be miserable...
The parser is clever, but you should really consider submitting to
thedailywtf.com if you actually use it.

Ted

I went ahead and tried selectall_arrayref, and as I thought, it did no
better than its cousin fetchall_arrayref. The problem is that the
MULTISET is returned by DBI/DBD as a varchar, and the driver doesn't
know how to parse it into an array. There are two solutions to this:
1) separate the multiset into its own query and loop through the
original cursor, or 2) manipulate the varchar into an array as above,
with the eval or with s///g and splits. Luckily MULTISETs are pretty
reliable in what they return, so the eval approach works great, and
its performance problems beat the network latency of looping.

Thank you for your suggestion; it got me to look at
selectall_arrayref. I've had a habit of using fetchall_arrayref, but I
will use selectall_arrayref more in the future.
 
Ted Zlatanov

pc> I went ahead and tried selectall_arrayref and as I thought it did no
pc> better than it's cousin fetchall_arrayref.
pc> The problem is a MULTISET returned by DBI/DBD as a varchar and the
pc> driver doesn't know how to parse
pc> it into an array. There are 2 solutions to this 1) separate the
pc> mutliset into a separate query and loop through original cursor. or
pc> 2) manipulate the varchar into an array like above with the eval or
pc> use 's//g' and splits to manipulate
pc> into a array. Luckily MULTISETs are pretty reliable in what they
pc> return so the eval approach works great. And it's
pc> performance problems are better than network latency by looping.

pc> Thank you for your suggestion got me to look at selectall_arrayref,
pc> I've had a habit of using fetchall_arrayref
pc> but will use selectall_arrayref more in the future.

Well, I owe you an apology because I thought MULTISETs were parseable by
the DBI. So you do need a custom parser or to split the results up.
Maybe a temporary table would work best for you if there are only a few
results (but you'll have to have an extra field to distinguish between
multiple instances of the same query, and it gets ugly quickly).

I've posted in the past here about the dangers of evaluating data,
especially data you don't control. So the eval-based parser worried me
(and others who have responded). I would use a regular expression as
you mentioned, if possible, or write a more complex state-based parser.
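A regular-expression parser along those lines might look like the
following sketch (`parse_multiset` is a hypothetical name; it assumes
field values never contain parentheses or escaped quotes):

```perl
use strict;
use warnings;

# Non-eval parser: pull out each ROW(...) group, then tokenize its
# body, letting single-quoted fields keep embedded commas.
sub parse_multiset {
    my ($str) = @_;
    my @rows;
    while ($str =~ /ROW\(([^)]*)\)/g) {
        my ($body, @fields) = ($1);
        while ($body =~ /\G \s* (?: '([^']*)' | ([^,']+?) ) \s* (?: , | $ )/gcx) {
            my $f = defined $1 ? $1 : $2;
            $f =~ s/^\s+|\s+$//g;   # trim stray padding
            push @fields, $f;
        }
        push @rows, \@fields;
    }
    return \@rows;
}

my $rows = parse_multiset(
    q{MULTISET{ROW(1 ,' 3667 ','WARREN ','OH'),ROW(2 ,' 2948 ','SHAKER HTS','OH')}}
);
```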

Ted
 
pjfalbe

pc> I went ahead and tried selectall_arrayref and as I thought it did no
pc> better than it's cousin fetchall_arrayref.
pc> The problem is a MULTISET returned by DBI/DBD as a varchar and the
pc> driver doesn't know how to parse
pc> it into an array. There are 2 solutions to this 1) separate the
pc> mutliset into a separate query and loop through original cursor. or
pc> 2) manipulate the varchar into an array like above with the eval or
pc> use 's//g' and splits to manipulate
pc> into a array. Luckily MULTISETs are pretty reliable in what they
pc> return so the eval approach works great. And it's
pc> performance problems are better than network latency by looping.

pc> Thank you for your suggestion got me to look at selectall_arrayref,
pc> I've had a habit of using fetchall_arrayref
pc> but will use selectall_arrayref more in the future.

Well, I owe you an apology because I thought MULTISETs were parseable by
the DBI. So you do need a custom parser or to split the results up.

Not a problem. I just wanted to clear up what was going on.
Maybe a temporary table would work best for you if there are only a few
results (but you'll have to have an extra field to distinguish between
multiple instances of the same query, and it gets ugly quickly).
Exactly.


I've posted in the past here about the dangers of evaluating data,
especially data you don't control. So the eval-based parser worried me
(and others who have responded). I would use a regular expression as
you mentioned, if possible, or write a more complex state-based parser.

I'm usually pretty careful about eval'ing strings; I usually turn
taint mode on to force myself to check them. If the data weren't
coming from an internally controlled database table, I wouldn't even
attempt this. Also, when installing to production I have to pass
perlcritic, so I have to be doubly sure; I turn it off for that
section of code...
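For reference, the annotation can be scoped to just the eval block
rather than a whole section; a sketch (BuiltinFunctions::ProhibitStringyEval
is Perl::Critic's core policy name for stringy eval):

```perl
use strict;
use warnings;

sub ROW (@) { return [ @_ ]; }
sub MULTISET (&) { return [ shift()->() ]; }

my $string = q{MULTISET{ROW(1,'3667','WARREN','OH')}};

my $multi_set;
{
    # Disable the stringy-eval policy for this block only, using
    # Perl::Critic's standard "## no critic" annotation.
    ## no critic (BuiltinFunctions::ProhibitStringyEval)
    $multi_set = eval $string;
    ## use critic
}
die "eval failed: $@" if $@;
```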
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top