resolve single line with multiple items into multiple lines, single items

ela

Old line(columns tab-delimited):

Col1 Col2 Col3 ... Coln
A B1@B2 C ... N1@N2@N3

New lines
A B1 C .. N1
A B1 C .. N2
A B1 C .. N3
A B2 C .. N1
A B2 C .. N2
A B2 C .. N3

The problem is: although pattern matching can recognize "@", but how to
write the code generically so to get all N1, N2 and N3, such that the number
of items aren't known beforehand?
 
Willem

ela wrote:
) Old line(columns tab-delimited):
)
) Col1 Col2 Col3 ... Coln
) A B1@B2 C ... N1@N2@N3
)
) New lines
) A B1 C .. N1
) A B1 C .. N2
) A B1 C .. N3
) A B2 C .. N1
) A B2 C .. N2
) A B2 C .. N3
)
) The problem is: although pattern matching can recognize "@", but how to
) write the code generically so to get all N1, N2 and N3, such that the number
) of items aren't known beforehand?

Well obviously first you create an array of arrays for the rows and columns.

And then how about something which looks a bit like:

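# $line is assumed to hold one chomped input line.
my @columns = ( [ split /\t/, $line ] );    # array of arrays, one row so far
my $n = $#{ $columns[0] };                  # index of the last column

for my $i (0 .. $n) {
    @columns = map {
        my @row = @$_;
        map {
            # one new row per '@'-separated value found in column $i
            [ @row[0 .. $i - 1], $_, @row[$i + 1 .. $n] ]
        } split(/\@/, $row[$i]);
    } @columns;
}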

But of course this is overly complex and can probably be reduced to a
clever one-liner...


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Tad J McClellan

ela said:
> Old line(columns tab-delimited):
>
> Col1 Col2 Col3 ... Coln
> A B1@B2 C ... N1@N2@N3
>
> New lines
> A B1 C .. N1
> A B1 C .. N2
> A B1 C .. N3
> A B2 C .. N1
> A B2 C .. N2
> A B2 C .. N3
>
> The problem is: although pattern matching can recognize "@", but how to
> write the code generically so to get all N1, N2 and N3, such that the number
> of items aren't known beforehand?


-------------------------
#!/usr/bin/perl
use warnings;
use strict;

$_ = "A\tB1\@B2\tC\tN1\@N2\@N3\n";

#1 while s/(.*?)([^\t]+)\@([^\t\n]+)(.*\n)/$1$2$4$1$3$4/;

1 while s/(.*?)       # before pair to expand
          ([^\t]+)    # left value
          \@
          ([^\t\n]+)  # right value
          (.*\n)      # after pair to expand
         /$1$2$4$1$3$4/x;

print;
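
For readers who have not met the `1 while s/.../.../` idiom used above: the substitution returns true each time it finds an `@` pair to expand, so the loop simply re-runs it until nothing is left to expand. The sketch below applies the same substitution one pass at a time and keeps the intermediate strings so each step can be inspected. The helper name `expand_passes` and the `@passes` array are made up for this illustration and are not part of the post above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply the substitution from the post above one pass
# at a time and return the text as it looks after every successful pass.
sub expand_passes {
    my ($text) = @_;
    my @passes;
    while ( $text =~ s/(.*?)       # before pair to expand
                       ([^\t]+)    # left value
                       \@
                       ([^\t\n]+)  # right value
                       (.*\n)      # after pair to expand
                      /$1$2$4$1$3$4/x ) {
        push @passes, $text;
    }
    return @passes;
}

my @passes = expand_passes("A\tB1\@B2\tC\tN1\@N2\@N3\n");
print "after pass $_:\n$passes[$_ - 1]\n" for 1 .. @passes;
```

Each pass copies one line into two: one copy keeps everything to the left of an `@`, the other keeps the value to its right. The loop ends when no `@` remains.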
 
ela

I really thank Willem & McClellan, who proposed solns. Yet, there are too
many symbols that can't be Googled, I fail to understand their codes....
 
Jürgen Exner

ela said:
> Old line(columns tab-delimited):
>
> Col1 Col2 Col3 ... Coln
> A B1@B2 C ... N1@N2@N3
>
> New lines
> A B1 C .. N1
> A B1 C .. N2
> A B1 C .. N3
> A B2 C .. N1
> A B2 C .. N2
> A B2 C .. N3
>
> The problem is: although pattern matching can recognize "@", but how to
> write the code generically so to get all N1, N2 and N3, such that the number
> of items aren't known beforehand?

split() the line at tabs (to get the individual columns), then foreach() column
split() at '@' to get the list of individual values.

This automatically leads to a nested loop, which you can use nicely to
print the lines in the desired order.
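
A minimal sketch of that idea, assuming tab-delimited input with `@` as the only value separator. The helper name `expand_line` is hypothetical. The loop nesting stays three levels deep however many columns the line has, because the rows built so far are extended one column at a time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: expand one tab-delimited line whose columns may hold
# several values joined by '@' into every combination of single values.
sub expand_line {
    my ($line) = @_;
    chomp $line;

    # Array of arrays: one inner array of alternatives per column.
    # An empty column is kept as a single empty value.
    my @columns = map {
        my @alts = split /\@/, $_, -1;
        [ @alts ? @alts : '' ]
    } split /\t/, $line, -1;

    # Start with one empty row, then extend every row by every alternative.
    my @rows = ( [] );
    for my $alternatives (@columns) {
        my @extended;
        for my $row (@rows) {
            for my $value (@$alternatives) {
                push @extended, [ @$row, $value ];
            }
        }
        @rows = @extended;
    }
    return map { join "\t", @$_ } @rows;
}

print "$_\n" for expand_line("A\tB1\@B2\tC\tN1\@N2\@N3\n");
```

Because each existing row is extended in order, the last column varies fastest, which matches the ordering asked for in the original question.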

jue
 
Tad J McClellan

ela said:
> I really thank Willem & McClellan, who proposed solns. Yet, there are too
> many symbols that can't be Googled,


Good, because you don't want to find random crap on the interweb.

You want to find focused and accurate information on your own hard disk.

perldoc perlrequick

perldoc perlretut

perldoc perlre

> I fail to understand their codes....


If you ask specific questions about specific bits of code
(after trying to find out in the std docs first), you will
likely get help here.

"I fail to understand their codes" is too general for us
to be able to help you.
 
ela

> split() the line at tabs (to get the individual columns), then foreach() column
> split() at '@' to get the list of individual values.
>
> This automatically leads to a nested loop, which you can use nicely to
> print the lines in the desired order.
>
> jue

It seems that this is also a direction, can foreach() be recursively used?
Because I don't want to write "n" foreach()'s.
 
Jürgen Exner

ela said:
> It seems that this is also a direction, can foreach() be recursively used?

???
Recursion and loops are two different ways to achieve the same result:
repeating the execution of some code with modified data. Yes, of course
you can mix them as you like, but why would you want to?

> Because I don't want to write "n" foreach()'s.

Having said that, I spoke too hastily. Nested foreach() are great to get
the individual values and store them e.g. in an AoA.
But creating the output within the same loop is very awkward and you
will be far better off storing the data first and using a second loop as
suggested by others or by using a recursive algorithm.
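
A minimal sketch of the recursive variant, assuming the values have already been stored in an AoA with one inner array per column. The helper name `expand_from` and its argument layout are hypothetical, chosen only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical recursive helper: $columns is an array of arrays (one inner
# array of alternatives per column), $index is the column being filled in,
# and @prefix holds the values already chosen for the columns to its left.
sub expand_from {
    my ($columns, $index, @prefix) = @_;

    # Past the last column: the prefix is one complete output row.
    return join("\t", @prefix) if $index > $#$columns;

    # Otherwise try every alternative here and recurse into the next column.
    return map { expand_from($columns, $index + 1, @prefix, $_) }
           @{ $columns->[$index] };
}

my $line    = "A\tB1\@B2\tC\tN1\@N2\@N3";
my @columns = map { [ split /\@/ ] } split /\t/, $line;
print "$_\n" for expand_from(\@columns, 0);
```

The recursion depth equals the number of columns, so no foreach() has to be written out per column.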

jue
 
sln

> Old line(columns tab-delimited):
>
> Col1 Col2 Col3 ... Coln
> A B1@B2 C ... N1@N2@N3
>
> New lines
> A B1 C .. N1
> A B1 C .. N2
> A B1 C .. N3
> A B2 C .. N1
> A B2 C .. N2
> A B2 C .. N3
>
> The problem is: although pattern matching can recognize "@", but how to
> write the code generically so to get all N1, N2 and N3, such that the number
> of items aren't known beforehand?

I just saw this. I didn't read the other posted responses that may have
actually solved this apparently easy problem.

From now on, not only will you Chinese, pigeon-English speaking, non-Perl
programming, American dollar sucking folks have to provide some DOLLA'S,
for the solution (that is what you want isin't it, source and all?),
but you will have to LEARN ENOUGH CORRECT ENLISH TO PROPERLY EXPLAIN THE
PROBLEM !!!!

If this takes hiring an Amrican (English first language) translator, then all the
better. Dok, dac, toa, dit, do, don, just don't cut it.

-sln
 
ccc31807

Some will say this is a simple minded solution, and maybe it is, but
FWIW here's my contribution. This decomposes your data into a data
structure in memory. It's dynamic in the sense that it doesn't matter
how many records you have or where the @'s are, as long as you have
only two levels. All you have to do then is print it out. I have used
Dumper simply because I'm too lazy to finish it.

CODE:
use strict;
use warnings;
use Data::Dumper;

while (<DATA>)
{
    my @rest = split /\t/;
    my $num = @rest;
    for (my $i = 0; $i < $num; $i++)
    {
        if ($rest[$i] =~ /@/)
        {
            $rest[$i] = [split /@/, $rest[$i]];
        }
        print qq(\t$rest[$i]\n);
    }
    print "\nData Structure via Dumper is:\n";
    print Dumper(@rest);
}

exit(0);

__DATA__
A B1@B2 C d e f N1@N2@N3

OUTPUT:

C:\PerlLearn>perl multiple.plx
A
ARRAY(0x235348)
C
d
e
f
ARRAY(0x182471c)

Data Structure via Dumper is:
$VAR1 = 'A';
$VAR2 = [
          'B1',
          'B2'
        ];
$VAR3 = 'C';
$VAR4 = 'd';
$VAR5 = 'e';
$VAR6 = 'f';
$VAR7 = [
          'N1',
          'N2',
          'N3
'
        ];

C:\PerlLearn>
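
One possible way to finish the job from a structure like the one dumped above, sketched under two assumptions: the line was chomp()ed before splitting (so the last value carries no trailing newline), and each field is either a plain string or a reference to an array of alternatives. The helper name `rows_from_structure` is hypothetical. It counts through all combinations like an odometer instead of nesting one loop per column.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: @fields mixes plain strings and array references,
# as in the structure dumped above; return every combination as a row.
sub rows_from_structure {
    my @fields = @_;

    # Treat a plain string as a one-element list of alternatives.
    my @columns = map { ref $_ eq 'ARRAY' ? $_ : [$_] } @fields;

    # The number of output rows is the product of the list lengths.
    my $total = 1;
    $total *= @$_ for @columns;

    my @rows;
    for my $n (0 .. $total - 1) {
        # Read $n like an odometer: the rightmost column turns fastest.
        my ($rest, @row) = ($n);
        for my $column (reverse @columns) {
            unshift @row, $column->[ $rest % @$column ];
            $rest = int( $rest / @$column );
        }
        push @rows, join "\t", @row;
    }
    return @rows;
}

print "$_\n"
    for rows_from_structure('A', [ 'B1', 'B2' ], 'C', [ 'N1', 'N2', 'N3' ]);
```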
 
Uri Guttman

BM> There is no need to exit() from a Perl program under normal
BM> circumstances. Falling off the end will exit successfully.

i like to have explicit exits in my main program. i usually keep the top
level inline code very short with a few key lexicals and top sub calls
and then exit(). then come the subs in some semblance of order. arg
parsing and help/usage subs always go to the bottom out of the way. this
is how i teach to write scripts so they are easy to develop AND
read. and the explicit exit tells you the top level code is done and you
don't have to scan for it (or fall to the bottom) to see any more main
level code.

uri
 
ccc31807

> i like to have explicit exits in my main program. i usually keep the top
> level inline code very short with a few key lexicals and top sub calls
> and then exit(). then come the subs in some semblance of order. arg
> parsing and help/usage subs always go to the bottom out of the way. this
> is how i teach to write scripts so they are easy to develop AND
> read. and the explicit exit tells you the top level code is done and you
> don't have to scan for it (or fall to the bottom) to see any more main
> level code.

I agree fully.

As a matter of style, you can write functions that only receive
arguments and return values with no side effects or assignments
within the functions, or you can write functions that make
assignments and have side effects.

Philosophically, I'm inclined to the first style, and attempt to write
in that style.

In practice, I normally write in the second style, so that my 'main'
program is very short and consists of a sequence of function calls
(followed by exit(0)). The bulk of the work, including variable
assignments, is done by my user-defined functions.

I'm tending now to use a lot of modules, so that my 'main' program
still consists of sequences of function calls, my user defined
functions still avoid side effects and assignments as much as
possible, and the dirty work is done in the modules. I don't
particularly like this, and my style will probably continue to change.

Your thoughts?

CC
 
Uri Guttman

c> As a matter of style, you can write functions that only receive
c> arguments and return values with no side effects or assignments
c> within the functions, or you can write functions that make
c> assignments and have side effects.

it varies. in some cases a few top level lexicals are ok by me.

c> I'm tending now to use a lot of modules, so that my 'main' program
c> still consists of sequences of function calls, my user defined
c> functions still avoid side effects and assignments as much as
c> possible, and the dirty work is done in the modules. I don't
c> particularly like this, and my style will probably continue to change.

you can always pass in a main hash ref to keep all the top level
stuff. as i said, it varies based on my mood and the complexity of the
program's top level.

uri
 
