join on space instead of comma

LHradowy

Right now I have a Perl script that takes a comma-separated file, adds a
couple of things to it, and takes away the data at the end.
I have done this the hard way: saving the file in Excel as a comma-separated
file, then FTPing it over and running dos2ux file >file1.

And this is the outcome BEFORE I run my perl script.
3xxxx18,00 0 02 00,TELN NOT
3xxxx22,00 0 03 11,CUST HAS >

After all that, I run my Perl script against it. The script prompts the user
for input, adds some data, greps the file for certain things, and creates 3 files.

What I want to do is eliminate the first part, saving it as a comma-separated
file. I believe I can do this in Perl, but I cannot split on spaces, since
some spaces need to stay part of a column. So (how to explain) instead of the
file above, which already has commas, I need to split this file based on some
criteria and add a comma between the columns, so that it looks like the
above...

This is the file I get before I save it as a comma separated file.
3xxxx33 00 0 00 21 CUSTOMER HAS
3xxxx63 00 0 01 07 CUSTOMER HAS
3xxxx75 00 0 02 09 CUSTOMER HAS
3xxxx85 00 0 12 09 TELN NOT BILL
3xxxx28 00 0 02 00 TELN NOT BILL
yada...

I want to avoid this step. How do I change my Perl script to handle this
format instead of the comma-separated one?
Remember that the second and third fields contain spaces that I need.
OUTCOME
3xxxx33,BUILDING1,ROOM2,00 0 00 21,CUSTOMER HAS > 1
3xxxx66,BUILDING1,ROOM2,00 0 01 07,CUSTOMER HAS > 1
3xxxx75,BUILDING1,ROOM2,00 0 02 09,CUSTOMER HAS > 1
3xxxx85,BUILDING1,ROOM2,00 0 12 09,TELN NOT BILL

SCRIPT
*****************************

#!/opt/perl/bin/perl

use strict;
use warnings;

system ("clear"); #Clear the screen
my $acode = "204";

print "Enter BLD: ";
chomp (my $bld =<STDIN>);
my $CAPbld = uc($bld);
my $bld4=substr $CAPbld,0,4; #Pull first 4 char out of BLD for naming of file

print "Enter Room: ";
chomp (my $room = <STDIN>);
my $CAProom = uc($room);

open my $fc, ">$bld4.cust_has" or die "$bld4.cust_has: $!";
open my $ft, ">$bld4.teln_not" or die "$bld4.teln_not: $!";
open my $fo, ">$bld4.PRTDIST.err" or die "$bld4.PRTDIST.err: $!";

while (<>) {
chomp; # Removes the trailing newline
my @a = split /,/, $_, -1;
my $f = /TELN/ ? $ft : /CUST/? $fc : $fo;
print $f join "," => $acode.$a[0],$CAPbld, $CAProom, $a[1], $a[2], "\n";
}
close $fc;
close $ft;
close $fo;

## Modify the cust_has file and pull only the first column.
my $fc_name = "$bld4.cust_has";
open ($fc, '<', $fc_name) or die "$fc_name: $!"; # reuse $fc rather than redeclare it
open my $fcC, ">$bld4.cust_has.tn" or die "$bld4.cust_has.tn: $!";
while (<$fc>) {
chomp;
my ( $FirstField,@Rest)=split /,/;
print $fcC join (",","'$FirstField',",)."\n";
}
close $fc;
close $fcC;

## Modify the teln_not file to take off last column
## File is now ready for report making.
my $fc_name2 = "$bld4.teln_not";
open ($fc, '<', $fc_name2) or die "$fc_name2: $!"; # reuse $fc rather than redeclare it
open my $fcT, ">$bld4.teln_not-1" or die "$bld4.teln_not-1: $!";
while (<$fc>) {
chomp;
my ( $FirstField1,$SecondField1,$ThirdField1,$FourthField1,@Rest)=split /,/;
print $fcT join
(",","$FirstField1","$SecondField1","$ThirdField1","$FourthField1",)."\n";
}
close $fc;
close $fcT;

rename "$bld4.teln_not-1", "$bld4.teln_not" or die "rename $bld4.teln_not-1: $!";
 
Gunnar Hjalmarsson

LHradowy said:
And this is the outcome BEFORE I run my perl script.
3xxxx18,00 0 02 00,TELN NOT
3xxxx22,00 0 03 11,CUST HAS >

What I want to do is eliminate the first part, saving it as a comma-separated
file. I believe I can do this in Perl, but I cannot
split on spaces, since some spaces need to stay part of a
column.

Can't you split on instances of multiple spaces?
So, (how to explain) instead of the above mention where there is a
comma, I need to split this file, based on criteria, and also add a
comma between the columns, so it looks like above...

This is the file I get before I save it as a comma separated file.
3xxxx33 00 0 00 21 CUSTOMER HAS > 1
3xxxx63 00 0 01 07 CUSTOMER HAS > 1
3xxxx75 00 0 02 09 CUSTOMER HAS > 1
3xxxx85 00 0 12 09 TELN NOT BILL
3xxxx28 00 0 02 00 TELN NOT BILL

my @a = split /,/, $_, -1;

s/^\s+//;
my @a = split /\s{3,}/;
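
A minimal sketch of that idea, for illustration only. It assumes the columns are separated by runs of at least three spaces while the spaces inside a column are single; the real spacing of the file is not shown in the thread, so the threshold may need adjusting. The helper name to_csv_line is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one space-aligned line into a comma-separated one.
# Assumption: columns are separated by three or more whitespace characters,
# and whitespace inside a column is a single space.
sub to_csv_line {
    my ($line) = @_;
    $line =~ s/^\s+//;                  # drop leading indentation
    $line =~ s/\s+$//;                  # drop trailing whitespace and newline
    my @cols = split /\s{3,}/, $line;   # split only on wide gaps
    return join ',', @cols;
}

print to_csv_line("   3xxxx33    00 0 00 21    CUSTOMER HAS > 1\n"), "\n";
# prints: 3xxxx33,00 0 00 21,CUSTOMER HAS > 1
```

If some column gaps turn out to be narrower than three spaces, this approach breaks down and fixed positions are the safer route.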
 
Brian McCauley

^^^^^^^^^^^^^^^^^
127.0.0.127.... cute!

my (@lines, @fields) = (<>);

I somehow find that the technique of tagging extra variables onto the LHS
of a list assignment in order to declare them just seems ugly.

Is there really any need to slurp here anyhow? Would it not be
simpler to read the input linewise?

Isn't @fields being declared at the wrong scope anyhow - it should be
inside the loop.
chomp @lines;

for (@lines) {
$fields[0] = substr $_,7,7;
$fields[1] = substr $_,39,10;
$fields[2] = substr $_,63;

For unpacking fixed position records you may want to consider unpack()
as an alternative to several substr().
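
As a rough sketch of the two points above (reading linewise, and declaring @fields inside the loop), something like the following might do. The sample record is made up and laid out to match the offsets 7, 39 and 63 from the quoted code; an in-memory filehandle stands in for the real input file.

```perl
use strict;
use warnings;

# Hypothetical sample record placed at the offsets used in the quoted code
# (7, 39 and 63). The real file layout is an assumption.
my $data = sprintf "%-7s%-32s%-24s%s\n",
    '', '3xxxx33', '00 0 00 21', 'CUSTOMER HAS > 1';

my @out;
open my $in, '<', \$data or die "in-memory file: $!";
while ( my $line = <$in> ) {            # one line at a time, no slurping
    chomp $line;
    # @fields is declared inside the loop, so it starts out empty each time
    my @fields = (
        substr( $line, 7,  7 ),
        substr( $line, 39, 10 ),
        substr( $line, 63 ),
    );
    push @out, join ',', @fields;
}
close $in;

print "$_\n" for @out;
# prints: 3xxxx33,00 0 00 21,CUSTOMER HAS > 1
```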

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
 
Anno Siegel

bowsayge said:
LHradowy said to us:

[...]
What I want to do is eliminate the first part, saving it as a comma-separated
file. I believe I can do this in Perl, but I cannot split on
spaces, since some spaces need to stay part of a column.
[...]

You can extract substrings from your input lines like so:

Ah, you're learning fast. This begins to look like Perl code :)
Your solution is correct. I'll add a few comments about style and
point out alternatives.

I am aware, if I read your postings right, that you are rather new to
Perl, if not to programming in general. My (and others') comments are
brief and often have the form of directions. They're still in the spirit
of "you can also do it this way", not of "you should have done it like this".
So...
my (@lines, @fields) = (<>);

You don't need to declare @fields here. Instead, declare it in the
smallest possible scope, which would be the loop body.

But even if you had to declare it here, it isn't the done thing to
combine a mere declaration with a massive operation like slurping the
file. Use an extra line.

The parens around <> are not needed.
chomp @lines;

"chomp" can be applied to an assignment, even a list assignment. This
*is* idiomatic:

chomp( my @lines = <>);
for (@lines) {

This would be the place to declare @fields. The array is cleared each
time my() happens at run-time, usually what you want.
$fields[0] = substr $_,7,7;
$fields[1] = substr $_,39,10;
$fields[2] = substr $_,63;

It is rare in Perl that you need to index into an array. (Hashes are
different.) The more you think of an array as a whole, the better.
This is certainly not a place for indexing.

my @fields = (
substr( $_,7,7),
substr( ...),
substr( ...),
);

But there is a better way. See below...
local $" = ',';

Nothing wrong with that, especially since it's properly localized. Still,
there's a tendency to avoid the "punctuation variables", with a few
exceptions.
print "@fields\n";

Without assignment to $"

print join( ',', @fields), "\n";

If you have to extract fields of fixed length at fixed positions,
the unpack() function is the right tool. It can extract multiple
substrings in one step.

"pack" and "unpack" and their formats are a sub-language of its own.
No-one memorizes all of it, but a few idioms are worth memorizing.
One is, to extract a substring of length $length at position $pos,
the unpack template is "@${pos}a$length". Putting it all together,
your solution becomes

chomp( my @lines = <DATA>);
for ( @lines ) {
my @fields = unpack( '@7a7 @39a10 @63a*', $_);
print join( ', ', @fields), "\n";
}
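
If the positions live in a table rather than in the template string, the "@${pos}a$length" idiom can be generated. This is only a sketch: the helper name template_for and the sample record are made up, and the offsets are the ones from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: build an unpack template from [position, length]
# pairs. A length of '*' means "everything to the end of the string".
sub template_for {
    my @specs = @_;
    return join ' ', map { sprintf '@%sa%s', $_->[0], $_->[1] } @specs;
}

my $template = template_for( [ 7, 7 ], [ 39, 10 ], [ 63, '*' ] );
print "$template\n";                    # prints: @7a7 @39a10 @63a*

# Made-up record laid out at those offsets.
my $line = sprintf "%-7s%-32s%-24s%s",
    '', '3xxxx85', '00 0 12 09', 'TELN NOT BILL';
my @fields = unpack $template, $line;
print join( ', ', @fields ), "\n";      # prints: 3xxxx85, 00 0 12 09, TELN NOT BILL
```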

Anno
 
Andrew Palmer

Anno Siegel said:
If you have to extract fields of fixed length at fixed positions,
the unpack() function is the right tool. It can extract multiple
substrings in one step.

"pack" and "unpack" and their formats are a sub-language of its own.
No-one memorizes all of it, but a few idioms are worth memorizing.
One is, to extract a substring of length $length at position $pos,
the unpack template is "@${pos}a$length". Putting it all together,
your solution becomes

You don't need both a starting position and a string length for each field
(unpack() will pick up at the next field where it leaves off with the last).
If you need to strip trailing spaces, use capital "A" (which is meant for
extracting space-padded fields), rather than lowercase "a" (which returns
the data unchanged, padding included).

chomp( my @lines = <DATA>);
for ( @lines ) {
my @fields = unpack( '@7a7 @39a10 @63a*', $_);

For the data posted, the above happens to work the same, although this is my
preferred way:
my @fields = unpack( '@7 A32 A24 A*', $_);
print join( ', ', @fields), "\n";
}

(The "@7" is for the 7 spaces at the beginning of each line. Are they there
in the actual data, or was the example just indented?)
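
To make the difference between the two letters concrete, here is a small sketch on a made-up space-padded record (a 10-character field followed by a 3-character one); the field widths are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical space-padded record: 10-character text, 3-character code.
my $rec = sprintf "%-10s%-3s", 'TELN NOT', 'AB';

my @raw     = unpack 'a10 a3', $rec;    # "a" returns the bytes unchanged
my @trimmed = unpack 'A10 A3', $rec;    # "A" strips trailing spaces

print join( '', map { "[$_]" } @raw ),     "\n";   # prints: [TELN NOT  ][AB ]
print join( '', map { "[$_]" } @trimmed ), "\n";   # prints: [TELN NOT][AB]
```

Note that embedded single spaces, as in "TELN NOT", survive with "A"; only the trailing padding goes.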
 
David Combs

SNIP

If you have to extract fields of fixed length at fixed positions,
the unpack() function is the right tool. It can extract multiple
substrings in one step.

"pack" and "unpack" and their formats are a sub-language of its own.
No-one memorizes all of it, but a few idioms are worth memorizing.
One is, to extract a substring of length $length at position $pos,
the unpack template is "@${pos}a$length". Putting it all together,
your solution becomes

chomp( my @lines = <DATA>);
for ( @lines ) {
my @fields = unpack( '@7a7 @39a10 @63a*', $_);
print join( ', ', @fields), "\n";
}

Anno


Anno -- what are the *other* pack-unpack idioms you think worth
memorizing?

I bet lots of people here would like to see what you've got!

Thanks,

David
 
Tassilo v. Parseval

Also sprach David Combs:
Anno -- what are the *other* pack-unpack idioms you think worth
memorizing?

Not that I'm Anno, but here's one that I find useful, namely the '/'
construct. The template preceding the slash is used as a count argument
for the template following the slash:

# look at the first byte and extract that many
# bytes after that (3 in this case)
# as unsigned characters

my @x = unpack "c/C", "\x03\x00\x01\xff\x03";
print "@x\n";

__END__
0 1 255

Note how this can be combined with @:

my @x = unpack '@2c/C', "\x03\x00\x01\xff\x03";
print "@x\n";
__END__
255
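
The same construct also works in pack(), which gives a simple round trip for length-prefixed strings. A sketch, with made-up values and a one-byte length prefix (so each string is assumed to be shorter than 256 bytes):

```perl
use strict;
use warnings;

# Each string is stored as one length byte ("C") followed by the string itself.
my $packed = pack 'C/a* C/a*', 'BUILDING1', 'ROOM2';

# On unpack, the length byte says how many bytes the following "a" takes.
my @items = unpack 'C/a C/a', $packed;

print length($packed), "\n";            # prints: 16
print join( ',', @items ), "\n";        # prints: BUILDING1,ROOM2
```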

Tassilo
 
Anno Siegel

David Combs said:
SNIP




Anno -- what are the *other* pack-unpack idioms you think worth
memorizing?

I bet lots of people here would like to see what you've got!

Not all that much, come to think of it. There's the bit-counting "%32b*",
but that is advertised right in the unpack doc and needs no promotion.
I use that one even more frequently than the substr() replacement,
but I may be inordinately fond of bit tables.
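
For readers who have not met it, a short sketch of the bit-counting idiom: the "%32" prefix asks unpack for a 32-bit checksum of the items, and the items of "b*" are the individual bits, so the result is the number of set bits. The bit positions below are arbitrary.

```perl
use strict;
use warnings;

# Build a small bit table with vec() and set three arbitrary bits.
my $bits = '';
vec( $bits, $_, 1 ) = 1 for 1, 5, 9;

# Sum of all bits in the string, i.e. the number of bits that are set.
my $count = unpack '%32b*', $bits;

print "$count\n";                       # prints: 3
```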

Another thing to keep in mind about pack/unpack (though not an idiom)
is the possibility of reading the length of a field from the data itself
(the "/" construct). Tassilo has also pointed this out.

Then there's the use of grouping parentheses in a template, which applies
a repeat count to a group of sub-templates at once. In the form
"(<composite template>)*" this is slightly more than syntactic sugar.
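
A small sketch of the "(...)*" form, assuming a perl recent enough to support groups in templates (5.8 or later) and a made-up string of back-to-back records, each a 2-character tag followed by a 3-character number:

```perl
use strict;
use warnings;

# Made-up data: repeated records of a 2-char tag and a 3-char number.
my $data = 'ab123cd456ef789';

# The group is applied over and over until the string is used up.
my @fields = unpack '(A2 A3)*', $data;

print join( ',', @fields ), "\n";       # prints: ab,123,cd,456,ef,789
```

Without the group, the number of records would have to be known in advance and spelled out in the template.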

Together with the knowledge of what pack/unpack generally are about, this
pretty much outlines the range of their applicability. The details
can be looked up when you decide one or the other is a likely candidate.
Very few template characters deserve to be known by heart, maybe

b - a single bit
a - a binary byte
i - a native integer (native to your C compiler)

Anno
 
David Combs

THANK YOU!

Now, finally, I have some *real* motivation to (finally) go
learn unpack, so I can *understand* all those tricks.

Any way you two can convince someone (O'Reilly?) to come
up with a "wild hacks with perl" book, and put out a
call for donated hacks to include in it?

Thanks again;

David
 
Tassilo v. Parseval

Also sprach David Combs:
Now, finally, I have some *real* motivation to (finally) go
learn unpack, so I can *understand* all those tricks.

Any way you two can convince someone (O'Reilly?) to come
up with a "wild hacks with perl" book, and put out a
call for donated hacks to include in it?

I am not sure that a book with such a title would do Perl's already
quite infamous reputation much good. :)

Tassilo
 
