join on space instead of comma

Discussion in 'Perl Misc' started by LHradowy, Aug 4, 2004.

  1. LHradowy

    LHradowy Guest

    Right now I have a perl script that takes a comma separated file, adds a
    couple of things to it, and takes away the data at the end.
    I have done this the hard way: saving the file in Excel as a comma
    separated file, then ftp'ing it over and running dos2ux file >file1.

    And this is the outcome BEFORE I run my perl script.
    3xxxx18,00 0 02 00,TELN NOT
    3xxxx22,00 0 03 11,CUST HAS >

    Then after all that I run my perl script against it. The script prompts
    the user for input, adds some data, then greps the file for certain things,
    and creates 3 files.

    What I want to do is eliminate the first part of saving it as a comma
    separated file. I believe I can do this in perl, but I can not split on
    spaces since I have spaces that I need to be part of a column. So (how to
    explain), instead of the above, where there is already a comma, I need to
    split this file based on some criteria, and also add a comma between the
    columns, so it looks like the above...

    This is the file I get before I save it as a comma separated file.
    3xxxx33 00 0 00 21 CUSTOMER HAS > 1
    3xxxx63 00 0 01 07 CUSTOMER HAS > 1
    3xxxx75 00 0 02 09 CUSTOMER HAS > 1
    3xxxx85 00 0 12 09 TELN NOT BILL
    3xxxx28 00 0 02 00 TELN NOT BILL
    yada...

    I want to avoid this step. How do I change my perl script to split this
    format instead of splitting on a comma?
    Remember that in the second and third fields there are spaces that I need.
    OUTCOME
    3xxxx33,BUILDING1,ROOM2,00 0 00 21,CUSTOMER HAS > 1
    3xxxx66,BUILDING1,ROOM2,00 0 01 07,CUSTOMER HAS > 1
    3xxxx75,BUILDING1,ROOM2,00 0 02 09,CUSTOMER HAS > 1
    3xxxx85,BUILDING1,ROOM2,00 0 12 09,TELN NOT BILL

    SCRIPT
    *****************************

    #!/opt/perl/bin/perl

    use strict;
    use warnings;

    system ("clear"); #Clear the screen
    my $acode = "204";

    print "Enter BLD: ";
    chomp (my $bld =<STDIN>);
    my $CAPbld = uc($bld);
    my $bld4=substr $CAPbld,0,4; # Pull first 4 chars out of BLD for naming of file

    print "Enter Room: ";
    chomp (my $room = <STDIN>);
    my $CAProom = uc($room);

    open my $fc, ">$bld4.cust_has" or die "$bld4.cust_has: $!";
    open my $ft, ">$bld4.teln_not" or die "$bld4.teln_not: $!";
    open my $fo, ">$bld4.PRTDIST.err" or die "$bld4.PRTDIST.err: $!";

    while (<>) {
    chomp; # Removes the trailing newline
    my @a = split /,/, $_, -1;
    my $f = /TELN/ ? $ft : /CUST/? $fc : $fo;
    print $f join "," => $acode.$a[0],$CAPbld, $CAProom, $a[1], $a[2], "\n";
    }
    close $fc;
    close $ft;
    close $fo;

    ## Modify the cust_has file and pull only the first column.
    my $fc_name = "$bld4.cust_has";
    open (my $fc_in, '<', $fc_name) or die "$fc_name: $!"; # new name, $fc is already declared above
    open my $fcC, ">$bld4.cust_has.tn" or die "$bld4.cust_has.tn: $!";
    while (<$fc_in>) {
    chomp;
    my ( $FirstField,@Rest)=split /,/;
    print $fcC "'$FirstField',\n";
    }
    close $fc_in;
    close $fcC;

    ## Modify the teln_not file to take off last column
    ## File is now ready for report making.
    my $fc_name2 = "$bld4.teln_not";
    open (my $ft_in, '<', $fc_name2) or die "$fc_name2: $!"; # new name, avoids redeclaring $fc
    open my $fcT, ">$bld4.teln_not-1" or die "$bld4.teln_not-1: $!";
    while (<$ft_in>) {
    chomp;
    my ( $FirstField1,$SecondField1,$ThirdField1,$FourthField1,@Rest)=split /,/;
    print $fcT join(",", $FirstField1, $SecondField1, $ThirdField1, $FourthField1)."\n";
    }
    close $ft_in;
    close $fcT;

    `mv $bld4.teln_not-1 $bld4.teln_not`;
     
    LHradowy, Aug 4, 2004
    #1

  2. LHradowy wrote:
    >
    > And this is the outcome BEFORE I run my perl script.
    > 3xxxx18,00 0 02 00,TELN NOT
    > 3xxxx22,00 0 03 11,CUST HAS >


    <snip>

    > What I want to do is eliminate the first part of saving it as a comma
    > separated file. I believe I can do this in perl, but I can not
    > split on spaces since I have spaces that I need to be part of a
    > column.


    Can't you split on instances of multiple spaces?

    > So, (how to explain) instead of the above mention where there is a
    > comma, I need to split this file, based on criteria, and also add a
    > comma between the columns, so it looks like above...
    >
    > This is the file I get before I save it as a comma separated file.
    > 3xxxx33 00 0 00 21 CUSTOMER HAS > 1
    > 3xxxx63 00 0 01 07 CUSTOMER HAS > 1
    > 3xxxx75 00 0 02 09 CUSTOMER HAS > 1
    > 3xxxx85 00 0 12 09 TELN NOT BILL
    > 3xxxx28 00 0 02 00 TELN NOT BILL


    <snip>

    > my @a = split /,/, $_, -1;


    s/^\s+//;
    my @a = split /\s{3,}/;
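
    A minimal sketch of that idea, assuming the real file separates its
    columns with runs of three or more spaces (the spacing was collapsed in
    the post, so the sample line and the variable names here are made up):

```perl
use strict;
use warnings;

# Hypothetical sample line: leading indent, then columns separated by
# runs of three or more spaces. Single spaces stay inside a column.
my $line = "   3xxxx33     00 0 00 21     CUSTOMER HAS > 1";

(my $trimmed = $line) =~ s/^\s+//;      # drop the leading indent only
my @cols = split /\s{3,}/, $trimmed;    # split where 3+ whitespace chars occur

print join(',', @cols), "\n";           # 3xxxx33,00 0 00 21,CUSTOMER HAS > 1
```

    If the columns are only two spaces apart, the quantifier would need to be
    {2,} instead.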

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Aug 4, 2004
    #2

  3. bowsayge <> writes:
    ^^^^^^^^^^^^^^^^^
    127.0.0.127.... cute!


    > my (@lines, @fields) = (<>);


    Somehow I find the technique of tacking extra variables onto the LHS
    of a list assignment in order to declare them just ugly.

    Is there really any need to slurp here anyhow? Would it not be
    simpler to read the input linewise?

    Isn't @fields being declared at the wrong scope anyhow? It should be
    inside the loop.

    > chomp @lines;
    >
    > for (@lines) {
    > $fields[0] = substr $_,7,7;
    > $fields[1] = substr $_,39,10;
    > $fields[2] = substr $_,63;


    For unpacking fixed position records you may want to consider unpack()
    as an alternative to several substr().
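
    Roughly what those three suggestions look like together. This is only a
    sketch: the offsets (7, 39, 63) come from the substr() calls quoted
    above, while the sample records and the in-memory filehandle are
    assumptions so that it runs on its own:

```perl
use strict;
use warnings;

# Hypothetical fixed-width records; the column offsets (7, 39, 63) are
# taken from the quoted substr() calls, the data itself is made up.
my $data = join '', map { sprintf("%-7s%-32s%-24s%s\n", '', @$_) } (
    [ '3xxxx33', '00 0 00 21', 'CUSTOMER HAS > 1' ],
    [ '3xxxx85', '00 0 12 09', 'TELN NOT BILL'    ],
);

# Read from an in-memory filehandle so the sketch is self-contained;
# a real script would read <> or a file instead.
open my $in, '<', \$data or die "cannot open in-memory handle: $!";

my @out;
while ( my $line = <$in> ) {            # one line at a time, no slurping
    chomp $line;
    # @fields is declared inside the loop, in the smallest scope that needs it
    my @fields = unpack '@7a7 @39a10 @63a*', $line;
    push @out, join ',', @fields;
}
close $in;

print "$_\n" for @out;
```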

    --
    \\ ( )
    . _\\__[oo
    .__/ \\ /\@
    . l___\\
    # ll l\\
    ###LL LL\\
     
    Brian McCauley, Aug 4, 2004
    #3
  4. Anno Siegel

    Anno Siegel Guest

    bowsayge <> wrote in comp.lang.perl.misc:
    > LHradowy said to us:
    >
    > [...]
    > > What I want to do is eliminate the first part of saving it as a comma
    > > separated file. I believe I can do this in perl, but I can not split on
    > > spaces since I have spaces that I need to be part of a column.

    > [...]
    >
    > You can extract substrings from your input lines like so:


    Ah, you're learning fast. This begins to look like Perl code :)
    Your solution is correct. I'll add a few comments about style and
    point out alternatives.

    I am aware, if I read your postings right, that you are rather new to
    Perl, if not to programming in general. My (and others') comments are
    brief and often have the form of directions. They're still in the spirit
    of "you can also do it this way", not of "you should have done it like this".
    So...

    > my (@lines, @fields) = (<>);


    You don't need to declare @fields here. Instead, declare it in the
    smallest possible scope, which would be the loop body.

    But even if you had to declare it here, it isn't the done thing to
    combine a mere declaration with a massive operation like slurping the
    file. Use an extra line.

    The parens around "<>" aren't needed and un-idiomatic.

    > chomp @lines;


    "chomp" can be applied to an assignment, even a list assignment. This
    *is* idiomatic:

    chomp( my @lines = <>);

    > for (@lines) {


    This would be the place to declare @fields. The array is cleared each
    time my() happens at run-time, usually what you want.

    > $fields[0] = substr $_,7,7;
    > $fields[1] = substr $_,39,10;
    > $fields[2] = substr $_,63;


    It is rare in Perl that you need to index into an array. (Hashes are
    different.) The more you think of an array as a whole, the better.
    This is certainly not a place for indexing.

    my @fields = (
    substr( $_,7,7),
    substr( ...),
    substr( ...),
    );

    But there is a better way. See below...

    > local $" = ',';


    Nothing wrong with that, especially since it's properly localized. Still,
    there's a tendency to avoid the "punctuation variables", with a few
    exceptions.

    > print "@fields\n";


    Without assignment to $"

    print join( ',', @fields), "\n";

    > }


    If you have to extract fields of fixed length at fixed positions,
    the unpack() function is the right tool. It can extract multiple
    substrings in one step.

    "pack" and "unpack" and their formats are a sub-language of its own.
    No-one memorizes all of it, but a few idioms are worth memorizing.
    One is, to extract a substring of length $length at position $pos,
    the unpack template is "@${pos}a$length". Putting it all together,
    your solution becomes

    chomp( my @lines = <DATA>);
    for ( @lines ) {
    my @fields = unpack( '@7a7 @39a10 @63a*', $_);
    print join( ', ', @fields), "\n";
    }

    Anno
     
    Anno Siegel, Aug 4, 2004
    #4
  5. "Anno Siegel" <-berlin.de> wrote in message
    news:cer6vn$8is$-Berlin.DE...
    > If you have to extract fields of fixed length at fixed positions,
    > the unpack() function is the right tool. It can extract multiple
    > substrings in one step.
    >
    > "pack" and "unpack" and their formats are a sub-language of its own.
    > No-one memorizes all of it, but a few idioms are worth memorizing.
    > One is, to extract a substring of length $length at position $pos,
    > the unpack template is "@${pos}a$length". Putting it all together,
    > your solution becomes


    You don't need both a starting position and a string length for each field
    (unpack() will pick up at the next field where it leaves off with the last).
    If you need to strip trailing spaces, use capital "A" (which is meant for
    extracting space-padded fields), rather than lowercase "a" (which returns
    the field as-is, trailing spaces and nulls included).


    >
    > chomp( my @lines = <DATA>);
    > for ( @lines ) {
    > my @fields = unpack( '@7a7 @39a10 @63a*', $_);


    For the data posted, the above happens to work the same, although this is my
    preferred way:
    my @fields = unpack( '@7 A32 A24 A*', $_);

    > print join( ', ', @fields), "\n";
    > }


    (The "@7" is for the 7 spaces at the beginning of each line. Are they there
    in the actual data, or was the example just indented?)
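
    To make the "a" versus "A" difference concrete, here is a small sketch.
    The record is hypothetical (a 10-column space-padded field followed by a
    second field) and is not taken from the posted data:

```perl
use strict;
use warnings;

# Hypothetical record: "TELN" padded to 10 columns, then a second field.
my $rec = sprintf '%-10s%s', 'TELN', 'NOT BILL';

my ($raw)     = unpack 'a10', $rec;      # "a" returns the 10 bytes untouched
my ($trimmed) = unpack 'A10', $rec;      # "A" strips trailing spaces (and nulls)
my @both      = unpack 'A10 A*', $rec;   # consecutive fields need no "@" offsets

print "[$raw]\n";                # [TELN      ]
print "[$trimmed]\n";            # [TELN]
print join('|', @both), "\n";    # TELN|NOT BILL
```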
     
    Andrew Palmer, Aug 5, 2004
    #5
  6. David Combs

    David Combs Guest

    In article <cer6vn$8is$-Berlin.DE>,
    Anno Siegel <-berlin.de> wrote:

    SNIP


    >If you have to extract fields of fixed length at fixed positions,
    >the unpack() function is the right tool. It can extract multiple
    >substrings in one step.
    >
    >"pack" and "unpack" and their formats are a sub-language of its own.
    >No-one memorizes all of it, but a few idioms are worth memorizing.
    >One is, to extract a substring of length $length at position $pos,
    >the unpack template is "@${pos}a$length". Putting it all together,
    >your solution becomes
    >
    > chomp( my @lines = <DATA>);
    > for ( @lines ) {
    > my @fields = unpack( '@7a7 @39a10 @63a*', $_);
    > print join( ', ', @fields), "\n";
    > }
    >
    >Anno



    Anno -- what are the *other* pack-unpack idioms you think worth
    memorizing?

    I bet lots of people here would like to see what you've got!

    Thanks,

    David
     
    David Combs, Aug 7, 2004
    #6
  7. Also sprach David Combs:

    > In article <cer6vn$8is$-Berlin.DE>,
    > Anno Siegel <-berlin.de> wrote:


    >>If you have to extract fields of fixed length at fixed positions,
    >>the unpack() function is the right tool. It can extract multiple
    >>substrings in one step.
    >>
    >>"pack" and "unpack" and their formats are a sub-language of its own.
    >>No-one memorizes all of it, but a few idioms are worth memorizing.
    >>One is, to extract a substring of length $length at position $pos,
    >>the unpack template is "@${pos}a$length". Putting it all together,
    >>your solution becomes
    >>
    >> chomp( my @lines = <DATA>);
    >> for ( @lines ) {
    >> my @fields = unpack( '@7a7 @39a10 @63a*', $_);
    >> print join( ', ', @fields), "\n";
    >> }
    >>
    >>Anno

    >
    >
    > Anno -- what are the *other* pack-unpack idioms you think worth
    > memorizing?


    Not that I'm Anno, but here's one that I find useful, namely the '/'
    construct. The template preceding the slash is used as a count argument
    for the template following the slash:

    # look at the first byte and extract that many
    # bytes after that (3 in this case)
    # as unsigned characters

    my @x = unpack "c/C", "\x03\x00\x01\xff\x03";
    print "@x\n";

    __END__
    0 1 255

    Note how this can be combined with @:

    my @x = unpack '@2c/C', "\x03\x00\x01\xff\x03";
    print "@x\n";
    __END__
    255
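
    The same construct is probably most often seen with a string after the
    slash, i.e. a length-prefixed string. A small sketch with made-up data:

```perl
use strict;
use warnings;

# "C/a": the unsigned byte in front ("C") gives the length of the string ("a").
# Any data after the counted string is picked up by the trailing "a*".
my ($name, $rest) = unpack 'C/a a*', "\x05helloworld";
print "$name $rest\n";              # hello world

# pack() writes the same layout: length byte first, then the string.
my $packed = pack 'C/a*', 'hello';
print length($packed), "\n";        # 6
```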

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
     
    Tassilo v. Parseval, Aug 7, 2004
    #7
  8. Anno Siegel

    Anno Siegel Guest

    David Combs <> wrote in comp.lang.perl.misc:
    > In article <cer6vn$8is$-Berlin.DE>,
    > Anno Siegel <-berlin.de> wrote:
    >
    > SNIP
    >
    >
    > >If you have to extract fields of fixed length at fixed positions,
    > >the unpack() function is the right tool. It can extract multiple
    > >substrings in one step.
    > >
    > >"pack" and "unpack" and their formats are a sub-language of its own.
    > >No-one memorizes all of it, but a few idioms are worth memorizing.
    > >One is, to extract a substring of length $length at position $pos,
    > >the unpack template is "@${pos}a$length". Putting it all together,
    > >your solution becomes
    > >
    > > chomp( my @lines = <DATA>);
    > > for ( @lines ) {
    > > my @fields = unpack( '@7a7 @39a10 @63a*', $_);
    > > print join( ', ', @fields), "\n";
    > > }
    > >
    > >Anno

    >
    >
    > Anno -- what are the *other* pack-unpack idioms you think worth
    > memorizing?
    >
    > I bet lots of people here would like to see what you've got!


    Not all that much, come to think of it. There's the bit-counting "%32b*",
    but that is advertised right in the unpack doc and needs no promotion.
    I use that one even more frequently than the substr() replacement,
    but I may be inordinately fond of bit tables.

    Another thing to keep in mind about pack/unpack (though not an idiom)
    is the possibility of reading the length of a field from the data itself
    (the "/" construct). Tassilo has also pointed this out.

    Then there's the use of grouping parentheses in a template, which applies
    a repeat count to a group of sub-templates at once. In the form
    "(<composite template>)*" this is slightly more than syntactic sugar.

    Together with the knowledge of what pack/unpack generally are about, this
    pretty much outlines the range of their applicability. The details
    can be looked up when you decide one or the other is a likely candidate.
    Very few template characters deserve to be known by heart, maybe

    b - a single bit
    a - a binary byte
    i - a native integer (native to your C compiler)
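
    For illustration, a short sketch of the bit-counting template and of a
    repeated group. The input strings are invented for the example:

```perl
use strict;
use warnings;

# Bit counting: "%32" asks for a 32-bit checksum of the unpacked items,
# and "b*" unpacks every bit, so the sum is the number of set bits.
my $bits_set = unpack '%32b*', "\xff\x01";
print "$bits_set\n";     # 9

# Grouping: "(A3 A2)*" repeats the pair of sub-templates to the end of the data.
my @pairs = unpack '(A3 A2)*', 'abc12def34';
print "@pairs\n";        # abc 12 def 34
```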

    Anno
     
    Anno Siegel, Aug 7, 2004
    #8
  9. David Combs

    David Combs Guest

    THANK YOU!

    Now, finally, I have some *real* motivation to (finally) go
    learn unpack, so I can *understand* all those tricks.

    Any way you two can convince someone (O'Reilly?) to come
    up with a "wild hacks with perl" book, and put out a
    call for donated hacks to include in it?

    Thanks again;

    David
     
    David Combs, Aug 11, 2004
    #9
  10. Also sprach David Combs:

    > Now, finally, I have some *real* motivation to (finally) go
    > learn unpack, so I can *understand* all those tricks.
    >
    > Any way you two can convince someone (O'Reilly?) to come
    > up with a "wild hacks with perl" book, and put out a
    > call for donated hacks to include in it?


    I am not sure that a book with such a title would do Perl's already
    quite infamous reputation much good. :)

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
     
    Tassilo v. Parseval, Aug 11, 2004
    #10
