Space delimited to comma delimited, with varied and needed spaces between some columns

Discussion in 'Perl Misc' started by LHradowy, Sep 20, 2004.

  1. LHradowy

    LHradowy Guest

    I have a file that looks like this...
       1555002   00 0 04 27   TELN NOT BILL
       3555007   00 0 06 00   CUSTOMER HAS > 1
       5555410   00 0 12 10   CUSTOMER HAS > 1
       6755012   00 0 12 06   CUSTOMER HAS > 1
    Notice the white spaces at beginning of the line, I DONT WANT THEM THERE
    Notice the white spaces in the 2nd and 3rd columns, I NEED THEM THERE...

    I need to create a Perl script that takes this file and makes it look like
    1555002,00 0 04 27,TELN NOT BILL
    3555007,00 0 06 00,CUSTOMER HAS > 1
    5555410,00 0 12 10,CUSTOMER HAS > 1
    6755012,00 0 12 06,CUSTOMER HAS > 1

    This output needs to be written to a file.
    I have no idea how to start. If I split on a space " ", then it will split
    the third and fourth columns up. The fourth column can basically be left alone.

    Thanks for the help.
    LHradowy, Sep 20, 2004

  2. Please see the thread "Replacing spaces" that was discussed here over
    the weekend.
    The solutions posted in the thread mentioned above will leave those alone.

    perldoc -f open
    perldoc -f print
    So, what is the distinguishing difference between the separator for the
    items in the third column on the one hand and the separator between the
    third column and the fourth column on the other hand?

    Jürgen Exner, Sep 20, 2004

  3. Shawn Corey

    Shawn Corey Guest


    If the data is in fixed columns, you can use substr.

    perldoc -f substr

    --- Shawn
    Shawn Corey, Sep 20, 2004
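
A minimal sketch of the substr idea, assuming a fixed layout that is purely hypothetical here: 3 leading spaces, a 7-character ID, 3 spaces, a 10-character code field, 3 spaces, then free text. The offsets and the name `fixed_to_csv` are illustrative and would need adjusting to the real file.

```perl
use strict;
use warnings;

# Hypothetical fixed layout (an assumption, not taken from the real file):
# 3 leading spaces, 7-char ID, 3 spaces, 10-char code field, 3 spaces, text.
my ($ID_AT, $ID_LEN, $CODE_AT, $CODE_LEN, $TEXT_AT) = (3, 7, 13, 10, 26);

# Cut the three fields out by position and join them with commas.
sub fixed_to_csv {
    my ($line) = @_;
    chomp $line;
    my $id   = substr($line, $ID_AT,   $ID_LEN);
    my $code = substr($line, $CODE_AT, $CODE_LEN);
    my $text = substr($line, $TEXT_AT);    # everything up to end of line
    return join(',', $id, $code, $text);
}

# Build a sample line with explicit gaps so the spacing is unambiguous.
my $gap = ' ' x 3;
print fixed_to_csv("${gap}1555002${gap}00 0 04 27${gap}TELN NOT BILL\n"), "\n";
```

Because the fields are taken by position, the single spaces inside the code field and the text are never touched.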
  4. Ian Wilson

    Ian Wilson Guest

    If the data always has multiple spaces (ASCII 32) between fields, I'd
    try stripping the leading spaces and then converting >1 consecutive
    spaces to commas:

    perl -p -e 's/^ +//; s/ {2,}/,/g' oldfile > newfile

    But I expect Shawn's substr solution to be more robust. Using unpack may
    be another useful approach.
    Ian Wilson, Sep 20, 2004
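
For the unpack route mentioned above, a rough sketch might look like the following. It assumes a hypothetical fixed layout (3 leading spaces, 7-character ID, 3 spaces, 10-character code field, 3 spaces, then text); the template and the name `unpack_to_csv` are illustrative only.

```perl
use strict;
use warnings;

# Template for an assumed layout: skip 3 bytes, take 7, skip 3, take 10,
# skip 3, take the rest. 'A' fields have trailing spaces stripped.
my $TEMPLATE = 'x3 A7 x3 A10 x3 A*';

sub unpack_to_csv {
    my ($line) = @_;
    chomp $line;
    return join(',', unpack($TEMPLATE, $line));
}

# Build a sample line with explicit gaps so the spacing is unambiguous.
my $gap = ' ' x 3;
print unpack_to_csv("${gap}5555410${gap}00 0 12 10${gap}CUSTOMER HAS > 1\n"), "\n";
```

Note that unpack dies if a line is shorter than the skipped positions, so short or blank lines would need to be filtered out first.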
  5. Tore Aursand

    Tore Aursand Guest

    If we skip everything that has got to do with the file(s), here's a
    suggestion (untested);

    while ( <DATA> ) {
        chomp; # Get rid of line breaks
        s,^\s+,,; # Remove leading spaces
        my @cols = split( /\s+{2,}/, $_ ); # Split on two (or more) spaces
        print join( ',', @cols ) . "\n";
    }
    Tore Aursand, Sep 20, 2004
  6. > my @cols = split( /\s+{2,}/, $_ );
    ------------------------^^^^^

    Maybe you should have tested it... ;-)
    Gunnar Hjalmarsson, Sep 20, 2004
  7. LHradowy

    LHradowy Guest

    Ahhh, I think I am forgetting something, THIS is exactly what I want!
    But I am getting an error when I run it, and my skills at perl are weak.

    use strict;
    use warnings;

    while (<>) {
        chomp; # Remove the trailing newline
        s,^\s+,,; # Remove leading spaces
        my @cols=split(/\s+{2,}/,$_); # Split on two (or more) spaces
        print join (',',@cols)."\n";
    }

    $ ./ file
    Nested quantifiers in regex; marked by <-- HERE in m/\s+{ <-- HERE 2,}/ at
    ../ line 10.
    LHradowy, Sep 21, 2004
  8. LHradowy

    LHradowy Guest

    I like this but I get nothing back in the new file. And I have no tabs they
    are all spaces.
    LHradowy, Sep 21, 2004
  9. Tore Aursand

    Tore Aursand Guest

    You are so right, Gunnar, and I'm terribly sorry. The correct split()
    should - of course - look like this:

    my @cols = split( /\s{2,}/, $_ );

    Still untested, though. :)
    Tore Aursand, Sep 21, 2004
  10. Tore Aursand

    Tore Aursand Guest

    My fault. Don't split on '\s+{2,}', but on '\s{2,}';

    my @cols = split( /\s{2,}/, $_ );
    Tore Aursand, Sep 21, 2004
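
Putting the corrected split into a small helper, a sketch could look like this. The name `split_to_csv` and the sample line are illustrative, and it assumes the real columns are always separated by at least two whitespace characters.

```perl
use strict;
use warnings;

sub split_to_csv {
    my ($line) = @_;
    chomp $line;                         # drop the trailing newline
    $line =~ s/^\s+//;                   # drop leading whitespace
    my @cols = split /\s{2,}/, $line;    # split on runs of 2+ whitespace
    return join(',', @cols);
}

# Build a sample line with explicit gaps so the spacing is unambiguous.
my $gap = ' ' x 3;
print split_to_csv("${gap}3555007${gap}00 0 06 00${gap}CUSTOMER HAS > 1\n"), "\n";
```

Single spaces inside a column never match `\s{2,}`, so they survive the split untouched.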
  11. Anno Siegel

    Anno Siegel Guest

    while ( <DATA> ) {
        my @l = split;
        print join( ',', $l[ 0], "@l[ 1 .. 4]", "@l[ 5 .. $#l]"), "\n";
    }

    Anno Siegel, Sep 21, 2004
  12. I get the idea I may be oversimplifying or misunderstanding some part
    of this question, but if there is a uniform number of columns, and of
    components within the columns, a simple regex should do it, and it's
    just a matter of reconstructing it with a print statement with the
    spacing you want.

    perl -pi.bak -e 's/^\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(.*)/$1,$2 $3 $4 $5,$6/g' spaces

    In my first pass the long and ugly oneliner above did it for me when I
    cut and pasted your file snippet into a file called spaces. This
    edited in place and copied the old file to spaces.bak.
    If there's a need to write it to a file of another name, the same regex
    can be wrapped in a script that opens the infile for reading and the
    outfile for writing.

    How about it? Am I misunderstanding something here?
    Larry Felton Johnson, Sep 21, 2004
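
One way the script version described above might look, as a sketch only: the sub name `convert_file` is made up for illustration, and the file names come from the command line. Unlike the one-liner, it uses `^\s*` so lines without leading spaces also match, and lines that do not match are copied through unchanged.

```perl
use strict;
use warnings;

# Read $in_name line by line and write the comma-separated form to $out_name.
sub convert_file {
    my ($in_name, $out_name) = @_;
    open my $in,  '<', $in_name  or die "Cannot read $in_name: $!";
    open my $out, '>', $out_name or die "Cannot write $out_name: $!";
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^\s*(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(.*)/) {
            print {$out} "$1,$2 $3 $4 $5,$6\n";
        }
        else {
            print {$out} "$line\n";    # pass unmatched lines through unchanged
        }
    }
    close $in;
    close $out or die "Cannot close $out_name: $!";
    return;
}

# Run only when called with exactly two arguments: infile and outfile.
convert_file(@ARGV) if @ARGV == 2;
```

Saved under a name of your choosing, it would be run with the input file and output file as its two arguments.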
  13. Ian Wilson

    Ian Wilson Guest

    C:\> type oldname.txt
       1555002   00 0 04 27   TELN NOT BILL
       3555007   00 0 06 00   CUSTOMER HAS > 1
       5555410   00 0 12 10   CUSTOMER HAS > 1
       6755012   00 0 12 06   CUSTOMER HAS > 1

    C:\> perl -p -e "s/^ +//; s/ {2,}/,/g" oldname.txt
    1555002,00 0 04 27,TELN NOT BILL
    3555007,00 0 06 00,CUSTOMER HAS > 1
    5555410,00 0 12 10,CUSTOMER HAS > 1
    6755012,00 0 12 06,CUSTOMER HAS > 1

    I recall some versions of Perl on some versions of Windows have problems
    with redirecting STDOUT to a file from a command prompt / DOS window.
    Maybe you have one of those combinations?
    Ian Wilson, Sep 21, 2004
  14. A couple of follow-up things. My g option above (after the last '/')
    was a typo. It didn't hurt or help, but was superfluous.

    The second is that the whole approach to looking at lines in a file
    like this bears a little bit of discussion. When I looked at the
    lines, the first thing that entered my mind wasn't "How do I get rid
    of the spaces?" but "What always seems to be true about these lines?"

    Basically you're looking at a line like this

    some spaces, some digits, space, digits, space, digits, space, digits,
    space, digits, space, some variable text with no necessity to format.

    I could have used \d+ instead of \w+, but everything in the match
    breaks down to
    \w+, \s+ or .*

    So there are only three types of things to match, digits, spaces and
    the "everything else" trailing at the end.

    Given this a number of the approaches people have given will all work:
    splitting into an array, substr (if the positions are uniform) and
    unpack (if the positions are uniform). The task is to capture the
    nonspace stuff into usable variables and print them out with inserted
    whitespace and any punctuation or labeling characters you choose.
    This mental approach gives you much more control over the formatting
    and use of the data than thinking of it as
    simply not wanting the spaces at the beginning of line, but wanting to
    preserve some of the spaces in the middle.
    Larry Felton Johnson, Sep 22, 2004
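
To illustrate the "capture into usable variables" approach, here is a hedged sketch using `\d+` for the numeric parts. The names `parse_line` and `format_record`, and the hash layout, are invented for this example. Lines that do not fit the pattern are reported rather than silently reformatted.

```perl
use strict;
use warnings;

# Capture the ID, the four numeric codes, and the trailing text.
# Returns a hash reference, or nothing if the line does not fit the pattern.
sub parse_line {
    my ($line) = @_;
    my ($id, $c1, $c2, $c3, $c4, $text) =
        $line =~ /^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*?)\s*$/
        or return;
    return { id => $id, codes => [ $c1, $c2, $c3, $c4 ], text => $text };
}

# Rebuild the record with whatever separators are wanted.
sub format_record {
    my ($rec) = @_;
    return join(',', $rec->{id}, "@{ $rec->{codes} }", $rec->{text});
}

# Build a sample line with explicit gaps so the spacing is unambiguous.
my $gap = ' ' x 3;
for my $line ("${gap}6755012${gap}00 0 12 06${gap}CUSTOMER HAS > 1\n",
              "not a data line\n") {
    my $rec = parse_line($line);
    print $rec ? format_record($rec) . "\n" : "skipped: $line";
}
```

Once the pieces are in named fields, changing the output (different separators, labels, reordered columns) only means editing `format_record`.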
  15. LHradowy

    LHradowy Guest

    I want to thank all of you who have spent time on this problem. What a
    tremendous response!
    LHradowy, Sep 22, 2004