Space delimited to comma delimited, with varied spacing and spaces needed between some columns

LHradowy

I have a file that looks like this...
   1555002  00 0 04 27  TELN NOT BILL
   3555007  00 0 06 00  CUSTOMER HAS > 1
   5555410  00 0 12 10  CUSTOMER HAS > 1
   6755012  00 0 12 06  CUSTOMER HAS > 1

Notice the white spaces at the beginning of the line, I DON'T WANT THEM THERE
Notice the white spaces in the 2nd and 3rd columns, I NEED THEM THERE...

I need to create a Perl script that takes this file and makes it look like
this
1555002,00 0 04 27,TELN NOT BILL
3555007,00 0 06 00,CUSTOMER HAS > 1
5555410,00 0 12 10,CUSTOMER HAS > 1
6755012,00 0 12 06,CUSTOMER HAS > 1

This output needs to be written to a file.
I have no idea how to start. If I split on a space " " then it will split the
third and fourth columns up. The fourth column can basically be left alone.

Thanks for the help.
 
Jürgen Exner

LHradowy said:
I have a file that looks like this...
   1555002  00 0 04 27  TELN NOT BILL
   3555007  00 0 06 00  CUSTOMER HAS > 1
   5555410  00 0 12 10  CUSTOMER HAS > 1
   6755012  00 0 12 06  CUSTOMER HAS > 1

Notice the white spaces at the beginning of the line, I DON'T WANT THEM
THERE

Please see the thread "Replacing spaces" that was discussed here over the
weekend.
Notice the white spaces in the 2nd and 3rd columns, I NEED THEM
THERE...

The solutions posted in the thread mentioned above will leave those alone.

I need to create a Perl script that takes this file

perldoc -f open
perldoc perlop (and check for said:
and makes it look like this
1555002,00 0 04 27,TELN NOT BILL
3555007,00 0 06 00,CUSTOMER HAS > 1
5555410,00 0 12 10,CUSTOMER HAS > 1
6755012,00 0 12 06,CUSTOMER HAS > 1

This output needs to be written to a file.

perldoc -f open
perldoc -f print
I have no idea how to start. If I split on a space " " then it will
split the third and fourth columns up. The fourth column can basically
be left alone.

So, what is the distinguishing difference between the separator for the
items in the third column on the one hand and the separator between the
third column and the fourth column on the other hand?

jue
 
Shawn Corey

Hi,

If the data is in fixed columns, you can use substr.

perldoc -f substr

--- Shawn
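
A minimal sketch of the substr idea. The column positions used here (3 leading spaces, a 7-character ID, 2 spaces, a 10-character code field, 2 spaces, then free text) are an assumption made for illustration, as is the name fixed_to_csv; the real offsets would have to be measured from the actual file.

```perl
use strict;
use warnings;

# ASSUMED layout (hypothetical, not confirmed by the original post):
#   offset  0, length  3: leading spaces (discarded)
#   offset  3, length  7: ID
#   offset 12, length 10: code field, inner spaces kept
#   offset 24 to the end: free text
sub fixed_to_csv {
    my ($line) = @_;
    chomp $line;
    my $id   = substr $line, 3, 7;
    my $code = substr $line, 12, 10;
    my $text = substr $line, 24;
    return join ',', $id, $code, $text;
}

print fixed_to_csv("   1555002  00 0 04 27  TELN NOT BILL\n"), "\n";
```

This only works if every line uses exactly the same positions; a line shorter than the assumed layout would trigger a "substr outside of string" warning.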
 
Ian Wilson

LHradowy said:
I have a file that looks like this...
   1555002  00 0 04 27  TELN NOT BILL
   3555007  00 0 06 00  CUSTOMER HAS > 1
   5555410  00 0 12 10  CUSTOMER HAS > 1
   6755012  00 0 12 06  CUSTOMER HAS > 1

Notice the white spaces at the beginning of the line, I DON'T WANT THEM THERE
Notice the white spaces in the 2nd and 3rd columns, I NEED THEM THERE...

I need to create a Perl script that takes this file and makes it look like
this
1555002,00 0 04 27,TELN NOT BILL
3555007,00 0 06 00,CUSTOMER HAS > 1
5555410,00 0 12 10,CUSTOMER HAS > 1
6755012,00 0 12 06,CUSTOMER HAS > 1

This output needs to be written to a file.
I have no idea how to start. If I split on a space " " then it will split the
third and fourth columns up. The fourth column can basically be left alone.

Thanks for the help.

If the data always has multiple spaces (ASCII 32) between fields, I'd
try stripping the leading spaces and then converting >1 consecutive
spaces to commas:

perl -e -p 's/^ +//; s/  +/,/g' oldfile > newfile

But I expect Shawn's substr solution to be more robust. Using unpack may
be another useful approach.
 
Tore Aursand

I have a file that looks like this...
   1555002  00 0 04 27  TELN NOT BILL
   3555007  00 0 06 00  CUSTOMER HAS > 1
   5555410  00 0 12 10  CUSTOMER HAS > 1
   6755012  00 0 12 06  CUSTOMER HAS > 1

Notice the white spaces at the beginning of the line, I DON'T WANT THEM THERE
Notice the white spaces in the 2nd and 3rd columns, I NEED THEM THERE...

I need to create a Perl script that takes this file and makes it look like
this
1555002,00 0 04 27,TELN NOT BILL
3555007,00 0 06 00,CUSTOMER HAS > 1
5555410,00 0 12 10,CUSTOMER HAS > 1
6755012,00 0 12 06,CUSTOMER HAS > 1

If we skip everything that has got to do with the file(s), here's a
suggestion (untested);

while ( <DATA> ) {
    chomp; # Get rid of line breaks
    s,^\s+,,; # Remove leading spaces
    my @cols = split( /\s+{2,}/, $_ ); # Split on two (or more) spaces
    print join( ',', @cols ) . "\n";
}
 
Gunnar Hjalmarsson

Tore said:
If we skip everything that has got to do with the file(s), here's a
suggestion (untested);

while ( <DATA> ) {
    chomp; # Get rid of line breaks
    s,^\s+,,; # Remove leading spaces
    my @cols = split( /\s+{2,}/, $_ ); # Split on two (or more) spaces
-------------------------^^^^^

Maybe you should have tested it... ;-)
 
LHradowy

Tore Aursand said:
If we skip everything that has got to do with the file(s), here's a
suggestion (untested);

while ( <DATA> ) {
    chomp; # Get rid of line breaks
    s,^\s+,,; # Remove leading spaces
    my @cols = split( /\s+{2,}/, $_ ); # Split on two (or more) spaces
    print join( ',', @cols ) . "\n";
}


Ahhh, I think I am forgetting something, THIS is exactly what I want!
But I am getting an error when I run it, and my skills at perl are weak.
#!/opt/perl/bin/perl

use strict;
use warnings;


while (<>) {
    chomp; # Remove the trailing newline
    s,^\s+,,; # Remove leading spaces
    my @cols=split(/\s+{2,}/,$_); # Split on two (or more) spaces
    print join (',',@cols)."\n";
}

user@server$ ./test.pl file
Nested quantifiers in regex; marked by <-- HERE in m/\s+{ <-- HERE 2,}/ at
./test.pl line 10.
 
LHradowy

Ian Wilson said:
If the data always has multiple spaces (ASCII 32) between fields, I'd
try stripping the leading spaces and then converting >1 consecutive
spaces to commas:

perl -e -p 's/^ +//; s/  +/,/g' oldfile > newfile

But I expect Shawn's substr solution to be more robust. Using unpack may
be another useful approach.

I like this, but I get nothing back in the new file. And I have no tabs; they
are all spaces.
 
Tore Aursand

-----------------------------^^^^^

Maybe you should have tested it... ;-)

You are so right, Gunnar, and I'm terribly sorry. The correct split()
should - of course - look like this:

my @cols = split( /\s{2,}/, $_ );

Still untested, though. :)
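
For completeness, here is a sketch that puts the corrected split() together with the file handling the original question asked for. It assumes the fields are always separated by two or more whitespace characters, and the names to_csv and convert_file are hypothetical helpers introduced only for this example.

```perl
use strict;
use warnings;

# Turn one input line into a comma-separated line.
# ASSUMPTION: fields are separated by two or more whitespace characters,
# while spaces inside a field are always single.
sub to_csv {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^\s+//;                       # remove leading whitespace
    return join ',', split /\s{2,}/, $line;  # split on 2+ whitespace chars
}

# Read the file named $in_name and write the converted lines to $out_name.
sub convert_file {
    my ($in_name, $out_name) = @_;
    open my $in,  '<', $in_name  or die "Cannot read $in_name: $!";
    open my $out, '>', $out_name or die "Cannot write $out_name: $!";
    print {$out} to_csv($_), "\n" while <$in>;
    close $in;
    close $out or die "Cannot close $out_name: $!";
}

# Only runs when exactly two file names are given on the command line.
convert_file(@ARGV) if @ARGV == 2;
```

Saved as, say, convert.pl, it could be run as "perl convert.pl oldfile newfile". A line whose fields are separated by only a single space would not be split correctly, which is the main limit of this approach.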
 
Anno Siegel

LHradowy said:
I have a file that looks like this...
   1555002  00 0 04 27  TELN NOT BILL
   3555007  00 0 06 00  CUSTOMER HAS > 1
   5555410  00 0 12 10  CUSTOMER HAS > 1
   6755012  00 0 12 06  CUSTOMER HAS > 1

Notice the white spaces at the beginning of the line, I DON'T WANT THEM THERE
Notice the white spaces in the 2nd and 3rd columns, I NEED THEM THERE...

I need to create a Perl script that takes this file and makes it look like
this
1555002,00 0 04 27,TELN NOT BILL
3555007,00 0 06 00,CUSTOMER HAS > 1
5555410,00 0 12 10,CUSTOMER HAS > 1
6755012,00 0 12 06,CUSTOMER HAS > 1

This output needs to be written to a file.
I have no idea how to start. If I split on a space " " then it will split the
third and fourth columns up. The fourth column can basically be left alone.

while ( <DATA> ) {
    my @l = split;
    print join( ',', $l[ 0], "@l[ 1 .. 4]", "@l[ 5 .. $#l]"), "\n";
}

Anno
 
Larry Felton Johnson

LHradowy said:
I have a file that looks like this...
   1555002  00 0 04 27  TELN NOT BILL
   3555007  00 0 06 00  CUSTOMER HAS > 1
   5555410  00 0 12 10  CUSTOMER HAS > 1
   6755012  00 0 12 06  CUSTOMER HAS > 1

Notice the white spaces at the beginning of the line, I DON'T WANT THEM THERE
Notice the white spaces in the 2nd and 3rd columns, I NEED THEM THERE...

I need to create a Perl script that takes this file and makes it look like
this
1555002,00 0 04 27,TELN NOT BILL
3555007,00 0 06 00,CUSTOMER HAS > 1
5555410,00 0 12 10,CUSTOMER HAS > 1
6755012,00 0 12 06,CUSTOMER HAS > 1

This output needs to be written to a file.
I have no idea how to start. If I split on a space " " then it will split the
third and fourth columns up. The fourth column can basically be left alone.

Thanks for the help.

I get the idea I may be oversimplifying or misunderstanding some part
of this question, but if there is a uniform number of columns, and of
components within the columns, a simple regex should do it, and it's a
matter of just reconstructing it with a print statement with the
spacing you want.

perl -pi.bak -e 's/^\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(.*)/$1,$2 $3 $4 $5,$6/g' spaces

In my first pass the long and ugly one-liner above did it for me when I
cut and pasted your file snippet into a file called spaces. This
edited the file in place and copied the old file to spaces.bak.
If there's a need to write it to a file of another name, the same regex
could be wrapped in a script opening the infile for reading and the
outfile for writing.

How about it? Am I misunderstanding something here?
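
The "wrapped in a script" variant mentioned above might look roughly like this. It is only a sketch: the sub name reformat_line is made up for the example, the pattern uses ^\s* instead of ^\s+ so that lines without leading spaces also match, and lines that do not fit the pattern are passed through unchanged.

```perl
use strict;
use warnings;

# Capture the ID, the four code groups and the trailing text, then
# rebuild the line with commas between the three output columns.
sub reformat_line {
    my ($line) = @_;
    chomp $line;
    if ($line =~ /^\s*(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(.*)$/) {
        return "$1,$2 $3 $4 $5,$6";
    }
    return $line;    # no match: leave the line as it is
}

# Input and output file names come from the command line.
my ($in_name, $out_name) = @ARGV;
if (defined $in_name and defined $out_name) {
    open my $in,  '<', $in_name  or die "Cannot read $in_name: $!";
    open my $out, '>', $out_name or die "Cannot write $out_name: $!";
    while (my $line = <$in>) {
        print {$out} reformat_line($line), "\n";
    }
    close $in;
    close $out or die "Cannot close $out_name: $!";
}
```

Unlike the in-place one-liner, this leaves the input file untouched and needs no backup copy.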
 
Ian Wilson

LHradowy said:
I like this but I get nothing back in the new file. And I have no
tabs they are all spaces.

C:\> type oldname.txt
   1555002  00 0 04 27  TELN NOT BILL
   3555007  00 0 06 00  CUSTOMER HAS > 1
   5555410  00 0 12 10  CUSTOMER HAS > 1
   6755012  00 0 12 06  CUSTOMER HAS > 1

C:\> perl -p -e "s/^ +//; s/  +/,/g" oldname.txt
1555002,00 0 04 27,TELN NOT BILL
3555007,00 0 06 00,CUSTOMER HAS > 1
5555410,00 0 12 10,CUSTOMER HAS > 1
6755012,00 0 12 06,CUSTOMER HAS > 1

I recall some versions of Perl on some versions of Windows have problems
with redirecting STDOUT to a file from a command prompt / DOS window.
Maybe you have one of those combinations?
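
Another possible explanation, independent of Windows: the first version of the one-liner had the switches in the order -e -p. Since -e takes the argument that follows it as the program text, "-p" was probably treated as the program rather than as a switch, so nothing was ever printed. With -p -e, perl wraps the code in a loop that reads each line and prints it afterwards. A rough hand-written equivalent, assuming fields are separated by two or more spaces (the sub name convert is only for illustration):

```perl
use strict;
use warnings;

# The two substitutions from the one-liner, applied to a single line.
sub convert {
    my ($line) = @_;
    $line =~ s/^ +//;       # strip leading spaces
    $line =~ s/ {2,}/,/g;   # each run of two or more spaces becomes a comma
    return $line;
}

# The -p switch supplies a loop like this one: read a line, run the code,
# print the result. The guard on @ARGV is an addition for this sketch so
# that it does not wait on standard input when no file name is given.
if (@ARGV) {
    while (my $line = <>) {
        print convert($line);
    }
}
```

Run as "perl script.pl oldfile > newfile", this should behave like the -p -e form shown above.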
 
Larry Felton Johnson

perl -pi.bak -e 's/^\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(.*)/$1,$2 $3 $4 $5,$6/g' spaces

A couple of follow-up things. My g option above (after the last '/')
was a typo. It didn't hurt or help, but was superfluous.

The second is that the whole approach to looking at lines in a file
like this bears a little bit of discussion. When I looked at the
lines, the first thing that entered my mind wasn't "How do I get rid
of the spaces?" but "What always seems to be true about these lines?"

Basically you're looking at a line like this

some spaces, some digits, space, digits, space, digits, space, digits,
space, digits, space, some variable text with no necessity to format.

I could have used \d+ instead of \w+, but everything in the match
breaks down to \w+, \s+ or .*

So there are only three types of things to match, digits, spaces and
the "everything else" trailing at the end.

Given this, a number of the approaches people have given will all work:
regex, splitting into an array, substr (if the positions are uniform)
and unpack (if the positions are uniform). The task is to capture the
nonspace stuff into usable variables and print them out with inserted
whitespace and any punctuation or labeling characters you choose.
This mental approach gives you much more control over the formatting
and use of the data than thinking of it as simply not wanting the
spaces at the beginning of the line, but wanting to preserve some of
the spaces in the middle.
 
LHradowy

I want to thank all of you who have spent time on this problem. What a
tremendous response!
 
