MichaelC
Hi all. I am having a particularly difficult time with a Perl script that I
am writing. The problem area is a place where I need to strip some newlines
out of a file.
My source data is text which is in paragraph form, but has line breaks
within the paragraphs. I need to do as much processing as possible in order
to minimise the number of manual changes that I have to make.
Sample text is as follows:
"This document is intended to give you an
overview of DG as well as highlight some of
the features. This is a brought to your handheld using DG."
With DG you can view and edit word processing and spreadsheet files on
your handheld. Simple push-button synchronization of
the handheld with the desktop will maintain the most up-to-date
version of a file on both the desktop and handheld.
I want these to be parsed as follows:
"This document is intended to give you an overview of DG as well as
highlight some of the features. This is a brought to your handheld using
DG." With DG you can view and edit word processing and spreadsheet files on
your handheld. Simple push-button synchronization of the handheld with the
desktop will maintain the most up-to-date version of a file on both the
desktop and handheld.
--
One way that I thought might work is to catch all lines that begin upper
case, prepend them with a line break, strip the trailing break, then trap
all lines that start lower case and dump them as-is. Repeat this until no
matches are made on the lower case test, then clean up all those extra line
breaks.
I came up with this . . . but all it seems to do is strip all newlines out.
while( <infl> ) {
    my $x = $_;
    if ( $x =~ ?^[^a-z]? ) { $x =~ s!(.*)\n!\n\1 ! }
    else { $x =~ s!(.*)\n!\1 ! }
    print outfl $x;
}
Any help would be greatly appreciated.
Michael
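
A note on the code above, offered as a sketch rather than a definitive fix: the test `?^[^a-z]?` uses `?` as the pattern delimiter, and Perl treats `?PATTERN?` as a match-once operator that succeeds only one time until reset() is called. That may explain why every line after the first falls through to the else branch and loses its newline. Using `\1` on the replacement side also draws a warning; `$1` is the usual form there. The version below uses an ordinary `/.../` match. The helper name `join_wrapped_lines` and the rule "a line starting with a lowercase letter continues the previous paragraph" are assumptions for illustration, taken from the approach described in the post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): joins hard-wrapped lines
# into paragraphs. Assumption: a line whose first character is a lowercase
# letter continues the previous paragraph; any other line starts a new one.
sub join_wrapped_lines {
    my @lines = @_;    # work on a copy so the caller's data is untouched
    my $out   = '';
    for my $line (@lines) {
        chomp $line;
        next if $line eq '';    # ignore blank lines
        if ( $out ne '' && $line =~ /^[a-z]/ ) {
            $out .= " $line";              # continuation: join with a space
        }
        else {
            $out .= "\n" if $out ne '';    # close the previous paragraph
            $out .= $line;
        }
    }
    $out .= "\n" if $out ne '';
    return $out;
}

# Sample lines taken from the post.
my @sample = (
    "\"This document is intended to give you an\n",
    "overview of DG as well as highlight some of\n",
    "the features. This is a brought to your handheld using DG.\"\n",
    "With DG you can view and edit word processing and spreadsheet files on\n",
    "your handheld. Simple push-button synchronization of\n",
    "the handheld with the desktop will maintain the most up-to-date\n",
    "version of a file on both the desktop and handheld.\n",
);
print join_wrapped_lines(@sample);
```

With real files, the same sub could be fed from a filehandle, for example `print OUT join_wrapped_lines(<IN>);`. The heuristic will split a paragraph wrongly whenever a wrapped line happens to begin with a capital letter, a digit, or a quote, so some manual cleanup is still likely.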