How to extract part of the text (htm) file after start word until end word?

Guest

How to extract part of the text (htm) file after start word until end word?

start word is <! start >
end word is <! end>
E.g. some.html:
-----------------------------------
not interested part of file...
not interested part of file... <! start >Interested
part
123456789<! end> not interested part..
------------------------------------


Wanted result:

"Interested
part
123456789"

Thanks
 
Guest

(e-mail address removed) wrote:
: How to extract part of the text (htm) file after start word until end word?

: start word is <! start >
: end word is <! end>
: Eg: some. html
: -----------------------------------
: not interested part of file...
: not interested part of file... <! start >Interested
: part
: 123456789<! end> not interested part..
: ------------------------------------

Slurp the whole file at once (search perldoc perlvar for $/) by saying

local $/;
local $_ = <FH>;
if ( /<! start>(.*)<! end>/ ) {
$text=$1;
}
print $text;

Build a loop around this construct if you have more than one start..end
segment per file.

Oliver.
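
For reference, here is a minimal sketch of how the value of `$/` changes what a single read returns. It reads from an in-memory filehandle instead of a real file, and both the sample text and the helper name `read_first_record` are made up for illustration. Note that an undefined `$/` (what `local $/;` produces) is slurp mode, which reads the whole file; paragraph mode is the empty string.

```perl
use strict;
use warnings;

# Hypothetical sample data; a real script would open the .html file instead.
my $sample = "line one\nline two\n\nsecond paragraph\n";

# Read the first record from the sample using the given record separator.
sub read_first_record {
    my ($separator) = @_;
    open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
    local $/ = $separator;    # restored automatically when the sub returns
    my $record = <$fh>;
    close $fh;
    return $record;
}

my $line      = read_first_record("\n");     # default: a single line
my $paragraph = read_first_record("");       # paragraph mode: up to a blank line
my $whole     = read_first_record(undef);    # slurp mode: the entire file

printf "line: %d chars, paragraph: %d chars, whole: %d chars\n",
    length $line, length $paragraph, length $whole;
```

Only the slurped string is guaranteed to contain both markers when the interesting part spans several lines or paragraphs.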
 
Tad McClellan

(e-mail address removed) wrote:
: How to extract part of the text (htm) file after start word until end word?

: start word is <! start >
: end word is <! end>
: Eg: some. html
: -----------------------------------
: not interested part of file...
: not interested part of file... <! start >Interested
: part
: 123456789<! end> not interested part..
: ------------------------------------

: Slurp the whole file at once (search perldoc perlvar for $/) by saying
:
: local $/;
: local $_ = <FH>;
: if ( /<! start>(.*)<! end>/ ) {

if ( /<! start>(.*)<! end>/s ) { # interesting part contains newlines

: $text=$1;
: }
: print $text;
:
: Build a loop around this construct if you have more than one start..end
: segment per file.


If there is more than one, then you'd better make that:

if ( /<! start>(.*?)<! end>/s ) { # non-greedy
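
Putting the two corrections together, a loop over every segment might look like the sketch below. It is only an illustration: the sub name `extract_segments` and the second segment in the sample are invented, and the `\s*` pieces are an added assumption so that the pattern also matches the `<! start >` marker (with a space before `>`) exactly as it appears in the question.

```perl
use strict;
use warnings;

# Return every chunk of text found between a start and an end marker.
# /s lets "." match newlines, ".*?" stops at the nearest end marker,
# and /g resumes the search after each match.
sub extract_segments {
    my ($html) = @_;
    my @segments;
    while ( $html =~ /<!\s*start\s*>(.*?)<!\s*end\s*>/sg ) {
        push @segments, $1;
    }
    return @segments;
}

# Sample input modelled on the question, with a second segment added.
my $html = <<'END_HTML';
not interested part of file...
not interested part of file... <! start >Interested
part
123456789<! end> not interested part..
<! start >second<! end>
END_HTML

print "[$_]\n" for extract_segments($html);
```

With a greedy `.*` the first capture would run from the first start marker to the last end marker, swallowing everything in between.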
 
Tad McClellan

Aukjan van Belkum said:
if ( m/<\! start \!>/ .. m/<\! end \!>/){


There is no upside to gratuitous backslashing.

Exclamation marks are not special in regular expressions, so there
is no need to backslash them.

(and your patterns do not match the strings the OP posted.)
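
A small sketch to illustrate both points, using the markers as the OP posted them. The sub name `lines_in_range` and the sample lines are made up for the example. It also shows a limitation of the line-by-line range operator approach: it keeps whole lines, so the uninteresting text around the markers comes along.

```perl
use strict;
use warnings;

# The backslash makes no difference: both patterns match the same string.
my $marker  = 'some text <! start > more text';
my $plain   = $marker =~ /<! start >/  ? 1 : 0;
my $escaped = $marker =~ /<\! start >/ ? 1 : 0;
print "plain=$plain escaped=$escaped\n";

# Keep every line from the one holding the start marker through the one
# holding the end marker, using the range (flip-flop) operator.
sub lines_in_range {
    my (@lines) = @_;
    my @kept;
    for (@lines) {
        push @kept, $_ if /<! start >/ .. /<! end>/;
    }
    return @kept;
}

my @lines = (
    "not interested part of file...",
    "not interested part of file... <! start >Interested",
    "part",
    "123456789<! end> not interested part..",
);

print "kept: $_\n" for lines_in_range(@lines);
```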
 
Guest

: if ( /<! start>(.*)<! end>/s ) { # interesting part contains newlines

Thanks for the correction, I felt I was missing something.

: If there is more than one, then you'd better make that:

: if ( /<! start>(.*?)<! end>/s ) { # non-greedy

And thank you for that, too.

Oliver.
 
