Replacing Regex with part of itself


Hal Vaughan

I know there's a way to do this, and I know it involves special uses of a
regex, but I can't remember the terms that apply, so I'm having trouble
searching for it. I want to take a line in a malformed HTML page like:

<OPTION Value = '1' >Book_Title_1
<OPTION Value = '2' >Book_Title_2

so it'll look like:

<OPTION Value = '1' >Book_Title_1</OPTION>
<OPTION Value = '2' >Book_Title_2</OPTION>

I know I can find the pattern by looking for something like:

$htmlpage =~ /<OPTION.*?>.*?$/

I THINK I remember that I can capture the wildcard part of the regex like:

$htmlpage =~ /<OPTION(.*?)>(.*?)$/

But when I try a substitution:

$htmlpage =~ s/<OPTION(.*?)>(.*?)$/<OPTION.*?>.*?</OPTION>$/g;

how do I get the selected sections from the search part to be included in
the replace part?

Thanks for any help on this. I'm not even sure what the name is for the
type of search/replace I'm trying to do!

Hal
 

Anno Siegel

Hal Vaughan said:
> I know there's a way to do this, and I know it involves special uses of a
> regex, but I can't remember the terms that apply, so I'm having trouble
> searching for it. I want to take a line in a malformed HTML page like:

"Capture" is the word you're looking for.

> <OPTION Value = '1' >Book_Title_1
> <OPTION Value = '2' >Book_Title_2

> so it'll look like:
>
> <OPTION Value = '1' >Book_Title_1</OPTION>
> <OPTION Value = '2' >Book_Title_2</OPTION>
>
> I know I can find the pattern by looking for something like:
>
> $htmlpage =~ /<OPTION.*?>.*?$/
>
> I THINK I remember that I can capture the wildcard part of the regex like:
>
> $htmlpage =~ /<OPTION(.*?)>(.*?)$/

The () are capturing parentheses. The "?" after each ".*" makes the
match non-greedy. Why do you think you need that? The final "$" is
also unnecessary.
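A quick way to check the question about non-greedy matching is to run both forms against one of the sample lines. This is only a sketch; the variable names are illustrative. In list context a successful match returns the captured groups ($1, $2) directly. Because each sample line contains a single ">", the non-greedy pattern from the question and a greedy pattern without the trailing "$" capture the same text.

```perl
use strict;
use warnings;

my $line = "<OPTION Value = '1' >Book_Title_1";

# Non-greedy, as in the question: (.*?) grows only until the first ">",
# and the trailing "$" forces the second (.*?) out to the end of the line.
my ($lazy_attrs, $lazy_title) = $line =~ /<OPTION(.*?)>(.*?)$/;

# Greedy, no "$": (.*) backs off to the last ">" on the line,
# then the second (.*) takes everything after it.
my ($greedy_attrs, $greedy_title) = $line =~ /<OPTION(.*)>(.*)/;

print "non-greedy: [$lazy_attrs] [$lazy_title]\n";
print "greedy:     [$greedy_attrs] [$greedy_title]\n";
```

The two would only differ on a line containing more than one ">", where the greedy first group would run up to the last one.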
> But when I try a substitution:
>
> $htmlpage =~ s/<OPTION(.*?)>(.*?)$/<OPTION.*?>.*?</OPTION>$/g;

"/" is your delimiter, so the "/" in "</OPTION>" ends the replacement part
early. You must escape it as "\/" or use an alternative delimiter.

> how do I get the selected sections from the search part to be included in
> the replace part?
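The delimiter fixes can be sketched side by side on one sample line. Each substitution below uses a deliberately simplified pattern that captures the whole line as $1 (the text matched by the first pair of parentheses) and appends a closing tag. They differ only in how the "/" of "</OPTION>" is kept from ending the replacement early. The variable names are illustrative.

```perl
use strict;
use warnings;

my $escaped = "<OPTION Value = '1' >Book_Title_1";
my $braced  = $escaped;
my $hashed  = $escaped;

# Fix 1: keep "/" as the delimiter and escape the slash in the closing tag.
$escaped =~ s/(<OPTION.*>.*)/$1<\/OPTION>/;

# Fix 2: use paired braces as delimiters, so "/" is ordinary text.
$braced =~ s{(<OPTION.*>.*)}{$1</OPTION>};

# Other punctuation works as a delimiter too, e.g. "#".
$hashed =~ s#(<OPTION.*>.*)#$1</OPTION>#;

print "$escaped\n$braced\n$hashed\n";
```

All three produce the same line; braces are usually the most readable choice when the replacement contains slashes.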

Use $1, $2, etc. This is explained in "perldoc perlre".

s{<OPTION(.*?)>(.*)} {<OPTION$1>$2</OPTION>};
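The substitution above works on one line at a time. If, as in the question, the whole page is held in a single string like $htmlpage, a sketch along these lines should close every option: the /g modifier repeats the substitution, and because "." does not match a newline, (.*) stops at the end of each line. The sample page is assumed, built from the two lines in the question.

```perl
use strict;
use warnings;

# Assumed page content: the two malformed lines from the question.
my $htmlpage = "<OPTION Value = '1' >Book_Title_1\n"
             . "<OPTION Value = '2' >Book_Title_2\n";

# $1 = attributes, $2 = rest of the line. "." never matches "\n", so each
# match ends at its own line; /g applies the substitution to every match.
$htmlpage =~ s{<OPTION(.*?)>(.*)}{<OPTION$1>$2</OPTION>}g;

print $htmlpage;
```

This assumes no line is already closed: running it a second time would append another "</OPTION>".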

Anno