Thomas Baetzler
Hi,
I'm looking for input on how to run search/replace operations on
paragraphs of HTML text without having to worry about the surrounding
markup.
So far I'm using HTML::TreeBuilder to parse an HTML document and identify
the individual paragraphs in the text. By recursively using the
content_list method, I can locate the individual text chunks that make
up the paragraph text.
What I'd like to do is merge these chunks into a single string, run some
search/replace regexes on it, then update the individual text chunks
with the changes.
Is there a better way to do this than stopping after each change to see
what's changed and keep track of chunk borders that way?
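For illustration, here is a rough sketch of the border-tracking idea. It assumes plain strings stand in for the text chunks found via content_list. The name replace_across_chunks and its callback interface are made up for this example and are not part of HTML::TreeBuilder. It also only handles replacements that depend on the matched text, not on capture groups.

```perl
use strict;
use warnings;

# Hypothetical helper: apply one replacement across a list of text chunks.
#   $chunks  - array ref of strings (stand-ins for a paragraph's text nodes)
#   $re      - compiled regex
#   $replace - code ref that gets the matched text and returns the new text
sub replace_across_chunks {
    my ($chunks, $re, $replace) = @_;
    my $text = join '', @$chunks;

    # Record the original [start, end) span of each chunk in the merged string.
    my (@start, @end);
    my $pos = 0;
    for my $c (@$chunks) {
        push @start, $pos;
        $pos += length $c;
        push @end, $pos;
    }

    # Collect all matches first, so offsets refer to the unmodified text.
    my @matches;
    while ($text =~ /$re/g) {
        my ($from, $to) = ($-[0], $+[0]);
        next if $to == $from;    # zero-length matches are ignored in this sketch
        push @matches, [ $from, $to, $replace->(substr $text, $from, $to - $from) ];
    }

    # Apply right to left, so offsets of earlier matches stay valid.
    for my $m (reverse @matches) {
        my ($from, $to, $new) = @$m;
        for my $i (reverse 0 .. $#$chunks) {
            my ($cs, $ce) = ($start[$i], $end[$i]);
            next if $ce <= $from || $cs >= $to;    # chunk not touched by this match
            my $lo = $from > $cs ? $from : $cs;
            my $hi = $to   < $ce ? $to   : $ce;
            # The chunk holding the start of the match receives the replacement;
            # any other chunk the match spans just loses the matched part.
            my $ins = ($from >= $cs) ? $new : '';
            substr($chunks->[$i], $lo - $cs, $hi - $lo, $ins);
        }
    }
    return $chunks;
}

my @chunks = ('Hello ', 'wor', 'ld, hello moon');
replace_across_chunks(\@chunks, qr/world/, sub { 'there' });
print join('|', @chunks), "\n";    # prints: Hello |there|, hello moon
```

Putting the whole replacement into the chunk where the match starts is an arbitrary choice. Markup that wrapped only the tail of a match ends up around less text, or around none at all.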
I could probably work on individual chunks in turn, but taking care of
all the edge cases where I'd have to do lookahead/lookback in adjoining
chunks could be, well, tedious ;-)
TIA for any suggestion you might have!
Cheers,
Thomas