weston
I'm trying to streamline my workflow from Word documents to HTML. There
are numerous atrocities perpetrated in the process of saving a Word Doc
to filtered HTML, but there's one that I find particularly interesting
(and annoying): sometimes tags have newlines within them. Especially
<span> tags. For example:
<p><span lang=JA style='font-family:
"MS Mincho"'>(</span>
Is there a regular expression that can pull the span up onto the same
line?
So far, I've tried slurping the whole file into a single string, and
doing:
s/(<span.*?)^+([^>]*>)/$1 $2/mig;
which seems to have no effect, and this:
s/(<span.*?)\n+([^>]*>)/$1 $3/mig;
which seems to lop off everything from the first line.
It seems likely there's a way to do this, but I'm sort of stuck on what
to try next. Any ideas?
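
One possible approach, as a minimal sketch rather than a definitive fix: match each whole <span ...> opening tag first, then collapse the line breaks inside just that match. Two things are worth noting about the attempts above. In Perl, /m only changes what ^ and $ match; "." still refuses to match a newline unless /s is given, so (<span.*?)^+ cannot reach across the line break. And the second substitution defines only two capture groups, so $3 is never set and the rest of the tag is dropped. The helper name join_span_tags below is hypothetical, and the sketch assumes no attribute value contains a literal ">".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): collapse line breaks
# that fall inside a <span ...> opening tag into a single space.
# [^>]* matches newlines too, so the whole opening tag is captured even
# when it spans several lines. The /e flag evaluates the replacement as
# Perl code, which lets a second substitution run on the captured tag only.
sub join_span_tags {
    my ($html) = @_;
    $html =~ s{(<span\b[^>]*>)}{
        my $tag = $1;
        $tag =~ s/\s*\n\s*/ /g;
        $tag;
    }gie;
    return $html;
}

# The example from the question, slurped into one string.
my $input = qq{<p><span lang=JA style='font-family:\n"MS Mincho"'>(</span>\n};
print join_span_tags($input);
# prints: <p><span lang=JA style='font-family: "MS Mincho"'>(</span>
```

Because \s* also swallows a trailing carriage return, this should cope with Windows line endings as well. Text outside of <span> tags is left untouched, newlines included.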