David R. Throop
I'm perplexed. I'm writing a Perl script that reads a single large
many-sectioned HTML document, breaks it into smaller files and
extracts some information for another text-manipulation tool to read.
The first HTML file comes from saving a 150+ page MS-Word file as HTML.
I'm having fits with some nonstandard whitespace in the HTML file. It
looks like one long whitespace and acts as a single character, but it
doesn't match \s in a pattern. When I view it in Emacs, it appears as
%/1\200\216iso8859-15^B\201 \201 \201
where \200, \216, ^B and \201 are all single characters. But a pattern
built from those characters fails to match text containing the odd whitespace.
I Googled on iso8859 and found enough to get some idea that I'm
dealing with some specially encoded character, but everything I found
assumed I already knew about the encoding.
All I want to do is to turn this oddspace into regular whitespace.
Anybody?
Thanks
David Throop