Newbie: Simple problem with htm file.

John Smith

Hello Perl gurus.

I am having a problem reading a file that is a log file created by another
program that is in html format.

I have been using the open command to read and write to some text files and
things are fine. When I open and read these htm log files Perl seems to be
adding an extra space after each character. For example:
< H e a d e r >
T a p e L o g

When I open the htm file with Notepad the html code looks fine. What am I
missing here? Perl seems to know this file is different from a standard
text file and is adding all these spaces on its own.

Thanks for any assistance you can provide.
 
Walter Roberson

:I am having a problem reading a file that is a log file created by another
:program that is in html format.

:I have been using the open command to read and write to some text files and
:things are fine. When I open and read these htm log files Perl seems to be
:adding an extra space after each character. For example:
:< H e a d e r >
:T a p e L o g

Just a guess -- but I suspect the html file is utf8. See perldoc utf8
 
Joe Smith

Walter said:
:I am having a problem reading a file that is a log file created by another
:program that is in html format.

:I have been using the open command to read and write to some text files and
:things are fine. When I open and read these htm log files Perl seems to be
:adding an extra space after each character. For example:
:< H e a d e r >
:T a p e L o g

Just a guess -- but I suspect the html file is utf8. See perldoc utf8

If there is a null byte after every printing character, then it is
UTF-16, not UTF-8 (and not ASCII and not ISO-8859-1).
-Joe
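
A minimal sketch of how one might check for this and then read the file decoded, assuming the log really is UTF-16 with a byte order mark. The sample file and its contents are made up here so the snippet runs on its own; the real log may differ.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Encode qw(encode);

# Stand-in for the real log (hypothetical contents): BOM + UTF-16LE text.
my ($tmp_fh, $path) = tempfile(SUFFIX => '.htm', UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xFF\xFE", encode('UTF-16LE', "<Header>\nTape Log\n");
close $tmp_fh or die "close failed: $!";

# Step 1: look at the raw bytes. In UTF-16, each ASCII character is
# paired with a NUL byte, which many consoles show as a blank.
open my $raw, '<:raw', $path or die "Cannot open $path: $!";
read $raw, my $head, 8;
close $raw;
my $nul_count = ($head =~ tr/\0//);
printf "first bytes: %s (NUL bytes: %d)\n",
    join(' ', unpack '(H2)*', $head), $nul_count;

# Step 2: reopen with a decoding layer. 'UTF-16' uses the BOM to pick
# the byte order; ':raw' first keeps any CRLF translation from being
# applied to the undecoded bytes.
open my $in, '<:raw:encoding(UTF-16)', $path or die "Cannot open $path: $!";
my @lines = <$in>;
close $in;
s/\r?\n\z// for @lines;    # strip LF or CRLF line endings

print "$_\n" for @lines;
```

If the file has no byte order mark, naming the byte order explicitly (for example `:encoding(UTF-16LE)`) would be the thing to try instead.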
 
Jürgen Exner

Walter said:
Just a guess -- but I suspect the html file is utf8. See perldoc utf8

For plain English (7-bit ASCII) characters there is no binary difference
whatsoever between UTF-8, ASCII, ISO-Latin-1, Windows-1252, etc. That's one
reason why programmers from English-speaking countries are typically so
ignorant about code pages. They just don't care because they don't need to
care. Even if they write the code for ASCII and they receive UTF-8 data, it
will still work for their English-only test data.

The symptoms described by the OP point more toward UTF-16.

jue
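
To illustrate the point about identical bytes, here is a small sketch using the core Encode module. The sample string is taken from the OP's output; the choice of encodings to compare is mine.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text  = "Tape Log";
my @names = qw(ascii iso-8859-1 cp1252 UTF-8 UTF-16LE);

# Encode the same ASCII-only string in each encoding.
my %bytes = map { $_ => encode($_, $text) } @names;

# Print each result as hex bytes for comparison.
for my $name (@names) {
    printf "%-10s %s\n", $name, join ' ', unpack '(H2)*', $bytes{$name};
}

# The single-byte encodings and UTF-8 agree byte for byte;
# UTF-16LE adds a 00 byte after every character.
my $same_as_ascii = grep { $bytes{$_} eq $bytes{ascii} } @names;
print "encodings identical to ASCII for this string: $same_as_ascii\n";
```

Only the UTF-16LE line should differ, with twice as many bytes as the others.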
 
