Converting thousands of pages to XML

lquast

What's the best and fastest way to approach converting a large HTML
site to XML? Thanks.
 
David Dorward

That rather depends on what dialect of XML you wish to convert the HTML to,
what form the HTML is in at present, and what your skills are.

I would probably do something involving Perl, File::Find, HTML::Parser or
HTML::TreeBuilder, and one of the many XML modules for Perl.
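The same pipeline (find the files, parse the HTML into a tree, write the tree out as XML) can be sketched with a standard library alone. What follows is a rough Python analogue, not the Perl modules named above: os.walk stands in for File::Find, html.parser for HTML::Parser, and xml.etree.ElementTree for the XML module. The "document" wrapper element, the function names, and the UTF-8 input encoding are assumptions made for illustration, and the tag-soup recovery is deliberately simplistic.

```python
import os
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Build an ElementTree from HTML (simplified; no HTML5 recovery rules)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # hypothetical wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) arrive as None.
        clean = {k: (v if v is not None else k) for k, v in attrs}
        el = ET.SubElement(self.stack[-1], tag, clean)
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element belongs in its tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html_text):
    """Return a well-formed XML string for one HTML page."""
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")


def convert_site(src_root, dst_root):
    """Walk src_root and write one .xml file per .html/.htm page."""
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            if not name.lower().endswith((".html", ".htm")):
                continue
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, os.path.splitext(rel)[0] + ".xml")
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            # Assumes UTF-8 input; undecodable bytes are replaced.
            with open(src, encoding="utf-8", errors="replace") as fh:
                xml_text = html_to_xml(fh.read())
            with open(dst, "w", encoding="utf-8") as fh:
                fh.write(xml_text)
```

This only guarantees balanced tags; it does not map the HTML onto any particular XML dialect, and attribute names or control characters that are illegal in XML would still need filtering. That mapping step is where the choice of target dialect matters.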
 
Andy Dingley

HTML Tidy is a good start (assuming your target is XHTML).

Then go to c.i.w.a.h (comp.infosystems.www.authoring.html) and ask "Why?"
 
lquast

Hello,

Thank you for your suggestion regarding converting to XHTML. I am new
to using these groups, however, and just looked up c.i.w.a.h! Very
interesting, and I don't think I'll ask.

Regards

LQ
 
Andy Dingley

:cool:

c.i.w.a.h is one of the most unfriendly groups I know of, and
certainly the most useless and downright hostile that I still bother
to read. "Converting to XHTML" is a regular topic in there, and
searching will show up some interesting discussion of its benefits, or
lack of them. However, many people in there have egos bigger than
their knowledge and will spout the same old party line with more
volume than understanding.

HTML Tidy is open source, AFAIR, and if you have a huge number of
files to convert, you can tie its source into your favourite
scripting language.
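For a large batch, a lighter alternative to linking Tidy's source is to drive the command-line program from a script. The sketch below does that rather than binding the library. It assumes a tidy executable is on PATH and that the pages are UTF-8; tidy_command and tidy_site are hypothetical helper names.

```python
import os
import subprocess


def tidy_command(src, dst):
    """Build the argument list for one file; assumes `tidy` is on PATH."""
    # -asxhtml: emit XHTML; -utf8: assumed encoding of the site's pages.
    return ["tidy", "-quiet", "-asxhtml", "-utf8", "-output", dst, src]


def tidy_site(src_root, dst_root):
    """Run tidy over every .html/.htm file, mirroring the directory layout.

    Returns a list of (path, stderr) pairs for files tidy could not repair.
    """
    failed = []
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            if not name.lower().endswith((".html", ".htm")):
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_root, os.path.relpath(src, src_root))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            result = subprocess.run(tidy_command(src, dst),
                                    capture_output=True, text=True)
            # tidy exits 0 when clean, 1 on warnings, 2 on errors.
            if result.returncode >= 2:
                failed.append((src, result.stderr.strip()))
    return failed
```

Calling tidy_site("site", "site-xhtml") (both paths are placeholders) leaves the originals untouched and returns the files that need hand repair, which is usually the part worth reviewing first.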
 
lquast

I guess it couldn't hurt to see what they have to say! Thanks again
for the info. HTML Tidy may come in handy.
 
