Converting document files to XML and reading the XML files


msinghindia

I am creating an application that needs to convert document files into
XML and then search the XML files for specific words in a specific
format. I am using Microsoft.Office.Interop to convert the document
files to XML. The files are generated, but they contain a lot of
formatting information, which makes them very large.

I need help writing code that shrinks the XML files by removing the
unwanted document formatting, while still preserving it where it is
required.


Thanks in advance.
 

Joe Kesselman


That sounds like a straight programming problem. First, you need to
analyse the files to create rules for recognizing the "unwanted" markup.
Then you need to write code that either filters that markup out during
the conversion process, or postprocesses the XML file by reading it in,
applying those rules to alter it, and writing it back out.
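A minimal Python sketch of the postprocessing route, using only the standard library's `xml.etree.ElementTree`. It assumes the Interop output is Word 2007+ WordprocessingML (the `W_NS` namespace below). That namespace choice, the `UNWANTED` rule set, and the helper names `qualified`, `strip_unwanted` and `postprocess` are illustrative assumptions, not anything Word or Interop defines. The rules here drop paragraph properties (`w:pPr`), run properties (`w:rPr`) and proofing marks (`w:proofErr`) while leaving the text in `w:t` untouched.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: the Word 2007+ WordprocessingML namespace. Word 2003 XML
# uses a different namespace URI, so check the root element of your own
# files and adjust this constant.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical rule set: local names of elements treated as "unwanted".
# pPr/rPr hold paragraph/run formatting; proofErr marks spelling errors.
UNWANTED = {"pPr", "rPr", "proofErr"}


def qualified(names, ns=W_NS):
    """Turn local names into ElementTree's '{namespace}local' form."""
    return {f"{{{ns}}}{name}" for name in names}


def strip_unwanted(elem, tags):
    """Recursively remove children of `elem` whose tag is in `tags`.

    Removed subtrees are not descended into. Returns the number removed.
    """
    removed = 0
    for child in list(elem):  # copy: elem is mutated inside the loop
        if child.tag in tags:
            elem.remove(child)
            removed += 1
        else:
            removed += strip_unwanted(child, tags)
    return removed


def postprocess(in_path, out_path, unwanted=UNWANTED):
    """Read an XML file, apply the removal rules, and write the result out."""
    ET.register_namespace("w", W_NS)  # keep the familiar "w:" prefix on output
    tree = ET.parse(in_path)
    count = strip_unwanted(tree.getroot(), qualified(unwanted))
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return count
```

Usage would be something like `postprocess("report.xml", "report.small.xml")` (hypothetical file names). Note that dropping `w:pPr` also discards paragraph styles such as headings, so trim `UNWANTED` if the later word search depends on them.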

Pick your programming language and have fun. If you take the
postprocessing approach, you could probably do this in XSLT... but
whether that's the best approach depends in part on the nature of the
rules you're trying to apply.
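Keeping the rules as data turns the "preserve if required" part of the question into a configuration change rather than a code change. The Python sketch that follows, again assuming Word 2007+ WordprocessingML, keeps only the run properties named in a hypothetical `KEEP_RUN_PROPS` set (bold and italic here), drops the rest, and then uses the surviving `w:b` marks to pull out bold text as one example of finding words in a specific format. The function names are illustrative, not an existing API.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: Word 2007+ WordprocessingML namespace; adjust for your files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"

# Hypothetical rules: run properties to preserve. Everything else inside
# <w:rPr> is dropped, and an <w:rPr> left empty is removed entirely.
KEEP_RUN_PROPS = {"b", "i"}


def prune_run_props(root, keep=KEEP_RUN_PROPS):
    """Keep only the listed run properties; return how many elements were removed."""
    keep_tags = {W + name for name in keep}
    removed = 0
    # Materialize the run list first: mutating while iterating is undefined.
    for run in list(root.iter(W + "r")):
        rpr = run.find(W + "rPr")
        if rpr is None:
            continue
        for prop in list(rpr):
            if prop.tag not in keep_tags:
                rpr.remove(prop)
                removed += 1
        if len(rpr) == 0:
            run.remove(rpr)
            removed += 1
    return removed


def bold_text(root):
    """Return the text of every run that still carries <w:b/>.

    Simplification: a <w:b w:val="0"/> (bold switched off) is not handled.
    """
    return ["".join(t.text or "" for t in run.iter(W + "t"))
            for run in root.iter(W + "r")
            if run.find(f"{W}rPr/{W}b") is not None]
```

An XSLT identity transform with empty templates for the unwanted elements could express the same rules declaratively; which is more convenient depends on how complex the rules become.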