Word + win32ole - how to find formatting of a word?

Mohit Sindhwani

Hi! I'm trying to use Ruby and win32ole to parse a Word document. So
far, I'm able to extract the style and text of each paragraph. That
works well for converting each paragraph into an individual div (in the
HTML/CSS sense).

Now, inside the paragraphs, certain words have special formatting
(e.g. the name of a command set in monospace). I'm trying to find out
how to extract those special cases. Does anyone know how to achieve
that?

Appreciate your help - thanks!

Cheers,
Mohit.
10/25/2008 | 4:33 PM.
 
Axel Etzold

Dear Mohit,

You could save the Word file as HTML and then extract the relevant information.
I did that using OpenOffice and got a file containing the font information in the following form:


<BODY LANG="en-US" DIR="LTR">
<P STYLE="margin-bottom: 0in">A command in <FONT FACE="Linux Libertine">Linux
Libertine</FONT></P>
<P STYLE="margin-bottom: 0in">A text in <FONT FACE="Bitstream Charter, serif">Bitstream
Charter</FONT></P>
</BODY>

If you read in the text of that file as a String, you can then find the relevant bits using regexps.
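
As a rough sketch of that regexp approach in Ruby: the helper name `font_spans` is made up for illustration, and the pattern assumes the markup looks exactly like the OpenOffice export above (uppercase-style `<FONT FACE="...">` tags, no nesting). Real exports may differ, so treat it as a starting point only.

```ruby
# Hypothetical helper: returns [font face, enclosed text] pairs found in an
# HTML string. Assumes simple, non-nested <FONT FACE="..."> ... </FONT> tags.
def font_spans(html)
  # /m lets "." match the line break inside a tag body; /i ignores tag case.
  pattern = /<FONT FACE="([^"]*)">(.*?)<\/FONT>/mi
  html.scan(pattern).map do |face, text|
    # Collapse the line break and any extra whitespace into single spaces.
    [face, text.gsub(/\s+/, " ").strip]
  end
end

html = <<~HTML
  <P STYLE="margin-bottom: 0in">A command in <FONT FACE="Linux Libertine">Linux
  Libertine</FONT></P>
  <P STYLE="margin-bottom: 0in">A text in <FONT FACE="Bitstream Charter, serif">Bitstream
  Charter</FONT></P>
HTML

p font_spans(html)
# => [["Linux Libertine", "Linux Libertine"], ["Bitstream Charter, serif", "Bitstream Charter"]]
```

For anything beyond simple markup like this, an HTML parser is likely to be more robust than regexps.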

Best regards,

Axel
 
Mohit Sindhwani

Hi Axel

Thanks for replying! Converting to HTML and working with that is my
last option actually. In a well-written document, I found that using
Word to return style information about the paragraph is a lot less work
and relatively easy to work with. I guess it's time to consider your
suggestion!

Cheers,
Mohit.
10/26/2008 | 5:44 PM.
 
Mohit Sindhwani

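Actually, after digging around, I found that this gets me part of the way there:

index = 0      # running count of words seen
ftHash = {}    # font name => 1, i.e. the set of fonts used in the document
words = doc.Words
words.each {|w|
  index += 1
  ft = w.Font.Name   # font name of this word's Range
  ftHash[ft] = 1
}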
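
To get from a set of font names to the specially formatted words themselves, one option is to group consecutive words that share a font into runs. The sketch below is only an illustration: `font_runs` is a made-up helper name, and it assumes each item responds to `Font.Name` and `Text` the way the Word `Range` objects in a `Words` collection do. The commented lines at the end show how it might be wired to Word and need Windows with Word installed; the file path there is a placeholder.

```ruby
# Hypothetical helper: groups consecutive words with the same font into
# [font_name, text] runs. `words` is anything with #each whose items
# respond to Font.Name and Text (e.g. paragraph.Range.Words via win32ole).
def font_runs(words)
  runs = []
  words.each do |w|
    name = w.Font.Name
    text = w.Text
    if runs.last && runs.last[0] == name
      runs.last[1] << text        # same font as the previous word: extend the run
    else
      runs << [name, text.dup]    # font changed: start a new run
    end
  end
  runs
end

# Possible usage against a real document (Windows + Word only, untested here):
#
#   require 'win32ole'
#   word = WIN32OLE.new('Word.Application')
#   doc  = word.Documents.Open('C:\path\to\file.doc')   # placeholder path
#   doc.Paragraphs.each do |para|
#     font_runs(para.Range.Words).each { |font, text| puts "#{font}: #{text}" }
#   end
#   doc.Close
#   word.Quit
```

Each run could then be wrapped in a span inside the paragraph's div, with the font name deciding the CSS class.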

Thanks for your help!

Cheers,
Mohit.
10/26/2008 | 9:14 PM.
 
Axel Etzold

Dear Mohit,

You're welcome :)
It's always nice to answer one's own question, isn't it? Thanks for the info!

Best regards,

Axel
 
Mohit Sindhwani

Thanks for your reply again! Yes, it's good to find the answer yourself
and then share it :)

I find that win32ole is quite powerful; it just takes a little
looking around to work with it.

Cheers,
Mohit.
10/27/2008 | 11:19 AM.
 
