Word + win32ole - how to find formatting of a word?

Discussion in 'Ruby' started by Mohit Sindhwani, Oct 25, 2008.

  1. Hi! I'm trying to use Ruby and win32ole to parse a Word document. So
    far, I'm able to extract the style and text of each paragraph. That
    works great for converting each paragraph into its own div (in the HTML/CSS sense).

    Now, inside the paragraphs, certain words have special formatting
    (e.g. the name of a command, which is set in monospace). I'm
    trying to find out how to extract those special cases. Does anyone know how
    to achieve that?

    Appreciate your help - thanks!

    Cheers,
    Mohit.
    10/25/2008 | 4:33 PM.
     
    Mohit Sindhwani, Oct 25, 2008
    #1

  2. Axel Etzold

    Dear Mohit,

    You could save the Word file as HTML and then extract the relevant information.
    I did that using OpenOffice and got a file containing the font information in the following form:


    <BODY LANG="en-US" DIR="LTR">
    <P STYLE="margin-bottom: 0in">A command in <FONT FACE="Linux Libertine">Linux
    Libertine</FONT></P>
    <P STYLE="margin-bottom: 0in">A text in <FONT FACE="Bitstream Charter, serif">Bitstream
    Charter</FONT></P>
    </BODY>

    If you read in the text of that file as a String, you can then find the relevant bits using regexps.
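
The regexp idea can be sketched roughly as follows. This is only an illustration: the HTML is the sample from this post pasted inline (a real script would read the exported file, e.g. with `File.read`), and the constant name `FONT_SPAN` is made up for the example. A regexp like this is fragile on nested or unusual markup, so treat it as a starting point rather than a robust parser.

```ruby
# Sample export, copied from the post above.
html = <<~HTML
  <BODY LANG="en-US" DIR="LTR">
  <P STYLE="margin-bottom: 0in">A command in <FONT FACE="Linux Libertine">Linux
  Libertine</FONT></P>
  <P STYLE="margin-bottom: 0in">A text in <FONT FACE="Bitstream Charter, serif">Bitstream
  Charter</FONT></P>
  </BODY>
HTML

# Capture the FACE attribute and the enclosed text of each FONT element.
# /m lets "." match newlines (the text wraps across lines); /i ignores tag case.
FONT_SPAN = /<FONT\s+FACE="([^"]*)"[^>]*>(.*?)<\/FONT>/mi

spans = html.scan(FONT_SPAN).map do |face, text|
  # Collapse the line break inside the element into a single space.
  { face: face, text: text.gsub(/\s+/, " ").strip }
end

spans.each { |s| puts "#{s[:face]}: #{s[:text]}" }
```

Each entry in `spans` pairs a font face with the text it was applied to, which is enough to pick out the specially formatted words.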

    Best regards,

    Axel

     
    Axel Etzold, Oct 25, 2008
    #2

  3. Axel Etzold wrote:
    >
    > Dear Mohit,
    >
    > You could save the Word file as HTML and then extract the relevant information.
    > I did that using OpenOffice and got a file containing the font information in the following form:
    >
    >
    > <BODY LANG="en-US" DIR="LTR">
    > <P STYLE="margin-bottom: 0in">A command in <FONT FACE="Linux Libertine">Linux
    > Libertine</FONT></P>
    > <P STYLE="margin-bottom: 0in">A text in <FONT FACE="Bitstream Charter, serif">Bitstream
    > Charter</FONT></P>
    > </BODY>
    >


    Hi Axel

    Thanks for replying! Converting to HTML and working with that is my
    last option actually. In a well-written document, I found that using
    Word to return style information about the paragraph is a lot less work
    and relatively easy to work with. I guess it's time to consider your
    suggestion!

    Cheers,
    Mohit.
    10/26/2008 | 5:44 PM.
     
    Mohit Sindhwani, Oct 26, 2008
    #3
  4. Mohit Sindhwani wrote:

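    Actually, after digging around, I found that this gets me part of the way there:

    # doc is a Word document already opened through win32ole
    index = 0    # running count of the words visited
    ftHash = {}  # font name => 1; the keys are the fonts used in the document
    words = doc.Words
    words.each {|w|
      index += 1
      ft = w.Font.Name  # font of this word's Range
      ftHash[ft] = 1
    }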
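
To get from the set of fonts to the special cases themselves, one possible next step is to group consecutive words that share a font into runs. The sketch below is a guess at how that might look, not something from this thread: `font_runs` is a hypothetical helper name, and it only assumes that each word object responds to `Font.Name` and `Text`, as Word's Range objects do. The Word calls in the trailing comment are untested here, and the file path is a placeholder.

```ruby
# Hypothetical helper: merge consecutive words with the same font into one run.
# `words` is any enumerable whose items respond to Font.Name and Text
# (for example paragraph.Range.Words from win32ole).
def font_runs(words)
  runs = []
  words.each do |w|
    name = w.Font.Name
    text = w.Text
    if runs.last && runs.last[:font] == name
      runs.last[:text] << text                  # same font: extend the current run
    else
      runs << { font: name, text: text.dup }    # font changed: start a new run
    end
  end
  runs
end

# Possible usage against Word (Windows only, assumptions as noted above):
#   require 'win32ole'
#   word = WIN32OLE.new('Word.Application')
#   doc  = word.Documents.Open('C:\path\to\document.doc')  # placeholder path
#   doc.Paragraphs.each { |para| p font_runs(para.Range.Words) }
#   doc.Close(0)  # 0 = wdDoNotSaveChanges
#   word.Quit
```

Runs whose font differs from the paragraph's main font (a monospace face, say) are then the candidates for inline markup.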

    Thanks for your help!

    Cheers,
    Mohit.
    10/26/2008 | 9:14 PM.
     
    Mohit Sindhwani, Oct 26, 2008
    #4
  5. Axel Etzold

    -------- Original Message --------
    > Date: Sun, 26 Oct 2008 22:14:53 +0900
    > From: Mohit Sindhwani <>
    > To:
    > Subject: Re: Word + win32ole - how to find formatting of a word?


    > Actually, after digging around, I found that this gets me somewhere there:
    > words = doc.Words
    > words.each {|w|
    > index += 1
    > ft = w.Font.Name
    > ftHash[ft] = 1
    > }
    >
    > Thanks for your help!
    >
    > Cheers,
    > Mohit.
    > 10/26/2008 | 9:14 PM.
    >
    >


    Dear Mohit,

    You're welcome :)
    It's always nice to answer one's own questions, isn't it? Thanks for the info!

    Best regards,

    Axel

     
    Axel Etzold, Oct 26, 2008
    #5
  6. Axel Etzold wrote:
    > You're welcome :)
    > It's always nice to answer one's own questions, isn't it? Thanks for the info!
    >

    Thanks for your reply again! Yes, it's good to find the answer yourself
    and then share it :)

    I find that win32ole is quite powerful; it just takes a little
    digging around to work with it.

    Cheers,
    Mohit.
    10/27/2008 | 11:19 AM.
     
    Mohit Sindhwani, Oct 27, 2008
    #6