How to get the contents of a Word file page by page

Discussion in 'Ruby' started by Talib Hussain, Dec 12, 2008.

  1. Hi,

    I have a three-page document and I want to read the contents of each
    page. How can I do that?

    TIA,
    Talib Hussain
    --
    Posted via http://www.ruby-forum.com/.
    Talib Hussain, Dec 12, 2008
    #1

  2. Talib Hussain wrote:
    > Hi,
    >
    > I have a three-page document and I want to read the contents of each
    > page. How can I do that?
    >
    > TIA,
    > Talib Hussain


    Anyone please
    Talib Hussain, Dec 12, 2008
    #2

  3. David Mullet (Guest)

    Talib Hussain wrote:
    > Hi,
    >
    > I have a three-page document and I want to read the contents of each
    > page. How can I do that?
    >
    > TIA,
    > Talib Hussain


    Assuming...

    -- You are working with a Microsoft Word document.
    -- You have actual (manual) page breaks between the pages.

    ...you can build an array holding the text of each page by taking the
    document's text and splitting it on the page-break character. So, where
    doc is your Word document object, you can do this:

    pages = doc.content.text.split("\f")
    pages.each do |page|
      # do something with this page's text
    end
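
To make the idea concrete, here is a minimal sketch that needs no Word installation: it runs the same split on a plain string. The helper name `split_pages` and the sample text are made up for illustration. It assumes, as above, that the pages are separated by manual page breaks, which show up in the document text as form-feed characters ("\f"). Places where text merely flows onto the next page leave no such character, so they would not be split.

```ruby
# Hypothetical helper: split document text into one string per page.
# Assumes pages are separated by form feeds ("\f"), as with manual page breaks.
def split_pages(text)
  # strip also removes the trailing "\r" paragraph mark Word leaves on each page
  text.split("\f").map(&:strip)
end

# Made-up sample standing in for doc.content.text
sample = "Page one text\r\fPage two text\r\fPage three text\r"

split_pages(sample).each_with_index do |page, index|
  puts "PAGE #{index + 1}: #{page}"
end
```

With a real document, `sample` would be replaced by `doc.content.text`.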

    Hope that helps.

    David

    http://rubyonwindows.blogspot.com
    http://rubyonwindows.blogspot.com/search/label/word
    David Mullet, Dec 12, 2008
    #3
  4. Heesob Park (Guest)

    2008/12/12 Talib Hussain:
    > Hi,
    >
    > I have a three-page document and I want to read the contents of each
    > page. How can I do that?
    >

    If you only want the text content, try this:

    require 'win32ole'

    word = WIN32OLE.new('word.application')
    file = 'c:/work/test.doc'
    doc = word.documents.open(file, 'ReadOnly' => true)
    page_count = doc.ComputeStatistics(2)      # wdStatisticPages = 2
    (1..page_count).each do |i|
      word.selection.goto(1, 1, i)             # wdGoToPage = 1, wdGoToAbsolute = 1
      word.selection.goto(-1, 0, 0, '\page')   # wdGoToBookmark = -1; \page selects the whole page
      puts "PAGE #{i}"
      puts word.selection.text
    end
    doc.close(false)                           # close without saving changes
    word.quit

    Regards,
    Park Heesob
    Heesob Park, Dec 12, 2008
    #4
  5. Heesob Park wrote:
    > 2008/12/12 Talib Hussain <>:
    >> Hi,
    >>

    > Regards,
    > Park Heesob


    Thanks a lot, Park, you are a genius.

    My requirement is this: I have a Word document of, say, 3 pages with
    formatted text.

    I need to extract the contents of each page, with formatting, and save
    each one as a separate PDF document.

    Is this possible? If so, how can I do it?

    Also, do I need to install Office 2007 in order to save files as PDF
    documents?

    Kindly let me know.

    Talib Hussain, Dec 15, 2008
    #5
  6. You seem to be trying to solve a problem (converting a Word document to
    PDF) with the wrong tool :). You don't need Ruby to convert a Word file
    to PDF. There are tools like Word2pdf for this.

    Firstname Secondname, Dec 15, 2008
    #6
  7. Name Surname wrote:
    > You seem to be trying to solve a problem (converting a Word document to
    > PDF) with the wrong tool :). You don't need Ruby to convert a Word file
    > to PDF. There are tools like Word2pdf for this.

    Agreed, but I have to create 3 separate .doc files out of one document
    (one for each page) and send these files as input to the PDF
    converter.
    Talib Hussain, Dec 15, 2008
    #7
  8. Name Surname (Guest)

    If you have a Word2pdf-like program, check whether it lets you specify
    which page to convert. You could then call Word2pdf several times, each
    time with a different page number:

    Word2pdf -n 1 infile.doc out1.pdf
    Word2pdf -n 2 infile.doc out2.pdf
    Word2pdf -n 3 infile.doc out3.pdf

    :D
    The only catch is finding a Word2pdf program that supports
    that :).
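
If Word 2007 or later is installed, Word itself may be able to play that role through the document method `ExportAsFixedFormat`, which accepts a page range. What follows is only a sketch under stated assumptions: the helper names `pdf_path_for` and `export_pages_as_pdf` and the output naming scheme are invented here, `doc` is assumed to be a document object opened through WIN32OLE as in Park Heesob's post, and Word 2007 is assumed to have the "Save as PDF" add-in or Service Pack 2.

```ruby
# Word constants, written out as plain numbers like in Park Heesob's example.
WD_STATISTIC_PAGES       = 2   # wdStatisticPages
WD_EXPORT_FORMAT_PDF     = 17  # wdExportFormatPDF
WD_EXPORT_OPTIMIZE_PRINT = 0   # wdExportOptimizeForPrint
WD_EXPORT_FROM_TO        = 3   # wdExportFromTo

# Hypothetical naming scheme, e.g. c:\work\test_page1.pdf
def pdf_path_for(out_dir, base_name, page_number)
  "#{out_dir}\\#{base_name}_page#{page_number}.pdf"
end

# Export every page of an already opened Word document to its own PDF.
# Returns the list of paths that were passed to Word.
def export_pages_as_pdf(doc, out_dir, base_name)
  page_count = doc.ComputeStatistics(WD_STATISTIC_PAGES)
  (1..page_count).map do |page_number|
    path = pdf_path_for(out_dir, base_name, page_number)
    # Arguments: OutputFileName, ExportFormat, OpenAfterExport,
    #            OptimizeFor, Range, From, To
    doc.ExportAsFixedFormat(path, WD_EXPORT_FORMAT_PDF, false,
                            WD_EXPORT_OPTIMIZE_PRINT, WD_EXPORT_FROM_TO,
                            page_number, page_number)
    path
  end
end

# Possible usage on Windows with Word installed (not executed here):
#   require 'win32ole'
#   word = WIN32OLE.new('word.application')
#   doc  = word.documents.open('c:/work/test.doc', 'ReadOnly' => true)
#   export_pages_as_pdf(doc, 'c:\\work', 'test')
#   doc.close(false)
#   word.quit
```

Because the export keeps Word's own layout, this would avoid creating three intermediate .doc files first.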


    Name Surname, Dec 15, 2008
    #8
  9. * Name Surname <> [2008-12-15 15:55:34 +0900]:

    > If you have a Word2pdf-like program, check whether it lets you specify
    > which page to convert. You could then call Word2pdf several times, each
    > time with a different page number:
    >
    > Word2pdf -n 1 infile.doc out1.pdf
    > Word2pdf -n 2 infile.doc out2.pdf
    > Word2pdf -n 3 infile.doc out3.pdf
    >
    > :D
    > The only catch is finding a Word2pdf program that supports
    > that :).
    >


    Surely OpenOffice must have something: it can export Word documents
    as PDFs, so there may be a corresponding command-line utility...


    saji
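
One such utility does exist today: LibreOffice, the successor to OpenOffice.org, ships a `soffice` binary that can convert documents without opening a window. The sketch below only builds and runs that command from Ruby. The helper names and file paths are made up, it assumes `soffice` is on the PATH, and it converts the whole document rather than a single page, so the per-page split would still have to happen separately.

```ruby
# Build the argument list for a headless LibreOffice conversion to PDF.
# Hypothetical helper; assumes the soffice binary is on the PATH.
def soffice_pdf_command(input_path, output_dir)
  ['soffice', '--headless', '--convert-to', 'pdf',
   '--outdir', output_dir, input_path]
end

# Run the conversion. Kernel#system returns true on success, false on a
# non-zero exit status, and nil if the command could not be started.
def convert_to_pdf(input_path, output_dir)
  system(*soffice_pdf_command(input_path, output_dir))
end

# Show the command that would be run for a made-up input file.
puts soffice_pdf_command('infile.doc', 'out').join(' ')
```

Passing the arguments as separate strings rather than one shell line avoids quoting problems with file names that contain spaces.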

    --
    Saji N. Hameed

    APEC Climate Center +82 51 668 7470
    National Pension Corporation Busan Building 12F
    Yeonsan 2-dong, Yeonje-gu, BUSAN 611705
    KOREA
    Saji N. Hameed, Dec 15, 2008
    #9
  10. Anandh Kumar (Guest)

    Thanks, Park, that was good. Now, say my Word document contains
    student details such as name, marks, and register number. Those are
    the entries I will have. Please tell me how to parse these strings and
    upload them to the database.

    Thanks
    Anandh Kumar, Jun 12, 2009
    #10
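
For the parsing half of that question, here is a rough sketch under a loudly stated assumption: each page's text is taken to contain lines of the form `Name: ...`, `Register No: ...`, `Marks: ...`. That layout, the helper name `parse_student`, and the sample data are all invented for illustration, and a real document would probably need different rules.

```ruby
# Assumed (hypothetical) layout of one page of text taken from Word:
#   Name: Jane Doe
#   Register No: 1234
#   Marks: 87
# Word ends paragraphs with "\r", so accept any kind of line ending.
def parse_student(page_text)
  record = {}
  page_text.split(/\r\n|\r|\n/).each do |line|
    key, value = line.split(':', 2)
    next if value.nil?   # skip blank lines and lines without a colon
    # "Register No" becomes :register_no
    record[key.strip.downcase.gsub(/\s+/, '_').to_sym] = value.strip
  end
  record
end

# Made-up sample standing in for word.selection.text
page = "Name: Jane Doe\rRegister No: 1234\rMarks: 87\r"
puts parse_student(page).inspect
```

The resulting hash could then be handed to whatever database library is in use, ideally through a parameterized INSERT. That step depends on the database and is not shown.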