trouble reading a gzipped xml-file

Discussion in 'Ruby' started by Guido de Melo, Nov 14, 2005.

  1. Hi,

    I'm trying to read a gzipped XML file into REXML, but I haven't quite
    succeeded. Perhaps someone can help me. So far I have tried this:

    #!/usr/bin/ruby -w

    require 'zlib'
    require 'rexml/document'

    Zlib::GzipReader.open('file.dia') {|gz|
      print gz.read
    }
    # this prints everything nicely and it works

    f = Zlib::GzipReader.open("file.dia")
    s = f.read
    p s

    # now the ungzipped contents are in s, they look like this however:
    # "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<dia:diagram
    # xmlns:dia=\"http://www.lysator.liu.se/~alla/dia/\">\n
    # <dia:diagramdata>\n <dia:attribute name=\"background\">\n
    # all in one line and encapsulated in ""
    # so of course the next command fails

    xmldoc = REXML::Document.new s
    p xmldoc
    # gives: <UNDEFINED> ... </>

    Any ideas on this? This can't be too difficult, I think...
    Guido
    Guido de Melo, Nov 14, 2005
    #1

  2. Guido de Melo wrote:
    > Hi,
    >
    > I'm trying to read a gzipped XML file into REXML, but I haven't quite
    > succeeded. Perhaps someone can help me. So far I have tried this:
    >
    > #!/usr/bin/ruby -w
    >
    > require 'zlib'
    > require 'rexml/document'
    >
    > Zlib::GzipReader.open('file.dia') {|gz|
    >   print gz.read
    > }
    > # this prints everything nicely and it works
    >
    > f = Zlib::GzipReader.open("file.dia")
    > s = f.read
    > p s


    You're not closing f here.

    > # now the ungzipped contents are in s, they look like this however:
    > # "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<dia:diagram
    > # xmlns:dia=\"http://www.lysator.liu.se/~alla/dia/\">\n
    > # <dia:diagramdata>\n <dia:attribute name=\"background\">\n
    > # all in one line and encapsulated in ""
    > # so of course the next command fails


    I'm not so sure that it fails just because of this.

    > xmldoc = REXML::Document.new s
    > p xmldoc
    > # gives: <UNDEFINED> ... </>
    >
    > Any ideas on this? This can't be too difficult, I think...
    > Guido


    Did you try this?

    xmldoc = Zlib::GzipReader.open('file.dia') {|gz| REXML::Document.new gz}

    Kind regards

    robert
    Robert Klemme, Nov 14, 2005
    #2

  3. Robert Klemme wrote:
    > You're not closing f here.
    >

    [...]
    >
    > I'm not so sure that it fails just because of this.


    It shouldn't, because Ruby will determine the end of the file by itself
    and close the handle on exiting.

    > Did you try this?
    >
    > xmldoc = Zlib::GzipReader.open('file.dia') {|gz| REXML::Document.new gz}


    irb(main):003:0> xmldoc = Zlib::GzipReader.open('file.dia') {|gz|
    REXML::Document.new gz}
    RuntimeError: Zlib::GzipReader is not a valid input stream. It must be
    either a String, IO, StringIO or Source.

    This doesn't work either, I'm afraid...
    Guido
    Guido de Melo, Nov 14, 2005
    #3
  4. Guido de Melo wrote:
    > Robert Klemme wrote:
    >> You're not closing f here.
    >>

    > [...]
    >>
    >> I'm not so sure that it fails just because of this.

    >
    > It shouldn't, because ruby will determine the end of the file by
    > itself and close the handle on exiting.


    I didn't mean to say that it fails because of the open file. My point
    with the first remark was that it's a good habit to keep files open only
    for as long as they are actually used. The block form is the idiom of
    choice here: it's not much longer than a simple File.open() or File.new()
    and it ensures the file is always properly closed.
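
The difference between the two forms can be sketched as follows. This is only an illustration under assumptions: the file `sample.gz` and its content are made up, and a temporary directory stands in for the real location of `file.dia`.

```ruby
require 'zlib'
require 'tmpdir'

still_open = nil       # state of the reader opened without a block
closed_by_block = nil  # state of the reader opened with a block

Dir.mktmpdir do |dir|
  path = File.join(dir, 'sample.gz')  # hypothetical file, not file.dia
  Zlib::GzipWriter.open(path) { |gz| gz.write("hello\n") }

  # Non-block form: the reader stays open until close is called explicitly.
  manual = Zlib::GzipReader.open(path)
  manual.read
  still_open = !manual.closed?
  manual.close

  # Block form: the reader is closed when the block returns.
  block_reader = nil
  Zlib::GzipReader.open(path) { |gz| block_reader = gz; gz.read }
  closed_by_block = block_reader.closed?
end

puts "non-block reader still open after read: #{still_open}"
puts "block reader closed after block:        #{closed_by_block}"
```

The block form also closes the reader if the block raises, which the explicit `close` call above would not do without an `ensure`.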

    >> Did you try this?
    >>
    >> xmldoc = Zlib::GzipReader.open('file.dia') {|gz| REXML::Document.new
    >> gz}

    >
    > irb(main):003:0> xmldoc = Zlib::GzipReader.open('file.dia') {|gz|
    > REXML::Document.new gz}
    > RuntimeError: Zlib::GzipReader is not a valid input stream. It must
    > be either a String, IO, StringIO or Source.
    >
    > This doesn't work either, I'm afraid...


    I guess this is because GzipReader doesn't inherit from IO:

    >> Zlib::GzipReader.ancestors

    => [Zlib::GzipReader, Enumerable, Zlib::GzipFile, Object, Kernel]

    But you can do this

    xmldoc = Zlib::GzipReader.open('file.dia') {|gz| REXML::Document.new(
    gz.read )}
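
A minimal, self-contained sketch of this approach, assuming a throwaway gzipped file in place of `file.dia` (the name `sample.dia` and the XML content are invented for illustration and are not real Dia output):

```ruby
require 'zlib'
require 'rexml/document'
require 'tmpdir'

# Hypothetical stand-in for the content of a Dia file.
sample_xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <diagram>
    <layer name="Background"/>
  </diagram>
XML

xmldoc = Dir.mktmpdir do |dir|
  path = File.join(dir, 'sample.dia')
  Zlib::GzipWriter.open(path) { |gz| gz.write(sample_xml) }

  # gz.read returns the decompressed data as a String, which REXML accepts.
  # The block's value is returned by GzipReader.open after the file is closed.
  Zlib::GzipReader.open(path) { |gz| REXML::Document.new(gz.read) }
end

layer_name = xmldoc.root.elements['layer'].attributes['name']
puts xmldoc.root.name
puts layer_name
```

Note that `gz.read` loads the whole decompressed document into memory, which is fine for small diagrams but may matter for very large files.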

    Btw, you see differing output because you use different printing methods:

    Zlib::GzipReader.open('file.dia') {|gz|
      print gz.read
    }

    vs.

    p s
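
To illustrate the difference with a made-up string (not the actual Dia content): `print` writes the string itself, while `p` writes the result of `inspect`, which is quoted and shows newlines and double quotes as escape sequences. The string in memory is the same in both cases.

```ruby
s = "<a id=\"1\">\n  <b/>\n</a>\n"  # hypothetical sample string

print s  # writes the raw string; the newlines appear as real line breaks
p s      # writes s.inspect: one line, wrapped in "", with \n and \" escaped

inspected = s.inspect
puts inspected.lines.length  # the inspected form contains no real line breaks
```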

    If there is no exception during GZIP reading I guess there might be a bug
    somewhere. As a test I'd write the gunzipped content to another file and
    do a diff on the plain xml and this output to see whether GzipReader
    actually yields the same content.
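
One way such a check might look, as a sketch only: the file names and the XML content below are assumptions, and a temporary directory is used so that nothing real is touched. With an actual Dia file, one would compare the output of GzipReader against the output of `gunzip` instead.

```ruby
require 'zlib'
require 'tmpdir'
require 'fileutils'

plain = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<diagram/>\n"  # made-up content

same = Dir.mktmpdir do |dir|
  plain_path = File.join(dir, 'plain.xml')      # reference copy
  gz_path    = File.join(dir, 'plain.xml.gz')   # gzipped copy
  out_path   = File.join(dir, 'gunzipped.xml')  # what GzipReader yields

  File.binwrite(plain_path, plain)
  Zlib::GzipWriter.open(gz_path) { |gz| gz.write(plain) }

  # Write the decompressed data to a separate file.
  Zlib::GzipReader.open(gz_path) { |gz| File.binwrite(out_path, gz.read) }

  # Byte-for-byte comparison of the two files.
  FileUtils.compare_file(plain_path, out_path)
end

puts "identical: #{same}"
```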

    Btw, did you actually try to make REXML read the first variant? Maybe you
    have a problem in your XML file.

    Kind regards

    robert
    Robert Klemme, Nov 14, 2005
    #4
  5. Thank you very much! You solved it!

    Robert Klemme wrote:
    >>>You're not closing f here.

    >>[...]
    >>>I'm not so sure that it fails just because of this.

    >>
    >>It shouldn't, because ruby will determine the end of the file by
    >>itself and close the handle on exiting.

    >
    > I didn't mean to say that it fails because of the open file. My point
    > with the first remark was that it's a good habit to keep files open only
    > for as long as they are actually used. The block form is the idiom of
    > choice here: it's not much longer than a simple File.open() or File.new()
    > and it ensures the file is always properly closed.


    You are right of course, I will try to do this in the future.

    > But you can do this
    >
    > xmldoc = Zlib::GzipReader.open('file.dia') {|gz| REXML::Document.new(
    > gz.read )}


    And this worked perfectly for me :)

    > If there is no exception during GZIP reading I guess there might be a bug
    > somewhere. As a test I'd write the gunzipped content to another file and
    > do a diff on the plain xml and this output to see whether GzipReader
    > actually yields the same content.


    They are the same.

    > Btw, did you actually try to make REXML read the first variant? Maybe you
    > have a problem in your XML file.


    Thank goodness the XML produced by dia is sound :)

    Kind regards,
    Guido
    Guido de Melo, Nov 14, 2005
    #5
