Read XML file from compressed file using gzip

Discussion in 'Python' started by flebber, Jun 8, 2007.

  1. flebber

    flebber Guest

    I was working on a simple program that would read the content of a
    playlist file (in this case a "*.k3b" file) and write it out. The
    compressed "*.k3b" file contains two files, and the one I was trying to
    read was maindata.xml. However, I cannot seem to use the gzip module
    correctly. I have tried the program two ways without success; any ideas
    would be appreciated.

    Attempt 1

    #!/usr/bin/python

    import os
    import gzip
    playlist_file = open('/home/flebber/oddalt.k3b')
    class GzipFile([playlist_file[decompress[9, 'rb']]]);

    os.system(open("/home/flebber/tmp/maindata.xml"));

    for line in maindata.xml:
        print line

    playlist_file.close()

    Attempt 2 - largely just trying to get gzip to work

    #!/usr/bin/python

    import gzip
    fileObj = Gzipfile("/home/flebber/oddalt.k3b", 'rb');
    fileContent = fileObj.read()
    for line in filecontent:
        print line

    fileObj.close()
    flebber, Jun 8, 2007
    #1

  2. Stefan Behnel

    flebber wrote:
    > I was working at creating a simple program that would read the content
    > of a playlist file( in this case *.k3b") and write it out . the
    > compressed "*.k3b" file has two file and the one I was trying to read
    > was maindata.xml.


    Consider using lxml. It reads in gzip compressed XML files transparently and
    provides loads of other nice XML goodies.

    http://codespeak.net/lxml/dev/
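lxml is a third-party package. For comparison, here is a rough standard-library-only sketch of the same idea in Python 3: gzip.open wraps a gzip-compressed XML file in a file object that decompresses on the fly, and xml.etree.ElementTree.parse can read from that object directly. The function name parse_gzipped_xml, the file name and the playlist/track elements are hypothetical, and this only applies to files that really are gzip-compressed.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_gzipped_xml(path):
    """Parse a gzip-compressed XML file and return its root element."""
    # gzip.open returns a file-like object that decompresses as it is read,
    # so ElementTree can treat it like an ordinary binary file.
    with gzip.open(path, "rb") as fh:
        return ET.parse(fh).getroot()


if __name__ == "__main__":
    # Build a small hypothetical gzipped XML file to demonstrate with.
    sample = b'<?xml version="1.0"?><playlist><track>a.mp3</track></playlist>'
    path = os.path.join(tempfile.mkdtemp(), "playlist.xml.gz")
    with gzip.open(path, "wb") as fh:
        fh.write(sample)

    root = parse_gzipped_xml(path)
    print(root.tag, [t.text for t in root.iter("track")])  # playlist ['a.mp3']
```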

    Stefan
    Stefan Behnel, Jun 8, 2007
    #2

  3. flebber

    flebber Guest

    On Jun 8, 3:31 pm, Stefan Behnel wrote:
    > flebber wrote:
    > > I was working at creating a simple program that would read the content
    > > of a playlist file( in this case *.k3b") and write it out . the
    > > compressed "*.k3b" file has two file and the one I was trying to read
    > > was maindata.xml.

    >
    > Consider using lxml. It reads in gzip compressed XML files transparently and
    > provides loads of other nice XML goodies.
    >
    > http://codespeak.net/lxml/dev/
    >
    > Stefan


    I will, but I'm taking baby steps at the moment, as I am only
    learning and can't get gzip to work.
    flebber, Jun 8, 2007
    #3
  4. flebber

    flebber Guest

    On Jun 8, 9:45 pm, flebber wrote:
    > I will, baby steps at the moment for me at the moment though as I am
    > only learning and can't get gzip to work


    This is my latest attempt

    #!/usr/bin/python

    import os
    import zlib

    class gzip('/home/flebber/oddalt.k3b', 'rb')

    main_data = os.system(open("/home/flebber/maindata.xml"));

    for line in main_data:
        print line

    main_data.close()
    flebber, Jun 8, 2007
    #4
  5. Gabriel Genellina

    On Fri, 08 Jun 2007 10:00:58 -0300, flebber wrote:

    >> I will, baby steps at the moment for me at the moment though as I am
    >> only learning and can't get gzip to work


    Try reading one of the tutorials listed at http://wiki.python.org/moin/BeginnersGuide

    --
    Gabriel Genellina
    Gabriel Genellina, Jun 9, 2007
    #5
  6. Stefan Behnel

    flebber wrote:
    > I was working at creating a simple program that would read the content
    > of a playlist file( in this case *.k3b") and write it out . the
    > compressed "*.k3b" file has two file and the one I was trying to read
    > was maindata.xml


    The k3b format is a ZIP archive. Use the zipfile library:

    file:///usr/share/doc/python2.5-doc/html/lib/module-zipfile.html
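A minimal Python 3 sketch of what "use the zipfile library" can look like for a .k3b file: open the archive read-only and list each member with its uncompressed size. The function name list_members is hypothetical, and the demo builds an in-memory stand-in archive with placeholder contents rather than reading a real k3b project.

```python
import io
import zipfile


def list_members(source):
    """Return (name, uncompressed size) for every member of a ZIP archive.

    `source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(source, "r") as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


if __name__ == "__main__":
    # Stand-in for a .k3b project: a ZIP holding 'mimetype' and 'maindata.xml'
    # (member contents are placeholders).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "placeholder")
        zf.writestr("maindata.xml", "<playlist/>")
    for name, size in list_members(buf):
        print(name, size)
```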

    Stefan
    Stefan Behnel, Jun 9, 2007
    #6
  7. flebber

    flebber Guest

    On Jun 10, 3:45 am, Stefan Behnel wrote:
    > flebber wrote:
    > > I was working at creating a simple program that would read the content
    > > of a playlist file( in this case *.k3b") and write it out . the
    > > compressed "*.k3b" file has two file and the one I was trying to read
    > > was maindata.xml

    >
    > The k3b format is a ZIP archive. Use the zipfile library:
    >
    > file:///usr/share/doc/python2.5-doc/html/lib/module-zipfile.html
    >
    > Stefan


    Thanks for all the help. I have been using the docs at python.org and
    the Magnus Hetland book. Are there any docs that are a little more
    practical or expressive? Most of the module documentation is very
    confusing for a beginner and doesn't provide many examples of how to
    use the modules.

    I'm not criticizing the docs, as they are probably very good for
    experienced programmers.
    flebber, Jun 10, 2007
    #7
  8. John Machin

    John Machin Guest

    On 10/06/2007 3:06 PM, flebber wrote:
    > On Jun 10, 3:45 am, Stefan Behnel wrote:
    >> flebber wrote:
    >>> I was working at creating a simple program that would read the content
    >>> of a playlist file( in this case *.k3b") and write it out . the
    >>> compressed "*.k3b" file has two file and the one I was trying to read
    >>> was maindata.xml

    >> The k3b format is a ZIP archive. Use the zipfile library:
    >>
    >> file:///usr/share/doc/python2.5-doc/html/lib/module-zipfile.html
    >>
    >> Stefan

    >
    > Thanks for all the help, have been using the docs at python.org and
    > the magnus t Hetland book. Is there any docs tha re a little more
    > practical or expressive as most of the module documentation is very
    > confusing for a beginner and doesn't provide much in the way of
    > examples on how to use the modules.
    >
    > Not criticizing the docs as they are probably very good for
    > experienced programmers.
    >



    Somebody else has already drawn your attention to the/a tutorial. You
    need to read, understand, and work through a *good* introductory book or
    tutorial before jumping into the deep end.

    > class GzipFile([playlist_file[decompress[9, 'rb']]]);


    Errr, no, the [] are a documentation device used in most computer
    language documentation to denote optional elements -- you don't type
    them into your program. See below.
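The line being misread there is the GzipFile signature, where the brackets only mark optional arguments. A minimal Python 3 sketch of calling it properly, supplying just a filename and a mode and leaving compresslevel, fileobj and the rest at their defaults. The function name read_gzip_text, the temporary file and its contents are made up for the demo, and UTF-8 text is assumed. This only works on real gzip data, so it would not open a .k3b file.

```python
import gzip
import os
import tempfile


def read_gzip_text(path, encoding="utf-8"):
    """Decompress a gzip file and return its contents as text."""
    # Only the first two bracketed arguments are passed; the others
    # (compresslevel, fileobj, ...) keep their default values.
    with gzip.GzipFile(path, "rb") as fh:
        return fh.read().decode(encoding)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.txt.gz")
    with gzip.GzipFile(path, "wb") as fh:
        fh.write("line one\nline two\n".encode("utf-8"))
    print(read_gzip_text(path).splitlines())  # ['line one', 'line two']
```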

    Secondly, as Stefan pointed out, your file is a ZIP file (not a gzipped
    file). They're quite different animals, so you need the zipfile module,
    not the gzip module.
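One quick way to see that the two formats differ is to look at the first bytes: gzip streams start with 0x1f 0x8b, while a non-empty ZIP archive normally starts with a "PK\x03\x04" local file header. A rough Python 3 sketch, where the helper name sniff_compression is hypothetical and the check is a heuristic rather than full validation; zipfile.is_zipfile is the standard-library check for ZIP structure.

```python
import gzip
import io
import zipfile

GZIP_MAGIC = b"\x1f\x8b"
ZIP_MAGIC = b"PK\x03\x04"  # local file header of a non-empty ZIP archive


def sniff_compression(data):
    """Guess the container format from the first bytes of `data`."""
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


if __name__ == "__main__":
    gz = gzip.compress(b"<xml/>")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("maindata.xml", "<xml/>")
    zp = buf.getvalue()
    print(sniff_compression(gz), sniff_compression(zp))  # gzip zip
    # The standard library can also check for ZIP structure directly:
    print(zipfile.is_zipfile(io.BytesIO(zp)))
```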


    > os.system(open("/home/flebber/tmp/maindata.xml"));


    The manuals say quite simply and clearly that:
    open() returns a file object
    os.system's arg is a string (a command, like "grep -i fubar *.pl")
    So that's guaranteed not to work.
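To make the mismatch concrete, a small Python 3 sketch: open() hands back a file object that you can iterate over directly, whereas os.system() only runs a shell command string and returns its exit status, not the file's contents. The helper name read_lines is hypothetical, and the temporary file is a stand-in for an already-extracted maindata.xml.

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of a text file, without trailing newlines."""
    # open() returns a file object; iterating over it yields one line at a time.
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "maindata.xml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("<?xml version='1.0'?>\n<playlist/>\n")
    for line in read_lines(path):
        print(line)
    # os.system would need a command string (e.g. "cat " + path) and would
    # return an exit status rather than the file's contents.
```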

    From the docs of the zipfile module:
    """
    class ZipFile( file[, mode[, compression[, allowZip64]]])

    Open a ZIP file, where file can be either a path to a file (a string) or
    a file-like object. The mode parameter should be 'r' to read an existing
    file, 'w' to truncate and write a new file,
    or 'a' to append to an existing file.
    """
    ... and you don't care about the rest of the class docs in your simple
    case of reading.
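For completeness, the three documented modes can be exercised together. A minimal Python 3 sketch that creates an in-memory archive with 'w', adds a member with 'a', and reads the member names back with 'r'. The function name demo_modes and the member contents are placeholders.

```python
import io
import zipfile


def demo_modes():
    """Create an archive ('w'), append to it ('a'), then read it back ('r')."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:   # 'w': truncate and write
        zf.writestr("mimetype", "placeholder")
    with zipfile.ZipFile(buf, "a") as zf:   # 'a': append to an existing archive
        zf.writestr("maindata.xml", "<playlist/>")
    with zipfile.ZipFile(buf, "r") as zf:   # 'r': read an existing archive
        return zf.namelist()


if __name__ == "__main__":
    print(demo_modes())  # ['mimetype', 'maindata.xml']
```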

    A class has to be called like a function to give you an object which is
    an instance of that class. You need only the first argument; the second
    has about a 99.999% chance of defaulting to 'r' if omitted, but we'll
    play it safe and explicit:

    import zipfile
    zf = zipfile.ZipFile('/home/flebber/oddalt.k3b', 'r')
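A quick Python 3 check of the two points above, under the assumption of a small in-memory archive standing in for oddalt.k3b: calling the class returns a ZipFile instance, and the signature's default for `mode` is 'r', so omitting it behaves like the explicit form.

```python
import inspect
import io
import zipfile

# The signature marks `mode` as optional; its default value is 'r'.
default_mode = inspect.signature(zipfile.ZipFile).parameters["mode"].default
print(default_mode)  # r

# Hypothetical in-memory stand-in for oddalt.k3b.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("maindata.xml", "<playlist/>")

# Calling the class like a function creates a ZipFile instance.
with zipfile.ZipFile(buf) as implicit, zipfile.ZipFile(buf, "r") as explicit:
    print(isinstance(implicit, zipfile.ZipFile))       # True
    print(implicit.namelist() == explicit.namelist())  # True
```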

    OK, some more useful docs:
    """
    namelist( )
    Return a list of archive members by name.
    printdir( )
    Print a table of contents for the archive to sys.stdout.
    read( name)
    Return the bytes of the file in the archive. The archive must be
    open for read or append.
    """

    So give the following a try:

    print zf.namelist()
    zf.printdir()
    xml_string = zf.read('maindata.xml')
    zf.close()

    # xml_string will be a string which may or may not have line endings in it ...
    print len(xml_string)

    # If you can't imagine what the next two lines will do,
    # you'll have to do it once, just to see what happens:
    for line in xml_string:
        print line

    # Wasn't that fun? How big was that file? Now do this:
    lines = xml_string.splitlines()
    print len(lines) # number of lines
    print len(lines[0]) # length of first line

    # Ummm, maybe if it's only one line you don't want to do this either,
    # but what the heck:
    for line in lines:
        print line
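The snippet above is Python 2. A rough Python 3 equivalent, assuming the same member name: the main difference is that ZipFile.read() returns bytes, so the data is decoded (UTF-8 is an assumption about the file) before it is split into lines. The function name read_member_lines is hypothetical, and the demo builds a stand-in archive instead of opening /home/flebber/oddalt.k3b.

```python
import os
import tempfile
import zipfile


def read_member_lines(archive_path, member="maindata.xml", encoding="utf-8"):
    """Return the text lines of one member of a ZIP archive such as a .k3b file."""
    with zipfile.ZipFile(archive_path, "r") as zf:
        zf.printdir()               # table of contents, as in the example above
        data = zf.read(member)      # Python 3: this is bytes, not str
    # Decode before splitting; UTF-8 is an assumption about the member.
    return data.decode(encoding).splitlines()


if __name__ == "__main__":
    # Stand-in archive; with a real project you would pass its path instead.
    path = os.path.join(tempfile.mkdtemp(), "example.k3b")
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("mimetype", "placeholder")
        zf.writestr("maindata.xml", "<?xml version='1.0'?>\n<playlist>\n</playlist>\n")
    lines = read_member_lines(path)
    print(len(lines))     # number of lines
    print(len(lines[0]))  # length of first line
    for line in lines:
        print(line)
```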

    HTH,
    John
    John Machin, Jun 10, 2007
    #8
  9. flebber

    flebber Guest

    On Jun 10, 7:43 pm, John Machin wrote:


    Thanks, that was so helpful for seeing how to do it. I have read a lot,
    but it wasn't sinking in, and sometimes it's better to learn by doing.
    Some of the books I have read just seem to go from theory to theory
    with the occasional example (which seems meant to show how good the
    author is rather than to help us).

    For the record

    >>> ## working on region in file /usr/tmp/python-F_C5sr.py...
    ['mimetype', 'maindata.xml']
    File Name              Modified                    Size
    mimetype               2007-05-27 20:36:20           17
    maindata.xml           2007-05-27 20:36:20        10795
    >>> print len(xml_string)
    10795
    >>> for line in xml_string:
    ...     print line
    ...
    <
    ?
    x
    m
    l

    v
    e
    r
    s
    i.....(etc ...it went for a while)

    and

    >>> lines = xml_string.splitlines()
    >>> print len(lines)
    387
    >>> print len(lines[0])
    38
    >>> for line in lines:
    ... print line
      File "<stdin>", line 2
        print line
        ^
    IndentationError: expected an indented block
    >>> for line in lines:
    ...     print line
    flebber, Jun 10, 2007
    #9
  10. John Machin

    John Machin Guest

    On 10/06/2007 8:08 PM, flebber wrote:
    >
    > Thanks that was so helpful to see how to do it. I have read a lot but
    > it wasn't sinking in, and sometimes its better to learn by doing.


    IMHO it's always better to learn by: read some, try it out, read some, ...

    > Some
    > of the books I have read just seem to go from theory to theory with
    > the occasional example ( which is meant to show us how good the author
    > is rather than help us).


    Well, that's the wrong sort of book for learning a language. You need
    one with little exercises on each page, plus a couple of bigger ones per
    chapter. It helps to get used to looking things up in the manual.
    Compare the description in the manual with what's in the book.

    >
    > For the record
    >
    >>>> ## working on region in file /usr/tmp/python-F_C5sr.py...
    > ['mimetype', 'maindata.xml']
    > File Name              Modified                    Size
    > mimetype               2007-05-27 20:36:20           17
    > maindata.xml           2007-05-27 20:36:20        10795
    >>>> print len(xml_string)
    > 10795
    >>>> for line in xml_string:
    > ...     print line
    > ...
    > <
    > ?
    > x
    > m
    > l
    >
    > v
    > e
    > r
    > s
    > i.....(etc ...it went for a while)


    Yup. At a rough guess, I'd say it printed 10795 lines.

    So now you've learned by doing it what
    for x in a_string:
    does :)

    I hope you've also learned that "xml_string" was a good name and "line"
    wasn't quite so good.
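The difference being pointed out, in a small Python 3 sketch: iterating over a string yields single characters, while iterating over the result of splitlines() yields whole lines. The helper name chars_vs_lines and the sample text are made up for illustration.

```python
def chars_vs_lines(text):
    """Contrast iterating over a string with iterating over its lines."""
    chars = [c for c in text]     # `for x in a_string` gives one character per step
    lines = text.splitlines()     # one item per line, newlines removed
    return chars, lines


if __name__ == "__main__":
    sample = "<?xml version='1.0'?>\n<playlist/>\n"
    chars, lines = chars_vs_lines(sample)
    print(chars[:5])  # ['<', '?', 'x', 'm', 'l']
    print(lines)      # ["<?xml version='1.0'?>", '<playlist/>']
```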

    >
    > and
    >
    >>>> lines = xml_string.splitlines()


    Have you looked up splitlines in the manual?
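For anyone following along, a short Python 3 illustration of what the manual describes for str.splitlines: it splits on "\n", "\r\n" and "\r" alike, drops the line endings unless keepends=True, and, unlike split("\n"), does not produce an empty final item for a trailing newline. The sample strings are arbitrary.

```python
text = "first\nsecond\r\nthird\rfourth"

by_lines = text.splitlines()                # all three line-ending styles recognised
with_ends = text.splitlines(keepends=True)  # line endings kept on each item
by_newline = text.split("\n")               # only "\n" is treated as a separator

print(by_lines)    # ['first', 'second', 'third', 'fourth']
print(with_ends)   # ['first\n', 'second\r\n', 'third\r', 'fourth']
print(by_newline)  # ['first', 'second\r', 'third\rfourth']

# A trailing newline: splitlines() adds no empty item, split("\n") does.
print("a\nb\n".splitlines(), "a\nb\n".split("\n"))  # ['a', 'b'] ['a', 'b', '']
```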


    >>>> print len(lines)
    > 387
    >>>> print len(lines[0])
    > 38
    >>>> for line in lines:
    > ... print line
    >   File "<stdin>", line 2
    >     print line
    >     ^
    > IndentationError: expected an indented block
    >>>> for line in lines:
    > ...     print line
    >


    After you fixed your indentation error, did it look like what you
    expected to find?

    Cheers,
    John
    John Machin, Jun 10, 2007
    #10