Read XML file from compressed file using gzip

flebber

I was working on creating a simple program that would read the content
of a playlist file (in this case "*.k3b") and write it out. The
compressed "*.k3b" file has two files, and the one I was trying to read
was maindata.xml. I cannot, however, seem to use the gzip module
correctly. I have tried the program two ways without success; any ideas
would be appreciated.

Attempt 1

#!/usr/bin/python

import os
import gzip
playlist_file = open('/home/flebber/oddalt.k3b')
class GzipFile([playlist_file[decompress[9, 'rb']]]);

os.system(open("/home/flebber/tmp/maindata.xml"));

for line in maindata.xml:
    print line

playlist_file.close()

Attempt 2 - largely just trying to get gzip to work

#!/usr/bin/python

import gzip
fileObj = Gzipfile("/home/flebber/oddalt.k3b", 'rb');
fileContent = fileObj.read()
for line in filecontent:
    print line

fileObj.close()
 
Stefan Behnel

flebber said:
I was working on creating a simple program that would read the content
of a playlist file (in this case "*.k3b") and write it out. The
compressed "*.k3b" file has two files, and the one I was trying to read
was maindata.xml.

Consider using lxml. It reads in gzip compressed XML files transparently and
provides loads of other nice XML goodies.

http://codespeak.net/lxml/dev/

Stefan
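lxml is a third-party package, so its gzip handling is not reproduced here. As a stdlib-only stand-in (a swapped-in technique, not lxml's API), here is a minimal sketch that opens a gzip-compressed XML file with gzip.open and hands the file object to xml.etree.ElementTree. The file name playlist.xml.gz and its contents are hypothetical. As the rest of the thread shows, this would not open a .k3b file, which turns out to be a ZIP archive.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_gzipped_xml(path):
    """Parse a gzip-compressed XML file and return its root element."""
    # gzip.open returns a file-like object that decompresses as it is read,
    # so ElementTree can parse it just like a plain file.
    with gzip.open(path, "rb") as fh:
        return ET.parse(fh).getroot()


# Demo: write a small hypothetical gzipped XML file, then parse it back.
demo_path = os.path.join(tempfile.mkdtemp(), "playlist.xml.gz")
with gzip.open(demo_path, "wb") as fh:
    fh.write(b"<playlist><track>song.mp3</track></playlist>")

root = parse_gzipped_xml(demo_path)
print(root.tag, root.find("track").text)  # playlist song.mp3
```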
 
flebber

I will, but I'm taking baby steps at the moment, as I am
only learning and can't get gzip to work.

This is my latest attempt

#!/usr/bin/python

import os
import zlib

class gzip('/home/flebber/oddalt.k3b', 'rb')

main_data = os.system(open("/home/flebber/maindata.xml"));

for line in main_data:
    print line

main_data.close()
 
Stefan Behnel

flebber said:
I was working on creating a simple program that would read the content
of a playlist file (in this case "*.k3b") and write it out. The
compressed "*.k3b" file has two files, and the one I was trying to read
was maindata.xml.

The k3b format is a ZIP archive. Use the zipfile library:

file:///usr/share/doc/python2.5-doc/html/lib/module-zipfile.html

Stefan
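Before choosing between gzip and zipfile, the zipfile module can check a file for you with zipfile.is_zipfile(). A minimal sketch using throwaway files; example.k3b here is a hypothetical stand-in created on the fly, not a real k3b project:

```python
import os
import tempfile
import zipfile

# Throwaway files standing in for a real .k3b project and a plain XML file.
tmp = tempfile.mkdtemp()
k3b_path = os.path.join(tmp, "example.k3b")
with zipfile.ZipFile(k3b_path, "w") as zf:
    zf.writestr("maindata.xml", "<placeholder/>")
xml_path = os.path.join(tmp, "plain.xml")
with open(xml_path, "w") as fh:
    fh.write("<placeholder/>")

# is_zipfile() checks for a valid ZIP structure without extracting anything.
print(zipfile.is_zipfile(k3b_path))  # True
print(zipfile.is_zipfile(xml_path))  # False
```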
 
flebber

The k3b format is a ZIP archive. Use the zipfile library:

file:///usr/share/doc/python2.5-doc/html/lib/module-zipfile.html

Stefan

Thanks for all the help. I have been using the docs at python.org and
the Magnus Lie Hetland book. Are there any docs that are a little more
practical or expressive? Most of the module documentation is very
confusing for a beginner and doesn't provide much in the way of
examples on how to use the modules.

I'm not criticizing the docs, as they are probably very good for
experienced programmers.
 
John Machin

> Thanks for all the help. I have been using the docs at python.org and
> the Magnus Lie Hetland book. Are there any docs that are a little more
> practical or expressive? Most of the module documentation is very
> confusing for a beginner and doesn't provide much in the way of
> examples on how to use the modules.
>
> I'm not criticizing the docs, as they are probably very good for
> experienced programmers.


Somebody else has already drawn your attention to the/a tutorial. You
need to read, understand, and work through a *good* introductory book or
tutorial before jumping into the deep end.
> class GzipFile([playlist_file[decompress[9, 'rb']]]);

Errr, no, the [] are a documentation device used in most computer
language documentation to denote optional elements -- you don't type
them into your program. See below.
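To make the bracket convention concrete with the class from Attempt 1: the gzip docs describe the constructor along the lines of GzipFile([filename[, mode[, compresslevel[, fileobj]]]]), which means you drop the brackets and pass the arguments in order, stopping wherever you like. A minimal sketch on a throwaway gzip file (Python 3 syntax; this still would not open a .k3b file, for the reason given in the next paragraph):

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.gz")  # hypothetical file

# Brackets in the docs mean "optional": no brackets in the call, just the
# arguments in order. Here: filename, mode, compresslevel.
writer = gzip.GzipFile(path, "wb", 9)
writer.write(b"hello gzip\n")
writer.close()

# The trailing optional arguments can simply be left out.
reader = gzip.GzipFile(path, "rb")
data = reader.read()
reader.close()
print(data)  # b'hello gzip\n'
```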

Secondly, as Stefan pointed out, your file is a ZIP file (not a gzipped
file). They're quite different animals, so you need the zipfile module,
not the gzip module.
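One quick way to see that the two formats differ is to look at the first bytes, and then let gzip try its luck on ZIP data. A sketch using a small in-memory ZIP archive as a hypothetical stand-in for a .k3b file:

```python
import gzip
import io
import zipfile

# Build a small ZIP archive in memory (stand-in for a .k3b file).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("maindata.xml", "<placeholder/>")
zip_bytes = buf.getvalue()

# A ZIP file starts with b"PK"; a gzip stream starts with b"\x1f\x8b".
print(zip_bytes[:2])  # b'PK'

# Asking gzip to decompress ZIP data fails because the header is wrong.
try:
    gzip.GzipFile(fileobj=io.BytesIO(zip_bytes)).read()
    gzip_error = None
except OSError as exc:  # gzip.BadGzipFile (an OSError subclass) on 3.8+
    gzip_error = exc
print(type(gzip_error).__name__)
```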

> os.system(open("/home/flebber/tmp/maindata.xml"));

The manuals say quite simply and clearly that:
open() returns a file object
os.system's arg is a string (a command, like "grep -i fubar *.pl")
So that's guaranteed not to work.
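Since the original goal was simply to read a file and print it, here is a minimal sketch of doing that with open() alone, with no os.system() involved. The path and XML contents are hypothetical stand-ins for an already-extracted maindata.xml:

```python
import os
import tempfile

# Hypothetical stand-in for an already-extracted maindata.xml.
path = os.path.join(tempfile.mkdtemp(), "maindata.xml")
with open(path, "w") as fh:
    fh.write('<?xml version="1.0"?>\n<playlist/>\n')

# open() hands back a file object; iterating over it yields one line at a
# time. os.system() is not needed: it only runs a shell command string.
with open(path) as fh:
    lines = [line.rstrip("\n") for line in fh]

for line in lines:
    print(line)
```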

From the docs of the zipfile module:
"""
class ZipFile( file[, mode[, compression[, allowZip64]]])

Open a ZIP file, where file can be either a path to a file (a string) or
a file-like object. The mode parameter should be 'r' to read an existing
file, 'w' to truncate and write a new file,
or 'a' to append to an existing file.
"""
... and you don't care about the rest of the class docs in your simple
case of reading.

A class has to be called like a function to give you an object which is
an instance of that class. You need only the first argument; the second
has about a 99.999% chance of defaulting to 'r' if omitted, but we'll
play it safe and explicit:

import zipfile
zf = zipfile.ZipFile('/home/flebber/oddalt.k3b', 'r')
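The docs quoted above also allow a file-like object instead of a path string. A minimal sketch that builds an archive in an io.BytesIO buffer with mode 'w' and reopens it with mode 'r'; the two member names match the listing that appears later in the thread, but their contents here are placeholders:

```python
import io
import zipfile

# Write a tiny archive into an in-memory buffer ('w' creates/truncates).
buf = io.BytesIO()
zf = zipfile.ZipFile(buf, "w")
zf.writestr("mimetype", "placeholder")
zf.writestr("maindata.xml", "<?xml version='1.0'?><root/>")
zf.close()  # the buffer stays open because we passed it in ourselves

# Reopen the same bytes for reading; the file argument is a file-like
# object here rather than a path string.
buf.seek(0)
zf = zipfile.ZipFile(buf, "r")
names = zf.namelist()
zf.close()
print(names)  # ['mimetype', 'maindata.xml']
```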

OK, some more useful docs:
"""
namelist( )
Return a list of archive members by name.
printdir( )
Print a table of contents for the archive to sys.stdout.
read( name)
Return the bytes of the file in the archive. The archive must be
open for read or append.
"""

So give the following a try:

print zf.namelist()
zf.printdir()
xml_string = zf.read('maindata.xml')
zf.close()

# xml_string will be a string which may or may not have line endings in
# it ...
print len(xml_string)

# If you can't imagine what the next two lines will do,
# you'll have to do it once, just to see what happens:
for line in xml_string:
    print line

# Wasn't that fun? How big was that file? Now do this:
lines = xml_string.splitlines()
print len(lines) # number of lines
print len(lines[0]) # length of first line

# Ummm, maybe if it's only one line you don't want to do this either,
# but what the heck:
for line in lines:
    print line
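Splitting lines is fine for a first look, but the content is XML, so a parser is the natural next step. A minimal sketch using the standard library's xml.etree.ElementTree (lxml, suggested earlier, offers a compatible etree API). The element names are hypothetical, since the thread never shows the real maindata.xml schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML standing in for maindata.xml; the real k3b element
# names are not shown in this thread, so these are made up.
xml_string = b"""<?xml version="1.0" encoding="UTF-8"?>
<playlist>
  <track>first.mp3</track>
  <track>second.mp3</track>
</playlist>
"""

root = ET.fromstring(xml_string)
tracks = [t.text for t in root.iter("track")]
for name in tracks:
    print(name)
```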

HTH,
John
 
flebber

Thanks, that was so helpful to see how to do it. I have read a lot but
it wasn't sinking in, and sometimes it's better to learn by doing. Some
of the books I have read just seem to go from theory to theory with
the occasional example (which is meant to show us how good the author
is rather than help us).

For the record:
['mimetype', 'maindata.xml']
File Name      Modified              Size
mimetype       2007-05-27 20:36:20     17
maindata.xml   2007-05-27 20:36:20  10795
...     print line
...
<
?
x
m
l

v
e
r
s
i.....(etc ...it went for a while)

and

lines = xml_string.splitlines()
print len(lines)
387
print len(lines[0])
38
for line in lines:
... print line
  File "<stdin>", line 2
    print line
        ^
IndentationError: expected an indented block
print line
 
John Machin

> Thanks, that was so helpful to see how to do it. I have read a lot but
> it wasn't sinking in, and sometimes it's better to learn by doing.

IMHO it's always better to learn by: read some, try it out, read some, ...
> Some of the books I have read just seem to go from theory to theory with
> the occasional example (which is meant to show us how good the author
> is rather than help us).

Well, that's the wrong sort of book for learning a language. You need
one with little exercises on each page, plus a couple of bigger ones per
chapter. It helps to get used to looking things up in the manual.
Compare the description in the manual with what's in the book.
> For the record:
> ['mimetype', 'maindata.xml']
> File Name      Modified              Size
> mimetype       2007-05-27 20:36:20     17
> maindata.xml   2007-05-27 20:36:20  10795
> ...     print line
> ...
> <
> ?
> x
> m
> l
>
> v
> e
> r
> s
> i.....(etc ...it went for a while)

Yup. At a rough guess, I'd say it printed 10795 lines.

So now you've learned by doing it what
for x in a_string:
does :)
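For anyone puzzled by the one-character-per-line output: iterating over a string yields its characters, one per loop pass. A small sketch with a made-up string; note the Python 3 twist that ZipFile.read() returns bytes there, and iterating over bytes yields integers:

```python
# Python 2's zf.read() returned a str, so the thread's loop walked over
# characters. The same happens with a Python 3 str:
xml_string = '<?xml version="1.0"?>'
chars = [ch for ch in xml_string]
print(chars[:5])  # ['<', '?', 'x', 'm', 'l']

# On Python 3, zf.read() returns bytes instead, and looping over bytes
# yields integers; decode first to get characters back.
raw = b"<?xml"
print(list(raw))                  # [60, 63, 120, 109, 108]
print(list(raw.decode("utf-8")))  # ['<', '?', 'x', 'm', 'l']
```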

I hope you've also learned that "xml_string" was a good name and "line"
wasn't quite so good.

Have you looked up splitlines in the manual?
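splitlines() is worth looking up because it behaves differently from split("\n"). A quick comparison on a made-up string with mixed line endings:

```python
text = "<?xml version='1.0'?>\r\n<root>\r\n</root>\n"

# splitlines() understands \n, \r\n and \r, and drops the terminators.
by_lines = text.splitlines()
# split("\n") only knows "\n": it leaves "\r" behind and adds a trailing "".
by_newline = text.split("\n")

print(by_lines)
print(by_newline)
```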

> print len(lines)
> 387
> print len(lines[0])
> 38
> for line in lines:
> ... print line
>   File "<stdin>", line 2
>     print line
>         ^
> IndentationError: expected an indented block
> print line

After you fixed your indentation error, did it look like what you
expected to find?
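The error above comes from the loop body not being indented. A sketch that reproduces it with the built-in compile() on a hypothetical two-line loop (Python 3 print syntax), and shows the indented version compiling cleanly:

```python
# The body of a for loop must be indented; the interactive prompt
# reports an IndentationError when it is not.
bad_source = "for line in lines:\nprint(line)\n"
good_source = "for line in lines:\n    print(line)\n"

try:
    compile(bad_source, "<stdin>", "exec")
    error_name = None
except IndentationError as exc:
    error_name = type(exc).__name__

compile(good_source, "<stdin>", "exec")  # compiles without complaint
print(error_name)  # IndentationError
```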

Cheers,
John
 
