Trouble writing to database: RSS-reader

Arne · Jan 21, 2008

Hi!

I'm trying to make an RSS reader in Python just for fun, and I'm almost
finished. I don't have any syntax errors, but when I run my program,
nothing happens.

This program is supposed to download an .xml file, save the contents in
a buffer file (buffer.txt), and parse the file looking for start tags.
When it has found a start tag, it assumes that the content (between the
start tag and the end tag) is on the same line, so it removes the
start tag and the end tag and saves the content into a
database.

The problem is that I can't find the data in the database! If I watch
my program while I'm running it, I can see that it successfully
downloads the .xml file from the web and saves it in the buffer.

But I don't think that I save the data in the correct way, so it would
be nice if someone had some time to help me.

Full code: http://pastebin.com/m56487698
Saving to database: http://pastebin.com/m7ec69e1b
Retrieving from database: http://pastebin.com/m714c3ef8

And yes, I know that there are RSS parsers already built, but this is
only for learning.

Dennis Lee Bieber · Jan 21, 2008

The problem is that I can't find the data in the database! If I watch
my program while I'm running it, I can see that it successfully
downloads the .xml file from the web and saves it in the buffer.

Did you COMMIT the transaction with the database?

The DB-API specification says that connections do NOT perform auto-commit;
so if you do a string of INSERTs and just close the connection, the
changes are supposed to be rolled back (discarded).
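
The effect described above can be reproduced with a minimal sketch. It assumes Python 3 and the standard sqlite3 module (the thread does not say which database module is in use), and the table layout, file name, and helper names are made up for illustration.

```python
import os
import sqlite3
import tempfile


def count_rows(path):
    """Open a fresh connection and count the rows that were really stored."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM main").fetchone()[0]
    finally:
        conn.close()


def insert_item(path, commit):
    """Insert one row, then close the connection with or without a commit."""
    conn = sqlite3.connect(path)
    conn.execute("INSERT INTO main (title) VALUES (?)", ("hello",))
    if commit:
        conn.commit()
    conn.close()  # an uncommitted INSERT is discarded here


# Hypothetical throwaway database file, used only for this demonstration.
db_path = os.path.join(tempfile.mkdtemp(), "rss_demo.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE main (id INTEGER PRIMARY KEY, title TEXT)")
conn.commit()
conn.close()

insert_item(db_path, commit=False)
rows_without_commit = count_rows(db_path)

insert_item(db_path, commit=True)
rows_with_commit = count_rows(db_path)

print(rows_without_commit, rows_with_commit)
```

Other DB-API modules may differ in detail, so treat this as an illustration of the rule rather than a statement about every driver.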
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/

Bruno Desthuilliers · Jan 21, 2008

Arne wrote:

Hi!

I'm trying to make an RSS reader in Python just for fun, and I'm almost
finished.

Bad news: you're not.

I don't have any syntax errors, but when I run my program,
nothing happens.

This program is supposed to download an .xml file, save the contents in
a buffer file (buffer.txt), and parse the file looking for start tags.
When it has found a start tag, it assumes that the content (between the
start tag and the end tag) is on the same line,

Very hazardous assumption. FWIW, you can more safely assume this will
almost never be the case. FWIW, don't assume *anything* wrt/ newlines
when it comes to XML - you can even have newlines between two attributes
of the same tag...

so it removes the
start tag and the end tag and saves the content into a
database.

The problem is that I can't find the data in the database! If I watch
my program while I'm running it, I can see that it successfully
downloads the .xml file from the web and saves it in the buffer.

But I don't think that I save the data in the correct way, so it would
be nice if someone had some time to help me.

Full code: http://pastebin.com/m56487698
Saving to database: http://pastebin.com/m7ec69e1b
Retrieving from database: http://pastebin.com/m714c3ef8

1/ you don't need to make each and every variable an attribute of the
class - only use attributes for what constitutes the object state (ie:
what needs to be maintained between different, possibly unrelated method
calls). In your update_sql method, for example, besides self.connection and
_eventually_ self.cursor, you don't need any attribute - local variables
are enough.

2/ you don't need these <xxx>Stored variables at all - just reset
title/link/description to None *when needed* (cf below), then test these
variables against None.

3/ learn to use if/elif properly !-)

4/ *big* logic flaw (and probably the first cause of your problem): on
*each* iteration, you reset your <xxx>Stored flags to False - whether
you stored something in the database or not. Since you don't expect to
have all three items on a single line (another wrong assumption: you
might get a whole RSS stream as one single big line), I bet you never
write anything into the database.

5/ other big flaw: either use an autoincrement for your primary key -
and *don't* pass any value for it in your query - or provide (a
*unique*) id by yourself.

6/ FWIW, also learn to properly use the DB API - don't build your SQL
query using string formatting, but pass the arguments as a tuple, IOW:

# bad: the values are pasted into the SQL text, unquoted and unescaped
cursor.execute(
    '''INSERT INTO main VALUES(null, %s, %s, %s)'''
    % (title, link, description)
)

# good (assuming you're using an autoincrementing key for your id):
cursor.execute(
    "INSERT INTO main VALUES(null, <X>, <X>, <X>)",
    (title, link, description)
)

NB: replace <X> with the appropriate placeholder for your database - cf
your db module documentation (usually either '?' or '%s')

This will make the db module properly escape and convert values.

7/ str.replace() doesn't modify the string in-place (Python strings are
immutable), but returns a new string. So you want:
line = line.replace('x', 'y')

8/ you don't need to explicitly call connection.commit on each and
every statement, and you don't need to call it at all on SELECT
statements !-)

9/ have you tried calling print_rss *twice* on the same instance ?-)

10/ are you sure it's useful to open the same 'buffer.txt' file for
writing *twice* (once in __init__, the other in update_sql)? BTW, use
open(), not file().

11/ are you sure you need to use this buffer file at all ?

12/ are you really *sure* you want to *destroy* your table and recreate
it each time you call your script ?
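
Points 5, 6, 8 and 12 above can be sketched together. This is only an illustration under assumptions: it uses Python 3's sqlite3 module (whose placeholder is '?'), an in-memory database, and a guessed four-column layout for the main table; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Point 12: create the table only if it is missing instead of dropping it.
# Point 5: let the database assign the id.
conn.execute(
    """CREATE TABLE IF NOT EXISTS main (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           title TEXT,
           link TEXT,
           description TEXT)"""
)

# Invented sample data; note the quotes that would break naive formatting.
items = [
    ("First post", "http://example.com/1", "It's got a 'quote' in it"),
    ("Second post", "http://example.com/2", "Plain text"),
]

for title, link, description in items:
    # Point 6: values travel separately from the SQL text, so the module
    # handles quoting and conversion.
    conn.execute(
        "INSERT INTO main (title, link, description) VALUES (?, ?, ?)",
        (title, link, description),
    )

# Point 8: one commit after the batch, none needed for the SELECT below.
conn.commit()

stored = conn.execute(
    "SELECT id, title, description FROM main ORDER BY id"
).fetchall()
conn.close()

for row in stored:
    print(row)
```

Naming the columns in the INSERT is a design choice that keeps the statement valid even if the table later gains columns.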

And yes, I know that there are RSS parsers already built, but this is
only for learning.

This should not prevent you from learning how to properly parse XML
(hint: with an XML parser). XML is *not* a line-oriented format, so you
just can't get anywhere trying to parse it this way.

HTH

Bruno Desthuilliers · Jan 21, 2008

Dennis Lee Bieber wrote:

Did you COMMIT the transaction with the database?

Did you READ the code ?-)

NB : Yes, he did. The problem*s* are elsewhere.

Gabriel Genellina · Jan 21, 2008

Arne wrote:
I'm trying to make an RSS reader in Python just for fun, and I'm almost
finished. I don't have any syntax errors, but when I run my program,
nothing happens.

This program is supposed to download an .xml file, save the contents in
a buffer file (buffer.txt), and parse the file looking for start tags.
When it has found a start tag, it assumes that the content (between the
start tag and the end tag) is on the same line, so it removes the
start tag and the end tag and saves the content into a
database.

That's a gratuitous assumption and may not hold for many sources; you
should use a proper XML parser instead (using ElementTree, for example, is
even easier than your sequence of find and replace).

The problem is that I can't find the data in the database! If I watch
my program while I'm running it, I can see that it successfully
downloads the .xml file from the web and saves it in the buffer.

Ok. So the problem should be either when you read the buffer again, when
processing it, or when saving in the database.
It's very strange to create the table each time you want to save anything,
but this gives you another clue: the table is created and remains empty,
else the select statement in print_rss would have failed. So you know that
those lines are executed. Now, the print statement is your friend:

self.buffer = file('buffer.txt')
for line in self.buffer.readline():
    print "line=",line # add this and see what you get

Once you get your code working, it's time to analyze it. I think someone
told you "in Python, you have to use self. everywhere" and you read it
literally. Let's see:

def update_buffer(self):
    self.buffer = file('buffer.txt', 'w')
    self.temp_buffer = urllib2.urlopen(self.rssurl).read()
    self.buffer.write(self.temp_buffer)
    self.buffer.close()

All those "self." are unneeded and wrong. You *can*, and *should*, use
local variables. Perhaps it's a bit hard to grasp at first, but local
variables, instance attributes and global variables are different things
used for different purposes. I'll try an example: you [an object] have a
diary, where you record things that you have to remember [your instance
attributes, or "data members" as they are called in other languages]. You
also carry a tiny notepad in your pocket, where you make a few notes when
you are doing something, but you always throw away the page once the job
is finished [local variables]. Your brothers, sisters and parents [other
objects] use the same scheme, but there is a whiteboard in the kitchen
where important things that all of you have to know are recorded [global
variables] (anybody can read and write on the board).

Now, back to the code, why "self." everywhere? Let's see, self.buffer is a
file: opened, written, and closed, all inside the same function. Once it's
closed, there is no need to keep a reference to the file elsewhere. It's
discardable, like your notepad pages: use a local variable instead. In fact,
*all* your variables should be locals; the *only* things you should keep
inside your object are rssurl and the database location, and perhaps
temp_buffer (with another, more meaningful name, rssdata for example).
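
A minimal sketch of that split between the diary and the notepad, written for Python 3. The class name, the fetch parameter, and the fake download function are hypothetical and stand in for the real urllib call so the example runs without network access.

```python
import os
import tempfile


class RssReader:
    """Keeps only long-lived state on the instance (the 'diary')."""

    def __init__(self, rssurl, fetch):
        self.rssurl = rssurl   # state: needed by every later call
        self._fetch = fetch    # hypothetical callable: url -> bytes
        self.rssdata = None    # state: text of the last downloaded feed

    def update_buffer(self, path):
        # Everything below is 'notepad' material: plain local variables.
        raw = self._fetch(self.rssurl)
        text = raw.decode("utf-8")
        with open(path, "w", encoding="utf-8") as out:
            out.write(text)    # the file is closed when the block ends
        self.rssdata = text
        return len(text)


def fake_fetch(url):
    """Stand-in for a real download; returns a tiny canned feed."""
    return b"<rss><channel><title>demo</title></channel></rss>"


buffer_path = os.path.join(tempfile.mkdtemp(), "buffer.txt")
reader = RssReader("http://example.invalid/rss.xml", fake_fetch)
written = reader.update_buffer(buffer_path)

# Only the deliberate state survives on the object.
attribute_names = sorted(vars(reader))
print(attribute_names)
```

With a real feed, the fetch argument could be something like a small wrapper around urllib.request.urlopen(url).read().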

Other -more or less random- remarks:

if self.titleStored == True and self.linkStored == True and descriptionStored == True:

Don't compare against True/False. Just use their boolean value:

if titleStored and linkStored and descriptionStored:

Your code resets those flags at *every* line read, and since a line
contains at most one tag, they will never be True at the same time. You
should reset the flags only after you got the three items and wrote them
to the database.

The RSS feed, after being read, is available in self.temp_buffer; why do
you read it again from the buffer file? If you want to iterate over the
individual lines, use:

for line in self.temp_buffer.splitlines():
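
The flag handling and the splitlines() suggestion can be sketched as follows, in Python 3. It deliberately keeps the original one-tag-per-line assumption, which the thread argues against, only to show where the reset belongs. The helper names and the sample feed are invented.

```python
def extract(line, tag):
    """Return the text between <tag> and </tag> if both are on this line."""
    start, end = "<%s>" % tag, "</%s>" % tag
    line = line.strip()
    if line.startswith(start) and line.endswith(end):
        return line[len(start):-len(end)]
    return None


def collect_items(lines):
    """Group title/link/description values into complete triples."""
    items = []
    current = {}  # values seen so far for the item being assembled
    for line in lines:
        for tag in ("title", "link", "description"):
            value = extract(line, tag)
            if value is not None:
                current[tag] = value
        if len(current) == 3:
            items.append(
                (current["title"], current["link"], current["description"])
            )
            current = {}  # reset only AFTER a complete item was stored
    return items


feed_text = """<item>
<title>First</title>
<link>http://example.com/1</link>
<description>One</description>
</item>
<item>
<title>Second</title>
<link>http://example.com/2</link>
<description>Two</description>
</item>"""

items = collect_items(feed_text.splitlines())
print(items)
```

On a real feed the channel's own title, link and description would be picked up as an item too, which is one more reason to prefer an XML parser.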

Arne · Jan 21, 2008

This should not prevent you from learning how to properly parse XML
(hint: with an XML parser). XML is *not* a line-oriented format, so you
just can't get anywhere trying to parse it this way.

HTH

Do you think I should use xml.dom.minidom for this? I've never used
it, and I don't know how to use it, but I've heard it's useful.

So, I shouldn't use this techinicke (probably wrong spelled) trying to
parse XML? Should I rather use minidom?

Thank you for answering, I've learnt a lot from both of you,
Desthuilliers and Genellina!

Bruno Desthuilliers · Jan 21, 2008

Arne wrote:

Do you think I should use xml.dom.minidom for this?

I'd rather go for a SAX parser. A DOM parser is only useful if you need
an in-memory representation of the whole document tree.
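
For readers who have not seen a SAX parser, here is a minimal sketch using Python 3's standard xml.sax module. The handler class, its field list, and the sample feed are assumptions made for illustration; the parser calls the handler as it streams through the document, so no whole-document tree is built.

```python
import xml.sax


class ItemHandler(xml.sax.ContentHandler):
    """Collects title/link/description for each <item> element."""

    FIELDS = ("title", "link", "description")

    def __init__(self):
        super().__init__()
        self.items = []      # finished items, one dict each
        self._item = None    # dict for the <item> currently open, if any
        self._field = None   # name of the field currently being read
        self._chunks = []    # text pieces of that field

    def startElement(self, name, attrs):
        if name == "item":
            self._item = {}
        elif self._item is not None and name in self.FIELDS:
            self._field = name
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._field is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "item" and self._item is not None:
            self.items.append(self._item)
            self._item = None
        elif name == self._field:
            self._item[name] = "".join(self._chunks).strip()
            self._field = None


# Invented sample: note the newline inside <title> and the one-line tail.
sample = b"""<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Channel</title>
<item><title>Hello
 world</title>
<link>http://example.com/1</link><description>First</description></item>
</channel></rss>"""

handler = ItemHandler()
xml.sax.parseString(sample, handler)
print(handler.items)
```

The channel-level title is ignored because fields are only recorded while an item is open, and line breaks inside or between elements make no difference.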

So, I shouldn't use this techinicke (probably wrong spelled)

May I suggest "technic" ?-)

Gabriel Genellina · Jan 21, 2008

Arne wrote:
Do you think I should use xml.dom.minidom for this? I've never used
it, and I don't know how to use it, but I've heard it's useful.

So, I shouldn't use this techinicke (probably wrong spelled) trying to
parse XML? Should I rather use minidom?

Thank you for answering, I've learnt a lot from both of you,
Desthuilliers and Genellina!

Try ElementTree instead; there is an implementation included with Python
2.5, documentation at http://effbot.org/zone/element.htm and another
implementation available at http://codespeak.net/lxml/

import xml.etree.cElementTree as ET
import urllib2

rssurl = 'http://www.jabber.org/news/rss.xml'
rssdata = urllib2.urlopen(rssurl).read()
rssdata = rssdata.replace('&', '&amp;') # ouch!

tree = ET.fromstring(rssdata)
for item in tree.getiterator('item'):
    print item.find('link').text
    print item.find('title').text
    print item.find('description').text
    print

Note that this particular RSS feed is NOT a well formed XML document - I
had to replace the & with &amp; to make the parser happy.
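
The snippet above is Python 2 code from 2008. For readers on Python 3, a roughly equivalent sketch follows; it parses an invented, well-formed sample string instead of downloading the feed, and it uses iter() because getiterator() is no longer available in current versions of xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; the ampersand is correctly written as &amp;.
sample = """<rss version="2.0"><channel>
  <title>Example channel</title>
  <item>
    <title>First</title>
    <link>http://example.com/1</link>
    <description>Fish &amp; chips</description>
  </item>
</channel></rss>"""

root = ET.fromstring(sample)

# findtext() returns the element's text, or None if the child is missing.
entries = [
    (item.findtext("title"), item.findtext("link"), item.findtext("description"))
    for item in root.iter("item")
]

for entry in entries:
    print(entry)
```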

MRAB · Jan 22, 2008

Arne wrote:

I'd rather go for a SAX parser. A DOM parser is only useful if you need
an in-memory representation of the whole document tree.

May I suggest "technic" ?-)

That should be "technique"; just ask a Francophone!

Dennis Lee Bieber · Jan 22, 2008

Did you READ the code ?-)

I was in a bit of a hurry to get to work (why, I don't know -- since
I'm due to be surplused in three weeks)... so took a quick grab at the
most common reason for not finding "expected" data in a database -- the
common /lack/ of commits.

Bruno Desthuilliers · Jan 22, 2008

MRAB wrote:

That should be "technique"; just ask a Francophone!

<mode="the usual frenchy">
My bad

</mode>

Arne · Jan 23, 2008

On Mon, 21 Jan 2008 18:38:48 -0200, Arne <[email protected]> wrote:

Try ElementTree instead; there is an implementation included with Python
2.5, documentation at http://effbot.org/zone/element.htm and another
implementation available at http://codespeak.net/lxml/

import xml.etree.cElementTree as ET
import urllib2

rssurl = 'http://www.jabber.org/news/rss.xml'
rssdata = urllib2.urlopen(rssurl).read()
rssdata = rssdata.replace('&', '&amp;') # ouch!

tree = ET.fromstring(rssdata)
for item in tree.getiterator('item'):
    print item.find('link').text
    print item.find('title').text
    print item.find('description').text
    print

Note that this particular RSS feed is NOT a well formed XML document - I
had to replace the & with &amp; to make the parser happy.

This looks very interesting! But it looks like no document is
well-formed! I've tried several RSS feeds, but they all fail with either
"undefined entity" or "not well-formed". This is not how it should be,
right?

Gabriel Genellina · Jan 23, 2008

Arne wrote:
This looks very interesting! But it looks like no document is
well-formed! I've tried several RSS feeds, but they all fail with either
"undefined entity" or "not well-formed". This is not how it should be,
right?

Well, the RSS feed "should" be valid XML...
Try a more forgiving parser like BeautifulStoneSoup, or preprocess the input
with Tidy or a similar program before feeding it to ElementTree.
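
As a small-scale illustration of the preprocessing idea, the sketch below escapes only bare ampersands before parsing, in Python 3. It is not Tidy and not a general repair tool: the regular expression and the helper name are assumptions made for this example, and named entities such as &nbsp; are left alone, so they can still trigger "undefined entity".

```python
import re
import xml.etree.ElementTree as ET

# An '&' that does not start a numeric or named entity reference.
BARE_AMP = re.compile(r"&(?!#\d+;|#x[0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)")


def escape_bare_ampersands(text):
    """Replace stray '&' characters with '&amp;', leaving entities intact."""
    return BARE_AMP.sub("&amp;", text)


# Invented sample with one stray ampersand and two legitimate references.
broken = "<item><title>Fish & chips &amp; peas &#169; done</title></item>"

try:
    ET.fromstring(broken)
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False  # the stray '&' makes the document not well-formed

fixed = escape_bare_ampersands(broken)
title = ET.fromstring(fixed).findtext("title")

print(parsed_raw)
print(title)
```

Unlike a blanket replace('&', '&amp;'), this leaves references that are already correct untouched, so they are not escaped twice.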

thudfoo · Jan 24, 2008

On Mon, 21 Jan 2008 18:38:48 -0200, Arne <[email protected]> wrote:

[...]

This looks very interesting! But it looks like no document is
well-formed! I've tried several RSS feeds, but they all fail with either
"undefined entity" or "not well-formed". This is not how it should be,
right?

Go to http://www.feedparser.org
Download feedparser.py
Read the documentation, at least: you will find out a lot about
working with RSS.
