Trouble writing to database: RSS-reader

Discussion in 'Python' started by Arne, Jan 21, 2008.

  1. Arne

    Arne Guest

    Hi!

    I'm trying to write an RSS reader in Python just for fun, and I'm almost
    finished. I don't get any syntax errors, but when I run my program,
    nothing happens.

    This program is supposed to download an .xml file, save the contents in
    a buffer file (buffer.txt) and parse the file looking for start-tags.
    When it has found a start-tag, it assumes that the content (between the
    start-tag and the end-tag) is on the same line, so it removes the
    start-tag and the end-tag and saves the content into a
    database.

    The problem is that I can't find the data in the database! If I watch
    my program while it's running, I can see that it successfully
    downloads the .xml file from the web and saves it in the buffer.

    But I don't think that I save the data in the correct way, so it would
    be nice if someone had some time to help me.

    Full code: http://pastebin.com/m56487698
    Saving to database: http://pastebin.com/m7ec69e1b
    Retrieving from database: http://pastebin.com/m714c3ef8

    And yes, I know that there are RSS parsers already built, but this is
    only for learning.
     
    Arne, Jan 21, 2008
    #1

  2. On Mon, 21 Jan 2008 08:12:43 -0800 (PST), Arne <>
    declaimed the following in comp.lang.python:

    >
    > The problem is that I can't find the data in the database! If I watch
    > my program while it's running, I can see that it successfully
    > downloads the .xml file from the web and saves it in the buffer.
    >

    Did you COMMIT the transaction with the database?

    The DB-API specification says that connections do NOT auto-commit,
    so if you run a string of INSERTs and then just close the connection, the
    changes are supposed to be rolled back (discarded).
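
    A minimal sketch of that behaviour, assuming the standard-library sqlite3
    module and Python 3 (the thread does not say which database module the
    original program uses). The file name, table and column names are made up
    for the demonstration.

```python
import os
import sqlite3
import tempfile

# Hypothetical throwaway database file, used only for this demonstration.
db_path = os.path.join(tempfile.mkdtemp(), "rss_demo.db")


def count_rows(path):
    """Open a fresh connection and count the rows in the items table."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    finally:
        conn.close()


# Create the table and commit so that it exists for later connections.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.commit()

# INSERT without commit: closing the connection discards the pending change.
conn.execute("INSERT INTO items (title) VALUES (?)", ("lost",))
conn.close()
rows_without_commit = count_rows(db_path)

# INSERT followed by commit: the row survives the close.
conn = sqlite3.connect(db_path)
conn.execute("INSERT INTO items (title) VALUES (?)", ("kept",))
conn.commit()
conn.close()
rows_with_commit = count_rows(db_path)

print(rows_without_commit, rows_with_commit)
```

    Other DB-API modules may differ in detail, so check the documentation of
    the one in use.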
    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
     
    Dennis Lee Bieber, Jan 21, 2008
    #2

  3. Arne wrote:
    > Hi!
    >
    > I'm trying to write an RSS reader in Python just for fun, and I'm almost
    > finished.


    Bad news : you're not.

    > I don't get any syntax errors, but when I run my program,
    > nothing happens.
    >
    > This program is supposed to download an .xml file, save the contents in
    > a buffer file (buffer.txt) and parse the file looking for start-tags.
    > When it has found a start-tag, it assumes that the content (between the
    > start-tag and the end-tag) is on the same line,


    Very hazardous assumption. FWIW, you can more safely assume this will
    almost never be the case. Don't assume *anything* about newlines
    when it comes to XML - you can even have newlines between two attributes
    of the same tag...

    > so it removes the
    > start-tag and the end-tag and saves the content into a
    > database.
    >
    > The problem is that I can't find the data in the database! If I watch
    > my program while it's running, I can see that it successfully
    > downloads the .xml file from the web and saves it in the buffer.
    >
    > But I don't think that I save the data in the correct way, so it would
    > be nice if someone had some time to help me.
    >
    > Full code: http://pastebin.com/m56487698
    > Saving to database: http://pastebin.com/m7ec69e1b
    > Retrieving from database: http://pastebin.com/m714c3ef8


    1/ you don't need to make each and every variable an attribute of the
    class - only use attributes for what constitutes the object's state (ie:
    what needs to be maintained between different, possibly unrelated method
    calls). In your update_sql method, for example, besides self.connection and
    _possibly_ self.cursor, you don't need any attributes - local variables
    are enough.

    2/ you don't need these <xxx>Stored variables at all - just reset
    title/link/description to None *when needed* (cf below), then test these
    variables against None.

    3/ learn to use if/elif properly !-)

    4/ *big* logic flaw (and probably the first cause of your problem): on
    *each* iteration, you reset your <xxx>Stored flags to False - whether
    you stored something in the database or not. Since you don't expect to
    have all three items on a single line (another wrong assumption: you
    might get a whole RSS stream as one single big line), I bet you never
    write anything into the database.

    5/ other big flaw: either use an autoincrement for your primary key -
    and *don't* pass any value for it in your query - or provide a
    *unique* id yourself.

    6/ FWIW, also learn to use the DB API properly - don't build your SQL
    query using string formatting, but pass the arguments as a tuple, IOW:

        # bad:
        cursor.execute(
            '''INSERT INTO main VALUES(null, %s, %s, %s)'''
            % (title, link, description)
        )

        # good (assuming you're using an autoincrementing key for your id):
        cursor.execute(
            "INSERT INTO main VALUES(<X>, <X>, <X>)",
            (title, link, description)
        )

    NB : replace <X> with the appropriate placeholder for your database - cf
    your db module documentation (usually either '?' or '%s')

    This will make the db module properly escape and convert values.

    7/ str.replace() doesn't modify the string in-place (Python strings are
    immutable), but returns a new string. So you want:
        line = line.replace('x', 'y')

    8/ you don't need to explicitly call connection.commit after each and
    every statement, and you don't need to call it at all for SELECT
    statements !-)

    9/ have you tried calling print_rss *twice* on the same instance ?-)

    10/ are you sure it's useful to open the same 'buffer.txt' file for
    writing *twice* (once in __init__, the other in update_sql)? BTW, use
    open(), not file().

    11/ are you sure you need to use this buffer file at all ?

    12/ are you really *sure* you want to *destroy* your table and recreate
    it each time you call your script ?
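
    Points 5/, 6/, 8/ and 12/ can be sketched together. This is only an
    illustration under stated assumptions: it uses the standard-library
    sqlite3 module with an in-memory database and Python 3, and the column
    names are invented, since the original schema is only visible on the
    pastebin pages.

```python
import sqlite3

# Assumption: sqlite3 with an in-memory database; column names are illustrative.
conn = sqlite3.connect(":memory:")

# IF NOT EXISTS keeps existing rows instead of dropping and recreating the table.
conn.execute(
    "CREATE TABLE IF NOT EXISTS main ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "title TEXT, link TEXT, description TEXT)"
)

items = [
    ("It's alive", "http://example.com/1", "A title with a quote in it"),
    ("Second item", "http://example.com/2", "Plain text"),
]

# '?' is the sqlite3 placeholder; the module quotes the values itself.
# No id is passed: the autoincrementing key is filled in by the database.
conn.executemany(
    "INSERT INTO main (title, link, description) VALUES (?, ?, ?)",
    items,
)
conn.commit()  # one commit after the whole batch of inserts

# Reading back needs no commit at all.
rows = conn.execute("SELECT id, title FROM main ORDER BY id").fetchall()
print(rows)
```

    The title containing an apostrophe is there on purpose: with string
    formatting it would break the SQL statement, with placeholders it is
    stored as-is.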

    > And yes, I know that there are RSS parsers already built, but this is
    > only for learning.


    This should not prevent you from learning how to properly parse XML
    (hint: with an XML parser). XML is *not* a line-oriented format, so you
    just can't get anywhere trying to parse it this way.
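
    A small sketch of why the line-based approach breaks, assuming Python 3
    and the standard-library xml.etree.ElementTree module. The feed fragment
    is made up: it is well-formed XML, yet no element starts and ends on the
    same line.

```python
import xml.etree.ElementTree as ET

# Made-up feed fragment: valid XML, but no element sits on a single line.
rss = """<rss version="2.0"><channel>
<item><title>
  First post
</title><link
>http://example.com/1</link>
<description>Spans
two lines</description></item>
</channel></rss>"""

# Line-based scan: look for lines holding both the start-tag and the end-tag.
line_hits = [line for line in rss.splitlines()
             if "<title>" in line and "</title>" in line]

# Parser-based extraction: line breaks in the markup are irrelevant.
root = ET.fromstring(rss)
parsed = [(item.findtext("title").strip(),
           item.findtext("link").strip(),
           item.findtext("description"))
          for item in root.iter("item")]

print(line_hits)
print(parsed)
```

    The line scan finds nothing, while the parser returns the one item with
    all three fields.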



    HTH
     
    Bruno Desthuilliers, Jan 21, 2008
    #3
  4. Dennis Lee Bieber wrote:
    > On Mon, 21 Jan 2008 08:12:43 -0800 (PST), Arne <>
    > declaimed the following in comp.lang.python:
    >
    >> The problem is that I can't find the data in the database! If I watch
    >> my program while it's running, I can see that it successfully
    >> downloads the .xml file from the web and saves it in the buffer.
    >>

    > Did you COMMIT the transaction with the database?


    Did you READ the code ?-)

    NB : Yes, he did. The problem*s* are elsewhere.
     
    Bruno Desthuilliers, Jan 21, 2008
    #4
  5. On Mon, 21 Jan 2008 14:12:43 -0200, Arne <> wrote:

    > I'm trying to write an RSS reader in Python just for fun, and I'm almost
    > finished. I don't get any syntax errors, but when I run my program,
    > nothing happens.
    >
    > This program is supposed to download an .xml file, save the contents in
    > a buffer file (buffer.txt) and parse the file looking for start-tags.
    > When it has found a start-tag, it assumes that the content (between the
    > start-tag and the end-tag) is on the same line, so it removes the
    > start-tag and the end-tag and saves the content into a
    > database.


    That's a gratuitous assumption and may not hold for many sources; you
    should use a proper XML parser instead (using ElementTree, for example, is
    even easier than your sequence of find and replace).

    > The problem is that I can't find the data in the database! If I watch
    > my program while it's running, I can see that it successfully
    > downloads the .xml file from the web and saves it in the buffer.


    Ok. So the problem should be either when you read the buffer again, when
    processing it, or when saving in the database.
    It's very strange to create the table each time you want to save anything,
    but this gives you another clue: the table is created and remains empty,
    else the select statement in print_rss would have failed. So you know that
    those lines are executed. Now, the print statement is your friend:

        self.buffer = file('buffer.txt')
        for line in self.buffer.readline():
            print "line=", line  # add this and see what you get
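
    For reference, a sketch of what that print is likely to reveal, assuming
    the loop is written exactly as quoted. It is in Python 3 and uses
    io.StringIO as a stand-in for the buffer file; the sample content is
    invented.

```python
import io

# Stand-in for the buffer file; io.StringIO behaves like an open text file.
sample = "<title>First</title>\n<link>http://example.com/1</link>\n"

# readline() returns ONE string, so looping over it walks its characters.
buffer = io.StringIO(sample)
from_readline = list(buffer.readline())

# Iterating over the file object itself yields whole lines.
buffer = io.StringIO(sample)
from_file = list(buffer)

print(from_readline[:3])
print(from_file)
```

    With single characters as "lines", no start-tag is ever found, which
    would explain an empty table.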

    Once you get your code working, it's time to analyze it. I think someone
    told you "in Python, you have to use self. everywhere" and you read it
    literally. Let's see:

        def update_buffer(self):
            self.buffer = file('buffer.txt', 'w')
            self.temp_buffer = urllib2.urlopen(self.rssurl).read()
            self.buffer.write(self.temp_buffer)
            self.buffer.close()

    All those "self." are unneeded and wrong. You *can*, and *should*, use
    local variables. Perhaps it's a bit hard to grasp at first, but local
    variables, instance attributes and global variables are different things
    used for different purposes. I'll try an example: you [an object] have a
    diary, where you record things that you have to remember [your instance
    attributes, or "data members" as they are called in other languages]. You
    also carry a tiny notepad in your pocket, where you make a few notes when
    you are doing something, but you always throw away the page once the job
    is finished [local variables]. Your brothers, sisters and parents [other
    objects] use the same scheme, but there is a whiteboard in the kitchen
    where important things that all of you have to know are recorded [global
    variables] (anybody can read and write on the board).

    Now, back to the code, why "self." everywhere? Let's see, self.buffer is a
    file: opened, written, and closed, all inside the same function. Once it's
    closed, there is no need to keep a reference to the file elsewhere. It's
    discardable, like your notepad pages: use a local variable instead. In fact,
    *all* your variables should be locals; the *only* things you should keep
    inside your object are rssurl and the database location, and perhaps
    temp_buffer (with another, more meaningful name, rssdata for example).
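
    The diary/notepad split could look roughly like this. It is a sketch in
    Python 3, not the original class: the names RssReader, dbpath, fetch and
    buffer_path are hypothetical, and the download is replaced by a fake
    fetch function so that the example runs without network access.

```python
import os
import tempfile


class RssReader:
    """Hypothetical reader: only long-lived state is stored on the instance."""

    def __init__(self, rssurl, dbpath):
        # Diary entries: needed by several methods over the object's lifetime.
        self.rssurl = rssurl
        self.dbpath = dbpath

    def update_buffer(self, fetch, buffer_path):
        # Notepad pages: locals that vanish when the method returns.
        rssdata = fetch(self.rssurl)
        with open(buffer_path, "w") as buffer:
            buffer.write(rssdata)
        return rssdata


def fake_fetch(url):
    """Stand-in for the real download; returns canned data."""
    return "<rss>dummy data for %s</rss>" % url


reader = RssReader("http://example.com/rss.xml", "rss.db")
path = os.path.join(tempfile.mkdtemp(), "buffer.txt")
data = reader.update_buffer(fake_fetch, path)

# Only the two "diary" attributes live on the object afterwards.
attribute_names = sorted(vars(reader))
print(attribute_names)
```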

    Other -more or less random- remarks:

        if self.titleStored == True and self.linkStored == True and \
                descriptionStored == True:

    Don't compare against True/False. Just use their boolean value:

        if titleStored and linkStored and descriptionStored:

    Your code resets those flags at *every* line read, and since a line
    contains at most one tag, they will never be True at the same time. You
    should reset the flags only after you got the three items and wrote them
    onto the database.
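
    A sketch of that control flow, in Python 3. It keeps the one-tag-per-line
    assumption purely to show when the reset should happen (a real feed needs
    an XML parser), and it uses None instead of separate flags. The helper
    names strip_tag and collect_items are made up, and the items are appended
    to a list where the real program would write to the database.

```python
def strip_tag(line, tag):
    """Remove <tag> and </tag> from a line; str.replace returns a new string."""
    return line.replace("<%s>" % tag, "").replace("</%s>" % tag, "")


def collect_items(lines):
    """Group title/link/description values, resetting only after a full set."""
    items = []
    title = link = description = None
    for line in lines:
        line = line.strip()
        if line.startswith("<title>"):
            title = strip_tag(line, "title")
        elif line.startswith("<link>"):
            link = strip_tag(line, "link")
        elif line.startswith("<description>"):
            description = strip_tag(line, "description")
        # Store and reset only once all three values have been seen.
        if title is not None and link is not None and description is not None:
            items.append((title, link, description))
            title = link = description = None
    return items


sample_lines = [
    "<title>First</title>",
    "<link>http://example.com/1</link>",
    "<description>One</description>",
    "<title>Second</title>",
    "<link>http://example.com/2</link>",
    "<description>Two</description>",
]
items = collect_items(sample_lines)
print(items)
```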

    The RSS feed, after being read, is available in self.temp_buffer; why do
    you read it again from the buffer file? If you want to iterate over the
    individual lines, use:

        for line in self.temp_buffer.splitlines():

    --
    Gabriel Genellina
     
    Gabriel Genellina, Jan 21, 2008
    #5
  6. Arne

    Arne Guest

    On 21 Jan, 19:15, Bruno Desthuilliers <bruno.
    > wrote:

    > This should not prevent you from learning how to properly parse XML
    > (hint: with an XML parser). XML is *not* a line-oriented format, so you
    > just can't get nowhere trying to parse it this way.
    >
    > HTH


    Do you think I should use xml.dom.minidom for this? I've never used
    it, and I don't know how to use it, but I've heard it's useful.

    So, I shouldn't use this techinicke (probably wrong spelled) trying to
    parse XML? Should I rather use minidom?

    Thank you for answering, I've learnt a lot from both of you,
    Desthuilliers and Genellina! :)
     
    Arne, Jan 21, 2008
    #6
  7. Arne wrote:
    > On 21 Jan, 19:15, Bruno Desthuilliers <bruno.
    > > wrote:
    >
    >> This should not prevent you from learning how to properly parse XML
    >> (hint: with an XML parser). XML is *not* a line-oriented format, so you
    >> just can't get nowhere trying to parse it this way.
    >>
    >> HTH

    >
    > Do you think i should use xml.dom.minidom for this?


    I'd rather go for a SAX parser. A DOM parser is only useful if you need
    an in-memory representation of the whole document tree.
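
    A minimal sketch of the SAX approach, assuming Python 3 and the
    standard-library xml.sax module. The handler class ItemHandler and the
    sample feed are made up for illustration; the handler sees one event at a
    time and never holds the whole document tree.

```python
import xml.sax


class ItemHandler(xml.sax.ContentHandler):
    """Collect title, link and description of each <item> as a dict."""

    FIELDS = ("title", "link", "description")

    def __init__(self):
        super().__init__()
        self.items = []
        self.current = None  # dict for the <item> being read, or None
        self.field = None    # name of the field being read, or None

    def startElement(self, name, attrs):
        if name == "item":
            self.current = dict.fromkeys(self.FIELDS, "")
        elif self.current is not None and name in self.FIELDS:
            self.field = name

    def characters(self, content):
        # Text may arrive in several chunks, so append rather than assign.
        if self.field is not None:
            self.current[self.field] += content

    def endElement(self, name):
        if name == "item":
            self.items.append(self.current)
            self.current = None
        elif name == self.field:
            self.field = None


# Made-up feed; the channel-level <title> is outside any <item> and is skipped.
rss = b"""<rss version="2.0"><channel><title>Demo feed</title>
<item><title>First</title><link>http://example.com/1</link>
<description>One &amp; two</description></item>
</channel></rss>"""

handler = ItemHandler()
xml.sax.parseString(rss, handler)
print(handler.items)
```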

    >
    > So, I shouldn't use this techinicke (probably wrong spelled)


    May I suggest "technic" ?-)
     
    Bruno Desthuilliers, Jan 21, 2008
    #7
  8. On Mon, 21 Jan 2008 18:38:48 -0200, Arne <> wrote:

    > On 21 Jan, 19:15, Bruno Desthuilliers <bruno.
    > > wrote:
    >
    >> This should not prevent you from learning how to properly parse XML
    >> (hint: with an XML parser). XML is *not* a line-oriented format, so you
    >> just can't get nowhere trying to parse it this way.
    >>
    >> HTH

    >
    > Do you think i should use xml.dom.minidom for this? I've never used
    > it, and I don't know how to use it, but I've heard it's useful.
    >
    > So, I shouldn't use this techinicke (probably wrong spelled) trying to
    > parse XML? Should i rather use minidom?
    >
    > Thank you for for answering, I've learnt a lot from both of you,
    > Desthuilliers and Genellina! :)
    >


    Try ElementTree instead; there is an implementation included with Python
    2.5, documentation at http://effbot.org/zone/element.htm and another
    implementation available at http://codespeak.net/lxml/

        import xml.etree.cElementTree as ET
        import urllib2

        rssurl = 'http://www.jabber.org/news/rss.xml'
        rssdata = urllib2.urlopen(rssurl).read()
        rssdata = rssdata.replace('&', '&amp;')  # ouch!

        tree = ET.fromstring(rssdata)
        for item in tree.getiterator('item'):
            print item.find('link').text
            print item.find('title').text
            print item.find('description').text
            print

    Note that this particular RSS feed is NOT a well formed XML document - I
    had to replace the & with &amp; to make the parser happy.
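
    One caveat with the blanket replace: it also rewrites entities that are
    already correct (&amp; becomes &amp;amp;). A gentler variant, sketched
    here in Python 3, escapes only ampersands that do not already start an
    entity reference. The regular expression and the helper name
    escape_bare_ampersands are illustrative, the sample feed is made up, and
    tree.iter() is used because getiterator() and cElementTree no longer
    exist in current Python versions.

```python
import re
import xml.etree.ElementTree as ET

# Matches '&' unless it already starts a named or numeric entity reference.
BARE_AMP = re.compile(r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)")


def escape_bare_ampersands(text):
    """Escape stray '&' characters, leaving existing references untouched."""
    return BARE_AMP.sub("&amp;", text)


# Made-up feed with two bare ampersands and two valid references.
rssdata = """<rss><channel>
<item><title>Tom & Jerry</title>
<link>http://example.com/?a=1&b=2</link>
<description>Already fine: &amp; and &#169;</description></item>
</channel></rss>"""

tree = ET.fromstring(escape_bare_ampersands(rssdata))
results = []
for item in tree.iter("item"):
    results.append((item.find("link").text,
                    item.find("title").text,
                    item.find("description").text))
print(results)
```

    This does not help with named entities that XML does not define, such as
    &nbsp;, which still need separate handling.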

    --
    Gabriel Genellina
     
    Gabriel Genellina, Jan 21, 2008
    #8
  9. MRAB

    MRAB Guest

    On Jan 21, 9:15 pm, Bruno Desthuilliers
    <> wrote:
    > Arne wrote:
    >
    > > On 21 Jan, 19:15, Bruno Desthuilliers <bruno.
    > > > wrote:

    >
    > >> This should not prevent you from learning how to properly parse XML
    > >> (hint: with an XML parser). XML is *not* a line-oriented format, so you
    > >> just can't get nowhere trying to parse it this way.

    >
    > >> HTH

    >
    > > Do you think i should use xml.dom.minidom for this?

    >
    > I'd rather go for a sax parser. A dom parser is only useful if you need
    > an in-memory representation of the whole document tree.
    >
    >
    >
    > > So, I shouldn't use this techinicke (probably wrong spelled)

    >
    > May I suggest "technic" ?-)


    That should be "technique"; just ask a Francophone! :)
     
    MRAB, Jan 22, 2008
    #9
  10. On Mon, 21 Jan 2008 19:24:09 +0100, Bruno Desthuilliers
    <> declaimed the
    following in comp.lang.python:

    >
    > Did you READ the code ?-)
    >

    I was in a bit of a hurry to get to work (why, I don't know -- since
    I'm due to be surplused in three weeks)... so took a quick grab at the
    most common reason for not finding "expected" data in a database -- the
    common /lack/ of commits.
     
    Dennis Lee Bieber, Jan 22, 2008
    #10
  11. MRAB wrote:
    > On Jan 21, 9:15 pm, Bruno Desthuilliers
    > <> wrote:
    >> Arne wrote:

    (snip)
    >>> So, I shouldn't use this techinicke (probably wrong spelled)

    >> May I suggest "technic" ?-)

    >
    > That should be "technique"; just ask a Francophone! :)


    <mode="the usual frenchy">
    My bad :(
    </mode>
     
    Bruno Desthuilliers, Jan 22, 2008
    #11
  12. Arne

    Arne Guest

    On Jan 21, 11:25 pm, "Gabriel Genellina" <>
    wrote:
    > On Mon, 21 Jan 2008 18:38:48 -0200, Arne <> wrote:
    >
    >
    >
    > > On 21 Jan, 19:15, Bruno Desthuilliers <bruno.
    > > > wrote:

    >
    > >> This should not prevent you from learning how to properly parse XML
    > >> (hint: with an XML parser). XML is *not* a line-oriented format, so you
    > >> just can't get nowhere trying to parse it this way.

    >
    > >> HTH

    >
    > > Do you think i should use xml.dom.minidom for this? I've never used
    > > it, and I don't know how to use it, but I've heard it's useful.

    >
    > > So, I shouldn't use this techinicke (probably wrong spelled) trying to
    > > parse XML? Should i rather use minidom?

    >
    > > Thank you for for answering, I've learnt a lot from both of you,
    > > Desthuilliers and Genellina! :)

    >
    > Try ElementTree instead; there is an implementation included with Python
    > 2.5, documentation at http://effbot.org/zone/element.htm and another
    > implementation available at http://codespeak.net/lxml/
    >
    > import xml.etree.cElementTree as ET
    > import urllib2
    >
    > rssurl = 'http://www.jabber.org/news/rss.xml'
    > rssdata = urllib2.urlopen(rssurl).read()
    > rssdata = rssdata.replace('&', '&amp;') # ouch!
    >
    > tree = ET.fromstring(rssdata)
    > for item in tree.getiterator('item'):
    >    print item.find('link').text
    >    print item.find('title').text
    >    print item.find('description').text
    >    print
    >
    > Note that this particular RSS feed is NOT a well formed XML document - I  
    > had to replace the & with &amp; to make the parser happy.
    >
    > --
    > Gabriel Genellina


    This looks very interesting! But it looks like no documents are
    well-formed! I've tried several RSS feeds, but they fail with either
    "undefined entity" or "not well-formed". This is not how it should be,
    right? :)
     
    Arne, Jan 23, 2008
    #12
  13. On Wed, 23 Jan 2008 14:06:10 -0200, Arne <> wrote:

    > On Jan 21, 11:25 pm, "Gabriel Genellina" <>
    > wrote:
    >> > On 21 Jan, 19:15, Bruno Desthuilliers <bruno.
    >> > > wrote:

    >>
    >> >> This should not prevent you from learning how to properly parse XML
    >> >> (hint: with an XML parser). XML is *not* a line-oriented format, so

    >> you
    >> >> just can't get nowhere trying to parse it this way.

    >>
    >> Try ElementTree instead; there is an implementation included with
    >> Python  

    >
    > This look very interesting! But it looks like that no documents is
    > well-formed! I've tried several RSS-feeds, but they are eighter
    > "undefined entity" or "not well-formed". This is not how it should be,
    > right? :)


    Well, the RSS feed "should" be valid XML...
    Try a more forgiving parser like BeautifulSoup, or preprocess the input
    with Tidy or a similar program before feeding it to ElementTree.
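
    For the "undefined entity" case specifically, a narrower preprocessing
    step may be enough. The sketch below (Python 3, standard library only) is
    an alternative to the tools named above, not a replacement for them: it
    rewrites HTML-only named entities as numeric character references, which
    any XML parser accepts. The helper name replace_html_entities and the
    sample feed are made up.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser knows; leave them untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_html_entities(text):
    """Turn HTML-only entities such as &nbsp; into numeric references."""
    def fix(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep as-is
        return "&#%d;" % name2codepoint[name]
    return NAMED_ENTITY.sub(fix, text)


# Made-up feed using two HTML entities that XML does not define.
feed = ("<rss><channel><item>"
        "<title>Caf&eacute;&nbsp;news &amp; more</title>"
        "</item></channel></rss>")

# The raw feed is rejected by the parser ...
try:
    ET.fromstring(feed)
    raw_fails = False
except ET.ParseError:
    raw_fails = True

# ... while the preprocessed one parses.
title = ET.fromstring(replace_html_entities(feed)).findtext("channel/item/title")
print(raw_fails, repr(title))
```

    Feeds that are broken in other ways (unclosed tags, bad nesting) still
    need a forgiving parser or a cleanup tool.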

    --
    Gabriel Genellina
     
    Gabriel Genellina, Jan 23, 2008
    #13
  14. On 1/23/08, Arne <> wrote:
    > On Jan 21, 11:25pm, "Gabriel Genellina" <>
    > wrote:
    > > On Mon, 21 Jan 2008 18:38:48 -0200, Arne <> wrote:
    > >

    [...]

    >
    > This look very interesting! But it looks like that no documents is
    > well-formed! I've tried several RSS-feeds, but they are eighter
    > "undefined entity" or "not well-formed". This is not how it should be,
    > right? :)
    >


    Go to http://www.feedparser.org
    Download feedparser.py
    Read the documentation, at least: you will find out a lot about
    working with RSS.

     
    member thudfoo, Jan 24, 2008
    #14