REXML/RSS parse error

Discussion in 'Ruby' started by Patrick Plattes, Dec 7, 2006.

  1. Hello,

    I have a problem while parsing an RSS file. I try to open a URL via
    open-uri and it usually works fine, but with the RSS URLs from ccMixter
    I get a parse error. It's a bit strange, because if I download the file
    and try to open it, it works fine.

    I tried:
    rss = RSS::Parser.parse("http://ccmixter.org/media/api/query?score=400&sinceu=1157536651&limit=25&tags=remix+editorial_pick&rand=1&format=rss", false)

    And got:
    RSS::NotWellFormedError: This is not well formed XML
    Missing end tag for 'html' (got "head")
    Line:
    Position:
    Last 80 unconsumed characters:
    from /usr/lib/ruby/1.8/rss/rexmlparser.rb:24:in `_parse'
    from /usr/lib/ruby/1.8/rss/parser.rb:163:in `parse'
    from /usr/lib/ruby/1.8/rss/parser.rb:78:in `parse'
    from (irb):43


    If I save the file and try to open it, it works fine:
    rss = RSS::Parser.parse("query", false)


    IMHO there should be no difference between opening a local file and a URL.


    Thanks for all the help I've received from this list over the last few days,
    Patrick
     
    Patrick Plattes, Dec 7, 2006
    #1

  2. Hi,

    In "REXML/RSS parse error" on Thu, 7 Dec 2006 19:45:53 +0900,
    Patrick Plattes wrote:

    > I have a problem while parsing an RSS file. I try to open a URL via
    > open-uri and it usually works fine, but with the RSS URLs from ccMixter
    > I get a parse error. It's a bit strange, because if I download the file
    > and try to open it, it works fine.
    >
    > I tried:
    > rss = RSS::Parser.parse("http://ccmixter.org/media/api/query?score=400&sinceu=1157536651&limit=25&tags=remix+editorial_pick&rand=1&format=rss", false)
    >
    > And got:
    > RSS::NotWellFormedError: This is not well formed XML
    > Missing end tag for 'html' (got "head")


    I got some garbage after the RSS 2.0 document:

    % ruby -r open-uri -e 'puts open("http://ccmixter.org/media/api/query?score=400&sinceu=1157536651&limit=25&tags=remix+editorial_pick&rand=1&format=rss").read' | tail -n 25
    </item>
    </channel>
    </rss>
    "/web/ccmixter/www/cclib/cc-util.php"(205): Cannot modify header information - headers already sent by (output started at /web/ccmixter/www/cclib/cc-feed.php:432) [2006-12-07 07:10 am][138.243.129.4][/media/api/query?score=400&sinceu=1157536651&limit=25&tags=remix+editorial_pick&rand=1&format=rss]
    <html>
    <head>
    <style>
    body {
    font-size: 11px;
    font-family: Verdana, sans-serif;
    background-color: #F99;
    margin: 4%;
    text-align: center;
    }
    </style>
    </head>
    <body>
    <p> <img src="/mixter-files/skull.gif" /></p>
    <h3>wups, ccMixter is experiencing technical difficulties...</h3>
    <p>If you were in the middle of an upload or posting a message it probably worked OK
    but you should click <a href="/">here</a> to get back to the site's home page or
    use your browser's BACK button to return to the site and make sure.</p>
    <p>The admins have been notified of the problem and will look into it very shortly.</p>
    </body>
    </head>


    Thanks,
    --
    kou
     
    Kouhei Sutou, Dec 7, 2006
    #2
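The check above (looking at the tail of the response) can also be done in Ruby once the body has been fetched. This is only a minimal sketch: `trailing_after_rss` is a hypothetical helper name, not part of the RSS library, and the sample string stands in for the real ccMixter response.

```ruby
# Hypothetical helper (not part of the RSS library): returns any
# non-whitespace text that follows the first closing </rss> tag,
# or nil when nothing follows it (or when there is no </rss> at all).
def trailing_after_rss(body)
  close_tag = "</rss>"
  idx = body.index(close_tag)
  return nil if idx.nil?

  rest = body[(idx + close_tag.length)..-1].strip
  rest.empty? ? nil : rest
end

# Made-up sample standing in for the fetched response body.
sample = "<rss version=\"2.0\"><channel></channel></rss>\n<html>\n<head>\n</head>\n"

extra = trailing_after_rss(sample)
puts "unexpected content after </rss>:" if extra
puts extra if extra
```

If the helper returns something other than nil, the server has appended output after the feed, which is what makes the XML parser reject the document.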

  3. Kouhei Sutou wrote:
    > I got some garbage after the RSS 2.0 document:


    Thank you, I hadn't seen it. I've written an e-mail to them, but most
    RSS readers are able to parse this malformed file. Do you know any
    way to force the parser to read this file? For RSS it would be OK to
    stop parsing after the closing RSS tag.

    Thanks,
    Patrick
     
    Patrick Plattes, Dec 7, 2006
    #3
  4. Hi,

    In "Re: REXML/RSS parse error" on Thu, 7 Dec 2006 23:56:38 +0900,
    Patrick Plattes wrote:

    > most RSS readers are able to parse this malformed file. Do you know any
    > way to force the parser to read this file? For RSS it would be OK to
    > stop parsing after the closing RSS tag.


    What about gsub(/<\/rss>.*\z/m, '</rss>')?

    Thanks,
    --
    kou
     
    Kouhei Sutou, Dec 7, 2006
    #4
  5. Kouhei Sutou wrote:

    > What about gsub(/<\/rss>.*\z/m, '</rss>')?


    Yes, that works very well :). I'm very happy now *g*. Thank you very much,
    Patrick
     
    Patrick Plattes, Dec 7, 2006
    #5
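For anyone landing here later, the gsub suggested above can be applied to the fetched body before it is handed to the parser. The following is a minimal sketch rather than the exact code used in this thread: `strip_after_rss` is a hypothetical helper name, and the sample string is a made-up stand-in for the ccMixter response.

```ruby
# Hypothetical helper: cut off everything that follows the first closing
# </rss> tag, using the regex suggested in this thread. The /m flag lets
# "." match newlines, so the match runs to the end of the string (\z).
def strip_after_rss(xml)
  xml.gsub(/<\/rss>.*\z/m, "</rss>")
end

# Made-up sample: a tiny feed followed by an HTML error page fragment.
broken = "<rss version=\"2.0\"><channel></channel></rss>\n<html>\n<head>\n</head>\n"

cleaned = strip_after_rss(broken)
puts cleaned
```

The cleaned string could then be passed to `RSS::Parser.parse(cleaned, false)` in place of the URL. Note that this discards anything after the first `</rss>`, so it only suits feeds where nothing meaningful follows the closing tag.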