RSS Parser Help..

Gim Ick

I am trying to parse an RSS file, and I am using the rss module to do it.

Suppose this is the data file:

<item>
  <title>Singapore Airlines Asia Travel - A345 All Business Class to Asia</title>
  <pubDate>Fri, 18 Sep 2009 22:56:33 +0000</pubDate>
  <guid isPermaLink="false">http://delicious.com/url/cc78bfa8bb00f50825d7cac52339375d#galvezcreative</guid>
  <link>http://a345.singaporeair.com/</link>
  <dc:creator><![CDATA[galvezcreative]]></dc:creator>
  <comments>http://delicious.com/url/cc78bfa8bb00f50825d7cac52339375d</comments>
  <wfw:commentRss>http://feeds.delicious.com/v2/rss/url/cc78bfa8bb00f50825d7cac52339375d</wfw:commentRss>
  <source url="http://feeds.delicious.com/v2/rss/galvezcreative">galvezcreative's bookmarks</source>
  <category domain="http://delicious.com/galvezcreative/">Industry-Airlines</category>
  <category domain="http://delicious.com/galvezcreative/">marketing</category>
</item>

How do I get the value of each category? (In the above example, the
values are Industry-Airlines and marketing.)

When I try rss.items[0].category, I get the entire element (in the
above case, <category
domain="http://delicious.com/galvezcreative/">Industry-Airlines</category>).
 
Kouhei Sutou

Hi,

In "RSS Parser Help.." on Sat, 19 Sep 2009 09:21:36 +0900,
Gim Ick said:
> I am trying to parse an RSS file. I use the rss module to do it.

> How do I get the value of each category? (In the above example, the
> values are Industry-Airlines and marketing.)

rss.items[0].categories.each do |category|
  p category.content  # prints the text inside each <category> element
end

> When I try rss.items[0].category, I get the entire element (in the
> above case, <category
> domain="http://delicious.com/galvezcreative/">Industry-Airlines</category>).

rss.items[0].category returns a Category object, not the string
"<category ...>...</category>". (Hint: the Category object has a #to_s
method that returns the "<category ...>...</category>" string.)
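
To put the pieces together, here is a minimal sketch of a complete run. It assumes an RSS 2.0 feed. The channel title, link, and description are placeholders I made up so that the feed parses; only the item's title, link, and category elements come from the question. It prints each category's text, then shows what #content, #domain, and #to_s return on the first Category object.

```ruby
require "rss"

# Minimal RSS 2.0 feed built around the <category> elements from the question.
# The channel title, link, and description are placeholders, not from the thread.
FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example feed</title>
      <link>http://example.com/</link>
      <description>Placeholder channel</description>
      <item>
        <title>Singapore Airlines Asia Travel - A345 All Business Class to Asia</title>
        <link>http://a345.singaporeair.com/</link>
        <category domain="http://delicious.com/galvezcreative/">Industry-Airlines</category>
        <category domain="http://delicious.com/galvezcreative/">marketing</category>
      </item>
    </channel>
  </rss>
XML

rss = RSS::Parser.parse(FEED)
item = rss.items[0]

# #categories returns every <category>; #content is the text inside each one.
names = item.categories.map(&:content)
p names

# #category is only the first Category object; #domain reads its attribute,
# and #to_s serializes it back to the full element.
first = item.category
puts first.content
puts first.domain
puts first.to_s
```

If you only need the names, item.categories.map(&:content) is probably all you want; #category alone will silently skip every category after the first.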

Thanks,
 
Richard.Williams.20

> I am trying to parse an RSS file. I use the rss module to do it.

> How do I get the value of each category? (In the above example, the
> values are Industry-Airlines and marketing.)
>
> When I try rss.items[0].category, I get the entire element (in the
> above case, <category
> domain="http://delicious.com/galvezcreative/">Industry-Airlines</category>).

Here is an alternative using a biterscripting script:

# Script category.txt
var str rss ; cat "file.rss" > $rss
while ( { sen -r -c "^^" $rss } > 0 )
do
    var str category ; stex -r -c "^<category&\>&</category\>^" $rss > $category
    stex -r -c "^<category&\>^]" $category > null ; stex -r -c "[^</category\>^" $category > null
    echo $category
done


For documentation on the stex (string extractor) command, see
http://www.biterscripting.com/helppages/stex.html
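
For comparison, the same string-extraction idea can be sketched in plain Ruby with a regular expression instead of the rss module. This is only an illustration, not a translation of the script above: the sample text is a cut-down copy of the item from the question, and a regex like this is not a real XML parser, so it would likely miss CDATA sections or nested markup.

```ruby
# Hypothetical sample text; in practice this might come from File.read("file.rss").
rss_text = <<~XML
  <item>
    <category domain="http://delicious.com/galvezcreative/">Industry-Airlines</category>
    <category
  domain="http://delicious.com/galvezcreative/">marketing</category>
  </item>
XML

# Match each <category ...>text</category> pair and capture only the text.
# [^>]* skips the attributes, including a line break inside the opening tag.
categories = rss_text.scan(%r{<category\b[^>]*>(.*?)</category>}m).flatten

puts categories
```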


Richard
 
