Get content in an XML element using Hpricot

Discussion in 'Ruby' started by Bonita, Apr 13, 2007.

  1. Bonita

    Bonita Guest

    Hi


    I'm using hpricot to parse the following file.

    <item
    rdf:about="http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn">
    <title>[from morwyn] * HTML for the Conceptually Challenged</title>
    <link>http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn</link>
    <description>HTML for the Conceptually Challenged. Very basic tutorial,
    plainly worded for people who hate to read instructions.</description>
    <dc:creator>morwyn</dc:creator>
    <dc:date>2006-10-10T07:28:28Z</dc:date>
    <dc:subject>html imported webpagedesign</dc:subject>
    <taxo:topics>
    <rdf:Bag>
    <rdf:li resource="http://del.icio.us/tag/imported" />
    <rdf:li resource="http://del.icio.us/tag/html" />
    <rdf:li resource="http://del.icio.us/tag/webpagedesign" />
    </rdf:Bag>
    </taxo:topics>
    </item>

    I'm trying to get the content from <dc:subject> like this

    doc = Hpricot.parse(File.read("965.xhtml"))

    (doc/"item").each do |t|

    puts (t/"dc:subject").innerTEXT

    end

    but I got

    <dc:subject>html internet tutorial web</dc:subject>

    while I only need "html internet tutorial web"

    Does anyone know the right function to call?

    Thanks

    --
    Posted via http://www.ruby-forum.com/.
    Bonita, Apr 13, 2007
    #1

  2. Bonita

    Guest

    On Apr 13, 9:48 am, Bonita <> wrote:
    > I'm trying to get the content from <dc:subject> like this
    >
    > puts (t/"dc:subject").innerTEXT
    >
    > but I got
    >
    > <dc:subject>html internet tutorial web</dc:subject>
    >
    > while I only need "html internet tutorial web"


    >> puts (t/'dc:subject').text
    , Apr 13, 2007
    #2

  3. Bonita

    Guest

    On Apr 13, 12:40 pm, wrote:
    > On Apr 13, 9:48 am, Bonita <> wrote:
    > > puts (t/"dc:subject").innerTEXT
    > >
    > > but I got
    > >
    > > <dc:subject>html internet tutorial web</dc:subject>
    > >
    > > while I only need "html internet tutorial web"
    >
    > >> puts (t/'dc:subject').text


    puts (t/'dc:subject').text

    Sorry for the double post, but I shouldn't have copy/pasted the result
    directly from irb :(
    , Apr 13, 2007
    #3
  4. Bonita

    Billy Hsu Guest

    Sorry for deleting your text :(

    Maybe you can try:

    puts (t/"dc:subject").text

    Bonita wrote:
    > I'm trying to get the content from <dc:subject> like this
    >
    > doc = Hpricot.parse(File.read("965.xhtml"))
    >
    > (doc/"item").each do |t|
    >
    > puts (t/"dc:subject").innerTEXT
    >
    > end
    >
    > but I got
    >
    > <dc:subject>html internet tutorial web</dc:subject>
    >
    > while I only need "html internet tutorial web"
    >
    > Does anyone know the right function to call?
    >
    > Thanks
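
    To make the difference between an element's markup and its text
    concrete, here is a minimal sketch. It is not Hpricot code: REXML,
    which ships with Ruby, stands in so the snippet runs without extra
    gems, and that substitution is an assumption of this example, not
    something from the thread. The <rdf:RDF> wrapper, the SAMPLE_RDF
    constant and the subject_texts helper are also made up for
    illustration; only the <item> fragment comes from the question.

```ruby
require "rexml/document"

# Trimmed copy of the <item> from the question. The <rdf:RDF> root and its
# namespace declarations are an assumption: the thread shows only the
# fragment, and REXML needs the rdf: and dc: prefixes to be declared.
SAMPLE_RDF = <<~XML
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <item rdf:about="http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn">
      <title>[from morwyn] * HTML for the Conceptually Challenged</title>
      <dc:subject>html imported webpagedesign</dc:subject>
    </item>
  </rdf:RDF>
XML

# Hypothetical helper: collect the text of every <dc:subject> under an <item>.
def subject_texts(xml)
  doc = REXML::Document.new(xml)
  texts = []
  doc.elements.each("//item") do |item|
    subject = item.elements["dc:subject"]
    next if subject.nil?   # skip items that have no <dc:subject>
    texts << subject.text  # text content only, without the surrounding tags
  end
  texts
end

subject = REXML::Document.new(SAMPLE_RDF).elements["//dc:subject"]
puts subject.to_s  # serialises the whole element, tags included
puts subject.text  # just the text inside the element
puts subject_texts(SAMPLE_RDF).inspect
```

    The idea carries over to the Hpricot call suggested above: printing
    the element itself gives the tags, while asking for its text gives
    only the content.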


    --
    Posted via http://www.ruby-forum.com/.
    Billy Hsu, Apr 13, 2007
    #4
