Parsing query parameters from hyperlink

Discussion in 'Ruby' started by lrlebron@gmail.com, Sep 1, 2007.

  1. Guest

    I am trying to parse strings like this
    <a href='showmono.asp?cpnum=1022&ampmonotype=comb' target='main'>

    I need to get the cpnum value (1022 in this example)

    I am using the following function

    def get_drugId(link)
      arrParts = link.html.split('?')
      cpnum = arrParts[1].split('&amp')
      cpnumparts = cpnum[0].split("=")
      drugId = cpnumparts[1]
    end

    but I imagine there is a simpler way to do this. Also, I would like
    something more flexible that would return all the query parameters (if
    there are more than one) in an array or a hash.

    Any ideas?

    thanks,

    Luis
    , Sep 1, 2007
    #1

  2. On 01.09.2007 19:34, wrote:
    > but I imagine there is a simpler way to do this. Also, I would like
    > something more flexible that would return all the query parameters (if
    > there are more than one) in an array or a hash.
    >
    > Any ideas?


    The std lib:

    require 'uri'

    irb(main):006:0> u=URI.parse("http://foo/bar?dodo=1&dada=2")
    => #<URI::HTTP:0x3ff9814a URL:http://foo/bar?dodo=1&dada=2>
    irb(main):007:0> u.query
    => "dodo=1&dada=2"
    irb(main):008:0> u.query.split('&')
    => ["dodo=1", "dada=2"]
    ....
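
    One way the split above could be carried on into a hash is sketched below. It assumes every parameter is a plain key=value pair, with no repeated keys and no percent-escaping; the variable name params is illustrative only.

```ruby
require 'uri'

query = URI.parse('http://foo/bar?dodo=1&dada=2').query

# Split on '&', then split each pair on the first '=' only,
# so a value that itself contains '=' is kept intact.
params = {}
query.split('&').each do |pair|
  key, value = pair.split('=', 2)
  params[key] = value
end

puts params['dodo']
puts params['dada']
```

    A repeated key would overwrite the earlier value in this sketch, which is one reason a library parser is usually the safer choice.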

    robert
    Robert Klemme, Sep 1, 2007
    #2

  3. On Sun, Sep 02, 2007 at 04:00:20AM +0900, Robert Klemme wrote:
    >
    > The std lib:
    >
    > require 'uri'
    >
    > irb(main):006:0> u=URI.parse("http://foo/bar?dodo=1&dada=2")
    > => #<URI::HTTP:0x3ff9814a URL:http://foo/bar?dodo=1&dada=2>
    > irb(main):007:0> u.query
    > => "dodo=1&dada=2"
    > irb(main):008:0> u.query.split('&')
    > => ["dodo=1", "dada=2"]
    > ...


    Query strings are allowed to use semicolons as delimiters, not to
    mention you must handle multiple values per key. I recommend using the
    CGI library with the URI library:

    irb(main):001:0> require 'uri'
    => true
    irb(main):002:0> require 'cgi'
    => true
    irb(main):003:0> CGI.parse(URI.parse('http://foo/?a=b&b=c').query)
    => {"a"=>["b"], "b"=>["c"]}
    irb(main):004:0> CGI.parse(URI.parse('http://foo/?a=b;b=c').query)
    => {"a"=>["b"], "b"=>["c"]}
    irb(main):005:0> CGI.parse(URI.parse('http://foo/?b=a;b=c').query)
    => {"b"=>["a", "c"]}
    irb(main):006:0>
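
    A comparable hash of arrays can also be sketched with URI.decode_www_form from the uri library. This swaps in a different call from the CGI.parse shown above; the helper name query_params is hypothetical, and as far as I know this call splits on '&' only by default, so the semicolon form is not covered here.

```ruby
require 'uri'

# Hypothetical helper: collect every key=value pair of a URL's query
# string into a Hash whose values are Arrays, so repeated keys are kept.
def query_params(url)
  query = URI.parse(url).query
  return {} if query.nil?   # URL without a query string

  URI.decode_www_form(query).each_with_object({}) do |(key, value), params|
    (params[key] ||= []) << value
  end
end

p query_params('http://foo/?b=a&b=c&x=1')
p query_params('http://foo/')
```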

    --
    Aaron Patterson
    http://tenderlovemaking.com/
    Aaron Patterson, Sep 1, 2007
    #3
  4. Guest

    On Sep 1, 2:15 pm, Aaron Patterson <> wrote:
    > Query strings are allowed to use semicolons as delimiters, not to
    > mention you must handle multiple values per key. I recommend using the
    > CGI library with the URI library:
    >
    > irb(main):001:0> require 'uri'
    > => true
    > irb(main):002:0> require 'cgi'
    > => true
    > irb(main):003:0> CGI.parse(URI.parse('http://foo/?a=b&b=c').query)
    > => {"a"=>["b"], "b"=>["c"]}
    > irb(main):004:0> CGI.parse(URI.parse('http://foo/?a=b;b=c').query)
    > => {"a"=>["b"], "b"=>["c"]}
    > irb(main):005:0> CGI.parse(URI.parse('http://foo/?b=a;b=c').query)
    > => {"b"=>["a", "c"]}
    > irb(main):006:0>

    This would work if the string were a proper URL. But it is a
    hyperlink.
    , Sep 1, 2007
    #4
  5. On Sun, Sep 02, 2007 at 04:30:05AM +0900, wrote:
    > This would work if the string were a proper URL. But it is a
    > hyperlink.


    Use hpricot to extract the href, then feed it through URI and CGI.
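
    The two steps can be sketched roughly as follows. To keep the example free of dependencies, a regular expression stands in for Hpricot and URI.decode_www_form stands in for CGI.parse; both substitutions are assumptions of this sketch, and extract_href is a hypothetical helper name. A real HTML parser is more robust than the regex.

```ruby
require 'uri'

# Hypothetical helper: pull the href value out of a single <a ...> tag.
# The regex is only a stand-in for an HTML parser such as Hpricot.
def extract_href(tag)
  match = tag.match(/href\s*=\s*(['"])(.*?)\1/i)
  match && match[2]
end

tag = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"

# Step 1: isolate the href, which is a relative URL.
href = extract_href(tag)

# Step 2: let URI split off the query string, then decode it.
query  = URI.parse(href).query
params = URI.decode_www_form(query).to_h   # a repeated key keeps its last value

puts href
puts params['cpnum']
puts params['monotype']
```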

    --
    Aaron Patterson
    http://tenderlovemaking.com/
    Aaron Patterson, Sep 1, 2007
    #5
  6. Guest

    On Sep 1, 2:29 pm, "" <> wrote:
    > This would work if the string were a proper URL. But it is a
    > hyperlink.


    Sorry for the second reply. I took your suggestions and came up with
    the following

    require 'uri'
    require 'cgi'

    str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"

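    def get_cpnum(link)
      # Assumes the href attribute is the second whitespace-separated token;
      # the "href=" prefix and the quote characters are passed to URI.parse as-is.
      arrParts = link.split(' ')
      CGI.parse(URI.parse(arrParts[1]).query)['cpnum']
    end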

    puts get_cpnum(str)
    , Sep 1, 2007
    #6
  7. Phil Guest

    wrote:
    > This would work if the string were a proper URL. But it is a
    > hyperlink.


    Your point? A hyperlink *is* a URL in the WWW context.

    --
    Phillip Gawlowski
    Phil, Sep 1, 2007
    #7
  8. Guest

    On Sep 1, 3:50 pm, Phil <> wrote:
    > Your point? A hyperlink *is* a URL in the WWW context.
    >
    > --
    > Phillip Gawlowski


    If you try to parse it, URI throws an error.
    , Sep 2, 2007
    #8
  9. Guest

    On Sep 1, 2:47 pm, Aaron Patterson <> wrote:
    > Use hpricot to extract the href, then feed it through URI and CGI.


    Here's what I ended up with

    require 'uri'
    require 'cgi'
    require 'hpricot'

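    def get_query_value(link, key = '')
      doc = Hpricot(link)
      # Parse the href of the first <a> element into a hash of value arrays
      params = CGI.parse(URI.parse(doc.at("a")['href']).query)

      # With no key, return the whole hash; otherwise only that key's values
      key.empty? ? params : params[key]
    end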

    str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"

    p get_query_value(str)
    puts get_query_value(str,'cpnum')
    puts get_query_value(str,'monotype')

    It lets me ask for either the complete hash or a particular key.

    Thanks,

    Luis
    , Sep 2, 2007
    #9
  10. On 02.09.2007 01:03, wrote:
    > If you try to parse it, URI throws an error.


    Does it? This works for me:

    irb(main):001:0> require 'uri'
    => true
    irb(main):002:0> u=URI.parse('foo.bar/baz?x=2')
    => #<URI::Generic:0x3ffa0eda URL:foo.bar/baz?x=2>
    irb(main):003:0> u.query
    => "x=2"
    irb(main):004:0> u=URI.parse('baz?x=2')
    => #<URI::Generic:0x3ff9f15c URL:baz?x=2>
    irb(main):005:0> u.query
    => "x=2"

    Cheers

    robert
    Robert Klemme, Sep 2, 2007
    #10
  11. Guest

    On Sep 2, 6:59 am, Robert Klemme <> wrote:
    >
    > > If you try to parse it, URI throws an error.

    >
    > Does it? This works for me:
    >
    > irb(main):001:0> require 'uri'
    > => true
    > irb(main):002:0> u=URI.parse('foo.bar/baz?x=2')
    > => #<URI::Generic:0x3ffa0eda URL:foo.bar/baz?x=2>
    > irb(main):003:0> u.query
    > => "x=2"
    > irb(main):004:0> u=URI.parse('baz?x=2')
    > => #<URI::Generic:0x3ff9f15c URL:baz?x=2>
    > irb(main):005:0> u.query
    > => "x=2"

    I meant that if you try to parse the whole string
    str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"
    it throws an error:

    c:/ruby/lib/ruby/1.8/uri/common.rb:432:in `split': bad URI(is not
    URI?): <a href='showmono.asp?cpnum=555&monotype=full' target='main'>
    (URI::InvalidURIError)
    from c:/ruby/lib/ruby/1.8/uri/common.rb:481:in `parse'
    from uritest.rb:8
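
    The difference between the whole tag and the href alone can be checked with a small sketch like the one below. The helper name parseable? is hypothetical; it simply reports whether URI.parse accepts a string instead of letting the exception propagate.

```ruby
require 'uri'

# Hypothetical helper: true if URI.parse accepts the string,
# false if it raises URI::InvalidURIError.
def parseable?(str)
  URI.parse(str)
  true
rescue URI::InvalidURIError
  false
end

tag  = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"
href = 'showmono.asp?cpnum=555&monotype=full'

puts parseable?(tag)    # the full <a ...> tag is markup, not a URI
puts parseable?(href)   # the href value alone is a relative URI
```

    This matches the rest of the thread: the href has to be extracted from the markup before URI sees it.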
    , Sep 2, 2007
    #11
