Parsing query parameters from hyperlink

lrlebron

I am trying to parse strings like this:
<a href='showmono.asp?cpnum=1022&amp;monotype=comb' target='main'>

I need to get the cpnum value (1022 in this example).

I am using the following function:

def get_drugId(link)
  arrParts = link.html.split('?')
  cpnum = arrParts[1].split('&amp')
  cpnumparts = cpnum[0].split("=")
  drugId = cpnumparts[1]
end

but I imagine there is a simpler way to do this. Also, I would like
something more flexible that would return all the query parameters (if
there is more than one) in an array or a hash.

Any ideas?

thanks,

Luis
 
Robert Klemme

The std lib:

require 'uri'

irb(main):006:0> u=URI.parse("http://foo/bar?dodo=1&dada=2")
=> #<URI::HTTP:0x3ff9814a URL:http://foo/bar?dodo=1&dada=2>
irb(main):007:0> u.query
=> "dodo=1&dada=2"
irb(main):008:0> u.query.split('&')
=> ["dodo=1", "dada=2"]
....

robert
 
Aaron Patterson

Query strings are allowed to use semicolons as delimiters, and you
also have to handle multiple values per key. I recommend using the
CGI library with the URI library:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> require 'cgi'
=> true
irb(main):003:0> CGI.parse(URI.parse('http://foo/?a=b&b=c').query)
=> {"a"=>["b"], "b"=>["c"]}
irb(main):004:0> CGI.parse(URI.parse('http://foo/?a=b;b=c').query)
=> {"a"=>["b"], "b"=>["c"]}
irb(main):005:0> CGI.parse(URI.parse('http://foo/?b=a;b=c').query)
=> {"b"=>["a", "c"]}
irb(main):006:0>
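
For comparison, here is a minimal sketch of the same "hash of arrays" result built without the CGI library. It assumes a reasonably recent Ruby where URI.decode_www_form is available, and it only splits on '&' (semicolon handling is left out). The helper name query_to_hash is made up for this illustration.

```ruby
require 'uri'

# Hypothetical helper (not part of any library): collect every value
# for each key into an array, so repeated keys are not lost.
def query_to_hash(query)
  # decode_www_form returns [key, value] pairs in the order they appear
  URI.decode_www_form(query.to_s).each_with_object({}) do |(key, value), params|
    (params[key] ||= []) << value
  end
end

params = query_to_hash(URI.parse('http://foo/?b=a&b=c').query)
puts params['b'].join(',')  # prints a,c
```

Calling to_s on the query means a URL without any query string yields an empty hash instead of raising on nil.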
 
lrlebron

This would work if the string were a proper URL. But it is a
hyperlink.
 
Aaron Patterson

Use Hpricot to extract the href, then feed it through URI and CGI.
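
If installing an HTML parser is not an option, the same pipeline can be roughly sketched with the standard library alone. Note that this swaps Hpricot for a plain regex, which is only reasonable for a single, well-formed <a> tag like the one in the question, and it uses URI.decode_www_form in place of CGI.parse. The helper name href_params is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: pull the href out of one <a> tag with a regex,
# decode the &amp; entity, and return the query parameters as a hash
# of arrays. A real HTML parser is the safer choice for arbitrary markup.
def href_params(tag)
  # Capture group 2 is the attribute value between matching quotes
  href = tag[/href\s*=\s*(['"])(.*?)\1/i, 2]
  return {} if href.nil?

  query = URI.parse(href.gsub('&amp;', '&')).query
  URI.decode_www_form(query.to_s).each_with_object({}) do |(key, value), params|
    (params[key] ||= []) << value
  end
end

tag = "<a href='showmono.asp?cpnum=1022&amp;monotype=comb' target='main'>"
params = href_params(tag)
puts params['cpnum'].first     # prints 1022
puts params['monotype'].first  # prints comb
```

Only the href value is handed to URI.parse, so the surrounding tag never reaches it.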
 
lrlebron

Sorry for the second reply. I took your suggestions and came up with
the following:

require 'uri'
require 'cgi'

str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"

def get_cpnum(link)
  arrParts = link.split(' ')
  CGI.parse(URI.parse(arrParts[1]).query)['cpnum']
end

puts get_cpnum(str)
 
lrlebron

Here's what I ended up with:

require 'uri'
require 'cgi'
require 'hpricot'

# Returns the whole parameter hash, or only the values for key when one is given.
def get_query_value(link, key='')
  doc = Hpricot(link)
  params = CGI.parse(URI.parse(doc.at("a")['href']).query)
  key.empty? ? params : params[key]
end

str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"

p get_query_value(str)
puts get_query_value(str,'cpnum')
puts get_query_value(str,'monotype')

It allows me to ask for the complete hash or a particular key.

Thanks,

Luis
 
Robert Klemme

If you try to parse URI throws an error.

Does it? This works for me:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> u=URI.parse('foo.bar/baz?x=2')
=> #<URI::Generic:0x3ffa0eda URL:foo.bar/baz?x=2>
irb(main):003:0> u.query
=> "x=2"
irb(main):004:0> u=URI.parse('baz?x=2')
=> #<URI::Generic:0x3ff9f15c URL:baz?x=2>
irb(main):005:0> u.query
=> "x=2"

Cheers

robert
 
lrlebron

I meant if you try to parse the string
str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"
it throws an error.

c:/ruby/lib/ruby/1.8/uri/common.rb:432:in `split': bad URI(is not URI?): <a href='showmono.asp?cpnum=555&monotype=full' target='main'> (URI::InvalidURIError)
        from c:/ruby/lib/ruby/1.8/uri/common.rb:481:in `parse'
        from uritest.rb:8
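
The difference between the two inputs can be sketched as follows: URI.parse rejects the whole tag but accepts the bare href value. The helper name safe_query is hypothetical, and returning nil on failure is just one possible way to handle the error.

```ruby
require 'uri'

# Hypothetical helper: return the query string, or nil when the input
# is not something URI.parse accepts (for example, a whole HTML tag).
def safe_query(text)
  URI.parse(text).query
rescue URI::InvalidURIError
  nil
end

# The full tag contains '<', '>' and spaces, so it is not a valid URI
p safe_query("<a href='showmono.asp?cpnum=555&monotype=full' target='main'>")  # prints nil

# The href value on its own parses as a relative URI
p safe_query('showmono.asp?cpnum=555&monotype=full')  # prints "cpnum=555&monotype=full"
```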
 
