Getting a valid URL from a command line

Hunt Jon

Hi - I'm working on the script below, which tries to read user input
and validate that the input is formed like a URL.
If the user fails to enter a valid one, it should ask again.

require 'uri'
puts "Type a URL"
begin
  url = gets.chomp
  URI.parse(url) # should raise if a variable 'url' is malformed.
rescue URI::InvalidURIError
  puts "That is not a valid URL. Try again."
  retry
end

I expect that if I run "URI.parse()" it should raise an error, but
it doesn't happen.

Can anybody help me on this one?

Jon
 
Rob Biedenharn

require 'uri'
print "Type a URL: "
begin
  url = gets.chomp
  puts "You said: #{url.inspect}"
  uri = URI.parse(url) # should raise if a variable 'url' is malformed.
  puts uri.inspect
rescue URI::InvalidURIError
  puts "That is not a valid URL. Try again."
  retry
end

Try getting a little bit more information out (and post what input you
are trying that you expect to be malformed).

Note that some URIs are HTTP and some might be Generic. There are a
lot more types of URI than just those that start with http://. Have
you ever seen a jdbc resource string?
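
As a rough illustration of that point, the sketch below parses a few sample inputs and reports which class and scheme URI.parse assigns to each. The helper name describe_uri and the sample strings are made up for this example and do not come from the thread. Only input that is not a syntactically valid URI reference, such as a string containing a space, raises.

```ruby
require 'uri'

# Hypothetical helper (not from the thread): report how URI.parse
# classifies a string, or note that it raised.
def describe_uri(text)
  uri = URI.parse(text)
  "#{uri.class} scheme=#{uri.scheme.inspect}"
rescue URI::InvalidURIError
  "InvalidURIError"
end

# Sample inputs chosen for illustration only.
['http://example.com/', 'example.com', 'jdbc:mysql://localhost/db', 'not a url'].each do |input|
  puts "#{input.inspect} -> #{describe_uri(input)}"
end
```

A bare host name such as example.com parses as a generic, scheme-less URI, which is presumably why the original script never reached its rescue clause.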

-Rob

Rob Biedenharn http://agileconsultingllc.com
 
Hunt Jon

require 'uri'
print "Type a URL: "
begin
  url = gets.chomp
  puts "You said: #{url.inspect}"
  uri = URI.parse(url) # should raise if a variable 'url' is malformed
 
Rob Biedenharn

I expect a user to input an HTTP or HTTPS URL, e.g., http://abcdef.gov
After some research, using URI seems *too* generic, as 'uri' covers
different protocols, not just http/https.

I'll look into it. Perhaps using Regexp match would be better.

Jon

You can see what the scheme is determined to be:

irb> require 'uri'
=> true
irb> u=URI.parse('http://example.com/')
=> #<URI::HTTP:0x395b34 URL:http://example.com/>
irb> u.scheme
=> "http"
irb> x=URI.parse('example.com')
=> #<URI::Generic:0x392f24 URL:example.com>
irb> x.scheme
=> nil

You probably don't want to jump down the Regexp rabbit-hole if you
know that you want a valid URI. Let the library do the heavy lifting.
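
One way to let the library do the work while still rejecting input such as example.com is to parse first and then check the class of the result. The sketch below only illustrates that idea: http_url? and prompt_for_url are hypothetical names, and requiring a non-empty host is an extra check that nobody in the thread asked for.

```ruby
require 'uri'

# Hypothetical helper: true only for syntactically valid http/https URLs
# that also name a host.
def http_url?(text)
  uri = URI.parse(text)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

# Hypothetical prompt loop: keep asking until the input passes http_url?.
# Returns nil if the input stream ends before a valid URL is entered.
def prompt_for_url(input = $stdin, output = $stdout)
  loop do
    output.print 'Type a URL: '
    line = input.gets
    return nil if line.nil? # end of input (e.g. Ctrl-D)

    url = line.chomp
    return url if http_url?(url)

    output.puts 'That is not a valid URL. Try again.'
  end
end
```

In a script, url = prompt_for_url would read from standard input. The stream parameters are there only so the loop can be exercised without a terminal.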

-Rob

Rob Biedenharn http://agileconsultingllc.com
 
David Masover

I expect a user to input an HTTP or HTTPS URL, e.g., http://abcdef.gov
After some research, using URI seems *too* generic, as 'uri' covers
different protocols, not just http/https.

Well, a URI isn't even required to work. Just a clarification:

A URL is meant to actually refer to a resource. For example,

http://ruby-lang.org/

actually refers to a working website, and is thus a URL -- thus, the protocol
must be something that actually exists, and as a practical matter, you'll want
it to be something you (or your browser) know how to handle.


A URI only needs to be globally unique. For example:

http://www.w3.org/1999/xhtml

It doesn't matter AT ALL whether this points to a working resource. The Web
will continue to work, even if w3.org completely implodes. As a matter of
courtesy, the W3C has actually made this a valid URL, which points to a
description of what that namespace is, and the specifications that use it --
but when your browser sees that URI at the top of a web page:

<html xmlns='http://www.w3.org/1999/xhtml' ...>

It doesn't actually talk to w3.org at all. It just knows internally that this
namespace is where HTML elements go in an XHTML document.



On a completely unrelated note, if you know how XML namespaces work,
technically, the following would probably work, on browsers that understand
XHTML:

<foobar:html xmlns:foobar='http://www.w3.org/1999/xhtml'>
<foobar:head>
...
</foobar:head>
<foobar:body>
...
</foobar:body>
</foobar:html>

I suspect that the spec explicitly disallows this, at least in the
"transitional" mode, because it's not backwards compatible with HTML 4.0. But
the point is, internally, the browser is looking for an html element
associated with that URI -- which is why it's not a valid xhtml document if
you don't include that xmlns in some form.
 
