Extract URL from HTML

Darius Blaszijk

Hi there,

I recently started with my first HTML. I'm now busy with an app that parses
an HTML file and extracts some of the data. The questions I have concern
the extraction of a URL. Please be so kind as to look through the questions
and comment. There are a lot of them, and you don't need to answer
everything at the same time. Some questions will be obvious, and some not,
but I hope I'll learn from them.

As I understand it, I need the following:

<BASE> href indicates the base path + host of URLs found later
Q: is HTML case sensitive? Sometimes href is uppercase, sometimes lowercase.
Q: can <BASE> occur more than once in a file and can it occur at the end?
Q: can the BASE URI be http://host/index.html? Which means that when further
links occur I need to strip index.html and add the two together?

<a> is the element which contains the URL. Inside the <a> element a <b>
element can occur, which is the title of the URL.

Q: The fragment identifier is '#'. Can this character also occur in a URL? In
other words, can it also be part of a URL, or is it used exclusively for
that one purpose?
Q: what about the ? and = characters? Are they also part of the URL or are
they just there for the server to provide parameters to do something? What
I mean is that in principle the URLs http://host/index.html#apple and
http://host/index.html#banana are pointing to the same file. Does the same
apply to the above mentioned and perhaps other characters?

Kind regards, Darius Blaszijk
 
SpaceGirl

Darius said:
Hi there,

I recently started with my first HTML. I'm now busy with an app that parses
an HTML file and extracts some of the data. The questions I have concern
the extraction of a URL. Please be so kind as to look through the questions
and comment. There are a lot of them, and you don't need to answer
everything at the same time. Some questions will be obvious, and some not,
but I hope I'll learn from them.

As I understand it, I need the following:

<BASE> href indicates the base path + host of URLs found later
Q: is HTML case sensitive? Sometimes href is uppercase, sometimes lowercase.

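XHTML is case sensitive. In HTML, element and attribute names are not case
sensitive, so href and HREF mean the same thing. HTML is only case sensitive
in attribute values such as links or references to documents that sit on a
case sensitive server.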
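
For a parser, this means tag and attribute names should be compared without
regard to case, while the URL itself should be kept exactly as written. A
minimal sketch in Python, assuming the standard html.parser module;
HrefCollector and collect_hrefs are made-up names for illustration only:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects href values from <a> tags, whatever case the markup uses."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names already lowercased;
        # attribute values (the URLs) are left as written in the file.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def collect_hrefs(html):
    """Return the href values of all <a> tags found in the given HTML text."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


sample = '<A HREF="Page.html">one</A> <a href="other.html">two</a>'
print(collect_hrefs(sample))  # ['Page.html', 'other.html']
```

Both links are found, and the capital P in Page.html is preserved.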
Q: can <BASE> occur more than once in a file and can it occur at the end?

No, and no. <BASE> may appear once, and it belongs in the <HEAD>.
Q: can the BASE URI be http://host/index.html? Which means that when further
links occur I need to strip index.html and add the two together?

That doesn't make sense.
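
One possible reading of the question: if the base URL ends in a file name,
standard relative URL resolution (RFC 3986) drops that last path segment
before the link is appended. A minimal sketch in Python, assuming the
standard urllib.parse module; resolve_link is a made-up name and the host
is the placeholder from the question:

```python
from urllib.parse import urljoin


def resolve_link(base, link):
    """Resolve a link found in the page against the base URL."""
    # urljoin replaces the last path segment of the base (index.html here)
    # with the relative link; absolute links are returned unchanged.
    return urljoin(base, link)


base = "http://host/index.html"
print(resolve_link(base, "page2.html"))           # http://host/page2.html
print(resolve_link(base, "img/logo.png"))         # http://host/img/logo.png
print(resolve_link(base, "/top.html"))            # http://host/top.html
print(resolve_link(base, "http://other/x.html"))  # http://other/x.html
```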

<a> is the element which contains the URL. Inside the <a> element a <b>
element can occur, which is the title of the URL.

Q: The fragment identifier is '#'. Can this character also occur in a URL? In
other words, can it also be part of a URL, or is it used exclusively for
that one purpose?

It can be used in a URL (http://www.mysite.com/index.asp#myref)
Q: what about the ? and = characters? Are they also part of the URL or are
they just there for the server to provide parameters to do something? What
I mean is that in principle the URLs http://host/index.html#apple and
http://host/index.html#banana are pointing to the same file. Does the same
apply to the above mentioned and perhaps other characters?

These indicate bookmarks within the page. You can have as many bookmarks
as you like in a page. So, http://host/index.html#cat would jump to the
line marked in the HTML (<p><a name="cat"></a>cat</p>).
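
Since the bookmark only selects a place inside the page, two URLs that
differ only after the '#' refer to the same file. A minimal sketch in
Python of that comparison, assuming the standard urllib.parse module;
same_document is a made-up name for illustration:

```python
from urllib.parse import urldefrag


def same_document(url_a, url_b):
    """True when two URLs differ at most in their '#fragment' part."""
    # urldefrag splits a URL into (url without fragment, fragment).
    return urldefrag(url_a).url == urldefrag(url_b).url


print(same_document("http://host/index.html#apple",
                    "http://host/index.html#banana"))  # True
print(same_document("http://host/index.html#apple",
                    "http://host/other.html#apple"))   # False
```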

http://host/index.asp?something=abc

This indicates that a parameter (something) has been passed with the
value "abc". You can send a lot of parameters easily, using "&" to
separate each parameter:

http://host/index.asp?something=abc&somethingelse=123&somethingmore=xyz

To process these parameters, you need to use some sort of server side
language (such as ASP, JSP or PHP).
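
On the extracting side, the part after the '?' can be split into names and
values without any server. Unlike the '#' part, it is sent to the server, so
URLs with different parameters may well return different pages and should
not be treated as the same file. A minimal sketch in Python, assuming the
standard urllib.parse module; query_params is a made-up name for
illustration:

```python
from urllib.parse import urlsplit, parse_qs


def query_params(url):
    """Return the query parameters of a URL as a dict of name -> list of values."""
    # urlsplit isolates the text between '?' and '#'; parse_qs splits it
    # on '&' and '=' (a name may occur more than once, hence the lists).
    return parse_qs(urlsplit(url).query)


url = "http://host/index.asp?something=abc&somethingelse=123&somethingmore=xyz"
print(query_params(url))
# {'something': ['abc'], 'somethingelse': ['123'], 'somethingmore': ['xyz']}
```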


--


x theSpaceGirl (miranda)

# lead designer @ http://www.dhnewmedia.com #
# remove NO SPAM to email, or use form on website #
 
