Darius Blaszijk
Hi there,
I recently started working with HTML for the first time. I'm now busy with an app that parses
an HTML file and extracts some of the data. The questions I have concern
the extraction of URLs. Please be so kind as to look through the questions
and comment. There are a lot of them, and you don't need to answer
everything at once. Some questions will be obvious and some not,
but I hope I'll learn from them.
As I understand it, I need the following:
<BASE>: its href indicates the host + base path for URLs found later in the file.
Q: Is HTML case sensitive? Sometimes href is uppercase, sometimes lowercase.
Q: Can <BASE> occur more than once in a file, and can it occur at the end?
Q: Can the BASE URI be http://host/index.html? Would that mean that when further
links occur I need to strip index.html and join the two together?
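
To make the "strip index.html and join" idea concrete, here is a minimal sketch using Python's standard urllib.parse module. The choice of Python, the base value, and the example links are assumptions for illustration only; they are not taken from any real page.

```python
from urllib.parse import urljoin

# Assumed value of the <BASE href="..."> attribute.
base = "http://host/index.html"

# Hypothetical links as they might appear in href attributes.
links = ["page2.html", "images/logo.png", "/top.html", "http://other/x.html"]

# urljoin drops the last path segment of the base (index.html)
# before appending a relative link; absolute links are left alone.
resolved = [urljoin(base, link) for link in links]

for link, url in zip(links, resolved):
    print(link, "->", url)
```

With these inputs the relative link page2.html resolves to http://host/page2.html, so the file name in the base is replaced rather than kept.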
<a> is the element which contains the URL. Inside the <a> element a <b> element can occur,
which holds the title of the URL.
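
The extraction described above could be sketched roughly as follows, assuming Python and its standard html.parser module. The class name LinkExtractor and the helper extract are hypothetical names made up for this example. HTMLParser lowercases tag and attribute names before passing them on, so HREF and href are treated the same. Keeping only the first <BASE> that has an href is a choice made in this sketch.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the first <base href> and every <a href> with its text."""

    def __init__(self):
        super().__init__()
        self.base = None      # href of the first <base> seen, if any
        self.links = []       # list of (href, link text) pairs
        self._href = None     # href of the <a> currently open
        self._text = []       # text pieces collected inside that <a>

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive already lowercased.
        attrs = dict(attrs)
        if tag == "base" and self.base is None and attrs.get("href"):
            self.base = attrs["href"]
        elif tag == "a" and attrs.get("href") is not None:
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        # Text inside nested elements such as <b> also ends up here.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract(html):
    """Hypothetical helper: return (base href, list of links)."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.base, parser.links
```

Calling extract on a page that mixes <A HREF=...> and <a href=...> returns both links in the same form, each paired with the text found inside the element.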
Q: The fragment identifier is '#'. Can this character also occur elsewhere in a URL? In
other words, can it be a regular part of a URL, or is it used exclusively for
this purpose?
Q: What about the ? and = characters? Are they also part of the URL, or are
they just there to pass parameters to the server? What
I mean is that in principle the URLs http://host/index.html#apple and
http://host/index.html#banana point to the same file. Does the same
apply to ? and =, and perhaps other characters?
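
As a rough illustration of the distinction being asked about, the sketch below splits URLs into their parts with Python's standard urllib.parse. Python and the search URL are assumptions for the example. The fragment after '#' is handled by the client and is not sent to the server, while the query after '?' is part of the request, so two URLs that differ only in the query may return different content.

```python
from urllib.parse import urldefrag, urlsplit, parse_qs

apple = "http://host/index.html#apple"
banana = "http://host/index.html#banana"

# Removing the fragment leaves the same URL for both.
same_resource = urldefrag(apple).url == urldefrag(banana).url

# Hypothetical URL with a query string and a fragment.
parts = urlsplit("http://host/search?fruit=apple&page=2#results")
params = parse_qs(parts.query)  # maps each name to a list of values

print(same_resource)
print(parts.path, parts.query, parts.fragment)
print(params)
```

A literal '#' that is meant as data rather than as the fragment separator has to be percent-encoded as %23.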
Kind regards, Darius Blaszijk