URL context constructor broken?

Roedy Green

There is a URL constructor that takes a context URL and a relative
String. It is supposed to merge them into a new URL. It mostly
works, but fails on this list of real world examples with a
MalformedURLException. I am trying to analyse the location field in an
http redirect and apply it to the original URL.

Are these bugs, or "features"?

http://new.myfonts.com/fonts/linotype/frutiger/ +
/fonts/adobe/frutiger/
http://new.myfonts.com/fonts/linotype/helvetica/ +
/fonts/adobe/helvetica/
http://new.myfonts.com/fonts/storm/baskerville-ten/ +
/fonts/storm/baskerville-original-pro/
http://new.myfonts.com/search?search[text]=britannic +
/search/britannic/
http://new.myfonts.com/search?search[text]=century+old+style +
/search/century+old+style/
http://new.myfonts.com/search?search[text]=garamond +
/search/garamond/
http://new.myfonts.com/search?search[text]=goudy + /search/goudy/
http://new.myfonts.com/search?search[text]=zapfino +
/search/zapfino/
http://www.allposters.com/-sp/Earth-by-Day-Posters_i357956_.htm?aid=1025105019 + /-st/World-Maps-Posters_c13013_.htm
http://www.allposters.com/-sp/Grevy-s-Zebras-Posters_i4470723_.htm?aid=1025105019 + /-st/Zebras-Color-Photography-Posters_c55788_.htm
http://www.allposters.com/-sp/Moose-at-Water-s-Edge-Posters_i377585_.htm?aid=1025105019 + /-st/Robert-Bateman-Posters_c59250_.htm
http://www.allposters.com/-sp/Orange-Zinnia-Posters_i1353154_.htm?aid=1025105019 + /-st/Michael-Bird-Posters_c66523_.htm
http://www.allposters.com/-sp/Solar-System-Posters_i386802_.htm?aid=1025105019 + /-st/Cubicle-Decoration-Posters_c8122_.htm
http://www.allposters.com/-sp/The-Ten-Commandments-Posters_i925164_.htm?aid=1025105019 + /-st/Ten-Commandments-Posters_c19664_.htm
http://www.allposters.com/-sp/Thomas-Wyndham-Spaniel-III-Posters_i2540602_.htm?aid=1025105019 + /-st/Spaniel-Posters_c19306_.htm
http://bg-soft.net/Product/SubmitProduct.aspx + /product/submit/
http://codecharge.com/ + index2.php
http://cws.internet.com/news.html + /category/2248-1-d.htm
http://digital-ear.com/ + /digital-ear/index.asp
http://downloadsphere.com/developer/submit.php +
/developer/submit.php?a4e6dd80
http://english.aljazeera.net/NR/exeres/629C6560-DEE0-477D-99A2-1B6A26A9A409.htm
+ /news/middleeast/2006/11/2008525125925700309.html
http://english.aljazeera.net/NR/exeres/A3D53FDC-7273-4B2D-B04A-D44906D96653.htm
+ /
http://english.aljazeera.net/NR/exeres/B2FB50C8-E8B6-4841-A482-9A51AC964BFA.htm
+ /
http://find-software.net/AccountManager.php?tag=softman + LogIn.php
http://home.airindia.in/ +
Http://home.AirIndia.in/SBCMS/WebPages/Home.aspx


http://host-tracker.com/ + /?sdm=1
http://paddb.com/UploadPad.aspx +
/Register.aspx?ReturnUrl=%2fUploadPad.aspx
http://pm.gc.ca/default.asp?Language=E&Page=home + index.asp
http://softsland.com/submit.php + /submit.php?a4e6dd80
http://software.ivertech.com/developer/ +
/developerLogin.aspx?ReturnUrl=%2fdeveloper%2fdefault.aspx
http://support.mozillamessaging.com/ + /en-US/kb/
http://support.mozillamessaging.com/kb/Keyboard+shortcuts +
/en-US/kb/Keyboard+shortcuts
http://support.mozillamessaging.com/kb/Thunderbird+FAQ +
/en-US/kb/Thunderbird+FAQ
http://winwarelinks.com/submit.htm +
index.php?t=page_error&error_code=error404
http://www.airberlin.com/ + /prepage.php
http://www.airtran.com/ + /Home.aspx
http://www.alitalia.com/ + /EN_EN/splash-page.aspx
http://www.altiris.com/products/carboncopysol/ + /
http://www.altria.com/ + /en/cms/Home/default.aspx
http://www.amtrak.com/ +
/servlet/ContentServer?pagename=Amtrak/HomePage
http://www.bitenova.org/index.php?idx=upload + /index.html
http://www.blocquebecois.org/ + /accueil.aspx
http://www.braun.com/global/household/food-preparation/hand-blenders/multiquick-classic.html
+
/global/household/food-preparation/multiquick-hand-blenders/multiquick-7.html
http://www.britishairways.com/ +
/travel/globalgateway.jsp/global/public
http://www.cambridge.org/catalogue/catalogue.asp?isbn=0521853249 +
/0521853249
http://www.cellphoto.net/ + /?a4e6dd80
http://www.certum.pl/english/eng/services/ts/index.html +
/certum/main.xml
http://www.clearwave.com/ + main.php
http://www.compuware.com/optimalj + /page_not_found.asp
http://www.convert-djvu-to-pdf.com/ + /?a4e6dd80
http://www.cookiecentral.com/index.html + /
http://www.creative.com/ +
/geoLocator/checkip.asp?sDestURL=/welcome.asp
http://www.crowdgravity.com/AF_CG/ViewCounter/index.cfm?ap_id=10014&bIsAffiliate=0
+ /AF_CG/gfx/counting/trans.gif
http://www.downloadsbay.com/dom/submit.aspx +
/default.pk?aspxerrorpath=/dom/submit.aspx
http://www.eclipse.org/downloads/index_topic.php + /downloads/
http://www.eclipse.org/emf/ + /modeling/emf/
http://www.elal.co.il/ + /ELAL/English/States/Canada/
http://www.equifax.com/home/ + /home/en_us
http://www.filemapper.com/submit_your_software.php +
vendor/submit_your_software.php
http://www.freescale.com/codewarrior +
/webapp/sps/site/homepage.jsp?code=CW_HOME&tid=vancodewarrior
http://www.georgia.gov/ +
/00/home/0,2061,4802,00.html;jsessionid=C1D012269B9FA143E4701464C41E9123
http://www.gouv.qc.ca/ + /portail/quebec/pgs/commun/
http://www.greenpeace.org/canada/en/campaigns/greatbear +
/canada/en/campaigns/greatbear/
http://www.greenpeace.org/canada/en/recent/climate_chaos_copenhagen +
/canada/en/recent/climate_chaos_copenhagen/
http://www.greenpeace.org/canada/en/recent/greenpeace-demands-action-on-t
+ /canada/en/recent/greenpeace-demands-action-on-t/
http://www.greenpeace.org/canada/en/recent/greenpeace-welcomes-president-obama/social-media
+ /canada/en/recent/greenpeace-welcomes-president-obama/social-media/
http://www.greenpeace.org/canada/en/recent/petropolis_film_festival +
/canada/en/recent/petropolis_film_festival/
http://www.greenpeace.org/canada/en/recent/probably-no-cod +
/canada/en/recent/probably-no-cod/
http://www.greenpeace.org/canada/en/recent/stopstarsands3_action +
/canada/en/recent/stopstarsands3_action/
http://www.greenpeace.org/canada/en/recent/xerox-stop-destroying +
/canada/en/recent/xerox-stop-destroying/
http://www.greenpeace.org/international/press/releases/kyoto-protocol-moves-ahead-as
+ /international/press/releases/kyoto-protocol-moves-ahead-as/
http://www.greenpeace.org/usa/campaigns/global-warming-and-energy/exxon-secrets
+ /usa/campaigns/global-warming-and-energy/exxon-secrets/
http://www.greyhound.ca/ + /home
http://www.gulf-daily-news.com/Story.asp?Article=113244&Sn=WORL&IssueID=28069
+ NewsDetails.aspx?storyid=113244
http://www.iana.org/cctld/cctld-whois.htm + /domains/root/db/
http://www.iana.org/gtld/gtld.htm + /domains/root/db/
http://www.iana.org/root-whois/int.htm + /domains/root/db/int.html
http://www.icq.com/ + en.html
http://www.incredimail.com/ + /english/splash.aspx
http://www.inetsoftware.de/products/crystalclear/default.htm +
/products/crystal-clear/index
http://www.inetsoftware.de/products/jdbc/suite/gate3/default.asp +
/products/jdbc-driver/suite/gate3/index

http://www.interactivebrokers.com/ + ibg/main.php
http://www.j2eebrain.com/ + /wp-admin/install.php
http://www.japhar.org/ + /?a4e6dd80
http://www.javarules.com/ + /?a4e6dd80
http://www.jguru.com/jguru/faq/ + /faq/
http://www.jmadden.info/Arrows.htm + /Arrows.htm?a4e6dd80
http://www.judycollins.com/ + index1.php
http://www.keytronic.com/ + /home/index.html
http://www.kickinghorsecoffee.com/ + /en
http://www.laweekly.com/news/news/the-outing/1322/ + /
http://www.manscat.com/ + forum/lobby.php
http://www.martybunch.com/ + /?01db9720
http://www.matterform.com/qbullets/legend.html +
/mac_software/free_web_icons/legend.html
http://www.mcafee.com/ + /us/index.html
http://www.mcp.com/ + maindesign_2.asp?aid=20511&gid=9597
http://www.melitta.ca/ + /Content/main.aspx
http://www.microsoft.com/ + /en/us/default.aspx
http://www.microsoft.com/downloads/details.aspx?familyid=6ebcfad9-d3f5-4365-8070-334cd175d4bb
+
/downloads/details.aspx?familyid=6ebcfad9-d3f5-4365-8070-334cd175d4bb&displaylang=en
http://www.microsoft.com/hardware/mouseandkeyboard/productdetails.aspx?pid=080
+ /hardware/mouseandkeyboard/default.mspx
http://www.mininova.org/upload/ + /distribution
http://www.myfonts.com/category/index.html +
/category/myfonts/index.html
http://www.myspace.com/video/vid/661292 + /video?vs=11
http://www.newsindia-times.com/ + /NewsIndiaTimes/index.htm
http://www.nlsearch.com/ + home.php
http://www.nvidia.com/page/gpumobo_6100-430_features.html +
/page/gpu_mobo_features.html
http://www.oo-software.com/home/en/products/oodefrag/upgrade/index.html
+ /home/en/products/oodefrag/index.html
http://www.opentext.com/microstar + /error/404-error.html
http://www.ozonecondoms.com/welcome.php + index.php?h=1
http://www.parallels.com/products/virtuozzo/ + /en/products/pvc46/
http://www.pentagon.net/ + /?a4e6dd80
http://www.pgdp.net/ + /c/
--
Roedy Green Canadian Mind Products
http://mindprod.com
A short order cook is a master of multitasking. Every movement is
optimised from years of practice. Yet when a computer executes a
multitasking program, it approaches the task as if for the first time.
 
Joshua Cranmer

The key seems to be in the phrase «If the spec's path component begins
with a slash character "/"». How does the constructor decide what is the
"path component" of the spec? Maybe it thinks that the authority is
"fonts". If so it might be trying to construct the URL
"http://fonts/adobe/frutiger/", which does contain a scheme but looks
pretty malformed to me.


URLs have a basic structure:
<scheme>://<authority>/<path>

[Technically the `/' is also part of the path].

The exact form of <authority> is dictated by RFC 3986, so I won't repeat
it here.
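One way to see how a reference splits into these parts is to ask java.net.URI to parse it. The following is a minimal sketch with a hypothetical class name; it shows how URI's parser treats the strings, which is not necessarily what the URL constructor does internally. It suggests that a single leading slash leaves the authority empty, and only a leading "//" would make "fonts" an authority.

```java
import java.net.URI;

final class UriParts
{
    /** Describe the scheme, authority and path that java.net.URI sees in a reference. */
    static String describe( final String ref )
    {
        final URI u = URI.create( ref );
        return "scheme=" + u.getScheme()
               + " authority=" + u.getAuthority()
               + " path=" + u.getPath();
    }

    public static void main( final String[] args )
    {
        // full URL: scheme, authority and path are all present
        System.out.println( describe( "http://new.myfonts.com/fonts/linotype/frutiger/" ) );
        // single leading slash: path only, no authority
        System.out.println( describe( "/fonts/adobe/frutiger/" ) );
        // double leading slash: "fonts" is parsed as the authority
        System.out.println( describe( "//fonts/adobe/frutiger/" ) );
    }
}
```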
 
Abu Yahya

There is a URL constructor that takes a context URL and a relative
String. It is supposed to merge them into a new URL. It mostly
works, but fails on this list of real world examples with a
MalformedURLException. I am trying to analyse the location field in an
http redirect and apply it to the original URL.

Are these bugs, or "features"?

http://new.myfonts.com/fonts/linotype/frutiger/ +
/fonts/adobe/frutiger/

For me, (at least some of) the URLs work well. For example, I get the
following output for the code snippet pasted below:

======Output=========
http://new.myfonts.com/fonts/adobe/frutiger/
http://www.filemapper.com/vendor/submit_your_software.php
---------------------

======Code=========

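import java.net.MalformedURLException;
import java.net.URL;

public class UrlContextTest
{
    public static void main( String[] args ) throws MalformedURLException
    {
        // Spec starts with "/": it replaces the whole path of the context.
        URL url1 = new URL( new URL(
                "http://new.myfonts.com/fonts/linotype/frutiger/"
        ), "/fonts/adobe/frutiger/" );
        System.out.println( url1 );

        // Spec has no leading "/": it is resolved relative to the context.
        URL url2 = new URL( new URL(
                "http://www.filemapper.com/submit_your_software.php"
        ), "vendor/submit_your_software.php" );
        System.out.println( url2 );
    }
}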
 
Steven Simpson

There is a URL constructor that takes a context URL and a relative
String. It is supposed to merge them into a new URL. It mostly
works, but fails on this list of real world examples with a
MalformedURLException.

You might prefer to convert to a java.net.URI and use its
resolve(String) method.

I am trying to analyse the location field in an
http redirect and apply it to the original URL.

According to this, Location must be an absoluteURI:

<http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.30>
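The two points in this reply can be combined in a few lines. This is a sketch only, with hypothetical class and method names, and it assumes both strings are syntactically valid URIs (URI.create throws IllegalArgumentException otherwise). It checks whether a Location value is absolute, and uses URI.resolve(String) to merge it with the original URL either way; the sample pairs are taken from the list in the original post.

```java
import java.net.URI;

final class LocationResolver
{
    /** True if the Location value carries its own scheme, as RFC 2616 section 14.30 asks. */
    static boolean isAbsolute( final String location )
    {
        return URI.create( location ).isAbsolute();
    }

    /** Resolve a Location value against the URL that produced the redirect. */
    static String resolve( final String original, final String location )
    {
        // resolve(String) hands back an absolute reference as is
        // and merges a relative one with the base.
        return URI.create( original ).resolve( location ).toString();
    }

    public static void main( final String[] args )
    {
        System.out.println( isAbsolute( "/prepage.php" ) );
        System.out.println( isAbsolute( "Http://home.AirIndia.in/SBCMS/WebPages/Home.aspx" ) );
        System.out.println( resolve( "http://www.airberlin.com/", "/prepage.php" ) );
        System.out.println( resolve( "http://www.nlsearch.com/", "home.php" ) );
        System.out.println( resolve( "http://home.airindia.in/",
                "Http://home.AirIndia.in/SBCMS/WebPages/Home.aspx" ) );
    }
}
```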
 
Roedy Green

Which version of the JDK are you using? (I tried this with Sun's 1.5,
and IBM's 1.6)

You can see this with

try
{
URL originalURL = new URL( "xxxx" );
String newLocation = "yyyy";
String merged = new URL( originalURL, newLocation ).toString();
out.println( originalURL + " " + newLocation + " " + merged );
}
catch ( MalformedURLException e )
{
out.println( "failed: " + e.getMessage() );
}

I am using 1.6.0_23 from Oracle.

It does lots of them correctly. If you are interested I could collect
a set of real world examples it does properly. I suspect one pattern
it does not handle correctly is:
http://domain.com/ + /xxxx

It may also get inhibited if there is a ?xxx on either piece.
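Both guesses can be checked directly. The following sketch (class and method names are hypothetical) feeds a host-root context with a rooted location, and pairs carrying a query string on either side, to the two-argument constructor. The pairs come from the list in the original post. It prints the merged URL, or the exception message if the constructor throws.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class PatternCheck
{
    /** Merge with new URL( URL, String ); return the result, or a marker if it throws. */
    static String merge( final String original, final String location )
    {
        try
        {
            return new URL( new URL( original ), location ).toString();
        }
        catch ( MalformedURLException e )
        {
            return "MalformedURLException: " + e.getMessage();
        }
    }

    public static void main( final String[] args )
    {
        // host root + rooted location
        System.out.println( merge( "http://www.airberlin.com/", "/prepage.php" ) );
        // query string on the location
        System.out.println( merge( "http://host-tracker.com/", "/?sdm=1" ) );
        // query string on the context; the context's query is not carried over
        System.out.println( merge( "http://pm.gc.ca/default.asp?Language=E&Page=home", "index.asp" ) );
    }
}
```

If any of these pairs did produce a MalformedURLException, that would point at the input actually reaching the constructor rather than at the pattern itself.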

I read the RFC, looked at the code, and studied the Javadoc. Since there
are no examples, the matter of what it SHOULD do still seems
ambiguous.

For now, the best approach is to write a replacement that is quite
forgiving. Sometimes the goofs even forget the leading / on the
new location string. It makes no sense unless you presume one. Browsers
seem to sort this out.
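A forgiving replacement along those lines might look like the sketch below. Everything in it is an assumption made for illustration: the class name, the method name, and above all the rule itself, which presumes a leading / whenever the location has neither a scheme nor a leading slash. That rule is deliberately not standard relative resolution.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class ForgivingMerge
{
    /**
     * Merge a redirect Location with the original URL, presuming a leading "/"
     * when the location has neither a scheme nor a leading slash.
     */
    static String merge( final String original, final String location )
    {
        final String trimmed = location.trim();
        // scheme = letter followed by letters, digits, "+", "." or "-", then ":"
        final boolean hasScheme = trimmed.matches( "[A-Za-z][A-Za-z0-9+.-]*:.*" );
        final boolean rooted = trimmed.startsWith( "/" );
        // hypothetical leniency: treat "home.php" as if it were "/home.php"
        final String spec = ( hasScheme || rooted ) ? trimmed : "/" + trimmed;
        try
        {
            return new URL( new URL( original ), spec ).toString();
        }
        catch ( MalformedURLException e )
        {
            throw new IllegalArgumentException(
                    "cannot merge [" + original + "] with [" + location + "]", e );
        }
    }

    public static void main( final String[] args )
    {
        System.out.println( merge( "http://www.interactivebrokers.com/", "ibg/main.php" ) );
        System.out.println( merge( "http://find-software.net/AccountManager.php?tag=softman", "LogIn.php" ) );
        System.out.println( merge( "http://bg-soft.net/Product/SubmitProduct.aspx", "/product/submit/" ) );
    }
}
```

The trade-off is that a genuinely relative location is pinned to the site root: against a context such as http://example.com/a/b.html (an illustrative URL), c.html becomes /c.html instead of /a/c.html, and a query-only location such as ?x=1 becomes /?x=1.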

Then I need to write an SSCCE that demonstrates each of the
non-functioning patterns in simplest form. I will leave it up to
Oracle to decide whether they intended it to be broken for the purpose
I am trying to use it for. It is the sort of bug whose fix might break
existing code.

 
Roedy Green

According to this, Location must be an absoluteURI:

<http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.30>


There are a handful without the lead /
e.g. http://www.nlsearch.com/ + home.php

The normal case is you get a lead / on the forwarding string.

http://bg-soft.net/Product/SubmitProduct.aspx + /product/submit/

Maybe it is upset by trailing slashes.
 
Lew

Lew said:
The key seems to be in the phrase «If the spec's path component begins
with a slash character "/"». How does the constructor decide what is the
"path component" of the spec? Maybe it thinks that the authority is
"fonts". If so it might be trying to construct the URL
"http://fonts/adobe/frutiger/", which does contain a scheme but looks
pretty malformed to me.


Joshua said:
URLs have a basic structure:
<scheme>://<authority>/<path>

[Technically the `/' is also part of the path].

The exact form of <authority> is dictated by RFC 3986, so I won't repeat it here.

Yeah, I was reading that and took it all into account.

What's not clear is what the URL constructor does with the spec argument,
which needn't contain an authority. In turn, the definition of "authority"
that you didn't repeat here does not require periods or multiple components,
so the constructor could consider "fonts" as an authority despite that the
intent was to specify it as part of the path. At least as I read the RFC and
Javadocs.

The thing is that the constructor's spec argument needn't be an entire URL, so
I see ambiguity there. Regardless, the only documented reason for the
constructor to throw 'MalformedURLException' is if the scheme is absent, which
it isn't because it's supposed to come from the first constructor argument.
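That documented case is easy to reproduce: with a null context, nothing supplies the scheme for a relative spec. The following is a minimal sketch with hypothetical class and method names, offered as an illustration of the documented behaviour and not as a diagnosis of the failures reported above.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class NoSchemeDemo
{
    /** Try new URL( context, spec ); the context string may be null. */
    static String attempt( final String contextOrNull, final String spec )
    {
        try
        {
            final URL context = contextOrNull == null ? null : new URL( contextOrNull );
            return new URL( context, spec ).toString();
        }
        catch ( MalformedURLException e )
        {
            return "MalformedURLException: " + e.getMessage();
        }
    }

    public static void main( final String[] args )
    {
        // the context supplies the scheme and authority, so the merge succeeds
        System.out.println( attempt( "http://new.myfonts.com/fonts/linotype/frutiger/",
                "/fonts/adobe/frutiger/" ) );
        // no context, and the spec has no scheme of its own, so the constructor throws
        System.out.println( attempt( null, "/fonts/adobe/frutiger/" ) );
    }
}
```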

As Roedy said,
Since there are no examples,
the matter of what [it] SHOULD do still seems ambiguous.
 
Roedy Green

There is a URL constructor that takes a context URL and a relative
String. It is supposed to merge them into a new URL. It mostly
works, but fails on this list of real world examples with a
MalformedURLException. I am trying to analyse the location field in an
http redirect and apply it to the original URL.

I created the following SSCCE to explore this, and oddly one of the
URLs that failed is now behaving. I tried both under java.exe and Jet.

I am going to do more tests.

/*
* @(#)TestURL.java
*
* Summary: Test the two-parameter URL constructor.
*
* Copyright: (c) 2010 Roedy Green, Canadian Mind Products, http://mindprod.com
*
* Licence: This software may be copied and used freely for any purpose but military.
* http://mindprod.com/contact/nonmil.html
*
* Requires: JDK 1.6+
*
* Created with: IntelliJ IDEA IDE.
*
* Version History:
* 1.0 2010-12-30 - initial version
*/
package com.mindprod.example;

import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

import static java.lang.System.out;

/**
* Test the two-parameter URL constructor, and the equivalent URI.resolve.
*
* @author Roedy Green, Canadian Mind Products
* @version 1.0 2010-12-30 - initial version
* @since 2010-12-30
*/
public final class TestURL
{
// -------------------------- STATIC METHODS --------------------------

/**
* Test the URL constructor with two parms, and the equivalent URI.resolve.
*
* @param originalURLString original context URL
* @param location partial URL to override the original
* @param expectedURLString What we expect the merged result to be.
* @param notes notes
*/
private static void test( final String originalURLString,
final String location,
final String expectedURLString,
final String notes )
{
testURL( originalURLString, location, expectedURLString, notes );
testURI( originalURLString, location, expectedURLString, notes );
}

/**
* test URI.resolve.
*
* @param originalURIString original context URL
* @param location partial URL to override the original
* @param expectedURIString What we expect the merged result to be.
* @param notes notes
*/
private static void testURI( final String originalURIString,
final String location,
final String expectedURIString,
final String notes )
{
out.println( "\n>>>testing URI.resolve: ["
+ originalURIString
+ "] ["
+ location
+ "] " + notes );

final URI originalURI;
try
{
originalURI = new URI( originalURIString );
}
catch ( URISyntaxException e )
{
out.println( " original URI failed with URISyntaxException" );
return;
}

final URI mergedURI = originalURI.resolve( location );
final String mergedURIString = mergedURI.toString();
if ( mergedURIString.equals( expectedURIString ) )
{
out.println( " OK: [" + mergedURIString + "]" );
}
else
{
out.println( " unexpected resolved URI: ["
+ mergedURIString
+ "] expected URI: ["
+ expectedURIString
+ "]" );
}
}

/**
* Test the URL constructor with two parms.
*
* @param originalURLString original context URL
* @param location partial URL to override the original
* @param expectedURLString What we expect the merged result to be.
* @param notes notes
*/
private static void testURL( final String originalURLString,
final String location,
final String expectedURLString,
final String notes )
{
out.println( "\n>>>testing URL constructor: ["
+ originalURLString
+ "] ["
+ location
+ "] "
+ notes );

final URL originalURL;
try
{
originalURL = new URL( originalURLString );
}
catch ( MalformedURLException e )
{
out.println( " original URL failed with MalformedURLException" );
return;
}

final URL resolvedURL;
try
{
resolvedURL = new URL( originalURL, location );
}
catch ( MalformedURLException e )
{
out.println( " resolve failed with MalformedURLException" );
return;
}
final String resolvedURLString = resolvedURL.toString();
if ( resolvedURLString.equals( expectedURLString ) )
{
out.println( " OK: [" + resolvedURLString + "]" );
}
else
{
out.println( " unexpected resolved URL: ["
+ resolvedURLString
+ "] expected URL: ["
+ expectedURLString
+ "]" );
}
}

// --------------------------- main() method ---------------------------

/**
* Test the URL constructor. It is producing what I consider anomalous results.
*
* @param args not used
*/
public static void main( final String[] args )
{
test( "http://mindprod.com/jgloss.html",
"/jgloss/jgloss.html",
"http://mindprod.com/jgloss/jgloss.html",
"common redirect pattern" );

test( "http://mindprod.com/",
"/index.html",
"http://mindprod.com/index.html",
"common redirect pattern" );


test( "http://new.myfonts.com/fonts/linotype/frutiger/",
"/fonts/adobe/frutiger/",
"http://new.myfonts.com/fonts/adobe/frutiger/",
"real world redirect pattern, trailing slashes" );
}
}
 
Roedy Green

There is a URL constructor that takes a context URL and a relative
String. It is supposed to merge them into a new URL. It mostly
works, but fails on this list of real world examples with a
MalformedURLException. I am trying to analyse the location field in an
http redirect and apply it to the original URL.

Mystery solved. In the original program I was feeding the wrong
variable to new URL( URL x, String y ). The odd thing is the code
worked as well as it did.

The irony is I split a variable into two for clarity and used the
wrong version in one spot.