SAX, Xerces: parse() and IOException caused by wrong URI encoding?

Discussion in 'Java' started by Pascal Lagassé, Feb 26, 2004.

  1. Hi,

    Environment: Java 1.4.1_02
    OS: Windows 2000
    XML parser: Xerces-J 2.6.0

    The following snippet works without problems as long as the filename
    contains only standard ASCII characters (like: file.xml ...).

    The XML processing job is done as it should be:

    ------------------------ SNIPPET --------------------------
    import java.io.IOException;
    import org.apache.xerces.parsers.SAXParser;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    ....

    public class mySaX {

        private mySaxContentHandler m_wch;

        public void parse(String uri) {
            ContentHandler contentHandler = new mySaxContentHandler();

            try {
                XMLReader parser = new SAXParser();
                parser.setContentHandler(contentHandler);
                m_wch = (mySaxContentHandler) contentHandler;
                // The string is handed to the parser as a system identifier (URI).
                parser.parse(uri);
                ...
            } catch (IOException e) {
                // German: "Error reading the URI"
                System.out.println("Fehler beim Lesen des URI: " + uri + " "
                        + e.getMessage());
            } catch (SAXException e) {
                // German: "Error while parsing"
                System.out.println("Fehler beim Parsen: " + e.getMessage());
            }
        }
    }
    ------------------------ END SNIPPET --------------------------

    But when I use a filename like "para_grisé.xml", I get:

    ------------------------- ERROR MESSAGE -------------------
    Fehler beim Lesen des URI: para_grisÚ.xml unknown protocol: e
    ------------------------- END ERROR MESSAGE ---------------

    I have never had a problem with such a filename when using a File
    object or when displaying the name in a Swing component.

    Has anybody encountered the same problem before? How should I
    re-encode the URI for parsing?

    Thank you very much in advance,

    /Pascal Lagassé

    Kösel GmbH & Co. KG - Über 400 Jahre Bücher mit System
    Wartenseestraße 11 87435 Kempten
    http://www.koeselbuch.de mailto:p
     
    Pascal Lagassé, Feb 26, 2004
    #1

  2. Daniel (Guest)

    Re: SAX, Xerces: parse() and IOException caused by wrong URI encoding?

    It is simple.
    Use only a-z, A-Z, 0-9 and ".". No special characters; all others
    are not allowed.

     
    Daniel, Feb 28, 2004
    #2

  3. > It is simple.
    > Use only a-z, A-Z, 0-9 and ".". No special characters; all others
    > are not allowed.
    >


    Thanks for your answer, but unfortunately I have no control over
    the filenames. Nevertheless, I have found the answer at:

    www.w3.org/International/O-URL-code.html

    where there is a Java class (URLUTF8Encoder) that percent-encodes a
    URL as UTF-8.

    Input:
    para_grisé.xml

    After UTF-8 encoding:
    para_gris%c3%a9.xml
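
    To make the transformation above concrete, here is a minimal sketch of
    UTF-8 percent-encoding. It is not the W3C URLUTF8Encoder class itself:
    the class name Utf8PercentEncoder and its set of "safe" characters are
    assumptions chosen for illustration, and it assumes a current JDK (7 or
    later) rather than Java 1.4.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; NOT the W3C URLUTF8Encoder class.
class Utf8PercentEncoder {

    // Characters copied through unchanged (an assumed, conservative set).
    private static final String SAFE =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";

    // Encode the string as UTF-8 and write every byte outside SAFE as %xx.
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xff;                       // treat the byte as unsigned
            if (SAFE.indexOf(v) >= 0) {
                out.append((char) v);
            } else {
                out.append('%');
                out.append(Character.forDigit(v >> 4, 16));   // high nibble
                out.append(Character.forDigit(v & 0xf, 16));  // low nibble
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00e9 is "é"; written as an escape so the source file encoding does not matter.
        System.out.println(encode("para_gris\u00e9.xml"));
    }
}
```

    The "é" becomes the two UTF-8 bytes c3 a9, each written as a %xx escape,
    while plain ASCII names such as file.xml pass through unchanged.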

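    Snippet:
    // ---------------------------------------------------------
    // Percent-encode only the file name; the directory part is left unchanged.
    File uriFile = new File(uri);
    String parent = uriFile.getParent();   // null if uri has no directory part
    String encodedName = URLUTF8Encoder.encode(uriFile.getName());
    String encodedUri = (parent == null) ? encodedName
            : parent + File.separator + encodedName;
    parser.parse(encodedUri);
    // ---------------------------------------------------------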
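
    As a possible alternative that needs no extra class, the standard
    library can build the system identifier: File.toURI() produces a
    file: URI and URI.toASCIIString() percent-encodes non-ASCII characters
    as UTF-8. This is only a sketch; the class name FileUriDemo and the
    method toSystemId are made up for illustration, and I have not
    checked it against Xerces-J 2.6.0 specifically.

```java
import java.io.File;

// Hypothetical demo class; names are illustrative only.
class FileUriDemo {

    // Convert a local path into an absolute file: URI string in which
    // non-ASCII characters are percent-encoded as UTF-8.
    static String toSystemId(String path) {
        return new File(path).toURI().toASCIIString();
    }

    public static void main(String[] args) {
        // \u00e9 is "é". The printed prefix depends on the current directory.
        System.out.println(toSystemId("para_gris\u00e9.xml"));
    }
}
```

    The resulting string could then be passed to parser.parse(...) in
    place of the raw path. Because it carries an explicit file: scheme,
    a Windows drive letter should no longer be at risk of being read as
    a protocol.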

    Best regards to you all, Java Developers!

    /Pascal Lagassé

     
    Pascal Lagassé, Mar 1, 2004
    #3
