XML, JDom and regular expressions ...

Discussion in 'Java' started by Pimousse, Jul 9, 2004.

  1. Pimousse

    Pimousse Guest

    Hi everybody,

    I'm helping a friend with a parsing problem using JDom. As we're Latin
    people ;), our XML files contain characters like "é" or "à".
    So far, no problem.

    But we have to modify XML files whose declared encodings don't seem to
    accept these characters, such as UTF-8 (but not only this one). In fact,
    our company reused XML files previously developed by another company
    ("not Latin"). Inserting data was not a problem, but modifying it today
    isn't so easy. With this configuration, JDom throws an exception, even
    if we add these lines:

    Format format=Format.getPrettyFormat();
    format.setEncoding("iso-8859-1");

    So we're not able to build a DOM document, and therefore we can't modify
    our documents!


    Then we decided to modify the line :
    <?xml version="1.0" encoding="utf-8"?> (for example)
    by something like :
    <?xml version="1.0" encoding="iso-8859-1"?>

    But as we can't know the encoding before reading the file, we decided
    to use a regular expression.

    As I'm more skilled in PHP than in Java, I developed this pattern in
    PHP (tested and working):
    (<\?xml[^>]+encoding=\")([^>]+)(\"?[^>]+\?>)
    whose match should be replaced by:
    \\1iso-8859-1\\3

    But I can't manage to translate it to Java.
    Using this syntax:

    Pattern p = Pattern.compile(pattern);
    Matcher m = p.matcher(string);
    string = m.replaceAll(replace);

    where
    pattern = "(<\\?xml[^>]+encoding=\")([^>]+)(\"?[^>]+\\?>)";
    replace = "\\1iso-8859-1\\3";

    does not work ...

    Can someone help me translate my pattern from PHP syntax to Java
    syntax?
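
    Editorial sketch for the snippet quoted above: in java.util.regex,
    replacement strings refer to groups as $1, $2, ... and a backslash only
    escapes the character that follows it, so "\\1" produces a literal 1
    instead of group 1. The class and method names below are hypothetical,
    and the pattern is a slightly tightened variant of the PHP one (it
    assumes a double-quoted encoding value), not the poster's final code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites the encoding named in an XML declaration.
final class EncodingDeclarationRewriter {

    // Group 1: everything up to and including encoding="
    // Group 2: the current encoding name
    // Group 3: the closing quote through the end of the declaration
    private static final Pattern XML_DECL_ENCODING =
            Pattern.compile("(<\\?xml[^>]+encoding=\")([^\">]+)(\"[^>]*\\?>)");

    static String rewriteEncoding(String xml, String newEncoding) {
        Matcher m = XML_DECL_ENCODING.matcher(xml);
        // Java replacement syntax uses $n for groups, not \n.
        // quoteReplacement guards against '$' or '\' in the new name.
        return m.replaceFirst("$1" + Matcher.quoteReplacement(newEncoding) + "$3");
    }

    public static void main(String[] args) {
        String in = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<root/>";
        System.out.println(rewriteEncoding(in, "iso-8859-1"));
    }
}
```

    Note that this only changes the label. It helps when the bytes in the
    file really are ISO-8859-1 and the declaration is wrong. It does not
    convert the content from one encoding to another.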

    Thanks.

    PS: I have already read
    http://java.sun.com/docs/books/tutorial/extra/regex/index.html .... ;)
     
    Pimousse, Jul 9, 2004
    #1

  2. Pimousse

    Mike Lischke Guest

    Pimousse wrote

    >Then we decided to modify the line :
    ><?xml version="1.0" encoding="utf-8"?> (for example)
    >by something like :
    ><?xml version="1.0" encoding="iso-8859-1"?>


    Why on earth would you want to switch from Unicode to ANSI when dealing with several languages? This is exactly the wrong direction unless you are forced to use ANSI (Latin-1 or whatever). I recommend using UTF-8 instead. Storing a file as UTF-8 with JDOM is a bit tricky, but it is possible and works like a charm once you know how.
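
    Editorial sketch supporting the point above: UTF-8 can represent "é"
    and "à", so a parse failure of this kind usually suggests that the bytes
    were written in one encoding (for example ISO-8859-1) while the
    declaration names another. This is an assumption about the files in
    question. The class and method names are hypothetical, and only standard
    java.nio classes are used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same text encoded two ways, checked as UTF-8.
final class EncodingMismatchDemo {

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "\u00e9\u00e0"; // the characters é and à
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1); // 1 byte per char
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);        // 2 bytes per char

        System.out.println("UTF-8 bytes valid as UTF-8:   " + isValidUtf8(utf8));
        System.out.println("Latin-1 bytes valid as UTF-8: " + isValidUtf8(latin1));
    }
}
```

    A check like this can tell you whether a file labelled utf-8 really
    holds UTF-8 bytes before you decide to relabel or re-encode it.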

    Mike
    --
    www.soft-gems.net
     
    Mike Lischke, Jul 9, 2004
    #2