XML, JDom and regular expressions ...

Pimousse

Hi everybody,

I'm helping a friend with a parsing problem using JDom. As we're Latin
people ;), our XML files contain characters like "é" or "à".
So far, no problem.

But we have to modify XML files that use alphabets that don't support
these characters, such as UTF-8 (but not only this one). In fact, our
company reuses XML files previously developed by another company ("not
Latin"). Inserting data was not a problem, but today modifying them isn't
so easy. With this configuration, JDom throws an exception, even if we add
these lines :

Format format=Format.getPrettyFormat();
format.setEncoding("iso-8859-1");

So we're not able to generate a DOM document ! And so we can't modify
our documents !


Then we decided to modify the line :
<?xml version="1.0" encoding="utf-8"?> (for example)
by something like :
<?xml version="1.0" encoding="iso-8859-1"?>

But as we can't know the alphabet type before reading the file, we
decided to use a regular expression.

As I'm more skilled in PHP than in Java, I developed this pattern in
PHP (tested and working) :
(<\?xml[^>]+encoding=\")([^>]+)(\"?[^>]+\?>)
that should be replaced by :
\\1iso-8859-1\\3

But I haven't succeeded in translating it to Java.
Using this syntax :

Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(string);
string = m.replaceAll(replace);

where
pattern = "(<\\?xml[^>]+encoding=\")([^>]+)(\"?[^>]+\\?>)";
replace = "\\1iso-8859-1\\3";

does not work ...

Can someone help me to translate my pattern from a PHP syntax to a Java
syntax ?

Thanks.

Ps : I already read
http://java.sun.com/docs/books/tutorial/extra/regex/index.html .... ;)
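
For reference, the PHP and Java replacement syntaxes differ: in Java's Matcher.replaceAll / replaceFirst, captured groups are written $1, $2, ... and a backslash only escapes the next character, so "\\1iso-8859-1\\3" inserts the literal text "1iso-8859-13" instead of the groups. Below is a minimal sketch of a Java version. The class name EncodingDeclFix, the method forceEncoding and the slightly tightened pattern (two groups, either quote style) are illustrative assumptions, not code from the post. Note that this only rewrites the declaration text; it does not convert the bytes of the file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites the encoding named in an XML declaration.
public class EncodingDeclFix {

    // Group 1: everything up to and including the opening quote of the encoding value.
    // Group 2: the closing quote and the rest of the declaration up to "?>".
    private static final Pattern XML_DECL = Pattern.compile(
        "(<\\?xml[^>]*\\bencoding\\s*=\\s*[\"'])[^\"']*([\"'][^>]*\\?>)");

    static String forceEncoding(String xml, String encoding) {
        Matcher m = XML_DECL.matcher(xml);
        // Java uses $1 and $2 for groups; quoteReplacement protects any
        // '$' or '\' that might appear in the encoding name itself.
        return m.replaceFirst("$1" + Matcher.quoteReplacement(encoding) + "$2");
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<root/>";
        System.out.println(forceEncoding(xml, "iso-8859-1"));
    }
}
```

With the original three-group pattern, the equivalent change would be to set replace = "$1iso-8859-1$3".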
 
Mike Lischke

Pimousse wrote:
> Then we decided to modify the line :
> <?xml version="1.0" encoding="utf-8"?> (for example)
> by something like :
> <?xml version="1.0" encoding="iso-8859-1"?>

Why on earth would you want to switch from Unicode to ANSI when dealing with several languages? This is exactly the wrong direction unless you are forced to use ANSI (Latin-1 or whatever). I recommend using UTF-8 instead. It is a bit tricky to store a file with JDOM in UTF-8, but it is nonetheless possible and works like a charm once you know how.

Mike
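
To illustrate the point about UTF-8, here is a small sketch using only the Java standard library (no JDOM). It shows that "é" and "à" survive a UTF-8 round trip, and that trouble appears when bytes written in one encoding are decoded as another. That such a mismatch is what causes the exception in the original question is an assumption, since the thread does not show the stack trace; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: encoding round trips with the standard library only.
public class Utf8RoundTrip {

    // Encode to UTF-8 bytes and decode them again as UTF-8.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode as ISO-8859-1 but decode as UTF-8: the mismatch case.
    static String misread(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String accented = "\u00e9 \u00e0"; // "é à", escaped to avoid source-encoding issues
        System.out.println(accented.equals(roundTrip(accented))); // UTF-8 keeps the accents
        System.out.println(accented.equals(misread(accented)));   // mismatched decoding does not
    }
}
```

If this is indeed the situation, the fix is to read and write the file with the encoding its bytes actually use, rather than editing the declaration alone.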