how can this happen?

jahhaj

Here's the message I get from a PatternSyntaxException

Unknown character category {Digit} near index 8
\p{Digit}{1,2}
        ^

How can this be? {Digit} is a valid character category, it's in the
javadoc, it's even in the source code. (Incidentally the single \ is
how java reports the error, in the source I have "\\p{Digit}{1,2}")

I'm using java 1.4.2_06 and running under BEA Weblogic 8.1

john
 
Robert Klemme

jahhaj said:
Here's the message I get from a PatternSyntaxException

Unknown character category {Digit} near index 8
\p{Digit}{1,2}
        ^

How can this be? {Digit} is a valid character category, it's in the
javadoc, it's even in the source code. (Incidentally the single \ is
how java reports the error, in the source I have "\\p{Digit}{1,2}")

I'm using java 1.4.2_06 and running under BEA Weblogic 8.1

Works for me. Also 1.4.2_06, OS is Windows 2k Server, no app server.

robert
 
jahhaj

Robert said:
Works for me. Also 1.4.2_06, OS is Windows 2k Server, no app server.

robert

Works for me as well when I run as a standalone Java app, baffling.

john
 
Roedy Green

Unknown character category {Digit} near index 8
\p{Digit}{1,2}

you did not post your code so I wrote this SSCCE

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * snippet ™ to demonstrate a problem with regex
 */
public class Regex4
{
    private static final Pattern p =
        Pattern.compile("(\\p{Digit}){1,2}");

    /**
     * test harness
     *
     * @param args not used
     */
    public static void main ( String[] args )
    {
        // format 1
        Matcher m = p.matcher("89");

        m.matches();
        int count = m.groupCount() + 1;

        // display groups found
        for ( int i=0; i<count; i++ )
        {
            System.out.println(m.group(i));
        }
    }
}

When I ran it on JDK 1.5.0_05
it gave the following results:
89
9
 
jahhaj

Roedy said:
you did not post your code so I wrote this SSCCE

My real code is a few lines inside a large J2EE application. I know
that if I extract the code and run it in a different environment then
it will work fine. My interest is in suggestions for what could
possibly be going wrong for the JVM not to recognise a perfectly
standard character category.

If you look at the source for Pattern then the character categories are
looked up in a simple map, in a single place in the code. How could
this go wrong? That's my question.
 
Robert Klemme

jahhaj said:
My real code is a few lines inside a large J2EE application. I know
that if I extract the code and run it in a different environment then
it will work fine. My interest is in suggestions for what could
possibly be going wrong for the JVM not to recognise a perfectly
standard character category.

If you look at the source for Pattern then the character categories
are looked up in a simple map, in a single place in the code. How
could this go wrong? That's my question.

Maybe some weird threading or class loading issue... Just a wild guess.

robert
 
Chris Uppal

jahhaj said:
My real code is a few lines inside a large J2EE application. I know
that if I extract the code and run it in a different environment then
it will work fine. My interest is in suggestions for what could
possibly be going wrong for the JVM not to recognise a perfectly
standard character category.

If you look at the source for Pattern then the character categories are
looked up in a simple map, in a single place in the code. How could
this go wrong? That's my question.

The only thing I can think of is that your code is somehow picking up a
different implementation of Pattern when it's running in your J2EE environment.
Might be worth scanning all the directories, JARs, etc, to see if there are any
candidates for confusion.

-- chris
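
One way to check for a stray Pattern implementation is to ask the JVM where it actually loaded the class from. The following is a minimal sketch; the class name ClassOrigin and the method describe are made up for illustration and are not part of any library. To be meaningful it would need to run inside the app server (for example from a servlet or a startup class), not standalone.

```java
import java.net.URL;
import java.util.regex.Pattern;

// Hypothetical helper (an assumption for illustration): reports where a class came from.
class ClassOrigin {

    // Works for top-level classes; nested classes would need their binary name instead.
    static String describe(Class<?> c) {
        // Resolve the .class file as a resource relative to the class's own package.
        URL url = c.getResource(c.getSimpleName() + ".class");
        // A null loader means the bootstrap loader, i.e. the JDK's own copy of the class.
        ClassLoader loader = c.getClassLoader();
        return c.getName()
            + " loaded from " + url
            + " by " + (loader == null ? "bootstrap loader" : loader.toString());
    }

    public static void main(String[] args) {
        System.out.println(describe(Pattern.class));
    }
}
```

If the reported location is anything other than the JDK's own runtime classes, or the loader is not the bootstrap loader, that would support the "different implementation" theory.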
 
jahhaj

Maybe some weird threading or class loading issue... Just a wild guess.

robert

Hmm, I'm no java expert but if you look at the code in Pattern you see
this


private Node retrieveCategoryNode(String name) {
    if (categories == null) {
        int cns = categoryNodes.length;
        categories = new HashMap((int)(cns/.75) + 1);
        for (int x=0; x<cns; x++)
            categories.put(categoryNames[x], categoryNodes[x]);
    }
    Node n = (Node)categories.get(name);
    if (n != null)
        return n;

    return familyError(name, "Unknown character category {");
}

categories is a HashMap of the known categories. It's a static member.
The thing that strikes me is that the creation of the map is not
synchronised, so is it possible that one thread could be in the process
of populating the categories when another thread comes along and uses
the part populated map?

As I say, I'm no expert in java. Could someone with more expertise
confirm if this is plausible?
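
The interleaving described here can be replayed deterministically in a single thread by splitting the lazy initialisation into its two steps. This is only a sketch of the pattern, not the real java.util.regex.Pattern code: LazyInitRace and all of its members are invented names, and the three-entry category list is a stand-in.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for the lazy-init pattern; all names here are hypothetical.
class LazyInitRace {
    static final String[] NAMES = {"Lower", "Upper", "Digit"};

    // Shared and unsynchronised, like the static 'categories' field in the quoted code.
    static Map<String, Integer> categories;

    // Step 1 of the lazy init: the still-empty map becomes visible to everyone.
    static void publishEmptyMap() {
        categories = new HashMap<String, Integer>();
    }

    // Step 2 of the lazy init: the map is filled in afterwards.
    static void populate() {
        for (int i = 0; i < NAMES.length; i++) {
            categories.put(NAMES[i], i);
        }
    }

    // What a second caller does: it sees a non-null map, skips the init, and looks up.
    static boolean isKnown(String name) {
        return categories != null && categories.containsKey(name);
    }

    public static void main(String[] args) {
        publishEmptyMap();   // "thread A" gets this far and is pre-empted
        System.out.println("lookup during init: " + isKnown("Digit")); // false
        populate();          // "thread A" resumes and finishes
        System.out.println("lookup after init:  " + isKnown("Digit")); // true
    }
}
```

A lookup that lands between the two steps fails for a perfectly valid name, which is the same symptom as the "Unknown character category {Digit}" error. With real threads the window is small, so the failure would be intermittent.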
 
Chris Uppal

jahhaj said:
Hmm, I'm no java expert but if you look at the code in Pattern you see
this


private Node retrieveCategoryNode(String name) {
    if (categories == null) {
        int cns = categoryNodes.length;
        categories = new HashMap((int)(cns/.75) + 1);
        for (int x=0; x<cns; x++)
            categories.put(categoryNames[x], categoryNodes[x]);
    }
    Node n = (Node)categories.get(name);
    if (n != null)
        return n;

    return familyError(name, "Unknown character category {");
}

categories is a HashMap of the known categories. It's a static member.

Ugh! Unless there's something subtle that I've missed, that code is completely
broken. It isn't even /nearly/ right (it could at least wait until the new
HashMap was populated before assigning it to the 'categories' variable -- which
would still be technically incorrect).

That code has been completely replaced in 1.5.0 by something that /is/ correct
(I think).

Can you force the Pattern initialisation to happen early (before any of your
real threads are running) by compiling a throwaway Regex during some sort of
system initialisation phase?

-- chris
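
For comparison, one well-known way to make this kind of lazy lookup table thread-safe is the initialisation-on-demand holder idiom, which leans on the JVM's guarantee that a class's static initialiser runs exactly once and completes before any other thread can see its fields. The sketch below is an assumption about how such code could be written, not a description of what 1.5.0 actually does; SafeCategories, Holder and the three-entry name list are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical thread-safe variant of the category table; names are illustrative only.
class SafeCategories {
    private static final String[] NAMES = {"Lower", "Upper", "Digit"};

    // Holder idiom: MAP is built on first use of Holder, under the JVM's class-init lock.
    private static final class Holder {
        static final Map<String, Integer> MAP = build();
    }

    // The map is fully populated and wrapped read-only before it is ever published.
    private static Map<String, Integer> build() {
        Map<String, Integer> m = new HashMap<String, Integer>();
        for (int i = 0; i < NAMES.length; i++) {
            m.put(NAMES[i], i);
        }
        return Collections.unmodifiableMap(m);
    }

    static boolean isKnown(String name) {
        return Holder.MAP.containsKey(name);
    }

    public static void main(String[] args) throws InterruptedException {
        final AtomicInteger failures = new AtomicInteger();
        Thread[] threads = new Thread[8];
        // Several threads race to perform the first lookup.
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                if (!isKnown("Digit")) {
                    failures.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("failed lookups: " + failures.get()); // 0
    }
}
```

No thread can observe a half-filled map here, because the map only becomes reachable after build() has returned.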
 
Thomas G. Marshall

jahhaj coughed up:
My real code is a few lines inside a large J2EE application. I know
that if I extract the code and run it in a different environment then
it will work fine.

Two ideas pulled out of someplace fairly dark:

1. Don't run it in a different environment. Extract it and keep it as much
as possible in the /same/ environment.

2. Don't "extract" it at all. Instead /pare down/ the problem code as much
as you can, possibly by putting in the testing code around the issue, and
keep testing until you remove something and see the problem go away.

This is a technique that works very well to expose many things. Even if
your pared-down version ends up looking just like the extracted version you
already attempted, there might be a smidgeon of a detail missing that will
illuminate the problem.

I hope this applies to your issue. YMM(ofcourse)V.
 
Roedy Green

Did BEA reimplement Regex for speed and simply fail to test it
adequately?
If you can get my code into BEA and get it to fail, you can submit it
as a bug report.
 
Roedy Green

As I say, I'm no expert in java. Could someone with more expertise
confirm if this is plausible?

Try some code that "warms up" the Pattern class. You might even sleep.

Pattern dummy = Pattern.compile("a");

Pattern p = Pattern.compile("(\\p{Digit}){1,2}");
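
Judging by the retrieveCategoryNode code quoted earlier in the thread, the category map is only built on the first \p{...} lookup, so a warm-up pattern presumably needs to contain a character category to have any effect; compiling "a" alone would likely never touch the map. A minimal sketch under that assumption follows. RegexWarmUp and warmUp are hypothetical names, and the call would have to be wired into whatever single-threaded startup hook the application has.

```java
import java.util.regex.Pattern;

// Hypothetical startup helper; call warmUp() once before any worker threads start.
class RegexWarmUp {

    // Compiles a throwaway pattern that uses a \p{...} category, so that the
    // lazily built category table is populated while only one thread is running.
    static void warmUp() {
        Pattern.compile("\\p{Digit}");
    }

    public static void main(String[] args) {
        warmUp();
        // Later, possibly on other threads, the real pattern is compiled and used.
        Pattern p = Pattern.compile("(\\p{Digit}){1,2}");
        System.out.println(p.matcher("89").matches()); // true
    }
}
```

This works around the race rather than fixing it, so it only helps if nothing else compiles a \p{...} pattern concurrently before warmUp() has finished.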
 
