How can this happen?

Discussion in 'Java' started by jahhaj, Nov 18, 2005.

  1. jahhaj

    jahhaj Guest

    Here's the message I get from a PatternSyntaxException

    Unknown character category {Digit} near index 8
    \p{Digit}{1,2}
    ^

    How can this be? {Digit} is a valid character category, it's in the
    javadoc, it's even in the source code. (Incidentally the single \ is
    how java reports the error, in the source I have "\\p{Digit}{1,2}")
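To make the escaping point concrete, here is a minimal sketch. The class and method names are hypothetical and written only for illustration. The Java literal "\\p{Digit}{1,2}" holds the regex \p{Digit}{1,2} with one backslash. PatternSyntaxException.getPattern() returns that regex rather than the source literal, which is why the report shows a single \.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class EscapeDemo {
    // The Java literal doubles the backslash; the regex engine receives
    // the 14-character string \p{Digit}{1,2} with a single backslash.
    static final String REGEX = "\\p{Digit}{1,2}";

    static boolean isOneOrTwoDigits(String s) {
        return Pattern.compile(REGEX).matcher(s).matches();
    }

    // Returns null if the regex compiles, otherwise the pieces that
    // PatternSyntaxException.getMessage() combines into its report.
    static String describeError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " near index " + e.getIndex()
                    + " in pattern " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(REGEX);                  // prints \p{Digit}{1,2}
        System.out.println(isOneOrTwoDigits("42")); // prints true
        System.out.println(describeError(REGEX));   // prints null
        System.out.println(describeError("\\p{NoSuchCategory}"));
    }
}
```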

    I'm using java 1.4.2_06 and running under BEA Weblogic 8.1

    john
     
    jahhaj, Nov 18, 2005
    #1

  2. jahhaj wrote:
    > Here's the message I get from a PatternSyntaxException
    >
    > Unknown character category {Digit} near index 8
    > \p{Digit}{1,2}
    > ^
    >
    > How can this be? {Digit} is a valid character category, it's in the
    > javadoc, it's even in the source code. (Incidentally the single \ is
    > how java reports the error, in the source I have "\\p{Digit}{1,2}")
    >
    > I'm using java 1.4.2_06 and running under BEA Weblogic 8.1


    Works for me. Also 1.4.2_06, OS is Windows 2k Server, no app server.

    robert
     
    Robert Klemme, Nov 18, 2005
    #2

  3. jahhaj

    jahhaj Guest

    Robert Klemme wrote:
    > jahhaj wrote:
    > > Here's the message I get from a PatternSyntaxException
    > >
    > > Unknown character category {Digit} near index 8
    > > \p{Digit}{1,2}
    > > ^
    > >
    > > How can this be? {Digit} is a valid character category, it's in the
    > > javadoc, it's even in the source code. (Incidentally the single \ is
    > > how java reports the error, in the source I have "\\p{Digit}{1,2}")
    > >
    > > I'm using java 1.4.2_06 and running under BEA Weblogic 8.1

    >
    > Works for me. Also 1.4.2_06, OS is Windows 2k Server, no app server.
    >
    > robert


    Works for me as well when I run as a standalone Java app, baffling.

    john
     
    jahhaj, Nov 18, 2005
    #3
  4. jahhaj

    Roedy Green Guest

    On 18 Nov 2005 04:53:40 -0800, "jahhaj" <>
    wrote, quoted or indirectly quoted someone who said :

    >Unknown character category {Digit} near index 8
    >\p{Digit}{1,2}


    You did not post your code, so I wrote this SSCCE:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * snippet &trade; to demonstrate a problem with regex
     */
    public class Regex4
    {
        private static final Pattern p =
            Pattern.compile("(\\p{Digit}){1,2}");

        /**
         * test harness
         *
         * @param args not used
         */
        public static void main ( String[] args )
        {

            // format 1
            Matcher m = p.matcher("89");

            m.matches();
            int count = m.groupCount() + 1;

            // display groups found
            for ( int i=0; i<count; i++ )
            {
                System.out.println(m.group(i));
            }

        }
    }

    When I ran it on JDK 1.5.0_05
    it gave the following results:
    89
    9
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Nov 18, 2005
    #4
  5. jahhaj

    jahhaj Guest

    Roedy Green wrote:
    > On 18 Nov 2005 04:53:40 -0800, "jahhaj" <>
    > wrote, quoted or indirectly quoted someone who said :
    >
    > >Unknown character category {Digit} near index 8
    > >\p{Digit}{1,2}

    >
    > you did not post your code so I wrote this SSCCE
    >


    My real code is a few lines inside a large J2EE application. I know
    that if I extract the code and run it in a different environment then
    it will work fine. My interest is in suggestions for what could
    possibly be going wrong for the JVM not to recognise a perfectly
    standard character category.

    If you look at the source for Pattern then the character categories are
    looked up in a simple map, in a single place in the code. How could
    this go wrong? That's my question.
     
    jahhaj, Nov 18, 2005
    #5
  6. jahhaj wrote:
    > Roedy Green wrote:
    >> On 18 Nov 2005 04:53:40 -0800, "jahhaj" <>
    >> wrote, quoted or indirectly quoted someone who said :
    >>
    >>> Unknown character category {Digit} near index 8
    >>> \p{Digit}{1,2}

    >>
    >> you did not post your code so I wrote this SSCCE
    >>

    >
    > My real code is a few lines inside a large J2EE application. I know
    > that if I extract the code and run it in a different environment then
    > it will work fine. My interest is in suggestions for what could
    > possibly be going wrong for the JVM not to recognise a perfectly
    > standard character category.
    >
    > If you look at the source for Pattern then the character categories
    > are looked up in a simple map, in a single place in the code. How
    > could this go wrong? That's my question.


    Maybe some weird threading or class loading issue... Just a wild guess.

    robert
     
    Robert Klemme, Nov 18, 2005
    #6
  7. jahhaj

    Chris Uppal Guest

    jahhaj wrote:

    > My real code is a few lines inside a large J2EE application. I know
    > that if I extract the code and run it in a different environment then
    > it will work fine. My interest is in suggestions for what could
    > possibly be going wrong for the JVM not to recognise a perfectly
    > standard character category.
    >
    > If you look at the source for Pattern then the character categories are
    > looked up in a simple map, in a single place in the code. How could
    > this go wrong? That's my question.


    The only thing I can think of is that your code is somehow picking up a
    different implementation of Pattern when it's running in your J2EE environment.
    Might be worth scanning all the directories, JARs, etc., to see if there are any
    candidates for confusion.
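A quick way to test that theory is to print, from inside the server, where Pattern was actually loaded from. This is a sketch with a hypothetical class name; it uses StringBuilder (Java 5+), so on 1.4.2 it would need StringBuffer instead. A null class loader and no CodeSource point to the JRE's own copy. Ordinary class loaders refuse to define classes in java.* packages, so a substitute would most likely come from a prepended boot class path or from the server running on a different JRE than expected; the java.home line helps rule that out.

```java
import java.security.CodeSource;
import java.util.regex.Pattern;

public class WhichPattern {
    static String describe() {
        StringBuilder sb = new StringBuilder();
        // A null class loader means the bootstrap loader, i.e. the JRE's own copy.
        sb.append("class loader: ").append(Pattern.class.getClassLoader()).append('\n');
        // Bootstrap classes normally have no CodeSource; a copy loaded from a JAR
        // would usually report that JAR's location here.
        CodeSource src = Pattern.class.getProtectionDomain().getCodeSource();
        sb.append("code source: ")
          .append(src == null ? "none (bootstrap)" : String.valueOf(src.getLocation()))
          .append('\n');
        // The resource URL shows which file the class bytes would be read from.
        sb.append("resource: ")
          .append(ClassLoader.getSystemResource("java/util/regex/Pattern.class"))
          .append('\n');
        // Confirms which JRE the server process is really running on.
        sb.append("java.version: ").append(System.getProperty("java.version")).append('\n');
        sb.append("java.home: ").append(System.getProperty("java.home"));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```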

    -- chris
     
    Chris Uppal, Nov 18, 2005
    #7
  8. jahhaj

    jahhaj Guest

    > >
    > > If you look at the source for Pattern then the character categories
    > > are looked up in a simple map, in a single place in the code. How
    > > could this go wrong? That's my question.

    >
    > Maybe some weird threading or class loading issue... Just a wild guess.
    >
    > robert


    Hmm, I'm no java expert, but if you look at the code in Pattern you see
    this:


    private Node retrieveCategoryNode(String name) {
        if (categories == null) {
            int cns = categoryNodes.length;
            categories = new HashMap((int)(cns/.75) + 1);
            for (int x=0; x<cns; x++)
                categories.put(categoryNames[x], categoryNodes[x]);
        }
        Node n = (Node)categories.get(name);
        if (n != null)
            return n;

        return familyError(name, "Unknown character category {");
    }

    categories is a HashMap of the known categories. It's a static member.
    The thing that strikes me is that the creation of the map is not
    synchronised, so is it possible that one thread could be in the process
    of populating the categories when another thread comes along and uses
    the partly populated map?
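The race described here can be avoided by not publishing the map until it is complete, and by publishing it in a way that the memory model guarantees other threads will see. Below is a minimal sketch using the initialization-on-demand holder idiom. CategoryTable, its arrays and its string "nodes" are hypothetical stand-ins, not Pattern's real internals, and the code assumes Java 8 or later for the lambda. A volatile field assigned only after a local map has been fully built would be another correct option.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class CategoryTable {
    // Hypothetical stand-ins for Pattern's categoryNames / categoryNodes arrays.
    private static final String[] NAMES = {"Digit", "Alpha", "Upper", "Lower"};
    private static final String[] NODES = {"DIGIT-node", "ALPHA-node", "UPPER-node", "LOWER-node"};

    // Initialization-on-demand holder: the JVM runs Holder's static initializer
    // exactly once, under the class-initialization lock, the first time MAP is
    // read, and every thread that reads MAP afterwards sees the finished map.
    private static final class Holder {
        static final Map<String, String> MAP = build();
    }

    // Fills a local map completely before anything else can see it.
    private static Map<String, String> build() {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            m.put(NAMES[i], NODES[i]);
        }
        return Collections.unmodifiableMap(m);
    }

    static String lookup(String name) {
        String node = Holder.MAP.get(name);
        if (node == null) {
            throw new IllegalArgumentException("Unknown character category {" + name + "}");
        }
        return node;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger misses = new AtomicInteger();
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            // All threads race to trigger the first read of Holder.MAP.
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    if (Holder.MAP.get("Digit") == null) {
                        misses.incrementAndGet();
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        System.out.println("misses: " + misses.get()); // prints misses: 0
    }
}
```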

    As I say, I'm no expert in java. Could someone with more expertise
    confirm if this is plausible?
     
    jahhaj, Nov 18, 2005
    #8
  9. jahhaj

    Chris Uppal Guest

    jahhaj wrote:

    > Hmm, I'm no java expert but if you look at the code in Pattern you see
    > this
    >
    >
    > private Node retrieveCategoryNode(String name) {
    > if (categories == null) {
    > int cns = categoryNodes.length;
    > categories = new HashMap((int)(cns/.75) + 1);
    > for (int x=0; x<cns; x++)
    > categories.put(categoryNames[x], categoryNodes[x]);
    > }
    > Node n = (Node)categories.get(name);
    > if (n != null)
    > return n;
    >
    > return familyError(name, "Unknown character category {");
    > }
    >
    > categories is a HashMap of the known categories. It's a static member.


    Ugh! Unless there's something subtle that I've missed, that code is completely
    broken. It isn't even /nearly/ right (it could at least wait until the new
    HashMap was populated before assigning it to the 'categories' variable -- which
    would still be technically incorrect).

    That code has been completely replaced in 1.5.0 by something that /is/ correct
    (I think).

    Can you force the Pattern initialisation to happen early (before any of your
    real threads are running) by compiling a throwaway regex during some sort of
    system initialisation phase?
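A warm-up along those lines might look like the sketch below. RegexWarmUp is a hypothetical name, and calling it from a servlet's init() or a ServletContextListener is an assumption about how the application is deployed. Going by the 1.4.2 excerpt quoted earlier in the thread, the category map is only filled inside retrieveCategoryNode, which is reached via \p{...}. The throwaway pattern should therefore contain a category escape; a plain pattern such as "a" would load the Pattern class without building the map.

```java
import java.util.regex.Pattern;

public final class RegexWarmUp {
    private RegexWarmUp() {
    }

    // Hypothetical start-up hook: call once, from a single thread, before the
    // application starts handling requests. The \p{...} escapes are deliberate;
    // per the 1.4.2 excerpt, only a category lookup fills the lazy map.
    static void warmUp() {
        Pattern.compile("\\p{Digit}\\p{Alpha}");
    }

    public static void main(String[] args) {
        warmUp();
        // On 1.4.2, later compilations find the category map already built.
        Pattern p = Pattern.compile("\\p{Digit}{1,2}");
        System.out.println(p.matcher("7").matches()); // prints true
    }
}
```

Threads started after warmUp() returns, by the thread that called it, are guaranteed to see the populated map, because Thread.start() establishes a happens-before edge. Pooled server threads that already exist get no such guarantee, so this narrows the window rather than strictly closing it.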

    -- chris
     
    Chris Uppal, Nov 18, 2005
    #9
  10. jahhaj coughed up:
    > Roedy Green wrote:
    >> On 18 Nov 2005 04:53:40 -0800, "jahhaj" <>
    >> wrote, quoted or indirectly quoted someone who said :
    >>
    >>> Unknown character category {Digit} near index 8
    >>> \p{Digit}{1,2}

    >>
    >> you did not post your code so I wrote this SSCCE
    >>

    >
    > My real code is a few lines inside a large J2EE application. I know
    > that if I extract the code and run it in a different environment then
    > it will work fine.


    Two ideas pulled out of someplace fairly dark:

    1. Don't run it in a different environment. Extract it and keep it as much
    as possible in the /same/ environment.

    2. Don't "extract" it at all. Instead /pare down/ the problem code as much
    as you can, possibly by putting in the testing code around the issue, and
    keep testing until you remove something and see the problem go away.

    This is a technique that works very well to expose many things. Even if
    your pared down version ends up looking just like the extracted version you
    already attempted, there might be a smidgeon of a detail missing that will
    illuminate the problem.

    I hope this applies to your issue. YMM(ofcourse)V.


    > My interest is in suggestions for what could
    > possibly be going wrong for the JVM not to recognise a perfectly
    > standard character category.
    >
    > If you look at the source for Pattern then the character categories are
    > looked up in a simple map, in a single place in the code. How could
    > this go wrong? That's my question.




    --
    One doctor to another: "If this is my rectal thermometer, where the hell's my pen???"
     
    Thomas G. Marshall, Nov 18, 2005
    #10
  11. jahhaj

    Roedy Green Guest

    On Fri, 18 Nov 2005 17:09:56 +0100, "Robert Klemme" <>
    wrote, quoted or indirectly quoted someone who said :

    >> If you look at the source for Pattern then the character categories
    >> are looked up in a simple map, in a single place in the code. How
    >> could this go wrong? That's my question.


    Did BEA reimplement regex for speed and simply fail to test
    adequately?
    If you can get my code into BEA and get it to fail, you can submit it
    as a bug report.
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Nov 18, 2005
    #11
  12. jahhaj

    Roedy Green Guest

    On 18 Nov 2005 09:01:58 -0800, "jahhaj" <>
    wrote, quoted or indirectly quoted someone who said :

    >As I say, I'm no expert in java. Could someone with more expertise
    >confirm if this is plausible?


    Try some code that "warms up" the Pattern class first. You might even add a short sleep.

    Pattern dummy = Pattern.compile("a");

    Pattern p = Pattern.compile("(\\p{Digit}){1,2}");

    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Nov 18, 2005
    #12
  13. jahhaj

    Ravi Guest

    Hi,
    you can use "\\d{1,2}" instead.
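Ravi's suggestion can be checked with a short sketch that compares the two forms. DigitsDemo is a hypothetical name. Both \d and \p{Digit} are documented as ASCII [0-9] by default, so they should accept the same strings. Whether \d also sidesteps the lazily built category map in 1.4.2 is an assumption worth confirming against that JDK's Pattern source.

```java
import java.util.regex.Pattern;

public class DigitsDemo {
    // \d and \p{Digit} both mean ASCII [0-9] unless UNICODE_CHARACTER_CLASS is set.
    static final Pattern SHORTHAND = Pattern.compile("\\d{1,2}");
    static final Pattern POSIX = Pattern.compile("\\p{Digit}{1,2}");

    public static void main(String[] args) {
        String[] inputs = {"7", "89", "123", "x9", ""};
        for (String s : inputs) {
            boolean a = SHORTHAND.matcher(s).matches();
            boolean b = POSIX.matcher(s).matches();
            // "7" and "89" print true / true; the other three print false / false.
            System.out.println("\"" + s + "\" -> " + a + " / " + b);
        }
    }
}
```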
     
    Ravi, Nov 21, 2005
    #13
  14. jahhaj

    jahhaj Guest

    jahhaj wrote:
    > Here's the message I get from a PatternSyntaxException
    >
    > Unknown character category {Digit} near index 8
    > \p{Digit}{1,2}
    > ^
    >


    Thanks to everyone who replied to my query. Turns out this is a known
    bug

    http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6238699

    Really, really poor coding by Sun.

    john
     
    jahhaj, Nov 21, 2005
    #14
  15. jahhaj

    Roedy Green Guest

    On 21 Nov 2005 01:46:55 -0800, "jahhaj" <>
    wrote, quoted or indirectly quoted someone who said :

    >Thanks to everyone who replied to my query. Turns out this is a known
    >bug
    >
    >http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6238699
    >
    >Really, really poor coding by Sun.


    So the bug is fixed in JDK 1.5 but not in BEA?
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Nov 21, 2005
    #15

Similar Threads
  1. Emilio

    MSFT Why does this happen?

    Emilio, Oct 30, 2003, in forum: ASP .Net
    Replies: 1, Views: 351
    Alvin Bruney, Oct 30, 2003
  2. Ross

    When does Page Load happen?

    Ross, Nov 5, 2003, in forum: ASP .Net
    Replies: 3, Views: 556
    Kevin Spencer, Nov 5, 2003
  3. Yok
    Replies: 1, Views: 371
    Sami Vaaraniemi, Jan 18, 2004
  4. Claudio Grondi
    Replies: 1, Views: 284
    Roger Upole, Jun 7, 2005
  5. Joel Hedlund
    Replies: 9, Views: 253
    Joel Hedlund, Dec 19, 2008