Parsing with complex regular expressions

Discussion in 'Java' started by kevin cline, Apr 24, 2007.

  1. kevin  cline

    kevin cline Guest

    I have a complex multi-line string to parse, so I created a complex
    regular expression by combining a bunch of simpler regular
    expressions, like this:

    private static final String WS = " +";
    private static final String EOL = " *\n";
    private static final String REST_OF_LINE = ".*\n";
    private static final String REST_OF_BLOCK =
        REST_OF_LINE + "(?:" + WS + REST_OF_LINE + ")*";
    private static final String AMOUNT = "\\d+\\.\\d+";
    private static final String CURRENCY = "[A-Z]{3}" + AMOUNT;

    private static final String FARE = "[A-Z]{3} +\\d*" + EOL
        + WS + CURRENCY + " +" + CURRENCY + EOL
        + WS + AMOUNT + REST_OF_LINE
        + WS + AMOUNT + "[A-Z]*" + EOL
        + " {7}" + REST_OF_LINE;

    ...

    private static final java.util.regex.Pattern PAT =
    Pattern.compile( ... );

    This works great for recognizing valid input, but extracting the
    parsed data is not so easy. I wanted to capture it all with capturing
    groups, but I ran into two problems: first, the Matcher stores only
    the last match for each group, and second, the groups have to be
    accessed by index, which would require keeping track of their
    positions across the whole expression.
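
A minimal sketch of the first problem and one common workaround, using a made-up comma-separated input rather than the fare format above. A group inside a repetition keeps only its final iteration, so one option is to validate with the full pattern and then re-scan with the sub-pattern via `Matcher.find()`. The class and method names (`RepeatedGroupDemo`, `lastCapture`, `allCaptures`) are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // One item: digits followed by a comma. Group 1 is the digits.
    private static final String ITEM = "(\\d+),";
    // Whole input: one or more items. Group 1 is overwritten on each repetition.
    private static final Pattern WHOLE = Pattern.compile("(?:" + ITEM + ")+");
    private static final Pattern ONE_ITEM = Pattern.compile(ITEM);

    /** Returns what group 1 holds after a full match: only the final iteration. */
    static String lastCapture(String input) {
        Matcher m = WHOLE.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    /** Validates with the big pattern, then re-scans with the small one to collect every item. */
    static List<String> allCaptures(String input) {
        List<String> items = new ArrayList<>();
        if (!WHOLE.matcher(input).matches()) {
            return items; // invalid input: nothing is extracted
        }
        Matcher m = ONE_ITEM.matcher(input);
        while (m.find()) {
            items.add(m.group(1));
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(lastCapture("1,2,3,"));  // prints 3
        System.out.println(allCaptures("1,2,3,"));  // prints [1, 2, 3]
    }
}
```

The same two-pass idea would apply to a repeated block such as REST_OF_BLOCK: match the block as a whole, then run a smaller pattern over the matched text.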

    Is there a more powerful regular expression class out there somewhere,
    or a more powerful parsing technology that would help with this
    problem? It would be a trivial matter in either Perl (by attaching
    code to the sub-expressions) or in C++ (using the SPIRIT parsing
    library), but in Java I'm pretty clueless.
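
For the index-tracking problem, Java 7 and later (released after this thread) support named capturing groups, written `(?<name>...)` and read back with `Matcher.group(String)`. The sketch below assumes such a JDK and models only the two-currency line of the fare format. The helper `named`, the fragment builder `currency`, the method `field`, and the group names are hypothetical and not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    private static final String AMOUNT = "\\d+\\.\\d+";

    /** Hypothetical helper: wraps a fragment in a named capturing group. */
    static String named(String name, String regex) {
        return "(?<" + name + ">" + regex + ")";
    }

    /**
     * Currency code plus amount. Group names must be unique within a pattern,
     * so each use of the fragment gets its own prefix.
     */
    static String currency(String prefix) {
        return named(prefix + "Code", "[A-Z]{3}") + named(prefix + "Amount", AMOUNT);
    }

    // Leading spaces, two currency amounts separated by spaces, optional trailing spaces.
    private static final Pattern LINE =
        Pattern.compile(" +" + currency("base") + " +" + currency("equiv") + " *");

    /** Returns the text captured by the named group, or null if the line does not match. */
    static String field(String line, String group) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String line = "  USD100.00 EUR75.50";
        System.out.println(field(line, "baseCode"));    // prints USD
        System.out.println(field(line, "equivAmount")); // prints 75.50
    }
}
```

Because each group is looked up by name, adding or removing groups elsewhere in the combined expression does not shift anything the extraction code depends on.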

    Thanks for the help.
     
    kevin cline, Apr 24, 2007
    #1

  2. kevin  cline

    Kai Schwebke Guest

    kevin cline wrote:
    > I have complex multi-line string to parse, so I created a complex
    > regular expression by combining a bunch of simpler regular
    > expressions, like this:

    ....
    > Is there a more powerful regular expression class out there somewhere,
    > or a more powerful parsing technology that would help with this
    > problem?


    You might have a look at JavaCC, a parser generator for Java, similar
    to yacc or bison for C (https://javacc.dev.java.net/).



    Kai
     
    Kai Schwebke, Apr 25, 2007
    #2
