kevin cline
I have a complex multi-line string to parse, so I created a complex
regular expression by combining a bunch of simpler regular
expressions, like this:
private static final String WS = " +";
private static final String EOL = " *\n";
private static final String REST_OF_LINE = ".*\n";
private static final String REST_OF_BLOCK = REST_OF_LINE + "(?:" +
WS + REST_OF_LINE + ")*";
private static final String AMOUNT = "\\d+\\.\\d+";
private static final String CURRENCY = "[A-Z]{3}" + AMOUNT;
private static final String FARE = "[A-Z]{3} +\\d*" + EOL
+ WS + CURRENCY + " +" + CURRENCY + EOL
+ WS + AMOUNT + REST_OF_LINE
+ WS + AMOUNT + "[A-Z]*" + EOL
+ " {7}" + REST_OF_LINE;
...
private static final java.util.regex.Pattern PAT =
Pattern.compile( ... );
This works great to recognize valid input, but extracting the parsed
data is not so easy. I wanted to capture it all with capturing
groups, but I ran into two problems. First, when a group sits inside
a repetition, the Matcher only stores the last match for that group.
Second, the groups have to be accessed by index, which would require
keeping track of their positions across the whole combined expression.
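To make the first problem concrete, here is a minimal sketch. The class name GroupCaptureDemo, the REPEATED pattern, and the sample input are made up for illustration and are not part of my real expression; only AMOUNT is the same as above. It shows that a group inside a repetition reports just its final iteration, and that one possible workaround (an assumption on my part, not necessarily the best approach) is to re-scan the matched text with find():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCaptureDemo {
    // Same sub-expression as AMOUNT in the post.
    private static final String AMOUNT = "\\d+\\.\\d+";

    // Hypothetical pattern: one capturing group repeated several times.
    private static final Pattern REPEATED =
            Pattern.compile("(?:(" + AMOUNT + ") *)+");

    // Returns what group(1) holds after the whole input has matched,
    // or null if the input does not match at all.
    public static String lastCapture(String input) {
        Matcher m = REPEATED.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Possible workaround: scan the text again with the sub-expression
    // alone and collect every occurrence via find().
    public static List<String> allCaptures(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(AMOUNT).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "1.50 2.75 10.00";
        System.out.println(lastCapture(input));  // only the final amount
        System.out.println(allCaptures(input));  // every amount
    }
}
```

The second pass works for a flat list like this, but it loses the context of which line or block each amount came from, which is exactly what I need for the FARE expression.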
Is there a more powerful regular expression class out there somewhere,
or a more powerful parsing technology that would help with this
problem? It would be a trivial matter in either Perl (by attaching
code to the sub-expressions) or in C++ (using the SPIRIT parsing
library), but in Java I'm pretty clueless.
Thanks for the help.