Finding row number for a Java regexp pattern match

ken1

I am using the Java regexp package classes Pattern and Matcher to find
matches in a lot of text files. I want to generate a list of matches
per file, along with the line numbers they were found on.
But I haven't found any easy way of finding out the line number on
which a match was found using the standard regexp classes.
Is there some way to obtain the line number that takes into account
both Unix and Windows newlines?

Kenneth Ljunggren
 
Gordon Beaton

The regexp classes are not an appropriate tool for counting lines.

If you are reading your text files with BufferedReader.readLine(),
then the most suitable way to determine what line you are on is to
simply keep track of it in a counter. Reset the counter to zero when
you open the file, and increment it after reading each line.
BufferedReader.readLine() handles both of the line ending styles you
mention.

I get the impression from your question that you are reading in the
entire file before doing your matches. Unless you are looking for
patterns that span more than one line, I'd suggest that your program
would be simpler if you handled each line separately, i.e. read one
line, do the necessary matching, then read the next line. Your line
counting issue will solve itself.
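
The approach described above can be sketched roughly as follows. This is only a minimal illustration, assuming the pattern never needs to span a line break; the class name LineByLineMatcher, the method findMatches and the "lineNumber: match" output format are made up for the example and are not from the original poster's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: matches a pattern one line at a time and records line numbers.
class LineByLineMatcher {

    /** Returns one "lineNumber: matchedText" entry per match, with 1-based line numbers. */
    static List<String> findMatches(Reader source, Pattern pattern) throws IOException {
        List<String> hits = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            int lineNumber = 0;              // reset for each file
            String line;
            // readLine() treats "\n", "\r" and "\r\n" as line terminators.
            while ((line = in.readLine()) != null) {
                lineNumber++;                // count the line just read
                Matcher m = pattern.matcher(line);
                while (m.find()) {
                    hits.add(lineNumber + ": " + m.group());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Sample input mixing Windows and Unix line endings.
        String text = "foo 12\r\nbar\nbaz 7 and 99\n";
        for (String hit : findMatches(new StringReader(text), Pattern.compile("\\d+"))) {
            System.out.println(hit);
        }
    }
}
```

For a real file, a FileReader (or Files.newBufferedReader) would take the place of the StringReader used in main.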

/gordon
 
Roedy Green

You could read by lines yourself and feed the lines one at a time to
your pattern matcher. Then you would know.

Matcher.start() does give you the character offset of each match, so
if you match against the whole file you could build a map of offsets
to line numbers and look each match up in it.

You might do some crude pattern matching yourself on a giant string
representing the entire file to find candidates, counting line
endings as you go, then feed them to regex for confirmation.
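
If matching against the whole file is needed (for example, for patterns that span lines), counting line endings can be combined with the offset that Matcher.start() reports for each match. A minimal sketch, assuming the file is already in memory as one string; the class name OffsetLineMatcher, the method findMatches and the output format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: matches against the whole text and derives line numbers from offsets.
class OffsetLineMatcher {

    // One line terminator: Windows "\r\n" is tried first, then a lone "\n" or "\r".
    private static final Pattern LINE_END = Pattern.compile("\r\n|\n|\r");

    /** Returns "lineNumber: matchedText" per match; the line is the one where the match starts. */
    static List<String> findMatches(CharSequence text, Pattern pattern) {
        List<String> hits = new ArrayList<>();
        Matcher lineEnds = LINE_END.matcher(text);
        Matcher m = pattern.matcher(text);
        int lineNumber = 1;                      // 1-based line of the current position
        boolean moreEnds = lineEnds.find();
        // find() returns matches in increasing offset order, so one forward pass suffices.
        while (m.find()) {
            // Count every terminator that ends at or before the start of this match.
            while (moreEnds && lineEnds.end() <= m.start()) {
                lineNumber++;
                moreEnds = lineEnds.find();
            }
            hits.add(lineNumber + ": " + m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "foo 12\r\nbar\nbaz 7 and 99\nx\ry 5";
        for (String hit : findMatches(text, Pattern.compile("\\d+"))) {
            System.out.println(hit);
        }
    }
}
```

A match that spans several lines is reported on the line where it begins. Reading line by line remains simpler when no pattern crosses a line break.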
 
