Parsing multiple lines with regex

banansol

Hi,
I'm making a program that searches an HTML file for special lines, and I use the Scanner class with a regex to find those lines. It is working well, but now I want to extract data that is on the lines directly following my searched line. If I could make a big regex that included several lines it would be no problem, but since the method findInLine() only searches one line at a time, I'm not sure how to do it. What is a good way to do this? I don't want a full-blown HTML parser, just a way to find some 3-4 lines with the same structure that are repeated several times in a document.

As I said, if I could somehow get a regex to span multiple lines, that would be good, I think. Another idea is to nest the searches, so that when I find the first line I could call a method or something that manually gets the remaining lines and extracts the data from them, but that sounds more complicated. I'm not sure. Any ideas?

Thanks!
 
TechBookReport

Can't you use Pattern.MULTILINE, or am I missing something?
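
For what it's worth, here is a minimal sketch of how the flag behaves, assuming the whole file has already been read into one String. As far as I know, Pattern.MULTILINE only changes where ^ and $ match; it does not by itself let a pattern cross a line break. A pattern spans lines when it matches the line break explicitly (for example with \s*), or when Pattern.DOTALL lets . match line terminators. The sample HTML and the class name FlagDemo are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
    // Hypothetical sample input: a two-line structure repeated twice.
    static final String HTML =
        "<td class=\"name\">Apple</td>\n" +
        "<td class=\"price\">3</td>\n" +
        "<td class=\"name\">Pear</td>\n" +
        "<td class=\"price\">5</td>\n";

    // Returns "name=price" for every name line directly followed by a price line.
    static List<String> pairs(String input) {
        // The \s* between the two cells matches the line break, so no flag is needed.
        Pattern p = Pattern.compile(
            "<td class=\"name\">(.*?)</td>\\s*<td class=\"price\">(.*?)</td>");
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    // Counts matches of "^<td"; ^ only matches at each line start with MULTILINE.
    static int countLineStarts(String input, int flags) {
        Matcher m = Pattern.compile("^<td", flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(pairs(HTML));
        System.out.println(countLineStarts(HTML, 0));
        System.out.println(countLineStarts(HTML, Pattern.MULTILINE));
    }
}
```

So MULTILINE helps if you want to anchor each line with ^ or $, but the part that actually spans lines is the pattern itself.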
 
Jeff Higgins

banansol said:
As I said, if I could somehow get a regex to span multiple lines, that would be good, I think.

I recently had good luck matching a regex across several lines with Scanner.findWithinHorizon(Pattern pattern, int horizon).

From the Javadoc: "If horizon is 0, then the horizon is ignored and this method continues to search through the input looking for the specified pattern without bound."
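
A minimal sketch of that idea, assuming a heading line directly followed by a paragraph line. The tag names, the pattern, and the class name HorizonDemo are made up for illustration and would need adapting to the real file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

class HorizonDemo {
    // Returns "heading: text" for each <h2> line directly followed by a <p> line.
    static List<String> extract(String input) {
        // The \s* between the two tags is what lets the match cross the line break.
        Pattern p = Pattern.compile("<h2>(.*?)</h2>\\s*<p>(.*?)</p>");
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            // Horizon 0 means no bound; null is returned once no match remains.
            while (sc.findWithinHorizon(p, 0) != null) {
                MatchResult r = sc.match(); // groups of the match just found
                out.add(r.group(1) + ": " + r.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<html>\n<h2>First</h2>\n<p>alpha</p>\n"
                    + "<div>noise</div>\n<h2>Second</h2>\n<p>beta</p>\n</html>\n";
        System.out.println(extract(html));
    }
}
```

Scanner.match() gives access to the capture groups, so the data can be pulled out without a second regex pass over the matched string.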
banansol said:
Another idea is to nest the searches, so that when I find the first line I could call a method or something that manually gets the remaining lines and extracts the data from them, but that sounds more complicated. I'm not sure. Any ideas?

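// Assumes sc (Scanner), pattern (Pattern) and list (List<String>) are declared elsewhere.
scan:
while (sc.hasNext()) {
    for (int i = 0; i < 4; i++) {
        // A horizon of 0 means the search is not bounded, so it can cross line breaks.
        String match = sc.findWithinHorizon(pattern, 0);
        if (match == null) {
            break scan; // no further match in the input
        }
        if (match.endsWith(",")) {}
        if (match.startsWith("\"")) {}
        if (match.length() == 0) {}
        if (i == 0) {}
        else if (i == 1) {}
        else if (i == 2) {}
        else {}
        list.add(match); // collect each match
    }
}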
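
The other idea from the question, finding the first line and then reading the following lines by hand, could look roughly like this. It is only a sketch: it uses nextLine() plus a Matcher instead of findInLine(), and the marker pattern, the number of lines to read, and the class name FollowingLinesDemo are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class FollowingLinesDemo {
    // Collects the `count` lines that directly follow every line matching `marker`.
    static List<String> linesAfter(String input, Pattern marker, int count) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                if (marker.matcher(line).find()) {
                    // Marker found: read the following lines manually.
                    for (int i = 0; i < count && sc.hasNextLine(); i++) {
                        out.add(sc.nextLine().trim());
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<tr class=\"item\">\n  <td>Apple</td>\n  <td>3</td>\n</tr>\n"
                    + "<tr class=\"other\">\n  <td>skip</td>\n</tr>\n"
                    + "<tr class=\"item\">\n  <td>Pear</td>\n  <td>5</td>\n</tr>\n";
        Pattern marker = Pattern.compile("<tr class=\"item\">");
        System.out.println(linesAfter(html, marker, 2));
    }
}
```

Each collected line can then be run through its own small regex, which keeps the individual patterns simple at the cost of a bit more bookkeeping.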
 
