RegExp Group Headache

Alan

I do not understand why the regular expression string in the code
below is giving me lines of text and not paragraphs. I am trying to
get the start and end of the whole, repeated pattern.

The output I am getting is:

Chunk: one

What I was expecting was:

Chunk: one
two
three

Can anyone explain this to me? Thank you, Alan

import java.util.regex.*;

public class TextProcessor
{

public static void main(String[] args)
{
String TestString = "one \n two \n three \n\n Another Paragraph";
System.out.println ( "Chunk: " + getChunk ( TestString, 0 ) );
System.out.println ("\n");
}

private static final Pattern PARA_PATTERN = Pattern.compile("(^.*\\S+.*$)+", Pattern.MULTILINE);

public static String getChunk ( String InputString, int StartPosition )
{
String OutputString = "";

Matcher matcher = PARA_PATTERN.matcher ( InputString );
try
{
if ( matcher.find ( StartPosition ) )
{
OutputString = InputString.substring(matcher.start(),
matcher.end());
}
}
catch ( IndexOutOfBoundsException e ) { e.printStackTrace();}
catch ( IllegalStateException e ) { e.printStackTrace();}

return OutputString;
}

}
 
Ingo Menger

First, let me say that you gave a very nice problem description, so
I'll try to answer your question.
The result "one" makes perfect sense, since you asked for a string that
- consists of one or more substrings, each of which
- starts at the beginning of the string OR just after a newline
- contains zero or more characters, followed by one or more
non-whitespace characters, followed by zero or more characters (none of
them a newline, since . does not match line terminators by default)
- ends just before a newline OR at the end of the string

The group matches only once because, according to the docs, the anchors
^ and $ match just after and just before a newline, but the newline
itself is never consumed. After the first repetition the matcher sits
just before the "\n" that ends the first line. ^ cannot match at that
position, so a second repetition is impossible and the match stops
after "one".
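
One possible way around this, offered only as a sketch and not as the one correct answer, is to let the pattern consume the newline between lines explicitly. The class name ParagraphChunks, the helper firstMatch and the PARAGRAPH pattern are made up for illustration, and the pattern assumes lines are separated by a plain "\n". The original pattern is kept alongside it for comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParagraphChunks {

    // The pattern from the question: the repeated group never consumes
    // the newline, so it can only ever match a single line.
    static final Pattern ORIGINAL =
            Pattern.compile("(^.*\\S+.*$)+", Pattern.MULTILINE);

    // Hypothetical alternative: one non-blank line, followed by any number
    // of "newline + non-blank line" repetitions. Assumes "\n" line endings.
    static final Pattern PARAGRAPH =
            Pattern.compile("^.*\\S.*$(?:\\n^.*\\S.*$)*", Pattern.MULTILINE);

    // Returns the first match of pattern at or after start, or "" if none.
    static String firstMatch(Pattern pattern, String input, int start) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find(start) ? matcher.group() : "";
    }

    public static void main(String[] args) {
        String text = "one \n two \n three \n\n Another Paragraph";
        System.out.println("Original: [" + firstMatch(ORIGINAL, text, 0) + "]");
        System.out.println("Fixed:    [" + firstMatch(PARAGRAPH, text, 0) + "]");
    }
}
```

With the test string from the question, the original pattern stops after the first line, while the alternative returns all three lines of the first paragraph and stops at the blank line, because a line containing no non-whitespace character cannot satisfy \S.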