Keeping the split token in a Java regular expression

laredotornado

Hi,

I'm using Java 6. I want to split a Java string on a regular
expression, but I would like to keep part of the string used to split
in the results. What I have are Strings like

Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM

What I would like to do is split the expression wherever I have an
expression matching /(am|pm),?/i . Hopefully I got that right. In
the above example, I would like the results to be

Fri 7:30 PM
Sat 2 PM
Sun 2:30 PM

But with String.split, the split token is not kept within the
results. How would I write a Java parsing expression to do what I
want?

Thanks, - Dave
 
Lew

laredotornado said:
I'm using Java 6. I want to split a Java string on a regular
expression, but I would like to keep part of the string used to split
in the results. What I have are Strings like

Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM

What I would like to do is split the expression wherever I have an
expression matching /(am|pm),?/i . Hopefully I got that right. In
the above example, I would like the results to be

Fri 7:30 PM
Sat 2 PM
Sun 2:30 PM

But with String.split, the split token is not kept within the
results. How would I write a Java parsing expression to do what I
want?

Based on what you've shown it looks like you could split on the comma and trim the resulting strings.
 
markspace

Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM

But with String.split, the split token is not kept within the
results. How would I write a Java parsing expression to do what I
want?


What Lew said.

String[] dates = dateString.split( ", +" );

for( String date : dates ) {
    String temp = date.trim().toUpperCase();

    if( temp.endsWith( "PM" ) ) {
        System.out.println( "Good afternoon." );
    } else if( temp.endsWith( "AM" ) ) {
        System.out.println( "Good morning." );
    } else {
        System.out.println( "Good whatever." );
    }
}
 
laredotornado

Hi, I don't want to split on the comma because there could be a case where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this case, I want the result to be a String array containing

Fri 8 PM
Sat 1, 3, and 5 PM

Your continued help is appreciated, - Dave
 
markspace

Hi, I don't want to split on the comma because there could be a case
where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this
case, I want the result to be a String array containing

Fri 8 PM
Sat 1, 3, and 5 PM


You might be able to do this with clever use of regex look-around:

http://www.regular-expressions.info/lookaround.html

Maybe something like "(?<=M),". Definitely take some time to test that
carefully though.

Otherwise, you'll have to write your own parser (which wouldn't be hard).
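
A minimal sketch of the look-behind idea, for illustration only. The class name LookbehindSplit and the exact pattern are assumptions rather than anything posted in the thread: the pattern widens "(?<=M)," so that the comma must follow AM or PM in either case, and it also consumes the whitespace after the comma. It sticks to Java 6 syntax.

```java
import java.util.regex.Pattern;

class LookbehindSplit {
    // Match a comma (and any whitespace after it) only when the comma is
    // immediately preceded by "AM" or "PM". The look-behind itself consumes
    // nothing, so the AM/PM stays attached to the piece on the left.
    private static final Pattern AFTER_MERIDIEM =
            Pattern.compile("(?<=[ap]m),\\s*", Pattern.CASE_INSENSITIVE);

    static String[] split(String text) {
        return AFTER_MERIDIEM.split(text);
    }

    public static void main(String[] args) {
        for (String s : split("Fri 8 PM, Sat 1, 3, and 5 PM")) {
            System.out.println(s);
        }
    }
}
```

With that input the commas after "1" and "3" are not preceded by AM or PM, so only the comma after "Fri 8 PM" acts as a boundary. As said above, this still needs careful testing against real data.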
 
Knute Johnson

Hi, I don't want to split on the comma because there could be a case where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this case, I want the result to be a String array containing

Fri 8 PM
Sat 1, 3, and 5 PM

Your continued help is appreciated, - Dave

public class test {
    public static void main(String[] args) {
        String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM";
        String token = "PM, |PM";

        String[] strs = str.split(token);
        for (String s : strs)
            System.out.println(s+"PM");
    }
}

C:\Documents and Settings\Knute Johnson>java test
Fri 7:30 PM
Fri 8 PM
Sat 1, 3, and 5 PM

If you wanted to get AMs too, you could do a first pass for the PMs and
then do it again for the AMs.
 
markspace

String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM"; ....
System.out.println(s+"PM");
                      ^^

What does this print if the "str" string ends with AM instead of PM? I
don't think this actually works....
 
Stefan Ram

laredotornado said:
What I would like to do is split the expression wherever I have an

public class Main
{
public static void split
( final java.lang.String text )
{ java.util.regex.Pattern pattern =
java.util.regex.Pattern.compile
( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
java.util.regex.Matcher matcher = pattern.matcher( text );
while( matcher.find() )
java.lang.System.out.println( matcher.group( 0 )); }

public static void main( final java.lang.String[] args )
{ split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}
 
Lew

Stefan said:
laredotornado said:
What I would like to do is split the expression wherever I have an

public class Main
{
public static void split
( final java.lang.String text )
{ java.util.regex.Pattern pattern =
java.util.regex.Pattern.compile
( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
java.util.regex.Matcher matcher = pattern.matcher( text );
while( matcher.find() )
java.lang.System.out.println( matcher.group( 0 )); }

public static void main( final java.lang.String[] args )
{ split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}

This excellent (except for layout) example deserves to be archived.
 
Knute Johnson

markspace said:

What does this print if the "str" string ends with AM instead of PM? I
don't think this actually works....

It won't. He'll have to make a two-pass system if he's going to split
on two different tokens. I think I said that.
 
Knute Johnson

Stefan said:
laredotornado said:
What I would like to do is split the expression wherever I have an

public class Main
{
public static void split
( final java.lang.String text )
{ java.util.regex.Pattern pattern =
java.util.regex.Pattern.compile
( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
java.util.regex.Matcher matcher = pattern.matcher( text );
while( matcher.find() )
java.lang.System.out.println( matcher.group( 0 )); }

public static void main( final java.lang.String[] args )
{ split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}

This excellent (except for layout) example deserves to be archived.

I like that too. I tried it but I didn't get this.
 
John B. Matthews

laredotornado said:
I'm using Java 6. I want to split a Java string on a regular
expression, but I would like to keep part of the string used to split
in the results. What I have are Strings like

Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM

What I would like to do is split the expression wherever I have an
expression matching /(am|pm),?/i . Hopefully I got that right. In
the above example, I would like the results to be

Fri 7:30 PM
Sat 2 PM
Sun 2:30 PM

But with String.split, the split token is not kept within the
results. How would I write a Java parsing expression to do what I
want?

Instead of split, why not parse and format?
 
Martin Gregorie

It won't. He'll have to make a two-pass system if he's going to split
on two different tokens. I think I said that

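Then you'd need something like the following, semi-pseudo-coded as:

String[] slist = in.split("PM, +|PM");
for (int i = 0; i < slist.length; i++)
    // split() drops the token, so put PM back; the last piece only
    // had one if the input itself ended in PM
    if (i < slist.length - 1 || in.trim().endsWith("PM"))
        slist[i] = slist[i].trim() + "PM";

ArrayList<String> alist = new ArrayList<String>();
for (String s : slist) {
    String[] sp = s.split("AM, +|AM");
    for (int j = 0; j < sp.length; j++)
        // pieces that already end in PM came from the first pass
        alist.add(sp[j].endsWith("PM") ? sp[j].trim() : sp[j].trim() + "AM");
}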


...but it's ugly. I think it can be done in one pass using a regex with
capture groups along the lines of

"(.*)([AP]M ,|[AP]M)"

If I got that right, each time expression that the OP needs to split
out is represented by a pair of adjacent capture groups, so just a
single pass along the array of capture groups concatenating adjacent
pairs and applying trim() to each concatenated pair should do the
trick.
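
One way the capture-group idea might be tried, as a hedged sketch rather than a tested solution. The class name CaptureGroupSplit is hypothetical, and the pattern differs from the one quoted above: it uses a reluctant ".*?" so that the first group stops at the first AM/PM instead of running to the last one, and it matches an optional trailing comma without capturing it. Matcher.find() walks the input and hands back the two groups for each entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupSplit {
    // Group 1: everything up to the next marker (reluctant, so it stops early).
    // Group 2: the AM/PM marker itself.
    // The optional comma after the marker is consumed but not captured.
    private static final Pattern ENTRY = Pattern.compile("(.*?)([AP]M),?");

    static List<String> split(String text) {
        List<String> entries = new ArrayList<String>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            // Concatenate the adjacent pair of groups and trim the result.
            entries.add((m.group(1) + m.group(2)).trim());
        }
        return entries;
    }

    public static void main(String[] args) {
        for (String s : split("Fri 7:30 AM, Sat 1, 3, and 5 PM")) {
            System.out.println(s);
        }
    }
}
```

Because this is a single pass, AM and PM entries can be mixed freely. Note that "[AP]M" would also match inside an upper-case word that happens to contain "AM" or "PM", and any text after the last marker is silently dropped.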

It's rather late here, so I'll leave this as an exercise for anybody
who feels keen. If nobody has touched it by mid morning tomorrow I may
see if it works.
 
Gene Wirchenko

Hi, I don't want to split on the comma because there could be a case where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this case, I want the result to be a String array containing

Fri 8 PM
Sat 1, 3, and 5 PM

Your continued help is appreciated, - Dave

What about "Sun 9, 11 AM, and 1 PM"? Or "Sun 9 and 11 AM, and 1
and 3 PM"?

I think you had better be quite sure of all of the variants. For
that matter, people often omit the comma before "and" which would give
"Sun 9, 11 AM and 1 PM" for my first example. Such people have
probably not seen
http://www.outsidethebeltway.com/oxford-comma-cartoon/
or other such references.

Sincerely,

Gene Wirchenko
 
Arne Vajhøj

I'm using Java 6. I want to split a Java string on a regular
expression, but I would like to keep part of the string used to split
in the results. What I have are Strings like

Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM

What I would like to do is split the expression wherever I have an
expression matching /(am|pm),?/i . Hopefully I got that right. In
the above example, I would like the results to be

Fri 7:30 PM
Sat 2 PM
Sun 2:30 PM

But with String.split, the split token is not kept within the
results. How would I write a Java parsing expression to do what I
want?

A hackish solution:

String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");

Arne
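
To make the trick easier to follow, here is a small sketch of what the one-liner appears to do. The class name MarkerSplit and the clean-up loop are additions for illustration, not part of the posted line. replaceAll turns every "PM" into "PMXPM" (and "AM" into "AMXAM"); splitting on "X[AP]M" then removes only the inserted copy, so the original marker survives. Each piece after the first still begins with the old ", " separator, which the loop strips.

```java
class MarkerSplit {
    static String[] split(String s) {
        // "Fri 7:30 PM, Sat 2 AM" -> "Fri 7:30 PMXPM, Sat 2 AMXAM"
        // -> pieces "Fri 7:30 PM" and ", Sat 2 AM"
        String[] parts = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
        for (int i = 0; i < parts.length; i++) {
            // Drop the leftover leading comma and surrounding whitespace.
            parts[i] = parts[i].replaceFirst("^,\\s*", "").trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String p : split("Fri 7:30 PM, Sat 2 AM, Sun 2:30 PM")) {
            System.out.println(p);
        }
    }
}
```

The approach assumes the input never already contains "XAM" or "XPM"; a different placeholder character would be needed otherwise.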
 
Lew

Gene said:
What about "Sun 9, 11 AM, and 1 PM"?
Or "Sun 9 and 11 AM, and 1 and 3 PM"?

I think you had better be quite sure of all of the variants. For
that matter, people often omit the comma before "and" which would give
"Sun 9, 11 AM and 1 PM" for my first example. Such people have
probably not seen
http://www.outsidethebeltway.com/oxford-comma-cartoon/
or other such references.

The point is that you need a precise, perhaps formal statement of the exact rules to parse the input, and what to do when the input format fails quality checks.

Parsing is a Dark Art in programming - not really the hardest of them, but worthy of close attention.

It does require a careful, methodical approach.
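
As an illustration of what "exact rules plus a failure policy" could look like, here is a hedged sketch. Everything in it is an assumption made for the example: the class name StrictShowtimes, the grammar (a three-letter day, one or more times joined by ", ", " and " or ", and ", then a single AM or PM), and the choice to throw IllegalArgumentException on anything else. The real rules would have to come from whoever produces the data.

```java
import java.util.regex.Pattern;

class StrictShowtimes {
    // Assumed grammar for a time: one or two digits, optional ":mm".
    private static final String TIME = "\\d{1,2}(?::\\d{2})?";

    // Assumed grammar for one entry: day, time list, single AM/PM marker.
    private static final Pattern ENTRY = Pattern.compile(
            "(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) " + TIME
            + "(?:(?:, and |, | and )" + TIME + ")* [AP]M");

    // Entries are separated by a comma that directly follows AM or PM.
    private static final Pattern BOUNDARY = Pattern.compile("(?<=[AP]M),\\s*");

    static String[] parse(String text) {
        String[] entries = BOUNDARY.split(text.trim());
        for (String entry : entries) {
            // Fail loudly instead of passing malformed pieces downstream.
            if (!ENTRY.matcher(entry).matches()) {
                throw new IllegalArgumentException(
                        "Unrecognized entry: \"" + entry + "\"");
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        for (String e : parse("Fri 8 PM, Sat 1, 3, and 5 PM")) {
            System.out.println(e);
        }
    }
}
```

Under these assumed rules "Fri 8 PM, Sat 1, 3, and 5 PM" is accepted, while "Sun 9, 11 AM, and 1 PM" is rejected, because the piece after "AM," no longer starts with a day. Whether that rejection is the right policy is exactly the kind of decision the rules need to spell out.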
 
Knute Johnson

The point is that you need a precise, perhaps formal statement of the exact rules to parse the input, and what to do when the input format fails quality checks.

Parsing is a Dark Art in programming - not really the hardest of them, but worthy of close attention.

It does require a careful, methodical approach.

You've been awfully poetic lately Lew.
 
Daniel Pitts

I'm using Java 6. I want to split a Java string on a regular
expression, but I would like to keep part of the string used to split
in the results. What I have are Strings like

Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM

What I would like to do is split the expression wherever I have an
expression matching /(am|pm),?/i . Hopefully I got that right. In
the above example, I would like the results to be

Fri 7:30 PM
Sat 2 PM
Sun 2:30 PM

But with String.split, the split token is not kept within the
results. How would I write a Java parsing expression to do what I
want?

Arne said:
A hackish solution:

String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");

Nice. As far as hackish goes, using "split" for this purpose at all is
hackish. Stefan Ram had the right algorithm (though with strange formatting):

Stefan said:
public class Main
{
public static void split
( final java.lang.String text )
{ java.util.regex.Pattern pattern =
java.util.regex.Pattern.compile
( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
java.util.regex.Matcher matcher = pattern.matcher( text );
while( matcher.find() )
java.lang.System.out.println( matcher.group( 0 )); }

public static void main( final java.lang.String[] args )
{ split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}
 
