Keeping the split token in a Java regular expression

Discussion in 'Java' started by laredotornado, Mar 26, 2012.

  1. Hi,

    I'm using Java 6. I want to split a Java string on a regular
    expression, but I would like to keep part of the string used to split
    in the results. What I have are Strings like

    Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM

    What I would like to do is split the expression wherever I have an
    expression matching /(am|pm),?/i . Hopefully I got that right. In
    the above example, I would like the results to be

    Fri 7:30 PM
    Sat 2 PM
    Sun 2:30 PM

    But with String.split, the split token is not kept within the
    results. How would I write a Java parsing expression to do what I
    want?

    Thanks, - Dave
     
    laredotornado, Mar 26, 2012
    #1

  2. laredotornado

    Lew Guest

    laredotornado wrote:
    > I'm using Java 6. I want to split a Java string on a regular
    > expression, but I would like to keep part of the string used to split
    > in the results. What I have are Strings like
    >
    > Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
    >
    > What I would like to do is split the expression wherever I have an
    > expression matching /(am|pm),?/i . Hopefully I got that right. In
    > the above example, I would like the results to be
    >
    > Fri 7:30 PM
    > Sat 2 PM
    > Sun 2:30 PM
    >
    > But with String.split, the split token is not kept within the
    > results. How would I write a Java parsing expression to do what I
    > want?


    Based on what you've shown it looks like you could split on the comma and trim the resulting strings.

    --
    Lew
     
    Lew, Mar 26, 2012
    #2

  3. On 03/26/2012 09:22 PM, Lew wrote:
    > laredotornado wrote:
    >> I'm using Java 6. I want to split a Java string on a regular
    >> expression, but I would like to keep part of the string used to split
    >> in the results. What I have are Strings like
    >>
    >> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
    >>
    >> What I would like to do is split the expression wherever I have an
    >> expression matching /(am|pm),?/i . Hopefully I got that right. In
    >> the above example, I would like the results to be
    >>
    >> Fri 7:30 PM
    >> Sat 2 PM
    >> Sun 2:30 PM
    >>
    >> But with String.split, the split token is not kept within the
    >> results. How would I write a Java parsing expression to do what I
    >> want?

    >
    > Based on what you've shown it looks like you could split on the comma and trim the resulting strings.


    And one wouldn't even need a regular expression for that.
    http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html

    Kind regards

    robert
     
    Robert Klemme, Mar 26, 2012
    #3
  4. laredotornado

    markspace Guest

    On 3/26/2012 11:54 AM, laredotornado wrote:

    > Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
    >
    > But with String.split, the split token is not kept within the
    > results. How would I write a Java parsing expression to do what I
    > want?



    What Lew said.

    String[] dates = dateString.split( ", +" );

    for( String date : dates ) {

        String temp = date.trim().toUpperCase();

        if( temp.endsWith( "PM" ) ) {
            System.out.println( "Good afternoon." );
        } else if( temp.endsWith( "AM" ) ) {
            System.out.println( "Good morning." );
        } else {
            System.out.println( "Good whatever." );
        }

    }
     
    markspace, Mar 26, 2012
    #4
  5. laredotornado

    Guest

    On Monday, March 26, 2012 1:54:40 PM UTC-5, laredotornado wrote:
    > Hi,
    >
    > I'm using Java 6. I want to split a Java string on a regular
    > expression, but I would like to keep part of the string used to split
    > in the results. What I have are Strings like
    >
    > Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
    >
    > What I would like to do is split the expression wherever I have an
    > expression matching /(am|pm),?/i . Hopefully I got that right. In
    > the above example, I would like the results to be
    >
    > Fri 7:30 PM
    > Sat 2 PM
    > Sun 2:30 PM
    >
    > But with String.split, the split token is not kept within the
    > results. How would I write a Java parsing expression to do what I
    > want?
    >
    > Thanks, - Dave


    Hi, I don't want to split on the comma because there could be a case where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this case, I want the result to be a String array containing

    Fri 8 PM
    Sat 1, 3, and 5 PM

    Your continued help is appreciated, - Dave
     
    , Mar 26, 2012
    #5
  6. laredotornado

    markspace Guest

    On 3/26/2012 2:21 PM, wrote:

    > Hi, I don't want to split on the comma because there could be a case
    > where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this
    > case, I want the result to be a String array containing
    >
    > Fri 8 PM
    > Sat 1, 3, and 5 PM



    You might be able to do this with clever use of regex look-around:

    http://www.regular-expressions.info/lookaround.html

    Maybe something like "(?<=M),". Definitely take some time to test that
    carefully though.

    Otherwise, you'll have to write your own parser (which wouldn't be hard).
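
A minimal sketch of the look-behind idea, written to compile on Java 6. The class and method names are made up for illustration, and the pattern is an assumption that extends the "(?<=M)," suggestion: it requires an actual AM or PM (in either case) before the comma and also swallows the spaces after it. Note that any word ending in "am" or "pm" right before a comma would trigger a split too, so it still needs testing against real data.

```java
import java.util.Arrays;

final class LookbehindSplit {
    // Split at a comma (plus any following whitespace) only when the comma
    // is directly preceded by AM or PM. The look-behind is zero-width, so
    // the AM/PM marker stays in the preceding piece.
    static String[] splitTimes(String text) {
        return text.split("(?i)(?<=[ap]m),\\s*");
    }

    public static void main(String[] args) {
        // Commas inside "Sat 1, 3, and 5 PM" are not preceded by AM/PM,
        // so that entry is kept in one piece.
        System.out.println(Arrays.toString(
                splitTimes("Fri 8 PM, Sat 1, 3, and 5 PM")));
    }
}
```

For the string from the earlier post this returns two elements, "Fri 8 PM" and "Sat 1, 3, and 5 PM".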
     
    markspace, Mar 26, 2012
    #6
  7. On 3/26/2012 2:21 PM, wrote:
    > On Monday, March 26, 2012 1:54:40 PM UTC-5, laredotornado wrote:
    >> Hi,
    >>
    >> I'm using Java 6. I want to split a Java string on a regular
    >> expression, but I would like to keep part of the string used to split
    >> in the results. What I have are Strings like
    >>
    >> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
    >>
    >> What I would like to do is split the expression wherever I have an
    >> expression matching /(am|pm),?/i . Hopefully I got that right. In
    >> the above example, I would like the results to be
    >>
    >> Fri 7:30 PM
    >> Sat 2 PM
    >> Sun 2:30 PM
    >>
    >> But with String.split, the split token is not kept within the
    >> results. How would I write a Java parsing expression to do what I
    >> want?
    >>
    >> Thanks, - Dave

    >
    > Hi, I don't want to split on the comma because there could be a case where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this case, I want the result to be a String array containing
    >
    > Fri 8 PM
    > Sat 1, 3, and 5 PM
    >
    > Your continued help is appreciated, - Dave


    public class test {
        public static void main(String[] args) {
            String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM";
            String token = "PM, |PM";

            String[] strs = str.split(token);
            for (String s : strs)
                System.out.println(s+"PM");

        }
    }

    C:\Documents and Settings\Knute Johnson>java test
    Fri 7:30 PM
    Fri 8 PM
    Sat 1, 3, and 5 PM

    If you wanted to get AMs too, you could do a first pass for the PMs and
    then do it again for the AMs.

    --

    Knute Johnson
     
    Knute Johnson, Mar 26, 2012
    #7
  8. laredotornado

    markspace Guest

    On 3/26/2012 3:56 PM, Knute Johnson wrote:

    > String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM";

    ....
    > System.out.println(s+"PM");

    ^^

    What does this print if the "str" string ends with AM instead of PM? I
    don't think this actually works....
     
    markspace, Mar 27, 2012
    #8
  9. laredotornado

    Stefan Ram Guest

    laredotornado <> writes:
    >What I would like to do is split the expression wherever I have an


    public class Main
    {
    public static void split
    ( final java.lang.String text )
    { java.util.regex.Pattern pattern =
    java.util.regex.Pattern.compile
    ( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
    java.util.regex.Matcher matcher = pattern.matcher( text );
    while( matcher.find() )
    java.lang.System.out.println( matcher.group( 0 )); }

    public static void main( final java.lang.String[] args )
    { split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}
     
    Stefan Ram, Mar 27, 2012
    #9
  10. laredotornado

    Lew Guest

    Stefan Ram wrote:
    > laredotornado writes:
    >>What I would like to do is split the expression wherever I have an

    >
    > public class Main
    > {
    > public static void split
    > ( final java.lang.String text )
    > { java.util.regex.Pattern pattern =
    > java.util.regex.Pattern.compile
    > ( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
    > java.util.regex.Matcher matcher = pattern.matcher( text );
    > while( matcher.find() )
    > java.lang.System.out.println( matcher.group( 0 )); }
    >
    > public static void main( final java.lang.String[] args )
    > { split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}


    This excellent (except for layout) example deserves to be archived.

    --
    Lew
     
    Lew, Mar 27, 2012
    #10
  11. On 3/26/2012 4:02 PM, markspace wrote:
    > On 3/26/2012 3:56 PM, Knute Johnson wrote:
    >
    >> String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM";

    > ...
    >> System.out.println(s+"PM");

    > ^^
    >
    > What does this print if the "str" string ends with AM instead of PM? I
    > don't think this actually works....
    >


    It won't. He'll have to make a two-pass system if he's going to split
    on two different tokens. I think I said that.

    --

    Knute Johnson
     
    Knute Johnson, Mar 27, 2012
    #11
  12. On 3/26/2012 4:26 PM, Lew wrote:
    > Stefan Ram wrote:
    >> laredotornado writes:
    >>> What I would like to do is split the expression wherever I have an

    >>
    >> public class Main
    >> {
    >> public static void split
    >> ( final java.lang.String text )
    >> { java.util.regex.Pattern pattern =
    >> java.util.regex.Pattern.compile
    >> ( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
    >> java.util.regex.Matcher matcher = pattern.matcher( text );
    >> while( matcher.find() )
    >> java.lang.System.out.println( matcher.group( 0 )); }
    >>
    >> public static void main( final java.lang.String[] args )
    >> { split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}

    >
    > This excellent (except for layout) example deserves to be archived.
    >


    I like that too. I tried it but I didn't get this.

    --

    Knute Johnson
     
    Knute Johnson, Mar 27, 2012
    #12
  13. In article
    <>,
    laredotornado <> wrote:

    > I'm using Java 6. I want to split a Java string on a regular
    > expression, but I would like to keep part of the string used to split
    > in the results. What I have are Strings like
    >
    > Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
    >
    > What I would like to do is split the expression wherever I have an
    > expression matching /(am|pm),?/i . Hopefully I got that right. In
    > the above example, I would like the results to be
    >
    > Fri 7:30 PM
    > Sat 2 PM
    > Sun 2:30 PM
    >
    > But with String.split, the split token is not kept within the
    > results. How would I write a Java parsing expression to do what I
    > want?


    Instead of split, why not parse and format?

    --
    John B. Matthews
    trashgod at gmail dot com
    <http://sites.google.com/site/drjohnbmatthews>
     
    John B. Matthews, Mar 27, 2012
    #13
  14. On Mon, 26 Mar 2012 17:33:51 -0700, Knute Johnson wrote:

    > On 3/26/2012 4:02 PM, markspace wrote:
    >> On 3/26/2012 3:56 PM, Knute Johnson wrote:
    >>
    >>> String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM";

    >> ...
    >>> System.out.println(s+"PM");

    >> ^^
    >>
    >> What does this print if the "str" string ends with AM instead of PM? I
    >> don't think this actually works....
    >>
    >>

    > It won't. He'll have to make a two-pass system if he's going to split
    > on two different tokens. I think I said that


    Then you'd need something like the following, semi-pseudo-coded as:

    slist = in.split("PM, +|PM")
    for (int i=0; i<slist.length; i++)
        slist[i] = slist[i].trim() + "PM";

    ArrayList<String> alist = new ArrayList<String>();
    for (s : slist)
        sp = s.split("AM, +|AM");
        for (int j=0; j < sp.length; j++)
            alist.add(sp[j].trim() + "AM");


    ...but it's ugly. I think it can be done in one pass using a regex with
    capture groups along the lines of

    "(.*)([AP]M ,|[AP]M)"

    If I got that right, each time expression that the OP needs to split
    out is represented by a pair of adjacent capture groups, so just a
    single pass along the array of capture groups concatenating adjacent
    pairs and applying trim() to each concatenated pair should do the
    trick.
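
One way the capture-group idea might look, as a Java 6 sketch rather than a tested solution. The class name, method name, and exact pattern are assumptions: compared with the expression above, the first group is made reluctant (`.*?`) so that it stops at the first AM/PM instead of running to the last one, and the optional comma is matched outside both groups so it is dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaptureGroupSplit {
    // Group 1: everything up to the next marker (reluctant match).
    // Group 2: the AM/PM marker itself.
    // The optional comma and trailing whitespace are consumed but not captured.
    private static final Pattern ENTRY =
            Pattern.compile("(.*?)([AP]M),?\\s*");

    static List<String> splitTimes(String text) {
        List<String> result = new ArrayList<String>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            // Concatenate the adjacent pair of groups and tidy the ends.
            result.add((m.group(1) + m.group(2)).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : splitTimes("Fri 7:30 AM, Sat 1, 3, and 5 PM")) {
            System.out.println(s);
        }
    }
}
```

Any trailing text that does not end in AM or PM is silently ignored by this loop, which may or may not be what is wanted.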

    It's rather late here, so I'll leave this as an exercise for anybody
    who feels keen. If nobody has touched it by mid morning tomorrow I may
    see if it works.

    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
     
    Martin Gregorie, Mar 27, 2012
    #14
  15. On Mon, 26 Mar 2012 14:21:07 -0700 (PDT),
    wrote:

    >On Monday, March 26, 2012 1:54:40 PM UTC-5, laredotornado wrote:
    >> Hi,
    >>
    >> I'm using Java 6. I want to split a Java string on a regular
    >> expression, but I would like to keep part of the string used to split
    >> in the results. What I have are Strings like
    >>
    >> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
    >>
    >> What I would like to do is split the expression wherever I have an
    >> expression matching /(am|pm),?/i . Hopefully I got that right. In
    >> the above example, I would like the results to be
    >>
    >> Fri 7:30 PM
    >> Sat 2 PM
    >> Sun 2:30 PM
    >>
    >> But with String.split, the split token is not kept within the
    >> results. How would I write a Java parsing expression to do what I
    >> want?
    >>
    >> Thanks, - Dave

    >
    >Hi, I don't want to split on the comma because there could be a case where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this case, I want the result to be a String array containing
    >
    >Fri 8 PM
    >Sat 1, 3, and 5 PM
    >
    >Your continued help is appreciated, - Dave


    What about "Sun 9, 11 AM, and 1 PM"? Or "Sun 9 and 11 AM, and 1
    and 3 PM"?

    I think you had better be quite sure of all of the variants. For
    that matter, people often omit the comma before "and" which would give
    "Sun 9, 11 AM and 1 PM" for my first example. Such people have
    probably not seen
    http://www.outsidethebeltway.com/oxford-comma-cartoon/
    or other such references.

    Sincerely,

    Gene Wirchenko
     
    Gene Wirchenko, Mar 27, 2012
    #15
  16. laredotornado

    Arne Vajhøj Guest

    On 3/26/2012 4:01 PM, Robert Klemme wrote:
    > On 03/26/2012 09:22 PM, Lew wrote:
    >> laredotornado wrote:
    >>> I'm using Java 6. I want to split a Java string on a regular
    >>> expression, but I would like to keep part of the string used to split
    >>> in the results. What I have are Strings like
    >>>
    >>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
    >>>
    >>> What I would like to do is split the expression wherever I have an
    >>> expression matching /(am|pm),?/i . Hopefully I got that right. In
    >>> the above example, I would like the results to be
    >>>
    >>> Fri 7:30 PM
    >>> Sat 2 PM
    >>> Sun 2:30 PM
    >>>
    >>> But with String.split, the split token is not kept within the
    >>> results. How would I write a Java parsing expression to do what I
    >>> want?

    >>
    >> Based on what you've shown it looks like you could split on the comma
    >> and trim the resulting strings.

    >
    > And one wouldn't even need a regular expression for that.
    > http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html


    StringTokenizer is somewhat obsoleted by String.split.

    So even for a purely literal delimiter, using split is
    common.

    Arne
     
    Arne Vajhøj, Mar 27, 2012
    #16
  17. laredotornado

    Arne Vajhøj Guest

    On 3/26/2012 2:54 PM, laredotornado wrote:
    > I'm using Java 6. I want to split a Java string on a regular
    > expression, but I would like to keep part of the string used to split
    > in the results. What I have are Strings like
    >
    > Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
    >
    > What I would like to do is split the expression wherever I have an
    > expression matching /(am|pm),?/i . Hopefully I got that right. In
    > the above example, I would like the results to be
    >
    > Fri 7:30 PM
    > Sat 2 PM
    > Sun 2:30 PM
    >
    > But with String.split, the split token is not kept within the
    > results. How would I write a Java parsing expression to do what I
    > want?


    A hackish solution:

    String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
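
To see why the one-liner works, here it is unrolled into steps as a Java 6 sketch. The class and method names are invented for illustration, and the last step (stripping the leading comma from each piece) is an addition that is not part of the one-liner. It also assumes the input never contains a literal "XAM" or "XPM".

```java
final class MarkerHack {
    static String[] splitTimes(String s) {
        // Step 1: duplicate every marker with an X in between,
        // e.g. "2 PM, Sun" becomes "2 PMXPM, Sun".
        String doubled = s.replaceAll("[AP]M", "$0X$0");

        // Step 2: split on "XAM"/"XPM". Only the second copy of the marker
        // is consumed, so the first copy stays attached to its entry.
        String[] parts = doubled.split("X[AP]M");

        // Step 3 (addition): every piece after the first still starts
        // with the separating comma, so remove it.
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replaceFirst("^,\\s*", "");
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String p : splitTimes("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM")) {
            System.out.println(p);
        }
    }
}
```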

    Arne
     
    Arne Vajhøj, Mar 27, 2012
    #17
  18. laredotornado

    Lew Guest

    Gene Wirchenko wrote:
    > What about "Sun 9, 11 AM, and 1 PM"?
    > Or "Sun 9 and 11 AM, and 1 and 3 PM"?
    >
    > I think you had better be quite sure of all of the variants. For
    > that matter, people often omit the comma before "and" which would give
    > "Sun 9, 11 AM and 1 PM" for my first example. Such people have
    > probably not seen
    > http://www.outsidethebeltway.com/oxford-comma-cartoon/
    > or other such references.


    The point is that you need a precise, perhaps formal statement of the exact rules to parse the input, and what to do when the input format fails quality checks.

    Parsing is a Dark Art in programming - not really the hardest of them, but worthy of close attention.

    It does require a careful, methodical approach.
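
As a small illustration of what a precise rule plus a failure policy could look like, here is a Java 6 sketch. The grammar (three-letter day, hour 1 to 12, optional minutes, AM or PM), the class name, and the choice to throw an exception are all assumptions made for the example; the real rules would have to come from whoever owns the data.

```java
import java.util.regex.Pattern;

final class ShowtimeValidator {
    // Hypothetical grammar for a single entry such as "Fri 7:30 PM":
    // day abbreviation, space, hour 1-12, optional ":mm", space, AM or PM.
    private static final Pattern ENTRY = Pattern.compile(
            "(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (1[0-2]|[1-9])(:[0-5][0-9])? (AM|PM)");

    // Failure policy chosen for this sketch: reject loudly instead of guessing.
    static void check(String entry) {
        if (!ENTRY.matcher(entry).matches()) {
            throw new IllegalArgumentException("Unrecognized showtime: " + entry);
        }
    }

    public static void main(String[] args) {
        check("Fri 7:30 PM");          // accepted
        check("Sat 2 PM");             // accepted
        check("Sat 1, 3, and 5 PM");   // rejected: not covered by the grammar
    }
}
```

Under this deliberately narrow grammar, the multi-time entries discussed above are rejected, which makes the gap in the rules visible instead of producing a wrong split.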

    --
    Lew
     
    Lew, Mar 27, 2012
    #18
  19. On 3/26/2012 7:07 PM, Lew wrote:
    > Gene Wirchenko wrote:
    >> What about "Sun 9, 11 AM, and 1 PM"?
    >> Or "Sun 9 and 11 AM, and 1 and 3 PM"?
    >>
    >> I think you had better be quite sure of all of the variants. For
    >> that matter, people often omit the comma before "and" which would give
    >> "Sun 9, 11 AM and 1 PM" for my first example. Such people have
    >> probably not seen
    >> http://www.outsidethebeltway.com/oxford-comma-cartoon/
    >> or other such references.

    >
    > The point is that you need a precise, perhaps formal statement of the exact rules to parse the input, and what to do when the input format fails quality checks.
    >
    > Parsing is a Dark Art in programming - not really the hardest of them, but worthy of close attention.
    >
    > It does require a careful, methodical approach.
    >


    You've been awfully poetic lately, Lew.

    --

    Knute Johnson
     
    Knute Johnson, Mar 27, 2012
    #19
  20. laredotornado

    Daniel Pitts Guest

    On 3/26/12 6:58 PM, Arne Vajhøj wrote:
    > On 3/26/2012 2:54 PM, laredotornado wrote:
    >> I'm using Java 6. I want to split a Java string on a regular
    >> expression, but I would like to keep part of the string used to split
    >> in the results. What I have are Strings like
    >>
    >> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
    >>
    >> What I would like to do is split the expression wherever I have an
    >> expression matching /(am|pm),?/i . Hopefully I got that right. In
    >> the above example, I would like the results to be
    >>
    >> Fri 7:30 PM
    >> Sat 2 PM
    >> Sun 2:30 PM
    >>
    >> But with String.split, the split token is not kept within the
    >> results. How would I write a Java parsing expression to do what I
    >> want?

    >
    > A hackish solution:
    >
    > String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
    >
    > Arne
    >

    Nice. As far as hackish goes, using "split" for this purpose at all is
    hackish. Stefan Ram had the right algorithm (though strange formatting):

    Stefan Ram wrote:
    > public class Main
    > {
    > public static void split
    > ( final java.lang.String text )
    > { java.util.regex.Pattern pattern =
    > java.util.regex.Pattern.compile
    > ( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
    > java.util.regex.Matcher matcher = pattern.matcher( text );
    > while( matcher.find() )
    > java.lang.System.out.println( matcher.group( 0 )); }
    >
    > public static void main( final java.lang.String[] args )
    > { split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}
    >
     
    Daniel Pitts, Mar 27, 2012
    #20