java.util.regex.Pattern.split issue

Discussion in 'Java' started by Ichiro, Jul 12, 2009.

  1. Ichiro

    Ichiro Guest

    Hi there,

    I wrote a test harness for java.util.regex.Pattern.split and found
    that, at least from my point of view, it behaves inconsistently. See
    program and output below.

    In particular, a trailing delimiter does not generate an empty string
    on the right of the delimiter, so I get

    *******************
    'a,' splits into:
    'a'
    *******************

    while I expected

    *******************
    'a,' splits into:
    'a'
    ''
    *******************

    Also, possibly even more bizarrely

    *******************
    ',' splits into:
    *******************

    while (since the empty string splits into an empty string) I expected

    *******************
    ',' splits into:
    ''
    ''
    *******************

    I tried to modify my pattern to also take beginning and end of string
    into account, like so

    pattern = Pattern.compile("[\\A\\z,]");

    but this generated a PatternSyntaxException.
    Can someone please suggest a way to achieve what I need?

    Finally, is this the right newsgroup for this kind of question? There
    sure is a lot of noise (=spam) around here...

    Thanks much,
    Ichiro


    import java.util.regex.Pattern;

    public class Main
    {
        // Compile the delimiter once; it is the same for every input.
        private static final Pattern PATTERN = Pattern.compile(",");

        public static void main(String[] args)
        {
            // Same inputs, in the same order, as the output listed below.
            String[] inputs = {
                "test", "", "a,b", "a,", ",b", ",",
                "a,b,c", "a,b,", "a,,c", ",b,c",
                "a,,", ",b,", ",,c", ",,"
            };
            for (String input : inputs)
            {
                printSplit(input);
            }
        }

        // Print every piece returned by Pattern.split(CharSequence), quoted.
        private static void printSplit(String input)
        {
            String[] output = PATTERN.split(input);

            System.out.println("'" + input + "' splits into:");
            for (String s : output)
            {
                System.out.println("'" + s + "'");
            }
            System.out.println("*******************");
        }
    }


    'test' splits into:
    'test'
    *******************
    '' splits into:
    ''
    *******************
    'a,b' splits into:
    'a'
    'b'
    *******************
    'a,' splits into:
    'a'
    *******************
    ',b' splits into:
    ''
    'b'
    *******************
    ',' splits into:
    *******************
    'a,b,c' splits into:
    'a'
    'b'
    'c'
    *******************
    'a,b,' splits into:
    'a'
    'b'
    *******************
    'a,,c' splits into:
    'a'
    ''
    'c'
    *******************
    ',b,c' splits into:
    ''
    'b'
    'c'
    *******************
    'a,,' splits into:
    'a'
    *******************
    ',b,' splits into:
    ''
    'b'
    *******************
    ',,c' splits into:
    ''
    ''
    'c'
    *******************
    ',,' splits into:
    *******************
     
    Ichiro, Jul 12, 2009
    #1
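
A note on the PatternSyntaxException mentioned in the post above: \A and \z are boundary matchers, and java.util.regex does not accept them inside a character class, which is why "[\\A\\z,]" fails to compile. The sketch below only checks which form compiles; the class and method names are made up for illustration. Writing the anchors as an alternation compiles, but changing the pattern does not, as far as the documentation goes, stop a plain split(input) from discarding trailing empty strings.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class AnchorInClassDemo {
    // Hypothetical helper: true if the regex compiles, false if Pattern rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Boundary matchers inside [...] are rejected.
        System.out.println(compiles("[\\A\\z,]"));   // prints false
        // The same three alternatives written with | are accepted.
        System.out.println(compiles("\\A|\\z|,"));   // prints true
    }
}
```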

  2. Ichiro

    Arne Vajhøj Guest

    Ichiro wrote:
    > Hi there,
    >
    > I wrote a test harness for java.util.regex.Pattern.split and found
    > that, at least from my point of view, it behaves inconsistently. See
    > program and output below.
    >
    > In particular, a trailing delimiter does not generate an empty string
    > on the right of the delimiter, so I get
    >
    > *******************
    > 'a,' splits into:
    > 'a'
    > *******************
    >
    > while I expected
    >
    > *******************
    > 'a,' splits into:
    > 'a'
    > ''
    > *******************
    >
    > Also, possibly even more bizarrely
    >
    > *******************
    > ',' splits into:
    > *******************
    >
    > while (since the empty string splits into an empty string) I expected
    >
    > *******************
    > ',' splits into:
    > ''
    > ''
    > *******************


    http://www.j2ee.me/javase/6/docs/api/java/lang/String.html#split(java.lang.String)

    <quote>
    Trailing empty strings are therefore not included in the resulting array.
    </quote>

    > I tried to modify my pattern to also take beginning and end of string
    > into account, like so
    >
    > pattern = Pattern.compile("[\\A\\z,]");
    >
    > but this generated a PatternSyntaxException.
    > Can someone please suggest a way to achieve what I need?


    I would suggest either java.util.regex.Pattern or good old
    StringTokenizer.

    > Finally, is this the right newsgroup for this kind of question?


    Sure is.

    > There
    > sure is a lot of noise (=spam) around here...


    Trim your filters. It is usenet anno 2009.

    Arne
     
    Arne Vajhøj, Jul 12, 2009
    #2

  3. Ichiro

    Ichiro Guest

    > I would suggest either java.util.regex.Pattern or good old
    > StringTokenizer.


    Thank you Arne.
    Please note that I was actually using java.util.regex.Pattern.split,
    not String.split, and so I think I need to fine-tune my regular
    expression for the delimiter to achieve what I want - unfortunately my
    attempts have been unsuccessful so far.

    Cheers,
    Ichiro
     
    Ichiro, Jul 12, 2009
    #3
  4. Ichiro

    Arne Vajhøj Guest

    Ichiro wrote:
    >> I would suggest either java.util.regex.Pattern or good old
    >> StringTokenizer.

    >
    > Please note that I was actually using java.util.regex.Pattern.split,
    > not String.split, and so I think I need to fine-tune my regular
    > expression for the delimiter to achieve what I want - unfortunately my
    > attempts have been unsuccessful so far.


    With Pattern I intended to use matcher not split.

    Arne
     
    Arne Vajhøj, Jul 12, 2009
    #4
  5. Ichiro

    Ichiro Guest

    > With Pattern I intended to use matcher not split.

    You mean you can see no other option than to roll my own "split"
    functions using matcher.find() in a loop?
    Seems strange that the power of regexp would not allow me to solve
    this simple problem.

    Thanks
     
    Ichiro, Jul 12, 2009
    #5
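
For reference, the "roll my own split with matcher.find() in a loop" idea from the post above could look roughly like the following. This is only a sketch: the class and method names are invented here, and it assumes the delimiter pattern never matches the empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeepAllSplit {
    // Hypothetical helper: cuts the input at every delimiter match and keeps
    // every field, including empty ones at the start and at the end.
    static List<String> splitKeepingEmpties(Pattern delimiter, CharSequence input) {
        List<String> fields = new ArrayList<String>();
        Matcher m = delimiter.matcher(input);
        int start = 0;
        while (m.find()) {
            // Text between the previous delimiter and this one.
            fields.add(input.subSequence(start, m.start()).toString());
            start = m.end();
        }
        // Whatever follows the last delimiter (possibly the empty string).
        fields.add(input.subSequence(start, input.length()).toString());
        return fields;
    }

    public static void main(String[] args) {
        Pattern comma = Pattern.compile(",");
        System.out.println(splitKeepingEmpties(comma, "a,").size());  // prints 2
        System.out.println(splitKeepingEmpties(comma, ",,").size());  // prints 3
    }
}
```

With this approach n delimiters always yield n + 1 fields, which is the behaviour the original post expected.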
  6. Ichiro

    Arne Vajhøj Guest

    Ichiro wrote:
    >> With Pattern I intended to use matcher not split.

    >
    > You mean you can see no other option than to roll my own "split"
    > functions using matcher.find() in a loop?
    > Seems strange that the power of regexp would not allow me to solve
    > this simple problem.


    The power of regex most certainly allows you to solve that.

    But the simplicity of the split method does not.

    But the split methods explicitly state in their documentation
    that trailing empty strings are removed.

    If you were able to get it working, then it would be
    a bug that would need to be fixed.

    Arne
     
    Arne Vajhøj, Jul 12, 2009
    #6
  7. Ichiro

    Ichiro Guest

    Can someone please suggest a concrete way of solving the issue, if
    possible with code?
    As a hack, I tried to include beginning (\\A) and end (\\z) of string
    as alternative delimiters (see original post) but I had no luck.

    Thank you
     
    Ichiro, Jul 12, 2009
    #7
  8. Ichiro

    Ichiro Guest

    Actually, cancel that. After RTFM a little closer, I found the
    solution. It's not entirely clear to me why it works, but it does.

    Pattern pattern = Pattern.compile(",");
    String[] output = pattern.split(input, -1);
    // instead of pattern.split(input)

    Thanks
     
    Ichiro, Jul 12, 2009
    #8
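
On why the fix in the post above works: according to the Pattern.split documentation, split(input) behaves like split(input, 0), and a limit of zero discards trailing empty strings. A negative limit applies the pattern as many times as possible and keeps them, while a positive limit n caps the result at n pieces. The small sketch below illustrates the three cases; the class and helper names are made up for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo {
    private static final Pattern COMMA = Pattern.compile(",");

    // Hypothetical helper: split on commas with the given limit.
    static String[] split(String input, int limit) {
        return COMMA.split(input, limit);
    }

    public static void main(String[] args) {
        // limit 0 (same as split(input)): trailing empty strings are dropped.
        System.out.println(split("a,,", 0).length);                  // prints 1
        // negative limit: trailing empty strings are kept.
        System.out.println(split("a,,", -1).length);                 // prints 3
        System.out.println(split(",", -1).length);                   // prints 2
        // positive limit: at most that many pieces, the last holds the rest.
        System.out.println(Arrays.toString(split("a,b,c", 2)));      // prints [a, b,c]
    }
}
```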
  9. Ichiro

    Roedy Green Guest

    On Sat, 11 Jul 2009 20:08:55 -0700 (PDT), Ichiro wrote, quoted or
    indirectly quoted someone who said:

    >I wrote a test harness for java.util.regex.Pattern.split and found
    >that, at least from my point of view, it behaves inconsistently


    There is a way around that gotcha. See
    http://mindprod.com/jgloss/regex.html#SPLITTING
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com

    "For reason that have a lot to do with US Government bureaucracy, we settled on the one issue everyone could agree on, which was weapons of mass destruction."
    ~ Paul Wolfowitz 2003-06, explaining how the Bush administration sold the Iraq war to a gullible public.
     
    Roedy Green, Jul 13, 2009
    #9