java String split() does not work for delimiter "|" ?

Discussion in 'Java' started by chunji08@gmail.com, Oct 12, 2007.

  1. Guest

    Hi all,

    I have such data in a flat text file,
    "
    106083|1791||7|73755|48|96|3||01/07/2005 13:04:48.979215 PST|||||t|f||
    t|f|t|"
    "

    And such java code to read this line and split it by "|",

    "
    while ((( rd = in.readLine())!= null)) {
    String delimiter = new String("|");
    String[] t1 = rd.split(delimiter);
    String[] t2 = rd.split("|");
    }
    "

    Either way, the split does not work! It splits the string per each
    char. Does someone know why ?

    Here is my jdk information on the linux box.
    "
    java version "1.6.0"
    Java(TM) SE Runtime Environment (build 1.6.0-b105)
    Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)
    "


    Thanks a lot for any tips.


    Chun
     
    , Oct 12, 2007
    #1

  2. wrote:
    > Hi all,
    >
    > I have such data in a flat text file,
    > "
    > 106083|1791||7|73755|48|96|3||01/07/2005 13:04:48.979215 PST|||||t|f||
    > t|f|t|"
    > "
    >
    > And such java code to read this line and split it by "|",


    `split' uses a regex command, and '|' happens to be a special operator
    in regex. Instead of "|", you want "\\|".

    > Either way, the split does not work! It splits the string per each
    > char. Does someone know why ?


    Your regex specifies either the empty string or the empty string. Since
    there is an empty string between each character, the string is split
    between each character. It's what you told it to do.

    For more information:
    <http://java.sun.com/javase/6/docs/api/java/lang/String.html> and
    <http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html>
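A minimal sketch of the fix, assuming a shortened version of the record from the original post (the class name `PipeSplit` and the sample line are made up for illustration). It also shows a related detail: `split` drops trailing empty fields unless a negative limit is passed, which matters for data that ends in `|`.

```java
import java.util.Arrays;

public class PipeSplit {
    // Escape the bar so the regex is \| (a literal '|'); the Java literal is "\\|".
    static String[] fields(String line) {
        return line.split("\\|");
    }

    // Same split, but a negative limit keeps trailing empty fields.
    static String[] allFields(String line) {
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        String line = "106083|1791||7|t|f||";
        System.out.println(Arrays.toString(fields(line))); // [106083, 1791, , 7, t, f]
        System.out.println(fields(line).length);           // 6
        System.out.println(allFields(line).length);        // 8
    }
}
```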


    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
     
    Joshua Cranmer, Oct 12, 2007
    #2

  3. Guest

    On Oct 12, 1:39 pm, wrote:
    > Hi all,
    >
    > I have such data in a flat text file,
    > "
    > 106083|1791||7|73755|48|96|3||01/07/2005 13:04:48.979215 PST|||||t|f||
    > t|f|t|"
    > "
    >
    > And such java code to read this line and split it by "|",
    >
    > "
    > while ((( rd = in.readLine())!= null)) {
    > String delimiter = new String("|");
    > String[] t1 = rd.split(delimiter);
    > String[] t2 = rd.split("|");
    > }
    > "
    >
    > Either way, the split does not work! It splits the string per each
    > char. Does someone know why ?
    >
    > Here is my jdk information on the linux box.
    > "
    > java version "1.6.0"
    > Java(TM) SE Runtime Environment (build 1.6.0-b105)
    > Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)
    > "
    >
    > Thanks a lot for any tips.
    >
    > Chun


    Please ignore that, "\\|" works for me, I guess I use perl too much
    -:)


    -cji
     
    , Oct 12, 2007
    #3
  4. wrote:
    > Hi all,
    >
    > I have such data in a flat text file,
    > "
    > 106083|1791||7|73755|48|96|3||01/07/2005 13:04:48.979215 PST|||||t|f||
    > t|f|t|"
    > "
    >
    > And such java code to read this line and split it by "|",
    >
    > "
    > while ((( rd = in.readLine())!= null)) {
    > String delimiter = new String("|");
    > String[] t1 = rd.split(delimiter);
    > String[] t2 = rd.split("|");
    > }
    > "
    >
    > Either way, the split does not work! It splits the string per each
    > char. Does someone know why ?
    >


    Because the argument to split() is a regex not a string.

    In regexes, certain characters (metacharacters) have special meanings.
    The vertical bar is such a metacharacter, representing alternation.

    public class MetaChar {
        public static void main(String[] args) {
            String s = "oneXtwoYthreeXfour";
            // "X|Y" means "X or Y", so this prints one, two, three, four
            String[] a = s.split("X|Y");
            for (String w : a)
                System.out.println(w);
        }
    }

    You have to "escape" the vertical bar if you want to treat it as an
    ordinary character and not as a metacharacter.

    http://www.regular-expressions.info/alternation.html
    http://www.regular-expressions.info/characters.html
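If you would rather not remember which characters need escaping, one option is to let `Pattern.quote` build the literal pattern. This is only a sketch; the class and method names (`QuoteDelimiter`, `splitLiteral`) are made up for the example.

```java
import java.util.regex.Pattern;

public class QuoteDelimiter {
    // Pattern.quote returns a regex that matches the given string literally,
    // so metacharacters such as '|' or '.' lose their special meaning.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        for (String field : splitLiteral("one|two|three", "|")) {
            System.out.println(field); // prints one, two, three on separate lines
        }
    }
}
```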
     
    RedGrittyBrick, Oct 12, 2007
    #4
  5. Roedy Green Guest

    On Fri, 12 Oct 2007 20:39:06 -0000, wrote, quoted
    or indirectly quoted someone who said :

    >Either way, the split does not work! It splits the string per each
    >char. Does someone know why ?


    you mean literal | not the regex command |. See
    http://mindprod.com/jgloss/regex.html
    on quoting.
    --
    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
     
    Roedy Green, Oct 13, 2007
    #5
  6. Roedy Green Guest

    On Fri, 12 Oct 2007 20:39:06 -0000, wrote, quoted
    or indirectly quoted someone who said :

    > String delimiter = new String("|");


    There is no need for new String.

    See http://mindprod.com/jgloss/newbie.html

    You can write that as:

    String delimiter = "|";

    but of course, as others pointed out, you meant:

    String delimiter = "\\|";
    --
    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
     
    Roedy Green, Oct 13, 2007
    #6
  7. Guest

    On Saturday, 13 October 2007 02:09:06 UTC+5:30, wrote:
    > Hi all,
    >
    > I have such data in a flat text file,
    > "
    > 106083|1791||7|73755|48|96|3||01/07/2005 13:04:48.979215 PST|||||t|f||
    > t|f|t|"
    > "
    >
    > And such java code to read this line and split it by "|",
    >
    > "
    > while ((( rd = in.readLine())!= null)) {
    > String delimiter = new String("|");
    > String[] t1 = rd.split(delimiter);
    > String[] t2 = rd.split("|");
    > }
    > "
    >
    > Either way, the split does not work! It splits the string per each
    > char. Does someone know why ?
    >
    > Here is my jdk information on the linux box.
    > "
    > java version "1.6.0"
    > Java(TM) SE Runtime Environment (build 1.6.0-b105)
    > Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)
    > "
    >
    >
    > Thanks a lot for any tips.
    >
    >
    > Chun


    You can also do like this :
    StringTokenizer tokenizer = new StringTokenizer(content, "||");
    while(tokenizer.hasMoreTokens()){
    _log.info("tokenizer.nextToken() : "+tokenizer.nextToken());
    }
     
    , Aug 8, 2013
    #7
  8. Guest

    On Saturday, 13 October 2007 02:09:06 UTC+5:30, wrote:
    > Hi all,
    >
    > I have such data in a flat text file,
    > "
    > 106083|1791||7|73755|48|96|3||01/07/2005 13:04:48.979215 PST|||||t|f||
    > t|f|t|"
    > "
    >
    > And such java code to read this line and split it by "|",
    >
    > "
    > while ((( rd = in.readLine())!= null)) {
    > String delimiter = new String("|");
    > String[] t1 = rd.split(delimiter);
    > String[] t2 = rd.split("|");
    > }
    > "
    >
    > Either way, the split does not work! It splits the string per each
    > char. Does someone know why ?
    >
    > Here is my jdk information on the linux box.
    > "
    > java version "1.6.0"
    > Java(TM) SE Runtime Environment (build 1.6.0-b105)
    > Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)
    > "
    >
    >
    > Thanks a lot for any tips.
    >
    >
    > Chun


    You can also do like this :
    StringTokenizer tokenizer = new StringTokenizer(content, "|");
    while(tokenizer.hasMoreTokens()){
    _log.info("tokenizer.nextToken() : "+tokenizer.nextToken());
    }
     
    , Aug 8, 2013
    #8
  9. Lew Guest

    wrote:
    > You can also do like this :
    > StringTokenizer tokenizer = new StringTokenizer(content, "|");
    > while(tokenizer.hasMoreTokens()){
    > _log.info("tokenizer.nextToken() : "+tokenizer.nextToken());
    > }


    "StringTokenizer is a legacy class that is retained for compatibility reasons although
    its use is discouraged in new code. It is recommended that anyone seeking this
    functionality use the split method of String or the java.util.regex package instead."
    http://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html
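One behavioural difference is worth knowing before swapping one for the other on data like the original poster's: StringTokenizer treats a run of delimiters as a single separator, so empty fields disappear, while split keeps them. A small sketch (the class name `EmptyFields` and the sample line are illustrative only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class EmptyFields {
    // Collects tokens the same way the StringTokenizer loop quoted above does.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<String>();
        StringTokenizer tokenizer = new StringTokenizer(line, "|");
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "106083|1791||7";
        System.out.println(tokenize(line).size());    // 3: the empty field is skipped
        System.out.println(line.split("\\|").length); // 4: the empty field is kept
    }
}
```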

    "Variable names should not start with underscore _ or dollar sign $ characters,
    even though both are allowed."
    http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-135099.html#367

    --
    Lew
     
    Lew, Aug 8, 2013
    #9
  10. In article <>,
    Lew <> wrote:

    > wrote:
    > > You can also do like this :
    > > StringTokenizer tokenizer = new StringTokenizer(content, "|");
    > > while(tokenizer.hasMoreTokens()){
    > > _log.info("tokenizer.nextToken() : "+tokenizer.nextToken());
    > > }

    >
    > "StringTokenizer is a legacy class that is retained for compatibility reasons
    > although
    > its use is discouraged in new code. It is recommended that anyone seeking
    > this
    > functionality use the split method of String or the java.util.regex package
    > instead."
    > http://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html


    Last time I checked, the performance of String.split() sucked. The
    JavaDoc up to 1.6 even says it sucks. Hopefully they've fixed that
    before calling a simple and effective tool like StringTokenizer "legacy."

    Now if there was only a way to revert String.substring()'s performance
    in Java 1.7, I might try Oracle's version of Java.


    > "Variable names should not start with underscore _ or dollar sign $
    > characters,
    > even though both are allowed."
    > http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-135099.html#367
     
    Kevin McMurtrie, Aug 9, 2013
    #10
  11. On 08/09/2013 03:46 AM, Kevin McMurtrie wrote:
    > In article <>,
    > Lew <> wrote:
    >
    >> wrote:
    >>> You can also do like this :
    >>> StringTokenizer tokenizer = new StringTokenizer(content, "|");
    >>> while(tokenizer.hasMoreTokens()){
    >>> _log.info("tokenizer.nextToken() : "+tokenizer.nextToken());
    >>> }

    >>
    >> "StringTokenizer is a legacy class that is retained for compatibility reasons
    >> although
    >> its use is discouraged in new code. It is recommended that anyone seeking
    >> this
    >> functionality use the split method of String or the java.util.regex package
    >> instead."
    >> http://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html

    >
    > Last time I checked, the performance of String.split() sucked. The
    > JavaDoc up to 1.6 even says it sucks. Hopefully they've fixed that
    > before calling a simple and effective tool like StringTokenizer "legacy."
    >
    > Now if there was only a way to revert String.substring()'s performance
    > in Java 1.7, I might try Oracle's version of Java.
    >
    >
    >> "Variable names should not start with underscore _ or dollar sign $
    >> characters,
    >> even though both are allowed."
    >> http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-135099.html#367


    I had to check that, because I didn't remember ever seeing the
    Javadoc for String.split say that the performance sucked. Lo and
    behold, I don't see that language.

    What's the basis for assessing the suckage of Java String.split? Doing
    millions of splits? And if the situation calls for industrial text
    processing, why use Java anyway? It's not the first language I'd think
    of for that purpose, it's cumbersome. And you can't ramp up your RAM?

    I don't mind your comments about Java implementation performance, they
    are useful to followup. I just wonder what kind of Java programs you
    write where you find this kind of detail to be that important. Can't say
    I've ever in 15+ years seen a Java SE or EE project be significantly
    impacted by these considerations.

    AHS
    --
    When a true genius appears, you can know him by this sign:
    that all the dunces are in a confederacy against him.
    -- Jonathan Swift
     
    Arved Sandstrom, Aug 9, 2013
    #11
  12. Eric Sosman Guest

    On 8/8/2013 8:06 AM, wrote:
    > On Saturday, 13 October 2007 02:09:06 UTC+5:30, wrote:


    Couldn't you have waited for its sixth birthday?

    --
    Eric Sosman
     
    Eric Sosman, Aug 9, 2013
    #12
  13. In article <i61Nt.55783$>,
    Arved Sandstrom <> wrote:

    > On 08/09/2013 03:46 AM, Kevin McMurtrie wrote:
    > > In article <>,
    > > Lew <> wrote:
    > >
    > >> wrote:
    > >>> You can also do like this :
    > >>> StringTokenizer tokenizer = new StringTokenizer(content, "|");
    > >>> while(tokenizer.hasMoreTokens()){
    > >>> _log.info("tokenizer.nextToken() : "+tokenizer.nextToken());
    > >>> }
    > >>
    > >> "StringTokenizer is a legacy class that is retained for compatibility
    > >> reasons
    > >> although
    > >> its use is discouraged in new code. It is recommended that anyone seeking
    > >> this
    > >> functionality use the split method of String or the java.util.regex
    > >> package
    > >> instead."
    > >> http://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html

    > >
    > > Last time I checked, the performance of String.split() sucked. The
    > > JavaDoc up to 1.6 even says it sucks. Hopefully they've fixed that
    > > before calling a simple and effective tool like StringTokenizer "legacy."
    > >
    > > Now if there was only a way to revert String.substring()'s performance
    > > in Java 1.7, I might try Oracle's version of Java.
    > >
    > >
    > >> "Variable names should not start with underscore _ or dollar sign $
    > >> characters,
    > >> even though both are allowed."
    > >> http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-135099.html#367

    >
    > I had to check that because I didn't remember ever seeing that the
    > Javadoc for String.split saying that the performance sucked. Lo and
    > behold, I don't see that language.
    >
    > What's the basis for assessing the suckage of Java String.split? Doing
    > millions of splits? And if the situation calls for industrial text
    > processing, why use Java anyway? It's not the first language I'd think
    > of for that purpose, it's cumbersome. And you can't ramp up your RAM?
    >
    > I don't mind your comments about Java implementation performance, they
    > are useful to followup. I just wonder what kind of Java programs you
    > write where you find this kind of detail to be that important. Can't say
    > I've ever in 15+ years seen a Java SE or EE project be significantly
    > impacted by these considerations.
    >
    > AHS


    String.split() delegates to the Pattern class. The Pattern class
    mentions that the form used in String is not efficient because it must
    compile the regular expression on each use.

    Let me test...

    Java 1.6.0_51 on an old Mac gives me these relative times:
    splitNanos= 5341045000
    tokenizerNanos= 1934390000

    I hacked in a copy of 1.7.0_40-ea and got:
    splitNanos= 3299753000
    tokenizerNanos= 1675745000


    It's not HUGE, but I don't think you should deprecate a class that's 2
    times faster than the replacement. String.split() is great for utility
    use but the core code should use pre-compiled patterns or
    StringTokenizer.
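For the pre-compiled option, a sketch of what that could look like for the pipe-delimited data from the start of the thread. The class name `PrecompiledSplit` and the constant `PIPE` are made up for the example; the point is only that the regex is compiled once rather than on every call.

```java
import java.util.regex.Pattern;

public class PrecompiledSplit {
    // Compiled once and reused; Pattern instances are immutable and safe to share.
    private static final Pattern PIPE = Pattern.compile("\\|");

    // A negative limit keeps trailing empty fields.
    static String[] fields(String line) {
        return PIPE.split(line, -1);
    }

    public static void main(String[] args) {
        System.out.println(fields("106083|1791||7|").length); // 5
    }
}
```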

    Last time I checked, Oracle was still targeting big business. Asking to
    double the datacenter could get a whole Engineering team fired.



    import java.util.Random;
    import java.util.StringTokenizer;

    public class Str
    {
        // Delimiters (tab, newline, semicolon) mixed in with alphanumerics
        final char testChars[]=
            "\t\n;0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
            .toCharArray();
        final Random rnd= new Random();

        public static void main(String[] args)
        {
            final Str str= new Str();

            long splitNanos= 0;
            long tokenizerNanos= 0;

            // 100 random lines, each formatted 10000 times by both methods
            for (int i= 0; i < 100; ++i)
            {
                final String line= str.randomAlphaNumerics();
                String formatBySplit= null, formatByTokenize= null;

                final long startTime= System.nanoTime();
                for (int j= 0; j < 10000; ++j)
                    formatBySplit= str.formatSplit(line);
                final long midTime= System.nanoTime();
                for (int j= 0; j < 10000; ++j)
                    formatByTokenize= str.formatTokenized(line);
                final long endTime= System.nanoTime();

                splitNanos+= midTime - startTime;
                tokenizerNanos+= endTime - midTime;

                // Sanity check: both methods must produce the same result
                if (!formatBySplit.equals(formatByTokenize))
                    throw new RuntimeException("formatBySplit=" + formatBySplit +
                        " formatByTokenize=" + formatByTokenize);
            }

            System.out.println ("splitNanos= " + splitNanos);
            System.out.println ("tokenizerNanos= " + tokenizerNanos);
        }

        // Splits with String.split() and joins the non-empty tokens with newlines
        private String formatSplit (String input)
        {
            final String toks[]= input.split("[ \t\n;]+");
            final StringBuilder buf= new StringBuilder (input.length());

            for (String tok : toks)
            {
                if (tok.length() > 0)
                {
                    if (buf.length() > 0)
                        buf.append('\n');
                    buf.append(tok);
                }
            }
            return buf.toString();
        }

        // Same formatting, using StringTokenizer
        private String formatTokenized (String input)
        {
            final StringTokenizer tok= new StringTokenizer(input, " \t\n;", false);
            final StringBuilder buf= new StringBuilder (input.length());

            if (tok.hasMoreElements())
                buf.append(tok.nextElement());

            while (tok.hasMoreElements())
                buf.append('\n').append(tok.nextElement());

            return buf.toString();
        }

        // Builds a random string of 0 to 199 characters drawn from testChars
        private String randomAlphaNumerics ()
        {
            final char buf[]= new char[rnd.nextInt(200)];
            for (int i= 0; i < buf.length; ++i)
                buf[i]= testChars[rnd.nextInt(testChars.length)];
            return new String (buf);
        }
    }
     
    Kevin McMurtrie, Aug 10, 2013
    #13
  14. Michael Jung Guest

    Kevin McMurtrie <> writes:
    > In article <i61Nt.55783$>,
    > Arved Sandstrom <> wrote:
    >> On 08/09/2013 03:46 AM, Kevin McMurtrie wrote:
    >> > In article <>,
    >> > Lew <> wrote:
    >> >
    >> >> wrote:
    >> >>> StringTokenizer tokenizer = new StringTokenizer(content, "|");
    >> >>> while(tokenizer.hasMoreTokens()){
    >> >>> _log.info("tokenizer.nextToken() : "+tokenizer.nextToken());
    >> >>> }
    >> >> "StringTokenizer is a legacy class that is retained for compatibility
    >> >> reasons although
    >> >> its use is discouraged in new code. It is recommended that anyone seeking
    >> >> this
    >> >> functionality use the split method of String or the java.util.regex
    >> >> package instead."
    >> >> http://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html
    >> > Last time I checked, the performance of String.split() sucked. The
    >> > JavaDoc up to 1.6 even says it sucks. Hopefully they've fixed that
    >> > before calling a simple and effective tool like StringTokenizer "legacy."
    >> > Now if there was only a way to revert String.substring()'s performance
    >> > in Java 1.7, I might try Oracle's version of Java.

    >> I had to check that because I didn't remember ever seeing that the
    >> Javadoc for String.split saying that the performance sucked. Lo and
    >> behold, I don't see that language.
    >> What's the basis for assessing the suckage of Java String.split? Doing
    >> millions of splits? And if the situation calls for industrial text
    >> processing, why use Java anyway? It's not the first language I'd think
    >> of for that purpose, it's cumbersome. And you can't ramp up your RAM?
    >> I don't mind your comments about Java implementation performance, they
    >> are useful to followup. I just wonder what kind of Java programs you
    >> write where you find this kind of detail to be that important. Can't say
    >> I've ever in 15+ years seen a Java SE or EE project be significantly
    >> impacted by these considerations.

    > String.split() delegates to the Pattern class. The Pattern class
    > mentions that the form used in String is not efficient because it must
    > compile the regular expression on each use.
    > Let me test...
    > Java 1.6.0_51 on an old Mac gives me these relative times:
    > splitNanos= 5341045000
    > tokenizerNanos= 1934390000
    > I hacked in a copy of 1.7.0_40-ea and got:
    > splitNanos= 3299753000
    > tokenizerNanos= 1675745000
    > It's not HUGE, but I don't think you should deprecate a class that's 2
    > times faster than the replacement. String.split() is great for utility
    > use but the core code should use pre-compiled patterns or
    > StringTokenizer.
    > Last time I checked, Oracle was still targeting big business. Asking to
    > double the datacenter could get a whole Engineering team fired.


    I can confirm that this does matter in business code. We got a 10%-20%
    performance boost by avoiding split for certain use cases that used it a
    lot, not just in micro-optimizing tests. The numbers from Kevin are
    about what we had (although I personally wouldn't show that many decimal
    places that suggest a higher degree of accuracy than is actually
    reasonable).

    Michael
     
    Michael Jung, Aug 10, 2013
    #14
  15. Joerg Meier Guest

    On Fri, 09 Aug 2013 23:25:52 -0700, Kevin McMurtrie wrote:

    > String.split() delegates to the Pattern class. The Pattern class
    > mentions that the form used in String is not efficient because it must
    > compile the regular expression on each use.


    There is really no way around that with .split(), short of some convoluted
    internal caching system where the last x patterns compiled by .split are
    stored for y time. You call a method with a String as a parameter twice,
    how are you going to avoid having to compile the String to a Pattern other
    than through that ?

    The .split syntax is convenient, but slow. There is really no sensible way
    to speed it up while keeping the convenient method signature. Of course,
    simply using Pattern is not terribly hard at all.

    With all that being said: StringTokenizer obviously can only handle very
    simple splitting due to the lack of regex support, and thus is naturally
    faster, but if your splitting is simple enough not to need regex, it might
    be simple enough to use indexOf, which is almost an order of magnitude faster than
    even Tokenizer.
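A minimal sketch of the indexOf approach for a single-character delimiter. The class name `IndexOfSplit` is made up for the example, and no timing claim is attached to this particular code; unlike the default `split`, it keeps every empty field, including trailing ones.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfSplit {
    // Splits on one literal delimiter character without any regex machinery.
    static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<String>();
        int start = 0;
        int end;
        while ((end = line.indexOf(delimiter, start)) >= 0) {
            fields.add(line.substring(start, end));
            start = end + 1;
        }
        fields.add(line.substring(start)); // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("106083|1791||7|", '|')); // [106083, 1791, , 7, ]
    }
}
```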

    Liebe Gruesse,
    Joerg

    --
    Ich lese meine Emails nicht, replies to Email bleiben also leider
    ungelesen.
     
    Joerg Meier, Aug 10, 2013
    #15
  16. On 08/10/2013 07:37 AM, Michael Jung wrote:
    > Kevin McMurtrie <> writes:
    >> In article <i61Nt.55783$>,
    >> Arved Sandstrom <> wrote:
    >>> On 08/09/2013 03:46 AM, Kevin McMurtrie wrote:
    >>>> In article <>,
    >>>> Lew <> wrote:
    >>>>
    >>>>> wrote:
    >>>>>> StringTokenizer tokenizer = new StringTokenizer(content, "|");
    >>>>>> while(tokenizer.hasMoreTokens()){
    >>>>>> _log.info("tokenizer.nextToken() : "+tokenizer.nextToken());
    >>>>>> }
    >>>>> "StringTokenizer is a legacy class that is retained for compatibility
    >>>>> reasons although
    >>>>> its use is discouraged in new code. It is recommended that anyone seeking
    >>>>> this
    >>>>> functionality use the split method of String or the java.util.regex
    >>>>> package instead."
    >>>>> http://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html
    >>>> Last time I checked, the performance of String.split() sucked. The
    >>>> JavaDoc up to 1.6 even says it sucks. Hopefully they've fixed that
    >>>> before calling a simple and effective tool like StringTokenizer "legacy."
    >>>> Now if there was only a way to revert String.substring()'s performance
    >>>> in Java 1.7, I might try Oracle's version of Java.
    >>> I had to check that because I didn't remember ever seeing that the
    >>> Javadoc for String.split saying that the performance sucked. Lo and
    >>> behold, I don't see that language.
    >>> What's the basis for assessing the suckage of Java String.split? Doing
    >>> millions of splits? And if the situation calls for industrial text
    >>> processing, why use Java anyway? It's not the first language I'd think
    >>> of for that purpose, it's cumbersome. And you can't ramp up your RAM?
    >>> I don't mind your comments about Java implementation performance, they
    >>> are useful to followup. I just wonder what kind of Java programs you
    >>> write where you find this kind of detail to be that important. Can't say
    >>> I've ever in 15+ years seen a Java SE or EE project be significantly
    >>> impacted by these considerations.

    >> String.split() delegates to the Pattern class. The Pattern class
    >> mentions that the form used in String is not efficient because it must
    >> compile the regular expression on each use.
    >> Let me test...
    >> Java 1.6.0_51 on an old Mac gives me these relative times:
    >> splitNanos= 5341045000
    >> tokenizerNanos= 1934390000
    >> I hacked in a copy of 1.7.0_40-ea and got:
    >> splitNanos= 3299753000
    >> tokenizerNanos= 1675745000
    >> It's not HUGE, but I don't think you should deprecate a class that's 2
    >> times faster than the replacement. String.split() is great for utility
    >> use but the core code should use pre-compiled patterns or
    >> StringTokenizer.
    >> Last time I checked, Oracle was still targeting big business. Asking to
    >> double the datacenter could get a whole Engineering team fired.

    >
    > I can confirm that this does matter in business code. We got a 10%-20%
    > performance boost by avoiding split for certain use cases that used it a
    > lot, not just in micro-optimizing tests. The numbers from Kevin are
    > about what we had (although I personally wouldn't show that many decimal
    > places that suggest a higher degree of accuracy than is actually
    > reasonable).
    >
    > Michael
    >

    I don't doubt that use of String.split is not always the optimal
    approach. From the sounds of it it's not often the optimal approach. But
    I'll bet that the large majority of the time using it is a "good enough"
    approach, because very often that extra 10-20 percent speed bump isn't
    actually needed.

    Funny thing is, I can think of one ESB application of mine right now
    that needs to process a high volume of messages, and each message is
    composed of 10-20 lines each one of which may have multiple fields
    delimited by slashes...and I've been using String.split without
    problems. Having said that, this is a 24/7 "don't fail or shit rains
    down from the heavens" application, so I might try swapping out
    .split(), since it's not complicated logic and I know exactly what the
    delimiter is.

    But I wouldn't eschew String.split as a rule. I doubt most apps care.

    AHS

    --
    When a true genius appears, you can know him by this sign:
    that all the dunces are in a confederacy against him.
    -- Jonathan Swift
     
    Arved Sandstrom, Aug 11, 2013
    #16
  17. Michael Jung Guest

    Arved Sandstrom <> writes:
    > On 08/10/2013 07:37 AM, Michael Jung wrote:

    [...]
    >> I can confirm that this does matter in business code. We got a 10%-20%
    >> performance boost by avoiding split for certain use cases that used it a
    >> lot, not just in micro-optimizing tests. The numbers from Kevin are
    >> about what we had (although I personally wouldn't show that many decimal
    >> places that suggest a higher degree of accuracy than is actually
    >> reasonable).

    > I don't doubt that use of String.split is not always the optimal
    > approach. From the sounds of it it's not often the optimal
    > approach. But I'll bet that the large majority of the time using it is
    > a "good enough" approach, because very often that extra 10-20 percent
    > speed bump isn't actually needed.

    [...]
    > But I wouldn't eschew String.split as a rule. I doubt most apps care.


    I use split myself often enough. You can read my response as a case for
    optimization surprises. The micro benchmark shows around a 200% boost
    (3:10), the overall gain was 15%, but the code in question as to the
    amount of (user-level) code run through was far less than 1% (big "fat"
    EE application).

    Michael
     
    Michael Jung, Aug 11, 2013
    #17
  18. Joerg Meier Guest

    On Sun, 11 Aug 2013 11:12:38 +0200, Michael Jung wrote:

    > I use split myself often enough. You can read my response as a case for
    > optimization surprises. The micro benchmark shows around a 200% boost
    > (3:10), the overall gain was 15%, but the code in question as to the
    > amount of (user-level) code run through was far less than 1% (big "fat"
    > EE application).


    Well, odds are, not many applications spend 25% of their CPU time doing
    .split(), so I would say that your application speeding up that much is an
    extreme edge case. What on Earth do you do that requires millions of
    .split() calls per second, and why did you think that would even remotely
    be a representative example ?

    Liebe Gruesse,
    Joerg

    --
    Ich lese meine Emails nicht, replies to Email bleiben also leider
    ungelesen.
     
    Joerg Meier, Aug 11, 2013
    #18
  19. Michael Jung Guest

    Joerg Meier <> writes:
    > On Sun, 11 Aug 2013 11:12:38 +0200, Michael Jung wrote:
    >> I use split myself often enough. You can read my response as a case for
    >> optimization surprises. The micro benchmark shows around a 200% boost
    >> (3:10), the overall gain was 15%, but the code in question as to the
    >> amount of (user-level) code run through was far less than 1% (big "fat"
    >> EE application).

    > Well, odds are, not many applications spend 25% of their CPU time doing
    > .split(), so I would say that your application speeding up that much is an
    > extreme edge case. What on Earth do you do that requires millions of
    > .split() calls per second, and why did you think that would even remotely
    > be a representative example ?


    Odds are that the rest of the application was already highly
    optimized. (I already said this was for certain use cases.) Whether this
    is representative of something, I don't know, everybody has to judge for
    himself what to do with split. But string manipulation is omnipresent in
    many applications these days. This was just some light.

    Michael
     
    Michael Jung, Aug 11, 2013
    #19