How can you make idle processors pick up Java work?

Discussion in 'Java' started by qwertmonkey@syberianoutpost.ru, Jul 31, 2012.

  1. Guest

    ~
    > How slow is the NL processing?

    ~
    > Does it make any sense to read lines in one thread and pass each off
    > to one of the iPrx-1 other threads that might run on separate processors?
    ~
    I don't think this would make sense. All sentences are short and all I
    need to do is basically scan them and use look-up tables to do some tinkering
    with the code points. The scheduling of threads and constant context switching
    will most probably make things slower.
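    For reference, the hand-off being asked about does not have to happen one line at a time: passing lines in batches spreads the scheduling cost over many sentences. A rough sketch only, not measured against the file below. The class name, `BATCH_SIZE` and the cap on queued batches are hypothetical choices, and counting code points stands in for the real look-up-table work.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedLineWorkers {
    // Hypothetical batch size; large enough that one task covers many sentences.
    static final int BATCH_SIZE = 10_000;

    // Stand-in for the per-sentence work: counts code points in a batch of lines.
    static long countCodePoints(List<String> lines) {
        long n = 0;
        for (String s : lines) {
            n += s.codePointCount(0, s.length());
        }
        return n;
    }

    // Wraps one batch in a task and hands it to the pool.
    private static Future<Long> submit(ExecutorService pool, final List<String> batch) {
        return pool.submit(new Callable<Long>() {
            @Override
            public Long call() {
                return countCodePoints(batch);
            }
        });
    }

    // One thread reads; `workers` pool threads process batches of lines.
    static long process(BufferedReader in, int workers)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        Deque<Future<Long>> inFlight = new ArrayDeque<>();
        long total = 0;
        try {
            List<String> batch = new ArrayList<>(BATCH_SIZE);
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                batch.add(line);
                if (batch.size() == BATCH_SIZE) {
                    inFlight.add(submit(pool, batch));
                    batch = new ArrayList<>(BATCH_SIZE);
                    // Cap the number of queued batches so the reader cannot fill the heap.
                    if (inFlight.size() > 2 * workers) {
                        total += inFlight.remove().get();
                    }
                }
            }
            if (!batch.isEmpty()) {
                inFlight.add(submit(pool, batch));
            }
            while (!inFlight.isEmpty()) {
                total += inFlight.remove().get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java BatchedLineWorkers <text file>");
            return;
        }
        // Leave one processor for the reading thread.
        int workers = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
        try (BufferedReader in =
                 Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            System.err.println("code points: " + process(in, workers));
        }
    }
}
```

    Whether this beats the single-threaded loop depends on how heavy the per-sentence work really is; if reading dominates, the extra threads will mostly wait.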
    ~
    OK this is the piece of the code I am trying to optimize and the results
    I get, using a large enough file with sentences:
    ~
    http://corpora.informatik.uni-leipzig.de/download.html
    ~
    http://corpora.uni-leipzig.de/downloads/deu_news_2008_10M-text.tar.gz
    ~
    inside of the tar ball there is a file with just sentences:
    ~
    $ ls -l deu_news_2008_10M-sentences.txt
    -rw-r--r-- 1 knoppix knoppix 1235804164 May 28 2011 deu_news_2008_10M-sentences.txt

    $ md5sum -b deu_news_2008_10M-sentences.txt
    23041587b6414d1a1a56c9c389d3c18f *deu_news_2008_10M-sentences.txt

    $ wc -l deu_news_2008_10M-sentences.txt
    10000000 deu_news_2008_10M-sentences.txt
    ~
    Again, do you know of any faster way to go about reading the sentences of
    such large files and getting their code points?
    lbrtchx
    ~
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.nio.file.FileSystems;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Reads a UTF-8 text file line by line and walks the code points of each line.
    public class NIO2_newBufferedReader02Test{
      private static final String aNWLn = System.getProperty("line.separator");

      public static void main(String[] aArgs){
        if((aArgs != null) && (aArgs.length == 1)){
          long lTm00 = System.currentTimeMillis();
          long lLns = 0;
          // long rather than int, so the total cannot overflow on larger files
          long iTtlRdKdPnts = 0;
          Path IFlPth = FileSystems.getDefault().getPath(aArgs[0]);
          long lIFlL = IFlPth.toFile().length();
          int iKdPnt, iSxL;

          StringBuilder aBldr = new StringBuilder(1024);

          // try-with-resources closes the reader even if reading fails
          try(BufferedReader BfR = Files.newBufferedReader(IFlPth, Charset.forName("UTF-8"))){
            String aSx = BfR.readLine();
            while(aSx != null){
              iSxL = aSx.length();
              if(iSxL > 0){
                // step by the width of each code point (1 or 2 chars), so that
                // the low half of a surrogate pair is not read as a second code point
                for(int i = 0; (i < iSxL); i += Character.charCount(iKdPnt)){
                  iKdPnt = aSx.codePointAt(i); ++iTtlRdKdPnts;
                  aBldr.appendCodePoint(iKdPnt);
                }
                // reset the builder for the next sentence
                aBldr.delete(0, aBldr.length());
              }// (iSxL > 0)
              ++lLns;
              aSx = BfR.readLine();
            }// (aSx != null)

            System.err.println("// __ reading |" + lIFlL + "| bytes long text file with |" + lLns +
              "| lines took |" + (System.currentTimeMillis() - lTm00) + "| (ms)");
            System.err.println("// __ iTtlRdKdPnts: |" + iTtlRdKdPnts + "|");
          }catch(IOException IOX) { IOX.printStackTrace(System.err); }
        }
        else{ System.err.println("// __ usage:" + aNWLn + aNWLn +
          " java NIO2_newBufferedReader02Test \"<text file>\"" + aNWLn); }
      }
    }

    ~
    $ java -version
    java version "1.7.0_02"
    Java(TM) SE Runtime Environment (build 1.7.0_02-b13)
    Java HotSpot(TM) Server VM (build 22.0-b10, mixed mode)
    ~
    $ free
    total used free shared buffers cached
    Mem: 4051236 719224 3332012 0 22008 408260
    -/+ buffers/cache: 288956 3762280
    Swap: 3038424 0 3038424
    ~
    $ javac -encoding utf8 NIO2_newBufferedReader02Test.java
    ~
    $ date; java -Xms256m -Xmx1024m -Xincgc -Dfile.encoding=utf8 \
        NIO2_newBufferedReader02Test /media/sdb1/tmp/eng_news_2006_10M-sentences.txt; \
        date;
    ~
    Tue Jul 31 02:05:04 UTC 2012
    // __ reading |1280939143| bytes long text file with |10000000| lines took |41922| (ms)
    Tue Jul 31 02:05:46 UTC 2012
    ~
    Tue Jul 31 02:05:51 UTC 2012
    // __ reading |1280939143| bytes long text file with |10000000| lines took |27299| (ms)
    Tue Jul 31 02:06:19 UTC 2012
    ~
    Tue Jul 31 02:06:22 UTC 2012
    // __ reading |1280939143| bytes long text file with |10000000| lines took |28180| (ms)
    Tue Jul 31 02:06:50 UTC 2012
    ~
    Tue Jul 31 02:26:43 UTC 2012
    // __ reading |1280939143| bytes long text file with |10000000| lines took |35388| (ms)
    Tue Jul 31 02:27:18 UTC 2012
    ~
    Tue Jul 31 02:27:21 UTC 2012
    // __ reading |1280939143| bytes long text file with |10000000| lines took |38155| (ms)
    Tue Jul 31 02:28:00 UTC 2012
    ~
    Tue Jul 31 02:30:40 UTC 2012
    // __ reading |1280939143| bytes long text file with |10000000| lines took |41099| (ms)
    Tue Jul 31 02:31:21 UTC 2012
     
    , Jul 31, 2012
    #1

  2. Joerg Meier Guest

    You might have more luck reading the whole file at once and then looping
    through it once it's read. IO is generally slow, and COULD be your
    bottleneck. But really, your first step should be using a profiler so you
    don't have to guess as to what's slow.
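    A minimal sketch of the read-everything-first idea, assuming the file fits in the heap. That assumption does not hold for the runs above: the test file is about 1.2 GB and the JVM was started with -Xmx1024m, so this would need a larger heap or a smaller file. The class and method names are made up for illustration, and a lone '\r' is not treated as a line break here, unlike BufferedReader.readLine().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadAllThenScan {
    // Decodes the bytes as UTF-8 and returns { lines, codePoints },
    // with line terminators left out of the code point count.
    static long[] scan(byte[] utf8) {
        String text = new String(utf8, StandardCharsets.UTF_8);
        long lines = 0;
        long codePoints = 0;
        boolean pending = false; // true if the current line has content so far
        int i = 0;
        int n = text.length();
        while (i < n) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // 1 or 2 chars per code point
            if (cp == '\n') {
                ++lines;
                pending = false;
            } else if (cp != '\r') {
                ++codePoints;
                pending = true;
            }
        }
        if (pending) {
            ++lines; // last line had no trailing newline
        }
        return new long[] { lines, codePoints };
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReadAllThenScan <text file>");
            return;
        }
        long t0 = System.currentTimeMillis();
        // One bulk read; all processing happens afterwards, in memory.
        byte[] all = Files.readAllBytes(Paths.get(args[0]));
        long tRead = System.currentTimeMillis();
        long[] r = scan(all);
        long tDone = System.currentTimeMillis();
        System.err.println("read " + all.length + " bytes in " + (tRead - t0) + " ms; scanned "
            + r[0] + " lines, " + r[1] + " code points in " + (tDone - tRead) + " ms");
    }
}
```

    Printing the read time and the scan time separately gives a first rough hint of which part is slow, though it is no substitute for a profiler.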

    Kind regards,
    Joerg

    --
    I don't read my email, so replies sent by email will unfortunately go
    unread.
     
    Joerg Meier, Jul 31, 2012
    #2

  3. On 7/31/2012 3:14 AM, wrote:
    > I don't think this would make sense. All sentences are short and all I
    > need to do is basically scan them and use look-up tables to do some tinkering
    > with the code points. The scheduling of threads and constant context switching
    > will most probably make things slower


    In this case, the limiting factor is probably not going to be your CPU
    but your disk drive.
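
    One way to check this, sketched under assumptions: read the same file as raw bytes, with no decoding and no per-line work, and compare the time with the figures in the first post. Those runs work out to roughly 31 to 47 MB/s (1280939143 bytes in 41922 ms and in 27299 ms). If a raw read lands in the same range, the drive is the limit. The class and helper names are hypothetical, and a repeated run may be served partly from the OS page cache, which would make the drive look faster than it is.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RawReadTiming {
    // Reads the stream to the end without looking at the data; returns the byte count.
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[1 << 16];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    // Throughput in decimal megabytes per second.
    static double mbPerSec(long bytes, long millis) {
        return (bytes / 1e6) / (millis / 1e3);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RawReadTiming <file>");
            return;
        }
        long t0 = System.nanoTime();
        long bytes;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            bytes = drain(in);
        }
        long ms = Math.max(1, (System.nanoTime() - t0) / 1_000_000);
        System.err.println("raw read of " + bytes + " bytes took " + ms + " ms ("
            + mbPerSec(bytes, ms) + " MB/s)");
    }
}
```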

    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
     
    Joshua Cranmer, Jul 31, 2012
    #3
